I have about 30,000 bank names in a dataframe. I would like to group them into base groups, because most of them are the same bank and differ only in location. However, I do not know in advance which banks are in there.
Below is a subset of the dataset. From this data I can identify two banks, namely ROYAL BANK and BARCLAYS, so I would like to get two groups:
ROYAL BANK (count: 13), BARCLAYS (count: 7)
ROYAL BANK OF CANADA
ROYAL BANK OF CANADA
THE ROYAL BANK OF SCOTLAND PLC
THE ROYAL BANK OF SCOTLAND PLC
ROYAL BANK OF CANADA CAYMAN ISLANDS
RBC ROYAL BANK (TRINIDAD AND TOBAGO), LTD.
RBC ROYAL BANK (TRINIDAD AND TOBAGO), LTD.
THE ROYAL BANK OF SCOTLAND INTERNATIONAL, LTD.
THE ROYAL BANK OF SCOTLAND INTERNATIONAL LTD.
ROYAL BANK OF SCOTLAND, N.V.
RBC ROYAL BANK (BAHAMAS), LTD.
ROYAL BANK OF SCOTLAND PLC
ROYAL BANK OF SCOTLAND PLC
BARCLAYS BANK PLC
BARCLAYS BANK DELAWARE
BARCLAYS BANK OF GHANA, LTD.
BARCLAYS BANK DELAWARE
BARCLAYCARD GERMANY
BARCLAYS BANK PLC
BARCLAYS BANK PLC
There are other banks in the data with a similar pattern, and I would like a generalized method that identifies the list of unique groups (bank names) and puts the similar names under them.
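One possible starting point is a token-frequency heuristic. Below is a minimal sketch under stated assumptions, not an established library method. Each name is normalized and stripped of generic words, and then keyed by its most widespread remaining token. This works because the bank's own name (ROYAL, BARCLAYS) tends to appear in more rows than location words such as SCOTLAND or DELAWARE. Keys that share a long prefix are then folded into the larger group, so BARCLAYCARD joins BARCLAYS. `STOP_TOKENS`, `min_prefix`, `tokenize` and `group_bank_names` are hypothetical names chosen for illustration. The stop list in particular will need tuning on the full 30,000 rows, where words like NATIONAL or FIRST could otherwise become group keys. Only the standard library is used. With a dataframe you could pass something like `df["name"].tolist()`, where `"name"` is an assumed column name.

```python
import re
from collections import Counter, defaultdict
from os.path import commonprefix

# Hypothetical list of generic words that say nothing about *which* bank it is.
# On the full dataset this needs tuning (e.g. NATIONAL, FIRST, TRUST, SAVINGS).
STOP_TOKENS = {"THE", "OF", "AND", "BANK", "PLC", "LTD", "NV", "INC", "CO", "CORP"}


def tokenize(name):
    """Uppercase, drop dots (so 'N.V.' -> 'NV'), split on any other
    non-alphanumeric character, remove generic words, keep order, dedupe."""
    words = re.findall(r"[A-Z0-9]+", name.upper().replace(".", ""))
    return list(dict.fromkeys(w for w in words if w not in STOP_TOKENS))


def group_bank_names(names, min_prefix=6):
    """Heuristic sketch: group names under their most widespread token."""
    token_lists = [tokenize(n) for n in names]
    # Document frequency: in how many names does each token appear?
    df = Counter(tok for toks in token_lists for tok in toks)

    # Step 1: key each name by its most frequent token (first one wins ties);
    # a name made only of generic words falls back to its full uppercased text.
    raw = defaultdict(list)
    for name, toks in zip(names, token_lists):
        key = max(toks, key=df.__getitem__) if toks else name.upper()
        raw[key].append(name)

    # Step 2: visit keys from largest to smallest group and fold each one into
    # an existing key sharing at least `min_prefix` leading characters.
    groups = {}
    for key in sorted(raw, key=lambda k: len(raw[k]), reverse=True):
        target = next(
            (g for g in groups if len(commonprefix([g, key])) >= min_prefix), key
        )
        groups.setdefault(target, []).extend(raw[key])
    return groups


SAMPLE = [
    "ROYAL BANK OF CANADA", "ROYAL BANK OF CANADA",
    "THE ROYAL BANK OF SCOTLAND PLC", "THE ROYAL BANK OF SCOTLAND PLC",
    "ROYAL BANK OF CANADA CAYMAN ISLANDS",
    "RBC ROYAL BANK (TRINIDAD AND TOBAGO), LTD.",
    "RBC ROYAL BANK (TRINIDAD AND TOBAGO), LTD.",
    "THE ROYAL BANK OF SCOTLAND INTERNATIONAL, LTD.",
    "THE ROYAL BANK OF SCOTLAND INTERNATIONAL LTD.",
    "ROYAL BANK OF SCOTLAND, N.V.",
    "RBC ROYAL BANK (BAHAMAS), LTD.",
    "ROYAL BANK OF SCOTLAND PLC", "ROYAL BANK OF SCOTLAND PLC",
    "BARCLAYS BANK PLC", "BARCLAYS BANK DELAWARE",
    "BARCLAYS BANK OF GHANA, LTD.", "BARCLAYS BANK DELAWARE",
    "BARCLAYCARD GERMANY", "BARCLAYS BANK PLC", "BARCLAYS BANK PLC",
]

for label, members in group_bank_names(SAMPLE).items():
    print(label, len(members))
# ROYAL 13
# BARCLAYS 7
```

On this sample the two groups and their counts match the ones listed above, although the labels are single tokens (ROYAL rather than ROYAL BANK). The prefix merge is the fragile part: a threshold of 6 would also merge unrelated names such as NATIONAL and NATIONWIDE. On real data it is worth inspecting the small groups by hand, or replacing step 2 with a proper string-similarity measure.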