    u'가'  u'나'    
0     
1   
...


       A      B
0
1
...

I have two pandas DataFrames like the ones above, called 'left' and 'right', and I tried to merge them with the code below:

result = pandas.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])

Unfortunately, this raised the error below. It seems that the left_on/right_on key arguments of pandas.merge can't recognize a unicode column name.

  File "?.py", line ?, in merger
    pandas.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 352, in _get_merge_keys
    left_keys.append(left[lk].values)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)
  File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)
  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)
KeyError: u'\uac00'

Has anyone run into this kind of error before? If so, please let me know and share any tips.

Alex Riley
su79eu7k
  • Just out of curiosity, does the following work: `result = pandas.merge(left, right, how='left', left_on=left.columns[0], right_on=right.columns[0])`? – EdChum Jul 21 '15 at 12:43
  • You are right. Sorry for the confusion, everyone. It seemed like a unicode issue to me, but it wasn't. It was just because I tried _merge_ right after _groupby_. http://stackoverflow.com/a/24980809/3054161 – su79eu7k Jul 21 '15 at 13:04

3 Answers


I guess you constructed the DataFrame from a file such as a .csv or Excel file. In that case, you need to set the encoding option:

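import pandas as pd

left = pd.read_csv('kor.csv', encoding='utf-8')
# or, for an Excel file (recent pandas versions do not accept an encoding
# argument for read_excel; .xlsx text is already stored as Unicode)
left = pd.read_excel('kor.xlsx')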

This should solve the issue.
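To illustrate the encoding point, here is a minimal sketch that reads CSV data with a Korean header and merges on it. It assumes the file is UTF-8 encoded; the in-memory io.BytesIO buffers and their contents are hypothetical stand-ins for kor.csv and the question's 'right' frame.

```python
import io

import pandas as pd

# Hypothetical CSV bytes standing in for 'kor.csv' and a second file;
# only the column names come from the question.
left_bytes = "가,나\nx,1\ny,2\n".encode("utf-8")
right_bytes = "A,B\nx,10\ny,20\n".encode("utf-8")

# Decode the raw bytes explicitly as UTF-8, as the answer suggests.
left = pd.read_csv(io.BytesIO(left_bytes), encoding="utf-8")
right = pd.read_csv(io.BytesIO(right_bytes), encoding="utf-8")

# With the header decoded correctly, '가' is an ordinary column label
# and can be used as a merge key like any other.
result = pd.merge(left, right, how="left", left_on=["가"], right_on=["A"])
print(result)
```

If the header came out garbled (for example, read with the wrong encoding), the label would no longer equal u'가' and the lookup would fail with a KeyError.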

Jihun

Sorry for the confusion, everyone. It seemed like a unicode issue to me, but it wasn't. It was just because I tried merge right after groupby:

By default, groupby output has the grouping columns as indices, not columns, which is why the merge is failing.

There are a couple of different ways to handle it; probably the easiest is to use the as_index parameter when you define the groupby object.

po_grouped_df = poagg_df.groupby(['EID','PCODE'], as_index=False)

Then, your merge should work as expected.
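A small sketch of the difference as_index makes. The DataFrame contents, the QTY column, and the 'other' frame used for the merge are made up for illustration; only EID, PCODE, poagg_df and po_grouped_df come from the answer above. The groupby object still needs an aggregation (here .sum()) to produce a DataFrame.

```python
import pandas as pd

# Hypothetical data: only the EID and PCODE column names come from the answer.
poagg_df = pd.DataFrame({
    "EID": [1, 1, 2],
    "PCODE": ["a", "a", "b"],
    "QTY": [5, 3, 7],
})

# Default groupby: the grouping keys become the (Multi)Index of the result.
default = poagg_df.groupby(["EID", "PCODE"]).sum()
print(list(default.index.names))  # ['EID', 'PCODE']
print(list(default.columns))      # ['QTY']

# as_index=False keeps EID and PCODE as ordinary columns.
po_grouped_df = poagg_df.groupby(["EID", "PCODE"], as_index=False).sum()
print(list(po_grouped_df.columns))  # ['EID', 'PCODE', 'QTY']

# Because the keys are columns again, merging on them works.
other = pd.DataFrame({"EID": [1, 2], "NAME": ["x", "y"]})
merged = pd.merge(po_grouped_df, other, on="EID")
print(merged)
```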

Anyhow, back to the example in my question: in the DataFrame 'left', u'가' was an index, not a column, because I had run groupby on 'left' without as_index=False just before the merge.
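Applied to the question, a minimal sketch (hypothetical data; only the column names come from the question) shows u'가' ending up in the index after a plain groupby, and two ways to get it back as a column before merging: as_index=False, or reset_index(), which is an alternative not mentioned above. As far as I know, pandas 0.23 and later also let left_on refer to an index level name, so newer versions may not raise this KeyError at all; the column-based versions below should work in both old and new releases.

```python
import pandas as pd

# Hypothetical stand-ins for the question's 'left' and 'right' frames.
left = pd.DataFrame({"가": ["x", "x", "y"], "나": [1, 2, 3]})
right = pd.DataFrame({"A": ["x", "y"], "B": [10, 20]})

# A plain groupby moves '가' out of the columns and into the index.
grouped = left.groupby("가").sum()
print(list(grouped.columns))  # ['나']
print(grouped.index.name)     # 가

# Option 1: keep the key as a column from the start.
grouped_cols = left.groupby("가", as_index=False).sum()

# Option 2: turn the index back into a column after the fact.
grouped_reset = left.groupby("가").sum().reset_index()

# Either frame now has '가' as a column, so left_on=['가'] finds it.
results = [
    pd.merge(frame, right, how="left", left_on=["가"], right_on=["A"])
    for frame in (grouped_cols, grouped_reset)
]
for result in results:
    print(result)
```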

Community
su79eu7k

I haven't encountered this issue before, but a possible workaround would be:

import pandas

# Rename the u'가' column to 'A' so both frames share the same key name
left_no_unicode = left.copy()
left_no_unicode.columns = [c if c != u'가' else 'A' for c in left_no_unicode.columns]
result = pandas.merge(left_no_unicode, right, how='left', on=['A'])
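The same workaround can be written with DataFrame.rename, which returns a renamed copy, so the explicit copy() is not needed. A minimal sketch with made-up data standing in for 'left' and 'right':

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames.
left = pd.DataFrame({"가": ["x", "y"], "나": [1, 2]})
right = pd.DataFrame({"A": ["x", "y"], "B": [10, 20]})

# rename returns a new DataFrame; the original 'left' is left unchanged.
left_no_unicode = left.rename(columns={"가": "A"})

# Both frames now share the key name 'A', so on=['A'] is enough.
result = pd.merge(left_no_unicode, right, how="left", on=["A"])
print(result)
```

Note that this only helps if u'가' is actually a column; if it is the index (as turned out to be the case in the question), rename(columns=...) leaves it untouched.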
Uri Goren