
I have been stuck for a while with a UnicodeEncodeError in Python.

Here is what I am doing:

  1. I create a DataFrame as the result of various analyses. In total, the DataFrame has 30 columns with multiple value types (int, string, datetime, etc.).
  2. I create an SSH connection to a remote instance in Azure where I have installed MySQL. I create the connection using SQLAlchemy.
  3. I run the df.to_sql command and get the following error:

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 8: ordinal not in range(256)

I tried setting the charset in the connection string, but it didn't seem to work:

engine = create_engine('mysql+pymysql://user:pwd@host:%s/db?charset=utf8' % server.local_bind_port)

I have read here that I can use u.encode('latin-1', 'replace'). But would I need to go through every string column and encode it that way? Or is there something else that I can do?
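To make the trade-off concrete, here is a minimal sketch of what the codecs do with the character from the error message. The sample string is made up for illustration; only the en dash (U+2013) comes from the traceback above.

```python
# Hypothetical sample value containing an en dash (U+2013),
# the character named in the error message.
text = u'2010\u20132017'

# Strict latin-1 encoding fails because U+2013 is outside the
# 0-255 range that latin-1 can represent.
try:
    text.encode('latin-1')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'replace' avoids the error but silently swaps the dash for '?',
# so the original character is lost.
replaced = text.encode('latin-1', 'replace')

# UTF-8 can represent the character and round-trips without loss.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```

So 'replace' makes the error go away at the cost of changing the data, whereas encoding to UTF-8 keeps it intact.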

Max Payne
  • Which version of python are you using? – elPastor Jun 02 '17 at 00:42
  • @pshep123 - In Azure I am using Python 2.7.12 - In my local PC 2.7.13 Anaconda 4.4.0 – Max Payne Jun 02 '17 at 00:44
  • Thanks. Unfortunately I'm not going to be able to help you, but I've been running into unicode issues myself and through my recent research, have come to realize that python 3 and python 2 handle text formatting differently and thus it's important for those more knowledgeable than I to know which version. Here is some reading in the meantime: https://docs.python.org/2/howto/unicode.html, might help. – elPastor Jun 02 '17 at 00:47

1 Answer


This is the solution that I came up with.

I created a function that encoded the different characters in my data.

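def custom_encoder(x):
    # Encode unicode values to UTF-8 bytes, dropping anything that cannot be encoded
    if isinstance(x, type(u'')):
        return x.encode('utf8', 'ignore')
    # Leave non-text values (ints, datetimes, etc.) untouched
    return x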

Then I looped through all the columns and encoded all the values. After this, MySQL allowed the data to be written.
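The loop itself could look roughly like the sketch below. The DataFrame and its column names are invented for illustration; my real data has 30 columns. Note that this targets Python 2.7, where the encoded values are plain byte strings. On Python 3 the same code would put bytes objects into the DataFrame, which is usually not what you want there.

```python
import pandas as pd

def custom_encoder(x):
    # Encode unicode values to UTF-8 bytes; leave everything else untouched
    if isinstance(x, type(u'')):
        return x.encode('utf8', 'ignore')
    return x

# Hypothetical stand-in for the real DataFrame
df = pd.DataFrame({
    'label': [u'2010\u20132017', u'plain'],
    'count': [1, 2],
})

# Apply the encoder to every value in every column;
# non-text values pass through unchanged.
for col in df.columns:
    df[col] = df[col].map(custom_encoder)
```

After this step the frame can be passed to df.to_sql as before.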

Max Payne