2

When I started my website I set all the table columns to "utf8_general_ci" thinking this would make everything store in UTF8.

According to the mysql_client_encoding() function in PHP, I've been using the latin1 for my connection all along.

I understand this isn't a new problem. My question is how do I correctly update my database so that it utf8 and without affecting the data that exists in my tables?

There are a bunch of answers StackOverflow but a lot I find vague. A couple more helpful ones were:

Query all data and update as UTF8 https://stackoverflow.com/a/2335254/158126

Use a script built to convert tables https://stackoverflow.com/a/13400547/158126

In your experience, what have you done to fix this issue and retain all user data in the MySQL tables?

Community
  • 1
  • 1
Ben Sinclair
  • 3,896
  • 7
  • 54
  • 94

1 Answers1

1

For your situation, I'd suggest trying to following for each bad column (connected over a utf8 connection):

// create a new column to store the latin1
alter table <table> add <column-latin1> <length> character set latin1;

// copy the utf8 data into the latin1 column without charset conversion
update <table> set <column-latin1> = binary <column-utf8>;

// verify the latin1 data looks OK
select <column-latin1> from <table>;

// copy the latin1 column into the utf8 column WITH charset conversion
update <table> set <column-utf8> = <column-latin1>;

// verify the new, properly encoded UTF8 data looks OK
select <column-latin1> from <table>;

// remove the temporary columns
alter <table> drop <column-latin1>;

And set your clients to use a UTF8 connection.

svidgen
  • 13,744
  • 4
  • 33
  • 58
  • Thanks @svidgen. If I set my client to use the UTF8 connection, do I need to update the settings of the MySQL database to UTF8? E.g: default-character-set=utf8? Or will the mysql_set_charset() in my code be all I need for things to work in the future? And with your second example, would it be BINARY(column-utf8) in the MySQL query? I've never used the BINARY function before. – Ben Sinclair Apr 30 '13 at 04:02
  • `Server characterset` and the `default*` settings on the server, database, and tables determine the charset for fields with no charset explicitly assigned at creation time. The `BINARY` keyword instructs MySQL to handle the data as though it's not text. So, the first `update` above will copy the contents of `column-utf8` to `column-latin1` byte-for-byte. The 2nd `update` copies the other way *through* a charset conversion, because MySQL interprets it as text. – svidgen Apr 30 '13 at 05:28