How to write Unicode Japanese from one file to another file?

Question

I have some JSON files that contain escaped Japanese text in some places, such as \u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831, and I want to decode these escapes into Japanese characters.

When I put the escapes directly into a string literal:

text = '\u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831'
print(text)

it prints:

本・雑誌・書籍情報

But when I read the same text from a file, for example index.json, whose entire content is:

\u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831

using this code:

file = open('index.json','r')
text = file.read()
print(text)

it prints the escapes unchanged:

\u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831

Another thing I found weird: when I then tried to print

print(file.read())
print(text)

file.read() returns an empty string, and so does file.read(1).

Edit: I found out that the main problem is this: when you write text = '\u672c' in source code, Python interprets \u672c as a single character, but when you read the same text from a file, it is a string of 6 characters. Is there any way to convert it?

It's an encoding problem; have a look at: https://stackoverflow.com/questions/14682933/chinese-and-japanese-character-support-in-python — Rohi, Jan 09 '19 at 10:49
@pruggg: The file contains the string `'\\u672c'`, not `'\u672c'`. That's pretty version independent. — Mad Physicist, Jan 11 '19 at 03:10

score 2 · Accepted Answer · answered Jan 11 '19 at 03:18

There are a couple of issues here.

Let's say that your file contains the following (literal) text:

\u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831

You could represent this in Python as either

text = '\\u672c\\u30fb\\u96d1\\u8a8c\\u30fb\\u66f8\\u7c4d\\u60c5\\u5831'

OR

text = r'\u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831'
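A quick check that both spellings describe the same six-character text, and how it differs from an ordinary literal where Python processes the escape itself (the variable names here are just illustrative):

```python
# Two spellings of the same literal text that is stored in the file
escaped = '\\u672c'   # backslash escaped explicitly
raw = r'\u672c'       # raw string: the backslash is kept as-is
# A regular literal: Python turns the escape into one character
decoded = '\u672c'

print(escaped == raw)          # True
print(len(raw), len(decoded))  # 6 1
```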

To convert the literal escapes into the Unicode characters they represent, you need to decode them properly:

text.encode('ascii').decode('unicode-escape')

results in

本・雑誌・書籍情報
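Applied to the original goal of writing the decoded text from one file into another, a minimal sketch might look like this. It assumes the source file is pure ASCII (only \uXXXX escapes, no raw non-ASCII characters), because encode('ascii') fails otherwise. The function name unescape_file and the file names are hypothetical.

```python
import os
import tempfile


def unescape_file(src_path, dst_path):
    # Hypothetical helper: reads a pure-ASCII file containing literal \uXXXX
    # escapes and writes the decoded characters to dst_path as UTF-8.
    with open(src_path, 'r', encoding='ascii') as fin:
        text = fin.read()
    decoded = text.encode('ascii').decode('unicode-escape')
    with open(dst_path, 'w', encoding='utf-8') as fout:
        fout.write(decoded)
    return decoded


# Demo in a throwaway directory; the file names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'index.json')
    dst = os.path.join(tmp, 'index_decoded.json')
    with open(src, 'w', encoding='ascii') as f:
        f.write(r'\u672c\u30fb\u96d1\u8a8c\u30fb\u66f8\u7c4d\u60c5\u5831')
    unescape_file(src, dst)
    with open(dst, encoding='utf-8') as f:
        result = f.read()  # '本・雑誌・書籍情報'
```

If the files are well-formed JSON (escapes inside quoted strings), the standard json module is another option: json.load already decodes \uXXXX escapes, and json.dump(obj, f, ensure_ascii=False) writes the characters themselves instead of escaping them again.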

The reason file.read() and file.read(1) returned nothing is that a file does not automatically rewind. Once you have read the whole file, the position stays at the end until you rewind it manually (for example with file.seek(0)) or close and reopen the file.
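A small illustration of the rewind behaviour, using io.StringIO as a stand-in for an open text file (it follows the same read and seek rules):

```python
import io

# io.StringIO stands in for an open text file here
f = io.StringIO(r'\u672c\u30fb')

first = f.read()   # reads everything; the position is now at the end
second = f.read()  # '' because nothing is left to read
f.seek(0)          # rewind to the start
third = f.read(1)  # the first character (a backslash) again
```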

Thank you so much! This is exactly what I was looking for! — pruggg, Jan 11 '19 at 03:33
