This code downloads a sample latin1 (ISO-8859-1) encoded file and saves it to disk. If you open the saved file, you'll see strange question mark characters (�). https://stackoverflow.com/a/3527176/779159 explains that this happens because the wrong encoding is being applied, and that latin1 should fix it.

const request = require('request')
const fs = require('fs')

const url = 'http://vancouver-webpages.com/multilingual/french.asis'
request.get(url, { encoding: null })
  .pipe(fs.createWriteStream('/tmp/file.txt', { defaultEncoding: 'latin1' }))

But using the request and fs modules, I can't get it to save in latin1 encoding. How do I fix this code?

user779159


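Node v8.1.4 supports 'latin1' (alias 'binary') as a Buffer encoding. I just tested your code and it actually works fine: with { encoding: null }, request pipes the raw response bytes through as Buffers, so the file on disk contains exactly the bytes the server sent (the defaultEncoding option only applies to strings written to the stream, not to Buffers). I use Atom as my text editor and, initially, it assumed the file was UTF-8, so the question mark characters appeared. When I switched from UTF-8 to 'Auto-Detect', everything displayed correctly. See the screenshot below.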

[Screenshot: Windows 1252 encoding selected in Atom]

Note that it says 'Windows 1252' for the encoding, but it works the same way if I select 'ISO 8859-1'. So make sure whatever editor you are using detects the character encoding correctly. It is not Node's fault!
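To see why the editor's guess matters, here is a minimal Node sketch (not part of the original test): the same bytes read as "Café" when decoded as latin1, but a lone 0xE9 byte is an incomplete UTF-8 sequence, so decoding it as UTF-8 yields the U+FFFD replacement character, which is the � you saw. The helper `latin1FileToUtf8` and the output path are hypothetical, for the case where your editor can't auto-detect and you'd rather keep a UTF-8 copy.

```javascript
const fs = require('fs');

// "Café" as ISO-8859-1 (latin1) bytes: 'é' is the single byte 0xE9.
const bytes = Buffer.from([0x43, 0x61, 0x66, 0xE9]);

// Decoding as latin1 gives the intended text; decoding as UTF-8 does not,
// because a lone 0xE9 is an incomplete UTF-8 sequence and becomes U+FFFD.
const asLatin1 = bytes.toString('latin1');
const asUtf8 = bytes.toString('utf8');
console.log(asLatin1); // Café
console.log(asUtf8);   // Caf�

// Hypothetical helper: read a latin1 file and write a UTF-8 copy of it,
// for editors that assume UTF-8 and don't auto-detect the encoding.
function latin1FileToUtf8(inPath, outPath) {
  const text = fs.readFileSync(inPath).toString('latin1'); // bytes -> JS string
  fs.writeFileSync(outPath, text, 'utf8');                  // JS string -> UTF-8 bytes
  return text;
}

// Usage with the file saved by the question's code (output path is an assumption):
// latin1FileToUtf8('/tmp/file.txt', '/tmp/file-utf8.txt');
```

The UTF-8 copy is one byte longer per accented character ('ç' becomes 0xC3 0xA7), which is expected.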

By the way, here is an interesting note from the Node v8.1.4 docs, in one of the sections on Buffer:

Today's browsers follow the WHATWG spec which aliases both 'latin1' and ISO-8859-1 to win-1252. This means that while doing something like http.get(), if the returned charset is one of those listed in the WHATWG spec it's possible that the server actually returned win-1252-encoded data, and using 'latin1' encoding may incorrectly decode the characters.
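The difference the docs warn about is confined to bytes 0x80-0x9F: Node's 'latin1' maps them to C1 control characters, while windows-1252 maps most of them to printable symbols such as € and curly quotes. A minimal sketch, not a Node API: `decodeWin1252` is a hypothetical helper that starts from the latin1 mapping and remaps that range using the windows-1252 code chart. Bytes that windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep their latin1 code point, as in the WHATWG index.

```javascript
// windows-1252 code points for bytes 0x80-0x9F (unassigned bytes omitted).
const WIN1252_C1 = {
  0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
  0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D,
  0x91: 0x2018, 0x92: 0x2019, 0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022,
  0x96: 0x2013, 0x97: 0x2014, 0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161,
  0x9B: 0x203A, 0x9C: 0x0153, 0x9E: 0x017E, 0x9F: 0x0178,
};

// Hypothetical helper: decode a Buffer as windows-1252. Every byte maps to the
// code point of the same value (that is what 'latin1' does), except the
// remapped 0x80-0x9F bytes listed above.
function decodeWin1252(buf) {
  let out = '';
  for (const byte of buf) {
    const cp = WIN1252_C1[byte];
    out += String.fromCharCode(cp !== undefined ? cp : byte);
  }
  return out;
}

// “hi” as windows-1252 bytes: 0x93 and 0x94 are curly quotes there.
const quoted = Buffer.from([0x93, 0x68, 0x69, 0x94]);
console.log(quoted.toString('latin1') === '\u0093hi\u0094'); // true: C1 controls, not quotes
console.log(decodeWin1252(quoted));                          // “hi”
```

For text outside 0x80-0x9F the two decodings agree, which is why a mostly-French page like the one in the question looks identical under either label.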

nbkhope