
In my code

string[] Lines = reactor.GetMergedLines();
string fileName = "foo.bar";
StreamWriter sw = new StreamWriter(File.Open(fileName, FileMode.CreateNew), Encoding.GetEncoding(28605));
foreach (string line in Lines)
{
    sw.WriteLine(line);
}
sw.Close();

the file that gets created is not encoded with the given code page. Lines is filled with strings read from an ISO-8859-1 file. I tried the code page number, Encoding.GetEncoding(28605), its name, Encoding.GetEncoding("ISO-8859-15"), and File.WriteAllLines(fileName, Lines, Encoding.GetEncoding(28605)) instead of StreamWriter. But when I inspect the file with Cygwin's file -bi [filename], it reports the encoding as "us-ascii". Also, some characters aren't encoded properly and are replaced by question marks.

How to write out a text file in C# with a code page other than utf-8? didn't help, as you can see.

What is the problem?

Marco Frost
  • `some characters aren't encoded properly and replaced by question marks` How do you know that? Is the tool you use to look at the file capable of setting the desired code page? `file` has no way of guessing that the file is ISO-8859-15 encoded; there is no metadata. You need to read the file with an ISO-8859-15 decoder to be able to say whether it was saved properly or not. – Remus Rusanu Apr 10 '14 at 08:39
  • I tried it with notepad++ and set the encoding to ISO-8859-15. Also with other text viewers. Still question marks. And those aren't the standard chars for characters with unknown encoding. – Marco Frost Apr 10 '14 at 08:42
  • Is the text you're saving a valid ISO-8859-15 text? – Remus Rusanu Apr 10 '14 at 08:46
  • @RemusRusanu What do you mean by that? How can it be invalid? – Marco Frost Apr 10 '14 at 08:48
  • [ISO-8859-15](http://en.wikipedia.org/wiki/ISO/IEC_8859-15) is an "8-bit single-byte coded graphic character set". It can encode only a limited set of characters, at most 256 of them. Your C# `string` values are Unicode, which defines more than a hundred thousand characters across its [code points](http://en.wikipedia.org/wiki/Code_point). Obviously not all C# string values have a valid single-byte code page equivalent. – Remus Rusanu Apr 10 '14 at 08:56
  • Okay, I checked that. Every character in the source file, and therefore every character in `Lines`, is valid for both ISO-8859-1 and ISO-8859-15. The file contains only characters from the intersection of the two code pages. – Marco Frost Apr 10 '14 at 09:00

1 Answer

You can use other overloads of Encoding.GetEncoding to control what happens when a Unicode character can't be converted to your target code page. More information on this MSDN topic. The same can be achieved by explicitly setting the Encoding.EncoderFallback property (link to MSDN); note that this requires a writable copy obtained via Clone(), because the instances returned by GetEncoding are read-only.

For example, you can use the following to throw an exception whenever a Unicode character cannot be converted:

Encoding enc = Encoding.GetEncoding(28605, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

Note: The default EncoderFallback is System.Text.InternalEncoderBestFitFallback, which produces question marks for code points it cannot map.
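To make the difference between the two fallback styles visible, here is a minimal sketch that encodes the same string once with a replacement fallback and once with the exception fallback. The sample string and the class name `FallbackDemo` are made up for illustration. The `RegisterProvider` call is an assumption about the runtime: on .NET Core / .NET 5+ code page 28605 is only available after registering `CodePagesEncodingProvider`, while on .NET Framework that line should simply be removed.

```csharp
using System;
using System.Text;

class FallbackDemo
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+. Remove this line on .NET Framework,
        // where code page 28605 is available without registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical input: U+00E9 exists in ISO-8859-15, U+2603 (snowman) does not.
        string sample = "caf\u00E9 \u2603";

        // Replacement fallback: unmappable characters are silently written as "?".
        Encoding lenient = Encoding.GetEncoding(
            28605,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        Console.WriteLine(BitConverter.ToString(lenient.GetBytes(sample)));

        // Exception fallback: the first unmappable character raises an exception instead.
        Encoding strict = Encoding.GetEncoding(
            28605,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(sample);
        }
        catch (EncoderFallbackException ex)
        {
            // CharUnknown and Index identify the offending character in the input.
            Console.WriteLine("Cannot encode U+{0:X4} at index {1}",
                (int)ex.CharUnknown, ex.Index);
        }
    }
}
```

Encoding a suspect string directly like this, without any file or stream involved, is a quick way to check whether the question marks come from the encoder rather than from the viewer.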

Anateus
  • Thank you for your answer. It helps with troubleshooting my problem, but the error message I got was actually more confusing. I wrote to the file line by line and got an exception at a particular line of about 10 characters, yet the message said there was an error at index 262. How can that be? – Marco Frost Apr 11 '14 at 14:31
  • Sorry for the delay. Writing with `StreamWriter` is buffered, so the index you see is a position within this buffer, not within your line. You can control the buffer size with another constructor overload, and the flushing mode with the Boolean property `AutoFlush`: the `StreamWriter` writes its data to the stream either only when the buffer is full (`false`) or after every call to one of the `Write` methods (`true`). The default is `false`. In either case the buffer is also flushed when the `StreamWriter` is disposed, but **not** when it is finalized by the GC. – Anateus Apr 22 '14 at 12:48
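One way to sidestep the confusing buffer index is to test each line against the strict encoding before handing it to the writer, so any reported position refers to that line alone. The following is only a sketch under assumptions: the helper `FirstUnencodableIndex`, the class name, the sample lines, and the temp-file path are all hypothetical, and the `RegisterProvider` call is needed only on .NET Core / .NET 5+ (remove it on .NET Framework). The `using` block ensures the writer is disposed, and therefore flushed, even if something throws.

```csharp
using System;
using System.IO;
using System.Text;

static class StrictWriterDemo
{
    // Hypothetical helper: index of the first character in 'line' that
    // 'strict' cannot encode, or -1 if the whole line is encodable.
    static int FirstUnencodableIndex(string line, Encoding strict)
    {
        try
        {
            strict.GetBytes(line);
            return -1;
        }
        catch (EncoderFallbackException ex)
        {
            return ex.Index;
        }
    }

    static void Main()
    {
        // Assumption: .NET Core / .NET 5+. Remove this line on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding strict = Encoding.GetEncoding(
            28605,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // Hypothetical sample data standing in for reactor.GetMergedLines().
        string[] lines = { "plain ASCII", "snowman \u2603 here" };
        string fileName = Path.Combine(Path.GetTempPath(), "foo.bar");

        using (var sw = new StreamWriter(File.Open(fileName, FileMode.Create), strict))
        {
            for (int i = 0; i < lines.Length; i++)
            {
                int bad = FirstUnencodableIndex(lines[i], strict);
                if (bad >= 0)
                {
                    // Report the problem per line instead of per writer buffer.
                    Console.WriteLine("Line {0}: unencodable character at index {1}", i, bad);
                    continue;
                }
                sw.WriteLine(lines[i]);
            }
        }
    }
}
```

The pre-check encodes every line twice, which is a deliberate trade of some speed for a diagnostic that points at the right line and column.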