I asked a question here that got answers, but the discussion drifted in another direction, so I am asking again in a new form. My old question: File encoding doesn't work

My new question: How can I check whether a character in a string can be encoded with a particular encoding? I want to know which character is causing the problems in my original code. I tried an approach from an answer to my old question, but it only produced an error message that doesn't seem to make sense.

The message said there was an "error at index 262", on a line of about 10 characters.

This is the code:

string[] Lines = reactor.GetMergedLines();
string fileName = "foo.bar";
try
{
    Encoding encoding = Encoding.GetEncoding(28605, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    // WriteAllLines already writes every line, so no loop over Lines is needed.
    File.WriteAllLines(fileName, Lines, encoding);
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
}
Marco Frost
  • Could you try counting the characters written in all lines, including newlines, from the start to see what character 262 is in that situation? – Louis Ingenthron Apr 16 '14 at 14:52
  • If you don't like an exception then don't use EncoderFallback.ExceptionFallback. Use EncoderReplacementFallback instead. Picking the replacement is up to you, nothing is ideal of course. – Hans Passant Apr 16 '14 at 14:57
  • @HansPassant I decided to use the ExceptionFallback to get rid of encoding error replacements. That was the problem in my old question :) – Marco Frost Apr 16 '14 at 15:04
  • As I pointed out in my old question: in theory, all characters from my source file should be encodable in my target encoding without any problems. I want to know which characters are causing the problems. I have no idea how to tackle this in another way... The overall problem is that some characters in the target file are not properly encoded, i.e. they are replaced by question marks. – Marco Frost Apr 16 '14 at 15:20

1 Answer

Instead of EncoderFallback.ExceptionFallback you can use an EncoderReplacementFallback and pass its constructor the string to substitute for each unmappable character (the string is then exposed through the read-only DefaultString property).
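As a minimal sketch of the replacement approach, the helper below encodes a string and decodes it again so the substituted marker becomes visible. The names ReplacementDemo, RoundTrip and marker are made up for illustration. The example call uses code page 28591 (ISO-8859-1) only because it is available without extra setup; on .NET Core and later, code page 28605 from the question would additionally need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) from the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.Text;

static class ReplacementDemo
{
    // Encodes `text` with the given code page, substituting `marker`
    // for every character the encoding cannot represent, then decodes
    // the bytes again so the result can be inspected as a string.
    // Assumption: `marker` itself must be encodable in the target encoding.
    public static string RoundTrip(string text, int codePage, string marker)
    {
        Encoding encoding = Encoding.GetEncoding(
            codePage,
            new EncoderReplacementFallback(marker),
            DecoderFallback.ExceptionFallback);

        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

For example, ReplacementDemo.RoundTrip("a\u304Ab", 28591, "[?]") returns "a[?]b", because U+304A has no ISO-8859-1 representation.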

To get the answer to your question, you can roll your own EncoderFallback subclass that provides your own EncoderFallbackBuffer. While encoding, the encoder hands this buffer every character it cannot map, together with that character's index in the input.

Here is a quick-and-dirty implementation.

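using System;
using System.Collections.Generic;
using System.Linq;   // for AsEnumerable()
using System.Text;

class MyEncoderFallback: EncoderFallback
{
    // "#dddd:xxxx#" is 11 characters as long as the index has at most four digits.
    public override int MaxCharCount { get { return 11; } }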
    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new MyEncoderFallbackBuffer();
    }
}

class MyEncoderFallbackBuffer: EncoderFallbackBuffer
{
    private List<char> _encoded = new List<char>();
    private int _nextIndex = 0;

    public override int Remaining { get { return _encoded.Count - _nextIndex; } }

    public override bool Fallback(char unknownChar, int index)
    {
        var encoded = String.Format("#{0:d4}:{1:x4}#", index, (int)unknownChar);

        _encoded.Clear();
        _encoded.AddRange(encoded.AsEnumerable());

        _nextIndex = 0;

        return true;
    }

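    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
    {
        // Surrogate pairs are not reported: returning false tells the
        // encoder that this buffer ignores the pair.
        return false;
    }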

    public override char GetNextChar()
    {
        char next;
        if(_nextIndex < _encoded.Count)
        {
            next = _encoded[_nextIndex];
            _nextIndex += 1;
        }
        else 
        {
            next = default(char);
        }

        return next;
    }

    public override bool MovePrevious()
    {
        bool result;

        if(_nextIndex > 0)
        {
            _nextIndex -= 1;
            result = true;
        }
        else
        {
            result = false;
        }

        return result;
    }

    public override void Reset()
    {
        _encoded.Clear();
        _nextIndex = 0;     
    }
}

Replace your encoding with the following.

Encoding encoding = Encoding.GetEncoding(28605, new MyEncoderFallback(), DecoderFallback.ExceptionFallback);

In my test, "abcdおはようefgh" is encoded to "abcd#0004:304a##0005:306f##0006:3088##0007:3046#efgh", i.e. each unmappable character is replaced by its index and its hexadecimal code unit.
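If you only need to know which characters fail, without writing a fallback class, a rough alternative is to test each character on its own with the exception fallback. The sketch below is not part of the implementation above; EncodabilityCheck and FindUnencodable are hypothetical names, and the code page is passed in by the caller. Because every character is encoded separately, the reported position is the index within the line itself. The index in the original exception was presumably relative to whatever chunk of text was handed to the encoder, which would explain a value like 262 for a short line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EncodabilityCheck
{
    // Returns one (line, column, code point) tuple for every character
    // that cannot be encoded with the given code page.
    public static List<Tuple<int, int, int>> FindUnencodable(string[] lines, int codePage)
    {
        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        var problems = new List<Tuple<int, int, int>>();
        for (int line = 0; line < lines.Length; line++)
        {
            string text = lines[line];
            for (int col = 0; col < text.Length; col++)
            {
                // Treat a surrogate pair as one unit of two chars.
                int len = char.IsSurrogatePair(text, col) ? 2 : 1;
                try
                {
                    strict.GetBytes(text.Substring(col, len));
                }
                catch (EncoderFallbackException)
                {
                    int codePoint = len == 2
                        ? char.ConvertToUtf32(text, col)
                        : text[col];
                    problems.Add(Tuple.Create(line, col, codePoint));
                }
                col += len - 1; // skip the low surrogate, if any
            }
        }
        return problems;
    }
}
```

Calling FindUnencodable(Lines, 28605) on the lines from the question would list every offending position. This is slow for large inputs because of the per-character exceptions, so it is meant for diagnosis rather than production use.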

Chris Hinton