Having read UTF-8 Everywhere I attempted to change some of my code to use std::string. I assumed that if I set a std::string to u8"€" (that's the euro symbol, AltGr+4 on my keyboard) the std::string would contain the 3 bytes of the UTF-8 encoding of the euro symbol (U+20AC). It doesn't. Consider
std::string x[] = {"€", u8"€", u8"\€", "\u20AC", u8"\u20AC"};
size_t size[] = {x[0].size(), x[1].size(), x[2].size(), x[3].size(), x[4].size()};
If I view the results in the debugger's local-variables view I see
x[] = {"€", "€", "â??", "€", "€"}
and
size[] = {1, 1, 3, 3, 3}
From what I can see, the last two are the only ones that give me the expected result. I'm obviously missing something to do with string literals, but I'm also puzzled how the debugger shows the correct string for the first two given that it thinks they're one char long and that the first byte of each is -128, i.e. int64_t(x[0].c_str()[0]) == -128 and likewise for x[1].
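To take the debugger's rendering out of the equation, here is a minimal sketch (the helper name dump_bytes is mine, just for illustration) that prints the raw bytes of a string in hex; on a conforming pre-C++20 compiler, u8"\u20AC" should print E2 82 AC (3 bytes):

#include <cstdio>
#include <string>

// Print each byte of the string as hex, so the actual storage is
// visible regardless of how the debugger chooses to render the text.
void dump_bytes(const std::string& s)
{
    for (unsigned char c : s) // unsigned char avoids sign extension
        std::printf("%02X ", static_cast<unsigned>(c));
    std::printf("(%zu bytes)\n", s.size());
}

int main()
{
    dump_bytes(u8"\u20AC"); // expected: E2 82 AC (3 bytes)
    dump_bytes("\u20AC");   // depends on the execution character set
}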
Also, why does '€' == '\€' but "€" != "\€" and u8"€" != u8"\€"? (Edit: ignore this. Remy pointed out my error below re comparing char pointers.)
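For the record, the mistake was that == applied to two string literals compares the addresses of the array-decayed const char* pointers, not their contents. A minimal sketch of the difference:

#include <cstdio>
#include <cstring>
#include <string>

int main()
{
    const char* a = "\u20AC";
    const char* b = "\u20AC";

    bool same_pointer = (a == b);                 // compares addresses: unspecified result
    bool same_bytes   = (std::strcmp(a, b) == 0); // compares contents: true
    bool same_string  = (std::string(a) == b);    // std::string compares contents: true

    std::printf("%d %d %d\n", same_pointer, same_bytes, same_string);
}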
The results also raise the question: what is the purpose of the u8 string literal prefix?
Can anybody explain before I revert to wchar_t?
I'm on Windows 10 using RAD Studio 10.2.
Edit: I tried it with various non-ASCII Unicode characters using the Character Map facility. I couldn't get it to work with any of them: size() was always 1 and the debugger showed a different character (often '?') from the one I used. I'm using the Surface Pro Type Cover and, from what I can find, there's no way to enter arbitrary Unicode chars from the keyboard (apart from €). Strictly backslashed codes for me from now on, as sketched below. Glad I've cleared it up even if I did waste a whole day. Thanks all.
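For reference, a minimal sketch of the escape-only approach; the byte counts assume a conforming pre-C++20 compiler, where u8 literals are UTF-8-encoded char arrays:

#include <cassert>
#include <string>

int main()
{
    // Universal-character-name escapes keep the source file pure ASCII,
    // so the file's save encoding can no longer mangle the literals.
    std::string euro   = u8"\u20AC"; // U+20AC -> E2 82 AC
    std::string eacute = u8"\u00E9"; // U+00E9 -> C3 A9

    assert(euro.size() == 3);
    assert(eacute.size() == 2);
}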