Is there a standard way in R to convert HTML numeric character codes to the characters they represent? For example, &#39; is an apostrophe: the HTML code &#39; renders as the same character you get by typing ' directly. I'd like to change the following text

text = "Met with Mark&#39;s boss today to discuss performance"

to be

"Met with Mark's boss today to discuss performance"

I tried using iconv as shown below, but the HTML code is itself made up of valid ASCII characters, so nothing changes.

iconv(text, from="ASCII", to="UTF-8//TRANSLIT")

I could build a lookup table and do it that way, but I thought I'd check whether there's an existing method to accomplish this.

Mark
  • If there isn't a function, please build a lookup table (it's only 127 cases) and post your answer as a reference :-) – Andre Elrico Nov 28 '18 at 15:31
  • Isn't that what you're [looking](https://stackoverflow.com/questions/42724885/convert-html-entity-to-proper-character-r) for? – Andre Elrico Nov 28 '18 at 15:33
  • HTML doesn't have anything to do with ASCII. Those are Unicode codepoints formatted as HTML numeric character entity references. All characters in an HTML document are Unicode, after the document encoding is applied while reading. – Tom Blodget Nov 29 '18 at 00:19
  • See [this](https://stackoverflow.com/a/22157168/1548942) answer. – Davor Josipovic Dec 23 '19 at 15:38
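
Following the point in the comments that these codes are Unicode code points written as numeric character references, here is a minimal base R sketch that decodes them without a lookup table. The helper name decode_numeric_entities is hypothetical (it is not from any package), and the sketch assumes only numeric references such as &#39; or &#x27; need handling; named entities like &amp;amp; are left untouched.

```r
# Hypothetical helper: decode decimal (&#39;) and hexadecimal (&#x27;)
# numeric character references. Named entities are not handled.
decode_numeric_entities <- function(x) {
  pattern <- "&#([0-9]+|[xX][0-9A-Fa-f]+);"
  matches <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(refs) {
    vapply(refs, function(ref) {
      body <- substr(ref, 3, nchar(ref) - 1)  # drop the leading "&#" and trailing ";"
      code <- if (grepl("^[xX]", body)) {
        strtoi(substring(body, 2), base = 16L)  # hexadecimal reference
      } else {
        strtoi(body, base = 10L)                # decimal reference
      }
      intToUtf8(code)  # code point -> character
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

text <- "Met with Mark&#39;s boss today to discuss performance"
decode_numeric_entities(text)
# [1] "Met with Mark's boss today to discuss performance"
```

Because the code point is converted directly with intToUtf8, this is not limited to the 127 ASCII cases. Very large or otherwise invalid code points are not validated here.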
