
I read an HTML page with the following command: `text <- read_html("linkoftext")`. The result is:

"Veri analizi, farklı iş, bilim ve sosyal bilim alanlarında çeşitli isimler altında çeşitli teknikleri kapsayan çok yönlü ve farklı yaklaşımlara sahiptir. Veri entegrasyonu veri analizinin öncüsüdür."

This text contains letters such as "ş ç ü ö ı". I need to replace them with "s c u o i". I wrote the following code:

string <- "ş ç ı ğ ü ö f s x q"
chartr("ş ç ı ğ ü ö", "s c i g u o", string)

How can I apply the code above to my text? The output of `chartr` still contains these letters.

Artem
SBA
  • add an example of `text` to your question. – Andre Elrico Sep 25 '18 at 08:56
  • It is an HTML text file. I read it like this: `text <- read_html("linkoftext")`. If you are asking what it looks like: `"Veri analizi, farklı iş, bilim ve sosyal bilim alanlarında çeşitli isimler altında çeşitli teknikleri kapsayan çok yönlü ve farklı yaklaşımlara sahiptir. Veri entegrasyonu veri analizinin öncüsüdür."` As you can see, I need to change some letters, because R does not recognize them. – SBA Sep 25 '18 at 09:23
  • **I don't see here** that you have to change some letters because R does not recognize them. – Andre Elrico Sep 25 '18 at 09:27
  • Okay, to put it better: when I try to create a word cloud of this text, I cannot see these letters in the word cloud. For example, ü is shown as something like " t' ". That is, these letters are replaced with different characters, so my word cloud looks meaningless. – SBA Sep 25 '18 at 09:30
  • I believe it has something to do with [TEXT ENCODING](https://www.google.de/search?safe=active&rlz=1C1CHBD_deDE736DE736&ei=BwGqW_6bCo_isAertKSIAQ&q=r+text+encoding+stackoverflow&oq=r+text+encoding+stackoverflow&gs_l=psy-ab.3..33i160k1j33i10k1.3351.5974.0.6087.14.12.0.0.0.0.316.1612.0j6j2j1.9.0....0...1c.1.64.psy-ab..5.9.1606...33i22i29i30k1j33i13i21k1.0.klHUj6YE2c4) – Andre Elrico Sep 25 '18 at 09:36
  • I found the solution here: `https://stackoverflow.com/questions/47944331/keeping-turkish-characters-with-the-text-mining-package-for-r`. Thank you. – SBA Sep 25 '18 at 09:47
  • @SBA, you can post the answer to your question and accept it. – Artem Sep 26 '18 at 09:02

1 Answer


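Your session's locale differs from the text's native one, i.e. Turkish, which is why `chartr` leaves these characters unchanged. You can change the locale with the `Sys.setlocale` function; see the code below. Note that the locale name "Turkish" is the Windows spelling; on Linux or macOS the name is typically something like "tr_TR.UTF-8".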

Sys.setlocale("LC_CTYPE", "Turkish") # switch to Turkish locale
string <- "ş ç ı ğ ü ö f s x q"
string
# [1] "ş ç ı ğ ü ö f s x q"

chartr("şçığüö", "sciguo", string)
# [1] "s c i g u o f s x q"

Sys.setlocale("LC_CTYPE", "") # switch to native locale
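
To apply the same replacement to the text from the question rather than to a toy string, something along these lines may work. This is only a sketch under a few assumptions: the helper name `to_ascii_tr` is made up for illustration, the sample string stands in for the text extracted from the page (for example with `rvest::html_text()`, since `read_html()` returns a document object rather than a character string), and the Turkish letters are written as `\u` escapes so the script does not depend on the encoding of the source file.

```r
# Turkish letters (lower and upper case) written as Unicode escapes,
# and their ASCII replacements in the same order.
tr_old <- "\u015f\u00e7\u0131\u011f\u00fc\u00f6\u015e\u00c7\u0130\u011e\u00dc\u00d6"
tr_new <- "sciguoSCIGUO"

# Hypothetical helper (not part of any package): convert the input to UTF-8,
# then translate character by character with chartr().
to_ascii_tr <- function(x) chartr(tr_old, tr_new, enc2utf8(x))

# With a real page, the text would first be extracted, for example:
#   page      <- rvest::read_html("linkoftext")
#   page_text <- rvest::html_text(page)
# A short stand-in built from words of the sample sentence is used here.
page_text <- "farkl\u0131 i\u015f, \u00e7ok y\u00f6nl\u00fc, \u00f6nc\u00fcs\u00fcd\u00fcr"

clean_text <- to_ascii_tr(page_text)
print(clean_text)
```

If the goal is a word cloud that shows the Turkish letters correctly, fixing the encoding or locale, as in the question linked in the comments, is probably preferable to stripping the diacritics.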
Artem