Removing paragraphs in txt file with R

Question

Using the readLines() function, I have imported a txt file that stores sentences in multiple paragraphs, like this:

sentence1. sentence2. sentence3.

sentence4. sentence5.

sentence6. sentence7.

For further analysis I would like to apply the sentiment_by() function on my imported txt file. When I do so, I receive sentiment values for each paragraph rather than the whole txt file itself. Therefore I want to remove the paragraphs within the txt file so that I receive only one sentiment coefficient. To do so I would need to transform my txt file so that the text looks like this:

sentence1. sentence2. sentence3. sentence4. sentence5. sentence6. sentence7.

If I were to run the sentiment_by() function on this piece of text it would yield one coefficient for the whole text. Is there a way I can transform the text by removing the paragraphs in R before I carry on with the analysis?

Can you be more precise about what you mean by paragraphs and what you want to achieve? What do you mean by one block of text? Do you want to remove just some line feeds (carriage returns) or similar? This is easy to achieve without using R, for example in Notepad++. Also check https://en.wikipedia.org/wiki/Newline — Samuel, Aug 03 '20 at 17:54
My txt file sometimes has line breaks (Enter characters) in between the text (it's structured in paragraphs), meaning that if I apply the sentiment_by() function, I get a sentiment coefficient for each text block within my txt file and not for the whole file itself. Therefore I would like to make my whole file one text block, so that when running the sentiment_by() function, it analyses the text as a whole. — , Aug 03 '20 at 18:01
Does this answer your question? [remove all line breaks (enter symbols) from the string using R](https://stackoverflow.com/questions/21781014/remove-all-line-breaks-enter-symbols-from-the-string-using-r) — Samuel, Aug 03 '20 at 18:07
Not really. When I run the sentiment_by() function on my file I still get the coefficients for each individual paragraph of text and not for the whole block of text itself. — , Aug 03 '20 at 18:17
Please create a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — phiver, Aug 04 '20 at 08:11

score 0 · Answer 1 · answered Aug 03 '20 at 18:10

If each paragraph you read in is an element of a character vector, you can strip tabs and newlines (and other whitespace characters if needed) from both ends of each element with trimws().

trimmed_text = trimws(text_var, which = "both", whitespace = "[\t\r\n]")

There are other things you can tweak as shown here.
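
Note that readLines() returns a character vector with one element per line, so trimming each element still leaves several elements, and that matches the per-paragraph scores described in the question. A minimal sketch of one way to get a single string, assuming the paragraphs are separated by blank lines: drop the empty elements and join the rest with paste(). The `lines` vector is a stand-in for the result of readLines() on your file, and the variable names are made up for illustration.

```r
# Stand-in for readLines("your_file.txt") (hypothetical file name):
# one element per line, with blank lines read in as empty strings.
lines <- c(
  "sentence1. sentence2. sentence3.",
  "",
  "sentence4. sentence5.",
  "",
  "sentence6. sentence7."
)

# Strip leading/trailing whitespace from each line.
lines <- trimws(lines)

# Drop the empty elements left by the blank lines between paragraphs.
lines <- lines[nzchar(lines)]

# Join everything into one string, separated by single spaces.
full_text <- paste(lines, collapse = " ")

print(full_text)
```

Since `full_text` has only one element, passing it to sentiment_by() should give a single coefficient for the whole text. I have not run that step against your file, so treat it as an expectation rather than a tested result.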
