I've got a dataframe which looks like this:

X1
7 [0005]
8
9 [I/0006]
10 Athenaeum.
11 Eine Zeitschrift

I've got these numbers in brackets throughout the entire dataframe and want to delete all of the brackets as well as their content.

I've tried two approaches so far, and neither has worked. With `filter` I tried this:

A1_msw<- A1798 %>%
  filter(str_detect(A1_msw, "[:digit:]")) %>% 
  unnest_tokens(output = bigram, input = X1, token = "ngrams", n = 2) %>% 
  na.omit(A1_msw)

The problem is that the brackets sometimes contain letters as well as digits, so this only deletes the rows that contain nothing but digits. It also removes everything containing punctuation, so some words are now missing. The result looks like this:

bigram
11 i 0006
13 eine zeitschrift

I've also tried `gsub`, but I think I've got the syntax wrong (and I don't know what it should look like):

A1_msw<- A1798 %>%
  gsub(A1_msw, "\\[.*\\d]\\","") %>%
  unnest_tokens(output = bigram, input = X1, token = "ngrams", n = 2) %>% 
  na.omit(A1_msw)

I'd appreciate your help!

psh
  • Removing `[digits]` is done with `gsub("\\[[0-9]+]", "", A1_msw)`. Deleting every `[...]`, whatever it contains, is done with the pattern `"\\[[^][]*]"`. See [Remove any text inside square brackets in r](https://stackoverflow.com/questions/52677243/remove-any-text-inside-square-brackets-in-r/52677485#52677485) – Wiktor Stribiżew Mar 16 '21 at 12:18
  • Thank you for your help! It still doesn't work though and I get the following error message: "Error in gsub(., "\\[[0-9]+]", "", A1_msw) : invalid regular expression 'c("[0001]", "\f", ..." – psh Mar 16 '21 at 12:34
  • In your `%>%` based workflow, it should be `gsub("\\[[0-9]+]", "", .)`, I believe. – Wiktor Stribiżew Mar 16 '21 at 13:03
  • Thank you so much for your help, Wiktor! I now got rid of all the extra letters and numbers! – psh Mar 17 '21 at 12:03
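
For reference, here is a minimal base R sketch of the two patterns from the comments. It assumes the text lives in a character column called `X1`; the data frame `df` and its sample rows are hypothetical stand-ins for `A1798`, and the calls are applied to the column rather than to the whole data frame.

```r
# Hypothetical sample data mirroring the X1 column from the question
df <- data.frame(
  X1 = c("[0005]", "", "[I/0006]", "Athenaeum.", "Eine Zeitschrift"),
  stringsAsFactors = FALSE
)

# Pattern 1: remove brackets that contain digits only, e.g. "[0005]"
digits_only <- gsub("\\[[0-9]+]", "", df$X1)

# Pattern 2: remove any bracketed content, so "[I/0006]" goes as well
df$X1 <- gsub("\\[[^][]*]", "", df$X1)

# Drop the rows that are empty after the cleanup
df <- df[nzchar(trimws(df$X1)), , drop = FALSE]
```

With the first pattern, `[I/0006]` survives because it contains characters other than digits; the second pattern removes it too. In a `%>%` workflow the same `gsub` call could presumably sit inside `dplyr::mutate(X1 = ...)` before `unnest_tokens`, so that the replacement runs on the column and punctuation inside the remaining text is left alone.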
