I've got a dataframe which looks like this:
| | X1 |
|---|---|
| 7 | [0005] |
| 8 | |
| 9 | [I/0006] |
| 10 | Athenaeum. |
| 11 | Eine Zeitschrift |
I've got these numbers in brackets throughout the entire dataframe and want to delete all of the brackets as well as their content.
I've tried two approaches so far, and neither has worked. With `filter` I've tried this:
    A1_msw <- A1798 %>%
      filter(str_detect(A1_msw, "[:digit:]")) %>%
      unnest_tokens(output = bigram, input = X1, token = "ngrams", n = 2) %>%
      na.omit(A1_msw)
The problem here is that the brackets sometimes contain characters other than digits, so this only deletes the rows whose brackets contain nothing but digits. It also removes everything containing punctuation, which is a problem because words are now missing. The result looks like this:
| | bigram |
|---|---|
| 11 | i 0006 |
| 13 | eine zeitschrift |
I've also tried it with `gsub`, but I think I've got the syntax wrong (and I don't know how it's supposed to look):
    A1_msw <- A1798 %>%
      gsub(A1_msw, "\\[.*\\d]\\","") %>%
      unnest_tokens(output = bigram, input = X1, token = "ngrams", n = 2) %>%
      na.omit(A1_msw)
I'd appreciate your help!
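
For reference, base R's `gsub()` takes its arguments in the order pattern, replacement, text, and a character class such as `[^]]*` stops the match at the first closing bracket. The following is only a minimal sketch under stated assumptions: `df` is a made-up stand-in for `A1798`, `X1` is assumed to be a character column, and dropping the rows left empty afterwards is an assumed follow-up step rather than part of the attempts above.

```r
# Hypothetical stand-in for A1798; only the column name X1 comes from the question.
df <- data.frame(
  X1 = c("[0005]", NA, "[I/0006]", "Athenaeum.", "Eine Zeitschrift"),
  stringsAsFactors = FALSE
)

# gsub(pattern, replacement, x):
# match "[", then any run of characters that are not "]", then "]".
df$X1 <- gsub("\\[[^]]*\\]", "", df$X1)

# Assumed follow-up: drop rows that are now empty or NA.
cleaned <- df[!is.na(df$X1) & nzchar(trimws(df$X1)), , drop = FALSE]
```

In a pipeline, the same `gsub()` call could presumably be wrapped in `dplyr::mutate(X1 = ...)` before `unnest_tokens()`, since piping the whole data frame into `gsub()` passes it as the pattern argument rather than as the text.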