Row<-c(1,2,3,4,5)
Content<-c("I love cheese", "whre is the fish", "Final Countdow", "show me your s", "where is what")
Data<-cbind(Row, Content)
View(Data)

I want to create a function that tells me how many words are misspelled in each row.

An intermediate step would be to get something like this:

Row<-c(1,2,3,4,5)
Content<-c("I love cheese", "whre is the fs", "Final Countdow", "show me your s", "where is     what")
MisspelledWords<-c(NA, "whre, fs", "Countdow","s",NA)
Data<-cbind(Row, Content,MisspelledWords)

I know that I have to use aspell, but I'm having trouble running aspell on individual rows rather than always on the whole file. Finally, I want to count how many words are misspelled in each row; for that I would adapt the code from "Count the number of words in a string in R?".
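Once the MisspelledWords column from the intermediate step exists, the final count needs only base R. A minimal sketch, assuming that column is comma-separated with NA for rows that have no misspellings (as in the example above); count_listed_words is a hypothetical helper name:

```r
# Count the words listed in each entry of a comma-separated column.
# Assumption: NA means "no misspelled words in this row".
count_listed_words <- function(x) {
  # split each entry on a comma plus any following spaces, count the pieces
  n <- lengths(strsplit(x, ",[[:space:]]*"))
  # strsplit() turns NA into a single NA piece, so reset those rows to 0
  n[is.na(x)] <- 0L
  n
}

MisspelledWords <- c(NA, "whre, fs", "Countdow", "s", NA)
count_listed_words(MisspelledWords)
# [1] 0 2 1 1 0
```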

Carlo

2 Answers


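To use aspell you have to write the text to a file, since aspell() checks files rather than character vectors. It's straightforward to wrap that in a function that dumps each entry to a temporary file, runs aspell and returns the count (but it will not be all that efficient on a large matrix/data frame, because it creates one file and makes one aspell call per entry).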

countMispelled <- function(words) {

  # do a bit of cleanup (if necessary): drop punctuation, squeeze runs of spaces
  words <- gsub("  *", " ", gsub("[[:punct:]]", "", words))

  temp_file <- tempfile()
  writeLines(words, temp_file)
  res <- aspell(temp_file)
  unlink(temp_file)

  # return the number of misspelled words
  length(res$Original)

}

Data <- cbind(Data, Errors=unlist(lapply(Data[,2], countMispelled)))

Data

##      Row Content             Errors
## [1,] "1" "I love cheese"     "0"   
## [2,] "2" "whre is thed fish" "2"   
## [3,] "3" "Final Countdow"    "1"   
## [4,] "4" "show me your s"    "0"   
## [5,] "5" "where is what"     "0"  

You might be better off using a data frame rather than a matrix (I just worked with what you provided), since that way Row and Errors can stay numeric.
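Following up on the efficiency caveat and the data-frame suggestion: aspell() reports a Line for every flagged word, so one possible variant writes all rows to a single temporary file (one row per line) and splits the results by line. This is a sketch under those assumptions: it relies on the Original and Line columns described in ?aspell and on a spell checker being installed. misspelled_by_row and its checker argument are hypothetical names, not part of the answer above. It also builds the MisspelledWords column from the question:

```r
# Sketch: check all rows with ONE aspell() call instead of one temp file
# per row. Assumes one row per line, so aspell()'s `Line` column is the
# row index. `checker` is a hypothetical hook (default utils::aspell) that
# lets you try the aggregation with a stand-in when no speller is installed.
misspelled_by_row <- function(text, checker = utils::aspell) {
  # embedded line breaks would shift the line-to-row mapping
  text <- gsub("[\r\n]+", " ", text)
  # same cleanup as countMispelled(): drop punctuation, squeeze spaces
  text <- gsub(" +", " ", gsub("[[:punct:]]", "", text))

  temp_file <- tempfile(fileext = ".txt")
  on.exit(unlink(temp_file))
  writeLines(text, temp_file)
  res <- checker(temp_file)

  # one list element per row; split() keeps unused factor levels by
  # default, so clean rows get an empty element instead of disappearing
  words <- split(res$Original, factor(res$Line, levels = seq_along(text)))

  # "whre, fs" style string per row, NA when nothing was flagged
  collapse_words <- function(w) {
    if (length(w) > 0) paste(w, collapse = ", ") else NA_character_
  }

  data.frame(
    MisspelledWords = unname(vapply(words, collapse_words, character(1))),
    Errors = lengths(words, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage (needs aspell, hunspell or ispell installed):
# Data <- data.frame(Row = Row, Content = Content, stringsAsFactors = FALSE)
# Data <- cbind(Data, misspelled_by_row(Data$Content))
```

Keeping the result as a data frame lets Errors stay an integer column, in line with the suggestion above.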

hrbrmstr

Inspired by this article, here's an attempt using which_misspelled and check_spelling from library(qdap).

library(qdap)

# which_misspelled
n_misspelled <- sapply(Content, function(x){
  length(which_misspelled(x, suggest = FALSE))
})

data.frame(Content, n_misspelled, row.names = NULL)
#             Content n_misspelled
# 1     I love cheese            0
# 2    whre is the fs            2
# 3    Final Countdow            1
# 4    show me your s            0
# 5 where is     what            0


# check_spelling
df <- check_spelling(Content, n.suggest = 0)                        

n_misspelled <- as.vector(table(factor(df$row, levels = Row)))

data.frame(Content, n_misspelled)
#             Content n_misspelled
# 1     I love cheese            0
# 2    whre is the fs            2
# 3    Final Countdow            1
# 4    show me your s            0
# 5 where is     what            0
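The factor(..., levels = Row) step is what makes rows without misspellings show up as 0: judging from the code above, check_spelling() returns one entry per flagged word, so table() on its own would skip clean rows entirely. A small base-R illustration with made-up row ids (it does not call qdap; flagged_rows and n_rows are hypothetical names):

```r
# Toy stand-in for the `row` column of check_spelling()'s result:
# one entry per flagged word (values made up for illustration)
flagged_rows <- c(2, 2, 3)
n_rows <- 5

# table() alone drops rows that have no misspellings at all
naive <- as.vector(table(flagged_rows))
naive
# [1] 2 1

# fixing the factor levels keeps a 0 for every clean row
with_levels <- as.vector(table(factor(flagged_rows, levels = seq_len(n_rows))))
with_levels
# [1] 0 2 1 0 0

# tabulate() gives the same counts when row ids are positive integers
via_tabulate <- tabulate(flagged_rows, nbins = n_rows)
via_tabulate
# [1] 0 2 1 0 0
```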
Henrik