I am trying to run kNN on a dataset, but I keep getting an NA-related error. I have searched Stack Overflow extensively for a solution to this problem, but I could not find anything useful.
This is the dataset I am working with: https://www.kaggle.com/tsiaras/uk-road-safety-accidents-and-vehicles
I have converted every factor and integer variable among my predictors and target to numeric so that kNN can compute Euclidean distances. I have removed all the NAs, but knn keeps throwing the following error message:
```
NAs introduced by coercion
NAs introduced by coercion
Error in knn(train[2:nrow(train), c(11, 22, 23, 25, 27, 28)], test[(2:nrow(test)), :
  NA/NaN/Inf in foreign function call (arg 6)
```
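For what it's worth, this pattern (two coercion warnings, then an error about argument 6) is consistent with knn() being handed columns that are still factors. As far as I can tell, class::knn() calls as.matrix() on train and test and then as.double() on each before its compiled call, and argument 6 of that call is the training matrix. A data frame that still contains a factor becomes a character matrix, so text labels turn into NA at that point, after any is.na() check on the data frame has already passed. A minimal sketch with hypothetical toy data (the columns road and speed are made up):

```r
# Hypothetical toy predictors: 'road' is still a factor with text labels
train <- data.frame(
  road  = factor(c("Roundabout", "Dual carriageway", "Roundabout")),
  speed = c(30, 70, 30)
)

# The data frame itself has no missing values, so an is.na() check passes
print(sum(is.na(train)))          # 0
print(sapply(train, is.numeric))  # road is FALSE: it was never converted

# A data frame with any non-numeric column becomes a character matrix
m <- as.matrix(train)
print(typeof(m))                  # "character"

# Converting that matrix to double cannot parse the road labels, so they
# become NA; without suppressWarnings() R warns "NAs introduced by coercion"
d <- suppressWarnings(as.double(m))
print(d)                          # NA NA NA 30 70 30
```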
Here is one example of how I am converting the predictors and running kNN:
```r
as.numeric(levels(test$Road_Type))[levels(test$Road_Type)]
as.numeric(levels(train$Road_Type))[levels(train$Road_Type)]
train <- na.exclude(train)
test <- na.exclude(test)
cl=as.numeric(train[2:nrow(train),5])
cl <- na.exclude(cl)
knn0 <- knn(train[2:nrow(train),c(11,22,23,25,27,28)], test[(2:nrow(test)),c(11,22,23,25,27,28)], cl)
```
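Two things stand out in the conversion lines above, offered as a guess rather than a confirmed diagnosis: their results are never assigned back to the data frame, and as.numeric(levels(f))[f] is the R FAQ idiom for factors whose labels are themselves numbers (note that it indexes with the factor f, not with levels(f)). For text labels such as road types, as.numeric() on the labels yields NA. A small comparison with hypothetical factors:

```r
# Hypothetical factor whose labels are numbers stored as text
codes <- factor(c("10", "20", "10", "30"))
from_labels <- as.numeric(levels(codes))[codes]  # 10 20 10 30: the label values
level_codes <- as.numeric(codes)                 # 1 2 1 3: internal level codes

# Hypothetical factor with text labels, like Road_Type
road <- factor(c("Single carriageway", "Dual carriageway", "Roundabout"))
bad  <- suppressWarnings(as.numeric(levels(road))[road])  # NA NA NA
good <- as.numeric(road)  # 3 1 2: levels are sorted alphabetically

# Either way, the conversion only takes effect once it is assigned back
df <- data.frame(road = road)
df$road <- as.numeric(df$road)
print(sapply(df, class))  # road is now "numeric"
```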
I am doing the same as.numeric conversion for all of columns 11, 22, 23, 25, 27 and 28, and also for the target. I start at row 2 so that the column labels are not included. I have also tried running the following code before passing the data into the knn function:
```r
sum(is.na(train[2:nrow(train),c(11,22,23,25,27,28)]))
sum(is.na(test[2:nrow(test),c(11,22,23,25,27,28)]))
sum(is.na(cl))
```
All three of these return 0, so there are no NA values before I pass the data into the knn function.
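Separately, starting at row 2 may not do what is intended. Assuming the file was read with read.csv() and its default header = TRUE, the header line becomes the column names rather than a data row, so 2:nrow() silently drops the first real observation. A quick check with a hypothetical two-row file written to a temporary path:

```r
# Write a hypothetical CSV with a header line and two data rows
path <- tempfile(fileext = ".csv")
writeLines(c("Road_Type,Speed_limit",
             "Roundabout,30",
             "Dual carriageway,70"), path)

df <- read.csv(path)
print(names(df))  # "Road_Type" "Speed_limit": the header became column names
print(nrow(df))   # 2: both data rows are present
print(df[1, ])    # row 1 is already the first observation
```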
EDIT
I fixed the issue by converting each column to numeric like this:
```r
train$Road_Type <- as.numeric(as.integer(factor(train$Road_Type)))
```
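The same fix can be applied to all predictor columns at once. A minimal sketch, assuming a hypothetical data frame accidents with made-up columns road, speed and severity (not the real Kaggle column names). It converts text predictors to level codes before splitting, so that train and test share one coding (converting them separately can assign different codes if a label is missing from one of them), and it passes the target as a factor, which is what knn() documents for cl.

```r
library(class)  # knn() comes from the 'class' package bundled with R

# Hypothetical toy data standing in for the accident columns
accidents <- data.frame(
  road     = c("Roundabout", "Dual carriageway", "Roundabout", "Dual carriageway", "Roundabout"),
  speed    = c(30, 70, 30, 70, 30),
  severity = c("Slight", "Serious", "Slight", "Serious", "Slight")
)

# Convert text predictors to numeric level codes on the full data, before
# splitting, so the same label gets the same code in train and test
predictors <- c("road", "speed")
accidents[predictors] <- lapply(accidents[predictors], function(col) {
  if (is.numeric(col)) col else as.numeric(factor(col))
})

# Split by row position (here: first four rows train, last row test)
train <- accidents[1:4, predictors]
test  <- accidents[5, predictors, drop = FALSE]
cl    <- factor(accidents$severity[1:4])  # class labels as a factor

pred <- knn(train, test, cl, k = 1)
print(pred)  # Slight
```

One design caveat: integer level codes impose an arbitrary ordering on nominal categories, so Euclidean distances between them carry little meaning; one-hot encoding (for example with model.matrix()) is a common alternative.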
Thanks to everyone who helped!