Getting probability value greater than 1 from my glm model

Question

I have created a logistic regression model in R to try to predict the outcome of cricket matches. However, my model produces probability values greater than 1: the output is 1.031704. Any tips on how I could improve my model to get an accurate estimate of the probability?

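set.seed(1)

# Use 70% of the dataset as the training set and the remaining 30% as the testing set
in_train <- sample(c(TRUE, FALSE), nrow(ODIMT), replace = TRUE, prob = c(0.7, 0.3))
train <- ODIMT[in_train, ]
test <- ODIMT[!in_train, ]

# Logistic regression of Result (0/1) on Target, Opposition and Country
model <- glm(Result ~ Target + Opposition + Country, family = binomial, data = ODIMT)

options(scipen = 999)  # print numbers without scientific notation

summary(model)

pscl::pR2(model)["McFadden"]  # McFadden pseudo-R^2
caret::varImp(model)          # variable importance
car::vif(model)               # variance inflation factors

# Predicted win probability for a new match
new <- data.frame(Target = 226, Opposition = "v India", Country = "England")
predict(model, new, type = "response")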

The Result variable is 0 or 1, Target ranges from 0 to 400, and the other two predictors (Opposition and Country) are character variables.

Data:

            Country    Target       Result Opposition    Ground
          England         NA          1     v India   Kolkata
          Australia      251          0  v Pakistan   Kolkata
          South Africa   168          0     v India     Delhi
          Bangladesh      NA          1  v Pakistan     Delhi
          England        306          0 v Australia Melbourne
          New Zealand     NA          1 v Sri Lanka Melbourne
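Roland's comment below asks for a reproducible example. A minimal sketch of how the six sample rows above could be shared as code, assuming the column types described in the question (Target numeric, Result 0/1, the rest character); the name `ODIMT_sample` is hypothetical. For the real data, `dput(head(ODIMT))` prints an equivalent self-contained definition.

```r
# The sample rows from the question, rebuilt as a data frame (hypothetical name ODIMT_sample)
ODIMT_sample <- data.frame(
  Country    = c("England", "Australia", "South Africa", "Bangladesh", "England", "New Zealand"),
  Target     = c(NA, 251, 168, NA, 306, NA),
  Result     = c(1, 0, 0, 1, 0, 1),
  Opposition = c("v India", "v Pakistan", "v India", "v Pakistan", "v Australia", "v Sri Lanka"),
  Ground     = c("Kolkata", "Kolkata", "Delhi", "Delhi", "Melbourne", "Melbourne"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object exactly; its output can be pasted into a question
dput(ODIMT_sample)

# Rows with no missing values; by default glm() drops rows where Target is NA
sum(complete.cases(ODIMT_sample))
```

Note that with the default `na.action`, half of these sample rows would not contribute to the fit.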

Output of summary:

That shouldn't happen. Please provide a reproducible example (we are missing the data). (By the way, only three lines of your code seem to be relevant.) — Roland, Sep 14 '22 at 10:31
This doesn't seem possible. What do you get if you use predict without type = "response"? This should give you the log odds, which you can convert to probability. There is no value that would lead to p > 1. Please include your data so we can try to replicate this. — Allan Cameron, Sep 14 '22 at 10:33
Any chance you could share the output of `summary(model)`. Were there any warnings or messages when running `glm`? — Benjamin, Sep 14 '22 at 11:10
This is curious. Based on your summary, the predicted `y` would be `y <- 3.215183 + -0.020266 * 226 + 0.097631 + 2.298948` which equals 1.031646 (which is relatively close to your result, and might be rounding error). What happens if you run `predict.glm(model, new, type="response")`? I'm wondering if there's a dispatch issue. (not a very good guess, but it's worth looking at) — Benjamin, Sep 14 '22 at 11:46
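Benjamin's arithmetic can be checked directly in R. A small sketch that uses only the four numbers quoted in his comment (which coefficient belongs to which factor level is not shown in the question): their sum is on the log-odds scale, and `plogis()`, the inverse logit, maps it back to a probability.

```r
# Linear predictor (log-odds) for the new match, from the coefficients quoted in the comment
eta <- 3.215183 + -0.020266 * 226 + 0.097631 + 2.298948
eta  # close to the 1.031704 reported in the question

# Inverse logit: p = 1 / (1 + exp(-eta)); plogis() computes the same quantity
p <- plogis(eta)
p    # a valid probability between 0 and 1
stopifnot(isTRUE(all.equal(p, 1 / (1 + exp(-eta)))))
```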
In that case, I'd be curious to see what `class(model)` returns. If `predict` is returning the log-odds (~1.03) and `predict.glm` is returning the probability (~0.7), that would suggest that the `predict` generic is not seeing `model` as a `glm` object. — Benjamin, Sep 14 '22 at 12:04
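The dispatch check suggested in the comments can be sketched on built-in data. Here `fit` is only a stand-in for the question's `model`, using the `mtcars` data set (transmission type `am`, coded 0/1, against weight `wt`); for a genuine `glm` object, the generic and the explicit method should return identical probabilities.

```r
# Stand-in logistic model on built-in data: am (0/1) vs weight
fit <- glm(am ~ wt, family = binomial, data = mtcars)

class(fit)  # "glm" "lm", so predict() should dispatch to predict.glm()

p_generic <- predict(fit, type = "response")
p_method  <- stats::predict.glm(fit, type = "response")

# If dispatch works, the two agree; a mismatch would point to a class problem
stopifnot(inherits(fit, "glm"), isTRUE(all.equal(p_generic, p_method)))
```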

Answer (score 1), answered Sep 14 '22 at 11:56

I think you are predicting the log-odds value. From the `predict.glm` documentation for the `type` argument:

the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
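The two scales described in the documentation can be compared on a stand-in model. A sketch using the built-in `mtcars` data rather than the cricket data: the default `type = "link"` returns log-odds, which can be any real number (including values above 1), while `type = "response"` applies the inverse logit and always lies between 0 and 1.

```r
# Stand-in logistic regression on built-in data
fit <- glm(am ~ wt, family = binomial, data = mtcars)
new <- data.frame(wt = c(2.0, 3.0, 4.0))

eta  <- predict(fit, new)                     # default type = "link": log-odds
prob <- predict(fit, new, type = "response")  # predicted probabilities

# Applying the inverse logit to the log-odds reproduces type = "response"
stopifnot(isTRUE(all.equal(unname(plogis(eta)), unname(prob))))
cbind(wt = new$wt, log_odds = eta, probability = prob)
```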

As noted in the comments, if you use `type = "response"` you get the predicted probabilities.

Have a look at this question for more info.
