-2

I'm trying to do a polynomial regression in R using lm, I have

Time=c(1980:2016)
y= rnorm(length(Time))

I used :

reg=lm(y~poly(Time,2))
round(reg$coefficients,3)

which gives :

 (Intercept) poly(Time, 2)1 poly(Time, 2)2 
        -0.110         -1.298          0.172

And :

Time2=Time^2
reg2=lm(y~Time+Time2)
round(reg2$coefficients,3)

It gives

(Intercept)        Time       Time2 
   1146.590      -1.128       0.000 

Where is the problem?

Math
  • 1,274
  • 3
  • 14
  • 32
  • "Where is the problem?" - I'm inclined to ask you the same thing. You've fit two different models and gotten two different answers. Where is the problem? – joran Nov 29 '16 at 16:59
  • @ZheyuanLi Time2=Tme^2 – Math Nov 29 '16 at 17:03
  • @joran I used https://www.r-bloggers.com/fitting-polynomial-regression-in-r/ – Math Nov 29 '16 at 17:04
  • @ZheyuanLi, thanks for pointing out the dupe. At least my answer adds a little (I think) to existing answers ... – Ben Bolker Nov 29 '16 at 17:06

1 Answers1

5

By default poly() uses an orthogonal polynomial representation, which is more numerically stable. You can use raw=TRUE if you want to match the naive representation.

set.seed(101)
dd <- data.frame(Time=c(1980:2016),
                y=rnorm(2016-1980+1))
(c1 <- coef(lm(y~Time+I(Time^2),dd)))
##   (Intercept)          Time     I(Time^2) 
##  6.684138e+03 -6.686392e+00  1.672101e-03 
(c2 <- coef(lm(y~poly(Time,2),dd)))
##    (Intercept) poly(Time, 2)1 poly(Time, 2)2 
##    -0.04713527    -0.30359154     1.03594479 
c3 <- coef(lm(y~poly(Time,2,raw=TRUE),dd))
all.equal(unname(c1),unname(c3))  ## TRUE

You'll notice among other things that the intercept and slope given by the raw polynomials refer to the expected values at Time=0, which if you're using Time in CE (Common Era=AD) years is a little ridiculous.

If you want both interpretability and numerical stability you can get a reasonable compromise by centering your time variable (setting time to zero at the beginning of your observation period is also reasonable).

dd$cTime <- dd$Time-mean(dd$Time)
c4 <- coef(lm(y~poly(cTime,2,raw=TRUE),dd))
unname(c4)
## [1] -0.237754839 -0.004674513  0.001672101
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453