Fitting sequential datapoints to seasonal curve in R

Question

I'm trying to fit data points that describe a temperature change within a short period to a seasonal curve over one year. Also, I want to measure just how well they fit.

I used modern temperature data to construct a sine curve:

y=0.77*sin(-0.45*x)-1.5

The sequential data points are isotope values coming from shell carbonate (that's why the y-values look unlike any temperature scale) and the analysed shells grow about 13 mm a year, which explains the weird x-values.

The fitted data are sequences of 10-15 points and can cover anything between 6 weeks and half a year. So there is not always a whole lot of temperature change recorded. Also the accuracy of some points can be bad. Those problems are OK with me, because I want to have a measurement of how well their best fit is.

I use this nls() function:

fit=nls(y~-0.77*sin(b*x+c)+d, start=list(b=-0.45,c=0,d=-1.5))

(The reason why I don't want to find a new value for the amplitude is, because R tries to fit the maximum and the minimum of the sine curve into my limited dataset, skewing everything else)

here are some values with good results:

x=c(0.00, 0.16, 0.32, 0.37, 0.81, 1.30, 1.70, 1.95, 2.16, 2.43, 2.57, 2.86, 3.23, 3.57, 4.06, 4.16, 4.52, 4.87, 4.91, 5.27, 5.71, 6.20, 6.75, 7.31, 7.94, 8.50, 9.02, 9.46, 9.93, 10.47, 10.82, 11.13, 11.36, 11.78, 12.25, 12.70, 13.00)
y=c(-1.20, -1.49, -1.37, -1.50, -0.75, -0.80, -0.60, -0.75, -0.70, -1.07, -1.20, -0.70, -0.60, -0.84, -0.76, -0.80, -1.10, -0.88, -1.30, -1.10, -1.40, -1.70, -1.90, -2.10, -2.10, -1.90, -2.60, -2.30, -2.10, -1.80, -1.80, -2.80, -2.10, -1.90, -2.20, -1.90, -1.40)

Formula: y ~ -0.77 * sin(b * x + c) + d

    Parameters:
  Estimate Std. Error t value Pr(>|t|)    
b -0.73227    0.01978  -37.02   <2e-16 ***
c  3.60132    0.17710   20.34   <2e-16 ***
d -1.02625    0.05027  -20.41   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2549 on 25 degrees of freedom

Number of iterations to convergence: 12 
Achieved convergence tolerance: 7.375e-06

I also plotted the values and fitted a line to them to visualise the results.

xyear=c(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12.0, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 13.0)

bb=coef(fit)[1]
cc=coef(fit)[2]
dd=coef(fit)[3]
lines(xyear,-0.77*sin(bb*xyear+cc)+dd, col=c("blue"))
lines(xyear,-0.77*sin(-0.45*xyear+cc)-1.55)

What I would be interested in here are the residuals (which I had no problems with) and the difference between the starting points (b,c,d) and the new parameters (bb,cc,dd):

b tells me about the shells individual growth rate
c, time of the year the sampling started (how it intersects with the y-axis).
d tells me about the general enrichment of the isotope data (how high or low it is on the y-axis)

And also how sure I can be of each single new parameter. Something that nls() can give me all. Great! But that set of data was also a good one. If I use a shorter one like this

x8t4= 0.447702, 0.788084, 1.214854, 1.669275, 1.947880, 2.422088, 3.539721, 4.654005, 6.709839, 7.000000
y8t4=-1.09, -1.08, -1.23, -1.21, -1.29, -1.33, -1.45, -1.47, -1.35, -1.35

I get results that are just not likely:

Formula: y ~ -0.77 * sin(b * x + c) + d

Parameters:
  Estimate Std. Error t value Pr(>|t|)    
b  -0.3493     0.1221  -2.860 0.028798 *  
c  -1.8694     0.7101  -2.633 0.038914 *  
d  -1.7227     0.2870  -6.002 0.000963 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05984 on 6 degrees of freedom

Number of iterations to convergence: 7 
Achieved convergence tolerance: 7.59e-07

enter image description here

I can see that there is a seasonal change happening in the dataset. But R fails to recognise it. Maybe because the curve is too shallow?

For other less clear sequences I also get the error messages single gradient or step factor 0.000488281 reduced below 'minFactor'of 0.000976562. I would like to count it as a "Fail" but then I am not sure, how R decides what to do.

The seasonal curve itself is only a rough fit to the temperature change (which is fairly sinusoidal), but I couldn't find a better way to measure the fit. Also the growth rate of the shells varies, which distorts the seasonal curve anyway. Maybe I am just using the wrong function? Any ideas for improvement?

Fitting sinusoidal models is very sensitive to starting values. The answers to this question might be helpful: http://stackoverflow.com/q/18515903/1412059 — Roland, Feb 23 '15 at 10:59
I think you are right, the starting values changed my results a lot. Funnily the link above and the use of fft() to get a good starting point, didn't help with the second set of datapoints. It gives me 1.26 as a result, when I should have something around 0.3. Funnily, it works perfectly for the first dataset, but (again) I don't know why. I could just start to guesstimate the starting values, where it doesn't work. — Niklas, Feb 23 '15 at 14:48

Fitting sequential datapoints to seasonal curve in R

0 Answers0