I have the following set of data (a pandas.DataFrame) which I would like to fit with scipy.interpolate.UnivariateSpline. Let's call the data data.
Date
2018-04-02 09:00:00 16249
2018-04-02 10:00:00 45473
2018-04-02 11:00:00 32050
2018-04-02 12:00:00 35898
2018-04-02 13:00:00 21577
2018-04-02 14:00:00 30545
2018-04-02 15:00:00 60925
2018-04-02 16:00:00 47124
2018-04-03 09:00:00 18534
2018-04-03 10:00:00 36064
2018-04-03 11:00:00 32387
2018-04-03 12:00:00 15903
2018-04-03 13:00:00 22291
2018-04-03 14:00:00 26367
2018-04-03 15:00:00 66269
2018-04-03 16:00:00 38478
2018-04-04 09:00:00 15803
2018-04-04 10:00:00 22511
2018-04-04 11:00:00 33123
2018-04-04 12:00:00 21000
2018-04-04 13:00:00 23132
2018-04-04 14:00:00 39270
2018-04-04 15:00:00 102544
2018-04-04 16:00:00 143421
2018-04-04 17:00:00 200
2018-04-05 09:00:00 23377
2018-04-05 10:00:00 52089
2018-04-05 11:00:00 99298
2018-04-05 12:00:00 24627
2018-04-05 13:00:00 33467
2018-04-05 14:00:00 26498
2018-04-05 15:00:00 114794
2018-04-05 16:00:00 44904
2018-04-06 09:00:00 12180
2018-04-06 10:00:00 41658
2018-04-06 11:00:00 64066
2018-04-06 12:00:00 12517
2018-04-06 13:00:00 12610
2018-04-06 14:00:00 43544
2018-04-06 15:00:00 65533
2018-04-06 16:00:00 123885
2018-04-09 09:00:00 13425
2018-04-09 10:00:00 38354
2018-04-09 11:00:00 59491
2018-04-09 12:00:00 21402
2018-04-09 13:00:00 24550
2018-04-09 14:00:00 25189
2018-04-09 15:00:00 67751
2018-04-09 16:00:00 16071
2018-04-10 09:00:00 35587
2018-04-10 10:00:00 58667
2018-04-10 11:00:00 41831
2018-04-10 12:00:00 35196
2018-04-10 13:00:00 22611
2018-04-10 14:00:00 23070
2018-04-10 15:00:00 40819
2018-04-10 16:00:00 20337
2018-04-11 09:00:00 7962
2018-04-11 10:00:00 23982
2018-04-11 11:00:00 21794
2018-04-11 12:00:00 16835
2018-04-11 13:00:00 16821
2018-04-11 14:00:00 13270
2018-04-11 15:00:00 34954
2018-04-11 16:00:00 15772
2018-04-12 09:00:00 8587
2018-04-12 10:00:00 47950
2018-04-12 11:00:00 24742
2018-04-12 12:00:00 16743
2018-04-12 13:00:00 21917
2018-04-12 14:00:00 43272
2018-04-12 15:00:00 50630
2018-04-12 16:00:00 104656
2018-04-13 09:00:00 15282
2018-04-13 10:00:00 30304
2018-04-13 11:00:00 65737
2018-04-13 12:00:00 17467
2018-04-13 13:00:00 10439
2018-04-13 14:00:00 19836
2018-04-13 15:00:00 52051
2018-04-13 16:00:00 99462
What I have done so far is:
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate as interp
x = list(range(1, data.size + 1))  # x is the list 1, 2, ..., data.size
spl = interp.UnivariateSpline(x, data.values, s=0.5)
xx = np.linspace(min(x), max(x), 1000) # 1000 is an arbitrary number here.
plt.plot(x, data.values, 'bo')
plt.plot(xx, spl(xx), 'r')
plt.show()
# The plot is below; it looks very linear and not like a cubic spline at all, even though cubic (k=3) is the default.
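The smoothing factor s may be relevant here, since UnivariateSpline keeps adding knots until the sum of squared residuals is at most s. Below is a minimal sketch of that effect, assuming synthetic data on a similar scale to the series above (the values, the seed, and the larger s are made up for illustration, not taken from the real data):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in for the series: 81 points with values in the tens of thousands.
rng = np.random.default_rng(0)
x = np.arange(1, 82, dtype=float)
y = 40000 + 20000 * np.sin(x / 4) + rng.normal(0, 10000, x.size)

# s=0.5 is tiny compared with squared residuals of this magnitude,
# so the fit is pushed to pass (almost) exactly through every point.
tight = UnivariateSpline(x, y, s=0.5)

# An illustrative s on the order of n * noise_std**2 leaves room for smoothing.
loose = UnivariateSpline(x, y, s=x.size * 10000.0 ** 2)

tight_knots = len(tight.get_knots())
loose_knots = len(loose.get_knots())
print("s=0.5   knots:", tight_knots, "residual:", tight.get_residual())
print("large s knots:", loose_knots, "residual:", loose.get_residual())
```

If the tight fit ends up with a knot at almost every sample, the curve has to hit every point, which on noisy data can look like straight segments joining the dots rather than a smooth curve.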
When I evaluate spl at x instead, with everything else unchanged, that is:
plt.plot(x, spl(x), 'r')
I get the following:
The only difference is that the y axis tops out at 14,000, which seems to mean the previous plot showed some degree of curvature (or not?).
I am not sure what I am missing here, but I have apparently missed something. I am still very new to spline fitting in Python generally.
Can you tell me how I can correctly fit a spline to my time series above?
EDIT
Following your comment, I wanted to add another plot to hopefully explain myself a bit better. I didn't really mean that it is linear, but I couldn't find a better word. To illustrate:
xxx = [10, 20, 40, 60, 80]
plt.plot(x, data.values, 'bo')
plt.plot(xxx, spl(xxx), 'r')  # evaluate the spline only at the five points in xxx
plt.show()
I think the plot below looks reasonably linear-ish in my sense. I am guessing my question should probably be: how does scipy.interpolate.UnivariateSpline really work? Does it only show the plot for the values evaluated at the points we supplied (e.g. xxx for this plot)?
I was expecting a much smoother plot with visible curvature. This question's answer shows the kind of plot I would expect; it looks more like what piecewise cubic functions would generate, whereas mine looks, to me and compared to that plot, linear-ish (or first-order, if that is more appropriate).
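On the question of whether the spline only exists at the supplied points: the object returned by UnivariateSpline can be called at any x within the data range, and a line plot simply joins whichever evaluated points it is given with straight segments. Below is a minimal sketch of the difference, assuming a small made-up cubic data set (not the series above) and using np.interp as a stand-in for the straight segment that a line plot would draw between coarse points:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Made-up data that is exactly cubic, so an interpolating cubic spline reproduces it.
x = np.arange(1, 11, dtype=float)
y = x ** 3 - 10 * x ** 2
spl = UnivariateSpline(x, y, k=3, s=0)  # s=0 forces interpolation

# Evaluating on a coarse grid gives only a few points...
coarse = np.array([2.0, 5.0, 8.0])
coarse_vals = spl(coarse)

# ...and a line plot through them would show the straight chord in between,
# which is what linear interpolation between the coarse points computes.
mid = 3.5
straight = float(np.interp(mid, coarse, coarse_vals))

# The spline itself can still be evaluated at that in-between point directly.
curved = float(spl(mid))

print("chord between coarse points:", straight)
print("spline evaluated directly:  ", curved)
```

Plotting spl on a dense grid such as np.linspace(min(x), max(x), 1000) shows the curve itself, whereas plotting it only at a handful of points shows the chords between them.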