
I have already checked post1, post2, post3 and post4, but they didn't help.
I have data about a specific plant, including two variables called "Age" and "Height". The correlation between them is non-linear. To fit a model, one solution I assume is as follows:
If the non-linear function is

    Height = a·Age² + b·Age + c

then we can bring in a new variable K where

    K = Age²

so the model becomes

    Height = a·K + b·Age + c

and we have changed the first non-linear function into a multiple linear regression one. Based on this, I have the following code:

from sklearn.linear_model import LinearRegression

data['K'] = data["Age"].pow(2)

x = data[["Age", "K"]]
y = data["Height"]

model = LinearRegression().fit(x, y)
print(model.score(x, y)) # = 0.9908571840250205
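(For reference, scikit-learn can build the squared column for you with `PolynomialFeatures`; a minimal sketch with made-up toy data, since the real `data` frame isn't shown here:)

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy stand-in for the Age/Height frame (assumed exactly quadratic: height = age^2 + 1)
age = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
height = np.array([2, 5, 10, 17, 26, 37])

# degree=2 expands the single Age column into [Age, Age^2]
X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(age)
model = LinearRegression().fit(X, height)
print(model.score(X, height))  # R^2 is 1.0 here because the toy data is exactly quadratic
```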

  1. Am I doing this correctly?
  2. How can I do the same with cubic and exponential functions?

Thanks.

Coder

2 Answers


Hopefully you don't have a religious fervor for using SKLearn here, because the answer I'm going to suggest completely ignores it.

If you're interested in doing regression analysis where you have complete autonomy over the fitting function, I'd suggest cutting directly down to the least-squares optimization algorithm that drives a lot of this type of work, which you can do using scipy:


import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

x, y = np.array([0, 1, 2, 3, 4, 5]), np.array([0, 1, 4, 9, 16, 25])

# initial_guess[i] maps to p[i] in function_to_fit; must be reasonable
initial_guess = [1, 1, 1]

def function_to_fit(x, p):
    # quadratic model: p[0]*x^2 + p[1]*x + p[2]
    return p[0] * x**2 + p[1] * x + p[2]

def residuals(p, y, x):
    return y - function_to_fit(x, p)

cnsts = leastsq(
    residuals,
    initial_guess,
    args=(y, x)
)[0]

fig, ax = plt.subplots()
ax.plot(x, y, 'o')

xi = np.arange(0, 10, 0.1)
ax.plot(xi, function_to_fit(xi, cnsts))

plt.show()

[graph of the example code's output]

Now, this is a numeric approach to the solution, so I would recommend taking a moment to make sure you understand the limitations of such an approach - but for problems like these I've found it more than adequate for functionalizing non-linear data sets without trying to do some hand-waving to make them fit inside a linearizable manifold.
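(If you'd rather not write the residual function yourself, scipy also ships `curve_fit`, a wrapper around the same least-squares machinery; a minimal sketch of the same quadratic fit, using the toy data from above:)

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 1, 4, 9, 16, 25], dtype=float)

# Model with explicit named parameters instead of a parameter vector
def model(x, a, b, c):
    return a * x**2 + b * x + c

# popt holds the fitted (a, b, c); pcov their estimated covariance
popt, pcov = curve_fit(model, x, y, p0=[1, 1, 1])
# Here y is exactly x^2, so the fit recovers a ≈ 1, b ≈ 0, c ≈ 0
```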

Michael Green

For cubic polynomials:

from sklearn.linear_model import LinearRegression

data['x2'] = data["Age"].pow(2)
data['x3'] = data["Age"].pow(3)

x = data[["Age", "x2","x3"]]
y = data["Height"]

model = LinearRegression().fit(x, y)
print(model.score(x, y))

You can handle exponential data by fitting log(y), or find a library that can fit polynomials automatically, e.g. numpy.polyfit: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
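(The log(y) trick for exponentials, sketched with made-up data, since the real frame isn't shown: if y = a·exp(b·x), then log(y) = log(a) + b·x, which is a straight line you can fit with a degree-1 polyfit.)

```python
import numpy as np

# Hypothetical exponential data: y = 2 * exp(0.5 * x)
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2.0 * np.exp(0.5 * x)

# Fit log(y) = log(a) + b*x as a degree-1 polynomial
b, log_a = np.polyfit(x, np.log(y), 1)
a = np.exp(log_a)
# For this exact data, a ≈ 2.0 and b ≈ 0.5
```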