
The logistic regression implementation in Python's scikit-learn library has a "class_weight" argument. I would like to know the mathematical principle behind how class_weight is applied during model fitting. Does it work by modifying the objective function:

https://drive.google.com/open?id=16TKZFCwkMXRKx_fMnn3d1rvBWwsLbgAU

And what is the specific modification?

Thank you in advance! I will appreciate any help from you!

Yeping Sun

1 Answer


Yes, it changes the loss function, and it is commonly used when the labels are imbalanced. Mathematically, the loss becomes a weighted sum (or average) of per-sample losses, where each sample's weight depends on its class. If no class_weight is given, all samples are weighted equally (as in the picture you attached).
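As a sketch of what this looks like for binary labels (the notation here is my own, not taken from the scikit-learn source): let $y_i \in \{0, 1\}$ be the label of sample $i$, $p_i = \sigma(w^\top x_i + b)$ the predicted probability of class 1, and $c_0, c_1$ the weights assigned to the two classes. Up to a normalizing constant such as $1/n$, the weighted loss is

$$
J(w, b) = -\sum_{i=1}^{n} c_{y_i} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] + \text{(regularization term)}
$$

Setting $c_0 = c_1 = 1$ recovers the ordinary unweighted cross-entropy loss, so the only change is the factor $c_{y_i}$ in front of each sample's term.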

The idea is to punish mistakes on predictions of underrepresented classes more than mistakes on the overrepresented classes.
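To make the effect concrete, here is a minimal pure-Python sketch. The helper names `balanced_class_weights` and `weighted_log_loss` are hypothetical, chosen for illustration, and are not scikit-learn functions. The weight formula `n_samples / (n_classes * count)` is the one the scikit-learn documentation gives for `class_weight='balanced'`. The predicted probabilities are made up.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weights as n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def weighted_log_loss(labels, probs, class_weight):
    """Sum of per-sample cross-entropy losses, each scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        per_sample = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += class_weight[y] * per_sample
    return total

# Toy imbalanced data: 8 negatives, 2 positives (probabilities are invented).
labels = [0] * 8 + [1] * 2
probs = [0.1] * 8 + [0.4] * 2

uniform = {0: 1.0, 1: 1.0}
balanced = balanced_class_weights(labels)  # {0: 0.625, 1: 2.5}

loss_uniform = weighted_log_loss(labels, probs, uniform)
loss_balanced = weighted_log_loss(labels, probs, balanced)
print(balanced, loss_uniform, loss_balanced)
```

With the balanced weights, each error on the rare class counts four times as much as an error on the common class, so the optimizer is pushed harder to fit the minority samples.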

See a more detailed discussion here.

Jan K
    Thank you, Jan K. Could you write out the modified loss function with class_weight applied? The logistic regression code in scikit-learn reads: # Logistic loss is the negative of the log of the logistic function. out = -np.sum(sample_weight * log_logistic(yz)) + .5 * alpha * np.dot(w, w). This looks different from the formula I gave in my question above. How should I understand this formula? – Yeping Sun Apr 23 '19 at 13:50
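One way to read the quoted line, offered as a sketch rather than a statement about scikit-learn internals: if the labels in `yz` are coded as -1/+1 and `log_logistic` is the log of the sigmoid, then `-log(sigmoid(y * z))` equals the usual cross-entropy term for 0/1 labels, because `1 - sigmoid(z) = sigmoid(-z)`. The class weights then enter through `sample_weight` (each sample carrying the weight of its class), and `.5 * alpha * np.dot(w, w)` is an L2 penalty. The pure-Python check below assumes exactly that; the function names, the data, and the omission of an intercept are my own simplifications.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss_pm1_form(w, X, y01, sample_weight, alpha):
    """-sum(s_i * log(sigmoid(y_i * z_i))) + 0.5 * alpha * w.w, labels recoded to -1/+1."""
    total = 0.0
    for x, y, s in zip(X, y01, sample_weight):
        y_pm = 2 * y - 1  # map 0 -> -1, 1 -> +1
        total -= s * math.log(sigmoid(y_pm * dot(w, x)))
    return total + 0.5 * alpha * dot(w, w)

def loss_cross_entropy_form(w, X, y01, sample_weight, alpha):
    """Weighted cross-entropy with 0/1 labels plus the same L2 penalty."""
    total = 0.0
    for x, y, s in zip(X, y01, sample_weight):
        p = sigmoid(dot(w, x))
        total -= s * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total + 0.5 * alpha * dot(w, w)

# Made-up data; no intercept term, for brevity.
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 0.1]]
y01 = [1, 0, 0, 1]
w = [0.4, -0.2]
alpha = 0.1

# Assumption: class weights act as per-sample weights chosen by each sample's class.
class_weight = {0: 1.0, 1: 3.0}
sample_weight = [class_weight[y] for y in y01]

a = loss_pm1_form(w, X, y01, sample_weight, alpha)
b = loss_cross_entropy_form(w, X, y01, sample_weight, alpha)
print(a, b)
```

The two values agree up to floating-point rounding, so the two formulas are the same loss written with different label encodings. The naive `math.log(sigmoid(...))` used here is only for illustration; a numerically stable version would be needed for large |z|.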