I have two pandas dataframes: a price dataframe and a sales dataframe.

The price dataframe records the price of each product (columns) in each year (index):

    |a  |b  |c  |d  |e  |
2018|3.2|4.5|5.6|7.8|8.1|
2017|6.2|1.5|2.6|7.8|2.1|
2016|2.2|9.5|0.6|6.8|4.1|
2015|2.2|6.5|7.6|7.8|2.1|

The sales dataframe (see below) records the sales of each product (columns) in each year (index):

    |a  |b  |c  |d  |e  |
2018|101|405|526|108|801|
2017|601|105|726|308|201|
2016|202|965|856|408|411|
2015|322|615|167|458|211|

I would like to calculate the Spearman correlation between price and sales for each year. I know that scipy.stats.spearmanr computes a Spearman correlation, but I need to apply it to each pair of matching rows in the two dataframes.

For example, for 2018, I need to calculate the Spearman correlation between

    |a  |b  |c  |d  |e  |
2018|3.2|4.5|5.6|7.8|8.1|

and

    |a  |b  |c  |d  |e  |
2018|101|405|526|108|801|

What is the best way to do that? I want the output to look like this:

2018|spearman cor btw price and sales in 2018
2017|spearman cor btw price and sales in 2017
2016|spearman cor btw price and sales in 2016
AAA

1 Answer

You could do something like this:

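import pandas as pd
import scipy.stats as st

# df is the price dataframe, df2 the sales dataframe (same index and column order)
>>> pd.Series(map(lambda k: st.spearmanr(k[0], k[1])[0],
...               zip(df.values, df2.values)),
...           index=df.index)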
2018    0.7
2017    0.6
2016    0.3
2015    0.2
dtype: float64
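
For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the tables in the question. The names `price`, `sales` and `row_spearman` are my own choices for illustration, and the `reindex` step is an extra precaution (not part of the snippet above) so that rows and columns are matched by label rather than by position.

```python
import pandas as pd
from scipy import stats

years = [2018, 2017, 2016, 2015]
products = list("abcde")

# Data copied from the two tables in the question
price = pd.DataFrame(
    [[3.2, 4.5, 5.6, 7.8, 8.1],
     [6.2, 1.5, 2.6, 7.8, 2.1],
     [2.2, 9.5, 0.6, 6.8, 4.1],
     [2.2, 6.5, 7.6, 7.8, 2.1]],
    index=years, columns=products)

sales = pd.DataFrame(
    [[101, 405, 526, 108, 801],
     [601, 105, 726, 308, 201],
     [202, 965, 856, 408, 411],
     [322, 615, 167, 458, 211]],
    index=years, columns=products)

# Precaution: give sales the same row/column order as price
sales = sales.reindex(index=price.index, columns=price.columns)


def row_spearman(price_row):
    """Spearman correlation between one year's prices and that year's sales."""
    # price_row.name is the index label of the row, i.e. the year
    rho, _pvalue = stats.spearmanr(price_row, sales.loc[price_row.name])
    return rho


# axis=1 passes each row (one year) to the function
corr = price.apply(row_spearman, axis=1)
print(corr)
```

With the question's data this should agree with the 0.7, 0.6, 0.3, 0.2 shown above. Looking the sales row up by label costs a little speed compared with zipping `.values`, but it does not silently pair the wrong years if the two frames are sorted differently.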
rafaelc
  • How can I turn this answer into a df with columns? – Vivian Jul 20 '21 at 14:10
  • @Vivian what do you mean? – rafaelc Jul 20 '21 at 14:11
  • I wanted it to return a df with the columns 'year' and 'spearman corr'. – Vivian Jul 20 '21 at 14:15
  • Just call `.reset_index()` at the end – rafaelc Jul 20 '21 at 14:17
  • I have another question. In my case the correlation returns 'NaN' for some rows, even though every column I compare in both dfs contains values ('float64'). What could cause that? – Vivian Jul 20 '21 at 14:36
  • Maybe in these cases there isn't a correlation? Maybe that's why it's returning 'NaN'. – Vivian Jul 20 '21 at 14:41
  • @Vivian there might be many reasons for that, from bugs in your code (e.g. strings instead of floats, or very small numbers, etc.) to reasons like https://stackoverflow.com/questions/59002624/why-i-get-nan-in-spearman-correlation-in-python and https://stackoverflow.com/questions/32115900/python-scipy-scipy-stats-spearmanr-returning-nans – rafaelc Jul 20 '21 at 14:46
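
To illustrate the two follow-up points in the comments, here is a small sketch on made-up data (the numbers are hypothetical, not taken from the question). It names the columns 'year' and 'spearman corr' when calling `.reset_index()`, and it shows two situations where `spearmanr` returns NaN even though every input is a float: a row that is constant, and a row that itself contains a NaN (scipy's default is `nan_policy='propagate'`). These are only two possible causes; the linked questions discuss others.

```python
import warnings

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: the 2017 price row is constant on purpose
price = pd.DataFrame(
    {"a": [3.2, 5.0], "b": [4.5, 5.0], "c": [5.6, 5.0]}, index=[2018, 2017])
sales = pd.DataFrame(
    {"a": [101, 601], "b": [405, 105], "c": [526, 726]}, index=[2018, 2017])

with warnings.catch_warnings():
    # scipy warns about the constant row; silence it for this demo
    warnings.simplefilter("ignore")
    corr = pd.Series(
        [stats.spearmanr(p, s)[0] for p, s in zip(price.values, sales.values)],
        index=price.index,
    )
    # A NaN inside the data also propagates to the result by default
    rho_with_nan = stats.spearmanr([1.0, 2.0, np.nan], [1.0, 2.0, 3.0])[0]

# Name the index, then turn it into a regular column
result = corr.rename_axis("year").reset_index(name="spearman corr")
print(result)
```

The constant row has no rank variation, so the correlation is undefined and comes back as NaN. Checking `price.nunique(axis=1)` and `price.isna().any(axis=1)` (and the same for sales) is a quick way to see whether either case applies to your rows.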