
I am trying to generate a 7th column in a dataframe:

arb_ser_num       = 'zDfDD45'
predefined_number = 878

                 DATE                    Q1    Q2    Q3    Q4    Q5
0 2012-08-20 00:00:00   [Atlantic, Z, dEdd]  None  None  None  None 
1 2012-08-21 00:00:00    [Pacific, Y, dEdd]  None  None  None  None
2 2012-08-22 00:00:00     [Indian, Y, dRdd]  None  None  None  None
3 2012-08-23 00:00:00    [Meditar, Z, dEdd]  None  None  None  None
4 2012-08-24 00:00:00     [Arctic, Z, dRdd]  None  None  None  None


df['Q6'] = df.apply(lambda row: get_q6(arb_ser_num, row, predefined_number), axis = 1)

Sometimes get_q6 will return [1,2,3,4,5] and other times it will return [None]. I keep getting the error:

Shape of passed values is (5,), indices imply (5, 6)

and I am not sure how to fix it. I found something similar here but I don't think it applies to me. I am trying to track ocean temperatures/currents.
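The question does not show `get_q6`, so here is a hedged, self-contained sketch of the setup with a hypothetical `get_q6`. It returns `[1, 2, 3, 4, 5]` when the second `Q1` entry is `'Z'` and `[None]` otherwise. The sketch also passes `result_type='reduce'`, a `DataFrame.apply` parameter added in pandas 0.23 (after this question was asked), so that each returned list is kept as one cell whatever its length. That parameter is a swapped-in option; the question did not use it.

```python
import pandas as pd

arb_ser_num = 'zDfDD45'
predefined_number = 878

# Rebuild the sample frame from the question: DATE plus Q1..Q5 (6 columns).
df = pd.DataFrame({
    'DATE': pd.date_range('2012-08-20', periods=5),
    'Q1': [['Atlantic', 'Z', 'dEdd'],
           ['Pacific', 'Y', 'dEdd'],
           ['Indian', 'Y', 'dRdd'],
           ['Meditar', 'Z', 'dEdd'],
           ['Arctic', 'Z', 'dRdd']],
    'Q2': [None] * 5, 'Q3': [None] * 5, 'Q4': [None] * 5, 'Q5': [None] * 5,
})

def get_q6(ser_num, row, number):
    # Hypothetical stand-in: the real get_q6 is not shown in the question.
    if row['Q1'][1] == 'Z':
        return [1, 2, 3, 4, 5]
    return [None]

# result_type='reduce' asks apply() for a Series with one list per row
# instead of trying to expand the lists into columns.
df['Q6'] = df.apply(
    lambda row: get_q6(arb_ser_num, row, predefined_number),
    axis=1,
    result_type='reduce',
)
print(df[['DATE', 'Q6']])
```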

user1367204

  • Can you show the bug in an example? Something like `df['Q6'] = df.apply(lambda x: None if x['Q5'] == 1 else [1,2,3,4,5], axis=1)` on a sample dataframe where Q5 is in [1,2,3] works fine. – Roman Pekar Oct 28 '13 at 07:09
  • did you try to use `axis=0`? This should be the case since you want to apply the function for each row... – Saullo G. P. Castro Oct 28 '13 at 13:34
  • I just tried it and it didn't work. According to http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html axis=1 is to apply to each row. – user1367204 Oct 28 '13 at 13:39
  • @RomanPekar Do you mean that I should post the entire error message? – user1367204 Oct 28 '13 at 16:08
  • It seems the error is in ``get_q6``. Can you give us a specific case where the output of ``get_q6`` is different from the expected output? – silvado Oct 29 '13 at 08:39
  • I've reduced this problem to its core error and posted it here: http://stackoverflow.com/questions/19666904/pandas-dataframe-valueerror-shape-of-passed-values-is-x-indices-imply-x – user1367204 Oct 29 '13 at 19:01

2 Answers


I also ran into this error. It turned out that the pandas Time Series data type was causing the problem. When I applied the function with the time expressed as epoch time (or in almost any other form), it succeeded, but with the time converted to a pandas Time Series, I got this error. So my suggestion is to convert to Time Series after you apply the function, which of course assumes that you don't need the time variable inside the function being applied.

*I have not tested apply with pandas Time Spans.
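A minimal sketch of the order of operations this answer suggests, assuming the applied function does not need the time value: `DATE` stays as integer epoch seconds while `apply` runs and is converted with `pd.to_datetime(..., unit='s')` only afterwards. The epoch values correspond to the question's first three dates at midnight UTC, and `get_q6` here is a hypothetical stand-in that ignores `DATE`.

```python
import pandas as pd

# DATE kept as epoch seconds (plain integers) while apply() runs.
df = pd.DataFrame({
    'DATE': [1345420800, 1345507200, 1345593600],  # 2012-08-20..22, midnight UTC
    'Q1': [['Atlantic', 'Z', 'dEdd'],
           ['Pacific', 'Y', 'dEdd'],
           ['Indian', 'Y', 'dRdd']],
})

def get_q6(row):
    # Hypothetical function that does not use DATE at all.
    return [1, 2, 3, 4, 5] if row['Q1'][1] == 'Z' else [None]

# Apply first...
df['Q6'] = df.apply(get_q6, axis=1)

# ...and only then turn DATE into a pandas datetime column.
df['DATE'] = pd.to_datetime(df['DATE'], unit='s')
print(df)
```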

limbsical

Solution, TL;DR

Make the function return the same number of elements as there are columns in the original dataframe. In this case, make get_q6 return 6 elements, so that the first row of the returned array has exactly 6 elements.
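A sketch of this fix, assuming the dataframe has the 6 columns shown in the question when `apply` runs. `pad_to_width` is a hypothetical helper (not part of pandas) that pads the result with `None`, or truncates it, to exactly the number of columns; `get_q6` is a stand-in for the unshown original.

```python
import pandas as pd

# The question's frame: DATE plus Q1..Q5, i.e. 6 columns.
df = pd.DataFrame({
    'DATE': pd.date_range('2012-08-20', periods=5),
    'Q1': [['Atlantic', 'Z', 'dEdd'], ['Pacific', 'Y', 'dEdd'],
           ['Indian', 'Y', 'dRdd'], ['Meditar', 'Z', 'dEdd'],
           ['Arctic', 'Z', 'dRdd']],
    'Q2': [None] * 5, 'Q3': [None] * 5, 'Q4': [None] * 5, 'Q5': [None] * 5,
})

def get_q6(row):
    # Hypothetical stand-in for the question's get_q6.
    return [1, 2, 3, 4, 5] if row['Q1'][1] == 'Z' else [None]

def pad_to_width(values, width):
    # Hypothetical helper: truncate, then pad with None to exactly `width` items.
    values = list(values)[:width]
    return values + [None] * (width - len(values))

width = len(df.columns)  # 6, measured before Q6 is added
df['Q6'] = df.apply(lambda row: pad_to_width(get_q6(row), width), axis=1)
print(df['Q6'])
```

On pandas 0.23 and later, each padded 6-element list ends up as a single cell of `Q6`.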

Reason

Going through the pandas source code: in your case, the original dataframe has shape implied=(5,6), and internals.construction_error() inside pandas checks whether the array returned by applying get_q6 has the same shape.

The returned array has 5 rows, since the function is applied to each row. To determine the number of columns, pandas takes the first row of the returned array. If get_q6 returned 6 elements, both shapes would be (5,6) and the check would pass.

In your case, however, get_q6 returns either 5 elements ([1,2,3,4,5]) or just 1 ([None]), never the 6 elements pandas expects. Probably get_q6 returns [None] for the first row, so the shape of the returned array is calculated as passed=(5,1).

Finally, implied == passed evaluates to False, and pandas raises the error.
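The check described above can be modelled in a few lines of plain Python. This is a simplified illustration of the answer's reasoning, not pandas' actual implementation; the hypothetical `get_q6` returns `[None]` for the first row, as the answer suspects.

```python
# Simplified model of the shape check (NOT pandas' real code).
implied = (5, 6)  # df.shape: 5 rows; columns DATE, Q1..Q5

def get_q6(i):
    # Hypothetical: first row gives [None], the rest give five values.
    return [None] if i == 0 else [1, 2, 3, 4, 5]

results = [get_q6(i) for i in range(5)]

# Row count comes from the number of results; the column count is
# taken from the first result only, as the answer describes.
passed = (len(results), len(results[0]))
print(passed, implied, passed == implied)
```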

Asif Rehan