Pandas Dataframe ValueError: Shape of passed values is (X, ), indices imply (X, Y)

Question

I am getting an error and I'm not sure how to fix it.

The following seems to work:

import numpy as np
import pandas

def random(row):
    return [1,2,3,4]

df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

df.apply(func = random, axis = 1)

and my output is:

[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]

However, when I add a column holding a constant value such as 1 or None:

def random(row):
    return [1,2,3,4]

df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1

df.apply(func = random, axis = 1)

I get the error:

ValueError: Shape of passed values is (5,), indices imply (5, 5)

I've been wrestling with this for a few days now and nothing seems to work. What is interesting is that when I change

def random(row):
    return [1,2,3,4]

to

def random(row):
    print([1,2,3,4])

everything seems to work normally.

This question is a clearer way of asking an earlier question of mine, which I feel may have been confusing.

My goal is to compute a list for each row and then create a column out of that.

EDIT: I originally start with a dataframe that has one column. I add 4 columns in 4 different apply steps, and then when I try to add another column I get this error.
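A minimal sketch of that goal, assuming a reasonably recent pandas (0.23 or later, where `DataFrame.apply` has a `result_type` parameter). In those versions, when the applied function returns a list and `result_type` is left at its default, `apply` returns a Series holding one list per row instead of coercing the lists into rows, so the extra `'E'` column no longer triggers the shape error. The function name `random_list` is hypothetical.

```python
import numpy as np
import pandas as pd

def random_list(row):
    # Stand-in for the real per-row computation: always returns a 4-element list.
    return [1, 2, 3, 4]

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1  # the extra column that triggered the error in older pandas

# With result_type left at its default (None), list results are kept as
# one list per row, giving a Series of lists aligned on df.index.
new_col = df.apply(random_list, axis=1)
df['new'] = new_col
print(df['new'])
```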

What are you actually trying to do? Using apply with a function that returns a list will try to coerce the list to a Series, so it needs to have the same length as the original row (the number of columns), OR be a scalar (including None). – Jeff, Oct 29 '13 at 19:04
The output in your question is not what you get from apply. In the first case the output is a DataFrame with 4 columns; as @Jeff said, each returned list is coerced into a row. – Roman Pekar, Oct 29 '13 at 19:06
I am trying to add a column to the dataframe. This column is to be filled with a computed value. The computed value is computed from the values of each row. The function random is the thing that computes the value. — user1367204, Oct 29 '13 at 19:08
@RomanPekar I think that the output is the output from apply because apply will run each row through func=random, and that func will print out [1,2,3,4]. I am not sure what you are pointing out. — user1367204, Oct 29 '13 at 19:11
I had this issue, and my solution was just to join my list into a string... then split it after the apply. — pocketfullofcheese, Jul 24 '14 at 03:22
Copy-pasting your code into IPython does not reproduce your exception. Try upgrading to pandas 0.16 or check whether you copied your code correctly. Furthermore, replacing return with print will of course not produce the same error, as it will be assigning None to each field of the dataframe, which is valid. But so is a list, by the way. – firelynx, May 20 '15 at 16:10
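In newer pandas (again assuming 0.23 or later), the coercion Jeff and Roman Pekar describe is no longer implicit, but it can be requested. This sketch shows two documented routes: `result_type='expand'` spreads each returned list across integer-labelled columns, and returning a `pd.Series` is expanded into columns by default, with the Series index supplying the column names. The names `x1` to `x4` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1

# Ask apply to spread each returned list across columns (labelled 0..3).
expanded = df.apply(lambda row: [1, 2, 3, 4], axis=1, result_type='expand')
print(expanded.shape)  # one row per original row, one column per list element

# Returning a Series is expanded into columns by default; its index becomes
# the column labels (these names are hypothetical).
named = df.apply(
    lambda row: pd.Series([1, 2, 3, 4], index=['x1', 'x2', 'x3', 'x4']),
    axis=1,
)

# The expanded columns can then be joined back onto the original frame.
df = df.join(named)
print(df.columns.tolist())
```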

Roman Pekar · Accepted Answer · 2013-10-29T19:30:06.493

10

If your goal is to add a new column to the DataFrame, just write your function so that it returns a scalar value (not a list), something like this:

>>> def random(row):
...     return row.mean()

and then use apply:

>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D       new
0  0.201143 -2.345828 -2.186106 -0.784721 -1.278878
1 -0.198460  0.544879  0.554407 -0.161357  0.184867
2  0.269807  1.132344  0.120303 -0.116843  0.351403
3 -1.131396  1.278477  1.567599  0.483912  0.549648
4  0.288147  0.382764 -0.840972  0.838950  0.167222
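For a per-row reduction like the mean, `apply` is not strictly necessary. As an alternative not taken from the answer itself, pandas reductions such as `DataFrame.mean` accept `axis=1` and compute the result for all rows at once; this sketch checks that both routes agree on random data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

# Row-wise apply, as in the answer: one scalar per row.
via_apply = df.apply(lambda row: row.mean(), axis=1)

# Vectorised equivalent: mean across the columns for every row at once.
df['new'] = df[['A', 'B', 'C', 'D']].mean(axis=1)
print(df)
```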

I don't know if it's possible for your new column to contain lists, but it's definitely possible for it to contain tuples ((...) instead of [...]):

>>> def random(row):
...    return (1,2,3,4,5)
...
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D              new
0  0.201143 -2.345828 -2.186106 -0.784721  (1, 2, 3, 4, 5)
1 -0.198460  0.544879  0.554407 -0.161357  (1, 2, 3, 4, 5)
2  0.269807  1.132344  0.120303 -0.116843  (1, 2, 3, 4, 5)
3 -1.131396  1.278477  1.567599  0.483912  (1, 2, 3, 4, 5)
4  0.288147  0.382764 -0.840972  0.838950  (1, 2, 3, 4, 5)
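If the tuples later need to be unpacked into ordinary columns, one common idiom (an assumption here, not part of the answer) is to build a DataFrame from the list of tuples while reusing the original index so the rows line up. The column names `n1` to `n5` are hypothetical, and the sketch assumes a pandas version in which returning a tuple from `apply` yields a column of tuples, matching the answer's output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['new'] = df.apply(lambda row: (1, 2, 3, 4, 5), axis=1)

# Each cell of 'new' holds a 5-tuple; turn them into five ordinary columns,
# keeping df.index so each tuple stays on its original row.
parts = pd.DataFrame(df['new'].tolist(), index=df.index,
                     columns=['n1', 'n2', 'n3', 'n4', 'n5'])
df = df.join(parts)
print(df)
```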

edited Oct 29 '13 at 19:30

answered Oct 29 '13 at 19:14

Roman Pekar


But the return value from the function will be a list of items. In other words, the 'new' column will be a bunch of lists. I can't get it to work with returning lists. – user1367204 Oct 29 '13 at 19:18
Would you please give an example? Do you mean that I should return a tuple instead of a list? I tried switching return [1,2,3,4] for return (1,2,3,4) and got the same error. – user1367204 Oct 29 '13 at 19:34

This doesn't work on my example because there is one line of code that is different. It is missing df['E'] = 1. I add the column 'E' and then I do apply. I think that that is throwing it all off. The problem that I am working on starts with a dataframe with one column and then I keep doing apply to the dataframe to add columns. I add 4 columns and then when I try to add a fifth column, I get that error. – user1367204 Oct 29 '13 at 19:45
@user1367204 I see, strange – Roman Pekar Oct 29 '13 at 19:48
Does my example work on your machine, or is there something wrong with my machine? – user1367204 Oct 29 '13 at 19:57
@user1367204 nope, it doesn't work on my machine either. I'll try to understand why later – Roman Pekar Oct 29 '13 at 20:06
Thank you very much. If you have a chance to get back or some kind of work-around it would really be appreciated. – user1367204 Oct 29 '13 at 20:11

score -1 · Answer 2 · answered Feb 26 '18 at 02:25


I use the code below and it works just fine:

import numpy as np
import pandas as pd
# your_data / columns: your own data and column labels (placeholders)
df = pd.DataFrame(np.array(your_data), columns=columns)
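The error in the title is the message pandas raises when the number of column labels does not match the width of the data it receives, which is also why passing `columns` that fit `your_data` avoids it. A small sketch with made-up random data; since the exact wording of the message differs between pandas versions, only the exception type is relied on.

```python
import numpy as np
import pandas as pd

data = np.random.randn(5, 4)

# Four data columns but five labels: pandas rejects the mismatch.
try:
    pd.DataFrame(data, columns=list('ABCDE'))
except ValueError as exc:
    error = exc
    print(exc)
else:
    error = None

# Matching the labels to the data's shape works.
ok = pd.DataFrame(data, columns=list('ABCD'))
```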


KeepLearning


Could you please edit your answer to provide a little more context? How does it solve the original problem? Please see the help section for a great introduction to writing answers here: https://stackoverflow.com/help/answering – Graham Feb 26 '18 at 03:01
