19

I am getting an error and I'm not sure how to fix it.

The following seems to work:

import numpy as np
import pandas

def random(row):
   return [1,2,3,4]

df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

df.apply(func = random, axis = 1)

and my output is:

[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]

However, when I set one of the columns to a single value such as 1 or None:

def random(row):
   return [1,2,3,4]

df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1

df.apply(func = random, axis = 1)

I get the error:

ValueError: Shape of passed values is (5,), indices imply (5, 5)

I've been wrestling with this for a few days now and nothing seems to work. What is interesting is that when I change

def random(row):
   return [1,2,3,4]

to

def random(row):
   print([1,2,3,4])

everything seems to work normally.

This question is a clearer way of asking an earlier question, which I feel may have been confusing.

My goal is to compute a list for each row and then create a column out of that.

EDIT: I originally start with a dataframe that has one column. I add 4 columns in 4 different apply steps, and then when I try to add another column I get this error.

user1367204
  • what are you actually trying to do? Using apply with a function that returns a list will try to coerce the result to a Series, so it needs to have the same length as the original, OR be a scalar (including None). – Jeff Oct 29 '13 at 19:04
  • The output in your question is not the one you get from apply. The output in your first case is a DataFrame with 4 columns; as @Jeff said, the list is coerced into rows. – Roman Pekar Oct 29 '13 at 19:06
  • I am trying to add a column to the dataframe. This column is to be filled with a computed value. The computed value is computed from the values of each row. The function random is the thing that computes the value. – user1367204 Oct 29 '13 at 19:08
  • @RomanPekar I think that the output is the output from apply because apply will run each row through func=random, and that func will print out [1,2,3,4]. I am not sure what you are pointing out. – user1367204 Oct 29 '13 at 19:11
  • I had this issue, and my solution was just to join my list into a string... then split it after the apply. – pocketfullofcheese Jul 24 '14 at 03:22
  • This doesn't seem to be happening in 0.16 version of pandas – fixxxer Apr 28 '15 at 14:27
  • copypasting your code into ipython does not reproduce your exception. Try upgrading to pandas 0.16 or check if you copied your code correctly. Furthermore, replacing return with print will of course not produce the same error, as it will be assigning None to each field of the dataframe, which is valid. But so is a list btw. – firelynx May 20 '15 at 16:10
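
The coercion described in the comments above can be controlled explicitly in newer pandas releases through the result_type argument of DataFrame.apply. That argument was added in pandas 0.23, so this sketch assumes a version newer than the ones discussed in this thread. It reuses the question's random function as a placeholder for the real computation:

```python
import numpy as np
import pandas as pd

def random(row):
    # Placeholder from the question: always returns a 4-element list.
    return [1, 2, 3, 4]

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1

# 'reduce' asks for a Series, so each returned list stays in one cell.
as_lists = df.apply(random, axis=1, result_type='reduce')

# 'expand' asks for a DataFrame, so each list element becomes a column.
as_columns = df.apply(random, axis=1, result_type='expand')
```

Here as_lists could be assigned directly as a new column, while as_columns would be the form to use if each list element should end up in its own column.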

2 Answers

10

If your goal is to add a new column to the DataFrame, write your function so that it returns a scalar value (not a list), something like this:

>>> def random(row):
...     return row.mean()

and then use apply:

>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D       new
0  0.201143 -2.345828 -2.186106 -0.784721 -1.278878
1 -0.198460  0.544879  0.554407 -0.161357  0.184867
2  0.269807  1.132344  0.120303 -0.116843  0.351403
3 -1.131396  1.278477  1.567599  0.483912  0.549648
4  0.288147  0.382764 -0.840972  0.838950  0.167222

I don't know whether your new column can contain lists, but it can definitely contain tuples ((...) instead of [...]):

>>> def random(row):
...    return (1,2,3,4,5)
...
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
          A         B         C         D              new
0  0.201143 -2.345828 -2.186106 -0.784721  (1, 2, 3, 4, 5)
1 -0.198460  0.544879  0.554407 -0.161357  (1, 2, 3, 4, 5)
2  0.269807  1.132344  0.120303 -0.116843  (1, 2, 3, 4, 5)
3 -1.131396  1.278477  1.567599  0.483912  (1, 2, 3, 4, 5)
4  0.288147  0.382764 -0.840972  0.838950  (1, 2, 3, 4, 5)
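
If the new column really has to hold lists, one option that sidesteps apply's shape inference is to compute the values in plain Python and assign them as a Series. This is a minimal sketch and not part of the original answer: random is the placeholder function from the question, and wrapping the results in pd.Series is an assumption chosen so that each list stays a single cell.

```python
import numpy as np
import pandas as pd

def random(row):
    # Placeholder for the real per-row computation; returns a list.
    return [1, 2, 3, 4]

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1  # the extra column that triggered the error in the question

# Call the function once per row in an ordinary loop, then attach the
# results as an object-dtype Series aligned on the DataFrame's index.
values = [random(row) for _, row in df.iterrows()]
df['new'] = pd.Series(values, index=df.index)
```

Because pandas never sees the function's return value through apply, the length of the list no longer has to match the number of columns.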
Roman Pekar
  • But the return from the function will be a list of items. In other words, the 'new' column will be a bunch of lists. I can't get it to work with returning lists. – user1367204 Oct 29 '13 at 19:18
  • Would you please give an example? Do you mean that I should return a tuple instead of a list? I tried switching return [1,2,3,4] for return (1,2,3,4) and got the same error. – user1367204 Oct 29 '13 at 19:34
  • This doesn't work on my example because there is one line of code that is different. It is missing df['E'] = 1. I add the column 'E' and then I do apply. I think that that is throwing it all off. The problem that I am working on starts with a dataframe with one column and then I keep doing apply to the dataframe to add columns. I add 4 columns and then when I try to add a fifth column, I get that error. – user1367204 Oct 29 '13 at 19:45
  • @user1367204 I see, strange – Roman Pekar Oct 29 '13 at 19:48
  • Does my example work on your machine or is there something that is wrong with my machine? – user1367204 Oct 29 '13 at 19:57
  • @user1367204 nope, it doesn't work on my machine either. I'll try to understand why is that later – Roman Pekar Oct 29 '13 at 20:06
  • Thank you very much. If you have a chance to get back or some kind of work-around it would really be appreciated. – user1367204 Oct 29 '13 at 20:11
-1

I use the code below and it works fine:

import numpy as np
import pandas as pd
# your_data and columns stand for your own data and column names
df = pd.DataFrame(np.array(your_data), columns=columns)
KeepLearning
  • Could you please edit your answer to provide a little more context? How does it solve the original problem? Please see the help section for a great introduction to writing answers here: https://stackoverflow.com/help/answering – Graham Feb 26 '18 at 03:01