I have a pandas data frame with shape 1725 rows X 4 columns.

      date     size           state type   
408      1    32000        Virginia  EDU
...

I need to replace the state column with the following numpy array with shape (1725, 52).

[[0. 1. 0. ... 0. 0. 0.]
...
[0. 0. 1. ... 0. 0. 0.]] 

The final result should be like this:

      date     size                   state type   
408      1    32000 [0. 1. 0. ... 0. 0. 0.]  EDU
...

So far I tried the following based on this answer:

col = 2
df.iloc[:, col] = np_arr.tolist()

The problem is that I get this error:

    dataSet.iloc[:, col] = tempData.tolist()
  File "/home/marcus/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 205, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/home/marcus/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 527, in _setitem_with_indexer
    "Must have equal len keys and value "
ValueError: Must have equal len keys and value when setting with an ndarray
Marcus

1 Answer

I believe you need to reshape your array into a single feature before adding it to the column. This problem often arises when preprocessing. Try the following:

df['state'] = np_arr.reshape(-1,1)

If that doesn't work, you can try first turning it into an array and then to a list:

df['state'] = np_arr.toarray().tolist()
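
For reference, here is a minimal sketch of the list-based assignment on a small stand-in for the real data (3 rows and 4 one-hot columns instead of 1725 x 52; all values are made up for illustration). It assumes `np_arr` is a plain NumPy array, in which case `tolist()` alone is enough: `toarray()` is a method of SciPy sparse matrices, not of `ndarray`.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for the 1725 x 4 frame in the question.
df = pd.DataFrame(
    {
        "date": [1, 2, 3],
        "size": [32000, 15000, 8000],
        "state": ["Virginia", "Texas", "Ohio"],
        "type": ["EDU", "GOV", "EDU"],
    },
    index=[408, 409, 410],
)

# Stand-in for the (1725, 52) one-hot array: one row per DataFrame row.
np_arr = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
)

# tolist() gives a list of row-lists whose length equals len(df),
# so each cell of 'state' receives one whole row as a Python list.
df["state"] = np_arr.tolist()

print(df)
```

The column ends up with dtype `object`, each cell holding a list, which matches the layout asked for in the question.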

Working with multiple columns: you can do these replacements in a for loop. Either use list(df), which returns a list of all the column names, and access the names by their index, or use iloc[]:

cols = list(df) #Get a list with all column names
column_positions = [0,2,4,5] #Here we will select columns in position 0,2,4 and 5
for i in column_positions: 
    df[cols[i]] = np_arr.tolist() #Iterate over those specific columns and replace their values.
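
To address a single column by its position rather than its name, the same idea can be wrapped in a small helper. This is only a sketch: `replace_column_by_position` is a hypothetical name, the sample data is made up, and it assumes the column names are unique and that the array has one row per DataFrame row.

```python
import numpy as np
import pandas as pd


def replace_column_by_position(df, position, np_arr):
    """Hypothetical helper: overwrite the column at `position` so that
    each cell holds one row of `np_arr` as a Python list."""
    # Look up the column name from its integer position.
    col_name = df.columns[position]
    # Assign by name; this assumes column names are unique.
    df[col_name] = np_arr.tolist()
    return df


# Made-up sample data with the same column layout as in the question.
df = pd.DataFrame(
    {
        "date": [1, 2],
        "size": [32000, 15000],
        "state": ["Virginia", "Texas"],
        "type": ["EDU", "GOV"],
    }
)
np_arr = np.array([[0.0, 1.0], [1.0, 0.0]])

# Replace the column at position 2 ('state').
df = replace_column_by_position(df, 2, np_arr)
print(df)
```

Looking the name up through `df.columns[position]` is equivalent to indexing `list(df)`, and it avoids assigning through `iloc`, which is where the original error came from.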
Celius Stingher
  • The second option worked. Thank you. Although np_arr is already in numpy array format so toarray() is not necessary. Do you know how to make this method more generic? For example using the column number instead of its name. – Marcus Feb 13 '20 at 14:51
  • You can either use `.iloc[]` as you stated in your example, or `list(df)`, which returns a list of all the column names; then you pass the index value of the column you wish to access and that will work too. – Celius Stingher Feb 13 '20 at 14:52
  • I couldn't get the `iloc[]` method to work but the `list(df)` solved my problem. Thank you! – Marcus Feb 13 '20 at 14:56