I am trying to do some text processing using NLTK and Pandas.

I have a DataFrame with a column 'text'. I want to add a column 'text_tokenized' whose values are stored as nested lists (one list of tokens per sentence).

My code for tokenizing text is:

from nltk.tokenize import sent_tokenize, word_tokenize

def sent_word_tokenize(text):
    # Python 2: decode the raw string, replacing undecodable bytes
    text = unicode(text, errors='replace')
    # split into sentences, then tokenize each sentence into words
    sents = sent_tokenize(text)
    tokens = map(word_tokenize, sents)

    return tokens
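
For reference, a quick check of the nested-list output on a made-up sample string (this assumes NLTK's punkt data is available, e.g. via `nltk.download('punkt')`):

sent_word_tokenize("Hello world. How are you?")
# [['Hello', 'world', '.'], ['How', 'are', 'you', '?']]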

Currently, I am trying to apply this function as follows:

df['text_tokenized'] = df.apply(lambda row: sent_word_tokenize(row.text), axis=1)

This gives me the error:

ValueError: Shape of passed values is (100, 3), indices imply (100, 21)

I am not sure what is wrong here or how to fix it.

ymoiseev
  • Hard to say for sure, but looks like axis=1 is a _row_ operation when you have a _column_ of text? – benten Aug 02 '16 at 00:51
  • http://stackoverflow.com/a/19667189/1168680 – RAVI Aug 02 '16 at 02:40
  • @user2241910, I do not think it is related to axis. You can still retrieve data from row by doing row.text. `df_small['text_tokenized'] = df_small.apply(lambda row: row.text, axis=1)` works well – ymoiseev Aug 02 '16 at 03:14
  • @RAVI, I tried wrapping return statement in both tuple and list, but still have similar error: `ValueError: Shape of passed values is (100, 6), indices imply (100, 20)` – ymoiseev Aug 02 '16 at 03:15

1 Answer

Solved my own question by applying the function to the column directly:

Instead of:

df['text_tokenized'] = df.apply(lambda row: sent_word_tokenize(row.text), axis=1)

I used:

df['text_tokenized'] = df.text.apply(lambda text: sent_word_tokenize(text))

However, I am not sure why it works, and I would really appreciate it if somebody could explain it to me.

ymoiseev
  • When you specify `axis=1`, `DataFrame.apply` passes each **row** to your function. Because the function returns a list, pandas then tries to assemble those lists into a DataFrame aligned with the original frame, and the mismatched lengths raise the shape `ValueError`. `df.text.apply(...)` is a `Series.apply`, which simply stores whatever the function returns as a single cell value, so the nested lists are kept intact. – Nickil Maveli Aug 02 '16 at 10:54
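
To make the difference concrete, here is a minimal sketch with a made-up two-row DataFrame (the tokenizer is rewritten for Python 3, without the `unicode` call):

import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize

def sent_word_tokenize(text):
    # split into sentences, then tokenize each sentence into words
    return [word_tokenize(s) for s in sent_tokenize(text)]

df = pd.DataFrame({'text': ['Hello world. How are you?', 'One more row.']})

# Series.apply stores whatever the function returns as a single cell value,
# so each row ends up holding its own nested list of tokens
df['text_tokenized'] = df.text.apply(sent_word_tokenize)

# DataFrame.apply with axis=1 also calls the function once per row, but pandas
# then tries to line the returned lists up as columns of a new DataFrame; when
# their lengths do not match the expected shape, it raises the ValueError above
# df.apply(lambda row: sent_word_tokenize(row.text), axis=1)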