2

I have dataframe with categorical variable Segment

ID  Segment Var
1   AAA     1
2   BBB     0
3   BBB     1
4   AAA     1
5   CCC     1
6   AAA     0 
7   AAA     1
8   AAA     0
9   BBB     0
10  CCC     0

And I would like to transform column Segment into 3 category like this:

ID  SegmentAAA  SegmentBBB  SegmentCCC
1   1           null        null
2   null        0           null
3   null        1           null
4   1           null        null
5   null        null        1
6   0           null        null
7   1           null        null
8   0           null        null
9   null        0           null
10  null        null        0

Could you please help me with that. Thank you very much.

hta
  • 93
  • 7

4 Answers4

5

Is that pivot ?

df.pivot(*df.columns)
Out[70]: 
Segment  AAA  BBB  CCC
ID                    
1        1.0  NaN  NaN
2        NaN  0.0  NaN
3        NaN  1.0  NaN
4        1.0  NaN  NaN
5        NaN  NaN  1.0
6        0.0  NaN  NaN
7        1.0  NaN  NaN
8        0.0  NaN  NaN
9        NaN  0.0  NaN
10       NaN  NaN  0.0
BENY
  • 317,841
  • 20
  • 164
  • 234
3

Use:

df.set_index(['ID','Segment'])['Var']\
  .unstack()\
  .add_prefix('Segment')\
  .rename_axis([None], axis=1)\
  .reset_index()

Output:

   ID  SegmentAAA  SegmentBBB  SegmentCCC
0   1         1.0         NaN         NaN
1   2         NaN         0.0         NaN
2   3         NaN         1.0         NaN
3   4         1.0         NaN         NaN
4   5         NaN         NaN         1.0
5   6         0.0         NaN         NaN
6   7         1.0         NaN         NaN
7   8         0.0         NaN         NaN
8   9         NaN         0.0         NaN
9  10         NaN         NaN         0.0

Option 2:

pd.crosstab(df.ID,df.Segment,df.Var,aggfunc='first')
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 1
    Thank you very much for your quick help. I used second option. May I ask you how i could do it also for more Segment columns? (E.g. Segment1, Segment2, Segment3 etc.) I tried this but it does not work: pd.crosstab(df.ID,df.Segment,[col for col in df.columns if 'Segm' in col],aggfunc='first') – hta Jun 14 '18 at 17:47
  • You could reshape the dataframe and stack all those Segment columns into one column and do the same. If would create a new question with the appropriate data, we'll see what is most efficient. – Scott Boston Jun 14 '18 at 17:50
2
pd.get_dummies(df).drop('Var', axis=1)
harvpan
  • 8,571
  • 2
  • 18
  • 36
0

Pivot would be the best option. If you don't want ID to be the index and to get the column names as indicated, this is a little more specific than Wen's answer

df.pivot(index='ID',columns='Segment',values='Var').add_prefix('Segment').reset_index()
Segment  ID  SegmentAAA  SegmentBBB  SegmentCCC
0         1         1.0         NaN         NaN
1         2         NaN         0.0         NaN
2         3         NaN         1.0         NaN
3         4         1.0         NaN         NaN
4         5         NaN         NaN         1.0
5         6         0.0         NaN         NaN
6         7         1.0         NaN         NaN
7         8         0.0         NaN         NaN
8         9         NaN         0.0         NaN
9        10         NaN         NaN         0.0
msolomon87
  • 56
  • 1
  • 6