0

What am I doing wrong with the pandas package when pivoting a list of lists?

paramMatrix is a list of lists:

def populateMatrix():
    mydir="C:\\Python27"
    os.chdir(mydir)
    paramMatrix=[]
    for file in glob.glob("*.txt1"):
        print (file)
        with open(mydir+'\\'+file) as f:
            for line in f:
                paramMatrix.append((file + '=' + line).split('='))
    doPivot(paramMatrix)


def doPivot(data):
    df=pandas.DataFrame(data, columns=['Fruit', 'Shop', 'Price'])
    print (df.pivot(index='Fruit', columns='Shop', values='Price'))

The result I'm currently getting is:

Shop                            paframC paramA paramB    paramC  \
Fruit                                                             
New Text Document.txt1  nordfggdfgmal\n   Y3\n    NaN       NaN   
file.txt1                           NaN    Y\n   30\n  normal\n   

Shop                                          paramD           paramE  \
Fruit                                                                   
New Text Document.txt1  SOMEdgdfg_ITEM_IN_ALL_CAPS\n  5 6 7 4448 9 \n   
file.txt1                    SOME_ITEM_IN_ALL_CAPS\n     5 6 7 8 9 \n   

Shop                                      paramF paramG   pardamB  \
Fruit                                                               
New Text Document.txt1  /dir/7y7456456to/stuff\n    NaN  305456\n   
file.txt1                        /dir/to/stuff\n      y       NaN   

Shop                                     parfamG  
Fruit                                             
New Text Document.txt1  y33333333333333333333333  
file.txt1                                    NaN 

the list of lists is paramMatrix, and it looks like this before running the doPivot function:

[['file.txt1', 'paramA', 'Y\n'], ['file.txt1', 'paramB', '30\n'], ['file.txt1', 'paramC', 'normal\n'], ['file.txt1', 'paramD', 'SOME_ITEM_IN_ALL_CAPS\n'], ['file.txt1', 'paramE', '5 6 7 8 9 \n'], ['file.txt1', 'paramF', '/dir/to/stuff\n'], ['file.txt1', 'paramG', 'y'], ['New Text Document.txt1', 'paramA', 'Y3\n'], ['New Text Document.txt1', 'pardamB', '305456\n'], ['New Text Document.txt1', 'paframC', 'nordfggdfgmal\n'], ['New Text Document.txt1', 'paramD', 'SOMEdgdfg_ITEM_IN_ALL_CAPS\n'], ['New Text Document.txt1', 'paramE', '5 6 7 4448 9 \n'], ['New Text Document.txt1', 'paramF', '/dir/7y7456456to/stuff\n'], ['New Text Document.txt1', 'parfamG', 'y33333333333333333333333']]

or formatted in an easier to read format:

[['file.txt1', 'paramA', 'Y\n']
, ['file.txt1', 'paramB', '30\n']
, ['file.txt1', 'paramC', 'normal\n']
, ['file.txt1', 'paramD', 'SOME_ITEM_IN_ALL_CAPS\n']
, ['file.txt1', 'paramE', '5 6 7 8 9 \n']
, ['file.txt1', 'paramF', '/dir/to/stuff\n']
, ['file.txt1', 'paramG', 'y']
, ['New Text Document.txt1', 'paramA', 'Y3\n']
, ['New Text Document.txt1', 'pardamB', '305456\n']
, ['New Text Document.txt1', 'paframC', 'nordfggdfgmal\n']
, ['New Text Document.txt1', 'paramD', 'SOMEdgdfg_ITEM_IN_ALL_CAPS\n']
, ['New Text Document.txt1', 'paramE', '5 6 7 4448 9 \n']
, ['New Text Document.txt1', 'paramF', '/dir/7y7456456to/stuff\n']
, ['New Text Document.txt1', 'parfamG', 'y33333333333333333333333']
]

doPivot is a function from here

I have a list of lists:

+-------+--------+-----+
| file1 | parama | 112 |
| file1 | paramb |  54 |
| file1 | paramd | 234 |
| file2 | paramb |  63 |
| file2 | paramd | 334 |
| file2 | paramz |  11 |
| file3 | parama |  34 |
+-------+--------+-----+

And I am trying to pivot it to look like this (header row doesnt matter much to me):

+--------+-------+-------+-------+
|        | File1 | File2 | File3 |
| parama | 112   | -     | 34    |
| paramb | 54    | 63    | -     |
| paramd | 234   | 334   | -     |
| paramz | -     | 11    | -     |
+--------+-------+-------+-------+

What am I doing wrong with the pandas package when pivoting a list of lists?

Please note that I've tried [this approach as well][2] ; however, I kept getting a syntax error here: enter image description here

Community
  • 1
  • 1
Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062

1 Answers1

2

In the first part, Shop should be the index, not Fruit, if you want the desired result to be like the one in the second part. Your code works well if this is the case. See below.

import pandas as pd

list_ = [['file.txt1', 'paramA', 'Y\n']
, ['file.txt1', 'paramB', '30\n']
, ['file.txt1', 'paramC', 'normal\n']
, ['file.txt1', 'paramD', 'SOME_ITEM_IN_ALL_CAPS\n']
, ['file.txt1', 'paramE', '5 6 7 8 9 \n']
, ['file.txt1', 'paramF', '/dir/to/stuff\n']
, ['file.txt1', 'paramG', 'y']
, ['New Text Document.txt1', 'paramA', 'Y3\n']
, ['New Text Document.txt1', 'pardamB', '305456\n']
, ['New Text Document.txt1', 'paframC', 'nordfggdfgmal\n']
, ['New Text Document.txt1', 'paramD', 'SOMEdgdfg_ITEM_IN_ALL_CAPS\n']
, ['New Text Document.txt1', 'paramE', '5 6 7 4448 9 \n']
, ['New Text Document.txt1', 'paramF', '/dir/7y7456456to/stuff\n']
, ['New Text Document.txt1', 'parfamG', 'y33333333333333333333333']
]

df = pd.DataFrame(list_, columns=['File','Param','Values'])
df_p = df.pivot(index='Param',columns='File',values='Values')
print df_p

Result:

File           New Text Document.txt1                file.txt1
Param                                                         
paframC               nordfggdfgmal\n                      NaN
paramA                           Y3\n                      Y\n
paramB                            NaN                     30\n
paramC                            NaN                 normal\n
paramD   SOMEdgdfg_ITEM_IN_ALL_CAPS\n  SOME_ITEM_IN_ALL_CAPS\n
paramE                5 6 7 4448 9 \n             5 6 7 8 9 \n
paramF       /dir/7y7456456to/stuff\n          /dir/to/stuff\n
paramG                            NaN                        y
pardamB                      305456\n                      NaN
parfamG      y33333333333333333333333                      NaN

Similar approach to the second part.

import pandas as pd

d = {'File': {0: 'file1',
  1: 'file1',
  2: 'file1',
  3: 'file2',
  4: 'file2',
  5: 'file2',
  6: 'file3'},
 'Param': {0: 'parama',
  1: 'paramb',
  2: 'paramd',
  3: 'paramb',
  4: 'paramd',
  5: 'paramz',
  6: 'parama'},
 'Values': {0: 112, 1: 54, 2: 234, 3: 63, 4: 334, 5: 11, 6: 34}}

df = pd.DataFrame.from_dict(d)
df_p = df.pivot(index='Param',columns='File',values='Values')

print df_p

Similar result as well:

File    file1  file2  file3
Param                      
parama    112    NaN     34
paramb     54     63    NaN
paramd    234    334    NaN
paramz    NaN     11    NaN
WGS
  • 13,969
  • 4
  • 48
  • 51