I have a large CSV file with about 200 header names (the first of which is empty). I want to select some columns and copy them to a new output.csv file. My problem is grabbing the column whose header has no name (the empty first element in the header).

So the input.csv looks something like,

            ,header1,header2,header3,header4, ... , header200
            value0, value2, value2, value3, value4, ..., value200
            ,2,3,30,,, ... , 10
            66,2,3,30,, ... , 10

etc (all rows have the same number of elements even if empty).

After reading various questions, I recycled some code from "write CSV columns out in a different order in Python" to write:

import csv
from operator import itemgetter         

SelectedSignals = ['header1',  'header4'] 



fiin=open('input.csv','rb') #open to read "r" in binary mode "b"
fiout=open('output.csv','wb') #open to write "w" in binary mode "b"

reader = csv.reader(fiin, delimiter=',')
writer = csv.writer(fiout, delimiter=',')

AllSignalNames = reader.next()
name2index = dict((name, index) for index, name in enumerate(AllSignalNames))
writeindices = [name2index[name] for name in SelectedSignals]
reorderfunc = itemgetter(*writeindices) # itemgetter was imported from operator module
writer.writerow(SelectedSignals)

for row in reader:
    writer.writerow(reorderfunc(row))

This gives the desired output, say,

            ,header1,header4
            value0, value4
            ,30
            66,30

But the problem comes when I use,

  SelectedSignals = [' ', 'header1',  'header4'] 

to grab the first column, which raises a KeyError.

I'm a Python beginner, so any hints are appreciated.

Massagran

1 Answer

In the CSV format, the first header should be a zero-length string (''), not a space (' '), which is what you use in SelectedSignals.
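To see this concretely, here is a minimal sketch, assuming Python 3 and a made-up two-line sample standing in for input.csv. The names sample, selected and indices are illustrative only and do not come from the question.

```python
import csv
import io

# Hypothetical sample shaped like the question's input.csv:
# the first header field is empty.
sample = ",header1,header2\n66,2,3\n"

# Read only the header row.
header = next(csv.reader(io.StringIO(sample)))
print(header)  # ['', 'header1', 'header2']

# Same lookup table as in the question.
name2index = {name: index for index, name in enumerate(header)}

# The key for the unnamed column is '' (zero-length), not ' ' (a space).
selected = ['', 'header1']
indices = [name2index[name] for name in selected]
```

With ' ' in place of '' the last line would raise KeyError, because no header field consists of a single space.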

You could also add a fake column name to your name2index dict, for example name2index['header0'] = 0 just after name2index = ... and then use 'header0' in SelectedSignals.
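A rough sketch of the fake-name approach, assuming Python 3 and a shortened five-column header in place of the real 200 columns. The header and row values are made up for illustration, and 'header0' is just a chosen alias.

```python
from operator import itemgetter

# Hypothetical header as it would come back from the csv reader.
header = ['', 'header1', 'header2', 'header3', 'header4']
name2index = {name: index for index, name in enumerate(header)}

# Register an alias for the unnamed first column.
name2index['header0'] = 0

selected = ['header0', 'header1', 'header4']
write_indices = [name2index[name] for name in selected]
reorder = itemgetter(*write_indices)

# Hypothetical data row with an empty last field.
row = ['66', '2', '3', '30', '']
print(reorder(row))  # ('66', '2', '')
```

Note that if you then write the selected names out as the header row, the output file will contain 'header0' rather than an empty first field, which may or may not be what you want.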

Alternatively, you could use a default value for the dict (when it can't find the header you want, it would use this default value): name2index.get(name, 0) instead of name2index[name] in your writeindices expression.
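For illustration, a small sketch of the default-value approach, assuming Python 3 and a made-up five-column header. The misspelled name 'heder4' is hypothetical and only there to show the trade-off.

```python
# Hypothetical header as it would come back from the csv reader.
header = ['', 'header1', 'header2', 'header3', 'header4']
name2index = {name: index for index, name in enumerate(header)}

# ' ' is not a key, so .get falls back to index 0 (the first column).
selected = [' ', 'header1', 'header4']
write_indices = [name2index.get(name, 0) for name in selected]
print(write_indices)  # [0, 1, 4]

# Trade-off: any unknown name also falls back to 0 instead of raising.
typo_index = name2index.get('heder4', 0)
print(typo_index)  # 0
```

This is convenient, but a misspelled header will silently select the first column rather than raise KeyError, so the explicit '' key or the alias may be safer when the list of selected names is long.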

Bruno