I want to find list of columns present in more than 500 csv files. Read only columns & put in dataframe or list and later I want to export to csv.
Asked
Active
Viewed 3,652 times
1 Answers
2
pandas
can easily extract only certain columns and write out a new CSV file for you:
import pandas as pd
data = pd.read_csv('input.csv', usecols=[1, 3], index_col=1)
data.to_csv('output.csv')
If you wanted to extract the same two columns from multiple files into a single output, you could do the following:
import pandas as pd
import glob
combined_data = pd.DataFrame()
for csv_file in glob.glob('*.csv'):
data = pd.read_csv(csv_file, usecols=[1, 3], index_col=1, skiprows=1, header=None, names=['Col1', 'Col2'])
combined_data = combined_data.append(data)
combined_data.to_csv('output.csv')
This might need tweaking depending on whether or not your files have headers and the same column names.
To extract the headers from all of the files (without loading the whole file), you could simply use the CSV library as follows:
import csv
import glob
with open('headers.csv', 'w', newline='') as f_headers:
csv_headers = csv.writer(f_headers)
for csv_file in glob.glob('input*.csv'):
with open(csv_file, 'r', newline='') as f_csv:
csv_headers.writerow([csv_file] + next(csv.reader(f_csv)))
This would also put the filename as the first entry on each row.

Martin Evans
- 45,791
- 17
- 81
- 97
-
thanks, but I wanted column name, this also helps for my other requirements. – Learnings Sep 23 '17 at 12:52
-
So in effect you are trying to create a CSV file containing all of the header rows from your 500 files? – Martin Evans Sep 23 '17 at 13:08
-
I just want to know columns present in all the 500 csv files – Learnings Sep 23 '17 at 13:10
-
I have added a simple CSV reader which should help. It adds the filename as the first entry on each row. – Martin Evans Sep 23 '17 at 13:14
-
thanks this is I wanted – Learnings Sep 23 '17 at 13:17