An elegant way of reading multiple pandas DataFrames and assigning dataframes names in Python using Pandas

Question

Excuse my question; I know this is trivial, but for some reason I am not getting it right. Reading dataframes one by one is highly inefficient, especially if you have a lot of files to read from. Remember DRY: don't repeat yourself.

So here is my approach:

files = ["company.csv", "house.csv", "taxfile.csv", "reliablity.csv", "creditloan.csv", "medicalfunds.csv"]

DataFrameName =  ["company_df", "house_df", "taxfile_df", "reliablity_df", "creditloan_df", "medicalfunds_df"]

for file in files:
    for df in DataFrameName:
        df = pd.read_csv(file)

This only gives me df as one of the frames; I am not sure which of them, but I guess the last one. How can I read through the csv files and store each one under the corresponding name in DataFrameName?

My goal:

To have 6 dataframes loaded in the workspace, named as listed in DataFrameName.

For example, company_df holds the data from "company.csv".

score 1 · Answer 1 · answered Apr 05 '20 at 19:10

You could set up a dictionary and fill it in a loop:

    DataFrameDic =  {"company":[], "house":[], "taxfile":[], "reliablity":[], "creditloan":[], "medicalfunds":[]}

    for key in DataFrameDic:
        DataFrameDic[key] = pd.read_csv(key+'.csv')

This should give you a dictionary of dataframes, keyed by name.

score 1 · Answer 2 · answered Apr 05 '20 at 19:11

Something like this:

files = [
    "company.csv",
    "house.csv",
    "taxfile.csv",
    "reliablity.csv",
    "creditloan.csv",
    "medicalfunds.csv",
]

DataFrameName = [
    "company_df",
    "house_df",
    "taxfile_df",
    "reliablity_df",
    "creditloan_df",
    "medicalfunds_df",
]

dfs = {}

for name, file in zip(DataFrameName, files):
    dfs[name] = pd.read_csv(file)

zip lets you iterate over two lists at the same time, so you get both the name and the filename on each pass.

You'll end up with a dict of DataFrames.
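
To make the pairing concrete, here is a small sketch without pandas that shows what zip yields. The shortened lists are only for illustration, and `names` is deliberately one entry short: zip stops at the shorter input, so a missing name would silently skip a file. A length check before the loop is one way to catch that.

```python
files = ["company.csv", "house.csv", "taxfile.csv"]
names = ["company_df", "house_df"]  # deliberately one name short

# zip pairs items by position and stops at the shorter list.
pairs = list(zip(names, files))
print(pairs)
# [('company_df', 'company.csv'), ('house_df', 'house.csv')]

# A length check up front catches the mismatch before any file is read.
lengths_match = len(names) == len(files)
print(lengths_match)  # False
```

Note that "taxfile.csv" never appears in the pairs, which is why the check is worth having.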

answered Apr 05 '20 at 19:11

Alex

Didn't know that zip feature - really like it! – greenPlant Apr 05 '20 at 19:17
Looks great but is it possible to have them as they are in DataFrameName like company_df not referenced like a dictionary – JA-pythonista Apr 05 '20 at 19:19
@PandasJ No, if you want to use the names in `DataFrameName` then you will have to set each variable manually. – Alex Apr 05 '20 at 19:20
@FrancisWebb it's super handy for these kind of things! – Alex Apr 05 '20 at 19:21
Okay great! Thanks – JA-pythonista Apr 05 '20 at 19:21
@PandasJ take a look at the answers here for dynamically assigning variable names in python, dictionary seems to be the clearest and easiest way https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-via-a-while-loop (Especially given you know your variable names from the get go!) – greenPlant Apr 05 '20 at 19:30

score 0 · Answer 3 · edited Apr 12 '20 at 14:47

Dictionaries are the way to go, since you can name their contents dynamically.

names = ["company", "house", "taxfile", "reliablity", "creditloan", "medicalfunds"]
dataframes = {}
for name in names:
    dataframes[f"{name}_df"] = pd.read_csv(f"{name}.csv")

Because your names follow a consistent convention, it is easy to append the _df or .csv part to each name when needed.
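
If plain variable names such as company_df are still wanted afterwards, one option is to bind them explicitly from the dictionary. The sketch below assumes nothing about the real files: the CSV text in `csv_sources` is made up and read from memory via `io.StringIO`, purely so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV text standing in for company.csv and house.csv.
csv_sources = {
    "company": "name,employees\nAcme,12\nGlobex,40\n",
    "house": "city,price\nOslo,300\n",
}

dataframes = {}
for name, text in csv_sources.items():
    # pd.read_csv accepts a file-like object as well as a path.
    dataframes[f"{name}_df"] = pd.read_csv(io.StringIO(text))

# If plain variable names are wanted, bind them explicitly from the dict.
company_df = dataframes["company_df"]
house_df = dataframes["house_df"]

print(list(dataframes))  # ['company_df', 'house_df']
print(company_df.shape)  # (2, 2)
```

The dictionary stays the single source of truth, and the explicit assignments keep it obvious where each variable comes from.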

edited Apr 12 '20 at 14:47

marc_s

answered Apr 05 '20 at 19:14

Guimoute

Umar.H · Answer 4 · 2020-04-05T19:48:23.107

Using pathlib, we can create a generator expression and then build a dictionary with the file name as the key and the dataframe as the value.

With pathlib, we can use the .glob method to grab all the CSVs in a target path.

Replace "\tmp\files" with the path to your files; if you're using Windows, use a raw string or escape the backslashes.

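import pandas as pd
from pathlib import Path

# Raw string so "\t" and "\f" are not interpreted as escape sequences.
trg_files = (f for f in Path(r"\tmp\files").glob("*.csv"))

# Key each DataFrame by the file's stem plus "_df", e.g. "company_df".
dataframe_dict = {f"{file.stem}_df": pd.read_csv(file) for file in trg_files}

print(dataframe_dict.keys())
# e.g. dict_keys(['company_df', ...])

print(dataframe_dict['company_df'])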
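
For a version that can be run without any existing files, here is a minimal sketch of the same glob-based idea. The temporary directory and the two sample CSVs are hypothetical and exist only for the demonstration; `sorted()` is added because glob does not guarantee an order.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical sample files, written only so the sketch runs anywhere.
    (folder / "company.csv").write_text("name,employees\nAcme,12\n")
    (folder / "house.csv").write_text("city,price\nOslo,300\n")

    # sorted() makes the key order deterministic; glob order is not guaranteed.
    dataframe_dict = {
        f"{path.stem}_df": pd.read_csv(path)
        for path in sorted(folder.glob("*.csv"))
    }

print(list(dataframe_dict))  # ['company_df', 'house_df']
print(dataframe_dict["company_df"].loc[0, "employees"])  # 12
```

The files are read fully inside the `with` block, so the DataFrames remain usable after the temporary directory is deleted.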
