
Excuse my question; I know this is trivial, but for some reason I am not getting it right. Reading files into dataframes one at a time is repetitive and error-prone, especially when you have a lot of them to read. Remember DRY: Don't Repeat Yourself.

So here is my approach:

files = ["company.csv", "house.csv", "taxfile.csv", "reliablity.csv", "creditloan.csv", "medicalfunds.csv"]

DataFrameName =  ["company_df", "house_df", "taxfile_df", "reliablity_df", "creditloan_df", "medicalfunds_df"]

for file in files:
    for df in DataFrameName:
        df = pd.read_csv(file)

This only leaves me with a single variable, df, holding one of the frames. I am not sure which one, but I guess the last. How can I read the CSV files and store each one under the corresponding name in DataFrameName?
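To see what the loop above actually does, here is a minimal sketch that swaps `pd.read_csv` for a hypothetical stand-in, `fake_read_csv`, so it runs without any files (the file lists are shortened for illustration). The inner loop only rebinds the single variable `df`, so every file is read once per entry in `DataFrameName`, and after both loops `df` holds the last file read. No variable called `company_df` is ever created.

```python
# Hypothetical stand-in for pd.read_csv, so this runs without any CSV files.
def fake_read_csv(path):
    return f"<DataFrame from {path}>"

files = ["company.csv", "house.csv", "taxfile.csv"]
DataFrameName = ["company_df", "house_df", "taxfile_df"]

reads = 0
for file in files:
    for df in DataFrameName:
        # `df` is first bound to a string from DataFrameName, then immediately
        # rebound to the result; the string never becomes a variable name.
        df = fake_read_csv(file)
        reads += 1

print(df)                         # <DataFrame from taxfile.csv>
print(reads)                      # 9: each file is read once per name
print("company_df" in globals())  # False
```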

My goal:

To have 6 dataframes loaded in the workspace, named as in DataFrameName.

For example, company_df should hold the data from "company.csv".

JA-pythonista

4 Answers


You could set up a dictionary keyed by the base file names:

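    import pandas as pd

    # Keys are the file names without ".csv"; the [] values are placeholders.
    DataFrameDic = {"company": [], "house": [], "taxfile": [], "reliablity": [], "creditloan": [], "medicalfunds": []}

    for key in DataFrameDic:
        DataFrameDic[key] = pd.read_csv(key + '.csv')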

This gives you a dictionary of DataFrames; for example, DataFrameDic["company"] holds the data from "company.csv".
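The placeholder lists in `DataFrameDic` are overwritten right away, so the same dictionary can also be built in one comprehension. A minimal sketch, assuming pandas is installed; to stay self-contained it first writes two tiny made-up CSV files into a temporary directory (the data, the `base` folder, and the shortened key list are illustrative only).

```python
import os
import tempfile

import pandas as pd

# Illustrative setup: two small hypothetical CSV files in a temp directory.
base = tempfile.mkdtemp()
keys = ["company", "house"]  # shortened version of the keys in DataFrameDic
for key in keys:
    with open(os.path.join(base, key + ".csv"), "w") as fh:
        fh.write("id,value\n1,10\n2,20\n")

# One read per key; no placeholder values needed.
DataFrameDic = {key: pd.read_csv(os.path.join(base, key + ".csv")) for key in keys}

print(DataFrameDic["company"].shape)  # (2, 2)
```

In your own code the `os.path.join(base, ...)` part would just be `key + ".csv"` if the files sit in the working directory.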

greenPlant

Something like this:

import pandas as pd

files = [
    "company.csv",
    "house.csv",
    "taxfile.csv",
    "reliablity.csv",
    "creditloan.csv",
    "medicalfunds.csv",
]

DataFrameName = [
    "company_df",
    "house_df",
    "taxfile_df",
    "reliablity_df",
    "creditloan_df",
    "medicalfunds_df",
]

dfs = {}

for name, file in zip(DataFrameName, files):
    dfs[name] = pd.read_csv(file)

zip lets you iterate over two lists at the same time, pairing their elements by position, so each iteration gives you both the name and the file name.

You'll end up with a dict of DataFrames; for example, dfs["company_df"] holds the data from "company.csv".
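A quick illustration of the pairing `zip` performs, using shortened copies of the two lists. One thing worth knowing: `zip` stops silently at the shorter list, so if the lists ever get out of sync some files are skipped without an error (on Python 3.10+, `zip(..., strict=True)` raises `ValueError` instead).

```python
files = ["company.csv", "house.csv", "taxfile.csv"]
DataFrameName = ["company_df", "house_df", "taxfile_df"]

# Each iteration yields one (name, file) tuple, matched by position.
pairs = list(zip(DataFrameName, files))
print(pairs)
# [('company_df', 'company.csv'), ('house_df', 'house.csv'), ('taxfile_df', 'taxfile.csv')]

# If one list is shorter, zip just stops early: 'taxfile.csv' is dropped here.
short = list(zip(DataFrameName[:2], files))
print(short)
# [('company_df', 'company.csv'), ('house_df', 'house.csv')]
```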

Alex
  • Didn't know that zip feature - really like it! – greenPlant Apr 05 '20 at 19:17
  • Looks great, but is it possible to have them as standalone variables named as in DataFrameName, like company_df, rather than referenced through a dictionary? – JA-pythonista Apr 05 '20 at 19:19
  • @PandasJ No, if you want to use the names in `DataFrameName` then you will have to set each variable manually. – Alex Apr 05 '20 at 19:20
  • @FrancisWebb it's super handy for these kind of things! – Alex Apr 05 '20 at 19:21
  • Okay great! Thanks – JA-pythonista Apr 05 '20 at 19:21
  • @PandasJ take a look at the answers here on dynamically creating variable names in Python; a dictionary seems to be the clearest and easiest way: https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-via-a-while-loop (especially given you know your variable names from the get-go!) – greenPlant Apr 05 '20 at 19:30
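If you do want plain variables like `company_df`, one way to still write the `pd.read_csv` call only once is to unpack a generator into the names. You still spell out each variable name by hand, which is what "set each variable manually" amounts to. A sketch assuming pandas is installed, with two hypothetical files written to a temporary directory so it runs standalone:

```python
import os
import tempfile

import pandas as pd

# Illustrative setup: two small made-up CSV files in a temporary directory.
base = tempfile.mkdtemp()
files = ["company.csv", "house.csv"]
for file in files:
    with open(os.path.join(base, file), "w") as fh:
        fh.write("id,value\n1,10\n")

# The left-hand names are listed by hand, in the same order as `files`;
# unpacking raises ValueError if the counts differ.
company_df, house_df = (pd.read_csv(os.path.join(base, f)) for f in files)

print(company_df.shape)  # (1, 2)
```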

A dictionary is the way to go, since you can name its entries dynamically.

import pandas as pd

names = ["company", "house", "taxfile", "reliablity", "creditloan", "medicalfunds"]
dataframes = {}
for name in names:
    # e.g. key "company_df" gets the data from "company.csv"
    dataframes[f"{name}_df"] = pd.read_csv(f"{name}.csv")

Because your files follow a consistent naming convention, we can simply append _df or .csv to each base name when needed.
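To make the naming convention concrete, this pure-Python sketch builds only the key-to-file mapping that the loop relies on, without reading anything:

```python
names = ["company", "house", "taxfile", "reliablity", "creditloan", "medicalfunds"]

# Derive both the dictionary key and the file name from the same base name.
mapping = {f"{name}_df": f"{name}.csv" for name in names}

print(mapping["company_df"])  # company.csv
print(len(mapping))           # 6
```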

marc_s
Guimoute

Using pathlib, we can create a generator expression over the CSV files and then build a dictionary whose keys come from the file names and whose values are the DataFrames.

With pathlib, the Path.glob() method returns every path in a folder that matches a pattern, so "*.csv" grabs all the CSV files in the target folder.

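Replace the path passed to Path() with the folder that holds your files. If you're using Windows, use a raw string or escape the backslashes; otherwise sequences such as \t and \f in the path are read as a tab and a form feed.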
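Why the raw string matters: in an ordinary Python string literal, `\t` and `\f` are escape sequences (tab and form feed), so a backslash-separated path such as `"\tmp\files"` is silently mangled. A small check:

```python
plain = "\tmp\files"  # \t becomes a tab, \f becomes a form feed
raw = r"\tmp\files"   # raw string: backslashes are kept literally

print(len(plain), len(raw))    # 8 10
print(plain.startswith("\t"))  # True
print(raw == "\\tmp\\files")   # True
```

Forward slashes also work as separators with pathlib on Windows, which sidesteps the issue entirely.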

import pandas as pd
from pathlib import Path

# Lazily yield every .csv file in the target folder
trg_files = (f for f in Path("/tmp/files").glob("*.csv"))

dataframe_dict = {f"{file.stem}_df": pd.read_csv(file) for file in trg_files}

print(dataframe_dict.keys())

dict_keys(['company_df', ...])

print(dataframe_dict['company_df'])
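A self-contained version of this approach, assuming pandas is installed: it writes two hypothetical CSV files into a temporary directory, then collects every `*.csv` there. `Path.glob` makes no ordering guarantee, so this sketch sorts the paths for a predictable key order; the separate generator expression is also optional, since the comprehension can iterate the glob directly.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative setup: a temp folder holding two small made-up CSV files.
folder = Path(tempfile.mkdtemp())
for name in ["house", "company"]:
    (folder / f"{name}.csv").write_text("id,value\n1,10\n2,20\n")

# file.stem is the file name without its suffix, e.g. "company" for company.csv.
dataframe_dict = {
    f"{file.stem}_df": pd.read_csv(file) for file in sorted(folder.glob("*.csv"))
}

print(list(dataframe_dict.keys()))  # ['company_df', 'house_df']
print(dataframe_dict["company_df"])
```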
Umar.H