Pandas read_csv giving a larger dataframe than the one I wrote it with using pd.to_csv()

Asked Apr 01 '20 at 19:18

Active Apr 02 '20 at 10:36

Viewed 133 times

In one file I write a dataframe to a CSV using pd.to_csv(), and in another file I read that CSV back into a dataframe using pd.read_csv(), but the two dataframes are different sizes. The number of rows in the dataframe is larger when I read the CSV back in the other file. Any thoughts on what this could be?

Here is the snippet of code in question. I cleared the CSV before running again, and the shape of the df before writing to the CSV was (32636, 28). After reading it back, the shape is (68450, 29).

I am only running this on 2 files currently.

[image: writing csv code]

Edit: The JSON is Twitter API data, if that helps.
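The column count going from 28 to 29 is consistent with the dataframe index being written as an extra column, which is the default behaviour of `to_csv`. The code in the post is not visible, so this is only a minimal sketch under that assumption; the small `df` below is a hypothetical stand-in for the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data (not taken from the post).
df = pd.DataFrame({"id": [1, 2, 3], "text": ["a", "b", "c"]})

# By default, to_csv writes the index as an extra, unnamed first column.
buf_with_index = io.StringIO()
df.to_csv(buf_with_index)
buf_with_index.seek(0)
with_index = pd.read_csv(buf_with_index)

# With index=False, the column count survives the round trip.
buf_without_index = io.StringIO()
df.to_csv(buf_without_index, index=False)
buf_without_index.seek(0)
without_index = pd.read_csv(buf_without_index)

print(df.shape, with_index.shape, without_index.shape)
print(list(with_index.columns))
```

If the extra column in the real data is named `Unnamed: 0`, passing `index=False` when writing (or `index_col=0` when reading) may be enough to remove it. This does not by itself explain the extra rows.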

edited Apr 02 '20 at 10:36

Burhan Ali


asked Apr 01 '20 at 19:18

Payton Weatherspoon

You are probably re-writing to the file. First delete the file, then run your code. – Aven Desta Apr 01 '20 at 19:20

What is the size difference? Is it one row, or is it double the number of rows? Is it duplicating the header row, or duplicating all the rows? Some other behavior? Please provide a more detailed description of the problem; a [mcve] would help. – G. Anderson Apr 01 '20 at 19:23
You're appending to the file. – AMC Apr 01 '20 at 20:16
[You should not post code as an image because...](https://meta.stackoverflow.com/a/285557/1422451). – Parfait Apr 01 '20 at 22:34
Also [never call `DataFrame.append` or `pd.concat` inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451) – Parfait Apr 01 '20 at 22:34
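The comments above suggest two things: that the file is being appended to rather than overwritten, and that frames should not be concatenated inside the loop. Since the original code is not shown, the following is only a sketch of both points under those assumptions; the `frames` list and the `tweets.csv` file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-file frames standing in for the two parsed JSON files.
frames = [
    pd.DataFrame({"id": [1, 2], "text": ["a", "b"]}),
    pd.DataFrame({"id": [3, 4], "text": ["c", "d"]}),
]

# Collect the frames in a list and concatenate once, outside any loop.
combined = pd.concat(frames, ignore_index=True)

path = os.path.join(tempfile.mkdtemp(), "tweets.csv")

# mode="w" (the default) overwrites, so repeated runs do not grow the file.
combined.to_csv(path, index=False)
combined.to_csv(path, index=False)
overwritten = pd.read_csv(path)

# mode="a" appends: the rows accumulate, and the repeated header line
# is read back as an ordinary data row.
combined.to_csv(path, mode="a", index=False)
appended = pd.read_csv(path)

print(combined.shape, overwritten.shape, appended.shape)
```

Comparing `df.shape` just before the write with the shape read back from a freshly created file is a quick way to check whether appending is the cause in the real code.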


0 Answers