I have a very simple piece of code that parses a JSON file in which each line is a separate JSON object. For some reason, the time needed to process each row increases as the code runs.
Can someone explain why this is happening and how to stop it?
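For illustration, each line of the file is a standalone object, roughly like this (these field names are made up; my real file uses different columns):

{"id": 1, "title": "foo", "score": 0.5}
{"id": 2, "title": "bar", "score": 0.9}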
Here is the code snippet:
from ast import literal_eval as le
import time
from pandas import DataFrame

f = open('file.json')
df = DataFrame(columns=column_names)  # column_names is defined elsewhere
row_num = 0
t = time.time()  # start time; the values printed below are cumulative
for line in f:
    line = line.strip()
    d = le(line)  # each line parses to a dict
    # append the selected fields as a new row of the DataFrame
    df.loc[row_num] = [d[column_name1], d[column_name2]]
    row_num += 1
    if row_num % 5000 == 0:
        print row_num, 'done', time.time() - t
df.to_csv('MetaAnalysis', encoding='utf-8')
Part of the output is as follows:
5000 done 11.4549999237
10000 done 16.5380001068
15000 done 24.2339999676
20000 done 36.3680000305
25000 done 50.0610001087
30000 done 57.0130000114
35000 done 65.9800000191
40000 done 74.4649999142
As you can see, the time keeps increasing as more rows are processed.
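For comparison, here is an untested sketch of the same loop that collects the rows in a plain Python list and builds the DataFrame once at the end (the column_name placeholders are the same as above). Is growing the DataFrame row by row with df.loc the reason for the slowdown?

from ast import literal_eval as le
import time
from pandas import DataFrame

rows = []
t = time.time()
with open('file.json') as f:
    for row_num, line in enumerate(f, 1):
        d = le(line.strip())
        # accumulate plain lists instead of assigning into the DataFrame
        rows.append([d[column_name1], d[column_name2]])
        if row_num % 5000 == 0:
            print row_num, 'done', time.time() - t
# build the DataFrame once from all accumulated rows
df = DataFrame(rows, columns=column_names)
df.to_csv('MetaAnalysis', encoding='utf-8')

My understanding is that assigning to df.loc with a new label has to enlarge the frame each time, but I am not sure whether that alone accounts for the numbers above.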