I have some huge CSV files (hundreds of megabytes each). According to this post, Why reading rows is faster than reading columns?, storing and reading a CSV by rows is more cache efficient and should be roughly 30 times faster than reading by columns. However, when I tried this, the file stored by rows was actually slower, not faster:
import csv
import time

def get_ms():
    # millisecond timestamp helper
    return int(time.time() * 1000)

t = get_ms()
i = None
cols = csv.reader(open(col_csv, "r"))
for c in cols:
    for e in c:
        i = e
s = get_ms()
print("open cols file takes : " + str(s - t))

t = get_ms()
rows = csv.reader(open(row_csv, "r"))
i = None
for r in rows:
    for e in r:
        i = e
s = get_ms()
print("open rows file takes : " + str(s - t))
output:
open cols file takes : 13698
open rows file takes : 14971
Is this behaviour specific to Python? I know that in C++ wide tables are usually faster than long tables, but I'm not sure whether the same holds in Python.
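For reference, the two files are meant to hold the same values, one written row by row and one transposed. A minimal sketch of how such a pair could be generated (the file names, dimensions, and random values below are placeholders, not my actual data):

import csv
import random

n_rows, n_cols = 10_000, 1_000     # placeholder dimensions

# placeholder table of random values
table = [[random.random() for _ in range(n_cols)] for _ in range(n_rows)]

# row file: one CSV line per logical row (a "long" file)
with open("row_test.csv", "w", newline="") as f:
    csv.writer(f).writerows(table)

# column file: the same data transposed, one CSV line per logical column (a "wide" file)
with open("col_test.csv", "w", newline="") as f:
    csv.writer(f).writerows(zip(*table))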