
I have huge files (more than 300 MB) to process in C#. I need an efficient way to read a file line by line, because reading it currently takes more than 30 minutes while the target time is around 3 minutes. I tried File.ReadAllBytes, which reads the file successfully and very quickly and loads it into a string. But after that, processing the string line by line takes a very long time. Is there a better or faster way to do this?

Thanks in advance.

Kamil Budziewski
Abo Ahmed
  • It sounds like your problem isn't the loading, but the processing. You didn't post the relevant code. – CodesInChaos Nov 04 '15 at 07:03
  • What kind of processing are we talking about? – Kemal Kefeli Nov 04 '15 at 07:04
  • File.ReadLines (suggested by wudzik) is definitely a Good Idea. But "300MB" really isn't that big. Q: How many lines are in this 300MB file? More importantly: Q: exactly what are you doing in the loop for each line? Q: Is there any way to optimize the "per line" processing? – paulsm4 Nov 04 '15 at 07:05
  • Based on the current information, your question is a duplicate of the many "read huge file line by line" questions. Note that it is unlikely that your actual problem is directly related to reading from the file, as reading 300MB from a local disk should take seconds, not minutes. Consider providing some information on what you are actually doing with the strings (possibly in a separate question, after you've profiled your code to gather the necessary information). – Alexei Levenkov Nov 04 '15 at 07:13

1 Answer

You can use File.ReadLines, which lazily enumerates the lines of the file:

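// Requires: using System.IO;
// File.ReadLines returns a lazy IEnumerable<string>; nothing is read until enumeration starts.
var lines = File.ReadLines(path);

foreach (var line in lines)
{
    // do your logic here
}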

It does not load the whole file up front. Lines are read as the loop advances, so it is a better way to read large files than loading everything at once.
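As a minimal sketch of that streaming behaviour, the example below counts non-empty lines while the loop holds only the current line, so memory use should stay roughly flat regardless of file size. The class name `LineStats`, the method `CountNonEmptyLines`, and the temp-file sample data are hypothetical and exist only for illustration; the real per-line logic would go where the counter is incremented.

```csharp
using System;
using System.IO;

public static class LineStats
{
    // Streams the file and counts non-empty lines.
    // Only the current line is held by the loop; the file is never loaded whole.
    public static long CountNonEmptyLines(string path)
    {
        long count = 0;
        foreach (var line in File.ReadLines(path))
        {
            if (line.Length > 0)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical sample data written to a temp file for the demo.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alpha", "", "beta", "gamma" });
        try
        {
            Console.WriteLine(CountNonEmptyLines(path)); // prints 3
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If a loop like this is still slow on a 300 MB file, the time is most likely going into the per-line work rather than into the reading itself.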

MSDN says in the description of File.ReadLines:

Remarks The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
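One practical consequence of that difference is early exit: because File.ReadLines is lazy, a query that stops at the first match should not need to read the rest of the file, whereas File.ReadAllLines would build the full array first. The sketch below illustrates this under assumed names; `FirstMatch`, `FindFirst`, and the "ERROR" sample lines are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FirstMatch
{
    // Returns the first line containing `needle`, or null if no line matches.
    // Enumeration stops at the first match, and FirstOrDefault disposes the
    // enumerator, which closes the underlying file.
    public static string FindFirst(string path, string needle)
    {
        return File.ReadLines(path)
                   .FirstOrDefault(line => line.Contains(needle));
    }

    public static void Main()
    {
        // Hypothetical sample data written to a temp file for the demo.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "one", "two ERROR", "three ERROR" });
        try
        {
            Console.WriteLine(FindFirst(path, "ERROR")); // prints "two ERROR"
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```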

Kamil Budziewski
  • Additional info from MSDN: > When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient. – Nikhil Vartak Nov 04 '15 at 07:04
  • @dotnetkid I was just adding a quote from MSDN :) – Kamil Budziewski Nov 04 '15 at 07:05