
Consider a file of the following format containing n records and i columns. If the data needs to be grouped by the first column, what is an efficient way to process it in Java? Let's say n = 40 million.

The standard way would be to use BufferedReader to loop through each line and organize the data in a map with Column A as the key. Is there a more efficient, optimal approach?

A1~1~2
A1~2~5
A2~1~3
A1~3~4
....
....

The above file needs to be organized into a Map keyed by Column A, with a POJO holding the remaining columns, as below.

A1 [(1,2),(2,5),(3,4)]
A2 [(1,3)]
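For reference, a minimal sketch of the "standard way" described above: a single pass with `BufferedReader`, accumulating into a `HashMap` via `computeIfAbsent`. The `~` delimiter matches the sample data; the `Pair` record is a hypothetical stand-in for the POJO, which the question does not define.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByFirstColumn {
    // Hypothetical stand-in for the POJO holding columns 2 and 3 (Java 16+ record).
    record Pair(int first, int second) { }

    public static Map<String, List<Pair>> group(Path file) throws IOException {
        Map<String, List<Pair>> result = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on the ~ delimiter used in the sample data.
                String[] cols = line.split("~");
                result.computeIfAbsent(cols[0], k -> new ArrayList<>())
                      .add(new Pair(Integer.parseInt(cols[1]),
                                    Integer.parseInt(cols[2])));
            }
        }
        return result;
    }
}
```

For 40 million records the main costs here are I/O and per-key `ArrayList` growth; pre-sizing the `HashMap` (if the number of distinct keys is roughly known) can reduce rehashing.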
ram
  • Are you asking if there is a way to read in this data without reading every line? What part of what you are trying to do do you think is inefficient? You are not showing your code to read the file, so it's hard to tell what you are worried about. – thatidiotguy Oct 15 '15 at 19:13
  • If the final result is a `Map` containing all the records, a regular loop is just as efficient as using streams. – VGR Oct 15 '15 at 19:37
  • @thatidiotguy I'm asking if there is a way to parallelize the operation and speed it up. The code would be the same as reading a file with BufferedReader the traditional way. I'll update the question shortly with a code example. – ram Oct 15 '15 at 21:21
  • You can't really read a single file with multiple threads ([see this question](http://stackoverflow.com/q/8809894/1743880)) (well you could but the bottleneck is the IO so it won't be any faster). – Tunaki Oct 16 '15 at 07:08
  • Either load the whole file in memory and work from there or use a database... – assylias Oct 16 '15 at 09:24
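To illustrate the parallelization idea raised in the comments: a hedged sketch using `Files.lines(...)` with a parallel stream and `Collectors.groupingByConcurrent`. As Tunaki notes, disk I/O is usually the bottleneck, so this only helps when parsing dominates; note also that with a parallel stream the order of entries within each key's list is not guaranteed. The `Pair` record is again a hypothetical stand-in for the POJO.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelGroup {
    // Hypothetical stand-in for the POJO (Java 16+ record).
    record Pair(int first, int second) { }

    public static ConcurrentMap<String, List<Pair>> group(Path file) throws IOException {
        // try-with-resources closes the underlying file handle.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.parallel()
                        .map(l -> l.split("~"))
                        .collect(Collectors.groupingByConcurrent(
                            cols -> cols[0],
                            Collectors.mapping(
                                cols -> new Pair(Integer.parseInt(cols[1]),
                                                 Integer.parseInt(cols[2])),
                                Collectors.toList())));
        }
    }
}
```

Reading is still sequential; only the splitting, parsing, and map insertion run on multiple threads, so measure before assuming a speedup.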

0 Answers