8

I want to get the top n records using a Unix command:

e.g. input:

  • 1 a
  • 2 b
  • 3 c
  • 4 d
  • 5 e

output (get top 3):

  • 5 e
  • 4 d
  • 3 c

Currently I am doing:

cat myfile.txt | sort -k1nr | head -3 > my_output.txt

It works fine but when the file gets large, it becomes very slow.

It is slow because it sorts the file completely, while what I need is just the top 3 records.

Is there any command I can use to get the top 3 records?

user3110379
  • See the following for a good answer: http://stackoverflow.com/questions/7074430/how-do-we-sort-faster-using-unix-sort Unix `sort` is by far not the fastest way to sort large files. If your input is that big, you need to look at a different approach. That linked SO post should help. – David Atchley Jun 17 '14 at 04:57
  • 2
    Mostly agree about the general case. However, given a fixed N, you could do this in a dedicated program in a single pass over the input, keeping the top N seen. A priority queue might be handy. With a cheap look at the lowest entry and a count of entries, for each record: if record value > lowest, insert; if count >= limit, delete lowest. – dbrower Jul 11 '14 at 22:11
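
The single-pass idea in the comment above could be sketched roughly as follows. This is only an illustration in Python, not something posted in the thread: the function name `top_n` is hypothetical, and it assumes every non-blank line starts with a numeric key followed by whitespace, as in the sample input.

```python
import heapq


def top_n(lines, n=3):
    """Return the n lines with the largest leading numeric key, largest first.

    Assumption: the first whitespace-separated field of each non-blank
    line parses as a number (float() raises ValueError otherwise).
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (key, sequence number, line); heap[0] is the lowest kept
    for seq, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entry = (float(fields[0]), seq, line)
        if len(heap) < n:
            # Still filling up: keep everything until n entries are held.
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Beats the lowest kept entry: replace it in one heap operation.
            heapq.heapreplace(heap, entry)
    # Largest key first; ties keep their input order.
    return [line for _, _, line in sorted(heap, key=lambda e: (-e[0], e[1]))]


# Possible use (file names taken from the question):
# with open("myfile.txt") as src, open("my_output.txt", "w") as dst:
#     dst.writelines(top_n(src, 3))
```

Only n entries are held in memory at any time, and each input line costs at most one heap operation, so the file is read once and never fully sorted.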

2 Answers

2
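perl -ane '
    # @top holds the 3 best [key, line] pairs, smallest key first
    # (the -1 placeholders assume keys are non-negative)
    BEGIN {@top = ([-1]) x 3}
    # a record beats the current minimum: drop the minimum and re-sort
    if ($F[0] > $top[0][0]) {
        @top = sort {$a->[0] <=> $b->[0]} @top[1,2], [$F[0], $_];
    }
    # print the kept lines, largest key first
    END {print for reverse map {$_->[1]} @top}
' << END_DATA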
1 a
2 b
3 c
4 d
5 e
END_DATA
5 e
4 d
3 c
glenn jackman
-2

Have you tried changing the order of your command?

Like this.

sort -k1nr myfile.txt | head -3 > my_output.txt

MPH426
  • 1
    This won't make any difference in the work `sort` has to do, which is the real problem. – chepner Aug 11 '14 at 20:08
  • 2
    It will speed up execution on larger files. Anytime you can eliminate a process, you gain speed. Letting `sort` read the file directly, instead of having `cat` read it first, works. Shaves a full minute off sorting a 2GB file. Granted it took 9 minutes running through cat first, on a 10 year old computer. ;) – MPH426 Aug 12 '14 at 15:24