
Here is my current code. It's a naive implementation.

import System.Environment (getArgs)

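-- Scan the list two at a time: one comparison orders each pair, two more
-- update the running bounds, i.e. about 3 comparisons per 2 elements.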
minmax [] = Nothing
minmax [x] = Just (x, x)
minmax (a:b:xs) = Just $ minmax' xs $ sort a b
    where minmax' [] lohi = lohi
          minmax' [x] lohi@(lo, hi)
            | x < lo = (x, hi)
            | x > hi = (lo, x)
            | otherwise = lohi
          minmax' (x:y:xs) (lo, hi) = minmax' xs $ newlo `seq` newhi `seq` (newlo, newhi)
              where (lo', hi') = sort x y
                    newlo = if lo' < lo then lo' else lo
                    newhi = if hi' > hi then hi' else hi

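-- Order two elements with a single comparison.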
sort x y = if x < y then (x, y) else (y, x)

main = do
    args <- getArgs
    let [from, till] = map read args :: [Int]
    print $ minmax [from..till]

I want to see how far I can go with Haskell compared to C++ and Rust, using three approaches in all languages: naive, naive parallel, and matrix-based (repa for Haskell, Eigen for C++).

From this profiler report I see that Haskell allocates a lot of memory.

   minmax +RTS -p -RTS 1 10000000

total time  =        0.13 secs   (132 ticks @ 1000 us, 1 processor)
total alloc = 800,062,584 bytes  (excludes profiling overheads)

I've compiled with ghc -O2 -fllvm (GHC 8.2.1). From my point of view, the naive implementation shouldn't allocate much memory, because it should just iterate over the list in pairs. How do I achieve that, and is there anything else I can do to improve performance?
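One way to get there, along the lines chi suggests in the comments below, is to give the functions monomorphic type signatures and have the worker carry the bounds as two separate strict `Int` arguments instead of a tuple, so GHC can unbox them. A minimal sketch (the name `minmaxStrict` is illustrative, and it isn't verified against the profile above):

{-# LANGUAGE BangPatterns #-}

minmaxStrict :: [Int] -> Maybe (Int, Int)
minmaxStrict []       = Nothing
minmaxStrict [x]      = Just (x, x)
minmaxStrict (a:b:xs) = Just $ go (min a b) (max a b) xs
  where
    -- lo and hi are strict, plain Ints, so with -O2 they can stay in
    -- registers instead of being re-boxed into a pair on every step.
    go :: Int -> Int -> [Int] -> (Int, Int)
    go !lo !hi []  = (lo, hi)
    go !lo !hi [x] = (min lo x, max hi x)
    go !lo !hi (x:y:rest)
        | x < y     = go (min lo x) (max hi y) rest
        | otherwise = go (min lo y) (max hi x) rest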

I wanted to try the stream-fusion package, but it doesn't support base 4.10. Is there a workaround for this? I couldn't find anything about this problem.
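One possible workaround (an assumption, not something confirmed in this thread): the `vector` package performs the same kind of stream fusion internally, so building the range as an unboxed vector and folding over it avoids materializing the list at all. The name `minmaxV` is illustrative:

{-# LANGUAGE BangPatterns #-}

import qualified Data.Vector.Unboxed as U

-- U.enumFromTo and U.foldl' should fuse into a single loop,
-- so no intermediate vector is ever built.
minmaxV :: Int -> Int -> Maybe (Int, Int)
minmaxV from till
    | from > till = Nothing
    | otherwise   = Just $ U.foldl' step (from, from) (U.enumFromTo (from + 1) till)
  where
    step (!lo, !hi) x = (min lo x, max hi x)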

user1685095
  • Give a monomorphic type signature to `minmax`, and possibly `sort` as well. I can't tell whether it matters on `minmax'` (but it can't hurt, I guess). Passing `lohi` as two arguments could also avoid the boxing. – chi Sep 05 '17 at 21:47
  • If memory usage is a concern, you should begin by [heap profiling](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#profiling-memory-usage) your program (see the command sketch after this thread). Note that total allocation isn't necessarily a useful metric - you may be more interested in maximum residency. Laziness allows you to examine only two elements at a time, so this algorithm should run in constant space - this can mean high allocation, but very low residency. – user2407038 Sep 05 '17 at 21:56
  • @user2407038 Oh, thanks for explaining. Still, I'm interested in whether this can be written in a way where it would reuse memory and wouldn't allocate much. – user1685095 Sep 06 '17 at 07:34
  • @chi How would I pass lohi as two arguments if it's returned as a tuple by sort? Not sure how to do that in Haskell. – user1685095 Sep 06 '17 at 08:02
  • @chi I did what you suggested, and now my machine just hangs. Probably too much memory is used. Could you provide code which works better than this one? – user1685095 Sep 06 '17 at 08:26
  • I tried my suggestions but I got no real improvement -- GHC's optimizer might already be performing them. – chi Sep 06 '17 at 18:28
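For reference, the heap-profiling workflow user2407038 links above looks roughly like this (standard GHC tooling):

ghc -O2 -prof -fprof-auto minmax.hs
./minmax +RTS -h -RTS 1 10000000   -- writes minmax.hp
hp2ps -c minmax.hp                 -- renders residency over time to minmax.ps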

1 Answer


Finding the minimum and maximum can be done in linear time and constant space as follows:

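-- Fold one element into the running (min, max) pair.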
sort :: (Int, Int) -> Int -> (Int, Int)
sort (min, max) val
    | val < min = (val, max)
    | val > max = (min, val)
    | otherwise = (min, max)

minmax :: [Int] -> Maybe (Int, Int)
minmax []     = Nothing
minmax (x:xs) = Just $ foldl sort (x, x) xs

GHC's strictness analyzer should figure out that this code must be strict and optimize it accordingly.
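If you'd rather not rely on the analyzer, a variant of the same fold (a sketch, not part of the original answer) makes the strictness explicit with `foldl'` and bang patterns:

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

minmax :: [Int] -> Maybe (Int, Int)
minmax []     = Nothing
minmax (x:xs) = Just $ foldl' step (x, x) xs
  where
    -- The bangs force lo and hi at every step, so no thunk chain builds up.
    step (!lo, !hi) val = (min lo val, max hi val)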

Aadit M Shah
  • This requires `2n` comparisons. The algorithm I use requires `3n/2`. I appreciate the effort, but I want to optimize the best algorithm I know, not the trivial one. – user1685095 Sep 06 '17 at 07:28
  • Actually, this only requires `2n` comparisons in the worst case. In the best case, it only requires `n` comparisons and in the average case it requires `1.5n` comparisons. On the other hand, your algorithm always requires `1.5n` comparisons. I wouldn't worry about performance too much because whether you use the trivial algorithm or the algorithm that always performs `1.5n` comparisons, your asymptotic running time is always linear. The difference between `2n` comparisons, `1.5n` comparisons and `n` comparisons is negligible. – Aadit M Shah Sep 06 '17 at 16:04
  • You seem to be missing the point. The point of this question is to optimize this algorithm (and I stated it clearly in the title). You're right about the average case of your implementation, though. But you're wrong about the difference being negligible. In reality, it depends. If it didn't, everybody would still use `quicksort` instead of more complicated algorithms. – user1685095 Sep 07 '17 at 18:56