I have a non-recursive function to calculate longest common subsequence that seems to perform well (ghc 7.6.1
, compiled with -O2 -fllvm
flags) if I measure it with Criterion
in the same module. On the other hand, if I convert the function into a module, export just that function (as recommended here), and then measure again with Criterion, I get ~2x slowdown (which goes away if I move the criterion test back to the module where function is defined). I tried marking the function with INLINE
pragma which didn't make any difference in cross-module performance measurements.
It seems to me that GHC might be doing a strictness analysis that works well when the function and the main (from which the function is reachable) are in the same module, but not when they are split across. I would appreciate pointers on how to modularize the function so that it performs well when called from other modules. The code in question is too big to paste here - you can see it here if you want to try it out. A small example of what I am trying to do is below (with snippets of code):
-- Function to find longest common subsequence given unboxed vectors a and b
-- It returns indices of LCS in a and b
lcs :: (U.Unbox a, Eq a) => Vector a -> Vector a -> (Vector Int,Vector Int)
lcs a b | (U.length a > U.length b) = lcsh b a True
| otherwise = lcsh a b False
-- This section below measures performance of lcs function - if I move it to
-- a different module, performance degrades ~2x - mean goes from ~1.25us to ~2.4us
-- on my test machine
{--
config :: Config
config = defaultConfig { cfgSamples = ljust 100 }
a = U.fromList ['a'..'j'] :: Vector Char
b = U.fromList ['a'..'k'] :: Vector Char
suite :: [Benchmark]
suite = [
bench "lcs 10" $ whnf (lcs a) b
]
main :: IO()
main = defaultMainWith config (return ()) suite
--}