2

I'm trying to understand what is the effect that filter has in the shrink tree of a generator when using hedgehog integrated shrinking.

Consider the following function:

{-# LANGUAGE OverloadedStrings #-}

import Hedgehog
import qualified Hedgehog.Gen as Gen

aFilteredchar:: Gen Char
aFilteredchar =
  Gen.filter (`elem` ("x" :: String)) (Gen.element "yx")

When a print the shrink tree:

>>>  Gen.printTree aFilteredchar

I'd get shrink trees that look as follow:

'x'
 └╼'x'
    └╼'x'
       └╼'x'
               ...

                   └╼<discard>

this is, a very deep tree containing only x's, and a discard at the end.

Why does the shrink function keeps on returning x's, instead of an empty list, which signals that there are no further shrinks possible?

Impredicative
  • 5,039
  • 1
  • 17
  • 43
Damian Nadales
  • 4,907
  • 1
  • 21
  • 34

2 Answers2

2

Gen is essentially a composition of a probability monad and a tree monad, and the behavior you observe mostly arises from the tree monad and the definition of Gen.filter.

Basically, Gen.filter p g is a simple monadic loop, try 0 where:

-- simplified body of filter
try k =
  if k > 100 then
    discard  -- empty tree
  else do
    x <- g
    if p x then
      pure x  -- singleton tree
    else
      try (k + 1)  -- keep looping

So to understand the tree you got, you must understand the tree monad under the do notation here.

The tree monad

The Tree type in hedgehog that is internally used by Gen looks roughly like this (if you are looking at the linked implementation in hedgehog, set m ~ Maybe):

data Tree a = Empty | Node a [Tree a]  -- node label and children

There are many other Tree-like types that are monads, and the monadic bind (>>=) generally takes the form of a tree substitution.

Say you have a tree t = Node x [t1, t2, ...] :: Tree a, and a continuation/substitution k :: a -> Tree b, which replaces every node/variable x :: a with the tree k x :: Tree b. We can describe t >>= k in two steps, fmap then join, as follows. First the fmap applies the substitution on every node label. So we obtain a tree where every node is labeled by another tree. For concreteness, say k x = Node y [u1, u2, ...]:

fmap k t
=
Node
  (k x)                        -- node label
  [fmap k t1, fmap k t2, ...]  -- node children
=
Node
  (Node y [u1, u2, ...])       -- node label
  [fmap k t1, fmap k t2, ...]  -- node children

Then the join step flattens the nested tree structure, concatenating the children from inside the label with those outside:

t >>= k
=
join (fmap k t)
=
Node
  y
  ([join (fmap k t1), join (fmap k t2), ...] ++ [u1, u2, ...])

To complete the Monad instance, note that we have pure x = Node x [].

The try loop

Now that we have some intuition for the tree monad we can turn to your particular generator. We want to evaluate try k above, where p = (== 'x') and g = elements "yx". I'm waving my hands here, but you should imagine that g evaluates randomly to either the tree Node 'y' [] (generate 'y' with no shrinkings), aka. pure 'y', or Node 'x' [Node 'y' []] (generate 'x' and shrink to 'y'; indeed, "elements shrinks to the left"), and that every occurence of g is independent from others, so we get a different result when we retry.

Let's examine each case separately. What happens if g = pure 'y'? Assume k <= 100 so we're in the else branch of the toplevel if, already simplified away below:

-- simplified body of filter
try k = do
    c <- pure 'y'     -- g = pure 'y'
    if c == 'x' then  -- p c = (c == 'x')
      pure c
    else
      try (k + 1)

-- since   (do c <- pure 'y' ; s c) = s 'y'  (monad law)   and   ('y' == 'x') = False

try k = try (k + 1)

So all the times where g evaluates to pure 'y' end up simplified away as the recursive term try (k + 1), and we are left with the cases where g evaluates to the other tree Node 'x' [Node 'y' []]:

try k = do
  c <- Node 'x' [Node 'y' []]  -- g
  if c == 'x' then
    pure c
  else
    try (k + 1)

As illustrated in the previous section, the monadic bind is equivalent to the following, and we finish with some equational reasoning.

try k = join (Node (s 'x') [Node (s 'y') []])
  where
    s c = if c == 'x' then pure c else try (k + 1)

try k = join (Node (pure 'x') [Node (try (k + 1)) []])
try k = join (Node (pure 'x') [pure (try (k + 1))]  -- simplifying join
try k = Node 'x' [join (pure (try (k + 1)))]        -- join . pure = id
try k = Node 'x' [try (k + 1)]

In summary, starting from try 0, with half probability try k = try (k + 1), and with the other half try k = Node 'x' [try (k + 1)], finally we stop at try 100. This explains the tree you observe.

try 0 = Node 'x' [Node 'x' [ ... ]]  -- about 50 nodes

(I believe this also provides at least a partial answer to your other question, since this shows how shrinking a Gen.filter often amounts to rerunning the generator from scratch.)

Li-yao Xia
  • 31,896
  • 2
  • 33
  • 56
  • I really appreciate the time you took to write such a detailed (and helpful answer)! This seems to clarify why if I would change `Gen.element "yx"` to `Gen.element "xy"` I will most of the times get a singleton tree (however I'm still puzzled as to why I don't *always* get a singleton tree, but that's probably a different question). – Damian Nadales Jan 29 '19 at 12:28
1

While Li-yao Xia's detailed answer correctly describes how this happens, it doesn't address the why; why does it re-run the generator after each shrink? The answer is that it shouldn't; this is a bug. See bug report Improve Filter on GitHub.

edsko
  • 1,628
  • 12
  • 19
  • 2
    A fix has been merged https://github.com/hedgehogqa/haskell-hedgehog/pull/282, it will be in the next Hackage release – Jacob Stanley Apr 30 '19 at 00:17