1

I'd like to build active patterns with type Token list -> Ast * Token list.

I have a simple Pascal grammar <program> ::= "program" <id> ";" <block> ".". By remodelling code to this grammar, I have to make Program, Identifier and Block patterns like:

let rec (|Program|_|) tokens =
    match tokens with
    | Token.Program :: tokens ->
        match tokens with
        | Identifier (identifier, tokens) ->
            match tokens with
            | Semicolon :: tokens ->
                match tokens with
                | Block (block, tokens) ->
                    match tokens with
                    | Dot :: tokens -> Some (Ast.Program (identifier, block), tokens)
                    | _ -> failwithf "Expected %A" Dot
                | _ -> failwith "Expected Block"
            | _ -> failwithf "Expected %A" Semicolon
        | _ -> failwith "Expected Identifier"
    | _ -> failwithf "Expected %A" Token.Program

and (|Identifier|_|) tokens =
    match tokens with
    | Token.Identifier identifier :: tokens ->
        Some (Ast.Identifier identifier, tokens)
    | _ -> failwith "Expected Identifier"
...

Because there are many duplicated Option None, I tried to shorten the pattern to this:

type rec (|Program|_|) tokens =
    match tokens with
    | Token.Program :: Identifier (ident, Semicolon :: Block (block, Dot :: rest)) ->
        Some (Ast.Program (identifier, block), rest)
    | _ -> None

Could the new pattern work as expected? And how can I make the active patterns shorter, but somehow also return errors on bad input?

Also, I read that FParsec use monad to parse and retain the errors, how could I make a similar simple computing expression like FParsec.

MiP
  • 5,846
  • 3
  • 26
  • 41
  • For writing computation expressions, https://fsharpforfunandprofit.com/series/computation-expressions.html is your best guide. – rmunn May 15 '17 at 09:27
  • Also, your AST design looks a bit strange to me. Why do the `Identifier` and `Block` union cases absorb the following tokens "inside" them, but the `Program` token does not? I'd expect to see something more like `match tokens with Identifier ident :: tokens`, i.e. the `Identifier` case only "absorbs" the name of the identifier. It doesn't make sense for all the following tokens to be "inside" the `Identifier` case. I don't know if I'm communicating clearly what I'm trying to say; do you understand what I mean, or should I try to rephrase it? – rmunn May 15 '17 at 09:35
  • @rmunn All active patterns absorb tokens and return the remaining ones. Because I read from http://stackoverflow.com/a/4418823/7821462 they should have type `Token list -> Ast * Token list`. – MiP May 15 '17 at 09:41
  • Your `Identifier` token breaks that rule, though. Whatever function you have that's creating `Identifier` should have a signature like `Token list -> Identifier * Token list`, but instead its signature is `Token list -> Identifier`. The rest of the tokens have been absorbed inside the `Identifier` token's second half, which is not how it's supposed to go. – rmunn May 15 '17 at 09:48
  • @rmunn Perhaps you should check the signature again, I added the `Identifier` pattern for clear visual, it returns `Ast.Identifier * Token list`. – MiP May 15 '17 at 13:40
  • Okay, now I see what you're doing. You have an `Identifier` active pattern, and a `Token.Identifier` DU case, and I was confusing the former with the latter. – rmunn May 15 '17 at 13:42
  • I feel like the active-pattern approach you're going for is full of unnecessary complexity. Personally, I'd recommend working through the [FParsec tutorial](http://www.quanttec.com/fparsec/tutorial.html), then trying to build small parts of your Pascal parser in FParsec instead of via active patterns. And start *small* and build up from there, like writing the definition to parse a string, or the parameter list of a function (a series of identifiers, separated by commas and surrounded by parentheses). I feel like you might get further with FParsec than with active patterns. – rmunn May 15 '17 at 13:58

1 Answers1

1

Easy way is just to flatten everything:

let rec private (|Program|_|) tokens =
    match tokens with
    | Token.Program ::Identifier (identifier, Semicolon :: Block (block,Dot :: tokens) ) ->Some (Ast.Program (identifier, block), tokens)
    | Token.Program ::Identifier (identifier, Semicolon :: Block (block,_) ) ->failwith "Expected %A" Dot 
    | Token.Program ::Identifier (identifier, Semicolon ::_ ) ->failwith "Expected %A" Block
....
John Palmer
  • 25,356
  • 3
  • 48
  • 67
  • Do you know how to sprintf the Identifier or Block ones? e.g. when I have `type Token = | Identifier of string`, I'd like to announce "Expected Identifier", and `failwithf "Expected %A" (Identifier "")` won't work as intended. – MiP May 15 '17 at 09:45
  • Can't think of anything off hand – John Palmer May 15 '17 at 09:48