Easy parsing with reasonable error messages in OCaml's Angstrom

Parser combinators are widely used in the world of functional programming, and OCaml's Angstrom is one such library. It is used to implement many foundational parsers in the OCaml ecosystem, eg HTTP parsers for the httpaf stack.
However, one of the bigger downsides of parser combinators is the lack of accurate parse error reporting. Let's take a look. Suppose you want to parse records of this format:
1 Bob
ie an ID number followed by one or more spaces, followed by an alphabetic word (a name). Here's a basic Angstrom parser for this:
open Angstrom
type person = { id : int; name : string; }
let sp = skip_many1 (char ' ')
let word = take_while1 (function 'A' .. 'Z' | 'a' .. 'z' -> true | _ -> false)
let num = take_while1 (function '0' .. '9' -> true | _ -> false)
let person =
  let+ id = num
  and+ _ = sp
  and+ name = word in
  { id = int_of_string id; name }
Let's try out various bad inputs and check the errors:
# parse_string ~consume:Consume.All person "";;
- : (person, string) result = Error ": count_while1"
# parse_string ~consume:Consume.All person "1";;
- : (person, string) result = Error ": not enough input"
# parse_string ~consume:Consume.All person "1 ";;
- : (person, string) result = Error ": count_while1"
# parse_string ~consume:Consume.All person "1 1";;
- : (person, string) result = Error ": count_while1"
The error messages are not great, unfortunately! It's hard to tell what went wrong. Of course, in this case we know what caused each error because we are feeding small inputs to the parser. But it's easy to imagine that for larger inputs it may be difficult to understand why a parse is failing.
Fortunately, parser combinator libraries usually provide a 'label' function to improve the error messages slightly. In Angstrom, a label is attached with the <?> operator, like this: parser <?> "label string".
Let's apply labels to our parser:
let person =
  (let+ id = (num <?> "expected a numeric ID eg [1] Bob")
   and+ _ = (sp <?> "expected one or more spaces eg 1[ ]Bob")
   and+ name = (word <?> "expected a name eg 1 [Bob]") in
   { id = int_of_string id; name }) <?> "expected a person eg [1 Bob]"
Now let's try out the same error scenarios as before:
# parse_string ~consume:Consume.All person "";;
- : (person, string) result =
Error
"expected a person eg [1 Bob] > expected a numeric ID eg [1] Bob: count_while1"
# parse_string ~consume:Consume.All person "1";;
- : (person, string) result =
Error
"expected a person eg [1 Bob] > expected one or more spaces eg 1[ ]Bob: not enough input"
# parse_string ~consume:Consume.All person "1 ";;
- : (person, string) result =
Error "expected a person eg [1 Bob] > expected a name eg 1 [Bob]: count_while1"
# parse_string ~consume:Consume.All person "1 1";;
- : (person, string) result =
Error "expected a person eg [1 Bob] > expected a name eg 1 [Bob]: count_while1"
The error messages are much more helpful, thanks to the labels! The labels are stacked from the outermost parser inwards, following the path of component parsers that are composed to build the higher-level person parser. And the examples in each label give a hint as to what went wrong.
Of course, this is not perfect, because it doesn't show the actual substring that failed to parse. But it seems like a nice improvement!
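One way to get a little closer to showing the offending input is to report the offset and the next unread character when a component parser fails. The following is only a sketch, not something Angstrom provides out of the box: expecting is a hypothetical helper written for this post, built from Angstrom's pos, peek_char, fail and <|>. It assumes the whole input is passed in at once via parse_string.

```ocaml
open Angstrom

type person = { id : int; name : string; }

(* Hypothetical helper, not part of Angstrom: run [p]; if it fails, fail
   again with a message naming the offset where [p] started and the next
   unread character (or the end of the input). *)
let expecting what p =
  let report =
    pos >>= fun offset ->
    peek_char >>= fun next ->
    let found =
      match next with
      | None -> "end of input"
      | Some c -> Printf.sprintf "%C" c
    in
    fail (Printf.sprintf "expected %s at offset %d, found %s" what offset found)
  in
  (* [<|>] resets the input to where [p] started before running [report] *)
  p <|> report

let sp = skip_many1 (char ' ')
let word = take_while1 (function 'A' .. 'Z' | 'a' .. 'z' -> true | _ -> false)
let num = take_while1 (function '0' .. '9' -> true | _ -> false)

let person =
  let+ id = expecting "a numeric ID" num
  and+ _ = expecting "one or more spaces" sp
  and+ name = expecting "a name" word in
  { id = int_of_string id; name }
```

With this version, parsing "1 1" should fail with a message along the lines of "expected a name at offset 2, found '1'". Two caveats: the offset is where the component parser started rather than the exact character it stopped on, and the helper is only wrapped around the leaf parsers here, since wrapping the whole person parser as well would replace the more specific inner message.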