Today we will explore how to build a small parser combinator library in Haskell from scratch. This blog post is the result of an experiment to see if I could actually implement this by only looking at the base and text documentation, explicitly without looking at other parser implementations or examples.
I think most other Haskell parser examples will work on Strings, but since String come with a lot of downsides I will try to run our parser on Text from the text package and see where that gets me. Thus, our parser will take a Text and return the leftover unparsed Text with a parsed result a or a parse error:
To go on, we should implement useful type classes such as Functor, Applicative, Alternative and Monad. This will give us lots of combinators for free.
Now let’s define a Functor instance for our Parser. The complete minimal definition is fmap :: (a -> b) -> f a -> f b (apply a function a -> b to the “contained” value of f):
Next up is the Applicative instance which requires the definitions of pure :: a -> f a to lift a value a into the f “universe” (in our case Parser), and (<*>) :: f (a -> b) -> f a -> f b to sequentially apply two parsers and combine the result.
The <*> implementation is a little bit more interesting, so here is what it does in words: first, we run the left hand parser to receive the function (a -> b) which we must then apply to the value of the second parser. Next, we either propagate failure, or we use our previously defined Functor instance to convert the second parser Parser a to a Parser b and run that on the left over of the left hand parser.
After defining Applicative we will also implement it’s close friend and very handy Alternative:
empty is just a failing Parser that does nothing - this is because we can not invent an arbitrary a. <|> will first try to run the left hand side parser, and if that succeeds then it will return the result. Otherwise, the right hand parser is run.
Now it’s time for becoming a Monad!
With all those abstract concepts implemented we are ready to write concrete parsers. Let’s start out by writing a parser that reads input until the predicate on each subsequent character fails:
This is very simple because the text library already provides a function span :: (Char -> Bool) -> Text -> (Text, Text) that essentially does the heavy lifting efficiently for us. We also want a satisfy1 function, that requires that we read at least one character:
This combination gives us skileWhile and skipWhile1 for free:
Now we’ll write a parser for a specific single character Char and a whole string T.Text.
To implement parsers for Int and Double we will cheat a little and use the read :: Read a => String -> a function from base. Usually I’d go for readMay :: Read a => String -> Maybe a from the safe package, but thanks to our already defined parser combinators we can be quite sure that our read will not crash at runtime:
Now is probably a good time to define some unit tests for our parsers. We use the excellent HTF package for this:
Great, our basic building blocks seem to be working! As you can see the error messages our parsers produce are not quite useful (yet?), but this might be material for a possible blog post in the near future.
Now let’s write a parser for this simple data file:
We would like to parse it into the following Haskell data structures:
We start off by writing parsers for the building blocks, with tests:
and combining them to parse a single row:
To write a parser for the whole file, we need to introduce two new parser combinators. sepBy will be used to parse values separated by a separator:
and endOfInput is a combinator to check if we consumed all input:
Putting it all together, our file parser (with test of course!) will look like this:
Success! Now this is just the beginning of a parser combinator library, there are still many areas to be explored such as nicer error messages, backtracking, performance concerns and of course more combinators! You should probably use one of the awesome parser combinator libraries out there that address these issues:
attoparsec: Addresses backtracking and performance concerns
parsec: Addresses error messages and backtracking (try operator)