Haskell Project: Intro to Parsers

Continuing with the todo.txt haskell project, this post will introduce parsers.

Previous project post: [Unit Testing with Hspec][1].

Prep Work

This post is going to be a little more "theory" and explanation but I know I always like trying things out as I read so lets get the project updated to allow you to follow along.

The following changes will be needed to import Parsec into both the library and the unit test code:

Next time we call `stack build`, `stack test`, or `stack ghci`, the Parsec libraries will be downloaded, configured and imported into our project.

Here is the final list of imports when the code is complete. We will go through them all as we introduce them. But to ease the walk through process we might as well get rid of any of the compile errors that come from missing imports.

If you want links to the documentation you can check out the [References](#references).

Basic Layout of a Parser

There is a common data type pattern you'll see in much of the Parsec library You can just view it as a "Parser Response", `ParsecT s u m a`. Some of the unnamed fields will be replaced with specifics, but you'll definitely see something close to this throughout the documentation.

The code we will be writing takes a stream of data and tokenizes it into haskell data structures, atoms. We will create a set of stream to data type parsers and combine them until eventually we can create a `Task`.

Deeper Explanation: ParsecT

Looking over the [Parsec Documentation][2] you'll see the common structure `ParsecT s u m a` which breaks down pretty straight forward.

- `s` is the stream type

- `u` is the user state type

- `m` is the underlying monad

- `a` is the return type.

Given a stream (`s`) and executing in the current state (`u`), the parser performs a set of operations to extract data that creates type `a`. All of this occurs within a monad (`m`).

Understanding this isn't really required to create a parser. Just knowing that `ParsecT s u m a` is the common data type in all parsers should be good enough as you will start to notice it in some form or another (typically replacing a letter for an actual type.

Often instead of using the transformer (denoted by the 'T') you'll see the following type used:

Using the `Identity` monad ([docs][3]) we get back exactly what type we pass in. And by using currying we can add the `a` return type to the end.

The type we will most commonly be using is ([doc][4]):

Our stream type is `String` and we aren't using any User State. So a type of `Parser Int` means `String` in and `Int` out. We can make all sorts of parsers with this type signature:

A simple example

Now that we know how the construct a parser by creating smaller atom parsers, and we know how the `Parser` data type works, let us create our first atom and walk through the parts.

We will start with a very simple example: Priority. Looking at [Rule 1][5] of the todo.txt documentation,

The priority is an uppercase character from A-Z enclosed in parentheses and
followed by a space.

Here is the code:

We will do the TDD work in a later post. But for now lets de-construct the parser for Priorities.

The type signature for the parser function is pretty straight forward. We are creating a parser (`Parser a`) with a return type of `Tasks.Priority`.

`Parser a` is a monad, so to make the code easy to read we use the "do notation". You could use all of the monadic operators and be very functional but I think that when it comes to code like a parser, readability is important. Using "do notation" you can write your parser in an imperative (procedural) way which many programmers already understand.

This line reads in a specific character ([doc][6]). If it is found the value is returned (and we ignore it by not assigning it a name). If the next character in the stream doesn't match, the parser fails.

Similar to the previous line, this one reads a specific character. [letter][7] parse any "upper case or lower case character" and returns the parsed character. We assign that value to `p`

Again, parsing and ignoring a parenthesis.

Lastly we do a little house keeping, forcing the character to be upper case ([doc][8]), and then returning it. Since our function is of type `Parser Tasks.Priority` and `Tasks.Priority` is just a type of `Char`, everything matches up with the type.

Up Next

In the [next post][13] we will create some unit tests and figure out how chain a bunch of parsers together to create larger data structures.

<a name="references"></a> References

- [Control.Monad.Identity][3]

- [Data.Char][9]

- [toUpper][8]

- [Text.Parsec][10]

- [ParsecT][2]

- [Text.Parsec.Char][11]

- [char][6]

- [letter][7]

- [Text.Parsec.String][12]

- [Parser][4]

Haskell Project: Unit Testing with Hspec [1]

Text.Parsec [2]

Control.Monad.Identity [3]

Text.Parsec.String:Parser [4]

Rule 1 - If priority existss it always appears first [5]

Text.Parsec.Char:char [6]

Text.Parsec.Char:letter [7]

Data.Char:toUpper [8]

Data.Char [9]

Text.Parsec:ParsecT [10]

Text.Parsec.Char [11]

Text.Parsec.String [12]

Haskell Project: More Parsers [13]

$ date: 2017-05-21 12:19 $

$ tags: haskell, tutorial, parsec $

-- CC-BY-4.0 jecxjo 2017-05-21

Comments?

back

Proxied content from gemini://gemini.sh0.xyz/log/2017-05-21-haskell-project-intro-to-parsers.gmi

Gemini request details:

Original URL
gemini://gemini.sh0.xyz/log/2017-05-21-haskell-project-intro-to-parsers.gmi
Status code
Success
Meta
text/gemini
Proxied by
kineto

Be advised that no attempt was made to verify the remote SSL certificate.