16 February 2015

In the previous part, the program could parse text for emphasized and bold characters. So there’s some functionality, except that it doesn’t *do* anything useful like produce output. In this part, I’ll implement parsing Markdown-style links (this is really more of an implementation of Markdown than anything else at the moment), and then turn the parsed text into something useful!

To start off with, I removed the ability for bold emphasized text. This is largely because there’s no plan for how to deal with embedded styles, so I’d rather not start now. Link parsing is implemented in the `link` function, which is very similar to the other parsers, except that it produces two strings. It was surprisingly easy to put together, and it illustrates the parsing process a bit more clearly. Here is the resulting code:

```
import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Data.List

type Latex = String

emphasisSymbol :: Parser Char
emphasisSymbol = char '*'

boldSymbol :: Parser String
boldSymbol = string "**"

emphacizedChar :: Parser Char
emphacizedChar = noneOf "*"

boldChar :: Parser Char
boldChar = try (do char '*'
                   noneOf "*")
           <|> noneOf "*"
           <?> "Didn't find bold."

textChar :: Parser Char
textChar = noneOf "*("

linkChar :: Parser Char
linkChar = noneOf "]"

linkDescriptionChar :: Parser Char
linkDescriptionChar = noneOf ")"

emphasis = do emphasisSymbol
              content <- many1 emphacizedChar
              emphasisSymbol
              return content

bold = do boldSymbol
          content <- many1 boldChar
          boldSymbol
          return content

link = do char '('
          description <- many1 linkDescriptionChar
          string ")["
          link <- many1 linkChar
          char ']'
          return (link ++ " " ++ description)

bodyText = try link <|> try bold <|> try emphasis <|> many1 textChar

htexFile = many bodyText

readInput :: String -> [Latex]
readInput input = case parse htexFile "" input of
    Left err  -> ["No match " ++ show err]
    Right val -> val

main = do
    contents <- readFile "input.htex"
    putStrLn $ concat (readInput contents)
```

So not a whole lot changed. Without too much effort, it’s reasonably easy to follow the parser down from `bodyText`.

The next thing to do is make the parser return LaTeXified text. This turned out to be simpler than I expected, though it helps that all I am doing is returning strings. All that was required was to wrap the content in the return statements with LaTeX commands, and to alter the `main` function to simply concatenate the resulting parsed text:

```
emphasis = do emphasisSymbol
              content <- many1 emphacizedChar
              emphasisSymbol
              return ("\\emph{" ++ content ++ "}")

bold = do boldSymbol
          content <- many1 boldChar
          boldSymbol
          return ("\\textbf{" ++ content ++ "}")

link = do char '('
          description <- many1 linkDescriptionChar
          string ")["
          link <- many1 linkChar
          char ']'
          return ("\\href{" ++ link ++ "}{" ++ description ++ "}")

main = do
    contents <- readFile "E:\\Google Drive\\Code\\latexpreprocessor\\input.htex"
    putStrLn $ concat (readInput contents)
```

Running it with `input.htex` containing `This is not in italics. *But this is.* **This is bold.** This is not bold. (Description for a link)[link]. This is not in bold italics.` results in

```
This is not in italics. \emph{But this is.} \textbf{This is bold.} This is not bold. \href{link}{Description for a link}. This is not in bold italics.
```

Easy! It’s not quite right, since you can’t just run LaTeX on the output as is; that requires a bit more thinking to manage package handling and other preamble bits.
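As a sketch of what that could look like, something along these lines would wrap the parser output in a compilable document. The preamble contents here are my assumption; `hyperref` is the only package the generated `\href` commands actually need:

```haskell
-- A minimal sketch of wrapping the parser output in a full LaTeX
-- document. The preamble is an assumption: hyperref is needed for
-- the \href commands the link parser emits.
wrapDocument :: String -> String
wrapDocument body = unlines
    [ "\\documentclass{article}"
    , "\\usepackage{hyperref}"
    , "\\begin{document}"
    , body
    , "\\end{document}"
    ]
```

`main` could then print `wrapDocument` applied to the concatenated parser output instead of the bare text.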

At this point, I could continue and create a fully fledged pre-processor, but would this satisfy a true need? As I said in the previous part, LaTeX syntax is somewhat verbose for a lot of uses. What I can see my pre-processor doing is allowing users to write only content to create PDFs, with some customization via something like a YAML header (as in Jekyll), allowing for a bit of a “hands off” experience. Pandoc already allows for the creation of LaTeX files from Markdown, so what further use would a slightly different Markdown derivative be?

Templates are another possible extension, but this functionality is already available in Pandoc, and can be implemented fairly easily in Python using Jinja. One problem with templates is that, with the separation of content and design, the content must be written without regard for where it sits on the page. This reduces to having to edit a `.tex` file anyway.

This all applies to normal text, not math notation, since there’s no real alternative markup for math. Which is a real shame, since LaTeX is a bit *too* verbose for what you need it for (especially if you’re writing any calculus). To suit the needs of ordinary text, in my experience helping create my partner’s master’s thesis in LyX, all that’s needed is a good framework to handle bibliographies and references.

So with all of the above, I’m going to leave this project as is. On the plus side, what if you could write math in Haskell? Say, if you wanted to typeset

```
naturalNumbers = [1..]
[x | x <- naturalNumbers, x < 20]
```

as LaTeX? This doesn’t look too difficult, and could produce the output:

`\{x | x \in \mathbb{N}, x < 20 \}`
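A minimal sketch of that translation, assuming a hypothetical `SetExpr` type that captures just enough of the comprehension’s structure (a real version would have to inspect the Haskell source itself):

```haskell
-- Hypothetical intermediate form for a comprehension over a set:
-- the bound variable, the set it ranges over, and the predicate,
-- all as pre-rendered LaTeX fragments.
data SetExpr = Comprehension String String String

setToLatex :: SetExpr -> String
setToLatex (Comprehension var domain predicate) =
    "\\{" ++ var ++ " | " ++ var ++ " \\in " ++ domain ++ ", " ++ predicate ++ " \\}"

-- The comprehension above, with the infinite list read as the naturals.
naturalsExample :: String
naturalsExample = setToLatex (Comprehension "x" "\\mathbb{N}" "x < 20")
```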

Looks fairly trivial to do. How about some calculus? How would you represent

`\frac{d \theta}{d \gamma} = \lambda^{x + \epsilon} \frac{d^2 \theta}{d \gamma^2}`

in Haskell? This example is a bit complicated math-wise, since we don’t know what `\theta` is as a function, and `\lambda` may not be invertible, so rearranging is not so easy. In addition, both sides of the equation have functions applied to `\theta`. But luckily, we don’t care about the math, but the typesetting instead! Inside of a LaTeX (or even HTML) file, one could have:

```
gamma   = Var
lambda  = Var
epsilon = Var
x       = Var
theta   = Func gamma

leftEquation  = Derivative theta gamma
rightEquation = (lambda `pow` (x `plus` epsilon)) `multiply` (Derivative (Derivative theta gamma) gamma)

equationOne :: (Expression, Expression)
equationOne = (leftEquation, rightEquation)
```

Where the type definitions are:

```
class ExpressionOps a where
    multiply :: a -> a -> a
    plus     :: a -> a -> a
    pow      :: a -> a -> a
    frac     :: a -> a -> a

data Expression = Var
                | Func Expression
                | Func2 Expression Expression
                | Derivative Expression Expression

instance ExpressionOps Expression where
    multiply exp1 exp2 = Func2 exp1 exp2
    plus     exp1 exp2 = Func2 exp1 exp2
    pow      exp1 exp2 = Func2 exp1 exp2
    frac     exp1 exp2 = Func2 exp1 exp2
```

Then a parser could use type introspection (for variable names and operations) alongside evaluating `equationOne`, mapping names to LaTeX commands contained within a text file.
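As a rough sketch of that rendering step, here is a variant where variables carry their LaTeX command directly as a string, sidestepping the introspection entirely. The named constructors are my assumption, and only first-order derivatives render correctly:

```haskell
-- Hypothetical named variant of Expression: variables hold their own
-- LaTeX command instead of being recovered via introspection.
data NamedExpr = NVar String
               | NDeriv NamedExpr NamedExpr  -- d numerator / d denominator
               | NPow NamedExpr NamedExpr
               | NAdd NamedExpr NamedExpr
               | NMult NamedExpr NamedExpr

render :: NamedExpr -> String
render (NVar name)  = name
render (NDeriv f v) = "\\frac{d " ++ render f ++ "}{d " ++ render v ++ "}"
render (NPow b e)   = render b ++ "^{" ++ render e ++ "}"
render (NAdd a b)   = render a ++ " + " ++ render b
render (NMult a b)  = render a ++ " " ++ render b
```

Rendering `NDeriv (NVar "\\theta") (NVar "\\gamma")` gives `\frac{d \theta}{d \gamma}`; the `d^2` on the right-hand side would need the nesting depth of `NDeriv` counted rather than rendered naively.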

In this case, the Haskell version is a bit longer, and its readability compared to the LaTeX version is debatable. But one thing it does allow you to do is reuse expressions quickly (without those bloody backslashes everywhere!). For instance, if you wanted to rewrite the above as a system of differential equations, all you would have to do is write

```
diff1 = (y1, leftEquation)
diff2 = (y2, rightEquation)
```

And you’re set!

One other way that may work is to take working mathematical functions and rely heavily on type introspection to get the metadata. But in the case of the above differential equation, `theta` is unknown and may not have a closed form. So you would need some method of saying *theta is a function that explicitly depends on x, y, and z*. Using this method opens up support for incorporating other languages, but I’m not sure there’d be a reliable way of implementing it considering the diverse range of languages used in the computing world. I’ll stick to the former method and see where it takes me.