# Converting Markdown to PDF Inside a Binary

Suyoung Hwang · 2026-05-15

At some point every service ends up needing to generate PDFs on the fly. People just want things as documents. That's apparently the protocol now.

We had the same requirement, but with a twist: not a fixed-format report, but a dynamic PDF generated from LLM-produced Markdown, integrated into an [MCP](https://modelcontextprotocol.io/) server.

When you look for technical solutions to this, you generally land on one of two paths:

- The TeX route: [Pandoc](https://pandoc.org/) with a TeX backend (Markdown → TeX → PDF)
- The browser route: [Playwright](https://playwright.dev/) or similar (Markdown → HTML → PDF)

Outside of scientific or math-heavy documents where TeX genuinely shines, the browser-based approach seems to be where most people land these days, and it's easy to see why. CSS gives you a lot of customization for free. But both approaches share the same core problems: you're shipping a huge binary into your production image (Chromium, or TeX Live), you need a process pool or a message queue to make it work reliably at scale, and performance characteristics are unpredictable enough that you end up tuning it as its own separate concern.

So you're stuck. **What if there were a third option?**

## Litho

**Litho** is a Markdown-to-PDF converter that uses neither TeX nor a browser. Instead it goes through [Typst](https://typst.app/): Markdown → Typst → PDF.

And the headline number is kind of absurd: **a single function call, ~2ms per document**.

```rust
let markdown_text: String = std::fs::read_to_string("document.md")?;
let pdf_bytes: Vec<u8> = litho::markdown_to_pdf(markdown_text)?;  // 2ms
std::fs::write("converted.pdf", pdf_bytes)?;
```

Typst is a typesetting compiler written in Rust. The compiler is open source under the [typst organization](https://github.com/typst) on GitHub, and it's published on [crates.io](http://crates.io). From a programming standpoint, the interesting part is that the Typst crate exposes a `World` trait that lets you embed the compiler into your own binary, which means third-party crates like Litho can customize the compilation pipeline without forking anything.

At runtime, `litho::markdown_to_pdf` does two things in sequence:

1. Markdown → Typst: parses the input Markdown and produces a `.typ` source file
2. Typst → PDF: hands that source file to the Typst compiler and gets bytes back

Most of the interesting engineering happened in building those two stages, and that's what the rest of this post is about.

## From Regex to an AST

The first attempt at the Markdown-to-Typst conversion was regex-based. Match headers, then bold/italic, then code blocks, then lists, tables, links, footnotes, in some order, and transform each one.

Problems showed up quickly. The classic example: pipe characters inside a Python code block getting picked up as table delimiters.

````markdown
Take a look at this:

```python
row = "|name|age|"
cells = row.split("|")
```

| input | output |
|-------|--------|
| ...   | ...    |
````

I fixed bugs by adjusting processing order and adding tests, but every fix broke something else. There was always another edge case lurking.
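To make the failure concrete, here's a small sketch of the problem, not Litho's original code. It assumes a deliberately naive rule ("any line with two or more pipes is a table row") in place of the real regexes, and the function names are made up for illustration. The second function shows the kind of state tracking the fix needs.

```rust
/// Naive check in the spirit of a regex pass: any line with two or more
/// pipes is treated as a table row, wherever it appears.
fn naive_table_lines(src: &str) -> Vec<usize> {
    src.lines()
        .enumerate()
        .filter(|(_, line)| line.matches('|').count() >= 2)
        .map(|(i, _)| i)
        .collect()
}

/// Same rule, but remembering whether we are inside a fenced code block.
fn fence_aware_table_lines(src: &str) -> Vec<usize> {
    let mut in_fence = false;
    let mut hits = Vec::new();
    for (i, line) in src.lines().enumerate() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence; // opening or closing fence
            continue;
        }
        if !in_fence && line.matches('|').count() >= 2 {
            hits.push(i);
        }
    }
    hits
}

const SAMPLE: &str = concat!(
    "```python\n",
    "row = \"|name|age|\"\n",
    "cells = row.split(\"|\")\n",
    "```\n",
    "\n",
    "| input | output |\n",
    "|-------|--------|\n",
);

fn main() {
    // Line 1 is Python source, yet the naive pass flags it as a table row.
    println!("{:?}", naive_table_lines(SAMPLE)); // [1, 5, 6]
    println!("{:?}", fence_aware_table_lines(SAMPLE)); // [5, 6]
}
```

One boolean fixes this case, but every additional construct (inline code, blockquotes, nested lists) wants its own flag, which is where the edge cases start multiplying.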

The next attempt introduced placeholders as a kind of ad-hoc state management: replace protected content with special tokens like `<<CODEBLOCK0>>`, run the transformations, then restore. It worked, but every new Markdown feature you added required thinking through its interactions with all the existing ones. With $N$ features, you're looking at $O(N^2)$ failure modes to keep in your head. Nested blockquotes, lists inside blockquotes with math blocks inside those, and on and on. Things that seem simple in Markdown are genuinely nasty to handle with string substitution.
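The placeholder idea can be sketched roughly like this. It's a simplified reconstruction under assumptions: only fenced code blocks are protected, the function names are hypothetical, and the "transformation" is a toy pass that mangles every pipe.

```rust
/// Swap each fenced code block for a token like `<<CODEBLOCK0>>` and
/// return the rewritten text together with the saved blocks.
fn protect_code_blocks(src: &str) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut saved: Vec<String> = Vec::new();
    let mut current: Option<String> = None; // block being collected, if any

    for line in src.lines() {
        let is_fence = line.trim_start().starts_with("```");
        if let Some(mut block) = current.take() {
            block.push_str(line);
            if is_fence {
                // Closing fence: leave a token behind and store the block.
                out.push_str(&format!("<<CODEBLOCK{}>>\n", saved.len()));
                saved.push(block);
            } else {
                block.push('\n');
                current = Some(block);
            }
        } else if is_fence {
            current = Some(format!("{line}\n")); // opening fence
        } else {
            out.push_str(line);
            out.push('\n');
        }
    }
    // An unclosed fence is passed through untouched.
    if let Some(block) = current {
        out.push_str(&block);
    }
    (out, saved)
}

/// Put the saved blocks back where their tokens are.
fn restore_code_blocks(text: &str, saved: &[String]) -> String {
    let mut out = text.to_string();
    for (i, block) in saved.iter().enumerate() {
        out = out.replace(&format!("<<CODEBLOCK{i}>>"), block);
    }
    out
}

fn main() {
    let src = "```python\ncells = row.split(\"|\")\n```\n| a | b |\n";
    let (protected, saved) = protect_code_blocks(src);
    // Toy transformation: it would corrupt the Python line if it could see it.
    let transformed = protected.replace('|', "/");
    print!("{}", restore_code_blocks(&transformed, &saved));
}
```

The code block survives the toy pass, but note what the sketch doesn't cover: inline code, indented code, fences inside blockquotes. Each of those needs its own protect/restore pair, and each pair has to be ordered correctly against the others.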

Building this was not fun, and it increasingly felt like I was in the process of reimplementing an AST parser from scratch, badly. So I switched to [pulldown-cmark](https://docs.rs/pulldown-cmark), a proper Markdown parser. Strictly speaking it's a pull parser that yields events rather than building a tree, but the structure is all there.

pulldown-cmark processes Markdown source and emits a stream of typed events: `Start(Heading)`, `Text("hello")`, `End(Heading)`, and so on. Having an event stream means you need something state-machine-like to consume it, so Litho now maintains a struct that tracks the relevant state throughout a conversion:

```rust
use std::collections::HashSet;

struct ConversionState {
    list_stack: Vec<bool>,      // ordered vs. unordered at each depth
    in_table_cell: bool,
    in_code_block: bool,
    table_rows: Vec<Vec<String>>,
    emitted_footnotes: HashSet<String>,
    // ...
}
```

Tables were one of the clearest wins from this approach. Typst's `#table()` requires the column count to be declared upfront. In the regex version I was counting pipe characters line-by-line and hand-rolling inline code exceptions. With the event stream, it's almost trivial: watch for the table start event, accumulate rows in `table_rows` as cell events come in, and flush everything to Typst source when the table ends. What was a nightmare becomes just bookkeeping.
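Here's a minimal sketch of that bookkeeping. To keep it dependency-free it uses a small stand-in enum instead of pulldown-cmark's real event types, so `Ev`, `TableState`, and `tables_to_typst` are all hypothetical names, and cell text is emitted without any escaping.

```rust
/// Simplified stand-in for the parser's table events (an assumption for
/// this sketch; the real pulldown-cmark events carry more detail).
enum Ev<'a> {
    TableStart,
    RowStart,
    Cell(&'a str),
    RowEnd,
    TableEnd,
}

#[derive(Default)]
struct TableState {
    table_rows: Vec<Vec<String>>,
}

fn tables_to_typst(events: &[Ev<'_>]) -> String {
    let mut state = TableState::default();
    let mut out = String::new();
    for ev in events {
        match ev {
            Ev::TableStart => state.table_rows.clear(),
            Ev::RowStart => state.table_rows.push(Vec::new()),
            Ev::Cell(text) => {
                if let Some(row) = state.table_rows.last_mut() {
                    row.push(text.to_string());
                }
            }
            Ev::RowEnd => {}
            Ev::TableEnd => {
                // The column count is only known once every row is buffered,
                // which is why nothing is written before the table ends.
                let cols = state.table_rows.iter().map(Vec::len).max().unwrap_or(0);
                out.push_str(&format!("#table(\n  columns: {cols},\n"));
                for row in &state.table_rows {
                    let cells: Vec<String> = row.iter().map(|c| format!("[{c}]")).collect();
                    out.push_str(&format!("  {},\n", cells.join(", ")));
                }
                out.push_str(")\n");
            }
        }
    }
    out
}

fn main() {
    let events = [
        Ev::TableStart,
        Ev::RowStart, Ev::Cell("input"), Ev::Cell("output"), Ev::RowEnd,
        Ev::RowStart, Ev::Cell("1"), Ev::Cell("2"), Ev::RowEnd,
        Ev::TableEnd,
    ];
    print!("{}", tables_to_typst(&events));
}
```

Pipes never appear in this code at all. By the time the converter sees a cell, the parser has already decided what is a delimiter and what is content.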

Footnotes had their own interesting constraint. Typst footnotes work like this: first use requires `#footnote[content] <label>` to declare the label and content together, and subsequent uses just reference back with `@label`.

```text
Markdown:  Text with a citation[^1] and reused[^1].
           [^1]: See Knuth, 1984.

Typst:     Text with a citation#footnote[See Knuth, 1984.] <1> and reused@1.
```

The implementation collects all `[^label]: content` definitions upfront into a `HashMap`. Then during conversion, the first `FootnoteReference` event for a given label emits `#footnote[content] <label>`, and every subsequent one emits `@label`. Whether a footnote has been emitted yet is tracked in `emitted_footnotes`.
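The first-use-versus-reuse logic is small enough to sketch in full. This is an illustration rather than Litho's actual code: the function name is hypothetical, and dropping references that have no definition is an assumption made to keep the sketch short.

```rust
use std::collections::{HashMap, HashSet};

/// Emit Typst source for one footnote reference.
fn emit_footnote_ref(
    label: &str,
    definitions: &HashMap<String, String>,
    emitted_footnotes: &mut HashSet<String>,
) -> String {
    // Assumption for this sketch: a reference without a definition is dropped.
    let Some(content) = definitions.get(label) else {
        return String::new();
    };
    // `insert` returns true only the first time a label is seen.
    if emitted_footnotes.insert(label.to_string()) {
        format!("#footnote[{content}] <{label}>")
    } else {
        format!("@{label}")
    }
}

fn main() {
    let definitions = HashMap::from([("1".to_string(), "See Knuth, 1984.".to_string())]);
    let mut emitted = HashSet::new();
    println!("{}", emit_footnote_ref("1", &definitions, &mut emitted)); // #footnote[See Knuth, 1984.] <1>
    println!("{}", emit_footnote_ref("1", &definitions, &mut emitted)); // @1
}
```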

## AST and Regex, Together

pulldown-cmark isn't a complete solution on its own. It does support an `ENABLE_MATH` flag that parses `$...$` into math events, but it doesn't handle LaTeX conventions like `\[...\]` or `\(...\)`. We disabled `ENABLE_MATH` and went back to regex for math specifically.

Concretely: we placeholder-protect inline and block math expressions before passing to the AST parser, then post-process them. Because math only really interferes with code blocks and blockquotes, the $O(N^2)$ blowup from before doesn't resurface. The math-to-Typst conversion itself is handled by [MiTeX](https://typst.app/universe/package/mitex/), a Typst package that provides `#mi` and `#mimath` functions for rendering TeX-syntax math in Typst.
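As a rough sketch of the protection step, here's a scanner that pulls `\(...\)` and `\[...\]` spans out of the text and leaves tokens behind. It uses plain string search rather than regex to stay dependency-free, the `<<MATH0>>` token format and all names are assumptions, and it skips both dollar-sign math and the code-block handling a real implementation needs.

```rust
#[derive(Debug)]
struct MathSpan {
    tex: String,
    display: bool, // true for \[...\], false for \(...\)
}

/// Replace LaTeX-delimited math with `<<MATHn>>` tokens.
fn protect_latex_math(src: &str) -> (String, Vec<MathSpan>) {
    let mut out = String::new();
    let mut spans = Vec::new();
    let mut rest = src;
    loop {
        // Pick whichever opening delimiter comes first.
        let inline = rest.find(r"\(");
        let display = rest.find(r"\[");
        let (start, close, is_display) = match (inline, display) {
            (Some(i), Some(d)) if d < i => (d, r"\]", true),
            (Some(i), _) => (i, r"\)", false),
            (None, Some(d)) => (d, r"\]", true),
            (None, None) => break,
        };
        let body_start = start + 2; // both delimiters are two ASCII bytes
        // An unclosed delimiter ends the scan; the rest passes through as text.
        let Some(len) = rest[body_start..].find(close) else { break };
        out.push_str(&rest[..start]);
        out.push_str(&format!("<<MATH{}>>", spans.len()));
        spans.push(MathSpan {
            tex: rest[body_start..body_start + len].trim().to_string(),
            display: is_display,
        });
        rest = &rest[body_start + len + 2..];
    }
    out.push_str(rest);
    (out, spans)
}

fn main() {
    let src = r"Euler: \(e^{i\pi} + 1 = 0\) and \[ a^2 + b^2 = c^2 \] done.";
    let (text, spans) = protect_latex_math(src);
    println!("{text}"); // Euler: <<MATH0>> and <<MATH1>> done.
    println!("{spans:?}");
}
```

After the Markdown pass, each token gets swapped for the corresponding MiTeX call, with `display` deciding between the inline and block form.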

We considered converting the LaTeX convention syntax to dollar-sign syntax first, then feeding it to pulldown-cmark, but making that reliable required understanding and matching pulldown-cmark's internal parsing behavior. The work was roughly the same either way, and doing it all in regex/placeholder was just simpler to reason about.

## Bundling Everything into the Binary

Rust's build-time metaprogramming is genuinely useful here. The `include_bytes!` family of macros lets you embed arbitrary files into your binary at compile time, where they show up as `&'static [u8]` slices. Litho uses this to bundle 13 fonts and the MiTeX package directly:

```rust
static FONTS: [&[u8]; 13] = [
    include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/fonts/DejaVuSansMono.ttf")),
    include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/fonts/LibertinusSerif-Regular.ttf")),
    // ... 11 more
];

static MITEX_TARBALL: &[u8] = include_bytes!(concat!(
    env!("CARGO_MANIFEST_DIR"),
    "/packages/mitex-0.2.6.tar.gz"
));
```

The paths are constructed with `env!` and `concat!`, and `include_bytes!` pulls the file contents in as `&'static [u8; N]`. These end up in `.rodata` on ELF and `__TEXT/__const` on Mach-O. This adds a few MB to the binary, and LTO won't strip it since the data is actually referenced.

For MiTeX, a `OnceLock` handles the first-access decompression: the tarball is extracted into a `HashMap<FileId, Vec<u8>>` on the first call and shared from then on. Since it never changes after initialization, there's no locking needed after that point. Typst's package manifest, `typst.toml`, doesn't declare dependencies; a package pulls in other packages through `#import` statements in its source. MiTeX v0.2.6 has no further dependencies (earlier versions depended on `xarrow`), so the bundle stays self-contained.
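The initialize-once pattern looks roughly like this. It's a sketch under assumptions: `decode_bundle` is a stand-in for the real gzip/tar extraction, the map is keyed by `String` instead of Typst's `FileId`, and the call counter exists only to make the "runs once" property visible.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

static DECODE_CALLS: AtomicUsize = AtomicUsize::new(0);
static PACKAGE_FILES: OnceLock<HashMap<String, Vec<u8>>> = OnceLock::new();

/// Stand-in for untarring the embedded bundle (hypothetical contents).
fn decode_bundle() -> HashMap<String, Vec<u8>> {
    DECODE_CALLS.fetch_add(1, Ordering::SeqCst);
    HashMap::from([
        ("lib.typ".to_string(), b"// package entry point".to_vec()),
        ("typst.toml".to_string(), b"[package]".to_vec()),
    ])
}

/// The first caller pays for decoding; everyone after gets a plain
/// shared reference to the same map.
fn package_files() -> &'static HashMap<String, Vec<u8>> {
    PACKAGE_FILES.get_or_init(decode_bundle)
}

fn main() {
    // Several threads race to be first; the initializer still runs once.
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| assert!(package_files().contains_key("lib.typ")));
        }
    });
    println!("decoded {} time(s)", DECODE_CALLS.load(Ordering::SeqCst)); // decoded 1 time(s)
}
```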

## When a Library Has Outlived Its Usefulness

The early version of Litho used [typst-as-lib](https://github.com/Relacibo/typst-as-lib), a crate that wraps the Typst compiler in a friendlier interface. Even after exhausting the optimizations available without looking inside it, single-threaded compilation was around 15ms. Worse, adding threads made things slower in a roughly linear way.

Performance degrading linearly with thread count is a strong sign of a critical section somewhere. Profiling all the way down into the library's internals pointed at a struct called `CachedFileResolver`. Every call to `source()` or `file()` acquired a mutex, then hashed a `FileId` to look up a value inside a `Mutex<HashMap>`. The Typst compiler calls these methods many times per compilation, and the mutex contention accumulated into a serious bottleneck.

The fix was to implement Typst's `World` trait directly. `World` is the interface the compiler uses to talk to the outside world: the standard library, source files, fonts, file data, the current date. Implement that, and you control everything. It sounded daunting but turned out to be straightforward. There are seven required methods (`library`, `book`, `main`, `source`, `file`, `font`, `today`), most of them read-only lookups, and the whole implementation ended up around 80 lines:

```rust
// No attribute needed here: the `World` trait itself is declared with `#[comemo::track]`.
impl typst::World for LithoWorld {
    fn source(&self, id: FileId) -> FileResult<Source> {
        if id == self.main.id() { return Ok(self.main.clone()); }
        let bytes = self.resolver.resolve_bytes(id)?;
        let text = std::str::from_utf8(&bytes)...;
        Ok(Source::new(id, text.into()))
    }

    fn file(&self, id: FileId) -> FileResult<Bytes> {
        self.resolver.resolve_bytes(id).map(Bytes::new)
    }
    // ...
}
```

With the mutex gone, package lookups go through an `Arc<HashMap>` initialized once at startup. Fonts and the standard library are handled via `OnceLock` and shared across all compilations. Single-threaded compile time dropped from 15ms to 2ms.

There's a general lesson here that I think applies more broadly: most libraries exist to let you focus on your actual problem. When a library starts getting in the way of that, it's probably run its course. In 2026, with LLM assistance, the cost of just implementing the thing yourself has come down a lot. I'd reach for that option sooner next time.

## Discarding Compiler Errors

There are two places in Litho where errors can occur: Typst compilation and PDF rendering. The Markdown parser and converter are infallible: any input string is valid Markdown, so they always produce some Typst source.

The Typst compiler does return diagnostics, but surfacing them directly would be confusing. An error there could be from the input, or it could be a bug in Litho's conversion logic. Exposing Typst source-level error messages to callers of `litho::markdown_to_pdf` would tell them something about the internals they probably don't care about and likely couldn't act on. Instead, Litho logs diagnostics at `error` or `warn` level via `tracing`.

So the public error type ends up looking like this:

```rust
pub enum Error {
    Compile,
    Pdf,
}
```

If something goes wrong and you need to debug it, `markdown_to_typst` and `typst_to_pdf` are exposed separately so you can inspect the intermediate Typst source directly.
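The log-then-discard pattern can be sketched like this. It's illustrative only: `compile_stage` is a hypothetical stand-in that reports diagnostics as strings, `eprintln!` takes the place of `tracing` so the example has no dependencies, and the `Display` messages are made up.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum Error {
    Compile,
    Pdf,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Compile => write!(f, "typst compilation failed"),
            Error::Pdf => write!(f, "pdf rendering failed"),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical compile stage: returns output bytes or a list of diagnostics.
fn compile_stage(typst_source: &str) -> Result<Vec<u8>, Vec<String>> {
    if typst_source.contains("#undefined") {
        Err(vec!["unknown variable: undefined".to_string()])
    } else {
        Ok(typst_source.as_bytes().to_vec())
    }
}

/// Log the diagnostics for whoever operates the service, then hand the
/// caller an opaque error that says nothing about Typst internals.
fn compile_opaque(typst_source: &str) -> Result<Vec<u8>, Error> {
    compile_stage(typst_source).map_err(|diagnostics| {
        for d in &diagnostics {
            eprintln!("typst diagnostic: {d}"); // `tracing::error!` in a real service
        }
        Error::Compile
    })
}

fn main() {
    match compile_opaque("#undefined") {
        Ok(bytes) => println!("{} bytes", bytes.len()),
        Err(e) => println!("error: {e}"), // error: typst compilation failed
    }
}
```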

## Benchmarks

Before the numbers: Litho does a subset of what Pandoc does. No citation processing, no template expansion, nowhere near the full LaTeX feature set. So this isn't a fair apples-to-apples comparison, and you should read it with that in mind.

The test document is a 653-byte Markdown file containing math, tables, code blocks, and footnotes. Measurements on Apple M4 Pro, release build.

**Single-threaded performance** (10 runs):

| Engine | Mean | Median | Min | Max |
|---|---|---|---|---|
| **Litho** | **2.14ms** | **2.10ms** | **1.90ms** | **2.75ms** |
| Chromium | 2.27s | 2.26s | 2.25s | 2.33s |
| pandoc + pdflatex | 884.85ms | 844.71ms | 826.12ms | 1.18s |
| pandoc + tectonic | 1.06s | 1.05s | 1.04s | 1.10s |

By mean time, that's roughly 400x faster than pandoc + pdflatex and about 1,000x faster than Chromium: two to three orders of magnitude. Again, not the same workload, so take it with a grain of salt. But it's enough to call this a real third option.

**Concurrency:**

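Because all shared state (fonts, package files, standard library) is read-only after initialization, Litho should scale with core count without contention on shared data. It mostly does, though not perfectly linearly: per-thread latency roughly doubles between 1 and 8 threads.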

| Threads | Wall time | Per-thread mean | Min | Max |
|---|---|---|---|---|
| 1 | 2.39ms | 2.36ms | 2.33ms | 2.38ms |
| 2 | 2.66ms | 2.62ms | 2.40ms | 2.84ms |
| 4 | 3.21ms | 3.08ms | 2.68ms | 3.49ms |
| 8 | 4.96ms | 4.75ms | 3.72ms | 5.46ms |
| 16 | 10.62ms | 9.92ms | 5.54ms | 11.49ms |

Wall time climbing into the 10ms range at 16 threads makes sense given the M4 Pro's 14-core layout (10 performance + 4 efficiency). At that point you're competing for CPU, not shared data.

**Deployment:**

| Engine | Cold start | Size / dependencies |
|---|---|---|
| **Litho** | **~61ms (tarball decompression)** | **47 MB self-contained** |
| Chromium | >15s (Chromium download) | 253 MB |
| pandoc + pdflatex | >15s (TeX Live install) | 226 MB + TeX Live |
| pandoc + tectonic | >15s (TeX package fetch) | 226 MB + tectonic |

Tectonic takes a similar philosophical approach on the TeX side: self-contained engine, on-demand package fetching. But Pandoc still shells out to it as a subprocess, so you can't get the in-process benefits that make Litho's latency what it is.

## What We Left Out

Litho doesn't support images, figure/table captions, or citations.

Images would require fetching bytes, decoding them, and injecting them into the in-memory file resolver. You could expose a resolver interface for callers to provide, but general URL-based image loading would require runtime I/O, which cuts against the core design goal of doing everything in-process without network or filesystem access during compilation. Citations require a separate `.bib` or [Hayagriva](https://github.com/typst/hayagriva) file; that's just how Typst is designed.

Captions are a different kind of problem. Markdown doesn't have a standard syntax for attaching a caption to a figure or table, so supporting it would mean inventing a syntax extension or adopting an existing one and threading it through the entire converter. That's non-trivial.

None of these are technically impossible. They're all tradeoffs where you give up API simplicity in exchange for scope, and so far that tradeoff hasn't seemed worth it.

Litho is currently running in production as an internal library on our server. If you're solving a similar problem, hopefully this writeup saves you some time.