# Why We Use Rust

Suyoung Hwang, Donghyun Koh · 2026-05-23

The common picture of an AI company's codebase is mostly `model.py` and `train.py`. At actual research labs, with wide variation between them, that's not entirely wrong. But at most companies, the bulk of the codebase is what gets loosely labelled "infrastructure": scaffolding that is less interesting than the application logic itself.

Our codebase comprises schedulers, internal libraries, preprocessing pipelines, and microservices.

None of that is model code — it's closer to application code. We've standardized on Rust for all of it, partly because we like Rust, but mostly because we didn't see a strong alternative. And the role Rust can play in the ML ecosystem, and around it, turned out to be larger than we expected.

This is not a call to use Rust for everything. We still use PyTorch for parameter computation, CUDA and C++ for kernels, and Python for notebooks and one-off scripts. But for the layer above and around all of that, we use Rust.

## Shortcomings of Other Languages (Especially Python) That Led Us to Rust

### Types

Python is dynamically typed. Type annotations via `typing` are possible, and checkers like `mypy` and `ty` exist, but the interpreter does not enforce annotations at runtime: violating one does not make the program fail, and silent failures are common instead. In practice, checker findings are often deliberately ignored, for a variety of reasons.

In the AI field, type problems are especially dangerous. In data-preparation and model-evaluation pipelines, a type error often doesn't raise an exception; it silently passes bad data downstream. Empty reward signals, malformed model outputs, and `None` values standing in for numbers can all make their way into a training loop, where they might go undetected until days of compute have been burned.

Consider this Python code:

```python
def search_papers(query: str) -> dict:
    """'authors' holds a list of author name strings."""
    ...

result = search_papers(query)
for paper in result["papers"]:
    for author in paper["authors"]:   # used to be ["Smith", "Jones"]
        register_author(author)
```

When `authors` changes from `["Smith", "Jones"]` to the **single string** `"Smith, Jones"` following a library update, `for author in "Smith, Jones":` raises no exception. It iterates character by character, passing `'S'`, `'m'`, `'i'`, `'t'`, `'h'`, `','`, `' '` … to `register_author`. The type is still `str`, the loop runs normally, nothing crashes. What flows downstream is corrupt data with individual characters registered as authors.

When this happens in code we own, it's at least traceable and fixable. When a dependency introduces a silent breaking change, debugging becomes a nightmare.

In Rust, and in statically typed languages generally, a function's return type is a contract enforced at compile time:

```rust
pub struct PaperSearchResult {
    pub papers: Vec<Paper>,
    pub more_pages: bool,
}

pub async fn search_papers(query: &str) -> Result<PaperSearchResult, SearchError> { ... }
```

If `papers` is renamed to `results` six months later, convention dictates a SemVer bump — and the compiler flags every call site that uses that field.
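To tie this back to the `authors` example, here is a minimal sketch of the same loop once the field's type is part of the contract. The `Paper` struct and the `register_authors` helper are hypothetical names chosen for illustration, not part of any real library.

```rust
/// Hypothetical record type; the field types are the contract.
#[derive(Debug, Clone, PartialEq)]
pub struct Paper {
    pub title: String,
    /// One entry per author. If a new version changed this field to a single
    /// `String`, the inner `for` loop below would stop compiling, because
    /// `&String` is not iterable.
    pub authors: Vec<String>,
}

/// Hypothetical stand-in for `register_author`: collects the trimmed names
/// of every author of every paper, in order.
pub fn register_authors(papers: &[Paper]) -> Vec<String> {
    let mut registered = Vec::new();
    for paper in papers {
        for author in &paper.authors {
            registered.push(author.trim().to_string());
        }
    }
    registered
}
```

The character-by-character iteration from the Python version has no equivalent here: iterating over a string's characters requires an explicit `.chars()` call, so it cannot happen by accident.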

TypeScript makes a similar promise, but type quality across the npm ecosystem is uneven (better than PyPI, but still). Well-maintained packages have accurate types; poorly maintained ones let `any` leak in.

Go is statically typed, but its type system is thin. Generics arrived only in Go 1.18 (2022); there are still no algebraic data types, no exhaustive matching, and error handling relies on multiple return values, which is a convention rather than an enforcement. It's hard to believe that in 2026, you still need a third-party library to get monadic types or anything resembling method chaining.
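For contrast, this is roughly what algebraic data types and exhaustive matching look like in Rust. It is a sketch under assumptions: the `Reward` enum and `usable_score` function are hypothetical, modelled on the "empty reward signal" and "malformed output" cases mentioned earlier.

```rust
/// Hypothetical outcome of scoring one rollout.
#[derive(Debug, Clone, PartialEq)]
pub enum Reward {
    Score(f64),
    Missing,
    Malformed { raw: String },
}

/// Returns the value to train on, or the reason the sample must be skipped.
pub fn usable_score(reward: &Reward) -> Result<f64, String> {
    // The match must cover every variant. Adding a new variant to `Reward`
    // turns this function into a compile error until it is handled.
    match reward {
        Reward::Score(v) if v.is_finite() => Ok(*v),
        Reward::Score(v) => Err(format!("non-finite score: {v}")),
        Reward::Missing => Err("reward missing".to_string()),
        Reward::Malformed { raw } => Err(format!("malformed output: {raw}")),
    }
}
```

There is no way to read the `f64` without first deciding what to do about the other cases, which is the property a `None` standing in for a number lacks.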

### Memory

Memory problems have always been important and hard to catch, long before the AI era. In the AI field specifically, long-running loops and fleets collecting RL rollouts can run for hours or days in a single execution. As the software grows, memory leaks become increasingly likely. In a Python program, as in most other languages, a memory leak manifests as gradually increasing memory usage, until the process is OOM-killed near the end of a run or, worse, keeps growing and puts pressure on the entire system. Debugging it means running a profiler and re-running the multi-hour job, assuming the problem reproduces and the cause is even traceable.

Every CPython object is reference-counted, much like what other languages call a smart pointer. In Rust terms, every variable is an `Rc<T>`; in C++, a `std::shared_ptr`. This means callbacks and closures can unintentionally hold references to large objects, so the reference count never reaches zero and the memory is never reclaimed. When you spawn many tasks with `asyncio.gather`, objects accumulated inside them aren't released until the tasks finish. As small-object fragmentation accumulates, usable memory quietly shrinks. This is less an application-level bug than a structural limitation of the language.

Go's memory management is more predictable than most GC languages, but there's still a GC, and controlling its timing in long-running processes is hard. Allocation and deallocation are largely invisible at the code level. `defer` and goroutines are elegant, but they hide important tuning knobs inside the language runtime. Go pointers, too, keep their targets alive for as long as anything still references them. In practice, Go-based software like Weaviate has shown real scaling limits.

Rust's ownership model was designed from the start to eliminate these problems. A value is freed when its owner goes out of scope, and the compiler determines where that happens, so there is no collector to tune. (Leaks are still possible, for example through `Rc` cycles, but they have to be constructed explicitly.) Release is also customizable: implementing the `Drop` trait to run cleanup code is common in real production code, even outside embedded systems. When you need shared state, `Arc<Mutex<T>>` makes it explicit, which itself becomes a natural review point.
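A small sketch of scope-based release, using only the standard library. `RolloutBuffer` and `run_episode` are hypothetical names; the shared log exists only so the order of events can be observed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical buffer that records the moment it is released.
pub struct RolloutBuffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for RolloutBuffer {
    // Runs automatically when the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("released {}", self.name));
    }
}

/// Creates two nested buffers and returns the events in the order they happened.
pub fn run_episode() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = RolloutBuffer { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = RolloutBuffer { name: "inner", log: Rc::clone(&log) };
            log.borrow_mut().push("inner scope ends".to_string());
        } // `_inner` is dropped here
        log.borrow_mut().push("outer scope ends".to_string());
    } // `_outer` is dropped here
    let events = log.borrow().clone();
    events
}
```

Each buffer is released at the closing brace of its scope, not at some later point chosen by a collector, so the event order is the same on every run.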

### Concurrency — Scheduling

Concurrency is unavoidable when listing Python's shortcomings. Python has the Global Interpreter Lock, an infamously bad piece of design, except in the recent [free-threading variant](https://docs.python.org/3/howto/free-threading-python.html). Earlier we noted that every Python variable is effectively an `Rc<T>` smart pointer, which implies that `T` has interior mutability and something like `Send + Sync` capability. Why doesn't this cause race conditions in general? Because the GIL ensures, through perhaps the worst possible mechanism, that only one thread executes Python bytecode at a time, regardless of how parallel your code looks.

To make this more concrete, consider a workflow using `asyncio`:

```python
async def run_all(problems):
    return await asyncio.gather(*[run_one(p) for p in problems])
```

Even in this short code, there are multiple problems. A single blocking call anywhere in the call tree silently stalls the entire event loop. CPU-bound work serializes under the GIL regardless of which thread it escapes to. `async def` starts to feel deceptive: it's synchronous, isn't it? On top of that, `asyncio.gather` by default propagates only the first exception, so errors from the remaining or nested calls can be lost along with their tracebacks unless handled carefully. You typically find out after the fact.

Bringing this up usually invites a mention of Ray, which is genuinely widely used in the ML ecosystem. Wrapping a Temporal-style job scheduling cluster in an async runtime is one thing, but with Ray, the underlying tasks are still Python and still subject to the GIL. `@ray.remote` even returns `ObjectRef[Any]` — a type that inspires no confidence. And having a cluster or control plane of any kind means a black box plus additional operational overhead.

The Python ecosystem has produced a remarkable number of orchestration tools to work around this problem — Prefect, Dagster, Airflow, Argo, Modal, Temporal, Ray, and more. That proliferation is itself evidence of the underlying shortcoming.

Node.js and its successors — Deno, Bun — address some of this. async/await is native and well-suited to I/O-bound work. V8 is fast, but there's still an order-of-magnitude gap compared to native code. And compute between awaits is still single-threaded.

In Rust, the equivalent code looks like this:

```rust
use futures::stream::{self, StreamExt};

async fn run_all(problems: Vec<Problem>) -> Vec<Result<Eval, RunError>> {
    stream::iter(problems)
        .map(|p| async move { run_one(&p).await })
        .buffered(8) // at most 8 futures in flight; results keep input order
        .collect()
        .await
}
```

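Backpressure is not a discipline here; it's a single call to `buffered(8)`. Errors flow into a `Result` type, which is marked `#[must_use]`: the compiler itself, not just clippy, warns when one is discarded. Note that `buffered` interleaves these futures on a single task, so it provides concurrency rather than parallelism. But there's no GIL, so CPU-bound work handed to other threads runs in parallel and doesn't block I/O.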

`tokio` is the de facto standard async runtime in the Rust ecosystem — resilient and high-performance (not HPC-level performance, but nothing in the application tier comes close). For blocking operations, tokio provides an explicit escape hatch: `tokio::task::spawn_blocking`. If you misuse it, the misuse is at least visible in the code.

The reason async runtimes can exist in the Rust ecosystem at all is that `Future` is abstracted as a [trait in the standard library](https://doc.rust-lang.org/std/future/trait.Future.html). Each task's "stack" becomes a compiler-generated state machine. Only the variables alive across `await` points are stored in that state machine, which keeps per-task memory small and fixed in size.
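Because `Future` is just a trait, a runtime is ordinary library code. The following is a deliberately tiny single-future executor built only on the standard library, closely following the pattern shown in the `std::task::Wake` documentation. It is a sketch, not how `tokio` is implemented; `score` and `double` are hypothetical async functions.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread that is parked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives one future to completion on the current thread.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the stack; no heap allocation is needed for it.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker is invoked, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async function; the compiler turns it into a state machine
/// that implements `Future<Output = u32>`.
pub async fn score(a: u32, b: u32) -> u32 {
    let doubled = double(a).await; // `b` must be kept alive across this await
    doubled + b
}

async fn double(x: u32) -> u32 {
    x * 2
}
```

Calling `block_on(score(2, 3))` polls the generated state machine until it yields its result. Production runtimes add task queues, timers and I/O readiness on top of this same `poll` contract.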

Go handles all concurrency through goroutines — lightweight entities managed directly by the Go runtime, far cheaper than OS threads. But Go's design philosophy is fundamentally: keep the internals flexible, keep the developer unaware.

This produces behavior that isn't always intuitive. When a blocking syscall occurs, the GMP scheduler detaches the processor (P) from the blocked OS thread (M) and hands it to another thread; the netpoller intercepts network I/O. Since [Go 1.14](https://go.dev/doc/go1.14), the scheduler uses SIGURG-based async preemption, which meaningfully changes scheduling characteristics. Each goroutine starts with a minimal stack, which grows dynamically through allocation and copy when more stack is needed. Scheduler overhead aside, the deeper issue is that none of this is visible or controllable by the developer.

### Concurrency — Data Races

Any discussion of Rust's concurrency story has to include **compile-time prevention of data races**.

Consider a common pattern: running many LLM calls in parallel and writing results to a shared cache or state object. In Python, you might write:

```python
shared_cache = {}

async def run_one(p):
    result = await call_llm(p)
    # This runs without error in Python.
    # The single-threaded event loop and GIL happen to prevent race conditions.
    # But switch to a ThreadPool or a different runtime and data silently starts to corrupt.
    shared_cache[p.id] = result

await asyncio.gather(*[run_one(p) for p in problems])
```

In Python, this kind of state sharing is frictionless. Skipping a lock raises no error. With luck, the GIL makes it appear to work — until multithreading enters the picture, or an operation turns out to be non-atomic, and data gets silently corrupted. The GIL does not prevent semantic race conditions.

The same applies in Go:

```go
var sharedCache = map[string]Result{}

func runOne(p Problem, wg *sync.WaitGroup) {
    defer wg.Done()
    result := callLLM(p)
    // This crashes.
    // Goroutines run truly in parallel on OS threads; unlike Python asyncio, no GIL serializes them.
    // The runtime aborts with "fatal error: concurrent map writes" when it notices the overlap,
    // and `go run -race` reports the underlying data race.
    sharedCache[p.ID] = result
}

var wg sync.WaitGroup
for _, p := range problems {
    wg.Add(1)
    go runOne(p, &wg)
}
wg.Wait()
```

In Go, the failure is at least louder. Concurrent map writes are **explicitly unsafe** in Go: the runtime aborts when it notices them, and the race detector reports them at runtime. Fixing it requires explicitly reaching for `sync.Mutex` or `sync.Map`, which takes the same level of discipline as proper error handling.

In async Rust, equivalent code **doesn't compile at all**:

```rust
let mut shared_cache = HashMap::new();

let results: Vec<_> = stream::iter(problems)
    .map(|p| async {
        let result = run_one(&p).await;
        // ❌ COMPILE ERROR (borrow checker):
        // every in-flight future would hold its own `&mut shared_cache`,
        // and only one mutable borrow may exist at a time.
        shared_cache.insert(p.id, result.clone());
        Ok(result)
    })
    .buffered(8)
    .collect()
    .await;
```

Rust's borrow rules allow either one mutable reference or any number of shared ones, and the `Send` and `Sync` traits extend that guarantee across threads by encoding whether a value may be moved to, or shared with, another thread. Any attempt to mutate unsynchronized shared state from multiple tasks simultaneously results in a compile error.

The fix requires explicitly wrapping shared state in `Arc` and `Mutex`:

```rust
// Wrap shared state in Arc (an atomically reference-counted pointer) and Mutex
let shared_cache = Arc::new(Mutex::new(HashMap::new()));

let results: Vec<_> = stream::iter(problems)
    .map(|p| {
        // Clone the Arc handle (bumping its reference count) so each future owns one
        let cache_clone = Arc::clone(&shared_cache);
        async move {
            let result = run_one(&p).await;

            // ✅ The compiler enforces that the lock must be acquired before accessing the data
            cache_clone.lock().unwrap().insert(p.id, result.clone());
            Ok(result)
        }
    })
    .buffered(8)
    .collect()
    .await;
```
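The same pattern with plain OS threads makes the role of `Send` visible. This is a standard-library-only sketch; `fill_cache` and the `assert_send` helper are hypothetical names, and the formatted string stands in for a real LLM result.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Compiles only for types that may be moved to another thread.
fn assert_send<T: Send>() {}

/// Spawns one thread per id; every thread writes into the same cache.
pub fn fill_cache(ids: Vec<u32>) -> HashMap<u32, String> {
    assert_send::<Arc<Mutex<HashMap<u32, String>>>>();
    // assert_send::<std::rc::Rc<std::cell::RefCell<HashMap<u32, String>>>>();
    // ^ uncommenting this line fails to compile: `Rc` is not `Send`.

    let cache = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = ids
        .into_iter()
        .map(|id| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                let result = format!("result-{id}"); // stand-in for real work
                // The map is reachable only through the lock guard.
                cache.lock().unwrap().insert(id, result);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let filled = cache.lock().unwrap().clone();
    filled
}
```

Replacing `Arc<Mutex<..>>` with an unsynchronized type makes `thread::spawn` itself refuse to compile, because its closure must be `Send`.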

As applications grow, shared state becomes inevitable. In C++ or Go, forgetting a lock leads to intermittent production failures and debugging nightmares — or dependence on a runtime race detector.

Rust demotes data races from runtime failures to **ordinary build errors**. (Deadlocks and logical races remain possible.) For a pipeline like ours that orchestrates hundreds of concurrent LLM calls per run, knowing that data-race corruption cannot happen at runtime is a practical guarantee we are not willing to give up.

### Performance

Rust is about as fast as anything short of carefully tuned C/C++. There's not much more to say there, so let the numbers speak. We evaluated our LM agent framework by running a tool-using agent loop at various concurrency levels, with LLM calls simulated at fixed latency:

| **Framework** | **Median (ms)** | **p95 (ms)** | **CPU (sec)** | **Peak RSS (MiB)** |
| --- | --- | --- | --- | --- |
| ours (Rust) | 2012 | 2013 | 0.10 | 25  |
| [rig](https://github.com/0xPlaygrounds/rig) (Rust) | 2013 | 2013 | 0.12 | 16  |
| langchain (Python) | 2475 | 2495 | 3.28 | 115 |
| openai-agents (Python) | 2417 | 2442 | 2.95 | 112 |

All frameworks are waiting on the same simulated LLM calls. The Python frameworks use roughly 25–33× more CPU time and 4.5–7× more memory, and their median latency is about 20% higher. Python cannot even do nothing cheaply.

The two Rust frameworks perform essentially identically. The point of this table isn't that ours is the best; it's to show what language choice alone determines.

## Other Advantages of Rust

### Toolchain and Ecosystem

Rust has one standard build tool and package manager: `cargo`. No environment activation, no proliferation of commands between `pip install` and `uv sync`. For dependencies, there is a robust lockfile, which should be obvious but apparently isn't. Anyone who's spent time getting a new Python or Node repo into a working state knows this is not a cosmetic concern. The contrast with C++, a comparable class of language, is stark: writing `CMakeLists.txt` effectively requires learning a separate language.

The [crates.io](https://crates.io/) ecosystem is also much larger than we initially expected. `serde` is the standard for serialization. `tonic` and `prost` cover gRPC. `tracing` is the observability standard. For performance work there are `parking_lot`, `crossbeam`, and others; and beyond these fundamentals, there are numerous useful application-level crates like `clap` and `ratatui`.

Some of these have no real equivalent elsewhere. `rustls`, for instance, is a memory-safe TLS stack with no OpenSSL dependency. The standard TLS modules in Python and Node are built on OpenSSL.

### Build-Time Code Generation

There's genuine debate about whether this counts as an advantage, but in practice we still put it in the plus column.

Rust lacks reflection, but compensates with extensive build-time customization. What other languages handle through runtime reflection becomes a compile-time guarantee.

The canonical example is `sqlx`, which connects to a live database at build time and validates SQL queries against the schema:

```rust
let row = sqlx::query!(
    r#"SELECT id, name, created_at
       FROM users
       WHERE id = $1"#,
    user_id
).fetch_one(&pool).await?;

println!("{}: {}", row.id, row.name);
// row.id and row.name are typed; row.created_at maps to a date-time type
// (e.g. chrono::DateTime<Utc> for a TIMESTAMPTZ column with sqlx's `chrono` feature).
// If the SQL is malformed, or if `users` doesn't have a `name` column, the build fails.
```

A typo in the query, or a type mismatch, produces a build error. Somehow this is still surprising.

`tonic` generates typed clients and servers from `.proto` files at build time. Add a field to a proto, rebuild, and every call site using that field is verified.

More generally: if a spec can be encoded in a static artifact and a bridge exists between that artifact and the code (usually in the form of a procedural macro), this benefit is almost always available.

Paradoxically, this power allows people unfamiliar with programming — or with Rust specifically — to write less code and produce better software. Python is an easy language, but it's not easier than writing TOML.
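The spec-to-code bridge can be shown at small scale with a declarative `macro_rules!` macro. Note the swap: this is not a procedural macro (those need their own crate) and it is not how `sqlx` or `tonic` work internally. The `wire_enum!` macro and `JobState` type are hypothetical, written for this sketch.

```rust
/// Generates an enum plus both string conversions from one list of
/// `Variant => "wire name"` pairs, so the three can never drift apart.
macro_rules! wire_enum {
    ($name:ident { $($variant:ident => $wire:literal),+ $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum $name {
            $($variant),+
        }

        impl $name {
            /// Name used on the wire for this variant.
            pub fn as_wire(self) -> &'static str {
                match self {
                    $(Self::$variant => $wire),+
                }
            }

            /// Parses a wire name; unknown names yield `None`.
            pub fn from_wire(s: &str) -> Option<Self> {
                match s {
                    $($wire => Some(Self::$variant),)+
                    _ => None,
                }
            }
        }
    };
}

// The "spec": a single list, expanded at compile time.
wire_enum!(JobState {
    Queued => "queued",
    Running => "running",
    Done => "done",
});
```

Adding a state means adding one line to the list; the enum, the serializer and the parser are regenerated together on the next build.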

## Honest Drawbacks

### Compile Time and Resource Usage

Rust's compile times are long — especially compared to Go. An unlucky clean build can exceed ten minutes, and incremental builds are faster in theory than in practice. This is a significant weakness in CI/CD environments.

Development resource consumption is also substantial. The `target/` directory accumulates compiled artifacts for every dependency, separately for each build profile, and regularly grows to tens of gigabytes. `rust-analyzer` is powerful, and it optimizes aggressively using salsa, but the language's complexity means it tends to consume significant memory and CPU regardless.

### Complexity

Rust's inherent complexity is considerable. What takes a single variable in other languages requires explaining mutability, lifetimes, and references separately. What a struct or class captures in one concept is split across traits, structs, and impl blocks. `Future` is conceptually tractable, but dive into the details and the rough edges multiply fast.

These rough edges are numerous, and their intersections are worse. The language is resolving them incrementally through RFCs, but the path is strewn with unintuitive behavior. Missing or incomplete support for specialization and generic const items leaves real gaps. GATs, HRTBs, and async Rust in particular have been criticized continuously. That notation like `where for<'a> F: Fn(&'a (u8, u16)) -> &'a u8` exists and appears in real code is itself a strange fact. When to use `impl Future<Output = T>` versus `Pin<Box<dyn Future<Output = T> + Send + 'a>>` and why these differ is deeply unintuitive. This document isn't the place to catalog all of it; for those who want to go further, browsing the tracking issues on the [Rust repository](https://github.com/rust-lang/rust) is instructive.
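For readers who have not met the `for<'a>` notation, here is a minimal sketch of what that bound means in practice. The `pick` and `first` functions are hypothetical and exist only to give the bound something to constrain.

```rust
/// `F` must work for every lifetime `'a`: whatever borrow it is handed,
/// the reference it returns is valid for as long as that borrow.
pub fn pick<F>(pair: &(u8, u16), selector: F) -> u8
where
    for<'a> F: Fn(&'a (u8, u16)) -> &'a u8,
{
    *selector(pair)
}

/// Hypothetical selector. Lifetime elision gives it exactly the signature
/// the bound above asks for, without any visible lifetime annotation.
pub fn first(pair: &(u8, u16)) -> &u8 {
    &pair.0
}
```

`pick(&(7, 9), first)` evaluates to `7`. The bound cannot be written with a single named lifetime on `pick`, because the borrow that `selector` receives is created inside `pick`, which is why the higher-ranked form exists at all.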

### Recruiting and Onboarding

This complexity has a hiring corollary: the pool of developers comfortable with Rust is small. Onboarding engineers from other backgrounds takes months, not weeks. Introducing `Send + Sync` and async lifetimes to someone whose background is Python is, for practical purposes, retraining.

## Conclusion

To repeat: we are not arguing that everyone should use Rust for everything, nor declaring that Rust is the only right tool for this kind of work. We are saying that we have genuinely benefited from what Rust offers at the language and ecosystem level — typed contracts, predictable concurrency, build-verified schemas, performance — and that these gains outweigh what we've given up.

A team that chooses the right tools spends its time on actual research, not scaffolding and debugging. That was our entire criterion.