# Faking Specialization on Stable Rust: Field Notes on Autoref Specialization

Jiho Park, Suyoung Hwang · 2026-05-22

Rust has a handful of features that still haven't stabilized, and one that's been stuck for ages is _specialization_. It's the feature where the compiler automatically picks the right impl: if this type implements this trait, use the more specific one; otherwise fall back to the general one. But `#![feature(specialization)]` is still locked to nightly, tangled up with soundness issues, and there's no telling when it'll land.

And yet, when you're writing macros or code generators, the moment comes when you _genuinely_ need it. The type of an expression the user hands you, or the type of the slot you're putting a value into, might be one of two (or more) things, and you want to handle each case differently. The workaround, usable even on the stable channel, is the **autoref specialization** trick that @dtolnay documented in [case-studies](https://github.com/dtolnay/case-studies). Today we'll lay out what it is, plus two cases where we actually put it to work. Both started out feeling like "surely this needs nightly?" and both ended up resolving cleanly on stable.

## Exploiting the method resolution rules

The core idea is surprisingly simple. When Rust resolves `x.method()`, it adds autorefs _one step at a time_ to settle on the receiver, looking for a matching impl as it goes. It tries `x` by value, then `&x`, then `&mut x`, in that order (before repeating the same steps on each deref of `x`), and **stops at the first impl that matches, so an impl reachable with fewer added references always wins.**

You can use this priority to place the "more specific impl" and the "fallback impl" at different reference depths.

```rust
struct Wrapper<T>(T);

// High priority: applies only when T: Display. Implemented directly on Wrapper<T>.
trait ViaDisplay {
    fn tell(&self) -> &'static str;
}
impl<T: std::fmt::Display> ViaDisplay for Wrapper<T> {
    fn tell(&self) -> &'static str { "display" }
}

// Low priority: applies to any T, but implemented on &Wrapper<T> -> one extra autoref needed.
trait ViaFallback {
    fn tell(&self) -> &'static str;
}
impl<T> ViaFallback for &Wrapper<T> {
    fn tell(&self) -> &'static str { "fallback" }
}

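fn main() {
    let shown = Wrapper(42_i32);         // i32: Display
    let hidden = Wrapper(vec![1_u8, 2]); // Vec<u8> is not Display

    // "display" if the wrapped value is Display, otherwise "fallback"
    assert_eq!((&shown).tell(), "display");
    assert_eq!((&hidden).tell(), "fallback");
}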
```

Call `(&w).tell()` and the compiler tries to match on the receiver type `&Wrapper<T>`. If `T: Display`, the `ViaDisplay` impl attached to `Wrapper<T>` matches _without any extra reference_, so it wins. If `T` isn't `Display`, that candidate disappears, and resolution falls to `ViaFallback` on `&Wrapper<T>`, which takes one more reference to reach. The whole branch resolves at compile time, so the runtime cost is zero.
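
One consequence of deciding everything from the type at the call site is that the call site needs a concrete type. That is why the trick tends to live inside macros rather than generic functions. A minimal sketch of the difference, repeating the definitions above so it stands alone; the `tell!` macro and `tell_generic` function are hypothetical names added for illustration:

```rust
use std::fmt::Display;

struct Wrapper<T>(T);

trait ViaDisplay {
    fn tell(&self) -> &'static str;
}
impl<T: Display> ViaDisplay for Wrapper<T> {
    fn tell(&self) -> &'static str { "display" }
}

trait ViaFallback {
    fn tell(&self) -> &'static str;
}
impl<T> ViaFallback for &Wrapper<T> {
    fn tell(&self) -> &'static str { "fallback" }
}

// In a macro, the expression's concrete type is known where `.tell()` is resolved.
macro_rules! tell {
    ($e:expr) => {
        (&Wrapper($e)).tell()
    };
}

// In a generic function, `T` carries no `Display` bound, so the high-priority
// impl is never a candidate and the fallback is chosen for every `T`.
fn tell_generic<T>(value: T) -> &'static str {
    (&Wrapper(value)).tell()
}
```

Here `tell!(42_i32)` yields `"display"`, while `tell_generic(42_i32)` yields `"fallback"` even though `i32` is `Display`: inside the function body, the compiler only knows what the bounds on `T` say.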

In practice you usually add one more step. You have each impl return a distinct **tag type (a zero-sized type)**, then hang another trait method off that tag. Now "which path did we take" propagates as a _type_ rather than a value, and the real work happens on top of that tag.

```rust
struct TagHigh;
struct TagLow;

// Step 1: encode which impl was selected as a tag type.
// Step 2: dispatch the real work on that tag.
let tag = (&wrapper).kind();   // TagHigh or TagLow
let out = tag.run(raw);        // handle `raw` differently per tag
```

That's the whole pattern. Once it's in your hands, you can draw "branches that split on type at compile time" wherever you like, all on stable.
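
Spelled out in full, the two steps might look like the sketch below. The trait names `ViaHigh` and `ViaLow`, the `describe!` macro, and the choice of `Display` as the high-priority condition are assumptions made for illustration:

```rust
use std::fmt::Display;

struct Wrapper<T>(T);

struct TagHigh;
struct TagLow;

// Step 1: the same autoref priority as before, but each impl only reports a tag.
trait ViaHigh {
    fn kind(&self) -> TagHigh { TagHigh }
}
impl<T: Display> ViaHigh for Wrapper<T> {}

trait ViaLow {
    fn kind(&self) -> TagLow { TagLow }
}
impl<T> ViaLow for &Wrapper<T> {}

// Step 2: the real work hangs off the tag, and each `run` states its own bounds.
impl TagHigh {
    fn run<T: Display>(self, raw: T) -> String {
        format!("display: {}", raw)
    }
}
impl TagLow {
    fn run<T>(self, _raw: T) -> String {
        String::from("opaque value")
    }
}

// Hypothetical macro tying the two steps together.
macro_rules! describe {
    ($e:expr) => {{
        let wrapper = Wrapper($e);
        let tag = (&wrapper).kind(); // TagHigh or TagLow, decided by type
        tag.run(wrapper.0)           // take the value back out and dispatch on the tag
    }};
}
```

With this in place, `describe!(7_i32)` produces `"display: 7"` and `describe!(vec![1_u8])` produces `"opaque value"`. Because the tag is a plain value with no borrow attached, the original expression can be moved into `run` afterwards.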

## Proof it isn't a toy: anyhow

The best evidence that this technique holds up in the real world is `anyhow`. Pass a single argument to the `anyhow!(...)` macro, and if it's an error type implementing `std::error::Error`, the macro wraps it as is; if not, it treats it as a message or format string. The user never has to pick a different macro for the two cases. Whether you write `anyhow!(err)` or `anyhow!("failed: {code}")`, it just works.

That "the user never has to think about it" experience is the whole point. A macro's ergonomics come down to _how many cases it absorbs on the user's behalf_, and autoref specialization makes that possible on stable, at zero cost. A good part of why anyhow is so widely used is this smoothness.
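
To make the shape of that dispatch concrete, here is a stripped-down sketch in the same spirit. It is not anyhow's actual source: `Report`, `ViaError`, `ViaMessage`, the tag types, and the `report!` macro are all hypothetical names, and the real crate handles more cases than this does.

```rust
use std::error::Error;
use std::fmt::Display;

// Hypothetical error container standing in for a type like anyhow::Error.
#[derive(Debug)]
enum Report {
    Source(Box<dyn Error + 'static>),
    Message(String),
}

struct ErrorTag;
struct MessageTag;

// High priority: implemented on E itself, reached from `&value` with no autoref.
trait ViaError {
    fn report_kind(&self) -> ErrorTag { ErrorTag }
}
impl<E: Error + 'static> ViaError for E {}

// Low priority: implemented on &M, so `&value` needs one more reference.
trait ViaMessage {
    fn report_kind(&self) -> MessageTag { MessageTag }
}
impl<M: Display> ViaMessage for &M {}

impl ErrorTag {
    fn build<E: Error + 'static>(self, err: E) -> Report {
        Report::Source(Box::new(err))
    }
}
impl MessageTag {
    fn build<M: Display>(self, msg: M) -> Report {
        Report::Message(msg.to_string())
    }
}

macro_rules! report {
    ($e:expr) => {{
        let value = $e;
        // The tag carries no borrow, so `value` can be moved right after.
        (&value).report_kind().build(value)
    }};
}
```

Passing a `std::io::Error` to `report!` lands in `Report::Source`; passing a `String` or a `&str` lands in `Report::Message`. The caller writes the same macro either way.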

## Where we used it (1): absorbing the four block shapes of a caching macro

We built a macro that stores results in a persistent cache. The user wants to express "this computation is expensive, so cache it and pull it from the cache next time" in a single line. Set the incidental arguments (key, dependencies, and so on) aside for a moment; the core looks like this.

```rust
let x = cached!(..., { compute() });
```

The problem is that the shape of the block the user puts inside the braces isn't just one thing. There are two independent axes.

- **Sync / async:** it might be a block that simply computes a value, or one that uses `.await` inside and ultimately produces a `Future`.
- **Infallible / fallible:** it might just return a value, or it might return a `Result<T, E>` so it can use `?` partway through.

Multiply the two axes and you get four block shapes: sync-infallible, sync-fallible, async-infallible, async-fallible. We didn't want to split this into four macros, or tell the user "this is the async version, use a different macro." The user should just write the block however feels natural to them, and the macro should adapt.

So we absorbed both axes with autoref specialization. We wrap the block's evaluated result and first work out whether it is still a `Future` that needs awaiting or an already-ready value, awaiting it if needed. Then we work out whether the result is a `Result<T, E>` or a plain `T`, and handle it accordingly.

```rust
// Normalize the block's result: await it if it's still a Future, unwrap it (like `?`)
// if it's a Result, so every input shape merges into one "value + maybe-error" path.
let wrapper = Wrapper::new(&raw);
let tag = (&wrapper).result_kind();        // TagResult or TagItem
let normalized = tag.execute_result(raw);  // Result -> map error into the cache's error type; value -> wrap in Ok
```

If it's identified as a `Result`, we convert the user's error into the cache library's error type and pass it along; if it's an ordinary value, we wrap it in `Ok` and merge it onto the same path. The `Future` axis works the same way. All four input shapes converge onto a single shared one-liner inside the macro, and the user never has to care whether they used `?`, `.await`, both, or neither inside the block.
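
A self-contained sketch of the `Result` axis, for the synchronous case only. The names `Wrapper::new`, `result_kind`, `execute_result`, `TagResult`, and `TagItem` follow the snippet above; `CacheError`, the trait names, the `E: Display` bound, and the `normalize!` macro are assumptions, since the real macro's internals aren't shown here:

```rust
use std::fmt::Display;
use std::marker::PhantomData;

// Hypothetical stand-in for the cache library's error type.
#[derive(Debug, PartialEq)]
struct CacheError(String);

// Carries only the type of the block's result, so `raw` itself stays free to move.
struct Wrapper<T>(PhantomData<T>);
impl<T> Wrapper<T> {
    fn new(_raw: &T) -> Self { Wrapper(PhantomData) }
}

struct TagResult;
struct TagItem;

// High priority: the block produced a Result whose error can be printed.
trait ViaResult {
    fn result_kind(&self) -> TagResult { TagResult }
}
impl<T, E: Display> ViaResult for Wrapper<Result<T, E>> {}

// Low priority (one extra autoref): the block produced anything else.
trait ViaItem {
    fn result_kind(&self) -> TagItem { TagItem }
}
impl<T> ViaItem for &Wrapper<T> {}

impl TagResult {
    // Map the user's error into the cache's error type.
    fn execute_result<T, E: Display>(self, raw: Result<T, E>) -> Result<T, CacheError> {
        raw.map_err(|e| CacheError(e.to_string()))
    }
}
impl TagItem {
    // A plain value joins the same path wrapped in Ok.
    fn execute_result<T>(self, raw: T) -> Result<T, CacheError> {
        Ok(raw)
    }
}

macro_rules! normalize {
    ($block:block) => {{
        let raw = $block;
        let wrapper = Wrapper::new(&raw);
        let tag = (&wrapper).result_kind();
        tag.execute_result(raw)
    }};
}
```

Both `normalize!({ 2_i32 + 2 })` and `normalize!({ "42".parse::<i32>() })` come out as a `Result<i32, CacheError>`, so whatever follows in the macro only has to deal with one type.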

What's fun here is that we _composed_ two specialization axes. Autoref specialization splits on only one axis at a time, but by passing tag types through stage by stage, you can apply several axes in sequence and cover the entire combinatorial space.
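
The `Future` axis can be sketched the same way. Everything named here (`Probe`, `ViaFuture`, `ViaReady`, the tags, `as_future!`) is hypothetical, and `poll_once` is demo scaffolding only, so the example runs without an async runtime. In a real macro the expression that `as_future!` yields would be followed by `.await`, and the awaited value would then feed the `Result`-or-plain stage described above.

```rust
use std::future::{ready, Future, Ready};
use std::marker::PhantomData;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Carries only the type of the block's result.
struct Probe<T>(PhantomData<T>);
fn probe<T>(_raw: &T) -> Probe<T> { Probe(PhantomData) }

struct TagFuture;
struct TagReady;

// High priority: the block's result is itself a Future.
trait ViaFuture {
    fn future_kind(&self) -> TagFuture { TagFuture }
}
impl<F: Future> ViaFuture for Probe<F> {}

// Low priority (one extra autoref): the value is already there.
trait ViaReady {
    fn future_kind(&self) -> TagReady { TagReady }
}
impl<T> ViaReady for &Probe<T> {}

impl TagFuture {
    fn into_future<F: Future>(self, raw: F) -> F { raw }
}
impl TagReady {
    fn into_future<T>(self, raw: T) -> Ready<T> { ready(raw) }
}

// Either shape becomes "something awaitable", so later stages are shared.
macro_rules! as_future {
    ($e:expr) => {{
        let raw = $e;
        let tag = (&probe(&raw)).future_kind();
        tag.into_future(raw)
    }};
}

// Demo scaffolding: poll a future once with a waker that does nothing.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut pinned = Box::pin(fut);
    match pinned.as_mut().poll(&mut cx) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}
```

`poll_once(as_future!(40_i32 + 2))` and `poll_once(as_future!(ready(40_i32 + 2)))` both give `Some(42)`: the plain value is wrapped in `ready`, and the value that is already a future passes through untouched.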

## Where we used it (2): molding a TOML document into a Rust struct

The second case is the bigger one. The job: **take a TOML document and generate code that builds a strongly-typed Rust struct out of it.** The config values don't stay as a dynamic `Value` tree — they get assembled into the actual struct the program declared.

Here's what makes it tricky. The target is a _fixed_ struct, with a definite type and definite fields. But it's _unknown to the code generator_: it's whatever struct the user pointed us at, and the generator only ever sees the TOML, never the type definition. So the generator just walks the TOML tree and emits construction code that mirrors its structure.

The snag is a pattern in nearly every Rust config struct: some fields are `Option<T>`, some are plain `T`.

```rust
struct ServerConfig {
    name: String,
    timeout: Option<u32>,
    // ...
}
```

In the TOML, `name = "x"` and `timeout = 30` look identical — a key with a scalar value. But the generated code has to emit `name: "x".into()` for one and `timeout: Some(30)` for the other, and the generator can't tell them apart: it doesn't know the field types.

Settling this at codegen time would mean threading the struct definition through the generator and emitting different code per field — and every other shape it has to handle (nested structs, vectors, enums, …) multiplies that branching until the generator is a monster you're afraid to touch.

So we did the opposite. The generator emits _one shape of code for every field_, and the `Option`-or-not decision is pushed entirely onto the type system. The generated code rests on just three pieces, none of which inspect a field's type:

- `set` — write a value into a field. If the field is `Option<U>`, wrap the value in `Some`; if it's plain `U`, assign directly.
- `peel` — hand back a `&mut` into a field so the recursion can drill deeper. If the field is `Option<U>`, fill it with a default and return `&mut U` of the inside; if it's plain `U`, just return `&mut U`.
- `navigate!` — a helper macro that expands into a chain of `peel`s, one per step of the path, walking down to the target slot and transparently stripping any `Option` layer it crosses along the way.

With those, the generated code is always the same shape, whatever the field:

```rust
// The generator never looks at the field's type. Always this:
navigate!(root.field).set(value);
```
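
To show how little the generator needs to know, here is a minimal sketch of the emitting side. The `Node` enum is a hypothetical stand-in for a parsed TOML tree (a real generator would get this from a TOML parser), and `emit` and `generate` are illustrative names, not our actual generator:

```rust
// Hypothetical stand-in for a parsed TOML document.
enum Node {
    Int(i64),
    Str(String),
    Table(Vec<(String, Node)>),
}

// Walks the tree and emits the same statement shape for every leaf.
// Nothing here knows, or asks, what type the target field has.
fn emit(path: &mut Vec<String>, node: &Node, out: &mut Vec<String>) {
    match node {
        Node::Table(entries) => {
            for (key, child) in entries {
                path.push(key.clone());
                emit(path, child, out);
                path.pop();
            }
        }
        Node::Int(n) => {
            out.push(format!("navigate!({}).set({});", path.join("."), n));
        }
        Node::Str(s) => {
            // {:?} prints the string as a quoted, escaped Rust literal.
            out.push(format!("navigate!({}).set({:?}.into());", path.join("."), s));
        }
    }
}

// `root` is the name of the variable holding the struct being filled in.
fn generate(root: &str, doc: &Node) -> Vec<String> {
    let mut path = vec![root.to_string()];
    let mut out = Vec::new();
    emit(&mut path, doc, &mut out);
    out
}
```

For a document with `name = "x"` and `timeout = 30`, `generate("root", &doc)` returns `navigate!(root.name).set("x".into());` and `navigate!(root.timeout).set(30);`. The two lines have the same shape even though one field is a `String` and the other an `Option<u32>`.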

`set` (and `peel` likewise) splits via autoref specialization:

```rust
// navigate!(...) hands back a handle pointing at one field.
struct Slot<'a, T>(&'a mut T);

// High priority: the field is Option<U> -> wrap the value in Some.
trait SetOpt<U> {
    fn set(&mut self, value: U);
}
impl<U> SetOpt<U> for Slot<'_, Option<U>> {
    fn set(&mut self, value: U) { *self.0 = Some(value); }
}

// Low priority (one extra autoref): the field is plain U -> assign directly.
trait SetPlain<U> {
    fn set(&mut self, value: U);
}
impl<U> SetPlain<U> for &mut Slot<'_, U> {
    fn set(&mut self, value: U) { *self.0 = value; }
}

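fn main() {
    let mut optional: Option<i32> = None;
    let mut plain: i32 = 0;

    // One call-site shape, both field types:
    (&mut Slot(&mut optional)).set(10);
    (&mut Slot(&mut plain)).set(10);

    assert_eq!(optional, Some(10)); // field: Option<i32>  =>  Some(10)
    assert_eq!(plain, 10);          // field: i32          =>  10
}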
```

A `Slot<'_, Option<U>>` matches the high-priority `SetOpt` impl with no extra autoref, so it wins; a plain `Slot<'_, U>` finds nothing there and falls through to `SetPlain` on `&mut Slot`. `peel` is built the same way — the `Option<U>` impl does a `get_or_insert_with(U::default)` before handing back the inner reference, the plain `U` impl returns the reference as-is — so the recursion can step into a nested field without ever knowing whether it had to cross an `Option` to get there.
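
A runnable sketch of `peel` and `navigate!` under some stated assumptions. To keep the lifetimes simple it uses the tag-type variant of the trick rather than methods on `Slot`, and its `navigate!` yields a plain `&mut` to the last field instead of a handle with `set`. `Probe`, `ViaOpt`, `ViaPlain`, the tags, and the example structs are all hypothetical:

```rust
use std::marker::PhantomData;

// Carries only the slot's type, so the `&mut` itself stays free to move.
struct Probe<T>(PhantomData<T>);
fn probe<T>(_slot: &T) -> Probe<T> { Probe(PhantomData) }

struct TagOpt;
struct TagPlain;

// High priority: the slot is Option<U>.
trait ViaOpt {
    fn peel_kind(&self) -> TagOpt { TagOpt }
}
impl<U> ViaOpt for Probe<Option<U>> {}

// Low priority (one extra autoref): the slot is anything else.
trait ViaPlain {
    fn peel_kind(&self) -> TagPlain { TagPlain }
}
impl<T> ViaPlain for &Probe<T> {}

impl TagOpt {
    // Fill a missing value with its default, then step inside.
    fn peel<U: Default>(self, slot: &mut Option<U>) -> &mut U {
        slot.get_or_insert_with(U::default)
    }
}
impl TagPlain {
    // Nothing to strip: hand the reference back unchanged.
    fn peel<U>(self, slot: &mut U) -> &mut U {
        slot
    }
}

macro_rules! peel {
    ($slot:expr) => {{
        let slot = $slot; // a `&mut` to the current field
        let tag = (&probe(&*slot)).peel_kind();
        tag.peel(slot)
    }};
}

// Expands to one `peel!` per intermediate step and yields `&mut` to the last field.
macro_rules! navigate {
    ($root:ident . $first:ident $(. $rest:ident)*) => {{
        let slot = &mut $root.$first;
        $( let slot = &mut peel!(slot).$rest; )*
        slot
    }};
}

#[derive(Default)]
struct Tls { port: u16 }
#[derive(Default)]
struct Limits { max_conns: u32 }
#[derive(Default)]
struct Server { tls: Option<Tls>, limits: Limits }
#[derive(Default)]
struct Config { server: Option<Server> }
```

Starting from `let mut cfg = Config::default();`, the statement `*navigate!(cfg.server.tls.port) = 8443;` crosses two `Option` layers, filling each with a default on the way, and `*navigate!(cfg.server.limits.max_conns) = 64;` crosses one `Option` and one plain struct. The call sites are written identically in both cases.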

The payoff: the generator stays a dumb recursion that only sees TOML structure, and every "is this field optional?" question is answered at compile time from the type at the call site.

Why go to this trouble when serde already turns TOML into structs? Because serde does it at runtime: every time the program starts, it parses the document, walks a `Value` tree, and matches it against the target type. If the config doesn't fit, you find out then, as a startup crash. Our generator does all of that at build time and emits a plain construction expression: `ServerConfig { name: "x".into(), timeout: Some(30) }`. A mistyped value or a missing field stops being a runtime failure and becomes a compile error. The generator doesn't have to know which fields are `Option` and which aren't. Autoref specialization settles that at each call site, at compile time, from the type alone.

## To sum up

Where autoref specialization shines is, ultimately, **when you have to absorb a diversity of types at an API boundary but don't want to push that diversity onto the user (or the code generator).**

- When a macro has to take a block the same way no matter which combination of sync/async, infallible/fallible it is
- When you want a code generator to emit one uniform kind of code without knowing whether the target type is `Option`-wrapped, needs a conversion, or has enum structure, and to let the type system fill in the rest
- When an argument might be an error type, or might just be a message (anyhow)

The common threads: it (1) works on the stable channel, (2) resolves at compile time so there's no runtime cost, and (3) lets you compose multiple axes via tag types to cover the entire combinatorial space. And above all, because it soaks up that diversity, _the outer interface gets simpler._ The user doesn't have to distinguish four block shapes, and the code generator only has to emit one kind of code.

Until nightly specialization stabilizes, this method-resolution trick looks like it'll stay useful for a good while yet. Thanks again to dtolnay, whose case-studies repo is a treasure trove of "wait, you can do _that_ in stable Rust?"