Blog.

Bar chart comparing FrontierScience-Research pass rates for tested models in descending order by pass@ep; Spacer 1.0 ranks second at 33.9%, closely trailing GPT-5.4 Pro, while open-weight base models lag further behind.

Rivaling Frontier Models with an Open-Weight System on FrontierScience-Research

Seunghyun Moon, Johyun Park, Hojin Yoo, Suyoung Hwang · 2026-05-28

We built an open-weight multi-agent system that rivals frontier proprietary models on the most demanding scientific reasoning benchmark available — and the gap it closes has less to do with what the model knows than with how we put it in contact with what it needs to know.


Two white circles on a black background, separated by a thin vertical white line: a diffuse, blurred glow on the left and a crisp, sharply defined disc on the right.

The Prior

Suyoung Hwang · 2026-05-28

Everyone talks about models. This is about what we build around them to make AI a genuine participant in science, and why that scaffolding may matter more than any parameter count.


Flowchart of a node cycle: a transcript feeds an LLM step, which either runs tool calls to produce the next transcript and loops back, or yields a final answer and terminates.

A New Primitive for Agentic Workflows

Jiho Park, Suyoung Hwang · 2026-05-26

The agent loop is a state machine. Our library exposes one step of it as a node: a single LLM call and the tool executions it triggers, with a typed output, a content-addressed cache, and callbacks at named points inside.


The Mercedes-Benz logo beside two skeletal diagrams of "mercedesbenzene" — a fictitious molecule from an actual HLE question, drawn as a hexagonal ring with three bonds meeting at a central carbon to mimic the iconic three-pointed star.

Humanity's Last Exam is Significantly Flawed

Hojin Yoo, Johyun Park · 2026-05-26

While Humanity's Last Exam (HLE) has been widely accepted as the definitive standard for benchmarking the scientific and reasoning capabilities of language models, a direct examination of its contents exposes a benchmark unworthy of the name.


Diagram of the control flow of a Rust async future: spawning, queuing, executing while polled by the scheduler, then completing, waiting in a pending state for IO or a timer, or being dropped prematurely on cancellation.

Cancel safety in Rust's Future

Jeongyun Moon, Donghyun Koh · 2026-05-23

Rust futures can be cancelled mid-execution, which can leave async programs in an inconsistent state. This post covers where cancellation comes from and how to write cancel-safe code.


Illustration of the Rust mascot crab holding a shield with a checkmark, connected by lines to code blocks, servers, and databases.

Why We Use Rust

Suyoung Hwang, Donghyun Koh · 2026-05-23

Why we standardized on Rust for the infrastructure around our ML systems — typed contracts, predictable concurrency, and build-verified schemas — and the tradeoffs that came with it.


Autoref method resolution: Rust tries receiver types w, &w, and &mut w in order and picks &w, the first match with the fewest references.

Faking Specialization on Stable Rust: Field Notes on Autoref Specialization

Jiho Park, Suyoung Hwang · 2026-05-22

Field notes on autoref specialization, a technique for faking specialization on stable Rust.


Title card for "Converting Markdown to PDF Inside a Binary" with a Litho code snippet and a benchmark table: Litho runs in 2.14 ms with a 47 MB binary, versus 2.27 s and 253 MB for Chromium and 1.06 s and 226 MB for pandoc + tectonic.

Converting Markdown to PDF Inside a Binary

Suyoung Hwang · 2026-05-15

We built Litho, a Rust library that converts Markdown to PDF using the Typst compiler as an in-process dependency. This post covers the design decisions, challenges, and tradeoffs we encountered along the way.


Line chart of end-to-end MoE layer throughput in TFLOP/s versus token count for DeepSeek V3 on a single B200 GPU; both AsteroMoE configurations outperform sonic-moe, DeepGEMM, and FlashInfer across all batch sizes.

AsteroMoE: A Faster MoE Kernel for Blackwell

Seungwon Kim · 2026-03-22

AsteroMoE is a new MoE kernel designed for Blackwell that achieves higher throughput than sonic-moe by using TMA gather4 instructions and a more efficient tile scheduler.