Blog.

Two white circles on a black background, separated by a thin vertical white line: a diffuse, blurred glow on the left and a crisp, sharply defined disc on the right.

The Prior

Suyoung Hwang · 2026-05-28

Everyone talks about models. This is about what we build around them to make AI a genuine participant in science, and why that scaffolding may matter more than any parameter count.

Bar chart comparing FrontierScience-Research pass rates for tested models in descending order by pass@ep; Thyla 1.0 ranks second at 33.9%, closely trailing GPT-5.4 Pro, while open-weight base models lag further behind.

Rivaling Frontier Models with an Open-Weight System on FrontierScience-Research

Seunghyun Moon, Johyun Park, Hojin Yoo, Suyoung Hwang · 2026-05-28

We built an open-weight multi-agent system that rivals frontier proprietary models on the most demanding scientific reasoning benchmark available — and the gap it closes has less to do with what the model knows than with how we put it in contact with what it needs to know.

Flowchart of a node cycle: a transcript feeds an LLM step, which either runs tool calls to produce the next transcript and loops back, or yields a final answer and terminates.

A New Primitive for Agentic Workflows

Jiho Park, Suyoung Hwang · 2026-05-26

The agent loop is a state machine. Our library exposes one step of it as a node: a single LLM call and the tool executions it triggers, with a typed output, a content-addressed cache, and callbacks at named points inside.

Humanity's Last Exam is Significantly Flawed

Hojin Yoo, Johyun Park · 2026-05-26

While Humanity's Last Exam (HLE) has been widely accepted as the definitive standard for benchmarking the scientific and reasoning capabilities of language models, a direct examination of its contents exposes a benchmark unworthy of the name.

Flow diagram of the control flow of a Rust async future: spawning, queuing, executing while polled by the scheduler, then completing, waiting in a pending state for IO or a timer, or being dropped prematurely on cancellation.

Cancel safety in Rust's Future

Jeongyun Moon, Donghyun Koh · 2026-05-23

Rust futures can be cancelled mid-execution, which can leave async programs in an inconsistent state. This post covers where cancellation comes from and how to write cancel-safe code.

Illustration of the Rust mascot crab holding a shield with a checkmark, connected by lines to code blocks, servers, and databases.

Why We Use Rust

Suyoung Hwang, Donghyun Koh · 2026-05-23

Why we standardized on Rust for the infrastructure around our ML systems — typed contracts, predictable concurrency, and build-verified schemas — and the tradeoffs that came with it.

Autoref method resolution: Rust tries receiver types w, &w, and &mut w in order and picks &w, the first match with the fewest references.

Faking Specialization on Stable Rust: Field Notes on Autoref Specialization

Jiho Park, Suyoung Hwang · 2026-05-22

Field notes on autoref specialization, a technique for faking specialization on stable Rust.

Converting Markdown to PDF Inside a Binary

Suyoung Hwang · 2026-05-15

We built Litho, a Rust library that converts Markdown to PDF using the Typst compiler as an in-process dependency. This post covers the design decisions, challenges, and tradeoffs we encountered along the way.

Line chart of end-to-end MoE layer throughput in TFLOP/s versus token count for DeepSeek V3 on a single B200 GPU; both AsteroMoE configurations outperform sonic-moe, DeepGEMM, and FlashInfer across all batch sizes.

AsteroMoE: A Faster MoE Kernel for Blackwell

Seungwon Kim · 2026-03-22

AsteroMoE is a new MoE kernel designed for Blackwell that achieves better performance than sonic-moe by using TMA gather4 instructions and a more efficient tile scheduler.