RW-TTT: Batched Serving for Request-Owned Test-Time Training State

Han Chen; Hao Zhang; Jian Yang; Sirui Han; Yao Tian; Yike Guo; Zhizhuo Kou

arxiv: 2605.28053 · v1 · pith:NZ264K2Fnew · submitted 2026-05-27 · 💻 cs.LG


classification 💻 cs.LG

keywords: serving, state, owner, rw-ttt, batched, only, request-owned, test-time


Test-time training (TTT) adapts an LLM during generation by reading and updating request-owned state, such as fast weights, low-rank deltas, or streaming learner state. This breaks batched LLM serving, which assumes shared static weights: serial execution is correct but slow, while naive batching can corrupt request state. We formulate this problem as read-write TTT serving and present RW-TTT, which tags each decode step with its owner, version, and READ/WRITE effect, batches only compatible phases, and commits updates only to the owner. On one GPU with eight fast-weight InPlace-TTT streams, RW-TTT reaches 274.61 aggregate tok/s, 9.31x over sequential serving and 3.44x over per-stream replicas under the same memory budget. It preserves behavior on RULER, a long-context benchmark, and passes owner/version checks.
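The tagging and batching idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the names `Step`, `StateStore`, `form_batches`, and `run` are hypothetical, a single float stands in for per-request fast weights, and "compatible phases" is assumed here to mean "same READ/WRITE effect, with each owner's steps kept in their original order".

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class Step:
    owner: str      # request that owns the state this step touches
    version: int    # state version the step expects to observe
    effect: Effect  # READ uses the state, WRITE also produces an update


class StateStore:
    """Per-request state with a version counter (toy stand-in for fast weights)."""

    def __init__(self):
        self.state = {}
        self.version = defaultdict(int)

    def _check(self, step):
        if step.version != self.version[step.owner]:
            raise RuntimeError(f"version mismatch for owner {step.owner}")

    def read(self, step):
        self._check(step)
        return self.state.get(step.owner, 0.0)

    def commit(self, step, new_value):
        # Updates are written only to the step's owner and bump its version.
        self._check(step)
        self.state[step.owner] = new_value
        self.version[step.owner] += 1


def form_batches(steps):
    """Group steps with the same effect while preserving per-owner order."""
    batches = []
    last_batch = {}  # owner -> index of the latest batch holding one of its steps
    for step in steps:
        start = last_batch.get(step.owner, -1) + 1
        for i in range(start, len(batches)):
            if batches[i][0].effect == step.effect:
                batches[i].append(step)
                last_batch[step.owner] = i
                break
        else:
            batches.append([step])
            last_batch[step.owner] = len(batches) - 1
    return batches


def run(steps, store, update=lambda old: old + 1.0):
    """Execute batches in order; a real server would run each batch in one pass."""
    for batch in form_batches(steps):
        for step in batch:
            value = store.read(step)
            if step.effect is Effect.WRITE:
                store.commit(step, update(value))
```

In this sketch, two requests that each issue a READ followed by a WRITE end up in one READ batch and one WRITE batch, and a WRITE that carries an outdated version is rejected rather than applied to the owner's state.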
