SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

Chanyoung Park; Dongsoo Lee; Joonhyung Lee; Kanghoon Yoon; Minsub Kim; Se Jung Kwon; Sunghyeon Woo; Sungjae Lee; Yeonjun In

arxiv: 2510.02329 · v2 · pith:AQ3KOVKAnew · submitted 2025-09-26 · 💻 cs.CL · cs.AI


Kanghoon Yoon, Minsub Kim, Sungjae Lee, Joonhyung Lee, Sunghyeon Woo, Yeonjun In, Se Jung Kwon, Chanyoung Park, Dongsoo Lee



keywords: decoding, judge, model, selfjudge, target, tasks, across, diverse


Abstract

Speculative decoding accelerates LLM inference by verifying candidate tokens from a draft model against a larger target model. Recent judge decoding speeds this process up further by relaxing the verification criterion: it accepts draft tokens that may differ slightly from the target model's output. Existing methods, however, rely on human annotations or on tasks with verifiable ground truths, which limits their generalizability across diverse NLP tasks. We propose SelfJudge, which trains judge verifiers via self-supervision of the target model. Our method measures semantic preservation by assessing whether token-substituted responses preserve the meaning of the original responses, enabling automatic verifier training across diverse NLP tasks. Our experiments show that SelfJudge achieves better inference-accuracy trade-offs than judge decoding baselines, offering a broadly applicable solution for faster LLM inference.
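
The difference between strict and relaxed verification described in the abstract can be illustrated with a toy sketch. This is not the SelfJudge method or any library's API. The function `verify_draft`, the string tokens, and the synonym-based `toy_judge` are all hypothetical stand-ins chosen for illustration. A real judge would be a trained verifier, and the target tokens would come from a single forward pass of the target model over the draft sequence.

```python
from typing import Callable, List, Optional, Sequence


def verify_draft(
    draft_tokens: Sequence[str],
    target_tokens: Sequence[str],
    judge: Optional[Callable[[str, str], bool]] = None,
) -> List[str]:
    """Return the tokens emitted by one greedy verification step.

    draft_tokens:  tokens proposed by the draft model.
    target_tokens: the target model's greedy choice at each position, given the
                   draft prefix (assumed precomputed; hypothetical input).
    judge:         hypothetical relaxed acceptance rule; None means exact match.
    """
    emitted: List[str] = []
    for draft, target in zip(draft_tokens, target_tokens):
        exact = draft == target
        relaxed = judge is not None and judge(draft, target)
        if exact or relaxed:
            emitted.append(draft)   # keep the draft token
        else:
            emitted.append(target)  # first rejection: use the target's token
            break                   # and discard the rest of the draft
    return emitted


def toy_judge(draft: str, target: str) -> bool:
    """Toy stand-in for a judge: accept one hard-coded synonym pair."""
    return {draft, target} <= {"big", "large"}


draft = ["The", "big", "dog", "runs"]
target = ["The", "large", "dog", "sleeps"]

strict = verify_draft(draft, target)              # exact-match verification
relaxed = verify_draft(draft, target, toy_judge)  # judge-style verification
print(strict)
print(relaxed)
```

In this toy run, strict verification stops at the second token, while the relaxed rule keeps the draft going until a substitution the judge does not accept. Longer accepted runs are where the speedup comes from, and the quality of the judge determines how much accuracy is traded for it.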


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding
cs.CV · 2026-04 · unverdicted · novelty 7.0

Visual token pruning in MLLMs fails on complex reasoning because of Relevant Visual Information Shift during decoding; the training-free DSTP framework addresses this failure across models.