LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

years: 2026 (11), 2025 (1)

representative citing papers

The Shape of Overthinking: Backtracking Bursts in Long Reasoning Traces

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

On 6000 Qwen3-8B AIME traces, late-clustered moderate-to-severe backtracks are more common in incorrect outputs, enabling prefix-causal burst-aware filtering that outperforms fixed-length cutoffs at shallow and intermediate depths.

Two-dimensional early exit optimisation of LLM inference

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.

All is Not Lost: LLM Recovery without Checkpoints

cs.DC · 2025-06-18 · conditional · novelty 7.0

CheckFree recovers intermediate stage failures in pipeline-parallel LLM training via neighbor averaging; CheckFree+ adds out-of-order execution to handle first/last stages by copying neighbors, with small embedding storage, outperforming checkpointing and redundancy at 5-10% failure rates by up to

Depth Exploration for LLM Decoding

cs.LG · 2026-06-28 · unverdicted · novelty 6.0

DEX replaces single-depth selection with parallel exploration over multiple candidate depths, committing the final-depth token while collapsing reusable states to reduce per-token computation.

Single-Pass, Depth-Selective Reading for Multi-Aspect Sentiment Analysis

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

DABS is a single-pass framework that builds a depth-ordered substrate from one Transformer encoding and performs lightweight aspect-conditioned readout, cutting computation by up to 60% on multi-aspect ATSA benchmarks while matching prior accuracy.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

Sparse Layers are Critical to Scaling Looped Language Models

cs.LG · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

Looped-MoE models scale better than dense looped or standard transformers because routing changes across loops, and they enable stronger compute-quality trade-offs via early exits at loop boundaries.
