arXiv preprint arXiv:2509.23024 , year=

Melody Zixuan Li, Kumar Krishna Agrawal, Arna Ghosh, Komal Kumar Teru, Adam Santoro, Guillaume Lajoie · 2025 · arXiv 2509.23024

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

Dynamic Rollout Editing reduces overthinking in RL-trained LLMs by editing post-answer continuations in successful rollouts and preferring the edited versions within GRPO groups.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Scale Determines Whether Language Models Organize Representation Geometry for Prediction

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Representation geometry in language models aligns with the unembedding readout subspace in a scale-dependent manner, preserved throughout training in large models but progressively lost in late layers of small models despite continued loss improvement.

Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.

Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization

stat.ML · 2026-05-07 · unverdicted · novelty 6.0

Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model.

From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales

cs.LG · 2026-03-31

citing papers explorer

Showing 6 of 6 citing papers.

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models cs.CL · 2026-06-16 · unverdicted · none · ref 55
Dynamic Rollout Editing reduces overthinking in RL-trained LLMs by editing post-answer continuations in successful rollouts and preferring the edited versions within GRPO groups.
Uncovering the Latent Potential of Deep Intermediate Representations cs.LG · 2026-05-21 · unverdicted · none · ref 6
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
Scale Determines Whether Language Models Organize Representation Geometry for Prediction cs.LG · 2026-05-16 · unverdicted · none · ref 5
Representation geometry in language models aligns with the unembedding readout subspace in a scale-dependent manner, preserved throughout training in large models but progressively lost in late layers of small models despite continued loss improvement.
Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs cs.LG · 2026-05-12 · unverdicted · none · ref 37
LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization stat.ML · 2026-05-07 · unverdicted · none · ref 7
Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model.
From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales cs.LG · 2026-03-31 · unreviewed · ref 21

arXiv preprint arXiv:2509.23024 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer