The truth is in there: Improving reasoning in language models with layer-selective rank reduction. arXiv preprint arXiv:2312.13558

Pratyusha Sharma, Jordan T Ash, Dipendra Misra · 2023 · arXiv 2312.13558

10 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Tapered Language Models

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Tapered Language Models monotonically decrease MLP width across depth with a cosine schedule, yielding better perplexity and downstream performance than uniform-width baselines across multiple architectures and scales at no extra cost.

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Pretraining and alignment induce asymmetric geometric traces in transformer weights because alignment updates concentrate in read pathways due to activation covariance while write pathways inherit less structure from alignment losses.

Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

Transformer activations show spectral anti-concentration for concepts in the tail while syntax prefers high-variance directions, forming a dual geometry.

Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Uncertainty and correctness in LLMs are encoded by distinct feature populations, with suppression of confounded features improving accuracy and reducing entropy.

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

cs.CL · 2024-04-29 · unverdicted · novelty 7.0

Holmes is a probing benchmark compiling over 200 datasets from 270 studies to evaluate linguistic competence across syntax, morphology, semantics, reasoning, and discourse in more than 50 language models.

Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

DG-Hard uses Donoho-Gavish hard thresholding on the fine-tuning weight delta to separate task-aligned signal from noise-like residual, recovering damaged capabilities while preserving target-task gains.

Importance-Guided Basis Selection for Low-Rank Decomposition of Large Language Models

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

BSI ranks singular-vector bases for LLM low-rank compression by estimating expected task loss increase via second-order Taylor expansion of the loss and an efficient Hessian-diagonal estimator, outperforming magnitude-based baselines on math reasoning benchmarks.

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.

Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

cs.LG · 2026-04-04 · conditional · novelty 6.0

Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

cs.LG · 2026-03-10 · unverdicted · novelty 5.0

HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.

citing papers explorer

Showing 9 of 9 citing papers after filters.

Tapered Language Models · cs.LG · 2026-06-22 · unverdicted · none · ref 32
Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space · cs.LG · 2026-05-15 · unverdicted · none · ref 14
Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations · cs.LG · 2026-05-02 · unverdicted · none · ref 39
Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders · cs.LG · 2026-04-21 · unverdicted · none · ref 51
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models · cs.CL · 2024-04-29 · unverdicted · none · ref 10
Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining · cs.LG · 2026-05-19 · unverdicted · none · ref 41
Importance-Guided Basis Selection for Low-Rank Decomposition of Large Language Models · cs.LG · 2026-05-02 · unverdicted · none · ref 21
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing · cs.CL · 2026-04-21 · unverdicted · none · ref 7
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction · cs.LG · 2026-03-10 · unverdicted · none · ref 23
