super hub Mixed citations

Gemma 2: Improving Open Language Models at a Practical Size

Cassidy Hardin, Gemma Team: Morgane Riviere, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju · 2024 · cs.CL · arXiv 2408.00118

Mixed citation behavior. Most common role is background (64%).

338 Pith papers citing it

Background 64% of classified citations

abstract

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.

citation-role summary

background: 23 · method: 6 · baseline: 2 · dataset: 1 · other: 1

citation-polarity summary

background: 21 · use method: 6 · unclear: 3 · baseline: 2 · use dataset: 1

authors

Cassidy Hardin, Gemma Team: Morgane Riviere, Léonard Hussenot, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju

representative citing papers

Do Activation Monitors Survive Model Updates? Benchmarking, Predicting, and Repairing Activation-Monitor Staleness

cs.LG · 2026-06-14 · unverdicted · novelty 8.0

Fine-tuning updates frequently render activation monitors for language model safety stale, whereas quantization does not; the degradation is predictable and repairable via label-free realignment.

Masked Generative Transformer Is What You Need for Image Editing

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EditMGT applies masked generative transformers with attention consolidation and region-hold sampling to deliver state-of-the-art localized image editing at 6x the speed of diffusion methods.

Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

Acceptance Cards is a new four-diagnostic standard for safe fine-tuning defense claims that requires statistical reliability, fresh semantic generalization, mechanism alignment, and cross-task transfer; under this protocol SafeLoRA fails the full-card pass on Gemma-2-2B-it.

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0 · 2 refs

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

DRIFTLENS: Measuring Memory-Induced Reasoning Drift in Personalized Language Models

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

DRIFTLENS quantifies memory-induced reasoning drift in personalized LLMs, finding medium-to-large effects across four models and ten user attributes that post-training only partly reduces.

Conditional Co-Ablation: Recovering Self-Repair Backups in Transformer Circuits

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

Conditional Co-Ablation recovers self-repair backup heads in transformers by scoring conditional ablation growth, raising ROC-AUC from 0.33 to 0.91 on the IOI circuit and transferring to induction across models.

FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

FRAME adds a learnable fractional-Fourier order per expert in a MoE-LoRA setup so that low-rank updates are placed in the domain where they are most compact, yielding gains over fixed-domain baselines on LLaMA-3.1-8B and Qwen2.5-7B.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

Fixed-clock optimizer memory turns equal-multiset data shuffle order into an O(η) source of fine-tuning noise, larger than the O(η²) effect in memoryless cases, with a fit-free sizing method derived.

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

NLL-guided layer selection identifies 1/4 of layers for full attention in hybrid models, matching periodic 1/2-FA baseline accuracy on LongMemEval with Qwen3-4B while halving the full-attention compute budget.

Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

cs.RO · 2026-06-26 · accept · novelty 7.0

VLA language backbones show high redundancy on manipulation benchmarks, with half the LLM blocks removable and even two blocks sufficient to recover baseline performance after fine-tuning, unlike vision and action pathways.

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

cs.LG · 2026-06-24 · unverdicted · novelty 7.0

During pretraining, language models exhibit natural ungrokking where learned rules are forgotten based on their support frequency in the corpus, with asymmetric editability of rule survival.

FinRED: An Expert-Guided Benchmark Generation and Evaluation Framework for Financial LLM Red-Teaming

cs.CR · 2026-06-18 · unverdicted · novelty 7.0

FinRED creates an expert-validated benchmark and rubric for financial LLM safety that maps regulatory standards to specific threats and reduces critical false negatives in evaluation from 28 to 12.

Comparing Linear Probes with Mahalanobis Cosine Similarity

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

For balanced Gaussian class projections, OOD AUROC is a linear function of MCS to the reference probe because both are sigmoid-shaped functions of the probe SNR on test data.

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

The normalized inverse-scale direction of LayerNorm's affine parameters is an exact algebraic kernel of the post-final-norm centred activation covariance for any input distribution in LayerNorm transformers.

MuseVLA: An Adaptive Multimodal Sensing Vision-Language-Action Model for Robotic Manipulation

cs.RO · 2026-06-16 · unverdicted · novelty 7.0

MuseVLA adds on-demand sensor selection via tokens and converts readings into grounded sensor images for multimodal fusion, reporting 80.6% average success on real-robot dexterous tasks that need non-visual sensing.

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

cs.LG · 2026-06-14 · unverdicted · novelty 7.0

KV caches function as notebooks of prefilled conclusions, enabling field-level edits that recover decisions (especially with CoT) and position-portable skill composition with near-identical outputs at O(L) cost.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates the first annotated dataset of 1,119 pairs, and proposes an enhanced LoRA method that outperforms baselines.

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

Doc-to-Atom: Learning to Compile and Compose Memory Atoms

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

Doc-to-Atom decomposes documents into composable micro-LoRA adapters selected by a query router for efficient long-context QA.

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

BenSyc is the first benchmark for conversational sycophancy in Bengali, with top LLMs achieving only 61.8 Macro-F1 on binary detection and 61.7 on five-class classification while often generating overly validating responses.

SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models

cs.CL · 2026-06-06 · unverdicted · novelty 7.0

SurgiQ is a new 13k-question surgical benchmark showing general-purpose LLMs reach 68.1% accuracy while most biomedical models lag and smaller models stay near random baseline.

citing papers explorer

Showing 50 of 98 citing papers after filters.

Do Activation Monitors Survive Model Updates? Benchmarking, Predicting, and Repairing Activation-Monitor Staleness cs.LG · 2026-06-14 · unverdicted · none · ref 100 · internal anchor
Fine-tuning updates frequently render activation monitors for language model safety stale, whereas quantization does not; the degradation is predictable and repairable via label-free realignment.
Conditional Co-Ablation: Recovering Self-Repair Backups in Transformer Circuits cs.LG · 2026-07-02 · unverdicted · none · ref 38 · internal anchor
Conditional Co-Ablation recovers self-repair backup heads in transformers by scoring conditional ablation growth, raising ROC-AUC from 0.33 to 0.91 on the IOI circuit and transferring to induction across models.
FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts cs.LG · 2026-06-30 · unverdicted · none · ref 62 · internal anchor
FRAME adds a learnable fractional-Fourier order per expert in a MoE-LoRA setup so that low-rank updates are placed in the domain where they are most compact, yielding gains over fixed-domain baselines on LLaMA-3.1-8B and Qwen2.5-7B.
Probing Memorization of Tabular In-Context Learning cs.LG · 2026-06-30 · unverdicted · none · ref 86 · internal anchor
A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.
Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise cs.LG · 2026-06-28 · unverdicted · none · ref 27 · internal anchor
Fixed-clock optimizer memory turns equal-multiset data shuffle order into an O(η) source of fine-tuning noise, larger than the O(η²) effect in memoryless cases, with a fit-free sizing method derived.
Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining cs.LG · 2026-06-24 · unverdicted · none · ref 11 · internal anchor
During pretraining, language models exhibit natural ungrokking where learned rules are forgotten based on their support frequency in the corpus, with asymmetric editability of rule survival.
Comparing Linear Probes with Mahalanobis Cosine Similarity cs.LG · 2026-06-17 · unverdicted · none · ref 19 · internal anchor
For balanced Gaussian class projections, OOD AUROC is a linear function of MCS to the reference probe because both are sigmoid-shaped functions of the probe SNR on test data.
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale cs.LG · 2026-06-17 · unverdicted · none · ref 42 · internal anchor
The normalized inverse-scale direction of LayerNorm's affine parameters is an exact algebraic kernel of the post-final-norm centred activation covariance for any input distribution in LayerNorm transformers.
Models Take Notes at Prefill: KV Cache Can Be Editable and Composable cs.LG · 2026-06-14 · unverdicted · none · ref 6 · internal anchor
KV caches function as notebooks of prefilled conclusions, enabling field-level edits that recover decisions (especially with CoT) and position-portable skill composition with near-identical outputs at O(L) cost.
Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving cs.LG · 2026-06-04 · unverdicted · none · ref 36 · internal anchor
Tangram makes non-uniform KV cache compression practical for LLM serving with deterministic budget allocation, head group paging, and ahead-of-time load balancing, achieving up to 2.6x throughput gains.
Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models cs.LG · 2026-06-01 · unverdicted · none · ref 20 · internal anchor
Defines representational capacity as the upper bound on distinguishable near-orthogonal directions in transformer latent spaces, derived from embedding similarity distributions and an adjusted Johnson-Lindenstrauss formula dependent on the k/d ratio.
Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention cs.LG · 2026-06-01 · conditional · none · ref 10 · internal anchor
Fixed block causal masks create reachability boundaries where representations depend only on block prefixes, formalized via dependency sets and phase-conditioned coverage functions, with a parameter-free boundary bridge repair.
ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions cs.LG · 2026-05-27 · unverdicted · none · ref 2 · internal anchor
ReSAEs improve multi-layer SAE interventions on Pythia-1.4B and Gemma-2-9B by training later-layer dictionaries on residuals after affine mapping, recovering more cross-entropy loss despite lower raw variance reconstruction.
GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving cs.LG · 2026-05-21 · unverdicted · none · ref 20 · internal anchor
GraphFlow uses a unified wGraph to dynamically instantiate workflows and manage KV caches for LLM agents, reporting 4.95 pp average gains and 4x memory reduction on five benchmarks.
Tensor Cache: Eviction-conditioned Associative Memory for Transformers cs.LG · 2026-05-21 · unverdicted · none · ref 15 · internal anchor
Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models cs.LG · 2026-05-20 · unverdicted · none · ref 27 · internal anchor
In 1-3B instruction-tuned LMs on GSM8K, arithmetic CoT readout is dominated by positional copying of the trailing number before the answer delimiter, accounting for 54-92 percentage points of accuracy.
Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding cs.LG · 2026-05-18 · unverdicted · none · ref 46 · internal anchor
Chronicle is the first model jointly pretrained from scratch on text and time series in a unified transformer that matches a comparable language model on NLU tasks and sets new bars for time series classification and multimodal forecasting.
PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction cs.LG · 2026-05-07 · unverdicted · none · ref 7 · 2 links · internal anchor
PLOT localizes causal variables in neural networks by fitting optimal transport couplings between abstract and neural intervention effect geometries, enabling fast handles or guided search.
Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA cs.LG · 2026-05-07 · unverdicted · none · ref 18 · internal anchor
GLoRA replaces raw factor averaging with gauge-aware aggregation in a consensus subspace estimated from client projectors, enabling consistent low-rank federated LoRA under heterogeneity.
Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression cs.LG · 2026-04-30 · unverdicted · none · ref 64 · internal anchor
Auto-FlexSwitch achieves efficient dynamic model merging by decomposing task vectors into sparse masks, signs, and scalars, then making the compression learnable via gating and adaptive bit selection with KNN-based retrieval.
How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models cs.LG · 2026-04-22 · unverdicted · none · ref 37 · internal anchor
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders cs.LG · 2026-04-21 · unverdicted · none · ref 53 · internal anchor
Uncertainty and correctness in LLMs are encoded by distinct feature populations, with suppression of confounded features improving accuracy and reducing entropy.
MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents cs.LG · 2026-04-03 · conditional · none · ref 8 · internal anchor
Joint training of a primary SAE with a meta SAE that applies a decomposability penalty on decoder directions produces more atomic latents, shown by 7.5% lower mean absolute phi and 7.6% higher fuzzing scores on GPT-2.
Activation Steering with a Feedback Controller cs.LG · 2025-10-05 · unverdicted · none · ref 7 · internal anchor
Popular LLM activation steering methods are shown to act as proportional controllers; a PID steering framework is proposed that improves robustness and outperforms baselines in experiments across model families.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 156 · internal anchor
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models cs.LG · 2024-10-07 · accept · none · ref 91 · internal anchor
LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.
Evidence for feature-specific error correction in LLMs cs.LG · 2026-06-23 · unverdicted · none · ref 13 · internal anchor
Perturbation experiments across six LLMs show activation robustness follows L^p norm with p>2 for feature directions (contrastive, MELBO, SAE) but p≈2 for random/PCA controls, indicating feature-specific error correction.
From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability cs.LG · 2026-06-16 · unverdicted · none · ref 11 · internal anchor
Derives an upper bound on frozen LM expected risk from proxy risk, SAE reconstruction gap, concept-pool mismatch and sparse complexity, with non-vacuous bounds observed on GPT-2, Gemma-2B and Llama-3-8B.
Beyond representational alignment with brain-guided language models for robust reasoning cs.LG · 2026-06-10 · unverdicted · none · ref 36 · internal anchor
Task-evoked brain signals enhance LLM reasoning performance via representation steering at inference and fine-tuning, yielding up to 13 percent accuracy gains orthogonal to language supervision.
When Behavioral Safety Evaluation Fails: A Representation-Level Perspective cs.LG · 2026-06-06 · unverdicted · none · ref 25 · internal anchor
Behavioral safety metrics for LLMs are insufficient because models can maintain safe outputs while remaining vulnerable to latent-space interventions, as shown via dissociated models and the new Latent Vulnerability Score.
When Attribution Patching Lies: Diagnosis and a Second-Order Correction cs.LG · 2026-06-05 · unverdicted · none · ref 41 · internal anchor
Dominant error in attribution patching arises from downstream non-linearities; a single HVP correction removes the leading error term and matches Integrated Gradients accuracy at lower cost across 124M-9B models.
Steering Vectors are an Adversarial Attack Surface cs.LG · 2026-06-04 · unverdicted · none · ref 2 · internal anchor
Poisoning 4-6% of tokens in activation steering datasets produces vectors that jailbreak LLMs with 20-55% attack success rate while preserving benign steering effects.
A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners cs.LG · 2026-06-02 · unverdicted · none · ref 23 · internal anchor
Supervised fine-tuning lets LLMs linearly encode action validity and state predicates, with broader state-space coverage during training improving world-model recovery.
CANARY: Zero-Label Detection of Fine-Tuning Contamination in Language Models cs.LG · 2026-06-01 · unverdicted · none · ref 6 · internal anchor
CANARY detects 1% fine-tuning contamination with AUROC 1.000 using SAE-filtered hidden states, 7.5x below output-level detection thresholds, with zero false positives on benign tuning.
Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models cs.LG · 2026-05-31 · unverdicted · none · ref 42 · internal anchor
UniTok tokenizes time series for an off-the-shelf LLM foundation model that unifies forecasting, generation, and classification through next-token prediction and training-free inference.
When and How Long? The Readout-Mediator Angle in Temporal Reasoning cs.LG · 2026-05-27 · unverdicted · none · ref 1 · internal anchor
Linear probes recover day-of-year from LM activations for temporal reasoning but are orthogonal to the model's causal 4D subspace identified by DAS, with the angle matching the Haar-uniform random null, replicated across scales and families.
MobileMoE: Scaling On-Device Mixture of Experts cs.LG · 2026-05-26 · unverdicted · none · ref 56 · internal anchor
MobileMoE introduces on-device MoE LLMs that match dense models with 2-4x fewer FLOPs and provide efficient smartphone inference.
General Preference Reinforcement Learning cs.LG · 2026-05-18 · unverdicted · none · ref 52 · 3 links · internal anchor
GPRL carries a k-dimensional skew-symmetric preference structure into policy updates with per-dimension advantages and a drift monitor, yielding 56.51% length-controlled win rate on AlpacaEval 2.0 from Llama-3-8B-Instruct while outperforming SimPO and SPPO on other benchmarks.
Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE) cs.LG · 2026-05-18 · unverdicted · none · ref 31 · 2 links · internal anchor
Aligned training reparameterizes SAEs to enforce unit alignment between encoder and decoder directions, yielding Pareto gains on SAEBench while removing dead features and improving stability.
Are Sparse Autoencoder Benchmarks Reliable? cs.LG · 2026-05-18 · unverdicted · none · ref 31 · internal anchor
An audit of SAEBench reveals that Targeted Probe Perturbation and Spurious Correlation Removal metrics fail reliability tests and should not be used to evaluate sparse autoencoders.
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry cs.LG · 2026-05-18 · unverdicted · none · ref 23 · internal anchor
Geometry-Lite decomposes LLM safety detection into layer-wise margin geometries and finds that persistent boundary positions, not layer-to-layer drift, drive most detection performance across nine models and seven benchmarks.
Scale Determines Whether Language Models Organize Representation Geometry for Prediction cs.LG · 2026-05-16 · unverdicted · none · ref 3 · internal anchor
Representation geometry in language models aligns with the unembedding readout subspace in a scale-dependent manner, preserved throughout training in large models but progressively lost in late layers of small models despite continued loss improvement.
Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility cs.LG · 2026-05-13 · unverdicted · none · ref 62 · internal anchor
SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.
Teacher-Guided Policy Optimization for On-Policy Reasoning Distillation under Large Policy Divergence cs.LG · 2026-05-13 · unverdicted · none · ref 15 · internal anchor
TGPO improves on-policy reasoning distillation in LLMs by using teacher-guided token generation conditioned on student contexts together with trajectory rewards, outperforming RKL-based methods under large policy divergence.
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy cs.LG · 2026-05-13 · unverdicted · none · ref 13 · 2 links · internal anchor
Base LLMs show multi-agent yield to peer pressure at rates equal to or higher than aligned models, localized by activation patching to mid-layers where attention dominates, with one dissenter cutting yield by 54-73 points while prompt defenses fail on variants.
Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation cs.LG · 2026-05-12 · unverdicted · none · ref 21 · internal anchor
RESD turns failure trajectories into token-level supervision via retrospective reflections and a persistent global playbook, enabling faster improvement than standard self-distillation or GRPO with only one rollout per prompt.
Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs cs.LG · 2026-05-12 · unverdicted · none · ref 63 · internal anchor
LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
Leveraging RAG for Training-Free Alignment of LLMs cs.LG · 2026-05-11 · unverdicted · none · ref 22 · internal anchor
RAG-Pref is a training-free RAG-based alignment technique that conditions LLMs on contrastive preference samples during inference, yielding over 3.7x average improvement in agentic attack refusals when combined with offline methods across five LLMs.
Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration cs.LG · 2026-05-09 · conditional · none · ref 16 · 2 links · internal anchor
ZO-MOPI accelerates zeroth-order LLM fine-tuning by applying partial spectral orthogonalization from power iteration inside a momentum-projected subspace to reduce variance and exploit dominant directions.
Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure cs.LG · 2026-05-09 · unverdicted · none · ref 11 · internal anchor
Causal dimensionality kappa of transformer layers grows sub-linearly with SAE width, remains invariant to model scale, and stays constant across depth while attribution thresholds drop sharply.
