hub Mixed citations

2 OLMo 2 Furious

Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora · 2024 · cs.CL · arXiv 2501.00656

Mixed citation behavior. Most common role is background (46%).

99 Pith papers citing it

Background 46% of classified citations

abstract

We present OLMo 2, the next generation of our fully open language models. OLMo 2 includes a family of dense autoregressive language models at 7B, 13B and 32B scales with fully released artifacts -- model weights, full training data, training code and recipes, training logs and thousands of intermediate checkpoints. In this work, we describe our modified model architecture and training recipe, focusing on techniques for achieving better training stability and improved per-token efficiency. Our updated pretraining data mixture introduces a new, specialized data mix called Dolmino Mix 1124, which significantly improves model capabilities across many downstream task benchmarks when introduced via late-stage curriculum training (i.e. specialized data during the annealing phase of pretraining). Finally, we incorporate best practices from Tülu 3 to develop OLMo 2-Instruct, focusing on permissive data and extending our final-stage reinforcement learning with verifiable rewards (RLVR). Our OLMo 2 base models sit at the Pareto frontier of performance to training compute, often matching or outperforming open-weight only models like Llama 3.1, Qwen 2.5, and Gemma 2 while using fewer FLOPs and with fully transparent training data, code, and recipe. Our fully open OLMo 2-Instruct models are competitive with open-weight only models of comparable size and even some proprietary models like GPT-3.5 Turbo and GPT 4o Mini.

citation-role summary

background: 9 · method: 3 · other: 1

citation-polarity summary

background: 6 · unclear: 3 · use method: 3 · support: 1

representative citing papers

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

cs.CL · 2026-07-02 · conditional · novelty 8.0

LACUNA is a new testbed that injects PII into predefined model parameters to benchmark the localization precision of LLM unlearning methods, revealing that SOTA approaches are imprecise despite strong output performance.

Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback

cs.LG · 2026-06-29 · unverdicted · novelty 8.0

Noisy expert imitation learning requires exponential samples for offline methods but polynomial for a variant of on-policy distillation under a noise condition.

DataComp-VLM: Improved Open Datasets for Vision-Language Models

cs.CV · 2026-06-26 · conditional · novelty 8.0 · 2 refs

DataComp-VLM benchmark shows instruction-heavy data mixing outperforms filtering for VLM training, with DCVLM-Baseline achieving 63.6% on 33 tasks for 8B models (+5.4pp over FineVision).

Scaling limit of the Random Language Model

cond-mat.dis-nn · 2026-06-26 · unverdicted · novelty 8.0

In the scaling limit of the Random Language Model, a condensation transition occurs at x_c=1/8 with explicit scaling laws for rule usage and entropy derived from large-deviation principles and a mapping to Random Energy Models.

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

cs.LG · 2026-06-01 · unverdicted · novelty 8.0

KV cache quantization silently erodes LLM safety alignment via vulnerable low-dimensional subspaces, diagnosed by Per-Channel Reduction into three failure modes and mitigated training-free with up to 97% recovery.

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

cs.AI · 2026-05-15 · unverdicted · novelty 8.0 · 2 refs

Presents the first fully open pipeline for clinical LLMs by unifying eight public QA datasets with three clinician-vetted synthetic extensions and applying it to five base models to achieve benchmark gains while maintaining auditability.

Demystifying the Silence of Correctness Bugs in PyTorch Compiler

cs.SE · 2026-04-09 · conditional · novelty 8.0

First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.

Spurious Rewards: Rethinking Training Signals in RLVR

cs.AI · 2025-06-12 · accept · novelty 8.0

Spurious rewards in RLVR can produce large gains in mathematical reasoning for certain language models via GRPO's clipping bias amplifying pretraining behaviors like code reasoning.

Purified OPSD: On-Policy Self-Distillation Without Losing How to Think

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

Purified OPSD subtracts a reference-only teacher's signal from standard OPSD supervision and applies PMI to create a cleaner distillation target, yielding gains on long-CoT models while preserving epistemic behavior.

Conditional Co-Ablation: Recovering Self-Repair Backups in Transformer Circuits

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

Conditional Co-Ablation recovers self-repair backup heads in transformers by scoring conditional ablation growth, raising ROC-AUC from 0.33 to 0.91 on the IOI circuit and transferring to induction across models.

Phase structure of the Random Language Model

cond-mat.dis-nn · 2026-06-26 · unverdicted · novelty 7.0

The Random Language Model exhibits a hierarchy of phase transitions in the double-scaling limit ε̃_d → 0, N → ∞ at fixed x = ε̃_d log N, with symbol correlations, non-uniform marginals, and glassy freezing, yielding scaling laws consistent with large language models.

Training for the Model You Return: Improving Optimization for Iterate-Averaged Language Models

cs.LG · 2026-06-23 · unverdicted · novelty 7.0

PACE is a clipped per-coordinate controller added to AdamW that improves the limiting error of the returned iterate average in both quadratic analysis and LM experiments.

Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

ModSleuth reconstructs dependency graphs from public artifacts for four LLM releases, recovering 1,060 source-verified dependencies and exposing license issues, train-evaluation coupling, and documentation gaps.

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

cs.CL · 2026-06-09 · unverdicted · novelty 7.0

Fragility, the activation noise level causing probe accuracy collapse, reveals evolving lexical-to-compositional moral encoding, layer robustness gradients, and fine-tuning differences invisible to saturated probing accuracy.

LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

LoopMoE is a looped MoE language model that outperforms matched vanilla MoE on 8 of 9 downstream benchmarks at 3B scale and continues to outperform at 9B scale under strictly controlled budgets.

Spectral Scaling Laws of Muon

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Muon momentum matrices show layer-dependent power-law scaling of stabilized singular value quantiles with model size from 77M to 2.8B parameters.

Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in Large Language Models

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Introduces Lexical Alignment Score and Triangulated Preference Shift metrics to automatically identify lexical overuse in LLMs and attribute portions to preference learning stages via windowed prevalence on PubMed data.

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

cs.CL · 2026-05-31 · unverdicted · novelty 7.0

MENTIS applies layerwise covariance torsion (T1), spectral torsion (T2), and ERA localization to paired IT/PA 7-8B models, finding selective larger shifts for normative concepts, negative correlation with entropy, and mid-to-late layer peaks.

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

Representational convergence across 16 LLMs on 800 reasoning problems is stronger for failed tasks and pre-decision stages but shows minimal causal influence on predictions, pointing to shared processing constraints over shared reasoning.

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

cs.CR · 2026-05-17 · unverdicted · novelty 7.0

Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.

How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation

cs.LG · 2026-05-12 · conditional · novelty 7.0

Self-distillation token rewards measure input-response-feedback pointwise mutual information, and CREDIT extracts the input-specific component with contrastive baselines to improve LLM reasoning performance.

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

RL on binary rewards boosts LLM factual recall by ~27% relative across models by redistributing probability mass to latent correct answers rather than acquiring new knowledge.

Implicit Representations of Grammaticality in Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

Linear probes on LM hidden states detect grammaticality better than string probabilities, generalize to human benchmarks and other languages, and correlate weakly with likelihood.

citing papers explorer

Showing 50 of 99 citing papers.

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning cs.CL · 2026-07-02 · conditional · none · ref 16 · internal anchor
LACUNA is a new testbed that injects PII into predefined model parameters to benchmark the localization precision of LLM unlearning methods, revealing that SOTA approaches are imprecise despite strong output performance.
Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback cs.LG · 2026-06-29 · unverdicted · none · ref 43 · internal anchor
Noisy expert imitation learning requires exponential samples for offline methods but polynomial for a variant of on-policy distillation under a noise condition.
DataComp-VLM: Improved Open Datasets for Vision-Language Models cs.CV · 2026-06-26 · conditional · none · ref 228 · 2 links · internal anchor
DataComp-VLM benchmark shows instruction-heavy data mixing outperforms filtering for VLM training, with DCVLM-Baseline achieving 63.6% on 33 tasks for 8B models (+5.4pp over FineVision).
Scaling limit of the Random Language Model cond-mat.dis-nn · 2026-06-26 · unverdicted · none · ref 43 · internal anchor
In the scaling limit of the Random Language Model, a condensation transition occurs at x_c=1/8 with explicit scaling laws for rule usage and entropy derived from large-deviation principles and a mapping to Random Energy Models.
Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation cs.LG · 2026-06-01 · unverdicted · none · ref 73 · internal anchor
KV cache quantization silently erodes LLM safety alignment via vulnerable low-dimensional subspaces, diagnosed by Per-Channel Reduction into three failure modes and mitigated training-free with up to 97% recovery.
Fully Open Meditron: An Auditable Pipeline for Clinical LLMs cs.AI · 2026-05-15 · unverdicted · none · ref 35 · 2 links · internal anchor
Presents the first fully open pipeline for clinical LLMs by unifying eight public QA datasets with three clinician-vetted synthetic extensions and applying it to five base models to achieve benchmark gains while maintaining auditability.
Demystifying the Silence of Correctness Bugs in PyTorch Compiler cs.SE · 2026-04-09 · conditional · none · ref 33 · internal anchor
First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.
Spurious Rewards: Rethinking Training Signals in RLVR cs.AI · 2025-06-12 · accept · none · ref 2 · internal anchor
Spurious rewards in RLVR can produce large gains in mathematical reasoning for certain language models via GRPO's clipping bias amplifying pretraining behaviors like code reasoning.
Purified OPSD: On-Policy Self-Distillation Without Losing How to Think cs.AI · 2026-07-02 · unverdicted · none · ref 14 · internal anchor
Purified OPSD subtracts a reference-only teacher's signal from standard OPSD supervision and applies PMI to create a cleaner distillation target, yielding gains on long-CoT models while preserving epistemic behavior.
Conditional Co-Ablation: Recovering Self-Repair Backups in Transformer Circuits cs.LG · 2026-07-02 · unverdicted · none · ref 39 · internal anchor
Conditional Co-Ablation recovers self-repair backup heads in transformers by scoring conditional ablation growth, raising ROC-AUC from 0.33 to 0.91 on the IOI circuit and transferring to induction across models.
Phase structure of the Random Language Model cond-mat.dis-nn · 2026-06-26 · unverdicted · none · ref 38 · internal anchor
The Random Language Model exhibits a hierarchy of phase transitions in the double-scaling limit ε̃_d → 0, N → ∞ at fixed x = ε̃_d log N, with symbol correlations, non-uniform marginals, and glassy freezing, yielding scaling laws consistent with large language models.
Training for the Model You Return: Improving Optimization for Iterate-Averaged Language Models cs.LG · 2026-06-23 · unverdicted · none · ref 27 · internal anchor
PACE is a clipped per-coordinate controller added to AdamW that improves the limiting error of the returned iterate average in both quadratic analysis and LM experiments.
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs cs.CL · 2026-06-10 · unverdicted · none · ref 46 · internal anchor
ModSleuth reconstructs dependency graphs from public artifacts for four LLM releases, recovering 1,060 source-verified dependencies and exposing license issues, train-evaluation coupling, and documentation gaps.
When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis cs.CL · 2026-06-09 · unverdicted · none · ref 33 · internal anchor
Fragility, the activation noise level causing probe accuracy collapse, reveals evolving lexical-to-compositional moral encoding, layer robustness gradients, and fine-tuning differences invisible to saturated probing accuracy.
LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling cs.LG · 2026-06-03 · unverdicted · none · ref 40 · internal anchor
LoopMoE is a looped MoE language model that outperforms matched vanilla MoE on 8 of 9 downstream benchmarks at 3B scale and continues to outperform at 9B scale under strictly controlled budgets.
Spectral Scaling Laws of Muon cs.LG · 2026-06-02 · unverdicted · none · ref 12 · internal anchor
Muon momentum matrices show layer-dependent power-law scaling of stabilized singular value quantiles with model size from 77M to 2.8B parameters.
Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in Large Language Models cs.CL · 2026-06-02 · unverdicted · none · ref 21 · internal anchor
Introduces Lexical Alignment Score and Triangulated Preference Shift metrics to automatically identify lexical overuse in LLMs and attribute portions to preference learning stages via windowed prevalence on PubMed data.
MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models cs.CL · 2026-05-31 · unverdicted · none · ref 12 · internal anchor
MENTIS applies layerwise covariance torsion (T1), spectral torsion (T2), and ERA localization to paired IT/PA 7-8B models, finding selective larger shifts for normative concepts, negative correlation with entropy, and mid-to-late layer peaks.
Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning cs.CL · 2026-05-22 · unverdicted · none · ref 15 · internal anchor
Representational convergence across 16 LLMs on 800 reasoning problems is stronger for failed tasks and pre-decision stages but shows minimal causal influence on predictions, pointing to shared processing constraints over shared reasoning.
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback cs.CR · 2026-05-17 · unverdicted · none · ref 25 · internal anchor
Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.
How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization cs.LG · 2026-05-13 · unverdicted · none · ref 44 · internal anchor
The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.
From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation cs.LG · 2026-05-12 · conditional · none · ref 26 · internal anchor
Self-distillation token rewards measure input-response-feedback pointwise mutual information, and CREDIT extracts the input-specific component with contrastive baselines to improve LLM reasoning performance.
Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs cs.CL · 2026-05-08 · unverdicted · none · ref 28 · internal anchor
RL on binary rewards boosts LLM factual recall by ~27% relative across models by redistributing probability mass to latent correct answers rather than acquiring new knowledge.
Implicit Representations of Grammaticality in Language Models cs.CL · 2026-05-06 · unverdicted · none · ref 16 · internal anchor
Linear probes on LM hidden states detect grammaticality better than string probabilities, generalize to human benchmarks and other languages, and correlate weakly with likelihood.
The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining cs.CY · 2026-05-01 · unverdicted · none · ref 9 · internal anchor
Full development of 7B and 32B Olmo 3 models used 12.3 GWh datacenter energy and emitted 4,251 tCO2eq, with development overheads accounting for 82% of compute and reasoning models costing 17x more to post-train than instruction-tuned ones.
Characterizing the Expressivity of Local Attention in Transformers cs.CL · 2026-05-01 · unverdicted · none · ref 34 · 3 links · internal anchor
Local attention in fixed-precision transformers introduces a second past operator in linear temporal logic, strictly increasing expressivity over global attention alone, with hybrids being most expressive.
Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers cs.LG · 2026-04-26 · unverdicted · none · ref 7 · internal anchor
In LLM feed-forward networks, the top 1% of channels per layer carry a median 58.7% of loss sensitivity, forming supernodes whose protection enables effective 50% sparsity pruning with much lower perplexity than baselines.
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training cs.CV · 2026-04-21 · unverdicted · none · ref 15 · internal anchor
EmbodiedMidtrain mid-trains VLMs on curated VLA-aligned data subsets to improve downstream performance on robot manipulation benchmarks.
Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models cs.CL · 2026-04-11 · unverdicted · none · ref 6 · internal anchor
Supervised fine-tuning of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.
Perceptrons and localization of attention's mean-field landscape cs.LG · 2026-01-29 · unverdicted · none · ref 14 · internal anchor
In the mean-field limit of attention with perceptron blocks, critical points of the energy landscape are generically atomic and localized on subsets of the unit sphere.
MURPHY: Feedback-Aware GRPO with Retrospective Credit Assignment for Multi-Turn Code Generation cs.LG · 2025-11-11 · unverdicted · none · ref 16 · internal anchor
MURPHY improves code generation pass rates by up to 6% through retrospective credit assignment on multi-turn feedback trees using max or mean reward propagation.
Vocab Diet: Reshaping the Vocabulary of LLMs via Vector Arithmetic cs.CL · 2025-10-19 · conditional · none · ref 10 · internal anchor
LLMs can compose surface-form tokens from base embeddings plus learned transformation vectors, freeing 10-40% of vocabulary slots while expanding coverage and preserving downstream performance across five languages.
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training cs.LG · 2025-07-21 · unverdicted · none · ref 24 · internal anchor
An RL agent learns domain re-weighting policies from evaluation feedback to improve balanced performance in continual pre-training of LLMs across source and target domains.
Sampling from Your Language Model One Byte at a Time cs.CL · 2025-06-17 · unverdicted · none · ref 49 · internal anchor
An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.
Pre-trained Large Language Models Learn Hidden Markov Models In-context cs.LG · 2025-06-08 · unverdicted · none · ref 40 · internal anchor
Pre-trained LLMs learn to predict HMM-generated sequences via in-context learning, approaching theoretical optimum on synthetic HMMs and matching expert models on real animal decision data.
Explaining Sources of Uncertainty in Automated Fact-Checking cs.CL · 2025-05-23 · unverdicted · none · ref 7 · internal anchor
CLUE generates natural language explanations of model uncertainty in fact-checking by unsupervised identification of claim-evidence and inter-evidence conflicts and agreements, followed by prompting and attention steering.
Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature? cs.CL · 2025-02-11 · unverdicted · none · ref 49 · internal anchor
Evaluation of 22 LLMs shows they are more susceptible to spin in medical abstracts than humans but can recognize and mitigate it when prompted.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 157 · internal anchor
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
An Efficient vLLM-Based Inference Pipeline for Unified Audio Understanding and Generation eess.AS · 2026-07-02 · unverdicted · none · ref 32 · internal anchor
Extends vLLM with delay-pattern de-interleaving, multi-stream sampling, and co-scheduled CFG to achieve 80% of non-CFG throughput for unified audio tasks while open-sourcing the pipeline.
The Model Organism Lottery: Model Organism Interpretability Strongly Depends on Training Methodology cs.LG · 2026-07-01 · unverdicted · none · ref 17 · internal anchor
Model organism interpretability depends strongly on training methodology, with integrated training yielding less interpretable MOs than post-hoc SFT or DPO.
SCOPE: Sequential Conformal Probing for Reliable OOD Rejection in LLM Services cs.CL · 2026-06-19 · unverdicted · none · ref 45 · internal anchor
SCOPE selects readable hidden layers, constructs conformal gates with IND calibration, and uses supermartingale e-processes to certify persistent service-boundary evidence, improving rejection over final-layer detectors across multiple LLMs and boundary conditions.
Understanding helpfulness and harmless tension in reward models cs.LG · 2026-06-11 · unverdicted · none · ref 30 · internal anchor
Mixed-objective reward models underperform single-objective ones because shared neurons support one objective while negatively affecting the other, creating alignment tension.
Scaling Participation in Modular AI Systems cs.AI · 2026-06-05 · unverdicted · none · ref 120 · internal anchor
Modular AI systems assembled from contributed small models outperform monolithic LLMs by up to 15.4% on 15 tasks including reasoning and factuality while showing emergent problem-solving and benefits from contributor diversity.
Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws cs.LG · 2026-06-05 · unverdicted · none · ref 25 · internal anchor
MIR improves validation loss in repeated-data pretraining and SoftQ fits data-constrained scaling experiments better than additive laws, equating MIR gains to roughly 1.3 times more unique data.
Validity Threats for Foundation Model Research cs.LG · 2026-06-03 · accept · none · ref 72 · internal anchor
Maps common low-compute research strategies for foundation models onto statistical, internal, external, and construct validity threats via a causal-inference lens.
"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise cs.CL · 2026-06-01 · unverdicted · none · ref 14 · internal anchor
Decan (D_Ca_n = C × a_n) measures text diversity as progressive conditional surprise from base LM log-probabilities, scoring 0.846 OCA on McDiv benchmark and detecting monotonic diversity drop across base→SFT→DPO→RLVR stages.
Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning cs.CL · 2026-05-29 · unverdicted · none · ref 56 · internal anchor
Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention cs.LG · 2026-05-28 · unverdicted · none · ref 56 · internal anchor
Larger models succeed on rare and complex tasks by reducing gradient interference from common tasks, allowing rare-task features to accumulate, as shown via synthetic task mixtures and OLMo pretraining from 4M to 4B parameters.
Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection cs.LG · 2026-05-27 · unverdicted · none · ref 36 · internal anchor
Activation steering produces synthetic safety-violating data that improves downstream classifiers over prompting on most tested concepts when a harmonic mean of alignment, coherence, and diversity is optimized.
Human-like in-group bias in instruction-tuned language model agents cs.AI · 2026-05-27 · unverdicted · none · ref 9 · internal anchor
Instruction-tuned language model agents exhibit in-group bias, action homophily, and network assortativity in simulations when group labels are salient, accumulating into structural inequality over repeated interactions.
