OLM o: Accelerating the science of language models

Dirk Groeneveld, Iz Beltagy, Evan Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord + 2 more · 2024 · Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · DOI 10.18653/v1/2024.acl-long.841

7 Pith papers cite this work, alongside 57 external citations. Polarity classification is still indexing.

7 Pith papers citing it

57 external citations · Crossref

open at publisher browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Disentangling MLP Neuron Weights in Vocabulary Space

cs.CL · 2026-04-07 · unverdicted · novelty 8.0

ROTATE disentangles MLP neurons into faithful vocabulary channels by optimizing weight rotations to maximize vocabulary-space kurtosis, outperforming activation-based baselines for neuron descriptions.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

cs.CL · 2026-05-13 · conditional · novelty 6.0

OP-Mix is an on-policy data mixing method that uses low-rank adapter interpolation to find near-optimal data mixtures throughout language model training with reduced compute.

COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

cs.DC · 2026-04-29 · unverdicted · novelty 6.0

COPUS co-adapts batch size and parallelism during LLM training via goodput to deliver 3.9-8% average faster convergence than fixing one while tuning the other.

How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

cs.LG · 2025-02-17 · unverdicted · novelty 6.0

Pretraining data determines loss-to-loss scaling laws in LLMs, while model size, optimization, tokenizer, and architecture have limited impact.

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

cs.CL · 2025-06-02 · unverdicted · novelty 5.0

Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.

citing papers explorer

Showing 7 of 7 citing papers.

Disentangling MLP Neuron Weights in Vocabulary Space cs.CL · 2026-04-07 · unverdicted · none · ref 1
ROTATE disentangles MLP neurons into faithful vocabulary channels by optimizing weight rotations to maximize vocabulary-space kurtosis, outperforming activation-based baselines for neuron descriptions.
BOOKMARKS: Efficient Active Storyline Memory for Role-playing cs.CL · 2026-05-13 · unverdicted · none · ref 54
BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time cs.CL · 2026-05-13 · conditional · none · ref 32
OP-Mix is an on-policy data mixing method that uses low-rank adapter interpolation to find near-optimal data mixtures throughout language model training with reduced compute.
COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training cs.DC · 2026-04-29 · unverdicted · none · ref 12
COPUS co-adapts batch size and parallelism during LLM training via goodput to deliver 3.9-8% average faster convergence than fixing one while tuning the other.
How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models cs.CL · 2026-04-17 · unverdicted · none · ref 14
LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.
LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws cs.LG · 2025-02-17 · unverdicted · none · ref 17
Pretraining data determines loss-to-loss scaling laws in LLMs, while model size, optimization, tokenizer, and architecture have limited impact.
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models cs.CL · 2025-06-02 · unverdicted · none · ref 15
Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.

OLM o: Accelerating the science of language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer