Gemma 2: Improving Open Language Models at a Practical Size

Cassidy Hardin, Gemma Team: Morgane Riviere, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju · 2024 · cs.CL · arXiv 2408.00118

Mixed citation behavior. The most common role is background (64% of classified citations).

301 Pith papers citing it

abstract

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
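
The abstract does not say how the distillation objective differs from next-token prediction. The toy pure-Python sketch below illustrates the difference using our own assumptions. The function names (`softmax`, `next_token_loss`, `distillation_loss`) are hypothetical, the vocabulary has only four entries, and the sketch does not model temperature, teacher sampling, or any other training detail from the Gemma 2 paper. Next-token prediction scores the student against a single observed token, which acts as a one-hot target. Distillation scores the student against the teacher's full probability distribution over the vocabulary.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(student_logits: List[float], target_id: int) -> float:
    """Next-token prediction: cross-entropy against a one-hot target token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_id])


def distillation_loss(student_logits: List[float], teacher_logits: List[float]) -> float:
    """Toy distillation objective (an assumption, not the paper's exact recipe):
    cross-entropy of the student against the teacher's full next-token
    distribution. This equals KL(teacher || student) plus the teacher's
    entropy, which does not depend on the student.
    """
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical toy logits over a 4-token vocabulary.
student = [2.0, 1.0, 0.5, -1.0]
teacher = [1.5, 1.2, 0.1, -0.5]
print(f"next-token loss (target=0): {next_token_loss(student, 0):.4f}")
print(f"distillation loss:          {distillation_loss(student, teacher):.4f}")
```

The teacher's entropy is fixed with respect to the student, so minimizing this cross-entropy is equivalent to minimizing KL(teacher || student). When the teacher places nearly all its mass on one token, the distillation loss reduces to the ordinary next-token loss for that token.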

citation-role summary

background: 23; method: 6; baseline: 2; dataset: 1; other: 1

citation-polarity summary

background: 21; use method: 6; unclear: 3; baseline: 2; use dataset: 1

authors

Cassidy Hardin, Gemma Team: Morgane Riviere, Léonard Hussenot, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju

representative citing papers

Masked Generative Transformer Is What You Need for Image Editing

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EditMGT applies masked generative transformers with attention consolidation and region-hold sampling to deliver state-of-the-art localized image editing at 6x the speed of diffusion methods.

Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

Acceptance Cards is a new four-diagnostic standard for safe fine-tuning defense claims that requires statistical reliability, fresh semantic generalization, mechanism alignment, and cross-task transfer; under this protocol SafeLoRA fails the full-card pass on Gemma-2-2B-it.

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0 · 2 refs

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

FRAME: Learning the Adaptation Domain with a Mixture of Fractional-Fourier Experts

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

FRAME adds a learnable fractional-Fourier order per expert in a MoE-LoRA setup so that low-rank updates are placed in the domain where they are most compact, yielding gains over fixed-domain baselines on LLaMA-3.1-8B and Qwen2.5-7B.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

Fixed-clock optimizer memory turns equal-multiset data shuffle order into an O(η) source of fine-tuning noise, larger than the O(η²) effect in memoryless cases; the paper also derives a fit-free sizing method.

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

NLL-guided layer selection identifies 1/4 of layers for full attention in hybrid models, matching periodic 1/2-FA baseline accuracy on LongMemEval with Qwen3-4B while halving the full-attention compute budget.

Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

cs.RO · 2026-06-26 · accept · novelty 7.0

VLA language backbones show high redundancy on manipulation benchmarks, with half the LLM blocks removable and even two blocks sufficient to recover baseline performance after fine-tuning, unlike vision and action pathways.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates the first annotated dataset of 1,119 pairs, and proposes an enhanced LoRA method that outperforms baselines.

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

BenSyc is the first benchmark for conversational sycophancy in Bengali, with top LLMs achieving only 61.8 Macro-F1 on binary detection and 61.7 on five-class classification while often generating overly validating responses.

SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models

cs.CL · 2026-06-06 · unverdicted · novelty 7.0

SurgiQ is a new 13k-question surgical benchmark showing general-purpose LLMs reach 68.1% accuracy while most biomedical models lag and smaller models stay near random baseline.

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

cs.CL · 2026-06-05 · unverdicted · novelty 7.0

UrduMMLU is a new native-source MCQ benchmark for Urdu that reveals top LLMs reach only ~90% accuracy with large gaps on region-specific humanities content.

Interpreting Brain Responses to Language with Sparse Features from Language Models

cs.CL · 2026-06-05 · unverdicted · novelty 7.0

Sparse autoencoder features from LMs plus surprisal predict fMRI language responses, recovering prior interpretations and revealing a people-tuned voxel population while showing frontal areas are surprisal-driven and general features outperform arbitrary ones.

Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

Tangram makes non-uniform KV cache compression practical for LLM serving with deterministic budget allocation, head group paging, and ahead-of-time load balancing, achieving up to 2.6x throughput gains.

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

A cycle-consistent MT pipeline generates and similarity-weights training data for coreference resolution, producing gains on four low-resource languages and enabling the task where no corpora existed.

RogueMerge: Robust and Unified Attacks against LLM Model Merging

cs.CR · 2026-06-02 · unverdicted · novelty 7.0

RogueMerge is a unified attack method that jointly optimizes task vectors to succeed after merging, using stochastic min-max simulation for unknown merging settings and a Taylor-approximated DRO for prompt generalization on generative LLMs.

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Defines representational capacity as the upper bound on distinguishable near-orthogonal directions in transformer latent spaces, derived from embedding similarity distributions and an adjusted Johnson-Lindenstrauss formula dependent on the k/d ratio.

Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention

cs.LG · 2026-06-01 · conditional · novelty 7.0

Fixed block causal masks create reachability boundaries where representations depend only on block prefixes, formalized via dependency sets and phase-conditioned coverage functions, with a parameter-free boundary bridge repair.

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

A deferral mechanism using forward-looking simulations reduces false positives in derailment forecasting by selectively waiting when recovery paths appear plausible.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

citing papers explorer

Showing 50 of 112 citing papers after filters.

SLAM: Structural Linguistic Activation Marking for Language Models cs.CL · 2026-05-06 · unverdicted · none · ref 6 · 2 links · internal anchor
SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.
ArgBench: Benchmarking LLMs on Computational Argumentation Tasks cs.CL · 2026-04-19 · unverdicted · none · ref 57 · internal anchor
ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.
LiveBench: A Challenging, Contamination-Limited LLM Benchmark cs.CL · 2024-06-27 · unverdicted · none · ref 37 · internal anchor
LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.
NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation cs.CL · 2026-06-26 · unverdicted · none · ref 6 · internal anchor
NLL-guided layer selection identifies 1/4 of layers for full attention in hybrid models, matching periodic 1/2-FA baseline accuracy on LongMemEval with Qwen3-4B while halving the full-attention compute budget.
BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts cs.CL · 2026-06-08 · unverdicted · none · ref 109 · internal anchor
BenSyc is the first benchmark for conversational sycophancy in Bengali, with top LLMs achieving only 61.8 Macro-F1 on binary detection and 61.7 on five-class classification while often generating overly validating responses.
SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models cs.CL · 2026-06-06 · unverdicted · none · ref 75 · internal anchor
SurgiQ is a new 13k-question surgical benchmark showing general-purpose LLMs reach 68.1% accuracy while most biomedical models lag and smaller models stay near random baseline.
UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding cs.CL · 2026-06-05 · unverdicted · none · ref 36 · internal anchor
UrduMMLU is a new native-source MCQ benchmark for Urdu that reveals top LLMs reach only ~90% accuracy with large gaps on region-specific humanities content.
Interpreting Brain Responses to Language with Sparse Features from Language Models cs.CL · 2026-06-05 · unverdicted · none · ref 17 · internal anchor
Sparse autoencoder features from LMs plus surprisal predict fMRI language responses, recovering prior interpretations and revealing a people-tuned voxel population while showing frontal areas are surprisal-driven and general features outperform arbitrary ones.
Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery cs.CL · 2026-06-04 · unverdicted · none · ref 99 · internal anchor
Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.
Multilingual Coreference Resolution via Cycle-Consistent Machine Translation cs.CL · 2026-06-03 · unverdicted · none · ref 10 · internal anchor
A cycle-consistent MT pipeline generates and similarity-weights training data for coreference resolution, producing gains on four low-resource languages and enabling the task where no corpora existed.
Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs cs.CL · 2026-05-29 · unverdicted · none · ref 30 · internal anchor
Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.
Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment cs.CL · 2026-05-28 · unverdicted · none · ref 3 · internal anchor
A deferral mechanism using forward-looking simulations reduces false positives in derailment forecasting by selectively waiting when recovery paths appear plausible.
StakeBench: Evaluating Language Understanding Grounded in Market Commitment cs.CL · 2026-05-25 · unverdicted · none · ref 24 · internal anchor
StakeBench is a new benchmark using market-derived supervision from resolved prediction markets to test LLMs on commitment detection, side identification, action anticipation, and odds projection, revealing partial success on sides but structural failures on higher tasks.
SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models cs.CL · 2026-05-25 · unverdicted · none · ref 9 · internal anchor
SomaliBench finds large English-to-Somali refusal gaps (0.38 to 0.90) across Llama-3.1-8B, Gemma-2-9B, Qwen-2.5-7B, and Aya-23-8B, with many Somali responses being unclear rather than compliant.
Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning cs.CL · 2026-05-22 · unverdicted · none · ref 14 · internal anchor
Representational convergence across 16 LLMs on 800 reasoning problems is stronger for failed tasks and pre-decision stages but shows minimal causal influence on predictions, pointing to shared processing constraints over shared reasoning.
Self-Improving In-Context Learning cs.CL · 2026-05-22 · unverdicted · none · ref 44 · internal anchor
A test-time zeroth-order optimization of prompt embeddings using a bounded self-supervised proxy from demonstration log-probabilities improves ICL accuracy and correlates with gains across tasks.
Implicit Representations of Grammaticality in Language Models cs.CL · 2026-05-06 · unverdicted · none · ref 6 · internal anchor
Linear probes on LM hidden states detect grammaticality better than string probabilities, generalize to human benchmarks and other languages, and correlate weakly with likelihood.
How Language Models Process Negation cs.CL · 2026-05-04 · unverdicted · none · ref 34 · 2 links · internal anchor
LLMs process negation using both attention-based suppression and constructive representation mechanisms (construction dominant), with late-layer attention shortcuts explaining poor accuracy on negation tasks.
Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective cs.CL · 2026-04-25 · conditional · none · ref 59 · 2 links · internal anchor
A controlled formal language task reveals fine-tuning outperforms in-context learning on in-distribution generalization but equals it on out-of-distribution, with ICL showing greater sensitivity to model size and tokenization.
Why are all LLMs Obsessed with Japanese Culture? On the Hidden Cultural and Regional Biases of LLMs cs.CL · 2026-04-23 · unverdicted · none · ref 2 · internal anchor
LLMs exhibit a clear preference for Japanese culture when answering open cultural questions, with this bias emerging after supervised fine-tuning rather than during pre-training.
MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation cs.CL · 2026-04-20 · unverdicted · none · ref 44 · internal anchor
MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.
LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation cs.CL · 2026-04-20 · unverdicted · none · ref 53 · internal anchor
LQM introduces a six-level linguistically motivated error taxonomy for MT evaluation and applies it via expert annotation to LLM outputs on a new 3,850-sentence multi-dialect Arabic corpus.
Prune, Interpret, Evaluate: A Cross-Layer Transcoder-Native Framework for Efficient Circuit Discovery via Feature Attribution cs.CL · 2026-04-18 · unverdicted · none · ref 2 · 2 links · internal anchor
PIE prunes CLT features first via FAP and FAP-Synergy to match baseline circuit fidelity at lower feature budgets on IOI and Doc-String tasks, reducing interpretation costs.
DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs cs.CL · 2026-03-20 · unverdicted · none · ref 18 · internal anchor
DeEscalWild supplies 1,500 high-fidelity de-escalation scenarios that let fine-tuned 3B SLMs outperform general-purpose larger models on realism and dialogue metrics.
The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs? cs.CL · 2026-03-11 · unverdicted · none · ref 9 · internal anchor
The Stepwise Informativeness Assumption explains the correlation between LLM entropy dynamics and reasoning correctness by positing that correct traces accumulate answer-relevant information stepwise during generation.
PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses cs.CL · 2026-03-11 · unverdicted · none · ref 27 · internal anchor
PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.
Physical Commonsense Reasoning for Lower-Resourced Languages and Dialects: a Study on Basque cs.CL · 2026-02-16 · conditional · none · ref 12 · internal anchor
BasPhyCo is the first physical commonsense reasoning dataset for Basque and its dialects, showing that LLMs have limited performance on verifiability tasks, especially with dialects.
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability cs.CL · 2026-01-27 · unverdicted · none · ref 16 · internal anchor
Transformer weights at early training stages are closed-form compositions of bigram, token-interchangeability, and context mappings that directly reflect text-corpus statistics and explain the emergence of semantic associations.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing cs.CL · 2024-06-12 · unverdicted · none · ref 140 · internal anchor
Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.
CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning cs.CL · 2026-06-30 · unverdicted · none · ref 172 · internal anchor
CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.
The strength of clinical evidence is recoverable from language model representations but not from their stated grades cs.CL · 2026-06-27 · unverdicted · none · ref 32 · internal anchor
Linear probes recover evidence grades from LLM activations (median AUROC 71.8) across 22 models but the models' stated grades perform at chance level and the signal is largely lexical.
ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation cs.CL · 2026-06-16 · unverdicted · none · ref 2 · internal anchor
ConSA learns FA/SWA allocation via L0 masks and augmented Lagrangian constraints, outperforming rule-based baselines on 0.6B and 1.7B models with consistent layer patterns.
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning cs.CL · 2026-06-16 · unverdicted · none · ref 42 · internal anchor
The LLM-as-Environment-Engineer framework lets the policy model redesign its own RL environments on the new MAPF-FrozenLake testbed, outperforming larger models and fixed baselines with Qwen3-4B.
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It cs.CL · 2026-06-09 · conditional · none · ref 42 · internal anchor
CoT SFT disrupts long-range routing in hybrid models via changes to W_Q and W_K; QK-Restore restores pre-SFT projections to recover NIAH performance.
Inside the LLM Word Factory cs.CL · 2026-06-07 · unverdicted · none · ref 9 · internal anchor
Activation patching localizes English detokenization in Llama2-7B to a two-stage attention-then-MLP process at layer 1 that generalizes to 12 models from 8 families, with depth varying by positional encoding, plus an early-layer probe achieving 0.94-0.97 AUROC.
Customer-Agent: Overcoming Context Limitations in Ultra-Long Shopping Trajectories via Tool-Augmented Agents and RLVR cs.CL · 2026-06-06 · unverdicted · none · ref 81 · internal anchor
Introduces the ShopTrajQA long-context benchmark and an RLVR-trained tool-augmented agent that bypasses LLM context limits via external file storage and code-based retrieval for shopping trajectories.
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs cs.CL · 2026-06-05 · unverdicted · none · ref 45 · internal anchor
Using a 1PL IRT model on real cultural questions across 13 locales, the study identifies a local-language knowledge-access advantage masked by lower proficiency in raw accuracy.
ReverseEOL: Improving Training-free Text Embeddings via Text Reversal in Decoder-only LLMs cs.CL · 2026-06-04 · unverdicted · none · ref 33 · internal anchor
ReverseEOL improves training-free text embeddings by combining forward and reversed-text representations from frozen decoder-only LLMs.
Boosting Self-Consistency with Ranking cs.CL · 2026-06-03 · unverdicted · none · ref 203 · internal anchor
RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity cs.CL · 2026-06-01 · unverdicted · none · ref 39 · internal anchor
Peer agreement misleads initially correct LLMs more than it corrects initially wrong ones, with authority labels biasing choices independently of accuracy and reasoning prompts failing to mitigate the asymmetry.
Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment cs.CL · 2026-05-17 · unverdicted · none · ref 96 · internal anchor
Introduces HRC model for game-theoretic decomposition of preferences into orthogonal transitive and cyclic components, paired with DSPPO for dynamic Nash-seeking alignment, reporting gains over BT and GPM baselines on RewardBench and downstream LLM evaluations.
Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making cs.CL · 2026-05-17 · unverdicted · none · ref 30 · internal anchor
Frontier LLMs exhibit bias from stigmatizing language in clinical vignettes across four conditions, skewing decisions toward less aggressive management, with limited mitigation from Chain-of-Thought or self-debiasing prompts.
ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset cs.CL · 2026-05-13 · conditional · none · ref 52 · internal anchor
ATD-Trans is a new geographically annotated Japanese-English travelogue dataset that reveals Japanese-enhanced models perform better on geo-entity translation while domestic Japanese locations remain harder to translate accurately.
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction cs.CL · 2026-05-12 · unverdicted · none · ref 23 · internal anchor
MedTPE compresses EHR token sequences by up to 31% via merging common medical token pairs, reducing LLM inference latency 34-63% while maintaining or improving performance on mortality and phenotyping tasks.
SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation cs.CL · 2026-05-08 · unverdicted · none · ref 47 · 2 links · internal anchor
SimCT enlarges the supervision space in cross-tokenizer on-policy distillation using short jointly tokenizable multi-token continuations, producing consistent gains over shared-token baselines on math and code benchmarks.
Don't Lose Focus: Activation Steering via Key-Orthogonal Projections cs.CL · 2026-05-07 · unverdicted · none · ref 31 · internal anchor
SKOP uses key-orthogonal projections to steer LLM activations while preserving attention patterns on focus tokens, cutting utility degradation by 5-7x and retaining over 95% of standard steering efficacy.
When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models cs.CL · 2026-05-04 · conditional · none · ref 4 · internal anchor
AloLab, an iterative meta-agent prompt optimizer, raises structured output accuracy for 7-9B models from 0% to 84-87% on GSM8K while preserving near-native inference speed.
Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model cs.CL · 2026-04-23 · unverdicted · none · ref 18 · internal anchor
IRM derives implicit reward signals from off-the-shelf LLMs to detect generated text zero-shot and reports better results than prior zero-shot and supervised detectors on the DetectRL benchmark.
In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores cs.CL · 2026-04-21 · unverdicted · none · ref 10 · internal anchor
Standardized-test benchmarks for LLM fairness are unreliable because prompt wording alone drives most score variance and ranking changes, while a multi-agent conversational framework reveals consistent model-specific fairness behaviors across millions of dialogues.
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization cs.CL · 2026-04-21 · unverdicted · none · ref 31 · internal anchor
LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.
