super hub Mixed citations

Gemma 2: Improving Open Language Models at a Practical Size

Cassidy Hardin, Gemma Team: Morgane Riviere, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju · 2024 · cs.CL · arXiv 2408.00118

Mixed citation behavior. Most common role is background (64%).

262 Pith papers citing it

Background 64% of classified citations

open full Pith review browse 262 citing papers more from Cassidy Hardin arXiv PDF

abstract

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 23 method 6 baseline 2 dataset 1 other 1

citation-polarity summary

background 21 use method 6 unclear 3 baseline 2 use dataset 1

claims ledger

abstract In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer compe

authors

Cassidy Hardin Gemma Team: Morgane Riviere L\'eonard Hussenot Pier Giuseppe Sessa Shreya Pathak Surya Bhupatiraju

co-cited works

representative citing papers

Masked Generative Transformer Is What You Need for Image Editing

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EditMGT applies masked generative transformers with attention consolidation and region-hold sampling to deliver state-of-the-art localized image editing at 6x the speed of diffusion methods.

Acceptance Cards:A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

Acceptance Cards is a new four-diagnostic standard for safe fine-tuning defense claims that requires statistical reliability, fresh semantic generalization, mechanism alignment, and cross-task transfer; under this protocol SafeLoRA fails the full-card pass on Gemma-2-2B-it.

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0 · 2 refs

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

Fixed-clock optimizer memory turns equal-multiset data shuffle order into an O(η) source of fine-tuning noise, larger than the O(η²) effect in memoryless cases, with a fit-free sizing method derived.

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

NLL-guided layer selection identifies 1/4 of layers for full attention in hybrid models, matching periodic 1/2-FA baseline accuracy on LongMemEval with Qwen3-4B while halving the full-attention compute budget.

Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

cs.RO · 2026-06-26 · accept · novelty 7.0

VLA language backbones show high redundancy on manipulation benchmarks, with half the LLM blocks removable and even two blocks sufficient to recover baseline performance after fine-tuning, unlike vision and action pathways.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

A deferral mechanism using forward-looking simulations reduces false positives in derailment forecasting by selectively waiting when recovery paths appear plausible.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

ReSAEs improve multi-layer SAE interventions on Pythia-1.4B and Gemma-2-9B by training later-layer dictionaries on residuals after affine mapping, recovering more cross-entropy loss despite lower raw variance reconstruction.

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

StakeBench is a new benchmark using market-derived supervision from resolved prediction markets to test LLMs on commitment detection, side identification, action anticipation, and odds projection, revealing partial success on sides but structural failures on higher tasks.

SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

SomaliBench finds large English-to-Somali refusal gaps (0.38 to 0.90) across Llama-3.1-8B, Gemma-2-9B, Qwen-2.5-7B, and Aya-23-8B, with many Somali responses being unclear rather than compliant.

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

Representational convergence across 16 LLMs on 800 reasoning problems is stronger for failed tasks and pre-decision stages but shows minimal causal influence on predictions, pointing to shared processing constraints over shared reasoning.

Self-Improving In-Context Learning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

A test-time zeroth-order optimization of prompt embeddings using a bounded self-supervised proxy from demonstration log-probabilities improves ICL accuracy and correlates with gains across tasks.

GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

GraphFlow uses a unified wGraph to dynamically instantiate workflows and manage KV caches for LLM agents, reporting 4.95 pp average gains and 4x memory reduction on five benchmarks.

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

In 1-3B instruction-tuned LMs on GSM8K, arithmetic CoT readout is dominated by positional copying of the trailing number before the answer delimiter, accounting for 54-92 percentage points of accuracy.

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Chronicle is the first model jointly pretrained from scratch on text and time series in a unified transformer that matches a comparable language model on NLU tasks and sets new bars for time series classification and multimodal forecasting.

Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.

Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization

math.OC · 2026-05-12 · conditional · novelty 7.0

Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.

citing papers explorer

Showing 31 of 31 citing papers after filters.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations cs.AI · 2026-06-12 · unverdicted · none · ref 34 · internal anchor
Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.
Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning cs.AI · 2026-05-27 · unverdicted · none · ref 13 · internal anchor
MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.
Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models cs.AI · 2026-05-12 · unverdicted · none · ref 29 · internal anchor
Large language models achieve macro F1 scores above 0.85 on binary nominal-versus-danger classification from CTAF radio transcripts and METAR weather data using a new synthetic dataset with a 12-category hazard taxonomy.
Causal Bias Detection in Generative Artificial Intelligence cs.AI · 2026-05-12 · unverdicted · none · ref 53 · 2 links · internal anchor
Develops a causal framework unifying generative AI fairness with standard ML, with new decompositions, identification conditions, and estimators demonstrated on LLM race and gender bias.
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations cs.AI · 2026-05-11 · unverdicted · none · ref 15 · internal anchor
Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.
Response-Aware User Memory Selection for LLM Personalization cs.AI · 2026-04-15 · unverdicted · none · ref 2 · internal anchor
RUMS selects LLM user memory via mutual information with model outputs to reduce response uncertainty, outperforming similarity-based methods in human alignment and response quality with up to 95% lower cost.
WMF-AM: Probing LLM Working Memory via Depth-Parameterized Cumulative State Tracking cs.AI · 2026-03-28 · unverdicted · none · ref 50 · internal anchor
WMF-AM is a depth-parameterized benchmark that measures LLMs' cumulative state tracking ability without scratchpads, validated on 28 models across arithmetic and non-arithmetic tasks with ablations confirming the construct.
SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators cs.AI · 2025-11-05 · unverdicted · none · ref 20 · internal anchor
SnapStream deploys sparse KV attention in a production inference system on dataflow accelerators, delivering 4x on-chip memory savings for DeepSeek-671B at 128k context with up to 1832 tokens/sec and minimal accuracy loss on LongBench-v2, AIME24, and LiveCodeBench.
Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 37 · internal anchor
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents cs.AI · 2024-05-23 · accept · none · ref 28 · internal anchor
AndroidWorld is a dynamic, reproducible Android benchmark that generates unlimited natural-language tasks for autonomous agents and shows current agents succeed on only 30.6 percent of them.
Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures cs.AI · 2026-05-28 · unverdicted · none · ref 13 · internal anchor
TLO is a logit-based diagnostic that visualizes temporal patterns of LLM jailbreak failures on a calibrated 2D plane, distinguishing attacks with identical ASR and enabling early stopping that reduces successful jailbreaks by more than half.
Human-like in-group bias in instruction-tuned language model agents cs.AI · 2026-05-27 · unverdicted · none · ref 3 · internal anchor
Instruction-tuned language model agents exhibit in-group bias, action homophily, and network assortativity in simulations when group labels are salient, accumulating into structural inequality over repeated interactions.
Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models cs.AI · 2026-05-23 · unverdicted · none · ref 43 · internal anchor
A new fine-tuning defense uses temporary jailbreaking induced by BufferLoRA to limit harmful updates, followed by merging with ReinforceLoRA via QR decomposition to restore safety while keeping task performance.
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs cs.AI · 2026-05-20 · conditional · none · ref 90 · internal anchor
Introduces MOOD benchmark for OOD LLM alignment failures and shows guard models plus Mahalanobis and perplexity OOD detectors improve recall from 39% to 45% with positive scaling.
Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy cs.AI · 2026-05-20 · unverdicted · none · ref 3 · internal anchor
Off-the-shelf persona vectors rival targeted CAA for reducing sycophancy in two instruction-tuned models while maintaining accuracy on correct statements and appearing geometrically independent.
Domain Restriction via Multi SAE Layer Transitions cs.AI · 2026-05-12 · unverdicted · none · ref 27 · internal anchor
Multi-layer SAE transitions capture domain-specific signatures that distinguish OOD texts in Gemma-2 models.
FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models cs.AI · 2026-05-05 · unverdicted · none · ref 21 · 2 links · internal anchor
FinSTaR reaches 78.9% average accuracy on the new FinTSR-Bench by using Compute-in-CoT for deterministic assessment tasks and Scenario-Aware CoT for stochastic prediction tasks.
PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs cs.AI · 2026-05-01 · unverdicted · none · ref 71 · internal anchor
PERSA combines RLHF with selective parameter-efficient updates to top transformer layers, raising style alignment scores from 35% to 96% on code feedback benchmarks while holding correctness near 100%.
Reasoning Structure Matters for Safety Alignment of Reasoning Models cs.AI · 2026-04-21 · unverdicted · none · ref 13 · internal anchor
Changing the internal reasoning structure of large reasoning models through simple supervised fine-tuning on 1K examples produces strong safety alignment that generalizes across tasks and languages.
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification cs.AI · 2026-04-18 · unverdicted · none · ref 42 · internal anchor
Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.
Weight Patching: Toward Source-Level Mechanistic Localization in LLMs cs.AI · 2026-04-15 · unverdicted · none · ref 30 · internal anchor
Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a hierarchy of source, routing, and execution components.
CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning cs.AI · 2026-01-19 · unverdicted · none · ref 54 · internal anchor
CURE-MED pairs a new 13-language medical reasoning benchmark with curriculum RL to raise logical correctness to 70% and language consistency to 95% at 32B scale while outperforming baselines.
Token-Level LLM Collaboration via FusionRoute cs.AI · 2026-01-08 · unverdicted · none · ref 21 · internal anchor
FusionRoute augments token-level expert routing with a trainable complementary logit generator to expand the policy class and recover optimal decoding under mild conditions, outperforming prior collaboration and merging methods on reasoning and generation benchmarks.
Multiplayer Nash Preference Optimization cs.AI · 2025-09-27 · unverdicted · none · ref 27 · internal anchor
MNPO extends NLHF to multiplayer Nash games, inheriting equilibrium guarantees while showing empirical gains on instruction-following benchmarks under diverse preferences.
DDIAgents: Mechanism-Conditioned Context Flow for Drug-Drug Interaction Prediction cs.AI · 2026-06-30 · unverdicted · none · ref 37 · internal anchor
DDIAgents introduces a mechanism-conditioned multi-agent framework for drug-drug interaction prediction that dynamically orchestrates knowledge sources and outperforms baselines.
Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models cs.AI · 2026-05-28 · unverdicted · none · ref 8 · internal anchor
An iterative writer-editor multi-agent LLM process improves perceived story quality in simulations of child collaborative storytelling.
Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval cs.AI · 2026-04-13 · unverdicted · none · ref 24 · internal anchor
Table representations must be permutation-invariant to preserve semantic structure, and a new header-aligned encoder moves toward this ideal while exposing fragility in existing LLM table embeddings.
Memory as an Attack Surface in LLM Agents: A Study on Multiple-Choice Question Answering cs.AI · 2026-06-27 · unverdicted · none · ref 25 · internal anchor
Inserting misleading memories into LLM agents reduces their accuracy on multiple-choice questions by leading them to select manipulated options.
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs cs.AI · 2024-10-24 · unverdicted · none · ref 20 · internal anchor
Data-centric filtering yields an 80K preference dataset and reward models that lead RewardBench while boosting other top entries.
Preventing Error Propagation in Multi-Agent AI through Runtime Monitoring cs.AI · 2026-06-27 · unverdicted · none · ref 21 · internal anchor
Multi-agent AI agents answer questions alone then exchange reasoning to revise decisions, tested via experiments for net reliability gains versus error propagation.
Language Model Networks: Supervision-Efficient Learning through Dense Communication cs.AI · 2025-05-19 · unreviewed · ref 43 · internal anchor

Gemma 2: Improving Open Language Models at a Practical Size

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer