pith. machine review for the scientific record.

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
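GQA converts an existing multi-head attention checkpoint into a grouped-query model: the key and value projection heads are mean-pooled into a smaller number of groups, the query heads within each group share the pooled key/value head, and the model is then briefly uptrained. A minimal sketch of the conversion step, assuming PyTorch-style weight shapes (the function and variable names are illustrative, not from the paper's code):

    import torch

    def pool_kv_heads(w_kv: torch.Tensor, num_heads: int, num_groups: int) -> torch.Tensor:
        # w_kv: (num_heads * head_dim, d_model) key or value projection weight
        # from a multi-head checkpoint. Mean-pool the heads within each group,
        # as in the GQA checkpoint conversion, yielding a grouped projection
        # of shape (num_groups * head_dim, d_model).
        assert num_heads % num_groups == 0
        head_dim = w_kv.shape[0] // num_heads
        w = w_kv.view(num_groups, num_heads // num_groups, head_dim, -1)
        return w.mean(dim=1).reshape(num_groups * head_dim, -1)

With num_groups equal to 1 this recovers multi-query attention; with num_groups equal to num_heads it leaves the multi-head checkpoint unchanged.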

4 Pith papers cite this work. Polarity classification is still being indexed.

4 Pith papers citing it

years: 2026 (3) · 2024 (1)

representative citing papers

Priming: Hybrid State Space Models From Pre-trained Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional training tokens; Gated KalmaNet outperforms Mamba-2 on long-context reasoning at the 32B scale.

StarCoder 2 and The Stack v2: The Next Generation

cs.SE · 2024-02-29 · accept · novelty 6.0

StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.

citing papers explorer

Showing 4 of 4 citing papers.

  • ArgBench: Benchmarking LLMs on Computational Argumentation Tasks cs.CL · 2026-04-19 · unverdicted · polarity none · ref 94

    ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

  • SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters cs.DC · 2026-05-01 · unverdicted · polarity none · ref 3

    SAGA reduces AI agent task completion time by a factor of 1.64 on 64-GPU clusters by scheduling at the level of full workflows, using execution graphs, affinity batching, and completion-time fairness.

  • Priming: Hybrid State Space Models From Pre-trained Transformers cs.LG · 2026-05-08 · unverdicted · polarity none · ref 1

    Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional training tokens; Gated KalmaNet outperforms Mamba-2 on long-context reasoning at the 32B scale.

  • StarCoder 2 and The Stack v2: The Next Generation cs.SE · 2024-02-29 · accept · polarity none · ref 150

    StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.