pith. machine review for the scientific record.

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
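GQA converts an existing multi-head attention checkpoint into a grouped-query model: the key and value projection heads are mean-pooled into a smaller number of groups, the query heads within each group share the pooled key/value head, and the model is then briefly uptrained. A minimal sketch of the conversion step, assuming PyTorch-style weight shapes (the function and variable names are illustrative, not from the paper's code):

    import torch

    def pool_kv_heads(w_kv: torch.Tensor, num_heads: int, num_groups: int) -> torch.Tensor:
        # w_kv: (num_heads * head_dim, d_model) key or value projection weight
        # from a multi-head checkpoint. Mean-pool the heads within each group,
        # as in the GQA checkpoint conversion, yielding a grouped projection
        # of shape (num_groups * head_dim, d_model).
        assert num_heads % num_groups == 0
        head_dim = w_kv.shape[0] // num_heads
        w = w_kv.view(num_groups, num_heads // num_groups, head_dim, -1)
        return w.mean(dim=1).reshape(num_groups * head_dim, -1)

With num_groups equal to 1 this recovers multi-query attention; with num_groups equal to num_heads it leaves the multi-head checkpoint unchanged.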

4 Pith papers cite this work. Polarity classification is still being indexed.

4 Pith papers citing it

years: 2026 (3) · 2024 (1)

representative citing papers

Priming: Hybrid State Space Models From Pre-trained Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional training tokens; Gated KalmaNet outperforms Mamba-2 on long-context reasoning at the 32B scale.

StarCoder 2 and The Stack v2: The Next Generation

cs.SE · 2024-02-29 · accept · novelty 6.0

StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.

citing papers explorer

Showing 4 of 4 citing papers.

  • ArgBench: Benchmarking LLMs on Computational Argumentation Tasks cs.CL · 2026-04-19 · unverdicted · polarity none · ref 94

    ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

  • SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters cs.DC · 2026-05-01 · unverdicted · polarity none · ref 3

    SAGA reduces AI agent task completion time by a factor of 1.64 on 64-GPU clusters by scheduling at the level of full workflows, using execution graphs, affinity batching, and completion-time fairness.

  • Priming: Hybrid State Space Models From Pre-trained Transformers cs.LG · 2026-05-08 · unverdicted · polarity none · ref 1

    Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional training tokens; Gated KalmaNet outperforms Mamba-2 on long-context reasoning at the 32B scale.

  • StarCoder 2 and The Stack v2: The Next Generation cs.SE · 2024-02-29 · accept · polarity none · ref 150

    StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.