pith. sign in

hub

Are we done with mmlu? CoRR, abs/2406.04127

19 Pith papers cite this work. Polarity classification is still indexing.

19 Pith papers citing it

hub tools

citation-role summary

dataset 2

citation-polarity summary

roles

dataset 2

polarities

use dataset 2

clear filters

representative citing papers

Dynamic Model Merging Made Slim

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.

Qwen3-Omni Technical Report

cs.CL · 2025-09-22 · unverdicted · novelty 6.0

Qwen3-Omni is a unified multimodal model that achieves open-source SOTA on 32 of 36 audio and audio-visual benchmarks and overall SOTA on 22 without degrading performance on text, image, or video relative to single-modal Qwen counterparts.

Qwen2.5-1M Technical Report

cs.CL · 2025-01-26 · accept · novelty 6.0

Qwen2.5-1M models reach 1M token context with improved long-context performance, no short-context loss, and 3-7x prefill speedup via open inference optimizations.

Qwen3.5-Omni Technical Report

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

Qwen3.5-Omni scales an omnimodal model to hundreds of billions of parameters with 256k context, introduces ARIA for stable speech synthesis, and reports SOTA performance on 215 audio-visual benchmarks while adding multilingual and audio-visual coding capabilities.

MiMo-V2-Flash Technical Report

cs.CL · 2026-01-06 · unverdicted · novelty 5.0

MiMo-V2-Flash is a 309B/15B MoE model trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation that matches larger models like DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed MTP layers.

Kimi K2: Open Agentic Intelligence

cs.LG · 2025-07-28 · unverdicted · novelty 5.0

Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.

Qwen3 Technical Report

cs.CL · 2025-05-14 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

Qwen2.5-Omni Technical Report

cs.CL · 2025-03-26 · conditional · novelty 5.0

Qwen2.5-Omni presents a multimodal model with block-wise encoders, TMRoPE position embeddings, and a Thinker-Talker architecture that enables simultaneous text and streaming speech generation while matching text performance on reasoning benchmarks.

Qwen2.5-VL Technical Report

cs.CV · 2025-02-19 · unverdicted · novelty 5.0

Qwen2.5-VL reports a vision-language model family using native dynamic-resolution ViT and absolute time encoding that matches GPT-4o on document and diagram tasks while supporting hour-long videos with second-level localization.

Measuring AI Reasoning: A Guide for Researchers

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

Ministral 3

cs.CL · 2026-01-13 · unverdicted · novelty 4.0

Ministral 3 releases 3B/8B/14B parameter-efficient language models with base, instruction, and reasoning variants derived via iterative pruning and distillation, including image understanding capabilities.

Mellum2 Technical Report

cs.CL · 2026-05-29 · unverdicted · novelty 3.0

Mellum 2 is a 12B MoE model with 2.5B active parameters, trained on 10.6T tokens with MoE, GQA, SWA, and MTP, then post-trained into Instruct and Thinking variants, claimed competitive with 4B-14B models at 2.5B compute.

citing papers explorer

Showing 9 of 9 citing papers after filters.

  • Dynamic Model Merging Made Slim cs.LG · 2026-05-17 · unverdicted · none · ref 30

    DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.

  • Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety cs.CL · 2026-06-26 · unverdicted · none · ref 13

    Yuvion LLM applies adversarially aware training and introduces the YLRE benchmark set, claiming superior safety robustness over larger models on multiple tasks.

  • SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training cs.LG · 2026-05-09 · unverdicted · none · ref 31 · 2 links

    Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.

  • Qwen3.5-Omni Technical Report cs.CL · 2026-04-17 · unverdicted · none · ref 13

    Qwen3.5-Omni scales an omnimodal model to hundreds of billions of parameters with 256k context, introduces ARIA for stable speech synthesis, and reports SOTA performance on 215 audio-visual benchmarks while adding multilingual and audio-visual coding capabilities.

  • SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining cs.LG · 2026-02-11 · conditional · none · ref 10

    SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.

  • MiMo-V2-Flash Technical Report cs.CL · 2026-01-06 · unverdicted · none · ref 17

    MiMo-V2-Flash is a 309B/15B MoE model trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation that matches larger models like DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed MTP layers.

  • Measuring AI Reasoning: A Guide for Researchers cs.AI · 2026-05-04 · unverdicted · none · ref 147

    Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

  • Ministral 3 cs.CL · 2026-01-13 · unverdicted · none · ref 20

    Ministral 3 releases 3B/8B/14B parameter-efficient language models with base, instruction, and reasoning variants derived via iterative pruning and distillation, including image understanding capabilities.

  • Mellum2 Technical Report cs.CL · 2026-05-29 · unverdicted · none · ref 20

    Mellum 2 is a 12B MoE model with 2.5B active parameters, trained on 10.6T tokens with MoE, GQA, SWA, and MTP, then post-trained into Instruct and Thinking variants, claimed competitive with 4B-14B models at 2.5B compute.