Piqa: Reasoning about phys- ical commonsense in natural language

Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al · 2020

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

dataset 3

citation-polarity summary

use dataset 3

representative citing papers

Different Prompts, Different Ranks: Prompt-aware Dynamic Rank Selection for SVD-based LLM Compression

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

PARSE trains a prompt-aware linear router on dense-model outputs to select dynamic SVD ranks, improving accuracy up to 10% at 0.6 compression ratio on LLaMA-7B while delivering 2.5x prefill and 2.4x decode speedups.

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

cs.LG · 2026-04-02 · unverdicted · novelty 6.0

AdaHOP applies pattern-aware Hadamard transforms and selective outlier extraction to enable from-scratch MXFP4 training of LLMs at BF16 quality with up to 3.6X memory compression and 1.46X speedup.

Cubit: Token Mixer with Kernel Ridge Regression

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

Cubit replaces Transformer's attention with a closed-form Kernel Ridge Regression token mixer and reports larger gains as training sequence length increases.

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TextPro-SLM reduces the speech-text modality gap by feeding an LLM backbone with synchronized text tokens and prosody embeddings from WhisperPro, achieving lowest gap scores at 3B/7B scales with roughly 1,000 hours of audio.

citing papers explorer

Showing 4 of 4 citing papers.

Different Prompts, Different Ranks: Prompt-aware Dynamic Rank Selection for SVD-based LLM Compression cs.LG · 2026-05-09 · unverdicted · none · ref 46
PARSE trains a prompt-aware linear router on dense-model outputs to select dynamic SVD ranks, improving accuracy up to 10% at 0.6 compression ratio on LLaMA-7B while delivering 2.5x prefill and 2.4x decode speedups.
AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation cs.LG · 2026-04-02 · unverdicted · none · ref 4
AdaHOP applies pattern-aware Hadamard transforms and selective outlier extraction to enable from-scratch MXFP4 training of LLMs at BF16 quality with up to 3.6X memory compression and 1.46X speedup.
Cubit: Token Mixer with Kernel Ridge Regression cs.LG · 2026-05-07 · unverdicted · none · ref 7
Cubit replaces Transformer's attention with a closed-form Kernel Ridge Regression token mixer and reports larger gains as training sequence length increases.
Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM cs.CL · 2026-05-07 · unverdicted · none · ref 59
TextPro-SLM reduces the speech-text modality gap by feeding an LLM backbone with synchronized text tokens and prosody embeddings from WhisperPro, achieving lowest gap scores at 3B/7B scales with roughly 1,000 hours of audio.

Piqa: Reasoning about phys- ical commonsense in natural language

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer