International Conference on Learning Representations , year=

GLUE: A Multi-Task Benchmark, Analysis Platform for Natural Language Understanding , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

E-PMQ improves 4-bit quantization accuracy on merged models by 8-42 points across CLIP and GLUE tasks through expert-guided calibration and merged-weight anchoring.

LIMO: Less is More for Reasoning

cs.CL · 2025-02-05 · unverdicted · novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.

The Falcon Series of Open Language Models

cs.CL · 2023-11-28 · conditional · novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

GiVA: Gradient-Informed Bases for Vector-Based Adaptation

cs.CL · 2026-04-23 · unverdicted · novelty 5.0

GiVA uses gradients to initialize vector adapters so they match LoRA performance at eight times lower rank while keeping extreme parameter efficiency.

citing papers explorer

Showing 4 of 4 citing papers.

E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring cs.CL · 2026-05-16 · unverdicted · none · ref 17
E-PMQ improves 4-bit quantization accuracy on merged models by 8-42 points across CLIP and GLUE tasks through expert-guided calibration and merged-weight anchoring.
LIMO: Less is More for Reasoning cs.CL · 2025-02-05 · unverdicted · none · ref 47
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.
The Falcon Series of Open Language Models cs.CL · 2023-11-28 · conditional · none · ref 67
Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
GiVA: Gradient-Informed Bases for Vector-Based Adaptation cs.CL · 2026-04-23 · unverdicted · none · ref 1
GiVA uses gradients to initialize vector adapters so they match LoRA performance at eight times lower rank while keeping extreme parameter efficiency.

International Conference on Learning Representations , year=

fields

years

verdicts

representative citing papers

citing papers explorer