and Downey, Doug

Shen, Z · 2022 · DOI 10.1162/tacl_a_00466

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

cs.CL · 2026-06-05 · unverdicted · novelty 7.0

Prefix gain measured via student-model solve-rate improvement is used to train a Prefix Utility Model (PUM) that supplies stronger supervision than correctness-based process rewards for mathematical reasoning.

Invoice Haystack: Benchmarking Document Retrieval and Visual Question Answering Under Strong Visual Homogeneity

cs.CV · 2026-06-24 · unverdicted · novelty 6.0 · 2 refs

Presents the Invoice Haystack benchmark for homogeneous document retrieval and a VL-RAG hybrid framework that achieves 60% Recall@1 and gains of up to 13.5 points over prior methods.

Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.

ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

ProbeScale finds layer subsets in SLMs like RoBERTa-Large and T5-Base that cut parameters 5-10x while retaining 95-98% of original task performance by maximizing aggregated probe scores under a budget.

Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification

cs.CL · 2026-06-25 · unverdicted · novelty 4.0

LLM-extracted patterns merging logical structures and linguistic cues yield statistically significant gains in fallacy classification over zero-shot baselines with cross-dataset generalization.

MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

cs.CL · 2026-05-18 · unverdicted · novelty 3.0

MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

cs.CL · 2026-06-24 · unverdicted · novelty 2.0

Fine-tuned PEGASUS achieves state-of-the-art ROUGE scores on XL-Sum English corpus with 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L gains over mT5 baseline.

citing papers explorer

Showing 6 of 6 citing papers after filters.

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

cs.CL · 2026-06-05 · unverdicted · none · ref 274

Prefix gain measured via student-model solve-rate improvement is used to train a Prefix Utility Model (PUM) that supplies stronger supervision than correctness-based process rewards for mathematical reasoning.

Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents

cs.CL · 2026-04-20 · unverdicted · none · ref 296

Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.

ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

cs.CL · 2026-06-01 · unverdicted · none · ref 275

ProbeScale finds layer subsets in SLMs like RoBERTa-Large and T5-Base that cut parameters 5-10x while retaining 95-98% of original task performance by maximizing aggregated probe scores under a budget.

Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification

cs.CL · 2026-06-25 · unverdicted · none · ref 284

LLM-extracted patterns merging logical structures and linguistic cues yield statistically significant gains in fallacy classification over zero-shot baselines with cross-dataset generalization.

MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

cs.CL · 2026-05-18 · unverdicted · none · ref 257

MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

cs.CL · 2026-06-24 · unverdicted · none · ref 271

Fine-tuned PEGASUS achieves state-of-the-art ROUGE scores on XL-Sum English corpus with 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L gains over mT5 baseline.
