In: Findings of the Association for Computational Linguistics: ACL 2023

Hsieh, C · 2023 · DOI 10.18653/v1/2023.findings-acl.507

13 Pith papers cite this work. Polarity classification is still in progress.

citation-role summary

background: 2

citation-polarity summary

background: 1 · support: 1

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

cs.CL · 2025-02-28 · unverdicted · novelty 7.0

CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.

Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Z-Reward trains a 27B reasoning teacher VLM on score distributions via GDSO and distills it via RISD into a 9B student, reaching 89.6% and 88.6% human preference accuracy with 41.3% optimization gain over SFT baseline.

Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.

Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Fine-tuning reasoning models on answer-only data induces reasoning-trace collapse where valid traces disappear while answer performance stays high, and simple loss-masking can mitigate it.

Reasoning-Aware Training for Time Series Forecasting

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

STRIDE injects distilled LLM reasoning as continuous cross-modal priors into TSFMs via mean-pooled hidden states, achieving SOTA forecasting (0.674 MASE, 0.454 CRPS) on GIFT-Eval and superior reasoning on TFRBench.

SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

SCURank ranks multiple summary candidates with Summary Content Units to outperform ROUGE and LLM-based methods in summarization distillation.

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.

LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA

cs.CV · 2025-09-12 · unverdicted · novelty 6.0

LaV-CoT introduces a multi-stage visual CoT pipeline and GRPO training with language-consistency rewards, delivering up to 9.5% accuracy gains on multilingual VQA benchmarks over similar-sized open models.

Predicate Importance Estimation and Decoupled Rationale-Score Distillation for Entity Alignment

cs.CL · 2026-06-22 · unverdicted · novelty 5.0

PIE creates predicate-aware embeddings by weighting subjectless triples and DRSD distills LLM reasoning into an SLM while decoupling confidence from rationales to improve entity alignment and enable human-in-the-loop verification.

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.

Accurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudication

cs.CL · 2026-05-04 · unverdicted · novelty 5.0

A neuro-symbolic system converts legal clauses into deterministic typed graphs for consistent, auditable adjudication that cuts compute costs by over 90% versus direct large reasoning model use.

citing papers explorer

Showing 13 of 13 citing papers.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution · cs.CL · 2023-09-28 · unverdicted · none · ref 152
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation · cs.CL · 2025-02-28 · unverdicted · none · ref 100
Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning · cs.LG · 2026-06-08 · unverdicted · none · ref 15
Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions · cs.CV · 2026-06-08 · unverdicted · none · ref 21
Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion · cs.CL · 2026-05-21 · unverdicted · none · ref 6
Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning · cs.LG · 2026-05-20 · unverdicted · none · ref 11
Reasoning-Aware Training for Time Series Forecasting · cs.LG · 2026-05-09 · unverdicted · none · ref 1
SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization · cs.CL · 2026-04-21 · unverdicted · none · ref 12
Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents · cs.LG · 2026-04-12 · unverdicted · none · ref 11
LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA · cs.CV · 2025-09-12 · unverdicted · none · ref 18
Predicate Importance Estimation and Decoupled Rationale-Score Distillation for Entity Alignment · cs.CL · 2026-06-22 · unverdicted · none · ref 45
OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models · cs.CL · 2026-05-12 · unverdicted · none · ref 12
Accurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudication · cs.CL · 2026-05-04 · unverdicted · none · ref 49
