Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
In: Findings of the Association for Computational Linguistics: ACL 2023
13 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
  Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
  CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
- Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning
  Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.
- Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions
  Z-Reward trains a 27B reasoning teacher VLM on score distributions via GDSO and distills it via RISD into a 9B student, reaching 89.6% and 88.6% human preference accuracy with 41.3% optimization gain over SFT baseline.
- Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion
  Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.
- Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning
  Fine-tuning reasoning models on answer-only data induces reasoning-trace collapse where valid traces disappear while answer performance stays high, and simple loss-masking can mitigate it.
- Reasoning-Aware Training for Time Series Forecasting
  STRIDE injects distilled LLM reasoning as continuous cross-modal priors into TSFMs via mean-pooled hidden states, achieving SOTA forecasting (0.674 MASE, 0.454 CRPS) on GIFT-Eval and superior reasoning on TFRBench.
- SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization
  SCURank ranks multiple summary candidates with Summary Content Units to outperform ROUGE and LLM-based methods in summarization distillation.
- Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents
  Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.
- LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA
  LaV-CoT introduces a multi-stage visual CoT pipeline and GRPO training with language-consistency rewards, delivering up to 9.5% accuracy gains on multilingual VQA benchmarks over similar-sized open models.
- Predicate Importance Estimation and Decoupled Rationale-Score Distillation for Entity Alignment
  PIE creates predicate-aware embeddings by weighting subjectless triples and DRSD distills LLM reasoning into an SLM while decoupling confidence from rationales to improve entity alignment and enable human-in-the-loop verification.
- OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models
  OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.
- Accurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudication
  A neuro-symbolic system converts legal clauses into deterministic typed graphs for consistent, auditable adjudication that cuts compute costs by over 90% versus direct large reasoning model use.