Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
In: Findings of the Association for Computational Linguistics: ACL 2023
13 Pith papers cite this work.
verdicts: UNVERDICTED 13
roles: background 2
representative citing papers
CODI compresses explicit CoT into continuous space via self-distillation. It is the first implicit method to match explicit CoT performance on GSM8K at GPT-2 scale, with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.
Z-Reward trains a 27B reasoning teacher VLM on score distributions via GDSO and distills it via RISD into a 9B student, reaching 89.6% and 88.6% human preference accuracy with a 41.3% optimization gain over the SFT baseline.
Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.
Fine-tuning reasoning models on answer-only data induces reasoning-trace collapse, in which valid traces disappear while answer performance stays high; simple loss-masking can mitigate it.
STRIDE injects distilled LLM reasoning as continuous cross-modal priors into TSFMs via mean-pooled hidden states, achieving SOTA forecasting (0.674 MASE, 0.454 CRPS) on GIFT-Eval and superior reasoning on TFRBench.
SCURank ranks multiple summary candidates with Summary Content Units to outperform ROUGE and LLM-based methods in summarization distillation.
Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.
LaV-CoT introduces a multi-stage visual CoT pipeline and GRPO training with language-consistency rewards, delivering up to 9.5% accuracy gains on multilingual VQA benchmarks over similar-sized open models.
PIE creates predicate-aware embeddings by weighting subjectless triples, and DRSD distills LLM reasoning into an SLM while decoupling confidence from rationales, improving entity alignment and enabling human-in-the-loop verification.
OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.
A neuro-symbolic system converts legal clauses into deterministic typed graphs for consistent, auditable adjudication that cuts compute costs by over 90% versus direct large reasoning model use.
citing papers explorer
- Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning
- Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning
- Reasoning-Aware Training for Time Series Forecasting
- Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents