hub Canonical reference

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Canonical reference. 83% of citing Pith papers cite this work as background.

30 Pith papers citing it
Background 83% of classified citations
abstract

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs; small scale homogeneous training data; and most notably a lack of rigorous evaluation resulting in overestimating the small model's capability as they tend to learn to imitate the style, but not the reasoning process of LFMs. To address these challenges, we develop Orca (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy to be published at https://aka.ms/orca-lm), a 13-billion parameter model that learns to imitate the reasoning process of LFMs. Orca learns from rich signals from GPT-4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (4 pts gap with optimized system message) in professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without CoT; while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.

citation-role summary

background 12

representative citing papers

Validity-Calibrated Reasoning Distillation

cs.LG · 2026-04-14 · unverdicted · novelty 7.0 · 2 refs

Validity-calibrated reasoning distillation improves transfer of reasoning skills by modulating updates based on relative local validity of next steps instead of enforcing full trajectory imitation.

An Information-Theoretic Criterion for Efficient Data Synthesis

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Synthetic data improves models only in information-open generation-training loops with external signals, and coarser signals like binary correctness enable better generalization by converging to the most information-efficient component.

SkillGen: Verified Inference-Time Agent Skill Synthesis

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

SkillGen synthesizes auditable skills from agent trajectories via contrastive induction on successes and failures, then verifies net performance impact by comparing outcomes with and without the skill on identical tasks.

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 3 refs

RL training compute for logical reasoning follows a power law with horizon depth whose exponent rises with logical expressiveness, yielding better downstream transfer when models train on richer logics.

CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning

cs.CL · 2025-09-26 · conditional · novelty 6.0

CoSpaDi introduces a training-free sparse dictionary learning framework for post-training LLM compression that optimizes functional reconstruction error via activation-derived orthonormalization and achieves improved accuracy-compression trade-offs over SVD and pruning baselines.

Textbooks Are All You Need

cs.CL · 2023-06-20 · unverdicted · novelty 6.0

A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.

Internal narratives parameterise affective states

cs.CL · 2025-02-13 · unverdicted · novelty 5.0

LLM representations of internal narratives predict depression scores and track affective changes, with preservation of symptom covariance structure required for validity.
