Canonical reference

Orca 2: Teaching small language models how to reason

Mitra, A · 2023 · arXiv 2311.11045

Canonical reference. 80% of citing Pith papers cite this work as background.

9 Pith papers citing it

Background 80% of classified citations


citation-role summary

background: 4 · baseline: 1

citation-polarity summary

background: 4 · baseline: 1

representative citing papers

LLM-Agnostic Semantic Representation Attack

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

SRA achieves 99.71% average attack success across 26 LLMs by optimizing for coherent malicious semantics via the SRHS algorithm, with claimed theoretical guarantees on convergence and transfer.

DART: Mitigating Harm Drift in Difference-Aware LLMs via Distill-Audit-Repair Training

cs.CL · 2026-04-18 · unverdicted · novelty 6.0

DART raises difference-awareness accuracy from 39% to 68.8% on benchmarks while cutting harm-drift cases by 72.6% and improving real-world appropriate responses from 39.8% to 77.5%.

CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning

cs.CL · 2025-09-26 · conditional · novelty 6.0

CoSpaDi introduces a training-free sparse dictionary learning framework for post-training LLM compression that optimizes functional reconstruction error via activation-derived orthonormalization and achieves improved accuracy-compression trade-offs over SVD and pruning baselines.

Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models

cs.AI · 2026-05-08 · unverdicted · novelty 5.0

Mid-training LLMs on self-generated diverse reasoning paths improves subsequent RL performance on mathematical benchmarks and OOD tasks.

Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs

cs.LG · 2026-05-05 · unverdicted · novelty 5.0 · 2 refs

Predict-then-Diffuse predicts response length for diffusion LLMs before inference, cutting FLOPs with a data-driven safety buffer while preserving output quality.

Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

cs.CL · 2026-04-09 · accept · novelty 5.0

LLM post-training is unified as off-policy or on-policy interventions that expand support for useful behaviors, reshape policies within reachable states, or consolidate behavior across training stages.

Phi-4-reasoning Technical Report

cs.AI · 2025-04-30 · unverdicted · novelty 4.0

A 14B reasoning model trained via supervised fine-tuning on selected prompts and o3-mini traces, plus outcome RL, outperforms larger open models like DeepSeek-R1-Distill-Llama-70B on math, coding, planning and related benchmarks.

Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices

cs.DC · 2025-03-11 · unverdicted · novelty 2.0

Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.

Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)

cs.CL · 2025-01-03 · unverdicted · novelty 2.0

A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.

citing papers explorer

Showing 9 of 9 citing papers.

LLM-Agnostic Semantic Representation Attack · cs.CL · 2026-05-09 · unverdicted · none · ref 67
SRA achieves 99.71% average attack success across 26 LLMs by optimizing for coherent malicious semantics via the SRHS algorithm, with claimed theoretical guarantees on convergence and transfer.
DART: Mitigating Harm Drift in Difference-Aware LLMs via Distill-Audit-Repair Training · cs.CL · 2026-04-18 · unverdicted · none · ref 6
DART raises difference-awareness accuracy from 39% to 68.8% on benchmarks while cutting harm-drift cases by 72.6% and improving real-world appropriate responses from 39.8% to 77.5%.
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning · cs.CL · 2025-09-26 · conditional · none · ref 10
CoSpaDi introduces a training-free sparse dictionary learning framework for post-training LLM compression that optimizes functional reconstruction error via activation-derived orthonormalization and achieves improved accuracy-compression trade-offs over SVD and pruning baselines.
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models · cs.AI · 2026-05-08 · unverdicted · none · ref 32
Mid-training LLMs on self-generated diverse reasoning paths improves subsequent RL performance on mathematical benchmarks and OOD tasks.
Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs · cs.LG · 2026-05-05 · unverdicted · none · ref 19 · 2 links
Predict-then-Diffuse predicts response length for diffusion LLMs before inference, cutting FLOPs with a data-driven safety buffer while preserving output quality.
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning · cs.CL · 2026-04-09 · accept · none · ref 83
LLM post-training is unified as off-policy or on-policy interventions that expand support for useful behaviors, reshape policies within reachable states, or consolidate behavior across training stages.
Phi-4-reasoning Technical Report · cs.AI · 2025-04-30 · unverdicted · none · ref 40
A 14B reasoning model trained via supervised fine-tuning on selected prompts and o3-mini traces, plus outcome RL, outperforms larger open models like DeepSeek-R1-Distill-Llama-70B on math, coding, planning and related benchmarks.
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices · cs.DC · 2025-03-11 · unverdicted · none · ref 161
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.
Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026) · cs.CL · 2025-01-03 · unverdicted · none · ref 91
A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.
