ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.
Canonical reference
InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Canonical reference. 71% of citing Pith papers cite this work as background.
citation-role summary
citation-polarity summary
representative citing papers
Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.
RATrain introduces a resource-aware scheduler and MT-3000-specific backend for 1F1B LLM training that achieves 1.35x speedup and 97% scaling efficiency while preserving training correctness.
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
SKIM is an adaptive multi-resolution soft-token framework that compresses procedural skills while aiming to preserve logical dependencies and task performance better than prior compression methods.
KoRe encodes 1-hop knowledge graph subgraphs as compact discrete tokens for injection into LLMs, achieving competitive benchmark performance with up to 10x token reduction.
PrismLLM constructs a sliced execution graph and uses hybrid emulation to faithfully reproduce performance and memory behavior of up to 8192-GPU LLM training runs on fewer than 1% of the original GPUs.
MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.
UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.
Symphony detects step misalignments in ring collectives via lightweight in-network tracking and mitigates them by throttling outpacing flows with congestion signals, yielding up to 54% better communication times in Astra-Sim simulations and a Tofino2 prototype.
Continued pretraining of Code Llama on Proof-Pile-2 yields Llemma, an open math-specialized LLM that beats known open base models on MATH and supports tool use plus formal proving out of the box.
Pre-training loss predicts LLM math reasoning better than parameter count; rejection sampling fine-tuning with diverse paths raises LLaMA-7B accuracy on GSM8K from 35.9% with SFT to 49.3%.
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.
HSAP introduces a hierarchical framework and sequence-aware algorithm with JIT-optimized NCCL communication to enable correct causal attention computation on hybrid-context packed sequences without limiting parallelism.
Eidola is a gem5 extension that emulates cycle-level peer-to-peer GPU writes via real-application timing profiles to simulate traffic and synchronization in multi-GPU AI systems.
CroCo applies English-reward-ranked self-generations for contrastive preference tuning that improves two LLMs on structured and open-ended tasks across 14 languages without language-specific annotations.
torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.
VLA Foundry provides a single training stack for VLA models and releases open models that match prior closed-source performance or outperform baselines on multi-task manipulation in simulation.
BiasCareVL is a bias-aware vision-language framework trained on 3.44 million medical samples that outperforms prior methods on clinical tasks like diagnosis and segmentation while aiming for equitable performance under data imbalances.
PRISM introduces a probabilistic performance modeling framework that quantifies guarantees on training time for large-scale distributed systems under runtime variability.
On Grace Hopper superchips, energy efficiency during multimodal training is governed by data movement and overlap rather than compute utilization, and runtime-optimal configurations are not always energy-optimal.
DFLOP is a data-driven framework that profiles data-induced computation variance and uses predictive scheduling to balance workloads in multimodal LLM training pipelines, claiming up to 3.6x faster training than existing frameworks.
citing papers explorer
-
UCCL-Zip: Lossless Compression Supercharged GPU Communication
UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.