Program induction by rationale generation: Learning to solve and explain algebraic word problems
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Representative citing papers
LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.
Citing papers explorer
- ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
  ChunkFT enables full-parameter fine-tuning of Llama 3-8B on one 24 GB GPU and Llama 3-70B on two 80 GB GPUs by streaming gradients over dynamically activated sub-tensors.
- LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning
  LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.