ZeRO: Memory optimizations toward training trillion parameter models

Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He · 2020 · SC20: International Conference for High Performance Computing, Networking, Storage and Analysis · DOI 10.1109/sc41405.2020.00024

7 Pith papers cite this work, alongside 603 external citations. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

cs.LG · 2026-05-20 · conditional · novelty 6.0

ChunkFT enables full-parameter fine-tuning of Llama 3-8B on one 24 GB GPU and Llama 3-70B on two 80 GB GPUs by streaming gradients over dynamically activated sub-tensors.

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

SceneGraphVLM generates dynamic scene graphs from video using compact VLMs, TOON serialization, and hallucination-aware RL to improve precision and achieve one-second latency.

Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning

cs.CV · 2026-05-09 · conditional · novelty 6.0

Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.

Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Diagonal plus Low-Rank (DLoR) neural networks achieve universal approximation for general activations by additive or multiplicative decompositions of full-rank transformations.

Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

A learned orchestration policy for LLM agents that jointly optimizes task decomposition and selective routing to (model, primitive) pairs, delivering 77% macro pass@1 at 10x lower cost than strong baselines across 13 benchmarks.

LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning

cs.RO · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

LaST-R1 introduces an RL post-training method called LAPO that optimizes latent Chain-of-Thought reasoning in vision-language-action models, yielding 99.9% success on LIBERO and up to 22.5% real-world gains.

MeMo: Memory as a Model

cs.CL · 2026-05-14 · unverdicted · novelty 5.0 · 2 refs

MeMo encodes new knowledge into a separate memory model that integrates with frozen LLMs, showing strong performance on QA benchmarks while avoiding catastrophic forgetting and working without access to model weights.

citing papers explorer

Showing 7 of 7 citing papers.

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning · cs.LG · 2026-05-20 · conditional · none · ref 7
SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models · cs.CV · 2026-05-13 · unverdicted · none · ref 24
Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning · cs.CV · 2026-05-09 · conditional · none · ref 45
Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks · cs.LG · 2026-05-07 · unverdicted · none · ref 38
Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation · cs.AI · 2026-05-06 · unverdicted · none · ref 48
LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning · cs.RO · 2026-04-30 · unverdicted · none · ref 84 · 2 links
MeMo: Memory as a Model · cs.CL · 2026-05-14 · unverdicted · none · ref 71 · 2 links
