MeMo encodes new knowledge into a separate memory model that integrates with frozen LLMs, showing strong performance on QA benchmarks while avoiding catastrophic forgetting and working without access to model weights.
Available: https://arxiv.org/abs/2311.03687
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.
Introduces Efficiency Frontier framework for deployment-aware cost-performance optimization of LLM context strategies, reporting ~25% token reduction at F1≈0.78 on 5,000 HotpotQA instances.
citing papers explorer
-
MeMo: Memory as a Model
MeMo encodes new knowledge into a separate memory model that integrates with frozen LLMs, showing strong performance on QA benchmarks while avoiding catastrophic forgetting and working without access to model weights.
-
Position: LLM Inference Should Be Evaluated as Energy-to-Token Production
LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.
-
The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management
Introduces Efficiency Frontier framework for deployment-aware cost-performance optimization of LLM context strategies, reporting ~25% token reduction at F1≈0.78 on 5,000 HotpotQA instances.