MatFormer: Nested Transformer for Elastic Inference

Devvrit, S · 2023 · arXiv 2310.07707

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Star Elastic trains N nested submodels in a single post-training job on a parent reasoning LLM, supporting elastic budget control that matches or exceeds independent baselines while cutting training compute by up to 360x.

GRINQH: Graded Input-based Quantization Hierarchy for Efficient LLM Generation

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

GRINQH introduces a graded input-based quantization hierarchy that dynamically assigns multi-precision weights, using activation magnitudes as an importance proxy. It unifies quantization with sparsification to improve the trade-off between decoding speed and quality for LLMs on Llama3 and Qwen3 models.

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.

citing papers explorer

Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control

cs.LG · 2026-05-08 · unverdicted · none · ref 9

GRINQH: Graded Input-based Quantization Hierarchy for Efficient LLM Generation

cs.LG · 2026-06-22 · unverdicted · none · ref 12

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · none · ref 42
