pith.

Xu, et al., A survey of resource-efficient LLM and multimodal foundation models, arXiv preprint arXiv:2401.08092 (2024)

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

polarities

background 2

representative citing papers

Rate-Distortion Optimization for Transformer Inference

cs.LG · 2026-01-29 · unverdicted · novelty 5.0

A rate-distortion framework for lossy compression of transformer representations yields substantial bitrate savings on language tasks while preserving accuracy, with observed rates aligning to derived information-theoretic bounds.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.
