Forty-first International Conference on Machine Learning

MobileLLM: Optimizing sub-billion parameter language models for on-device use cases

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Transformers with Selective Access to Early Representations

cs.LG · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

SATFormer uses a context-dependent gate for selective reuse of early Transformer representations, improving validation loss and zero-shot accuracy especially on retrieval benchmarks.

Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM

cs.DC · 2026-04-20 · unverdicted · novelty 4.0

A framework combines multi-LoRA runtime switching, multi-stream stylistic decoding, and Dynamic Self-Speculative Decoding with INT4 quantization to achieve 4-6x memory and latency gains for on-device inference of a one-for-all foundational LLM on Qualcomm chipsets.

citing papers explorer

Transformers with Selective Access to Early Representations · cs.LG · 2026-05-05 · unverdicted · none · ref 12 · 2 links
Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM · cs.DC · 2026-04-20 · unverdicted · none · ref 18
