On-device language models: A comprehensive review. arXiv preprint arXiv:2409.00088

2024 · arXiv 2409.00088

18 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

cs.LG · 2026-04-28 · unverdicted · novelty 7.0

KV cache eviction is unified under an information capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score based implementation that outperforms prior heuristics in experiments.

WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

cs.IT · 2026-04-20 · unverdicted · novelty 7.0

WISV uses a channel-aware semantic acceptance policy on hidden representations to boost accepted sequence length by up to 60.8% and cut interaction rounds by 37.3% in distributed speculative decoding, with under 1% accuracy loss.

Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows

q-bio.QM · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

An LLM-orchestrated physics simulation search identifies polymers with strong insulin interactions, outperforming standard optimization methods by significant margins.

SOD: Step-wise On-policy Distillation for Small Language Model Agents

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

cs.AR · 2026-04-28 · unverdicted · novelty 6.0

NVLLM offloads FFN computations to integrated 3D NAND flash with page-level access and keeps attention in DRAM, delivering 16.7x-37.9x speedups over GPU out-of-core baselines for models up to 30B parameters.

Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots

cs.HC · 2026-04-20 · unverdicted · novelty 6.0

A new benchmark shows LLM smartphone agents achieve comparable success with screen text alone as with screenshots, but both fail often due to UI accessibility and reasoning gaps.

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

cs.CR · 2026-03-10 · unverdicted · novelty 6.0

FlexServe achieves up to 10x faster time-to-first-token for secure LLM inference on mobile devices by using flexible resource isolation in TrustZone compared to standard approaches.

CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization

cs.LG · 2026-02-05 · unverdicted · novelty 6.0

CoreQ delivers adaptive mismatch correction via closed-form geometric coefficient and successive rounding to improve PTQ accuracy for large language models.

AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management

cs.AI · 2025-12-11 · conditional · novelty 6.0

AgentProg reframes interaction history as a program with variables and control flow, plus a belief state for partial observability, achieving SOTA success rates on long-horizon GUI benchmarks while baselines degrade.

Beyond Scaling: Agents Are Heading to the Edge

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Personal agents require edge deployment to preserve high-fidelity local context and zero-latency loops, as claimed through three structural shifts away from cloud-centric designs.

Intelligent Drill-Down: Large Language Model-Driven Drill-Down Technique for Human-AI Collaborative Visual Exploration

cs.HC · 2026-04-18 · unverdicted · novelty 5.0

An LLM-based framework recommends drill-down paths in visual analytics by approximating a greedy algorithm, interpreting user intent, and managing exploration branches to reduce cognitive load.

Diamonds in the rough: Transforming SPARCs of imagination into a game concept by leveraging medium sized LLMs

cs.HC · 2025-09-29 · unverdicted · novelty 5.0

Medium-sized LLMs can supply useful feedback on game concepts in early design stages, as demonstrated by model comparisons and a positive student pilot study.

RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation

cs.IR · 2026-05-06 · unverdicted · novelty 4.0

RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.

Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration

cs.IR · 2026-04-19 · unverdicted · novelty 4.0

A multi-agent multimodal system with fact-grounded adjudication and a dynamic two-tier preference graph cuts false positives in content filtering by 74.3% and nearly doubles F1-score versus text-only baselines while supporting user-driven Delta adjustments.

Less LLM, More Documents: Searching for Improved RAG

cs.IR · 2025-10-03 · unverdicted · novelty 4.0

Corpus scaling in RAG frequently matches the accuracy gains from larger LLMs on open-domain QA tasks, with mid-sized models benefiting most due to better passage coverage.

Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

cs.DC · 2026-04-24 · unverdicted · novelty 3.0

A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.

Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda

cs.DC · 2026-04-19 · unverdicted · novelty 2.0

This research agenda argues that cloud-native architectures, microservices, autoscaling, and emerging trends like serverless inference and federated learning are required to make large language models efficient and scalable.

ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

cs.AI · 2026-05-05

citing papers explorer

Showing 18 of 18 citing papers.

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective · cs.LG · 2026-04-28 · unverdicted · none · ref 35
KV cache eviction is unified under an information capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score based implementation that outperforms prior heuristics in experiments.
WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference · cs.IT · 2026-04-20 · unverdicted · none · ref 2
WISV uses a channel-aware semantic acceptance policy on hidden representations to boost accepted sequence length by up to 60.8% and cut interaction rounds by 37.3% in distributed speculative decoding, with under 1% accuracy loss.
Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows · q-bio.QM · 2026-05-12 · unverdicted · none · ref 35 · 2 links
An LLM-orchestrated physics simulation search identifies polymers with strong insulin interactions, outperforming standard optimization methods by significant margins.
SOD: Step-wise On-policy Distillation for Small Language Model Agents · cs.CL · 2026-05-08 · unverdicted · none · ref 8
SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.
NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference · cs.AR · 2026-04-28 · unverdicted · none · ref 33
NVLLM offloads FFN computations to integrated 3D NAND flash with page-level access and keeps attention in DRAM, delivering 16.7x-37.9x speedups over GPU out-of-core baselines for models up to 30B parameters.
Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots · cs.HC · 2026-04-20 · unverdicted · none · ref 75
A new benchmark shows LLM smartphone agents achieve comparable success with screen text alone as with screenshots, but both fail often due to UI accessibility and reasoning gaps.
FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation · cs.CR · 2026-03-10 · unverdicted · none · ref 68
FlexServe achieves up to 10x faster time-to-first-token for secure LLM inference on mobile devices by using flexible resource isolation in TrustZone compared to standard approaches.
CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization · cs.LG · 2026-02-05 · unverdicted · none · ref 18
CoreQ delivers adaptive mismatch correction via closed-form geometric coefficient and successive rounding to improve PTQ accuracy for large language models.
AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management · cs.AI · 2025-12-11 · conditional · none · ref 51
AgentProg reframes interaction history as a program with variables and control flow, plus a belief state for partial observability, achieving SOTA success rates on long-horizon GUI benchmarks while baselines degrade.
Beyond Scaling: Agents Are Heading to the Edge · cs.LG · 2026-05-18 · unverdicted · none · ref 67
Personal agents require edge deployment to preserve high-fidelity local context and zero-latency loops, as claimed through three structural shifts away from cloud-centric designs.
Intelligent Drill-Down: Large Language Model-Driven Drill-Down Technique for Human-AI Collaborative Visual Exploration · cs.HC · 2026-04-18 · unverdicted · none · ref 64
An LLM-based framework recommends drill-down paths in visual analytics by approximating a greedy algorithm, interpreting user intent, and managing exploration branches to reduce cognitive load.
Diamonds in the rough: Transforming SPARCs of imagination into a game concept by leveraging medium sized LLMs · cs.HC · 2025-09-29 · unverdicted · none · ref 15
Medium-sized LLMs can supply useful feedback on game concepts in early design stages, as demonstrated by model comparisons and a positive student pilot study.
RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation · cs.IR · 2026-05-06 · unverdicted · none · ref 19
RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.
Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration · cs.IR · 2026-04-19 · unverdicted · none · ref 32
A multi-agent multimodal system with fact-grounded adjudication and a dynamic two-tier preference graph cuts false positives in content filtering by 74.3% and nearly doubles F1-score versus text-only baselines while supporting user-driven Delta adjustments.
Less LLM, More Documents: Searching for Improved RAG · cs.IR · 2025-10-03 · unverdicted · none · ref 35
Corpus scaling in RAG frequently matches the accuracy gains from larger LLMs on open-domain QA tasks, with mid-sized models benefiting most due to better passage coverage.
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities · cs.DC · 2026-04-24 · unverdicted · none · ref 176
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda · cs.DC · 2026-04-19 · unverdicted · none · ref 146
This research agenda argues that cloud-native architectures, microservices, autoscaling, and emerging trends like serverless inference and federated learning are required to make large language models efficient and scalable.
ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting · cs.AI · 2026-05-05 · unreviewed · ref 84
