Title resolution pending

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 3

citation-polarity summary

fields

cs.DC 3 · cs.AI 1

years

2026 4

verdicts

UNVERDICTED 4

polarities

background 3

representative citing papers

Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

Hive is a multi-agent infrastructure with a logits cache for reducing cross-path redundancy in sampling and agent-aware scheduling for better compute and KV-cache allocation, shown to deliver 1.11x-1.76x speedups and 33%-51% lower hotspot miss rates.

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

cs.DC · 2026-04-16 · unverdicted · novelty 6.0

Scepsy schedules arbitrary multi-LLM agentic workflows on GPU clusters by constructing Aggregate LLM Pipelines from stable per-LLM execution time shares, then searching fractional GPU allocations, tensor parallelism, and replica counts to achieve up to 2.4x higher throughput and 27x lower latency.

citing papers explorer

Showing 4 of 4 citing papers.

  • Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling cs.AI · 2026-04-19 · unverdicted · none · ref 5 · internal anchor

    Hive is a multi-agent infrastructure with a logits cache for reducing cross-path redundancy in sampling and agent-aware scheduling for better compute and KV-cache allocation, shown to deliver 1.11x-1.76x speedups and 33%-51% lower hotspot miss rates.

  • Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines cs.DC · 2026-04-16 · unverdicted · none · ref 3 · internal anchor

    Scepsy schedules arbitrary multi-LLM agentic workflows on GPU clusters by constructing Aggregate LLM Pipelines from stable per-LLM execution time shares, then searching fractional GPU allocations, tensor parallelism, and replica counts to achieve up to 2.4x higher throughput and 27x lower latency.

  • ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache cs.DC · 2026-04-07 · unverdicted · none · ref 6 · internal anchor

    ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.

  • TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing cs.DC · 2026-04-03 · unverdicted · none · ref 2 · internal anchor

    TokenDance scales multi-agent LLM serving to 2.7x more concurrent agents by collective KV cache reuse and block-sparse diff encoding that achieves 11-17x compression.