Xing, Joseph E

Zheng, L · 2023 · arXiv 2309.11998

25 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · baseline: 1 · dataset: 1

citation-polarity summary

background: 1 · baseline: 1 · use dataset: 1

representative citing papers

Turn-Averaged SAEs for Feature Discovery and Long-Context Attribution

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

Turn-averaged SAEs reconstruct average activations over conversation turns to represent high-level turn characteristics with a fixed number of features, simplifying long-context interpretability compared to per-token SAEs.

AI Fiction in the Wild

cs.CL · 2026-06-22 · unverdicted · novelty 7.0

Analysis of 500k ChatGPT logs shows over one-third of conversations generate fiction, dominated by power users with repetitive and niche patterns.

Cybersecurity AI (CAI) Dataset

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

CAI Dataset is presented as the largest described corpus of LLM-driven hacker trajectories, with the claim that operator data concentration in frontier-model providers creates a major security risk best addressed by on-premise specialized LLMs.

EvoCode-Bench: Evaluating Coding Agents in Multi-Turn Iterative Interactions

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

EvoCode-Bench shows that single-round success rates for coding agents exceed multi-turn persistent execution rates by 22-40 points, with performance dropping below half of round-1 levels by round 5 across 13 evaluated agents.

K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

cs.CL · 2026-05-10 · conditional · novelty 7.0

K12-KGraph is a textbook-derived knowledge graph that powers a new benchmark revealing LLMs' poor curriculum cognition and a small training corpus that outperforms general instruction data on educational tasks.

SAGE: A Service Agent Graph-guided Evaluation Benchmark

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads

cs.LG · 2026-01-29 · unverdicted · novelty 7.0

A renewal-reward analysis yields a closed-form mean-field rule for the optimal Attention/FFN provisioning ratio in disaggregated LLM serving that accounts for stochastic KV-cache growth and matches simulation optima within 10%.

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips

cs.DC · 2026-01-28 · conditional · novelty 7.0

SuperInfer improves TTFT SLO attainment by up to 74.7% on GH200 Superchips via SLO-aware rotary scheduling (RotaSched) and full-duplex KV cache rotation (DuplexKV) over NVLink-C2C while preserving TBT and throughput.

SeDT: Sentence-Transformer Decision-Transformer Conditioning for Multi-Turn Conversation Reliability

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

SeDT recovers up to 37.7% of lost performance in multi-turn conversations by annotating history with relevance scores from semantic, lexical, and positional signals without training or data changes.

Test-Time Speculation

cs.CL · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

Adversarial restlessness in LLM activations allows five scalar features to detect multi-turn prompt injections at 93.8% accuracy on synthetic data, with cross-model replication but source-dependent generalization to real-world chats.

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

cs.CR · 2026-04-30 · unverdicted · novelty 6.0

TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

A flow-control framework for LLM inference derives necessary and sufficient stability conditions and experimentally improves throughput, latency, and KV cache stability over common baselines.

Language Model Goal Selection Differs from Humans' in a Self-Directed Learning Task

cs.CL · 2026-02-06 · unverdicted · novelty 6.0

LLMs diverge from human goal selection in self-directed learning by exploiting single solutions with low variability across instances.

Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving

cs.AR · 2025-05-19 · unverdicted · novelty 6.0

Sandwich delivers 2.01x average end-to-end speedup and up to 3.4x latency reduction for CPU LLM serving via phase-wise hot-switching, TopoTree hardware abstraction, and fast-start dynamic kernel generation.

LLMs Get Lost In Multi-Turn Conversation

cs.CL · 2025-05-09 · unverdicted · novelty 6.0

LLMs drop 39% in performance during multi-turn conversations due to premature assumptions and inability to recover from early errors.

KernelFlume: Elastic Core-Attention Scaling for Agentic Long-Context Decoding

cs.DC · 2026-06-28 · unverdicted · novelty 5.0

KernelFlume presents a disaggregated decode architecture that separates core attention from projection/FFN paths to enable elastic scaling of attention nodes, reporting up to 61% lower cost per million tokens versus full-instance scaling on H100 hardware for Llama-3.1-8B under dynamic long-context w

Kavier: Exploring Performance, Sustainability, and Efficiency of LLM Ecosystems under Inference through Cache-Aware Discrete-Event Simulation

cs.DC · 2026-05-24 · unverdicted · novelty 5.0

Proposes a reference architecture for LLM ecosystems under inference and Kavier, the first cache-aware discrete-event simulator for predicting performance, sustainability, and efficiency of inference workloads.

Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference

cs.AR · 2026-04-19 · unverdicted · novelty 5.0

A unified KV cache system with architecture-specific sizing, six-tier memory from GPU to filesystems, and Bayesian prediction delivers 7.4x higher batch sizes, 70-84% hit rates, and projected 1.7-2.9x throughput gains.

NVIDIA Nemotron 3: Efficient and Open Intelligence

cs.CL · 2025-12-24 · unverdicted · novelty 5.0

NVIDIA releases the Nemotron 3 model family with hybrid Mamba-Transformer architecture, LatentMoE, NVFP4 training, MTP layers, and multi-environment RL post-training for reasoning and agentic tasks.

At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization

cs.LG · 2026-06-24 · unverdicted · novelty 4.0

Sparse autoencoders show OOD prompts increase fallacious concept activation in transformers, offering a mechanistic measure of shift and a path to robust fine-tuning.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

Comparative Characterization of KV Cache Management Strategies for LLM Inference

cs.AR · 2026-04-06 · unverdicted · novelty 3.0

Benchmarks of vLLM, InfiniGen, and H2O identify conditions under which each KV cache strategy delivers the best trade-off between memory consumption and inference performance.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

citing papers explorer

Gemma 2: Improving Open Language Models at a Practical Size

cs.CL · 2024-07-31 · conditional · none · ref 157

Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
