LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding, 2024

Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference

cs.DC · 2026-03-27 · unverdicted · novelty 7.0

TCM-Serve applies modality-aware scheduling to reduce average TTFT by 54% and 78.5% for latency-critical requests in MLLM inference.

STS: Efficient Sparse Attention with Speculative Token Sparsity

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

STS repurposes draft-model attention scores from speculative decoding to build token-and-head-wise sparsity masks, delivering 2.67x speedup at ~90% sparsity on NarrativeQA with negligible accuracy loss.

citing papers explorer

TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference · cs.DC · 2026-03-27 · unverdicted · none · ref 6
STS: Efficient Sparse Attention with Speculative Token Sparsity · cs.LG · 2026-05-15 · unverdicted · none · ref 1
