Bamboo: A comprehensive benchmark for evaluating long text modeling capacities of large language models
3 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1).
Citing papers
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  RULER shows that most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining their results at 32K tokens.
- SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
  SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
  The paper surveys hallucination in LLMs, presenting an innovative taxonomy, contributing factors, detection methods, benchmarks, mitigation strategies, and open research directions.