DocBench: A benchmark for evaluating LLM-based document reading systems

2024 · arXiv 2407.10701

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

cs.AI · 2025-08-01 · unverdicted · novelty 5.0

Cognitive Kernel-Pro provides an open-source agent framework with curated training data across web, file, code, and reasoning domains plus test-time reflection and voting, achieving SOTA results on GAIA among free agents.

Human-AI Collaborative Game Testing with Vision Language Models

cs.HC · 2025-01-20 · unverdicted · novelty 4.0

An experiment with 276 participants finds that vision language model assistance improves human game testers' defect identification, especially with design documentation, while AI errors create challenges.

citing papers explorer

Showing 2 of 2 citing papers.

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training · cs.AI · 2025-08-01 · unverdicted · none · ref 16
Human-AI Collaborative Game Testing with Vision Language Models · cs.HC · 2025-01-20 · unverdicted · none · ref 55
