In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4693–4703, Online

Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, et al. · 2025 · arXiv 2506.03922

5 Pith papers cite this work.

representative citing papers

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DriveSpatial benchmark shows the best of 15 VLMs trails humans by 28.4 points on spatiotemporal driving tasks, with cognitive scene construction as the main failure mode.

Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

ConsumerSimBench evaluates 13 LLMs on reconstructing crowd reactions from 1,553 Chinese social-media topics using 23,122 auditable yes-no criteria, finding maximum coverage of 47.8% by Gemini-3.1-Pro.

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

cs.AI · 2026-04-24 · unverdicted · novelty 7.0

Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.

Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine

cs.CV · 2026-03-02 · conditional · novelty 6.0

Chain-of-thought underperforms direct answering in medical VQA due to a perception bottleneck, but ROI cues and textual grounding interventions can improve results and reverse the gap.

BenCSSmark: Making the Social Sciences Count in LLM Research

cs.CL · 2026-05-06 · unverdicted · novelty 4.0

BenCSSmark is a proposed benchmark that adds social science datasets to LLM evaluation to improve model robustness and relevance across disciplines like sociology and economics.

citing papers explorer

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving cs.CV · 2026-05-22 · unverdicted · none · ref 61
Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench cs.CL · 2026-05-16 · unverdicted · none · ref 72
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond cs.AI · 2026-04-24 · unverdicted · none · ref 171
Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine cs.CV · 2026-03-02 · conditional · none · ref 10
BenCSSmark: Making the Social Sciences Count in LLM Research cs.CL · 2026-05-06 · unverdicted · none · ref 9
