Questions should be more natural, try to be close to the real needs of users’ questions, and should not be deliberately set to unreasonable challenges just to increase difficulty

Deliberately difficult questions: It is forbidden for annotators to ask deliberately difficult, stilted questions just to ensure that the human reviewer cannot solve them within

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

cs.CL · 2024-12-19 · accept · novelty 6.0

LongBench v2 benchmark shows current LLMs underperform humans on deep long-context reasoning tasks, but extended inference-time reasoning enables surpassing the human baseline.

citing papers explorer

Showing 1 of 1 citing paper.

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

cs.CL · 2024-12-19 · accept · none · ref 9

LongBench v2 benchmark shows current LLMs underperform humans on deep long-context reasoning tasks, but extended inference-time reasoning enables surpassing the human baseline.
