1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.IR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Benchmarking Real-Time Question Answering via Executable Code Workflows
RT-QA shows that state-of-the-art models reach only 46% accuracy on real-time questions, mainly because they search too shallowly and do not update their reasoning to the present time.