Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: Long papers) , pages=

When not to trust language models: Investigating effectiveness of parametric, non-parametric memories , author=

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.

citing papers explorer

Showing 2 of 2 citing papers.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 26
EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation cs.CL · 2026-05-19 · unverdicted · none · ref 33
BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.

Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: Long papers) , pages=

fields

years

verdicts

representative citing papers

citing papers explorer