Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- PeerPrism: Peer Evaluation Expertise vs Review-writing AI
  The PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.
- A Reproducibility Study of Metacognitive Retrieval-Augmented Generation
  MetaRAG is only partially reproducible, with lower absolute scores than originally reported; it gains substantially from reranking and shows greater robustness than SIM-RAG under extended retrieval features.