Fact or fiction: Verifying scientific claims. ArXiv, abs/2004.14974
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
Citing papers
- Re²Math: Benchmarking Theorem Retrieval in Research-Level Mathematics
  Re²Math is a new benchmark that evaluates AI models on retrieving and verifying the applicability of theorems from math literature to advance steps in partial proofs, accepting any sufficient theorem while controlling for leakage.
- FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
  FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.
- CleanBase: Detecting Malicious Documents in RAG Knowledge Databases
  CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.
- Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
  Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.