LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
Chandra and Dexter C
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6verdicts
UNVERDICTED 6roles
background 1polarities
unclear 1representative citing papers
POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
Lexical frequency is a stronger predictor of metaphor novelty than LM surprisal, with the surprisal-novelty link peaking early in training before declining as surprisal becomes more aligned with frequency.
LLMs generally fail to maintain stable worldviews under adversarial conversational pressure, indicating they lack core beliefs akin to those in human cognition.
UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.
A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.
citing papers explorer
-
Logic-Regularized Verifier Elicits Reasoning from LLMs
LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
-
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference
POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
-
The Frequency Confound in Language-Model Surprisal and Metaphor Novelty
Lexical frequency is a stronger predictor of metaphor novelty than LM surprisal, with the surprisal-novelty link peaking early in training before declining as surprisal becomes more aligned with frequency.
-
Do LLMs have core beliefs?
LLMs generally fail to maintain stable worldviews under adversarial conversational pressure, indicating they lack core beliefs akin to those in human cognition.
-
UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.
-
A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance
A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.