Towards a Common Framework for Autoformalization
Two Pith papers cite this work.
Representative citing papers (2026):
- Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling
  An audit finds that 36-39% of the first-order logic (FOL) labels in FOLIO and MALLS are incorrect. Correcting them raises LLM accuracy by 9-22 points, and an LLM-guided review framework reaches 90% dataset quality after human checking of fewer than 24% of the examples.
- PrologMCP: A Standardized Prolog Tool Interface for LLM Agents
  PrologMCP is a standardized Model Context Protocol (MCP) server for Prolog that lets LLM agents delegate inference to a Prolog engine. It achieves near-perfect accuracy on PARARULE-Plus subsets where reasoning LLMs drop to 0.94-0.95.