PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

2026 · cs.CL · arXiv 2604.12995

1 Pith paper cites this work.

abstract

Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-system (US-China) benchmark for evaluating policy comprehension. It comprises 21K cases spanning a broad range of policy areas, capturing the diversity and complexity of real-world governance. Following Bloom's taxonomy, the benchmark assesses three core capabilities: (1) Memorization: factual recall of policy knowledge; (2) Understanding: conceptual and contextual reasoning; and (3) Application: problem-solving in real-life policy scenarios. Building on this benchmark, we further propose PolicyMoE, a domain-specialized Mixture-of-Experts (MoE) model with expert modules aligned to each cognitive level. The proposed models perform better on application-oriented policy tasks than on memorization or conceptual understanding, and yield the highest accuracy on structured reasoning tasks. Our results reveal key limitations of current LLMs in policy understanding and suggest paths toward more reliable, policy-focused models.
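The abstract compares model performance across the three Bloom levels and across two policy systems. Below is a minimal sketch of how such a per-level (or per-system) accuracy breakdown could be computed. It assumes each case carries a level label, a US/China system label, and a single reference answer scored by exact match. The `Case` fields, the `accuracy_by` helper, and the exact-match rule are all hypothetical illustrations, not PolicyBench's published data format or metric.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three Bloom-level capabilities named in the abstract.
LEVELS = ("Memorization", "Understanding", "Application")


@dataclass
class Case:
    # Hypothetical record layout; the real benchmark schema is not given here.
    level: str   # one of LEVELS
    system: str  # policy system label, e.g. "US" or "China" (assumed)
    gold: str    # reference answer, e.g. a multiple-choice letter (assumed)
    pred: str    # the model's answer

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")


def _normalize(answer: str) -> str:
    # Assumed scoring rule: case-insensitive exact match after trimming spaces.
    return answer.strip().upper()


def accuracy_by(cases, key=lambda c: c.level):
    """Group cases by `key` and return {group: fraction answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for c in cases:
        k = key(c)
        total[k] += 1
        correct[k] += int(_normalize(c.pred) == _normalize(c.gold))
    # Groups appear in first-seen order; empty groups never enter `total`.
    return {k: correct[k] / total[k] for k in total}
```

With the default key, the result is a per-level breakdown; passing `key=lambda c: c.system` instead gives the cross-system split. Keeping the grouping key as a parameter avoids writing one function per slice of the benchmark.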

representative citing papers

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

cs.LG · 2026-06-11 · conditional · novelty 6.0

TRACE compiles user corrections into runtime enforcement rules for coding agents, cutting preference violations from 100% to 37.6% in-distribution and to 2% out-of-distribution on ClawArena tasks while matching memory baselines on task success.
