PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
Zico , month = nov, year =
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
A-ProS uses a hybrid multi-model feedback framework with stateful refinement to improve success rates on competitive programming problems, achieving over 2x gains compared to baseline agent loops.
Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.
citing papers explorer
-
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
-
A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback
A-ProS uses a hybrid multi-model feedback framework with stateful refinement to improve success rates on competitive programming problems, achieving over 2x gains compared to baseline agent loops.
-
Prompt Governance? On Governing Technologies Governed by Natural Language
Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.