A 540-image benchmark with four phrasing variants per image reveals VLMs degrade when text leakage is minimized, with no-image ablations confirming reliance and GRPO post-training yielding gains that transfer to held-out data.
Klaus Krippendorff
2 Pith papers cite this work.
years: 2026 (2)
representative citing papers
A compliance-scored best-of-N orchestration layer for multimodal document generation reports 91% compliance at 5 attempts in 20 seconds and +11 percentage point win rate gains in aggregate operational data for payments dispute defense.
citing papers explorer
Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense