Visual-ERM is a new multimodal reward model that supplies fine-grained visual feedback for training vision-language models on chart-to-code, table, and SVG tasks, yielding measurable gains over prior rewards.
Viscodex: Unified multimodal code generation via merging vision and coding models. arXiv preprint arXiv:2508.09945
2 Pith papers cite this work. Polarity classification is still indexing.