arXiv: 2605.07574 · 2 revisions
PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models