{"paper":{"title":"Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Vision language models rely on internal concepts that art historians judge as relevant for style prediction in 90 percent of cases.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Amith Ananthram, Anna Filonenko, Elias Stengel-Eskin, Emily L. Spratt, Hannah Pivo, Kathleen McKeown, Marvin Limpijankit, Milad Alshomary, Mohit Bansal, Noam M. Elcott, Tim Trombley, Yassin Oulad Daoud","submitted_at":"2026-03-11T17:49:45Z","abstract_excerpt":"VLMs have become increasingly proficient at a range of computer vision tasks, such as visual question answering and object detection. This includes increasingly strong capabilities in the domain of art, from analyzing artwork to generation of art. In an interdisciplinary collaboration between computer scientists and art historians, we characterize the mechanisms underlying VLMs' ability to predict artistic style and assess the extent to which they align with the criteria art historians use to reason about artistic style. We employ a latent-space decomposition approach to identify concepts that"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"73% of the extracted concepts are judged by art historians to exhibit a coherent and semantically meaningful visual feature and 90% of concepts used to predict style of a given artwork were judged relevant.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the latent-space decomposition method accurately isolates the specific concepts the VLM internally uses for style classification rather than producing post-hoc interpretable features.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Vision-language models predict artistic style using concepts that art historians judge as mostly coherent and relevant, with some success from formal visual features like contrast.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Vision language models rely on internal concepts that art historians judge as relevant for style prediction in 90 percent of cases.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"fbc91635c9068fa38f0fe508d3d7ae41148c63a1285de6b222166171af68664b"},"source":{"id":"2603.11024","kind":"arxiv","version":3},"verdict":{"id":"44c2c8cd-9315-4c3f-a947-8fcf3419f620","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T13:20:14.728343Z","strongest_claim":"73% of the extracted concepts are judged by art historians to exhibit a coherent and semantically meaningful visual feature and 90% of concepts used to predict style of a given artwork were judged relevant.","one_line_summary":"Vision-language models predict artistic style using concepts that art historians judge as mostly coherent and relevant, with some success from formal visual features like contrast.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the latent-space decomposition method accurately isolates the specific concepts the VLM internally uses for style classification rather than producing post-hoc interpretable features.","pith_extraction_headline":"Vision language models rely on internal concepts that art historians judge as relevant for style prediction in 90 percent of cases."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2603.11024/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"f107e607e5cd18f2be4c242d564593bd18c7d5ac2df372dc49898c90a686d20f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}