arxiv: 1505.00855 · v1 · pith:IJ5BVNLKnew · submitted 2015-05-05 · 💻 cs.CV · cs.IR · cs.LG · cs.MM

Large-scale Classification of Fine-Art Paintings: Learning The Right Metric on The Right Feature

classification 💻 cs.CV · cs.IR · cs.LG · cs.MM
keywords similarity · paintings · features · metric · multimedia · visual · available · collections

In the past few years, the number of fine-art collections that are digitized and publicly available has been growing rapidly. With the availability of such large collections of digitized artworks comes the need to develop multimedia systems to archive and retrieve this pool of data. Measuring the visual similarity between artistic items is an essential step for such multimedia systems, which can benefit more high-level multimedia tasks. In order to model this similarity between paintings, we should extract the appropriate visual features for paintings and find out the best approach to learn the similarity metric based on these features. We investigate a comprehensive list of visual features and metric learning approaches to learn an optimized similarity measure between paintings. We develop a machine that is able to make aesthetic-related semantic-level judgments, such as predicting a painting's style, genre, and artist, as well as providing similarity measures optimized based on the knowledge available in the domain of art historical interpretation. Our experiments show the value of using this similarity measure for the aforementioned prediction tasks.
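
To make the idea of learning a similarity metric on top of extracted features concrete, here is a minimal sketch. It is not the authors' method: it assumes feature vectors are already extracted, swaps in a simple stand-in technique (whitening by the pooled within-class covariance, in the spirit of Relevant Component Analysis), and uses synthetic data. The names `learn_within_class_metric`, `mahalanobis_sq`, `predict_nearest`, and the toy "style" labels are all hypothetical.

```python
import numpy as np


def learn_within_class_metric(features, labels, reg=1e-3):
    """Return M, the inverse of the regularized pooled within-class covariance.

    Distances under M shrink directions in which paintings of the same
    class vary a lot, so same-class items end up closer together.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    scatter = np.zeros((dim, dim))
    for label in np.unique(labels):
        group = features[labels == label]
        centered = group - group.mean(axis=0)
        scatter += centered.T @ centered
    cov = scatter / len(features) + reg * np.eye(dim)
    return np.linalg.inv(cov)


def mahalanobis_sq(x, y, metric):
    """Squared distance (x - y)^T M (x - y) under the learned metric."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ metric @ diff)


def predict_nearest(query, features, labels, metric):
    """Label of the training item closest to `query` under the metric."""
    diffs = np.asarray(features, dtype=float) - np.asarray(query, dtype=float)
    dists = np.einsum("ij,jk,ik->i", diffs, metric, diffs)
    return np.asarray(labels)[int(np.argmin(dists))]


# Synthetic stand-in for painting features (assumption: 2-D, two "styles").
# Dimension 0 separates the styles; dimension 1 is high-variance noise.
rng = np.random.default_rng(0)
style_a = np.column_stack([rng.normal(0.0, 0.3, 50), rng.normal(0.0, 5.0, 50)])
style_b = np.column_stack([rng.normal(2.0, 0.3, 50), rng.normal(0.0, 5.0, 50)])
X = np.vstack([style_a, style_b])
y = np.array(["style_a"] * 50 + ["style_b"] * 50)

M = learn_within_class_metric(X, y)
print(predict_nearest([1.8, 4.0], X, y, M))
```

The same nearest-neighbor lookup could serve both retrieval (rank by distance) and prediction (take the neighbor's label), which is why a single learned metric can support several tasks in this kind of setup.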

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NetTailor: Tuning the Architecture, Not Just the Weights

    cs.CV 2019-06 unverdicted novelty 7.0

    NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for s...

  2. Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM

    cs.LG 2026-04 unverdicted novelty 6.0

    Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.

  3. The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

    cs.HC 2026-01 conditional novelty 6.0

    LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.

  4. Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition

    cs.CV 2025-11 unverdicted novelty 6.0

    Insert In Style is a zero-shot framework that disentangles identity, style, and composition via multi-stage training, masked attention, and prior preservation to enable harmonious cross-domain object insertion in images.

  5. The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models

    cs.CV 2025-07 unverdicted novelty 6.0

    Text-to-image diffusion models exhibit varying degrees of emergent content-style separation in art generation, with content tokens primarily influencing object regions and style tokens affecting backgrounds and textures.

  6. ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

    cs.CV 2023-11 conditional novelty 6.0

    A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.

  7. Linking Art through Human Poses

    cs.CV 2019-07 unverdicted novelty 6.0

    Human pose similarity matching with spatial verification outperforms standard content-based image retrieval for discovering composition transfers in art on a manually annotated dataset.

  8. Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

    cs.LG 2026-05 unverdicted novelty 5.0

    CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks i...

  9. Long Story Short: Disentangling Compositionality and Long-Caption Understanding in Contrastive VLMs

    cs.CV 2025-09 unverdicted novelty 5.0

    An empirical study shows a bidirectional but sensitive relationship between compositionality and long-caption understanding in VLMs, promoted by high-quality grounded data and affected by architectural choices like frozen ...

  10. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

    cs.CV 2024-12 accept novelty 5.0

    DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B a...