Large-scale Classification of Fine-Art Paintings: Learning The Right Metric on The Right Feature

Ahmed Elgammal; Babak Saleh

arxiv: 1505.00855 · v1 · pith:IJ5BVNLKnew · submitted 2015-05-05 · 💻 cs.CV · cs.IR · cs.LG · cs.MM

keywords: similarity, paintings, features, metric, multimedia, visual, available, collections

In the past few years, the number of fine-art collections that are digitized and publicly available has been growing rapidly. With the availability of such large collections of digitized artworks comes the need to develop multimedia systems to archive and retrieve this pool of data. Measuring the visual similarity between artistic items is an essential step for such multimedia systems, which can benefit more high-level multimedia tasks. In order to model this similarity between paintings, we should extract the appropriate visual features for paintings and find out the best approach to learn the similarity metric based on these features. We investigate a comprehensive list of visual features and metric learning approaches to learn an optimized similarity measure between paintings. We develop a machine that is able to make aesthetic-related semantic-level judgments, such as predicting a painting's style, genre, and artist, as well as providing similarity measures optimized based on the knowledge available in the domain of art historical interpretation. Our experiments show the value of using this similarity measure for the aforementioned prediction tasks.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

NetTailor: Tuning the Architecture, Not Just the Weights
cs.CV · 2019-06 · unverdicted · novelty 7.0

NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for s...

Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
cs.LG · 2026-04 · unverdicted · novelty 6.0

Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.

The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor
cs.HC · 2026-01 · conditional · novelty 6.0

LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.

Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition
cs.CV · 2025-11 · unverdicted · novelty 6.0

Insert In Style is a zero-shot framework that disentangles identity, style, and composition via multi-stage training, masked attention, and prior preservation to enable harmonious cross-domain object insertion in images.

The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models
cs.CV · 2025-07 · unverdicted · novelty 6.0

Text-to-image diffusion models exhibit varying degrees of emergent content-style separation in art generation, with content tokens primarily influencing object regions and style tokens affecting backgrounds and textures.

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
cs.CV · 2023-11 · conditional · novelty 6.0

A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.

Linking Art through Human Poses
cs.CV · 2019-07 · unverdicted · novelty 6.0

Human pose similarity matching with spatial verification outperforms standard content-based image retrieval for discovering composition transfers in art on a manually annotated dataset.

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
cs.LG · 2026-05 · unverdicted · novelty 5.0

CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks i...

Long Story Short: Disentangling Compositionality and Long-Caption Understanding in Contrastive VLMs
cs.CV · 2025-09 · unverdicted · novelty 5.0

Empirical study shows bidirectional but sensitive relationship between compositionality and long-caption understanding in VLMs, promoted by high-quality grounded data and affected by architectural choices like frozen ...

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
cs.CV · 2024-12 · accept · novelty 5.0

DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B a...