arxiv: 2510.04180 · v2 · pith:SVCGCNMGnew · submitted 2025-10-05 · 💻 cs.CV · cs.LG

Spatially Grounded Concept-Based Image Classification

keywords: accuracy, evidence, concepts, concept, evaluated, grounded, image, improves

Deep neural networks can achieve high accuracy while relying on evidence that is hard to inspect or misaligned with the intended task. Concept Bottleneck Models (CBMs) expose human-interpretable concepts, but most treat concepts as global attributes and do not show how localized evidence is aggregated into a decision. We propose SEG-MIL-CBM, a spatially grounded CBM that decomposes each image into concept-guided regions and classifies it by attention-based aggregation of segment-level concept evidence. The same segment evidence terms form the prediction and the explanation, exposing which regions and concepts support the predicted logit without a separate post-hoc attribution module. Among evaluated CBM-family baselines, SEG-MIL-CBM improves Waterbirds worst-group accuracy from 65.1% to 72.0%, reaches 87.4% worst-group accuracy on Pawrious, remains competitive on standard recognition, and attains the best CBM accuracy on CIFAR-100 (85.3%). Segment-level faithfulness experiments on CUB further show that its learned segment ranking matches or improves over evaluated segment-ranking controls.
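
The idea that one set of segment evidence terms serves as both prediction and explanation can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the architecture, so the softmax attention over segments, the linear concept-to-class weights, and every name below (`segment_evidence`, `class_logits`, `top_evidence`, the toy numbers) are assumptions chosen only to show how a logit can decompose exactly into per-segment, per-concept terms.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_evidence(concept_scores, attn_scores, class_weights):
    """Assumed evidence form: evidence[c][s][k] = attention[s] * W[c][k] * z[s][k].

    concept_scores[s][k]: score of concept k in segment s (hypothetical input)
    attn_scores[s]:       unnormalized attention score of segment s
    class_weights[c][k]:  linear weight of concept k for class c (an assumption)
    """
    attention = softmax(attn_scores)
    return [
        [
            [attention[s] * w * z for w, z in zip(w_c, concept_scores[s])]
            for s in range(len(concept_scores))
        ]
        for w_c in class_weights
    ]


def class_logits(evidence):
    """Each class logit is the plain sum of that class's evidence terms."""
    return [sum(sum(segment) for segment in per_class) for per_class in evidence]


def top_evidence(evidence, class_idx, n=3):
    """Return the n largest (value, segment, concept) terms for one class."""
    terms = [
        (value, s, k)
        for s, segment in enumerate(evidence[class_idx])
        for k, value in enumerate(segment)
    ]
    return sorted(terms, reverse=True)[:n]


# Toy data: 3 segments, 2 concepts, 2 classes (all values made up).
concept_scores = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
attn_scores = [2.0, 0.5, -1.0]
class_weights = [[1.5, -0.5], [-1.0, 2.0]]

evidence = segment_evidence(concept_scores, attn_scores, class_weights)
logits = class_logits(evidence)
predicted = max(range(len(logits)), key=logits.__getitem__)

for value, s, k in top_evidence(evidence, predicted):
    print(f"class {predicted}: segment {s}, concept {k}, evidence {value:+.3f}")
```

Because the logit is literally the sum of the listed terms in this sketch, ranking the terms by value gives an explanation without any separate attribution step. A real model would learn the attention and concept scores from image segments rather than take them as fixed inputs.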
