Mitigating Spurious Background Bias in Multimedia Recognition with Disentangled Concept Bottlenecks

Gaoxiang Huang; Songning Lai; Yutao Yue

arxiv: 2510.15770 · v3 · submitted 2025-10-17 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-18 06:09 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords concept bottleneck models · disentangled representations · background bias mitigation · interpretable image recognition · lightweight neural networks · multimedia recognition · concept alignment · spurious correlation

The pith

A lightweight disentangled concept bottleneck model groups visual features into human-aligned concepts to cut spurious background bias without region annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes LDCBM to fix how concept bottleneck models map images to concepts, a mapping that often latches onto irrelevant background signals instead of the intended objects. It adds a filter grouping loss plus joint concept supervision so that visual features are organized into separate, meaningful bundles that better match what humans would label as concepts. This produces higher accuracy on both concept prediction and final classification while adding less than five percent to parameter count and computation compared with a basic CBM. Tests on three datasets plus background-mask interventions show the model can ignore irrelevant regions and still make correct decisions. The result is more controllable and trustworthy concept-based reasoning for image and multimedia tasks.

Core claim

LDCBM automatically groups convolutional filters into semantically coherent components through a filter grouping loss and joint concept supervision, producing visual-to-concept mappings that align more closely with human concepts, raise both concept and class accuracy, and allow explicit suppression of background regions even without any region-level labels.

What carries the argument

Filter grouping loss together with joint concept supervision inside the Lightweight Disentangled Concept Bottleneck Model (LDCBM), which partitions visual feature channels into concept-specific groups without region annotations.
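
The grouping objective can be illustrated with a small sketch. A passage quoted further down this page gives the loss as a negative sum, over groups, of intra-group similarity divided by inter-group similarity. The code below is one minimal pure-Python reading of that ratio, not the authors' implementation. It assumes that similarity is the absolute Pearson correlation between filter responses and that each S term is a mean over filter pairs. The names `pearson` and `grouping_loss` and the toy responses are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom > 0 else 0.0


def grouping_loss(responses, groups, eps=1e-8):
    """Negative sum over groups of (mean intra-group sim) / (mean inter-group sim).

    responses: one activation vector per filter (assumed layout).
    groups: list of lists of filter indices, one list per group.
    """
    n = len(responses)
    sim = [[abs(pearson(responses[i], responses[j])) for j in range(n)]
           for i in range(n)]
    loss = 0.0
    for members in groups:
        inside = set(members)
        outside = [j for j in range(n) if j not in inside]
        pairs = list(combinations(members, 2))
        # A single-filter group has no pairs; treat it as perfectly coherent.
        s_intra = sum(sim[i][j] for i, j in pairs) / len(pairs) if pairs else 1.0
        cross = [sim[i][j] for i in members for j in outside]
        s_inter = sum(cross) / len(cross) if cross else 0.0
        loss -= s_intra / (s_inter + eps)
    return loss


# Toy responses (made up): filters 0 and 1 co-vary, as do filters 2 and 3.
responses = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1], [2, -2, 2, -2]]
coherent = grouping_loss(responses, [[0, 1], [2, 3]])
mixed = grouping_loss(responses, [[0, 2], [1, 3]])
```

With these toy responses the coherent grouping receives a lower loss than the mixed one, which is the direction such an objective rewards.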

If this is right

LDCBM records higher concept prediction and final class accuracy than earlier concept bottleneck models across three diverse datasets.
Parameter count and FLOPs stay within five percent of a vanilla CBM while delivering the gains.
Background mask interventions demonstrate that the model can actively suppress predictions driven by irrelevant image regions.
The resulting visual-to-concept mapping is more precise, supporting more reliable concept-based decision strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same grouping mechanism could be applied to video or audio concept models where background or noise cues similarly mislead intermediate representations.
If the learned groups prove stable across domains, the approach could lower the annotation burden for building interpretable systems in new visual tasks.
The method hints that explicit disentanglement at the filter level may generalize beyond CBMs to other architectures that suffer from spurious correlations.

Load-bearing premise

The combination of filter grouping loss and joint concept supervision will automatically produce groupings of visual features that correspond to human concepts and ignore background signals even though no region annotations or explicit supervision on important image areas is provided.

What would settle it

On a dataset with clear background-object correlations, the claim would fail if background-mask intervention changed LDCBM's concept predictions no more than it changed a vanilla CBM's, or if concept accuracy did not rise relative to prior CBMs.
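
A background-mask intervention of the kind described here can be sketched in a few lines. This is an illustrative setup, not the paper's protocol: it assumes a foreground mask is available at evaluation time, represents an image as a nested list of pixel values, and measures the mean absolute change in concept scores after masking. The functions `mask_background` and `concept_shift` and both toy predictors are hypothetical.

```python
def mask_background(image, fg_mask, fill=0.0):
    """Replace every pixel whose mask entry is 0 (background) with `fill`."""
    return [[px if keep else fill for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, fg_mask)]


def concept_shift(predict_concepts, image, fg_mask, fill=0.0):
    """Mean absolute change in concept scores when the background is masked."""
    before = predict_concepts(image)
    after = predict_concepts(mask_background(image, fg_mask, fill))
    return sum(abs(a - b) for a, b in zip(before, after)) / len(before)


def object_only(image):
    """Toy predictor that reads only the centre (object) pixel."""
    return [image[1][1]]


def background_sensitive(image):
    """Toy predictor that averages over the whole image, background included."""
    flat = [px for row in image for px in row]
    return [sum(flat) / len(flat)]


# Made-up 3x3 image: a dim object pixel surrounded by bright background.
image = [[0.9, 0.9, 0.9], [0.9, 0.5, 0.9], [0.9, 0.9, 0.9]]
fg_mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

shift_object_only = concept_shift(object_only, image, fg_mask)
shift_background = concept_shift(background_sensitive, image, fg_mask)
```

The toy predictor that looks only at the object is unaffected by masking, while the one that averages over the whole image shifts substantially. Comparing such shifts between two real models under the same masks is the kind of evidence this test calls for.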

Figures

Figures reproduced from arXiv: 2510.15770 by Gaoxiang Huang, Songning Lai, Yutao Yue.

**Figure 2.** Overview of our LDCBM. It includes two stages modules: Alignment of disentangled visual-pattern with … [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** Intervention result of performing correct and incorrect randomly concept interventions in fine-grained and … [PITH_FULL_IMAGE:figures/full_fig_p007_3.png]

Original abstract

Concept Bottleneck Models (CBMs) enhance interpretability by predicting human-understandable concepts as intermediate representations. However, existing CBMs often suffer from input-to-concept mapping bias and limited controllability, which restricts their practical utility and undermines the reliability of concept-based strategies. To address these challenges, we propose a Lightweight Disentangled Concept Bottleneck Model (LDCBM) that automatically groups visual features into semantically meaningful components without the need for region annotations. By introducing a filter grouping loss and joint concept supervision, our method improves the alignment between visual patterns and concepts, enabling more transparent and robust decision-making. Notably, experiments on three diverse datasets demonstrate that LDCBM achieves higher concept and class accuracy, outperforming previous CBMs in both interpretability and classification performance. Complexity analysis reveals that the parameter count and FLOPs of LDCBM are less than 5% higher than those of Vanilla CBM. Furthermore, background mask intervention experiments validate the model's strong capability to suppress irrelevant image regions, further corroborating the high precision of the visual-concept mapping under LDCBM's lightweight design paradigm. By grounding concepts in visual evidence, our method overcomes a fundamental limitation of prior models and enhances the reliability of interpretable AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LDCBM adds a filter grouping loss to CBMs for background suppression but the evidence for semantic rather than statistical alignment is still thin.

The main point is that this paper introduces LDCBM, a lightweight extension of concept bottleneck models that uses a filter grouping loss plus joint concept supervision to reduce background bias in multimedia recognition tasks without needing region annotations. It keeps parameter and FLOP overhead under 5% compared to vanilla CBM while claiming gains in both concept accuracy and final classification on three datasets, plus a background mask intervention that shows the model can ignore irrelevant regions.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Lightweight Disentangled Concept Bottleneck Model (LDCBM), an extension of Concept Bottleneck Models (CBMs) for multimedia recognition. It introduces a filter grouping loss combined with joint concept supervision to automatically group visual features into semantically meaningful components without region annotations or explicit spatial supervision. The central claims are that this mitigates input-to-concept mapping bias and spurious background bias, yielding higher concept and class accuracy than prior CBMs on three diverse datasets, with parameter count and FLOPs less than 5% higher than a vanilla CBM, plus effective suppression of irrelevant background regions as shown by mask intervention experiments.

Significance. If the empirical results and the semantic alignment of the learned groupings hold under scrutiny, the work would offer a practical, low-overhead extension to CBMs that improves both predictive performance and the reliability of concept-based explanations in settings prone to background bias. The lightweight design and lack of requirement for region annotations could make the approach more deployable than prior disentanglement methods in computer vision.

major comments (2)

[§3 (Method) and §4 (Experiments)] The central claim that the filter grouping loss plus joint concept supervision produces groupings that are semantically meaningful (i.e., aligned with human concepts) rather than merely statistical correlations rests on the experimental outcomes, yet the manuscript provides no ablation isolating the contribution of the grouping loss, no concept localization metrics, and no filter visualizations compared against human-annotated regions. This leaves open the possibility that reported accuracy gains arise from regularization effects or dataset correlations instead of true disentanglement.
[§4.3 (Background Mask Intervention)] Background mask intervention results are presented as validation of high-precision visual-concept mapping, but without quantitative suppression scores, comparison to baseline CBMs under the same intervention protocol, or statistical significance tests, it is difficult to determine whether the observed robustness is load-bearing evidence for the disentanglement claim or an artifact of the intervention design.

minor comments (2)

[Abstract] The abstract states performance gains and background suppression but does not report concrete accuracy numbers, baseline names, or dataset details; these should be added for immediate readability even if full tables appear later.
[§3.2] Notation for the filter grouping loss (e.g., how filters are grouped and how the loss is balanced with the concept supervision term) should be made fully explicit with an equation reference in the method section to aid reproducibility.
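
As an illustration of the kind of explicit notation this comment asks for, one hypothetical way to write the balance is shown below. The grouping term follows the form quoted elsewhere on this page. The symbols for the class loss, the concept loss and the two weights $\lambda_c$ and $\lambda_g$ are assumptions introduced here, not notation confirmed by the paper.

```latex
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{class}}
  + \lambda_c \, \mathcal{L}_{\mathrm{concept}}
  + \lambda_g \, L_g(\theta, A),
\qquad
L_g(\theta, A) = -\sum_{k} \frac{S^{\mathrm{intra}}_k}{S^{\mathrm{inter}}_k}
```

Written this way, a reader could see at a glance which hyperparameters trade off concept supervision against filter grouping.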

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us identify areas to strengthen the manuscript. We address each major comment below and outline the revisions we will make to improve clarity and evidence for our claims.

Referee: [§3 (Method) and §4 (Experiments)] The central claim that the filter grouping loss plus joint concept supervision produces groupings that are semantically meaningful (i.e., aligned with human concepts) rather than merely statistical correlations rests on the experimental outcomes, yet the manuscript provides no ablation isolating the contribution of the grouping loss, no concept localization metrics, and no filter visualizations compared against human-annotated regions. This leaves open the possibility that reported accuracy gains arise from regularization effects or dataset correlations instead of true disentanglement.

Authors: We appreciate this observation and agree that an explicit ablation isolating the filter grouping loss would provide clearer evidence of its role in achieving semantic alignment beyond regularization. In the revised manuscript, we will add an ablation study comparing variants with and without the grouping loss, reporting impacts on both concept and class accuracy across the three datasets. We will also include filter visualizations to demonstrate the learned groupings. Regarding direct comparisons to human-annotated regions, our approach is designed to operate without region annotations, so quantitative localization metrics against such annotations are not feasible with the current datasets; however, we will strengthen the qualitative analysis and clarify how the joint concept supervision encourages semantic rather than purely statistical groupings. We maintain that the accuracy improvements and mask intervention results support the disentanglement claim, but the added ablation will make this more rigorous. revision: yes
Referee: [§4.3 (Background Mask Intervention)] Background mask intervention results are presented as validation of high-precision visual-concept mapping, but without quantitative suppression scores, comparison to baseline CBMs under the same intervention protocol, or statistical significance tests, it is difficult to determine whether the observed robustness is load-bearing evidence for the disentanglement claim or an artifact of the intervention design.

Authors: We agree that quantitative metrics would make the background mask intervention results more compelling as evidence for the disentanglement. In the revision, we will add quantitative suppression scores (e.g., change in concept activation or prediction accuracy when background regions are masked), direct comparisons to baseline CBMs using the identical intervention protocol, and statistical significance tests (e.g., paired t-tests or Wilcoxon tests across multiple runs). These additions will better isolate the contribution of our method's visual-concept mapping precision. revision: yes
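
The paired comparison promised in this response can be sketched as follows. This is a minimal pure-Python illustration of a paired t statistic over per-run scores, not the authors' evaluation code. The function name `paired_t_statistic` is hypothetical, and the numbers are placeholders, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired runs")
    spread = statistics.stdev(diffs)  # sample standard deviation of differences
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (spread / math.sqrt(n))
    return t, n - 1


# Placeholder per-run scores for two models under the same masking protocol.
baseline_runs = [3.0, 5.0, 7.0]
candidate_runs = [1.0, 2.0, 3.0]
t_value, dof = paired_t_statistic(baseline_runs, candidate_runs)
```

The statistic alone does not give a p-value. In practice a library routine such as `scipy.stats.ttest_rel`, or `scipy.stats.wilcoxon` for the non-parametric variant, would report one directly.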

standing simulated objections not resolved

Direct quantitative comparison of filter visualizations against human-annotated regions is not possible without region annotations in the evaluation datasets, which our method explicitly avoids requiring.

Circularity Check

0 steps flagged

No circularity: empirical method with experimental validation

The paper proposes LDCBM as an empirical architecture that adds a filter grouping loss and joint concept supervision to standard CBMs, then validates the approach through accuracy metrics, complexity comparisons, and background-mask intervention experiments on three datasets. No derivation chain, equation, or first-principles claim reduces by construction to quantities defined by the model's own fitted parameters or self-referential definitions. Claims rest on observed performance differences rather than tautological predictions or self-citation load-bearing uniqueness theorems. The method is therefore self-contained against external benchmarks and receives a normal non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review identifies no explicit free parameters, axioms, or invented entities; the method extends standard CBM components with two new loss terms whose precise formulations are not detailed here.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We introduce a filter grouping loss and joint concept supervision... $L_g(\theta, A) = -\sum_k S^{\mathrm{intra}}_k / S^{\mathrm{inter}}_k$ ... spectral cluster to optimize the set of group A

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1]

Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges,

Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong, “Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges,” July 2021

work page 2021
[2]

Prototypical Networks for Few-shot Learning,

Jake Snell, Kevin Swersky, and Richard S. Zemel, “Prototypical Networks for Few-shot Learning,” June 2017

work page 2017
[3]

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment,

Harrish Thasarathan, Julian Forsyth, Thomas Fel, Matthew Kowal, and Konstantinos Derpanis, “Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment,” Feb. 2025

work page 2025
[4]

Concept Bottleneck Models,

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang, “Concept Bottleneck Models,” Dec. 2020

work page 2020
[5]

Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification,

Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, and Mark Yatskar, “Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification,” Apr. 2023

work page 2023
[6]

Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off,

Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lio, and Mateja Jamnik, “Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off,” Dec. 2022

work page 2022
[7]

Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations,

Xinyue Xu, Yi Qin, Lu Mi, Hao Wang, and Xiaomeng Li, “Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations,” Dec. 2024

work page 2024
[8]

Probabilistic Concept Bottleneck Models,

Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, and Sungroh Yoon, “Probabilistic Concept Bottleneck Models,” June 2023

work page 2023
[9]

Incremental Residual Concept Bottleneck Models,

Chenming Shang, Shiji Zhou, Hengyuan Zhang, Xinzhe Ni, Yujiu Yang, and Yuwang Wang, “Incremental Residual Concept Bottleneck Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11030–11040

work page 2024
[10]

VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance,

Divyansh Srivastava, Ge Yan, and Tsui-Wei Weng, “VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance,”

work page
[11]

Auxiliary Losses for Learning Generalizable Concept-based Models,

Ivaxi Sheth and Samira Ebrahimi Kahou, “Auxiliary Losses for Learning Generalizable Concept-based Models,” Nov. 2023

work page 2023
[12]

A Theoretical design of Concept Sets: Improving the predictability of concept bottleneck models,

Max Ruiz Luyten, “A Theoretical design of Concept Sets: Improving the predictability of concept bottleneck models,”

work page
[13]

Coarse-to-Fine Concept Bottleneck Models,

Konstantinos P Panousis, Dino Ienco, and Diego Marcos, “Coarse-to-Fine Concept Bottleneck Models,”

work page
[14]

On the Concept Trustworthiness in Concept Bottleneck Models,

Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, and Mingli Song, “On the Concept Trustworthiness in Concept Bottleneck Models,” Mar. 2024

work page 2024
[15]

The Decoupling Concept Bottleneck Model,

Rui Zhang, Xingbo Du, Junchi Yan, and Shihua Zhang, “The Decoupling Concept Bottleneck Model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 2, pp. 1250–1265, Feb. 2025

work page 2025
[16]

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models,

Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, and Hongwei Liu, “Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models,” May 2025

work page 2025
[17]

Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts,

Andong Tan, Fengtao Zhou, and Hao Chen, “Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts,” in Computer Vision – ECCV 2024, Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol, Eds., Cham, 2025, pp. 123–138, Springer Nature Switzerland

work page 2024
[18]

Interpretable Compositional Convolutional Neural Networks,

Wen Shen, Zhihua Wei, Shikun Huang, Binbin Zhang, Jiaqi Fan, Ping Zhao, and Quanshi Zhang, “Interpretable Compositional Convolutional Neural Networks,” July 2021

work page 2021
[19]

Interpretable Compositional Representations for Robust Few-Shot Generalization,

Samarth Mishra, Pengkai Zhu, and Venkatesh Saligrama, “Interpretable Compositional Representations for Robust Few-Shot Generalization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1496–1512, Mar. 2024

work page 2024
[20]

IPNet: Interpretable Prototype Network for Multi-Source Domain Adaptation,

Rui Chen, Haifeng Xia, Siyu Xia, Ming Shao, and Zhengming Ding, “IPNet: Interpretable Prototype Network for Multi-Source Domain Adaptation,” in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2025, pp. 1–5

work page 2025
[21]

Prototypical Part Transformer for Interpretable Image Recognition,

Anni Yu and Yu-Bin Yang, “Prototypical Part Transformer for Interpretable Image Recognition,” in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2025, pp. 1–5

work page 2025
[22]

A Survey on Compositional Learning of AI Models: Theoretical and Experimental Practices,

Sania Sinha, Tanawan Premsri, and Parisa Kordjamshidi, “A Survey on Compositional Learning of AI Models: Theoretical and Experimental Practices,” Nov. 2024

work page 2024
[23]

Learning Latent Variable Models by Pairwise Cluster Comparison,

Nuaman Asbeh and Boaz Lerner, “Learning Latent Variable Models by Pairwise Cluster Comparison,” in Proceedings of the Asian Conference on Machine Learning. Nov. 2012, pp. 33–48, PMLR

work page 2012
[24]

Pearson Correlation Coefficient,

Jiguang Wang, “Pearson Correlation Coefficient,” in Encyclopedia of Systems Biology, pp. 1671–1671. Springer, New York, NY, 2013

work page 2013
[25]

Learning AND-OR Templates for Object Recognition and Detection,

Zhangzhang Si and Song-Chun Zhu, “Learning AND-OR Templates for Object Recognition and Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2189–2205, Sept. 2013

work page 2013
[26]

Fine-grained Visual-textual Representation Learning,

Xiangteng He and Yuxin Peng, “Fine-grained Visual-textual Representation Learning,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 2, pp. 520–531, Feb. 2020

work page 2020
[27]

Deep Learning Face Attributes in the Wild,

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang, “Deep Learning Face Attributes in the Wild,” Sept. 2015

work page 2015
[28]

Zero-Shot Learning – A Comprehensive Evaluation of the Good, the Bad and the Ugly,

Yongqin Xian, Christoph H. Lampert, Bernt Schiele, and Zeynep Akata, “Zero-Shot Learning – A Comprehensive Evaluation of the Good, the Bad and the Ugly,” Sept. 2020

work page 2020
