
arxiv: 2503.13874 · v1 · submitted 2025-03-18 · 💻 cs.LG

Multi-label feature selection based on binary hashing learning and dynamic graph constraints

Pith reviewed 2026-05-23 00:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords multi-label feature selection · binary hashing · dynamic graph constraints · pseudo-labels · feature selection · machine learning · augmented Lagrangian multiplier

The pith

Binary hashing codes serve as pseudo-labels to build more reliable dynamic graphs in multi-label feature selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BHDG, a multi-label feature selection approach that replaces continuous pseudo-labels with low-dimensional binary hashing codes. These codes aim to cut noise from irrelevant labels while constructing a dynamic graph that constrains the sample projection space more reliably. The method adds label graph constraints, inner product minimization, and l2,1-norm regularization, then solves the resulting problem with the augmented Lagrangian multiplier technique. Experiments across ten benchmark datasets show BHDG ranking first overall against ten competing methods on six metrics, with an average rank improvement of at least 2.7 places over the next-best approach.
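To make the ranking claim concrete, here is a minimal sketch of how an average rank over datasets is commonly computed. The scores are made up, and the helper names `rank_row` and `average_ranks` are illustrative assumptions; the paper's own evaluation script is not shown on this page.

```python
def rank_row(scores, higher_is_better=True):
    """Rank methods on one dataset; rank 1 is best, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the tie group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def average_ranks(table, higher_is_better=True):
    """table[d][m] is the score of method m on dataset d."""
    per_dataset = [rank_row(row, higher_is_better) for row in table]
    n = len(per_dataset)
    return [sum(r[m] for r in per_dataset) / n for m in range(len(table[0]))]


# Made-up scores: 3 datasets x 3 methods, higher is better.
toy = [
    [0.90, 0.80, 0.80],
    [0.70, 0.75, 0.60],
    [0.85, 0.80, 0.70],
]
avg = average_ranks(toy)
```

For metrics where lower is better (for example a loss-style metric), pass `higher_is_better=False`. The gap between the best and second-best entries of `avg` is the kind of quantity the "2.7 ranks" figure refers to.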

Core claim

The authors present BHDG as the first method to integrate binary hashing into multi-label learning. It uses low-dimensional binary hashing codes as pseudo-labels to reduce noise and improve representation robustness, and it builds a dynamically constrained sample projection space from the graph structure of those binary pseudo-labels. It also incorporates label graph constraints and inner product minimization within the sample space, adds an l2,1-norm regularization term to facilitate feature selection, and optimizes the full objective with the augmented Lagrangian multiplier method.
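The role of the l2,1-norm term can be illustrated with a short sketch. The l2,1-norm of a weight matrix is the sum of the Euclidean norms of its rows, so penalizing it pushes whole rows toward zero, and features are then ranked by row norm. The matrix `W` and the function names below are illustrative assumptions, not the paper's code.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """l2,1-norm: the sum of the row-wise Euclidean norms."""
    return sum(row_norms(W))


def top_k_features(W, k):
    """Indices of the k features whose rows have the largest norm."""
    scores = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy weight matrix: 3 features x 2 output dimensions (made-up values).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty = l21_norm(W)
selected = top_k_features(W, 2)
```

A row that is entirely zero, like the second row of `W`, contributes nothing to any output dimension, which is why row sparsity translates directly into discarding a feature.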

What carries the argument

Low-dimensional binary hashing codes used as pseudo-labels to construct dynamic graph constraints on the sample projection space.
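One way such a graph constraint could look in code is sketched below. It assumes codes in {-1, +1}, an affinity equal to the fraction of matching bits, and the usual unnormalized Laplacian; the paper may define its graph differently, and all function names here are hypothetical.

```python
def hamming_affinity(codes):
    """S[i][j] = fraction of bits on which codes i and j agree (an assumed choice)."""
    n, bits = len(codes), len(codes[0])
    return [[sum(a == b for a, b in zip(codes[i], codes[j])) / bits
             for j in range(n)] for i in range(n)]


def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S, with D the diagonal degree matrix."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(L, Z):
    """tr(Z^T L Z): small when strongly connected samples have similar rows in Z."""
    n, d = len(Z), len(Z[0])
    return sum(Z[i][k] * L[i][j] * Z[j][k]
               for i in range(n) for j in range(n) for k in range(d))


# Three samples with made-up 4-bit codes.
codes = [[1, -1, 1, 1], [1, -1, 1, -1], [-1, 1, -1, -1]]
S = hamming_affinity(codes)
L = graph_laplacian(S)
penalty = smoothness(L, [[1.0], [1.0], [0.0]])
```

In a "dynamic" scheme the codes change during optimization, so `S` and `L` would be rebuilt from the current codes rather than fixed once from the raw data.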

If this is right

  • The binary pseudo-labels yield graph structures that support more accurate feature selection than those built from continuous values.
  • Adding label graph constraints and inner product minimization further improves pseudo-label quality beyond the hashing step alone.
  • The l2,1-norm term combined with the binary codes enables effective selection of discriminative features across multiple labels.
  • The ALM optimization successfully handles the binary variables without requiring relaxation to continuous values.
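For readers unfamiliar with the augmented Lagrangian multiplier scheme, the sketch below runs it on a deliberately simple continuous toy problem: minimize x1^2 + x2^2 subject to x1 + x2 = 1. It is not the paper's update for binary variables; it only shows the alternation between a primal step and a multiplier step. The function name and the parameters `rho` and `iters` are assumptions.

```python
def alm_toy(rho=1.0, iters=50):
    """Augmented Lagrangian iterations for min x1^2 + x2^2 s.t. x1 + x2 = 1."""
    mu = 0.0          # Lagrange multiplier estimate
    x1 = x2 = 0.0
    for _ in range(iters):
        # Primal step: minimize
        #   x1^2 + x2^2 + mu*(x1 + x2 - 1) + (rho/2)*(x1 + x2 - 1)^2
        # By symmetry x1 = x2 = t, and setting the gradient to zero gives
        #   2t + mu + rho*(2t - 1) = 0.
        t = (rho - mu) / (2.0 + 2.0 * rho)
        x1 = x2 = t
        # Multiplier step: move mu along the constraint violation.
        mu += rho * (x1 + x2 - 1.0)
    return x1, x2, mu


x1, x2, mu = alm_toy()
```

The iterates approach x1 = x2 = 0.5 with multiplier -1, which is the constrained optimum. In the binary setting the primal step is replaced by whatever update the method defines for the codes.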

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same binary hashing step could be tested as a drop-in replacement for continuous pseudo-labels inside other graph-based multi-label algorithms.
  • If binary codes consistently reduce label noise, they may also improve performance in related tasks such as multi-label classification without explicit feature selection.
  • The approach leaves open whether the performance gain scales with increasing label cardinality or dataset size.

Load-bearing premise

Low-dimensional binary hashing codes reduce noise from irrelevant labels and produce more reliable dynamic graph structures than continuous pseudo-labels.

What would settle it

An ablation that keeps every other component of BHDG fixed but swaps the binary hashing codes for continuous pseudo-labels, run on the same ten datasets, would settle it: if performance does not drop, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2503.13874 by Changqin Huang, Cong Guo, Wenhua Zhou, Xiaodi Huang.

Figure 1: An example of continuous and binary pseudo-label learning.
Figure 2: The overall framework of the proposed BHDG.
Figure 3: Results on Amphibians with different numbers of features.
Figure 4: Results on Enron with different numbers of features.
Figure 5: Results on Yeast with different numbers of features.
Figure 6: Results on Reuters with different numbers of features.
Figure 7: Results on image with different numbers of features.
Figure 8: Results on Langlog with different numbers of features.
Figure 9: Results on Science with different numbers of features.
Figure 10: Results on Entertainment with different numbers of features.
Figure 11: Results on Corel5k with different numbers of features.
Figure 12: Results on yelp with different numbers of features.
Figure 13: Nemenyi test on 11 algorithms for HL, RL, OE, CV, AP, and Macro-…
Figure 14: Classification results with different λ1 while keeping other parameters unchanged.
Figure 15: Classification results with different λ2 while keeping other parameters unchanged.
  Variants defined in the adjacent paper text:
  • BHDG1: Set λ2 = 0, meaning that the dynamic graph constraint term is removed from the objective function of BHDG.
  • BHDG2: Remove the binary constraint on the hash matrix B, which makes B numerical labels.
Figure 16: Classification results with different λ3 while keeping other parameters unchanged.
Figure 17: Classification results with different ρ while keeping other parameters unchanged.
Figure 18: Classification results with different α while keeping other parameters unchanged.
  Adjacent paper text (§5.7, Analysis of convergence): In this subsection, we examine the convergence behavior of the BHDG method across six datasets.
Figure 19: Convergence analysis.

Author Contributions. Cong Guo: Writing-original draft, Conceptualization, Validation, Methodology. Changqin Huang: Writing-review and editing, Supervision. Wenhua Zhou: Writing-review and editing. Xiaodi Huang: Writing-review and editing.
Conflicts of Interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared …
Original abstract

Multi-label learning poses significant challenges in extracting reliable supervisory signals from the label space. Existing approaches often employ continuous pseudo-labels to replace binary labels, improving supervisory information representation. However, these methods can introduce noise from irrelevant labels and lead to unreliable graph structures. To overcome these limitations, this study introduces a novel multi-label feature selection method called Binary Hashing and Dynamic Graph Constraint (BHDG), the first method to integrate binary hashing into multi-label learning. BHDG utilizes low-dimensional binary hashing codes as pseudo-labels to reduce noise and improve representation robustness. A dynamically constrained sample projection space is constructed based on the graph structure of these binary pseudo-labels, enhancing the reliability of the dynamic graph. To further enhance pseudo-label quality, BHDG incorporates label graph constraints and inner product minimization within the sample space. Additionally, an $l_{2,1}$-norm regularization term is added to the objective function to facilitate the feature selection process. The augmented Lagrangian multiplier (ALM) method is employed to optimize binary variables effectively. Comprehensive experiments on 10 benchmark datasets demonstrate that BHDG outperforms ten state-of-the-art methods across six evaluation metrics. BHDG achieves the highest overall performance ranking, surpassing the next-best method by an average of at least 2.7 ranks per metric, underscoring its effectiveness and robustness in multi-label feature selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces BHDG, a multi-label feature selection method that generates low-dimensional binary hashing codes as pseudo-labels to reduce noise from irrelevant labels, constructs a dynamically constrained sample projection space from the graph of these binary codes, incorporates label graph constraints and inner product minimization, adds an l_{2,1}-norm regularization term for feature selection, and solves the resulting objective via the augmented Lagrangian multiplier (ALM) method. Comprehensive experiments on 10 benchmark datasets are reported to show that BHDG outperforms ten state-of-the-art methods across six evaluation metrics, achieving the highest overall performance ranking.

Significance. If the reported empirical ranking holds under standard statistical validation, the work would demonstrate a practical benefit of binary pseudo-labels over continuous ones for constructing reliable dynamic graphs in multi-label feature selection. The coherent integration of hashing, graph constraints, and ALM optimization provides a clear algorithmic contribution that could be adopted in related multi-label tasks.

minor comments (2)
  1. [Abstract, §4] Abstract and §4 (Experiments): the claim of outperformance would be strengthened by explicit reporting of the experimental protocol, including how the 10 datasets were split, the range of hashing code lengths tested, the procedure for selecting the trade-off regularization parameters, and whether paired statistical tests (e.g., Wilcoxon) were applied to the six metrics.
  2. [§3] §3 (Method): the description of the dynamic graph construction from binary codes should clarify whether the graph is recomputed at each ALM iteration or only once, as this affects both computational cost and the interpretation of the “dynamic” constraint.
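The paired test suggested in the first comment can be sketched in a few lines. The code below computes only the Wilcoxon signed-rank sums for two methods compared over the same datasets; the scores are made-up integers (percentages), and the function name is an assumption. Turning the statistic into a p-value needs a critical-value table or a library routine such as SciPy's `scipy.stats.wilcoxon`.

```python
def wilcoxon_rank_sums(a, b):
    """Return (W_plus, W_minus) for paired samples; zero differences are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Tied absolute differences share the average rank.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Made-up per-dataset scores (in percent) for two hypothetical methods.
method_a = [82, 75, 90, 68, 77]
method_b = [80, 76, 85, 68, 70]
w_plus, w_minus = wilcoxon_rank_sums(method_a, method_b)
```

The test statistic is usually taken as the smaller of the two sums. With real-valued scores, exact equality checks on differences are fragile, so rounding to a fixed precision before ranking is a sensible precaution.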

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper introduces an algorithmic method BHDG whose objective combines binary hashing pseudo-labels, dynamic graph construction, label-graph constraints, inner-product minimization, and l2,1 regularization, solved by ALM. Its central claim is empirical ranking on 10 external benchmark datasets across 6 metrics. No derivation chain is supplied that reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the reported superiority is therefore settled by the external tables rather than by internal redefinition of inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach relies on several assumptions about the benefits of binary codes and dynamic graphs, plus multiple free parameters in the optimization objective.

free parameters (2)
  • trade-off parameters for regularization terms
    The objective function includes l2,1-norm and other terms whose weights are likely hyperparameters tuned during experiments.
  • hashing code length
    Low-dimensional binary codes have a dimension that must be chosen.
axioms (2)
  • domain assumption Binary hashing codes provide more robust pseudo-labels than continuous values by reducing noise from irrelevant labels
    This is the core motivation stated in the abstract for using binary codes.
  • domain assumption The graph structure derived from binary pseudo-labels leads to a more reliable dynamic constraint on the sample projection space
    Assumed to enhance reliability of the dynamic graph.

pith-pipeline@v0.9.0 · 5778 in / 1390 out tokens · 87556 ms · 2026-05-23T00:06:55.823882+00:00 · methodology

