pith. sign in

arxiv: 2605.15460 · v1 · pith:YER45XPBnew · submitted 2026-05-14 · 💻 cs.IR · cs.AI

Differentially Private Motif-Preserving Multi-modal Hashing

Pith reviewed 2026-05-19 14:51 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords differential privacymulti-modal hashingcross-modal retrievalgraph motifsdegree clippingsynthetic graphedge differential privacystructural distillation
0
0 comments X

The pith

Degree clipping before noisy synthesis lets differentially private multi-modal hashing retain up to 92.5 percent of non-private retrieval accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard differential privacy techniques either ignore graph structure or inject too much noise when applied to similarity graphs built from user interactions. By first clipping every node's degree to a fixed threshold, the maximum change in any triangle count caused by adding or removing one edge becomes a constant that does not grow with dataset size. A private synthetic graph is then produced with Noisy Mirror Descent and fed to dual-stream hashing networks that distill cross-modal alignments through a structural loss. Experiments on MIRFlickr-25K and NUS-WIDE under an inductive protocol demonstrate gains of up to 11.4 mAP points over prior private baselines while keeping most of the original performance.

Core claim

DMP-MH first applies deterministic degree clipping to bound the L2-sensitivity of triangle motifs independently of graph size, then generates a sanitized synthetic graph via Noisy Mirror Descent under edge differential privacy, and finally trains dual-stream hashing networks with a holistic structural loss that transfers the preserved topology into compact binary codes for cross-modal retrieval.

What carries the argument

The Sanitize-then-Distill pipeline that separates sensitivity-bounded graph sanitization from downstream representation learning.

If this is right

  • Cross-modal retrieval systems can now incorporate behavioral similarity graphs without exposing individual user links.
  • Motif counts remain stable enough under privacy noise that downstream binary codes still align images and text effectively.
  • The same clipped-graph sanitization step can be reused for any graph-supervised hashing or embedding task that relies on local structure.
  • Inductive evaluation confirms that the learned hash functions generalize to query items never seen during sanitization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The clipping approach may reduce noise requirements in other motif-driven graph tasks such as community detection or link prediction under privacy constraints.
  • Because the sensitivity bound is now independent of N, the method could scale to much larger interaction graphs than previous DP graph techniques.
  • Replacing triangle motifs with higher-order subgraphs would require only a change in the clipping threshold and loss term, leaving the rest of the pipeline intact.

Load-bearing premise

Clipping every node degree to one fixed threshold limits the effect of any single edge change on triangle counts to a dataset-size-independent constant while leaving enough relational signal for the hashing loss to recover useful alignments.

What would settle it

On the same datasets, replace the clipped graph with its unclipped version and recompute the maximum triangle-count change after one edge flip; if that change grows linearly with the number of nodes, or if the final mAP falls below 80 percent of the non-private baseline, the central claim does not hold.

Figures

Figures reproduced from arXiv: 2605.15460 by Jiahao Sun, Wei Dai, Zehua Cheng.

Figure 1
Figure 1. Figure 1: Overview of the DMP-MH framework. The proposed method operates through three sequential phases. Phase 1 [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
read the original abstract

Cross-modal hashing enables efficient retrieval by encoding images and text into compact binary codes. State-of-the-art methods rely on semantic similarity graphs derived from user interactions for supervision, yet these graphs encode sensitive behavioral patterns vulnerable to link reconstruction attacks. Existing privacy-preserving approaches fail on graph-structured data: Differentially Private SGD destroys relational motifs by treating samples independently, while graph synthesis methods suffer from unbounded local sensitivity in scale-free networks, hub nodes cause single-edge modifications to alter triangle counts by $\mathcal{O}(N)$, necessitating prohibitive noise injection. We term this phenomenon Hubness Explosion. We propose DMP-MH, a Sanitize-then-Distill framework that decouples privacy from representation learning. Our approach first bounds sensitivity by deterministically clipping node degrees, capping the $L_2$-sensitivity of triangle motifs independently of dataset size. A sanitized synthetic graph is then generated via Noisy Mirror Descent under $(\epsilon,\delta)$-Edge Differential Privacy. Finally, dual-stream hashing networks distill this topology using a holistic structural loss that enforces cross-modal alignment. Evaluated on MIRFlickr-25K and NUS-WIDE under a strict inductive protocol, DMP-MH outperforms private baselines by up to 11.4 mAP points while retaining up to 92.5% of non-private performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces DMP-MH, a Sanitize-then-Distill framework for differentially private multi-modal hashing. It identifies 'Hubness Explosion' as the issue where hub nodes in scale-free interaction graphs cause O(N) changes in triangle counts under single-edge modifications, leading to excessive noise in prior DP graph methods. The approach first applies deterministic node-degree clipping to bound L2-sensitivity of triangle motifs independently of dataset size, then generates a sanitized synthetic graph via Noisy Mirror Descent under (ε,δ)-edge DP, and finally distills the topology into dual-stream hashing networks via a holistic structural loss enforcing cross-modal alignment. On MIRFlickr-25K and NUS-WIDE under a strict inductive protocol, it reports outperforming private baselines by up to 11.4 mAP while retaining up to 92.5% of non-private performance.

Significance. If the empirical results are robust, the work would provide a meaningful advance in privacy-preserving cross-modal retrieval by decoupling sensitivity bounding from representation learning and explicitly targeting motif preservation in interaction graphs. The identification of hubness as a distinct sensitivity phenomenon and the use of clipping plus noisy mirror descent constitute a concrete technical contribution. Reproducible evaluation under an inductive split is a positive aspect. However, the significance is limited by the absence of ablations on the clipping threshold and statistical reporting, which are load-bearing for validating that the clipped graphs retain usable structural signal for the hashing objective.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Experimental Evaluation): The central claim of up to 11.4 mAP improvement and 92.5% retention of non-private performance is reported without error bars, standard deviations across runs, or details on how the inductive split was constructed. This makes it impossible to assess whether the gains are statistically reliable or sensitive to the particular train/test partitioning.
  2. [§3.1] §3.1 (Degree Clipping): The deterministic clipping of node degrees is presented as capping L2-sensitivity of triangle motifs independently of N, yet no quantification of retained triangles or motifs post-clipping is given, nor is there an ablation varying the clipping threshold. In scale-free graphs this choice directly affects whether sufficient cross-modal structural supervision remains for the dual-stream networks and holistic loss to recover the reported performance; without such analysis the utility claim rests on an untested assumption.
  3. [§4.3] §4.3 (Baselines and Protocol): The comparison to private baselines lacks description of how the privacy budgets (ε, δ) were matched across methods and whether the same clipping threshold was applied uniformly. This is necessary to isolate whether the reported gains stem from the motif-preserving sanitization or from differences in experimental setup.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'holistic structural loss' is used without a forward reference to its precise formulation (e.g., which combination of motif, alignment, and reconstruction terms it comprises).
  2. [§3] Notation: The L2-sensitivity bound after clipping is stated to be independent of dataset size, but the exact functional dependence on the clipping threshold should be stated explicitly in an equation for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the insightful comments. We agree that additional statistical reporting, ablations, and clarifications on experimental protocols are necessary to strengthen the manuscript. We address each major comment below and will incorporate the suggested revisions.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experimental Evaluation): The central claim of up to 11.4 mAP improvement and 92.5% retention of non-private performance is reported without error bars, standard deviations across runs, or details on how the inductive split was constructed. This makes it impossible to assess whether the gains are statistically reliable or sensitive to the particular train/test partitioning.

    Authors: We concur that the absence of error bars and standard deviations limits the assessment of statistical reliability. We will revise the experimental section to include results averaged over multiple independent runs with reported standard deviations. Additionally, we will expand the description of the inductive split construction in §4 to provide full details on the train/test partitioning procedure, ensuring better reproducibility and allowing evaluation of sensitivity to the split. revision: yes

  2. Referee: [§3.1] §3.1 (Degree Clipping): The deterministic clipping of node degrees is presented as capping L2-sensitivity of triangle motifs independently of N, yet no quantification of retained triangles or motifs post-clipping is given, nor is there an ablation varying the clipping threshold. In scale-free graphs this choice directly affects whether sufficient cross-modal structural supervision remains for the dual-stream networks and holistic loss to recover the reported performance; without such analysis the utility claim rests on an untested assumption.

    Authors: We acknowledge that quantifying the retained triangles after clipping and providing an ablation on the clipping threshold would better support the claim that sufficient structural information is preserved. Although the overall performance indicates effective retention, we agree this requires explicit validation. In the revised manuscript, we will include a new analysis in §3.1 or §4 reporting the fraction of triangles retained post-clipping and an ablation study showing performance variation with different clipping thresholds. revision: yes

  3. Referee: [§4.3] §4.3 (Baselines and Protocol): The comparison to private baselines lacks description of how the privacy budgets (ε, δ) were matched across methods and whether the same clipping threshold was applied uniformly. This is necessary to isolate whether the reported gains stem from the motif-preserving sanitization or from differences in experimental setup.

    Authors: We appreciate this observation. The current manuscript does not provide sufficient detail on matching the privacy budgets and clipping thresholds across baselines. We will update §4.3 to clearly describe that all methods were evaluated under identical (ε, δ) privacy parameters and that the degree clipping threshold was applied uniformly to all graph-based approaches. This revision will better isolate the contribution of the motif-preserving sanitization. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims are empirical measurements, not tautological restatements

full rationale

The paper's derivation proceeds from deterministic degree clipping to bound L2-sensitivity of triangle motifs, through Noisy Mirror Descent for (ε,δ)-edge DP graph synthesis, to distillation in dual-stream hashing networks via a holistic structural loss. These steps are presented as sequential design choices whose outputs are then evaluated empirically on MIRFlickr-25K and NUS-WIDE under an inductive protocol. The reported mAP gains (up to 11.4 points) and retention rates (up to 92.5%) are framed as measured outcomes rather than quantities forced by construction from fitted parameters or self-referential definitions. No equations or self-citations in the abstract reduce the central claims to inputs by definition, and the framework remains self-contained against external benchmarks without load-bearing reductions to prior author work or ansatzes.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unstated premise that the clipped synthetic graph still encodes the cross-modal semantic relations needed by the hashing objective; this premise is not derived from first principles and is not supported by any external benchmark in the abstract.

free parameters (2)
  • degree clipping threshold
    Chosen to bound L2-sensitivity of triangle counts; value not stated in abstract.
  • privacy budget (ε, δ)
    Controls noise in Mirror Descent; concrete values not reported.
axioms (1)
  • domain assumption Clipped degree sequence yields bounded sensitivity for triangle motifs independent of N
    Invoked when the authors state that clipping caps L2-sensitivity independently of dataset size.

pith-pipeline@v0.9.0 · 5756 in / 1450 out tokens · 78492 ms · 2026-05-19T14:51:28.525043+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 5 internal anchors

  1. [1]

    Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. 308–318

  2. [2]

    Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multi- modal machine learning: A survey and taxonomy.IEEE transactions on pattern analysis and machine intelligence41, 2 (2018), 423–443

  3. [3]

    Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks.science286, 5439 (1999), 509–512

  4. [4]

    Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organi- zation of complex networks.Science353, 6295 (2016), 163–166

  5. [5]

    Bertsekas

    Dimitri P. Bertsekas. 1999.Nonlinear Programming(2 ed.). Athena Scientific

  6. [6]

    Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Philip S Yu. 2017. Hashnet: Deep learning to hash by continuation. InProceedings of the IEEE international conference on computer vision. 5608–5617

  7. [7]

    Shixi Chen and Shuigeng Zhou. 2013. Recursive mechanism: towards node differential privacy and unrestricted joins. InProceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 653–664

  8. [8]

    Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. 1998. Private information retrieval.Journal of the ACM (JACM)45, 6 (1998), 965–981

  9. [9]

    Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. 2009. Nus-wide: a real-world web image database from national university of singapore. InProceedings of the ACM international conference on image and video retrieval. 1–9

  10. [10]

    Guiguang Ding, Yuchen Guo, and Jile Zhou. 2014. Collective matrix factorization hashing for multimodal data. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2075–2082

  11. [11]

    Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differen- tial privacy.Foundations and trends®in theoretical computer science9, 3–4 (2014), 211–407

  12. [12]

    Rothblum, and Salil Vadhan

    Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. 2010. Boosting and differ- ential privacy. In2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 51–60

  13. [13]

    Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. 1999. Similarity search in high dimensions via hashing. InVldb, Vol. 99. 518–529

  14. [14]

    David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. 2004. Canonical correlation analysis: An overview with application to learning methods.Neural computation16, 12 (2004), 2639–2664

  15. [15]

    Michael Hay, Vibhor Rastogi, Gerome Miklau, and Dan Suciu. 2009. Boosting the accuracy of differentially-private histograms through consistency.arXiv preprint arXiv:0904.0942(2009)

  16. [16]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778

  17. [17]

    Xinlei He, Jinyuan Jia, Michael Backes, Neil Zhenqiang Gong, and Yang Zhang

  18. [18]

    In30th USENIX security sympo- sium (USENIX security 21)

    Stealing links from graph neural networks. In30th USENIX security sympo- sium (USENIX security 21). 2669–2686

  19. [19]

    Mark J Huiskes and Michael S Lew. 2008. The mir flickr retrieval evaluation. In Proceedings of the 1st ACM international conference on Multimedia information retrieval. 39–43

  20. [20]

    Qing-Yuan Jiang and Wu-Jun Li. 2017. Deep cross-modal hashing. InProceedings of the CVPR. 3232–3240

  21. [21]

    Qing-Yuan Jiang and Wu-Jun Li. 2019. Discrete latent factor model for cross- modal hashing.IEEE Transactions on Image Processing28, 7 (2019), 3490–3501

  22. [22]

    Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2013. Analyzing graphs with node differential privacy. InTheory of Cryptography Conference. Springer, 457–476

  23. [23]

    Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, and Nina Mishra

  24. [24]

    Privacy via the johnson-lindenstrauss transform.arXiv preprint arXiv:1204.2606(2012)

  25. [25]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic opti- mization. In3rd International Conference on Learning Representations (ICLR)

  26. [26]

    Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, and Dacheng Tao. 2018. Self- supervised adversarial hashing networks for cross-modal retrieval. InProceedings of the IEEE conference on computer vision and pattern recognition. 4242–4251

  27. [27]

    Song Liu, Shengsheng Qian, Yang Guan, Jiawei Zhan, and Long Ying. 2020. Joint-modal distribution-based similarity hashing for large-scale unsupervised deep cross-modal retrieval. InProceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. 1379–1388

  28. [28]

    Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro

  29. [29]

    SIAM Journal on optimization19, 4 (2009), 1574–1609

    Robust stochastic approximation approach to stochastic programming. SIAM Journal on optimization19, 4 (2009), 1574–1609

  30. [30]

    Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2007. Smooth sensitivity and sampling in private data analysis. InProceedings of the thirty-ninth annual ACM symposium on Theory of computing. 75–84

  31. [31]

    Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert RG Lanckriet, Roger Levy, and Nuno Vasconcelos. 2010. A new approach to cross- modal multimedia retrieval. InProceedings of the 18th ACM international confer- ence on Multimedia. 251–260

  32. [32]

    Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. Dis- tilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.arXiv preprint arXiv:1910.01108(2019)

  33. [33]

    Jingkuan Song, Yang Yang, Yi Yang, Zi Huang, and Heng Tao Shen. 2013. Inter- media hashing for large-scale retrieval from heterogeneous data sources. In Proceedings of the 2013 ACM SIGMOD international conference on management of data. 785–796

  34. [34]

    Shupeng Su, Zhisheng Zhong, and Chao Zhang. 2019. Deep joint-semantics reconstructing hashing for large-scale unsupervised cross-modal retrieval. In Proceedings of the ICCV. 3027–3035

  35. [35]

    Jingdong Wang, Ting Zhang, Nicu Sebe, Heng Tao Shen, et al. 2017. A survey on learning to hash.IEEE transactions on pattern analysis and machine intelligence 40, 4 (2017), 769–790

  36. [36]

    Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, and Liang Wang. 2016. A com- prehensive survey on cross-modal retrieval.arXiv preprint arXiv:1607.06215 (2016)

  37. [37]

    Yair Weiss, Antonio Torralba, and Rob Fergus. 2008. Spectral hashing.Advances in neural information processing systems21 (2008)

  38. [38]

    Fan Wu, Yunhui Long, Ce Zhang, and Bo Li. 2022. Linkteller: Recovering private edges from graph neural networks via influence analysis. In2022 ieee symposium on security and privacy (sp). IEEE, 2005–2024

  39. [39]

    Erkun Yang, Cheng Deng, Wei Liu, Xianglong Liu, Dacheng Tao, and Xinbo Gao

  40. [40]

    In proceedings of the AAAI Conference on Artificial Intelligence, Vol

    Pairwise relationship guided deep hashing for cross-modal retrieval. In proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31

  41. [41]

    Peng-Fei Zhang, Guangdong Bai, Hongzhi Yin, and Zi Huang. 2023. Proactive privacy-preserving learning for cross-modal retrieval.ACM Transactions on Information Systems41, 2 (2023), 1–23

  42. [42]

    Elena Zheleva and Lise Getoor. 2009. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. InProceedings of the 18th international conference on World wide web. 531–540

  43. [43]

    Xin Zheng, Yi Wang, Yixin Liu, Ming Li, Miao Zhang, Di Jin, Philip S Yu, and Shirui Pan. 2022. Graph neural networks for graphs with heterophily: A survey. arXiv preprint arXiv:2202.07082(2022)