arxiv: 2510.18109 · v4 · submitted 2025-10-20 · 💻 cs.CR · cs.LG

PrivaDE: Privacy-preserving Data Evaluation for Blockchain-based Data Marketplaces

Pith reviewed 2026-05-18 05:30 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords: privacy-preserving data evaluation · blockchain data marketplaces · secure model inference · utility scoring · smart contracts · machine learning data acquisition · multi-party computation

The pith

PrivaDE lets model owners and data owners jointly score dataset utility without exposing models, features or labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PrivaDE, a protocol that lets a model owner and a data owner compute how useful a candidate dataset is for training while keeping the model parameters, raw data, and labels hidden from each other. This setup matters because it could support data trading in blockchain marketplaces where neither side wants to give away proprietary information before a deal. The work adds protections against cheating parties and uses smart contracts to handle execution and payments automatically. It further includes practical speed-ups such as efficient secure inference and a scoring method based on only a small but representative slice of the data.

Core claim

PrivaDE is a privacy-preserving protocol that allows a model owner and a data owner to jointly compute a utility score for a candidate dataset without fully exposing model parameters, raw features, or labels. It supplies strong security against malicious behavior and integrates directly with blockchain-based marketplaces where smart contracts enforce fair execution and payment. To keep the approach practical, the protocol adds optimizations for efficient secure model inference together with a model-agnostic scoring technique that relies on a small representative subset of the data while still reflecting its downstream training impact.

What carries the argument

The PrivaDE protocol for secure joint model inference and subset-based utility scoring that ties into blockchain smart contracts for enforcement.
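The page does not give the scoring formula itself. The F_SubScore functionality reproduced in the figures section names its ingredients: a loss ℓ, an uncertainty measure U, a diversity measure D and an aggregator f over k labeled points. As a minimal plaintext sketch (not the paper's Algorithm 1, and with no cryptography at all), assume ℓ is cross-entropy, U is predictive entropy, D is mean pairwise Euclidean distance and f is a weighted sum; the name `subset_utility` and the weights are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def subset_utility(logits, labels, features, weights=(1.0, 1.0, 1.0)) -> float:
    """Hypothetical plaintext utility score over k representative points.

    logits:   (k, c) model outputs y' (produced by secure inference in PrivaDE)
    labels:   (k,)   integer labels y held by the data owner
    features: (k, d) representative features used for the diversity term
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    features = np.asarray(features, dtype=float)
    k = len(labels)
    probs = softmax(logits)
    eps = 1e-12  # avoids log(0)

    # Loss term (assumed cross-entropy): high when the model mispredicts these points.
    loss = -np.mean(np.log(probs[np.arange(k), labels] + eps))
    # Uncertainty term (assumed predictive entropy): high when the model is unsure.
    uncertainty = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Diversity term (assumed mean pairwise Euclidean distance between points).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    diversity = dists.sum() / (k * (k - 1)) if k > 1 else 0.0

    # Aggregator (assumed weighted sum of the three terms).
    w_loss, w_unc, w_div = weights
    return float(w_loss * loss + w_unc * uncertainty + w_div * diversity)
```

With uniform logits on two points labeled 0 and 1 and features (0, 0) and (3, 4), each term can be checked by hand: ℓ = U = ln 2 and D = 5, so the score is 2 ln 2 + 5 ≈ 6.386. In PrivaDE this computation would run inside secure two-party computation rather than in the clear.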

If this is right

  • Blockchain marketplaces can run automated data purchases where payments are released only after a verified utility score is produced.
  • Model owners gain a practical way to test candidate datasets before committing to full acquisition or training.
  • Data providers can demonstrate value without handing over raw records or labels to potential buyers.
  • Evaluations remain feasible for models with millions of parameters, completing within about 15 minutes of online computation.
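The first bullet can be made concrete with a state-machine sketch of an escrow contract. It is written in Python for readability rather than as an on-chain contract; `EscrowSketch`, the threshold rule and the `proof_verified` flag (standing in for on-chain verification of the protocol's output) are all hypothetical, and the paper's actual pricing and payment logic may differ.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    FUNDED = auto()    # buyer has deposited the price
    PAID = auto()      # funds released to the seller
    REFUNDED = auto()  # funds returned to the buyer


@dataclass
class EscrowSketch:
    """Hypothetical escrow that settles exactly once on a verified utility score."""
    buyer: str
    seller: str
    price: int
    threshold: float
    state: State = State.FUNDED
    balances: dict = field(default_factory=dict)

    def settle(self, score: float, proof_verified: bool) -> str:
        # Settlement is one-shot: a second call must not move funds again.
        if self.state is not State.FUNDED:
            raise RuntimeError("escrow already settled")
        # Only a score backed by a verified proof may trigger any transfer.
        if not proof_verified:
            raise ValueError("score proof did not verify")
        recipient = self.seller if score >= self.threshold else self.buyer
        self.balances[recipient] = self.balances.get(recipient, 0) + self.price
        self.state = State.PAID if recipient == self.seller else State.REFUNDED
        return recipient
```

The one-shot `state` check is the design choice that matters: once funds have moved, replaying a score cannot move them again.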

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Protocols of this kind could lower the barrier to sharing data across organizations by reducing the need to trust the other party with full access.
  • Wider adoption might encourage the growth of decentralized machine-learning ecosystems where data quality is assessed on-chain before any transfer occurs.
  • The subset scoring idea could be tested on additional tasks such as regression or reinforcement learning to see how broadly it applies.

Load-bearing premise

A small representative subset of the data can accurately stand in for the dataset's full effect on model training performance.
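This page does not specify how the representative subset is chosen (the simulated rebuttal below mentions clustering, and the reference list includes core-set and submodular-selection work). One standard way to pick k points that cover a feature space is greedy k-center, also called farthest-first traversal, which underlies core-set active learning. The sketch below assumes that technique and Euclidean distance; it illustrates the idea and is not necessarily PrivaDE's selection rule.

```python
import numpy as np


def greedy_k_center(features, k: int, first: int = 0) -> list:
    """Return indices of k points chosen by farthest-first traversal."""
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    chosen = [first]
    # Distance from every point to its nearest already-chosen center.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # the point worst covered so far
        chosen.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return chosen
```

Each iteration adds the point farthest from all current centers, so sparse regions and outliers are covered early; the cost is O(nk) distance evaluations.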

What would settle it

Train the model once on the full candidate dataset and once using only the small subset chosen for scoring, then check whether the utility score from the subset reliably predicts the accuracy or loss obtained from the full training run.
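The check described above amounts to a rank-correlation test across several candidate datasets: compute each dataset's subset-based utility score, measure the accuracy gain after full training, and ask whether the two orderings agree. A minimal Spearman correlation sketch in NumPy follows; tied values receive averaged ranks, and the function names are illustrative.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of equal values starting at i.
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(subset_scores, full_training_gains) -> float:
    """Pearson correlation of the ranks, i.e. Spearman's rho."""
    rx = average_ranks(subset_scores)
    ry = average_ranks(full_training_gains)
    return float(np.corrcoef(rx, ry)[0, 1])
```

A value near 1 would mean the subset score ranks candidate datasets the way full training does, which is what a buyer relying on the score needs; a value near 0 would undercut the load-bearing premise. If either input is constant, the correlation is undefined and NumPy returns nan.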

Figures

Figures reproduced from arXiv: 2510.18109 by Michele Ciampi, Rik Sarkar, Sahel Torkamani, Wan Ki Wong.

Figure 1. Overall design of PrivaDE. The core protocol consists of two parts: a secure inference of the data points and a scoring function that takes the inference results and labeled data points as input. Several optimization components, detailed in Section 5, are applied before secure inference to reduce the overhead of the first component.

Figure 3. Functionality F_Inference for maliciously secure 2PC.

Figure 4. Functionality F_SubScore for maliciously secure 2PC. Parameters: loss ℓ; uncertainty U; diversity D; aggregator f; model owner P_1; data owner P_2; commitment parameter pp; public representative features D^x_R. (1) F_SubScore receives private input (y'_1, …, y'_k, r_{y'_1}, …, r_{y'_k}, com_{y_1}, …, com_{y_k}) from P_1 and (D^x_B, y_1, …, y_k, r_{y_1}, …, r_{y_k}, com_{y'_1}, …, com_{y'_k}) …

The Setup phase (excerpt). The setup phase is an offline stage before Alice and Bob run the protocol. In this phase:
  • Alice and Bob communicate with a trusted third party to obtain cryptographic parameters pp ← SetupCom(1^λ) and CRS ← SetupZKP(1^λ).
  • Bob verifies the authenticity of his dataset D_B with a data authority, which issues a signature attesting to th…

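The F_SubScore inputs above include commitments com_y and opening randomness r_y, with commitment parameters pp produced by SetupCom during setup. This page does not say which commitment scheme PrivaDE uses (one that must be opened inside zero-knowledge proofs would typically be algebraic or circuit-friendly). As a stand-in, a SHA-256 hash commitment shows the commit/open interface: hiding comes from 32 bytes of fresh randomness, binding from collision resistance. The helper names are hypothetical.

```python
import hashlib
import secrets


def commit(value: bytes) -> tuple:
    """Return (com, r). r is fixed-length, so r + value parses unambiguously."""
    r = secrets.token_bytes(32)  # fresh randomness hides the committed value
    com = hashlib.sha256(r + value).digest()
    return com, r


def open_commitment(com: bytes, value: bytes, r: bytes) -> bool:
    """Check that (value, r) opens com; constant-time comparison."""
    return secrets.compare_digest(com, hashlib.sha256(r + value).digest())
```

In use, the data owner would publish `com` for each label before the protocol and reveal `(value, r)` only where the protocol requires an opening.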
Figure 6. Conceptual diagram of the split model in PrivaDE.

Figure 5. PrivaDE for secure dataset scoring.

Theorem 4.2. PrivaDE securely realizes F^k_Score with malicious security. The proof of Theorem 4.2 will be provided in Appendix D.1.

Section 5, Practical Optimizations for PrivaDE (excerpt): "In this section, we introduce the design techniques that make PrivaDE's secure model inference practical in terms of computation and communication." Its first subsection, 5.1, covers model distillation.

Figure 7. Decentralized data marketplace workflow with …

Figure 8. Online runtime versus choices of n, k, |I|, and the verification ratio in CnCZK. All parameters except n exhibit approximately linear scaling.

Section 7.2, Robustness of the scoring algorithm (excerpt): The scoring algorithm (Algorithm 1) is evaluated against entropy-based sampling [51] and core-set sampling [49] on MNIST, CIFAR-10 and CIFAR-100. After normalization and Gaussian blurring, each dataset is split into a pre-tra…

Figure 9. Robustness of our multi-point scoring algorithm vs. other active learning methods. The results show our method is …

Figure 11. Cut-and-choose protocol CnCZK for model inference verification.

Figure 12. Functionality F_2PC (excerpt). Parameters: description of function f. Let H ∈ {1, 2} and M ∈ {1, 2} \ {H} denote the indices of the honest party and the corrupt party, respectively. (1) F_2PC receives input x_H from honest party P_H and input x_M of the corrupt party from simulator S. (2) F_2PC computes y ← f(x_1, x_2) and sends y to S. (3) If F_2PC receives abort from S, it sends abort to honest party P_H. Otherwise, F_2PC sends y to honest pa…

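Figure 8 varies the verification ratio of CnCZK, the cut-and-choose protocol of Figure 11. Its details are not reproduced on this page, but the generic cut-and-choose calculation shows why that ratio matters: if a prover produces n inference instances, cheats on c of them, and the verifier opens m chosen uniformly at random, the chance that no opened instance is a cheated one is C(n − c, m) / C(n, m). The function below computes the detection probability under that generic model; treating CnCZK as sampling this way is an assumption.

```python
from math import comb


def detection_probability(n: int, cheated: int, checked: int) -> float:
    """P(at least one opened instance is cheated) in generic cut-and-choose."""
    if not (0 <= cheated <= n and 0 <= checked <= n):
        raise ValueError("need 0 <= cheated <= n and 0 <= checked <= n")
    # Probability that all `checked` opened instances avoid the cheated ones;
    # comb returns 0 when checked > n - cheated, i.e. cheating cannot hide.
    miss = comb(n - cheated, checked) / comb(n, checked)
    return 1.0 - miss
```

For example, with n = 10, a single cheated instance and half the instances opened, detection happens with probability 1 − 126/252 = 0.5; cheating on more than n − m instances is always caught. A higher verification ratio raises detection probability at the cost of more verification work.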
Original abstract

Evaluating the usefulness of data before purchase is essential when obtaining data for high-quality machine learning models, yet both model builders and data providers are often unwilling to reveal their proprietary assets. We present PrivaDE, a privacy-preserving protocol that allows a model owner and a data owner to jointly compute a utility score for a candidate dataset without fully exposing model parameters, raw features, or labels. PrivaDE provides strong security against malicious behavior and can be integrated into blockchain-based marketplaces, where smart contracts enforce fair execution and payment. To make the protocol practical, we propose optimizations to enable efficient secure model inference, and a model-agnostic scoring method that uses only a small, representative subset of the data while still reflecting its impact on downstream training. Evaluation shows that PrivaDE performs data evaluation effectively, achieving online runtimes within 15 minutes even for models with millions of parameters. Our work lays the foundation for fair and automated data marketplaces in decentralized machine learning ecosystems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents PrivaDE, a privacy-preserving protocol allowing a model owner and data owner to jointly compute a utility score for a candidate dataset without fully exposing model parameters, raw features, or labels. It claims strong security against malicious behavior, integration with blockchain marketplaces via smart contracts for fair execution and payment, optimizations for efficient secure model inference, and a model-agnostic scoring method using only a small representative subset that still reflects downstream training impact. Evaluation reports effective performance with online runtimes under 15 minutes even for models with millions of parameters.

Significance. If the central claims hold, this work could enable practical privacy-preserving data evaluation in decentralized ML ecosystems, supporting fair automated marketplaces. The combination of standard cryptographic building blocks with blockchain enforcement and efficiency optimizations for large models represents a useful engineering contribution, provided the scoring proxy and security guarantees receive rigorous validation.

major comments (2)
  1. [Abstract] Abstract: the claim that the model-agnostic scoring method 'uses only a small, representative subset of the data while still reflecting its impact on downstream training' is load-bearing for the utility score that drives marketplace transactions, yet the manuscript provides no empirical correlation analysis, bounds on distribution shift, or ablation on subset size to substantiate that the proxy accurately predicts full-dataset training impact.
  2. [Abstract] Abstract: the statements that PrivaDE 'provides strong security against malicious behavior' and 'performs data evaluation effectively' lack any security model, formal proofs, threat analysis, detailed experimental setup, error bars, or baseline comparisons, which undermines verifiability of the core protocol claims.
minor comments (2)
  1. Clarify the exact cryptographic primitives and secure inference optimizations (e.g., which MPC or homomorphic encryption variant) with concrete parameter choices to support reproducibility.
  2. Provide pseudocode or a high-level diagram for the joint utility-score computation protocol to improve readability of the construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to provide additional empirical validation, formal security details, and expanded experimental reporting as outlined.

  1. Referee: [Abstract] Abstract: the claim that the model-agnostic scoring method 'uses only a small, representative subset of the data while still reflecting its impact on downstream training' is load-bearing for the utility score that drives marketplace transactions, yet the manuscript provides no empirical correlation analysis, bounds on distribution shift, or ablation on subset size to substantiate that the proxy accurately predicts full-dataset training impact.

    Authors: We agree that the manuscript would benefit from explicit empirical support for this claim. The current text describes the subset selection approach (via clustering on feature statistics to ensure representativeness) and its motivation for efficiency, but lacks dedicated correlation studies or ablations. In the revised version, we will add a new subsection in the evaluation with: (i) ablation results for subset sizes of 1%, 5%, and 10% of the data; (ii) Pearson and Spearman correlation coefficients between subset-derived utility scores and full-dataset training accuracy gains across multiple datasets and models; and (iii) distribution shift analysis using metrics such as maximum mean discrepancy. These additions will directly substantiate the proxy's predictive power. revision: yes

  2. Referee: [Abstract] Abstract: the statements that PrivaDE 'provides strong security against malicious behavior' and 'performs data evaluation effectively' lack any security model, formal proofs, threat analysis, detailed experimental setup, error bars, or baseline comparisons, which undermines verifiability of the core protocol claims.

    Authors: The full manuscript (Section 4) presents a security argument based on the malicious security of the underlying MPC primitives (additive secret sharing combined with garbled circuits for inference) under standard assumptions such as the existence of oblivious transfer. It also includes a high-level threat model covering malicious deviation by either party. However, we acknowledge the absence of a formal security definition, proof sketches, and rigorous experimental reporting. We will revise by: adding a dedicated threat model and security definitions subsection with a proof sketch; expanding the evaluation section with detailed setup parameters, error bars from repeated runs (n=5), and baseline comparisons against non-private evaluation and alternative privacy-preserving methods. These changes will improve verifiability while preserving the existing protocol design. revision: yes
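The response above attributes secure inference to additive secret sharing combined with garbled circuits. For readers unfamiliar with the first primitive, a minimal two-party additive sharing over the ring Z_{2^64} (the kind of ring used by SPDZ2k-style protocols) is sketched below. It is semi-honest only: malicious security of the kind PrivaDE claims additionally needs authentication of shares (for example MACs), and multiplying two secret values needs interaction (for example Beaver triples); neither is shown. The 64-bit modulus and helper names are assumptions.

```python
import secrets

MOD = 2 ** 64  # ring Z_{2^64}; the word size is an assumption


def share(x: int) -> tuple:
    """Split x into two additive shares; each share alone is uniformly random."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1


def reconstruct(shares: tuple) -> int:
    return (shares[0] + shares[1]) % MOD


def add(a: tuple, b: tuple) -> tuple:
    """Each party adds its own two shares locally; no communication needed."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def mul_public(a: tuple, c: int) -> tuple:
    """Scaling by a public constant is also local to each party."""
    return (a[0] * c) % MOD, (a[1] * c) % MOD
```

Because addition and public scaling are local, linear steps are cheap in such protocols; nonlinear layers are where garbled circuits or other interactive subprotocols come in.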

Circularity Check

0 steps flagged

No significant circularity; protocol is a new construction on standard primitives


The paper introduces PrivaDE as a fresh cryptographic protocol for joint utility scoring using secure multi-party computation and blockchain smart contracts. The model-agnostic scoring method with a small representative subset is presented as a practical optimization whose correctness is argued via empirical evaluation rather than by construction from fitted parameters or prior self-citations. No derivation step reduces the claimed utility score, security guarantees, or marketplace integration to an input that is defined in terms of the output itself. The central claims rest on the protocol construction and standard cryptographic assumptions, which are externally verifiable and not load-bearing on self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from cryptography for security against malicious parties and on the unproven effectiveness of the subset-based scoring method to approximate full-dataset utility.

axioms (1)
  • domain assumption: the underlying secure multi-party computation primitives provide strong security against malicious adversaries.
    Invoked to support the claim of strong security against malicious behavior in the protocol.

Reference graph

Works this paper leans on

References of PrivaDE (ASIA CCS ’26, June 01–05, 2026, Bangalore, India).

[1] Sebastian Angel, Andrew J. Blumberg, Eleftherios Ioannidis, and Jess Woods. Efficient Representation of Numerical Optimization Problems for SNARKs. In 31st USENIX Security Symposium (USENIX Security 22). 4273–4290.
[2] Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. 2020. Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds.
[3] Marta Bellés-Muñoz, Miguel Isabel, Jose Luis Muñoz-Tapia, Albert Rubio, and Jordi Baylina. 2023. Circom: A Circuit Description Language for Building Zero-Knowledge Applications. IEEE Transactions on Dependable and Secure Computing 20, 6 (2023), 4733–4751.
[4] Ella Bingham and Heikki Mannila. 2001. Random projection in dimensionality reduction: Applications to image and text data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 245–250.
[5] Manuel Blum. 1981. Coin Flipping by Telephone. In Advances in Cryptology – Proceedings of CRYPTO ’81 (Lecture Notes in Computer Science, Vol. 196). 11–15.
[6] Joseph Bonneau, Andrew Miller, Jeremy Clark, Arvind Narayanan, Joshua A Kroll, and Edward W Felten. 2015. SoK: Research Perspectives and Challenges for Bitcoin and Cryptocurrencies. In 2015 IEEE Symposium on Security and Privacy. 104–121.
[7] Vitalik Buterin. 2014. A Next-Generation Smart Contract and Decentralized Application Platform.
[8] Thiago N.C. Cardoso, Rodrigo M. Silva, Sérgio Canuto, Mirella M. Moro, and Marcos A. Gonçalves. 2017. Ranked batch-mode active learning. Information Sciences 379 (2017), 313–337.
[9] Lingjiao Chen, Paraschos Koutris, and Arun Kumar. 2019. Towards model-based pricing for machine learning in a data marketplace. In Proceedings of the 2019 International Conference on Management of Data. 1535–1552.
[10] Jang Hyun Cho and Bharath Hariharan. 2019. On the Efficacy of Knowledge Distillation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 4793–4801. doi:10.1109/ICCV.2019.00489
[11] R Dennis Cook. 1977. Detection of influential observation in linear regression. Technometrics 19, 1 (1977), 15–18.
[12] R. Cramer, I. Damgård, D. Escudero, P. Scholl, and C. Xing. 2018. SPDZ2k: Efficient MPC mod 2^k for Dishonest Majority. In Advances in Cryptology – CRYPTO 2018. 769–798.
[13] Sanjoy Dasgupta. 2011. Two faces of active learning. Theoretical Computer Science 412, 19 (2011), 1767–1781.
[14] Amit Datta, Anupam Datta, Ariel D Procaccia, and Yair Zick. 2015. Influence in classification via cooperative game theory. arXiv preprint arXiv:1505.00036 (2015).
[15] Zahra Ghodsi, Tianyu Gu, and Siddharth Garg. 2017. SafetyNets: verifiable execution of deep neural networks on an untrusted cloud. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). 4675–4684.
[16] Amirata Ghorbani, Michael Kim, and James Zou. 2020. A distributional framework for data valuation. In International Conference on Machine Learning. 3535–3544.
[17] Amirata Ghorbani and James Zou. 2019. Data Shapley: Equitable Valuation of Data for Machine Learning. In International Conference on Machine Learning (ICML). 2242–2251.
[18] Mohamed E. Hanafy, Hossam Eldin A. Hassan, Mohamed S. Abdel-Latif, and Sherif A. Elgamel. 2018. Performance evaluation of deceptive and noise jamming on SAR focused image. In Proceedings of the 11th International Conference on Electrical Engineering (ICEENG 2018). 1–11.
[19] Zecheng He, Tianwei Zhang, and Ruby B. Lee. 2019. Model inversion attacks against collaborative inference. In Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC ’19). 148–162.
[20] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network. arXiv preprint arXiv:1503.02531 (2015).
[21] Tong-Yu Hsieh, Shang-En Chan, Chao-Ru Chen, Pao-Chien Li, and Chi-Hsuan Ho. 2018. No-Reference Error-Tolerability Evaluation for Videos via Edge and Extreme-Value Checking. In Proceedings of the 2018 Workshop on Approximate Computing Across the Stack (WAX 2018). 1–6.
[22] Chenyu Huang, Jianzong Wang, Huangxun Chen, Shijing Si, Zhangcheng Huang, and Jing Xiao. 2022. zkMLaaS: a Verifiable Scheme for Machine Learning as a Service. In GLOBECOM 2022 - 2022 IEEE Global Communications Conference. 5475–5480. doi:10.1109/GLOBECOM48099.2022.10000784
[23] Ruoxi Jia, Xuehui Sun, Jiacen Xu, Ce Zhang, Bo Li, and Dawn Song. 2019. An empirical and comparative analysis of data valuation with scalable algorithms. (2019).
[24] William B. Johnson and Joram Lindenstrauss. 1984. Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability, Vol. 26. 189–206.
[25] Daniel Kang, Tatsunori Hashimoto, Ion Stoica, and Yi Sun. 2022. Scaling up Trustless DNN Inference with Zero-Knowledge Proofs.
[26] Marcel Keller. 2020. MP-SPDZ: A Versatile Framework for Multi-Party Computation.
[27] Marcel Keller, Emmanuela Orsini, and Peter Scholl. 2016. MASCOT: Faster Malicious Arithmetic Secure Computation with Oblivious Transfer. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16). 830–842. doi:10.1145/2976749.2978357
[28] Aggelos Kiayias, Hong-Sheng Zhou, and Vassilis Zikas. 2015. Fair and Robust Multi-Party Computation using a Global Transaction Ledger.
[29] Durk P Kingma and Prafulla Dhariwal. 2018. Glow: Generative Flow with Invertible 1x1 Convolutions. In Advances in Neural Information Processing Systems, Vol. 31.
[30] Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In International Conference on Machine Learning. 1885–1894.
[31] Nishat Koti, Mahak Pancholi, Arpita Patra, and Ajith Suresh. 2021. SWIFT: Super-fast and Robust Privacy-Preserving Machine Learning. In 30th USENIX Security Symposium (USENIX Security 21). 2651–2668.
[32] Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, and Dan Suciu. 2015. Query-based data pricing. Journal of the ACM (JACM) 62, 5 (2015), 1–44.
[33] Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report. Citeseer.
[34] Yann LeCun and Corinna Cortes. 2010. MNIST Handwritten Digit Database.
[35] Tianyi Liu, Xiang Xie, and Yupeng Zhang. 2021. zkCNN: Zero Knowledge Proofs for Convolutional Neural Network Predictions and Accuracy. In CCS ’21. 2968–… doi:10.1145/3460120.3485379
[36] Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. 2013. Distributed submodular maximization: Identifying representative elements in massive data. Advances in Neural Information Processing Systems 26 (2013).
[37] Satoshi Nakamoto. 2008. Bitcoin: A Peer-to-Peer Electronic Cash System.
[38] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. 1978. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14 (1978), 265–294.
[39] Olga Ohrimenko, Shruti Tople, and Sebastian Tschiatschek. 2019. Collaborative machine learning markets with data-replication-robust payments. arXiv preprint arXiv:1911.09052 (2019).
[40] Tribhuvanesh Orekondy, Bernt Schiele, and Mario Fritz. 2019. Knockoff Nets: Stealing Functionality of Black-Box Models. In CVPR.
[41] Arpita Patra and Ajith Suresh. 2020. BLAZE: Blazing Fast Privacy-Preserving Machine Learning. In Network and Distributed System Security Symposium (NDSS). doi:10.14722/ndss.2020.24202
[42] George-Liviu Pereteanu, Amir Alansary, and Jonathan Passerat-Palmbach. 2022. Split HE: Fast Secure Inference Combining Split Learning and Homomorphic Encryption. In PPAI’22: Proceedings of the Third AAAI Workshop on Privacy-Preserving Artificial Intelligence. doi:10.48550/arXiv.2202.13351
[43] Joseph Poon and Thaddeus Dryja. 2016. The Bitcoin Lightning Network: Scalable Off-Chain Instant Payments.
[44] Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. 2020. Estimating training data influence by tracing gradient descent. Advances in Neural Information Processing Systems 33 (2020), 19920–19930.
[45] Xinyuan Qian, Hongwei Li, Guowen Xu, Haoyong Wang, Tianwei Zhang, Xianhao Chen, and Yuguang Fang. 2024. Privacy-Preserving Data Evaluation via Functional Encryption, Revisited. In IEEE INFOCOM 2024 - IEEE Conference on Computer Communications. 11–20. doi:10.1109/INFOCOM52122.2024.10621262
[46] Yuma Rao. [n.d.]. Bittensor: A Peer-to-Peer Intelligence Market.
[47] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. FitNets: Hints for Thin Deep Nets. In International Conference on Learning Representations (ICLR).
[48] Benedek Rozemberczki, Lauren Watson, Péter Bayer, Hao-Tsung Yang, Olivér Kiss, Sebastian Nilsson, and Rik Sarkar. 2022. The Shapley Value in Machine Learning.
[49] Ozan Sener and Silvio Savarese. 2018. Active Learning for Convolutional Neural Networks: A Core-Set Approach.
[50] Burr Settles. 2009. Active Learning Literature Survey. Technical Report 1648. University of Wisconsin–Madison.
[51] Burr Settles and Mark Craven. 2008. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1070–1079.
[52] Rachael Hwee Ling Sim, Yehong Zhang, Mun Choon Chan, and Bryan Kian Hsiang Low. 2020. Collaborative Machine Learning with Incentive-Aware Model Rewards. In Proceedings of the 37th International Conference on Machine Learning (ICML’20).
[53] Qiyang Song, Jiahao Cao, Kun Sun, Qi Li, and Ke Xu. 2021. Try before You Buy: Privacy-preserving Data Evaluation on Cloud-based Machine Learning Data Marketplace. In Proceedings of the 37th Annual Computer Security Applications Conference (ACSAC ’21). 260–272. doi:10.1145/3485832.3485921
[54] Zhihua Tian, Jian Liu, Jingyu Li, Xinle Cao, Ruoxi Jia, and Kui Ren. 2022. Private Data Valuation and Fair Payment in Data Marketplaces. CoRR abs/2210.08723 (2022). doi:10.48550/ARXIV.2210.08723
[55] tpmmthomas. 2025. Secure Data Valuation. https://github.com/tpmmthomas/secure-data-valuation/tree/asiaccs2026. GitHub repository, accessed 13 December 2025.
[56] Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, and Ramesh Raskar. 2018. Split learning for health: Distributed deep learning without sharing raw patient data. CoRR abs/1812.00564 (2018).
[57] Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, and Dawn Song. 2020. A Principled Approach to Data Valuation for Federated Learning.
[58] Gavin Wood. 2014. Ethereum: A Secure Decentralised Generalised Transaction Ledger.
[59] Lin Yang, Yizhe Zhang, Jianxu Chen, Siyuan Zhang, and Danny Z. Chen. 2017. Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation.
[60] Donggeun Yoo and In So Kweon. 2019. Learning loss for active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 93–102.
[61] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks? In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (NIPS’14). 3320–3328.
[62] Kaichao You, Yong Liu, Jianmin Wang, and Mingsheng Long. 2021. LogME: Practical Assessment of Pre-trained Models for Transfer Learning. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139). 12133–12143.
[63] Boshi Yuan, Shixuan Yang, Yongxiang Zhang, Ning Ding, Dawu Gu, and Shi-Feng Sun. 2024. MD-ML: Super Fast Privacy-Preserving Machine Learning for Malicious Security with a Dishonest Majority. In 33rd USENIX Security Symposium. 2227–2244.
[64] Mengxiao Zhang, Fernando Beltrán, and Jiamou Liu. 2023. A survey of data pricing for data marketplaces. IEEE Transactions on Big Data 9, 4 (2023), 1038–1056.
[65] Yuke Zhang, Dake Chen, Souvik Kundu, Haomei Liu, Ruiheng Peng, and Peter A. Beerel. 2025. C2PI: An Efficient Crypto-Clear Two-Party Neural Network Private Inference. In Proceedings of the 60th Annual ACM/IEEE Design Automation Conference (DAC ’23). 1–6. doi:10.1109/DAC56929.2023.10247682
[66] Yansong Zhang, Xiaojun Chen, Qinghui Zhang, Ye Dong, and Xudong Chen. Helix: Scalable Multi-Party Machine Learning Inference against Malicious Adversaries.
[67] Fedor Zhdanov. 2019. Diverse mini-batch Active Learning.
[68] Shuyuan Zheng, Yang Cao, and Masatoshi Yoshikawa. 2023. Secure Shapley Value for Cross-Silo Federated Learning. Proc. VLDB Endow. 16, 7 (2023), 1657–1670. doi:10.14778/3587136.3587141
[69] Xiaokai Zhou, Xiao Yan, Fangcheng Fu, Ziwen Fu, Tieyun Qian, Yuanyuan Zhu, Qinbo Zhang, Bin Cui, and Jiawei Jiang. 2025. PS-MI: Accurate, Efficient, and Private Data Valuation in Vertical Federated Learning. Proc. VLDB Endow. 18, 10 (2025), 3559–3572. doi:10.14778/3748191.3748215
[70] zkonduit. 2023. ezkl. https://github.com/zkonduit/ezkl. Accessed: 20 February 2025.

Appendix A. Subprotocol definitions (excerpt)

A.1 The CP Protocol. The CP protocol, as mentioned in Section 5.3.2, is used to verify the correctness of representative set selection. Its full definition is shown in Fig. 10. Parties: model owner P_1, data owner P_2. Public parameters: indices I_R = {i …