pith.

arxiv: 1907.02444 · v1 · pith:YQJU3NG6 · submitted 2019-07-04 · 💻 cs.CR · cs.LG

Diffprivlib: The IBM Differential Privacy Library

Pith reviewed 2026-05-25 09:19 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords differential privacy · python library · open source · machine learning · data analytics · privacy mechanisms · data privacy

The pith

The IBM Differential Privacy Library supplies the first unified, open-source Python codebase for differential privacy mechanisms and their applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces diffprivlib as a Python library that gathers differential privacy mechanisms together with applications for machine learning and data analytics tasks. Earlier research had produced these elements separately on an ad-hoc basis with no shared code base. The library emphasizes simple interfaces so that both newcomers and experts can use or extend it. If the library succeeds in its role as a unifying resource, developers can implement privacy guarantees without rebuilding core components each time.
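
One core component that every differentially private pipeline needs is privacy-budget bookkeeping. The sketch below is a minimal, hypothetical `BudgetTracker` (the name and methods are our own, not a diffprivlib class) that applies basic sequential composition: running an ε₁-DP and an ε₂-DP mechanism on the same data yields (ε₁ + ε₂)-DP.

```python
class BudgetTracker:
    """Hypothetical helper (not a diffprivlib class) that enforces a total
    epsilon budget using basic sequential composition: mechanisms that are
    eps_1-, ..., eps_k-DP, run on the same data, are (eps_1 + ... + eps_k)-DP."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def remaining(self):
        """Budget still available for further queries."""
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Record a query costing `epsilon`, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding (e.g. 0.1 + 0.2) does not reject a valid split.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return epsilon
```

For example, `BudgetTracker(1.0)` followed by `spend(0.5)` and `spend(0.3)` leaves roughly 0.2 of the budget, and a further `spend(0.5)` raises `RuntimeError`. Basic composition is the simplest accounting rule; tighter advanced-composition bounds exist but are beyond this sketch.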

Core claim

This work presents the IBM Differential Privacy Library as a general purpose, open source library for investigating, experimenting and developing differential privacy applications in Python, containing mechanisms that serve as building blocks alongside applications to machine learning and other data analytics tasks.

What carries the argument

The library's set of differential privacy mechanisms, which function as reusable building blocks, together with its ready-to-use applications for machine learning and data analytics.
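
As a rough illustration of what a "building block" mechanism is, here is a minimal sketch of the textbook Laplace mechanism (Dwork, McSherry, Nissim and Smith, 2006) using only the Python standard library. The function name `laplace_mechanism` and its parameters are our own for illustration; this is not diffprivlib's API or implementation.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Release `value` with epsilon-DP by adding Laplace(0, sensitivity / epsilon) noise.

    Illustrative sketch only; not diffprivlib's implementation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    scale = sensitivity / epsilon
    if scale == 0:
        # Zero sensitivity: the query does not depend on any individual, so no noise is needed.
        return float(value)
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return value + noise


# Usage: a counting query has sensitivity 1.
rng = random.Random(0)
noisy_count = laplace_mechanism(42, sensitivity=1, epsilon=0.5, rng=rng)
```

Smaller ε means a larger noise scale and stronger privacy. Naive floating-point sampling like this is known to be vulnerable to precision-based attacks, so a production library needs more care than this sketch.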

If this is right

  • Developers gain standardized implementations of privacy mechanisms instead of writing them separately for each project.
  • Machine learning pipelines can incorporate differential privacy through the library's included applications.
  • Privacy experts can add new models or mechanisms that become available to all users of the library.
  • Newcomers to the field obtain an accessible starting point for experiments without needing to locate scattered prior implementations.
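
The idea that experts can contribute mechanisms for everyone else to use usually rests on a shared interface. The following is a hypothetical sketch of such an interface: an abstract `Mechanism` base class plus one contributed subclass implementing the two-sided geometric mechanism, a discrete analogue of the Laplace mechanism described by Ghosh, Roughgarden and Sundararajan. The class names and the `randomise` signature are assumptions for illustration; diffprivlib's actual class hierarchy may differ.

```python
import math
import random
from abc import ABC, abstractmethod


class Mechanism(ABC):
    """Hypothetical shared interface: every mechanism stores its privacy
    parameter and implements randomise()."""

    def __init__(self, epsilon, rng=None):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random()

    @abstractmethod
    def randomise(self, value):
        """Return a differentially private version of `value`."""


class GeometricMechanism(Mechanism):
    """Example contributed mechanism: two-sided geometric noise for integers,
    a discrete analogue of the Laplace mechanism."""

    def __init__(self, epsilon, sensitivity=1, rng=None):
        super().__init__(epsilon, rng)
        if not isinstance(sensitivity, int) or sensitivity < 1:
            raise ValueError("sensitivity must be a positive integer")
        self.alpha = math.exp(-epsilon / sensitivity)

    def _one_sided(self):
        # Inverse-CDF sample with P(G >= k) = alpha**k for k = 0, 1, 2, ...
        u = 1.0 - self.rng.random()  # u lies in (0, 1], so log(u) is defined
        return math.floor(math.log(u) / math.log(self.alpha))

    def randomise(self, value):
        if not isinstance(value, int):
            raise TypeError("value must be an integer")
        # The difference of two i.i.d. geometric draws has P(Z = z) proportional
        # to alpha ** abs(z), which gives epsilon-DP for this sensitivity.
        return value + self._one_sided() - self._one_sided()
```

A user calls, say, `GeometricMechanism(epsilon=0.5).randomise(42)`; a contributor only writes a new subclass, and any code written against `Mechanism` works with it unchanged.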

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use could reduce duplication of effort when adding privacy protections to existing Python data tools.
  • The library structure invites community extensions that might later cover additional data types or analysis methods.
  • Standardization around one codebase may make it easier to compare the privacy-utility trade-offs of different mechanisms.
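
As a hedged example of such a comparison, the sketch computes the noise variance that the Laplace mechanism (pure ε-DP) and the classical Gaussian mechanism ((ε, δ)-DP, with the textbook calibration valid for 0 < ε < 1) add to a single counting query. The function names are hypothetical, and the Gaussian calibration is the classical one from Dwork and Roth rather than any particular library's.

```python
import math


def laplace_variance(epsilon, l1_sensitivity=1.0):
    """Variance of Laplace(0, b) noise with b = l1_sensitivity / epsilon (pure epsilon-DP)."""
    b = l1_sensitivity / epsilon
    return 2 * b * b


def gaussian_variance(epsilon, delta, l2_sensitivity=1.0):
    """Variance of the classical Gaussian mechanism, which is (epsilon, delta)-DP for
    0 < epsilon < 1 with sigma = l2_sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon."""
    if not 0 < epsilon < 1:
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    sigma = l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return sigma * sigma


# A counting query has L1 and L2 sensitivity 1.
for eps in (0.1, 0.5, 0.9):
    print(f"eps={eps}: Laplace var={laplace_variance(eps):.1f}, "
          f"Gaussian var (delta=1e-5)={gaussian_variance(eps, 1e-5):.1f}")
```

For one count the ratio of the two variances is ln(1.25/δ), about 11.7 at δ = 1e-5, so the Laplace mechanism is tighter here. The Gaussian mechanism tends to pay off for high-dimensional queries whose L2 sensitivity is much smaller than their L1 sensitivity, and analytic calibration (Balle and Wang, 2018) tightens σ further.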

Load-bearing premise

All prior differential privacy research occurred on an ad-hoc basis without any single unifying codebase.

What would settle it

The discovery of any earlier open-source Python library that already supplied a comparable collection of mechanisms and machine-learning applications under one roof.

Figures

Figures reproduced from arXiv: 1907.02444 by Killian Levacher, Naoise Holohan, Pól Mac Aonghusa, Stefano Braghin.

Figure 1: Comparison of accuracy versus ϵ for a differentially private naïve Bayes classifier on the Iris dataset. For each ϵ, the average accuracy over 30 simulations is shown.
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>> dataset = datasets.load_iris()
>>> X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)
We can now use X_train and y_tr…
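
The caption's protocol (average a score over 30 simulations for each ϵ) can be sketched generically. Reproducing the naïve Bayes experiment itself would need diffprivlib and scikit-learn, so this stand-in swaps in a simpler task: an ε-DP mean of bounded synthetic data, scored by mean absolute error instead of accuracy. All function names and the data here are hypothetical and are not taken from the paper.

```python
import random
import statistics


def dp_mean(data, lower, upper, epsilon, rng):
    """epsilon-DP mean of values clipped to [lower, upper], treating len(data) as public.

    Changing one record moves the clipped mean by at most (upper - lower) / n,
    so Laplace noise with scale (upper - lower) / (n * epsilon) suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = len(data)
    clipped = [min(max(x, lower), upper) for x in data]
    scale = (upper - lower) / (n * epsilon)
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / n + noise


def sweep(data, lower, upper, epsilons, n_sims=30, seed=0):
    """Average absolute error of dp_mean over n_sims runs for each epsilon."""
    rng = random.Random(seed)
    true_mean = sum(min(max(x, lower), upper) for x in data) / len(data)
    return {
        eps: statistics.mean(
            abs(dp_mean(data, lower, upper, eps, rng) - true_mean) for _ in range(n_sims)
        )
        for eps in epsilons
    }


# Synthetic data only; not the Iris dataset.
data_rng = random.Random(42)
data = [data_rng.uniform(4.0, 8.0) for _ in range(150)]
for eps, err in sweep(data, 4.0, 8.0, [0.01, 0.1, 1.0]).items():
    print(f"epsilon={eps}: mean absolute error {err:.4f}")
```

As in Figure 1, utility improves as ϵ grows; averaging over repeated runs smooths out the randomness of any single noisy release.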
Abstract

Since its conception in 2006, differential privacy has emerged as the de-facto standard in data privacy, owing to its robust mathematical guarantees, generalised applicability and rich body of literature. Over the years, researchers have studied differential privacy and its applicability to an ever-widening field of topics. Mechanisms have been created to optimise the process of achieving differential privacy, for various data types and scenarios. Until this work, however, all previous work on differential privacy has been conducted on a ad-hoc basis, without a single, unifying codebase to implement results. In this work, we present the IBM Differential Privacy Library, a general purpose, open source library for investigating, experimenting and developing differential privacy applications in the Python programming language. The library includes a host of mechanisms, the building blocks of differential privacy, alongside a number of applications to machine learning and other data analytics tasks. Simplicity and accessibility have been prioritised in developing the library, making it suitable for a wide audience of users, from those using the library for their first investigations in data privacy, to the privacy experts looking to contribute their own models and mechanisms for others to use.
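
The abstract notes that mechanisms exist for different data types. For binary data, a classical choice is randomised response; the minimal sketch below (hypothetical function names, standard library only, not diffprivlib code) perturbs each bit locally and then debiases the aggregate.

```python
import math
import random


def _keep_probability(epsilon):
    """Probability of reporting the true bit: p = e^eps / (1 + e^eps)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1 + math.exp(epsilon))


def randomised_response(bit, epsilon, rng):
    """Report the true bit with probability p, otherwise flip it.

    Since p / (1 - p) = e^eps, each single report is epsilon-DP.
    """
    p = _keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_proportion(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s behind the reports.

    E[report] = (1 - p) + pi * (2p - 1), so pi = (mean - (1 - p)) / (2p - 1).
    """
    p = _keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)
```

With ε = 1, for instance, each respondent reports truthfully with probability e/(1 + e) ≈ 0.731, and `estimate_proportion` rescales the observed rate of 1s to recover the population fraction.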

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents diffprivlib, an open-source Python library for differential privacy. It includes implementations of DP mechanisms as building blocks, plus applications to machine learning and data analytics tasks. The work emphasizes simplicity, accessibility for a broad audience, and positions the library as the first general-purpose unifying codebase, contrasting it with prior ad-hoc implementations.

Significance. A well-implemented, documented, and maintained open-source DP library could reduce duplication of effort and lower barriers for researchers and practitioners experimenting with differential privacy. The focus on both mechanisms and end-to-end applications is a constructive contribution if the code is correct, tested, and extensible.

major comments (1)
  1. [Abstract] Abstract: The claim that 'all previous work on differential privacy has been conducted on a ad-hoc basis, without a single, unifying codebase to implement results' is presented without citations, a comparison table, or discussion of any prior libraries or frameworks. This historical assertion is load-bearing for the paper's novelty positioning and requires either supporting references or a softened statement acknowledging the state of the field at the time of writing.
minor comments (1)
  1. [Abstract] Abstract: 'a ad-hoc' should be corrected to 'an ad-hoc'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestion regarding the abstract. We address the comment below.

  1. Referee: [Abstract] Abstract: The claim that 'all previous work on differential privacy has been conducted on a ad-hoc basis, without a single, unifying codebase to implement results' is presented without citations, a comparison table, or discussion of any prior libraries or frameworks. This historical assertion is load-bearing for the paper's novelty positioning and requires either supporting references or a softened statement acknowledging the state of the field at the time of writing.

    Authors: We agree that the phrasing in the abstract is too absolute and would benefit from qualification. We will revise the abstract (and add a short paragraph in the introduction) to acknowledge that while ad-hoc implementations and some domain-specific libraries existed prior to our work, no single general-purpose, unifying open-source Python library covering both mechanisms and end-to-end applications had been presented. The revised text will avoid the unqualified claim of 'all previous work' and instead emphasize the library's scope and design goals. revision: yes

Circularity Check

0 steps flagged

No circularity: library presentation paper with no derivations or predictions


The paper describes and releases an open-source Python library for differential privacy mechanisms and applications. It contains no mathematical derivations, equations, fitted parameters, or 'predictions' of any kind. The abstract's historical claim that prior DP work lacked a unifying codebase is an unsubstantiated assertion (not a self-citation or self-definition), but the audit flags circularity only when a derivation chain reduces by construction to its inputs via quoted equations or self-referential fits. No such chain exists here; the contribution is the software artifact itself. This matches the default expectation of no significant circularity (score 0-2) for self-contained non-derivational work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the established definition and properties of differential privacy but introduces no new free parameters, axioms beyond the domain standard, or invented entities.

axioms (1)
  • domain assumption Differential privacy provides robust mathematical guarantees, generalised applicability and a rich body of literature.
    Invoked in the opening sentence of the abstract as the de-facto standard.
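
For reference, the definition behind this axiom is the standard (ε, δ)-differential privacy of Dwork et al.: a randomised mechanism M satisfies it if, for every pair of neighbouring datasets D and D′ (differing in one individual's record) and every set of outputs S, the inequality below holds. Setting δ = 0 gives pure ε-differential privacy. The notation is ours rather than the paper's.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```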

pith-pipeline@v0.9.0 · 5738 in / 1103 out tokens · 26426 ms · 2026-05-25T09:19:38.751574+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rashomon Sets and Model Multiplicity in Federated Learning

    cs.LG 2026-02 unverdicted novelty 7.0

    The work provides the first formal definitions of Rashomon sets for federated learning and introduces a multiplicity-aware training pipeline evaluated on standard benchmarks.

  2. Differentially Private Modeling of Disease Transmission within Human Contact Networks

    cs.CR 2026-04 unverdicted novelty 6.0

    A differentially private pipeline using node-level DP summaries to fit ERGMs or SBMs, generate synthetic networks, and simulate SIS disease spread on ARTNet sexual contact data produces incidence, prevalence, and inte...

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 2 Pith papers · 2 internal anchors
