Aim High, Stay Private: Differentially Private Synthetic Data Enables Public Release of Behavioral Health Information with High Utility

Christopher M. Danforth; Joseph P. Near; Juniper Lovato; Laura S. P. Bloomfield; Matthew Price; Mohsen Ghasemizade; Peter Sheridan Dodds; Team LEMURS

arxiv: 2507.02971 · v1 · submitted 2025-06-30 · 💻 cs.CR · cs.CY

Pith reviewed 2026-05-19 07:28 UTC · model grok-4.3

keywords: differential privacy · synthetic data · behavioral health · wearable devices · privacy utility tradeoff · data release · AIM mechanism · re-identification risk

The pith

Synthetic data generated with differential privacy at epsilon=5 retains adequate predictive utility for a real behavioral health study while reducing re-identification risks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to release sensitive data from a wearable-device and survey study of first-year college students without exposing individuals to privacy attacks. It generates synthetic versions of the LEMURS dataset using the Adaptive Iterative Mechanism under differential privacy. Tests across privacy budgets from epsilon=1 to 100 reveal that epsilon=5 still supports accurate predictions on tasks drawn from actual uses of the original data. This offers a concrete route for public data sharing that conventional de-identification cannot match.

Core claim

The authors generate differentially private synthetic data for the LEMURS behavioral health dataset using the Adaptive Iterative Mechanism and demonstrate that datasets produced at epsilon=5 preserve adequate predictive utility for downstream tasks while significantly mitigating privacy risks, as measured by a utility framework informed by real uses of the original records.

What carries the argument

The Adaptive Iterative Mechanism (AIM), which builds synthetic data by iteratively refining noisy statistics to meet a chosen differential privacy budget across many attributes and records.
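
To make "noisy statistics" concrete, here is a minimal toy sketch of a single noisy measurement: one histogram (a one-way marginal) perturbed with Gaussian noise, then used to sample synthetic values. It is not AIM, which adaptively selects many marginals and fits a joint model to them. The attribute, its bins, and the value of `sigma` are hypothetical, and the sketch assumes neighboring datasets differ by adding or removing one record, so the histogram has L2 sensitivity 1.

```python
import random
from collections import Counter


def noisy_marginal(values, domain, sigma, rng):
    """Histogram of `values` over `domain` with Gaussian noise added to each count."""
    counts = Counter(values)
    return {v: counts.get(v, 0) + rng.gauss(0.0, sigma) for v in domain}


def sample_synthetic(noisy_counts, rng):
    """Sample synthetic values from the clipped noisy histogram (post-processing only)."""
    domain = list(noisy_counts)
    weights = [max(noisy_counts[v], 0.0) for v in domain]
    n = max(1, round(sum(weights)))      # size taken from the noisy total, not the true one
    if sum(weights) == 0:                # every count clipped to zero: fall back to uniform
        weights = [1.0] * len(domain)
    return rng.choices(domain, weights=weights, k=n)


rng = random.Random(0)
# Hypothetical binned attribute (for example a sleep-score bucket); not LEMURS data.
real = ["low"] * 30 + ["mid"] * 50 + ["high"] * 20
domain = ["low", "mid", "high"]

sigma = 2.0
# Privacy cost of one Gaussian measurement with L2 sensitivity 1, in zero-concentrated DP.
rho = 1.0 / (2.0 * sigma ** 2)
noisy = noisy_marginal(real, domain, sigma, rng)
synthetic = sample_synthetic(noisy, rng)
print(rho, Counter(synthetic))
```

Only the noisy counts touch the real data. Clipping, normalizing, and sampling are post-processing and consume no additional privacy budget.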

If this is right

Public release of the synthetic LEMURS data becomes feasible at epsilon=5 without exposing participants to standard re-identification attacks.
Researchers can run predictive models on the released data and obtain results close to those from the protected original records.
The same generation and evaluation steps can be repeated on other multi-attribute health datasets to decide acceptable epsilon values.
Data stewards gain a reproducible method to document privacy-utility trade-offs before sharing behavioral health information.
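
The re-identification risk in the first point can be probed empirically. The paper's Figure 2 reports ROC curves for a closest-distance membership attack. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation: the attacker scores each candidate record by its distance to the nearest synthetic record and guesses "member" when that distance is small. All records are hypothetical two-dimensional draws.

```python
import math
import random


def nearest_distance(record, synthetic):
    """Euclidean distance from `record` to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)


def auc(member_scores, nonmember_scores):
    """Chance that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


rng = random.Random(1)


def draw(k):
    """Hypothetical records: k points from a 2-D standard normal."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)]


members = draw(50)       # records used to build the release
nonmembers = draw(50)    # same population, never used

# Worst case: the "synthetic" release is a verbatim copy of the member records.
leaky_auc = auc([-nearest_distance(r, members) for r in members],
                [-nearest_distance(r, members) for r in nonmembers])

# Benign case: the release consists of fresh draws that depend on no single member.
fresh = draw(50)
fresh_auc = auc([-nearest_distance(r, fresh) for r in members],
                [-nearest_distance(r, fresh) for r in nonmembers])
print(leaky_auc, fresh_auc)
```

A verbatim copy gives the attacker an AUC of 1.0, while a release that does not depend on individual members leaves the attacker near chance (0.5). A real audit would sit between these extremes and would vary with epsilon.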

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be applied to other wearable-health collections to test whether epsilon=5 remains sufficient when the number of attributes or participants changes.
If institutions adopt this workflow, review boards might require explicit epsilon reporting for any public behavioral data release.
Extending the utility tests to tasks outside the original framework, such as longitudinal trend analysis, would clarify the limits of the current findings.

Load-bearing premise

The chosen utility evaluation framework, built from existing uses of the LEMURS dataset, accurately reflects the downstream tasks that future users of the released data will perform.

What would settle it

A new prediction task, such as forecasting a specific mental-health outcome not included in the paper's evaluation, that shows substantially lower accuracy on the epsilon=5 synthetic data than on the original data would falsify the utility claim.
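
Such a test can be framed as a train-on-synthetic, test-on-real comparison. The following sketch shows one way to compute the resulting utility gap with a one-feature least-squares model. The data generator, slopes, and noise levels are hypothetical stand-ins, and the paper's own tasks and models may differ.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(2)


def make_data(n, slope, noise):
    """Hypothetical table: one predictor and one outcome with a linear relationship."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + slope * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


# "Real" train/test splits, plus a "synthetic" table whose relationship is
# slightly distorted, as a private generator's output might be.
real_train = make_data(200, slope=2.0, noise=1.0)
real_test = make_data(200, slope=2.0, noise=1.0)
synthetic = make_data(200, slope=1.8, noise=1.5)

mse_real = mse(fit_line(*real_train), *real_test)    # train on real, test on real
mse_synth = mse(fit_line(*synthetic), *real_test)    # train on synthetic, test on real
utility_gap = mse_synth - mse_real
print(mse_real, mse_synth, utility_gap)
```

A gap near zero on a task outside the paper's framework would support the utility claim; a large gap at epsilon=5 would count against it. What counts as "large" is a judgment the analyst has to fix in advance.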

Figures

Figures reproduced from arXiv: 2507.02971 by Christopher M. Danforth, Joseph P. Near, Juniper Lovato, Laura S. P. Bloomfield, Matthew Price, Mohsen Ghasemizade, Peter Sheridan Dodds, Team LEMURS.

**Figure 2.** ROC curves for the Closest-Distance membership

**Figure 3.** Spearman correlation heatmaps for the original and synthetic Oura datasets across various

**Figure 4.** Spearman correlation heatmaps for the original and synthetic survey datasets at various

**Figure 5.** UMAP projections of the original and synthetic

**Figure 7.** L1 and L2 marginal errors across different
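
The L1 and L2 marginal errors named in the Figure 7 caption can be computed along the following lines. This is a minimal sketch for a single categorical column with made-up values; the paper's exact normalization and choice of marginals are not stated here, so treat the definitions below as assumptions.

```python
import math
from collections import Counter


def marginal(values):
    """Empirical distribution of a categorical column as {value: probability}."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def marginal_errors(original, synthetic):
    """L1 and L2 distances between the two columns' empirical marginals."""
    p, q = marginal(original), marginal(synthetic)
    diffs = [p.get(v, 0.0) - q.get(v, 0.0) for v in set(p) | set(q)]
    l1 = sum(abs(d) for d in diffs)
    l2 = math.sqrt(sum(d * d for d in diffs))
    return l1, l2


# Hypothetical columns: category shares 0.5/0.3/0.2 versus 0.4/0.4/0.2.
original = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
synthetic = ["a"] * 4 + ["b"] * 4 + ["c"] * 2
l1, l2 = marginal_errors(original, synthetic)
print(l1, l2)
```

In this toy case the shares differ by 0.1 in two categories, so the L1 error is about 0.2 and the L2 error is about 0.141.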

Original abstract

Sharing health and behavioral data raises significant privacy concerns, as conventional de-identification methods are susceptible to privacy attacks. Differential Privacy (DP) provides formal guarantees against re-identification risks, but practical implementation necessitates balancing privacy protection and the utility of data. We demonstrate the use of DP to protect individuals in a real behavioral health study, while making the data publicly available and retaining high utility for downstream users of the data. We use the Adaptive Iterative Mechanism (AIM) to generate DP synthetic data for Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS). The LEMURS dataset comprises physiological measurements from wearable devices (Oura rings) and self-reported survey data from first-year college students. We evaluate the synthetic datasets across a range of privacy budgets, epsilon = 1 to 100, focusing on the trade-off between privacy and utility. We evaluate the utility of the synthetic data using a framework informed by actual uses of the LEMURS dataset. Our evaluation identifies the trade-off between privacy and utility across synthetic datasets generated with different privacy budgets. We find that synthetic data sets with epsilon = 5 preserve adequate predictive utility while significantly mitigating privacy risks. Our methodology establishes a reproducible framework for evaluating the practical impacts of epsilon on generating private synthetic datasets with numerous attributes and records, contributing to informed decision-making in data sharing practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

They apply the AIM mechanism to the LEMURS wearable-plus-survey dataset and report that epsilon=5 keeps predictive utility while cutting re-identification risk, but the utility checks may miss harder downstream uses.

The core point is straightforward: the authors take an off-the-shelf differentially private synthetic data generator, AIM, and run it on the LEMURS collection of Oura ring readings and first-year student surveys. They sweep epsilon from 1 to 100 and conclude that epsilon=5 still supports the prediction tasks they tested while giving a formal privacy guarantee. That is the practical takeaway a colleague would want to know first.

Referee Report

2 major / 1 minor

Summary. The manuscript applies the Adaptive Iterative Mechanism (AIM) to generate differentially private synthetic data from the LEMURS behavioral health dataset (Oura ring physiological measurements and self-reported surveys from first-year college students). It evaluates synthetic datasets across epsilon values from 1 to 100 and concludes that epsilon=5 preserves adequate predictive utility for downstream tasks while substantially reducing privacy risks, proposing a reproducible evaluation framework grounded in actual uses of the LEMURS data.

Significance. If the utility results hold under broader validation, the work provides a practical demonstration of releasing sensitive multi-attribute behavioral health data publicly under formal differential privacy guarantees. It supplies concrete guidance on privacy-utility trade-offs for high-dimensional datasets and a template for epsilon selection that could support data-sharing practices in health research.

major comments (2)

[Utility evaluation framework] The central claim that epsilon=5 preserves adequate predictive utility rests on an evaluation framework informed by actual uses of the LEMURS dataset. However, the manuscript does not demonstrate that the chosen predictive tasks and metrics capture critical statistical properties for behavioral-health research, such as joint distributions, temporal correlations between wearables and surveys, or heterogeneity across student subgroups. If these properties degrade faster under AIM at epsilon=5, the reported trade-off does not generalize to the full range of downstream analyses.
[Methods and experimental setup] The abstract reports an epsilon sweep and utility evaluation, yet the methods description lacks error bars, baseline comparisons (e.g., non-private synthetic data or alternative DP mechanisms), and statistical tests for the utility results. Without these, it is not possible to confirm that the epsilon=5 conclusion is robust rather than influenced by post-hoc choices or task selection.

minor comments (1)

[Abstract] The abstract could more explicitly state the specific predictive metrics and thresholds used to define 'adequate' utility and 'significant' privacy mitigation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. The comments highlight important opportunities to strengthen the evaluation framework and experimental rigor. We address each major comment below and indicate the revisions we will make in the next version of the manuscript.

Referee: [Utility evaluation framework] The central claim that epsilon=5 preserves adequate predictive utility rests on an evaluation framework informed by actual uses of the LEMURS dataset. However, the manuscript does not demonstrate that the chosen predictive tasks and metrics capture critical statistical properties for behavioral-health research, such as joint distributions, temporal correlations between wearables and surveys, or heterogeneity across student subgroups. If these properties degrade faster under AIM at epsilon=5, the reported trade-off does not generalize to the full range of downstream analyses.

Authors: We appreciate the referee's emphasis on broader statistical fidelity. Our predictive tasks were selected to reflect documented downstream uses of the LEMURS data in prior behavioral-health analyses. We agree that explicit checks on joint distributions, cross-modal correlations, and subgroup heterogeneity would better support generalizability claims. In the revised manuscript we will add marginal and pairwise correlation fidelity metrics between wearable and survey attributes, plus a basic subgroup stability check (e.g., by gender and academic major where sample sizes permit). We note that the LEMURS Phase 1 data are primarily cross-sectional summaries rather than fine-grained longitudinal traces, which limits the depth of temporal correlation analysis we can perform without additional data processing; however, we will report the available pairwise temporal alignments where they exist.
Revision: partial
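
A pairwise correlation fidelity metric of the kind promised in this response could look like the sketch below. It computes Spearman correlations by ranking each column and reports the largest gap between the original and synthetic tables. The column names and values are hypothetical and do not reflect the LEMURS schema.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def max_corr_gap(original, synthetic):
    """Largest absolute difference in pairwise Spearman correlation over column pairs."""
    cols = sorted(original)
    gaps = [abs(spearman(original[a], original[b]) - spearman(synthetic[a], synthetic[b]))
            for i, a in enumerate(cols) for b in cols[i + 1:]]
    return max(gaps)


# Hypothetical columns; the synthetic table swaps two stress values.
original = {"sleep_hours": [6, 7, 8, 5, 9], "stress": [8, 6, 4, 9, 2]}
synthetic = {"sleep_hours": [6, 7, 8, 5, 9], "stress": [8, 4, 6, 9, 2]}
gap = max_corr_gap(original, synthetic)
print(spearman(original["sleep_hours"], original["stress"]), gap)
```

Here the original columns are perfectly anti-correlated in rank (Spearman of -1), the synthetic pair scores -0.9, and the reported gap is 0.1.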
Referee: [Methods and experimental setup] The abstract reports an epsilon sweep and utility evaluation, yet the methods description lacks error bars, baseline comparisons (e.g., non-private synthetic data or alternative DP mechanisms), and statistical tests for the utility results. Without these, it is not possible to confirm that the epsilon=5 conclusion is robust rather than influenced by post-hoc choices or task selection.

Authors: We agree that these elements are needed for robustness. The revised manuscript will include (i) error bars computed over multiple independent runs of AIM for each epsilon value, (ii) a non-private synthetic baseline generated with the same AIM procedure but epsilon set to infinity, and (iii) paired statistical tests (e.g., Wilcoxon signed-rank) comparing utility metrics across epsilon values. We will also briefly discuss why alternative mechanisms such as PATE or DP-GAN were not included, given the mixed tabular structure of the LEMURS dataset and the computational constraints of the study.
Revision: yes
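
Item (i) amounts to summarizing a utility metric over repeated generator runs at each epsilon. A minimal sketch follows; the scores are invented purely to show the bookkeeping and are not results from the paper.

```python
import statistics


def summarize_runs(scores_by_epsilon):
    """Mean and sample standard deviation of a utility metric for each epsilon."""
    return {eps: (statistics.mean(s), statistics.stdev(s))
            for eps, s in scores_by_epsilon.items()}


# Hypothetical utility scores from three independent generator runs per epsilon.
runs = {1: [0.40, 0.50, 0.60], 5: [0.70, 0.72, 0.74]}
summary = summarize_runs(runs)
for eps, (mean, sd) in sorted(summary.items()):
    print(f"epsilon={eps}: {mean:.2f} +/- {sd:.2f}")
```

Because every run draws fresh noise, the spread across runs is part of the result: a conclusion about epsilon=5 is more convincing when its interval is clearly separated from those of neighboring budgets.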

Circularity Check

0 steps flagged

No circularity: empirical application of external DP mechanism with held-out utility evaluation

full rationale

The paper applies the established Adaptive Iterative Mechanism (AIM) for differential privacy to generate synthetic data from the LEMURS dataset and measures utility empirically across epsilon values on predictive tasks drawn from actual dataset uses. Privacy guarantees derive from the standard DP definition rather than any internal construction, and no prediction or result reduces to a fitted parameter or self-citation by definition. The central claims rest on measured trade-offs rather than self-referential steps, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard differential-privacy definition and on the assumption that the chosen utility metrics reflect real downstream use cases. No new entities are postulated and no parameters are fitted to produce the privacy guarantee itself.

axioms (2)

standard math: The Adaptive Iterative Mechanism satisfies differential privacy at the chosen privacy budget. (The AIM paper states its guarantee in zero-concentrated differential privacy, which converts to an (epsilon, delta) guarantee rather than pure epsilon-DP.)
Invoked when the authors state that AIM generates DP synthetic data.
domain assumption: The utility framework based on actual LEMURS uses is representative of future analysts' needs.
Stated in the abstract when describing how utility is evaluated.
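
For reference, the guarantee behind the first axiom can be written out. This is the standard textbook definition of differential privacy, not a statement taken from the paper. A randomized mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private if, for every pair of datasets $D, D'$ differing in one individual's record and every set of outputs $S$:

```latex
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]
% Pure epsilon-DP is the special case delta = 0.
% Worst-case multiplicative factor at the ends and the chosen point of the sweep:
\[
  e^{1} \approx 2.72, \qquad e^{5} \approx 148.4, \qquad e^{100} \approx 2.7 \times 10^{43}.
\]
```

Read this way, epsilon=5 permits the probability of any output to change by a factor of up to roughly 148 when one participant's record is added or removed. That is a worst-case bound, which is one reason an empirical attack evaluation is a useful complement to the formal parameter.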

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We use the Adaptive Iterative Mechanism (AIM) to generate DP synthetic data... evaluate utility using regression models, Spearman correlation, UMAP, and L1/L2 marginal errors across epsilon = 1 to 100."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "synthetic data sets with epsilon = 5 preserve adequate predictive utility"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

A large clinical trial to improve well-being during the transition to college using wearables: The lived experiences measured using rings study,

M. Price, J. E. Hidalgo, Y . M. Bird, L. S. Bloomfield, C. Buck, J. Cerutti, P. S. Dodds, M. I. Fudolig, R. Gehman, M. Hickok et al., “A large clinical trial to improve well-being during the transition to college using wearables: The lived experiences measured using rings study,” Contemporary clinical trials , vol. 133, p. 107338, 2023

work page 2023

[2] [2]

AboutMyInfo.org,

L. Sweeney, “AboutMyInfo.org,” 2024, accessed: 2024-08-26. [Online]. Available: https://aboutmyinfo.org/

work page 2024

[3] [3]

Broken promises of privacy: Responding to the surprising failure of anonymization,

P. Ohm, “Broken promises of privacy: Responding to the surprising failure of anonymization,” UCLA l. Rev., vol. 57, p. 1701, 2009

work page 2009

[4] [4]

Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule,

U.S. Department of Health and Human Services, “Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule,” 2024, accessed: 2024-08-26. [Online]. Available: https://www.hhs.gov/hipaa/for- professionals/special-topics/de-identification/index.html

work page 2024

[5] [5]

Calibrating noise to sensitivity in private data analysis,

C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3. Springer, 2006, pp. 265–284

work page 2006

[6] [6]

The algorithmic foundations of differential privacy,

C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,”Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014

work page 2014

[7] [7]

Aim: An adaptive and iterative mechanism for differentially private synthetic data,

R. McKenna, B. Mullins, D. Sheldon, and G. Miklau, “Aim: An adaptive and iterative mechanism for differentially private synthetic data,” arXiv preprint arXiv:2201.12677 , 2022

work page arXiv 2022

[8] [8]

The application of differential privacy to health data,

F. K. Dankar and K. El Emam, “The application of differential privacy to health data,” in Proceedings of the 2012 Joint EDBT/ICDT Workshops, 2012, pp. 158–166

work page 2012

[9] [9]

The promise of differential privacy: a tutorial on al- gorithmic techniques,

C. Dwork, “The promise of differential privacy: a tutorial on al- gorithmic techniques,” in 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, D (Oct. 2011) . Citeseer, 2021, pp. 1–2

work page 2011

[10] [10]

Differential privacy for clinical trial data: Preliminary evaluations,

D. Vu and A. Slavkovic, “Differential privacy for clinical trial data: Preliminary evaluations,” in 2009 IEEE International Conference on Data Mining Workshops. IEEE, 2009, pp. 138–143

work page 2009

[11] [11]

Functional Mechanism: Regression Analysis under Differential Privacy

J. Zhang, Z. Zhang, X. Xiao, Y . Yang, and M. Winslett, “Functional mechanism: Regression analysis under differential privacy,” arXiv preprint arXiv:1208.0219, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012

[12] [12]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , 2016, pp. 308–318

work page 2016

[13] M. I. Fudolig, L. S. Bloomfield, M. Price, Y. M. Bird, J. E. Hidalgo, J. Llorin, J. Lovato, E. W. McGinnis, R. S. McGinnis, T. Ricketts et al., “Collective sleep and activity patterns of college students from wearable devices,” arXiv preprint arXiv:2412.17969, 2024

[14] A. E. Mason, F. M. Hecht, S. K. Davis, J. L. Natale, W. Hartogensis, N. Damaso, K. T. Claypool, S. Dilchert, S. Dasgupta, S. Purawat et al., “Detection of COVID-19 using multimodal data from a wearable device: results from the first TemPredict Study,” Scientific Reports, vol. 12, no. 1, p. 3463, 2022

[15] S. K. Shiba, C. A. Temple, J. Krasnoff, S. Dilchert, B. L. Smarr, J. Robishaw, and A. E. Mason, “Assessing adherence to multi-modal Oura ring wearables from COVID-19 detection among healthcare workers,” Cureus, vol. 15, no. 9, 2023

[16] Wikipedia contributors, “Netflix Prize,” 2024, accessed: 2024-08-26. [Online]. Available: https://en.wikipedia.org/wiki/Netflix%5FPrize

[17] A. Tockar, “Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset,” 2014, accessed: 2024-08-26. [Online]. Available: https://agkn.wordpress.com/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/

[18] A. N. Dajani, A. D. Lauger, P. E. Singer, D. Kifer, J. P. Reiter, A. Machanavajjhala, S. L. Garfinkel, S. A. Dahl, M. Graham, V. Karwa et al., “The modernization of statistical disclosure limitation at the US Census Bureau,” in September 2017 meeting of the Census Scientific Advisory Committee, 2017

[19] L. Rosenblatt, B. Herman, A. Holovenko, W. Lee, J. Loftus, E. McKinnie, T. Rumezhak, A. Stadnik, B. Howe, and J. Stoyanovich, “Epistemic parity: Reproducibility as an evaluation metric for differential privacy,” ACM SIGMOD Record, vol. 53, no. 1, pp. 65–74, 2024

[20] Y. Tao, R. McKenna, M. Hay, A. Machanavajjhala, and G. Miklau, “Benchmarking differentially private synthetic data generation algorithms,” arXiv preprint arXiv:2112.09238, 2021

[21] R. McKenna, G. Miklau, and D. Sheldon, “Winning the NIST contest: A scalable and general approach to differentially private synthetic data,” arXiv preprint arXiv:2108.04978, 2021

[22] K. Cai, X. Lei, J. Wei, and X. Xiao, “Data synthesis via differentially private Markov random fields,” Proceedings of the VLDB Endowment, vol. 14, no. 11, pp. 2190–2202, 2021

[23] L. Rosenblatt, X. Liu, S. Pouyanfar, E. de Leon, A. Desai, and J. Allen, “Differentially private synthetic data: Applied evaluations and enhancements,” arXiv preprint arXiv:2011.05537, 2020

[24] J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao, “Privbayes: Private data release via Bayesian networks,” ACM Transactions on Database Systems (TODS), vol. 42, no. 4, pp. 1–41, 2017

[25] T. Liu, G. Vietri, and S. Z. Wu, “Iterative methods for private synthetic data: Unifying framework and new methods,” Advances in Neural Information Processing Systems, vol. 34, pp. 690–702, 2021

[26] R. McKenna, G. Miklau, M. Hay, and A. Machanavajjhala, “HDMM: Optimizing error of high-dimensional statistical queries under differential privacy,” arXiv preprint arXiv:2106.12118, 2021

[27] National Institute of Standards and Technology, “Privacy Collaborative Research Cycle – Archive,” Available online, 2024, https://pages.nist.gov/privacy_collaborative_research_cycle/pages/archive.html, Accessed: 2025-04-28

[28] I. T. Jolliffe, Principal component analysis for special types of data. Springer, 2002

[29] L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform manifold approximation and projection for dimension reduction,” arXiv preprint arXiv:1802.03426, 2018

[30] V. C. S. Center, “LEMURS: Lived Experiences Measured Using Rings Study,” https://vermontcomplexsystems.org/research/projects/lemurs/, 2024, accessed: 2025-04-28

[31] L. S. Bloomfield, M. I. Fudolig, J. Kim, J. Llorin, J. L. Lovato, E. W. McGinnis, R. S. McGinnis, M. Price, T. H. Ricketts, P. S. Dodds et al., “Predicting stress in first-year college students using sleep data from wearable devices,” PLOS Digital Health, vol. 3, no. 4, p. e0000473, 2024

[32] L. Bloomfield, M. I. Fudolig, P. S. Dodds, J. Kim, J. Llorin, J. L. Lovato, E. McGinnis, R. S. McGinnis, M. Price, T. Ricketts et al., “Events and behaviors associated with symptoms of generalized anxiety disorder in first-year college students,” 2023

[33] M. I. Fudolig, L. S. Bloomfield, M. Price, Y. M. Bird, J. E. Hidalgo, J. N. Kim, J. Llorin, J. Lovato, E. W. McGinnis, R. S. McGinnis et al., “The Two Fundamental Shapes of Sleep Heart Rate Dynamics and Their Connection to Mental Health in College Students,” Digital Biomarkers, vol. 8, no. 1, pp. 120–131, 2024

[34] B. Balle, G. Barthe, M. Gaboardi, J. Hsu, and T. Sato, “Hypothesis testing interpretations and Rényi differential privacy,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 2496–2506

[35] J. P. Near, D. Darais, N. Lefkovitz, G. Howarth et al., “Guidelines for evaluating differential privacy guarantees,” National Institute of Standards and Technology, Tech. Rep, pp. 800–226, 2023

[36] A. Wood, M. Altman, A. Bembenek, M. Bun, M. Gaboardi, J. Honaker, K. Nissim, D. R. O’Brien, T. Steinke, and S. Vadhan, “Differential privacy: A primer for a non-technical audience,” Vand. J. Ent. & Tech. L., vol. 21, p. 209, 2018

[37] F. Houssiau, J. Jordon, S. N. Cohen, O. Daniel, A. Elliott, J. Geddes, C. Mole, C. Rangel-Smith, and L. Szpruch, “TAPAS: a toolbox for adversarial privacy auditing of synthetic data,” arXiv preprint arXiv:2211.06550, 2022

[38] M. Ghasemizade and J. Onaolapo, “Developing a hierarchical model for unraveling conspiracy theories,” EPJ Data Science, vol. 13, no. 1, p. 31, 2024

[39] Record Linkage Development Team, “Record Linkage Documentation,” 2025, accessed: 2025-01-27. [Online]. Available: https://recordlinkage.readthedocs.io/en/latest/

[40] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart, “Privacy in pharmacogenetics: An End-to-End case study of personalized warfarin dosing,” in 23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 17–32