
arxiv: 2605.03188 · v1 · submitted 2026-05-04 · 💻 cs.CR

Dependency-Aware Privacy for Multi-turn Agents

Pith reviewed 2026-05-08 17:57 UTC · model grok-4.3

classification 💻 cs.CR
keywords differential privacy · multi-turn agents · LLM agents · privacy sanitization · post-processing theorem · root attributes · computation graph · metric differential privacy

The pith

Sanitizing root private values once preserves differential privacy for all derived agent releases across any number of turns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that privacy in multi-turn LLM agents degrades under independent sanitization per release because adversaries can combine observations to reconstruct private roots, with amplification up to the Lipschitz constant of deriving functions. Instead, RootGuard identifies private attributes as roots of a computation graph, sanitizes them once, and derives all later outputs deterministically. By the post-processing theorem this keeps the privacy bound fixed no matter how many turns occur or what functions the adversary applies, so derived values receive the guarantee at zero marginal cost. The method also uses known structural relationships to distribute the total budget across roots more efficiently than per-release noise addition. This yields better utility on medical diagnostic tasks while the advantage grows rather than shrinks as the number of turns increases.

Core claim

RootGuard sanitizes root values once and computes subsequent releases deterministically from the noised roots. By the post-processing theorem, the privacy guarantee depends only on the initial root sanitization, regardless of the adversary's functions or number of turns, and derived values inherit privacy at zero marginal cost. RootGuard further exploits structural domain knowledge to allocate budget across roots, improving the privacy-utility tradeoff.

What carries the argument

One-time root sanitization under metric differential privacy followed by deterministic derivation of later releases, which invokes the post-processing theorem to hold the privacy bound constant irrespective of release count or adversary computation.
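The sanitize-once, derive-deterministically pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the Laplace mechanism stands in for whichever metric-DP mechanism RootGuard uses, and the root names, values, and per-root budgets are hypothetical.

```python
import numpy as np

def sanitize_roots(roots, epsilons, rng):
    """Noise each root exactly once.

    Assumption: Laplace noise with scale 1/eps, which gives
    eps * |x - x'| metric privacy on a real-valued root.
    """
    return {name: value + rng.laplace(0.0, 1.0 / epsilons[name])
            for name, value in roots.items()}

# Deterministic derivations: they read only the noised roots.
def bmi(r):
    return r["weight_kg"] / r["height_m"] ** 2

def waist_to_height(r):
    return r["waist_m"] / r["height_m"]

rng = np.random.default_rng(0)
roots = {"weight_kg": 80.0, "height_m": 1.75, "waist_m": 0.95}
# Hypothetical per-root budgets, chosen only for illustration.
epsilons = {"weight_kg": 1.0, "height_m": 50.0, "waist_m": 50.0}

noised = sanitize_roots(roots, epsilons, rng)  # the only randomized step

# Later turns reuse the same noised roots, so no fresh noise is drawn
# and no additional budget is spent.
releases_turn1 = {"bmi": bmi(noised)}
releases_turn2 = {"bmi": bmi(noised), "whtr": waist_to_height(noised)}
```

Because every release is a function of `noised` alone, repeating a query returns the same value, so an observer gains nothing by asking again.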

If this is right

  • Derived outputs such as BMI or other computed metrics receive the full privacy guarantee without spending any additional budget.
  • A worst-case adversary that forces more turns enlarges the total available budget for RootGuard while simultaneously strengthening attacks against per-turn independent noising.
  • The privacy-utility tradeoff improves because budget is spent only on the true roots rather than on every intermediate or final value.
  • Under MAP reconstruction, additional queries leave RootGuard's protection unchanged but increase the adversary's advantage against independent sanitizers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same root-plus-derivation pattern could reduce privacy cost in financial or sensor-based agent workflows where many outputs trace back to a small set of private inputs.
  • Agents could attempt to infer the computation graph on the fly from conversation history to apply the technique without full manual specification of roots.
  • Approximate or partially known derivation structures would still yield partial savings if the dominant roots can be identified and the Lipschitz constants of their deriving functions bounded conservatively.

Load-bearing premise

Private attributes can be correctly identified as the roots of the computation graph, and the structure and Lipschitz constants of the deriving functions are known well enough to allocate budget across them.

What would settle it

Run a MAP reconstruction attack that combines multiple derived outputs produced by RootGuard and check whether root recovery accuracy stays identical to the single-release case; if accuracy improves materially with added turns, the invariance claim is false.
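A toy version of this check can be sketched as follows. It rests on simplifying assumptions that are not from the paper: each release is the root itself, noise is Laplace with scale 1/ε, and the attacker takes the sample median (the maximum-likelihood estimate for Laplace noise under a flat prior) as a stand-in for full MAP reconstruction. The per-release ε is held fixed, so the sketch does not model RootGuard's budget reallocation.

```python
import numpy as np

def attack_error(true_value, observations):
    """Absolute error of a median-based reconstruction of the root."""
    return abs(float(np.median(observations)) - true_value)

def simulate(true_value, eps, turns, trials, rng):
    """Mean reconstruction error for independent vs. root-once noising."""
    scale = 1.0 / eps
    indep_errs, root_errs = [], []
    for _ in range(trials):
        # Independent noising: fresh noise on every turn.
        indep_obs = true_value + rng.laplace(0.0, scale, size=turns)
        # Root-once noising: one draw, replayed on every turn.
        root_obs = np.full(turns, true_value + rng.laplace(0.0, scale))
        indep_errs.append(attack_error(true_value, indep_obs))
        root_errs.append(attack_error(true_value, root_obs))
    return float(np.mean(indep_errs)), float(np.mean(root_errs))

rng = np.random.default_rng(0)
indep_1, root_1 = simulate(70.0, eps=0.1, turns=1, trials=2000, rng=rng)
indep_50, root_50 = simulate(70.0, eps=0.1, turns=50, trials=2000, rng=rng)
print(f"independent: {indep_1:.2f} -> {indep_50:.2f}")
print(f"root-once:   {root_1:.2f} -> {root_50:.2f}")
```

If the invariance claim holds, the root-once error should stay roughly flat as `turns` grows while the independent-noising error shrinks; a root-once error that falls with more turns would count against the claim.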

Figures

Figures reproduced from arXiv: 2605.03188 by Divyam Anshumaan, Nils Palumbo, Sarthak Choudhary, Somesh Jha.

Figure 1: Overview of RootGuard in an agentic deployment. At initialization, the user passes its private values
Figure 2: Reconstruction wMAPE (%) vs. adversarial query count
Figure 3: Log-log scatter of normalized per-root budget
Figure 4: Log-log scatter of normalized per-root budget
Figure 5: Per-template wMAPE (%) vs. ε at B = (2k+1)ε. Top row: compressing formulas with high-error baselines (FIB4, HOMA, AIP, NLR). Bottom row: amplifying formulas with already-low base errors (ANEMIA, CONICITY, TYG, VASCULAR). Nine methods per panel (3 mechanisms × 3 variants). The headline cell (ε = 0.1, Exponential) is summarized in main-text
Figure 6: Per-root budget shares (%) for three mechanisms at
Original abstract

LLM agents release private data across multi-service interactions. Existing prompt sanitizers based on metric differential privacy treat each release independently, so adversaries combining releases across turns can recover private attributes; privacy degrades with every release. This degradation is fundamental: when private attributes are the \emph{roots} of a computation graph, independently noising a derived value amplifies the root's distinguishability by up to the deriving function's Lipschitz constant $L$, which can far exceed the nominal privacy parameter for nonlinear functions in medical and financial workflows. RootGuard sanitizes root values once and computes subsequent releases deterministically from the noised roots. By the post-processing theorem, the privacy guarantee depends only on the initial root sanitization, regardless of the adversary's functions or number of turns, and derived values inherit privacy at zero marginal cost. RootGuard further exploits structural domain knowledge (e.g., BMI from height and weight, or a known target function) to allocate budget across roots, improving the privacy-utility tradeoff. A worst-case adversary forcing $t$ turns increases the total budget $B = t \cdot \varepsilon$. RootGuard distributes this larger budget across roots, while independent noising spends $\varepsilon$ per release and gives the adversary $t$ observations to combine via MAP reconstruction. This yields a \emph{double asymmetry}: more turns aid RootGuard while weakening independent noising. On eight NHANES medical diagnostic templates, RootGuard achieves $2.3$--$3.0\times$ lower target error than independent noising at $\varepsilon = 0.1$ (7.6\% vs.\ 17.1\% wMAPE at $B = (2k{+}1)\varepsilon$). Under MAP reconstruction, more queries strengthen attacks against independent noising while RootGuard remains invariant.
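The amplification statement in the abstract can be unpacked with a short derivation. This is a sketch in assumed notation, not the paper's own proof: $M'$ is a hypothetical sanitizer applied directly to a derived value $y = f(x)$, $d_X$ and $d_Y$ are the metrics on roots and on derived values, and $f$ is assumed to be $L$-Lipschitz, so that $d_Y(f(x), f(x')) \le L\, d_X(x, x')$.

```latex
% Assumption: M' satisfies \varepsilon d_Y-privacy on the derived value,
% and f is L-Lipschitz from (X, d_X) to (Y, d_Y).
\begin{align*}
\Pr[M'(f(x)) \in S]
  &\le e^{\varepsilon\, d_Y(f(x),\, f(x'))}\, \Pr[M'(f(x')) \in S] \\
  &\le e^{\varepsilon L\, d_X(x,\, x')}\, \Pr[M'(f(x')) \in S].
\end{align*}
```

Measured against the root, the guarantee is therefore only $L\varepsilon$ in the worst case, which is the sense in which a large Lipschitz constant can swamp the nominal parameter $\varepsilon$.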

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes RootGuard for privacy-preserving multi-turn LLM agents. Private attributes are modeled as roots of a computation graph; these are sanitized once under differential privacy, and all subsequent releases are computed as deterministic functions of the noised roots. By the post-processing theorem, the overall privacy guarantee depends only on the initial sanitization budget and is invariant to the number of turns or adversary functions. The paper identifies a double asymmetry: additional turns allow RootGuard to allocate a larger total budget B = t·ε across roots for better utility, while independent per-turn noising weakens under the same conditions. On eight NHANES medical diagnostic templates, RootGuard reports 2.3–3.0× lower target error (wMAPE) than independent noising at ε = 0.1. Under MAP reconstruction attacks, additional queries strengthen the attack on independent noising while RootGuard's protection stays invariant.

Significance. If the computation-graph modeling assumptions hold, the work supplies a clean, theorem-backed method for achieving turn-invariant privacy in dependent release settings while improving the privacy-utility tradeoff via structural budget allocation. The explicit use of the post-processing theorem and the double-asymmetry observation are genuine strengths. The NHANES results illustrate concrete gains in a medical workflow, but the practical significance for general LLM agents remains conditional on accurate root identification and deterministic derivations. The approach could inform privacy engineering for agentic systems provided the modeling gaps are addressed.

major comments (2)
  1. [§3] §3 (Computation-graph modeling of agents): The invariance claim rests on every release being a purely deterministic function of the sanitized roots with no additional private inputs or stochasticity. The NHANES templates are stated to have fully known, complete graphs, but the paper must explicitly confirm that LLM generation in the eight templates introduces neither sampling stochasticity nor unmodeled private context; otherwise the post-processing application and zero-marginal-cost claim do not hold. This assumption is load-bearing for the central result.
  2. [Empirical Evaluation] Empirical section (NHANES results): The reported 2.3–3.0× error reduction (7.6 % vs. 17.1 % wMAPE at B=(2k+1)ε) is presented without error bars, number of runs, or the precise MAP reconstruction attack implementation. Because the double-asymmetry argument is illustrated by these numbers, the missing statistical and implementation details weaken the empirical support even though the theoretical claim is independent of them.
minor comments (2)
  1. [§4] The budget-allocation procedure that exploits domain knowledge (e.g., BMI from height/weight) should include an explicit statement of how Lipschitz constants of the deriving functions are obtained or bounded.
  2. Notation for the total budget B = t·ε versus per-root allocation could be clarified with a small example or diagram to avoid reader confusion about how the larger budget is distributed.
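The kind of small example requested in minor comment 2 might look like the sketch below. Only the relation B = t·ε comes from the abstract; everything else is an assumption made for illustration. In particular, the square-root weighting is a hypothetical rule (it minimizes the first-order error proxy Σ cᵢ/εᵢ subject to Σ εᵢ = B) and is not claimed to be RootGuard's allocation procedure.

```python
import math

def total_budget(turns, eps):
    """Total budget B = t * eps, as stated in the abstract."""
    return turns * eps

def uniform_split(budget, sensitivities):
    """Baseline: every root receives the same share of B."""
    return {r: budget / len(sensitivities) for r in sensitivities}

def sqrt_sensitivity_split(budget, sensitivities):
    """Hypothetical rule: eps_i proportional to sqrt(c_i).

    This minimizes sum(c_i / eps_i) subject to sum(eps_i) = budget.
    """
    weights = {r: math.sqrt(c) for r, c in sensitivities.items()}
    norm = sum(weights.values())
    return {r: budget * w / norm for r, w in weights.items()}

def first_order_error(sensitivities, eps_by_root):
    """Error proxy: |df/dx_i| times the Laplace noise scale 1/eps_i."""
    return sum(c / eps_by_root[r] for r, c in sensitivities.items())

# Illustrative target: BMI = weight / height^2 at (80 kg, 1.75 m).
weight, height = 80.0, 1.75
sens = {
    "weight_kg": 1.0 / height ** 2,         # |dBMI/dweight|
    "height_m": 2.0 * weight / height ** 3,  # |dBMI/dheight|
}

B = total_budget(turns=5, eps=0.1)
uniform = uniform_split(B, sens)
weighted = sqrt_sensitivity_split(B, sens)
print(first_order_error(sens, uniform), first_order_error(sens, weighted))
```

Both splits spend the same total B; the weighted split shifts budget toward the root whose noise moves the target most, which lowers the error proxy in this example.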

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify the modeling assumptions and empirical details in our work. We respond to each major comment below.

Point-by-point responses
  1. Referee: [§3] §3 (Computation-graph modeling of agents): The invariance claim rests on every release being a purely deterministic function of the sanitized roots with no additional private inputs or stochasticity. The NHANES templates are stated to have fully known, complete graphs, but the paper must explicitly confirm that LLM generation in the eight templates introduces neither sampling stochasticity nor unmodeled private context; otherwise the post-processing application and zero-marginal-cost claim do not hold. This assumption is load-bearing for the central result.

    Authors: We agree with the referee that the central invariance result relies on the releases being deterministic functions of the sanitized roots. In our NHANES evaluation, the templates are defined as complete, known computation graphs with purely deterministic derivations (e.g., fixed formulas for derived attributes like BMI) and no additional private inputs or stochastic LLM sampling. We will revise the manuscript in §3 to explicitly confirm this for the eight templates and reiterate that the approach assumes deterministic post-processing as stated in the problem formulation. This will address the load-bearing assumption directly.

    Revision: yes

  2. Referee: [Empirical Evaluation] Empirical section (NHANES results): The reported 2.3–3.0× error reduction (7.6 % vs. 17.1 % wMAPE at B=(2k+1)ε) is presented without error bars, number of runs, or the precise MAP reconstruction attack implementation. Because the double-asymmetry argument is illustrated by these numbers, the missing statistical and implementation details weaken the empirical support even though the theoretical claim is independent of them.

    Authors: We acknowledge that providing statistical details and implementation specifics would strengthen the empirical presentation. In the revision, we will include error bars computed over multiple independent runs (specifying the number, e.g., 20 runs), and detail the MAP reconstruction attack implementation, including the adversary's optimization procedure for combining multi-turn observations. These changes will be made in the empirical evaluation section to better support the reported gains and the double-asymmetry argument.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; central claim follows from standard post-processing theorem

full rationale

The paper's derivation that privacy depends solely on initial root sanitization (with derived values at zero marginal cost) is obtained by direct application of the external post-processing theorem of differential privacy to the defined root/derived computation graph. No equations reduce any claimed prediction or guarantee to a fitted parameter, self-citation chain, or definitional tautology. The double asymmetry with independent noising follows logically from the same theorem plus the contrast in how each method consumes budget across turns. The NHANES evaluation is empirical validation, not part of the derivation. The argument is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the post-processing theorem of differential privacy and the assumption that root attributes and their deriving functions can be identified from domain knowledge.

axioms (1)
  • standard math: Post-processing theorem of differential privacy
    Invoked to conclude that derived releases inherit the initial privacy guarantee at zero marginal cost.
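For reference, the post-processing step can be written out for the metric-DP setting. The notation is assumed for illustration: $M$ is the root sanitizer with $\varepsilon d_X$-privacy, and $g$ is any deterministic function that reads only the sanitized roots. A randomized $g$ whose randomness is independent of $x$ follows by averaging over that randomness.

```latex
% Assumption: \Pr[M(x) \in S] \le e^{\varepsilon d_X(x,x')} \Pr[M(x') \in S]
% for all roots x, x' and all measurable S.
\begin{align*}
\Pr[g(M(x)) \in T]
  &= \Pr[M(x) \in g^{-1}(T)] \\
  &\le e^{\varepsilon\, d_X(x,\, x')}\, \Pr[M(x') \in g^{-1}(T)] \\
  &= e^{\varepsilon\, d_X(x,\, x')}\, \Pr[g(M(x')) \in T].
\end{align*}
% Taking g(\tilde{x}) = (f_1(\tilde{x}), \dots, f_t(\tilde{x})) covers all
% t releases at once, so the bound does not depend on t.
```

The argument breaks if any release reads the raw roots or other private inputs, which is why the referee's first major comment treats determinism of the derivations as load-bearing.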

pith-pipeline@v0.9.0 · 5638 in / 1161 out tokens · 77882 ms · 2026-05-08T17:57:31.647442+00:00 · methodology


Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages

  1. [1]

    Anthropic. Introducing the model context protocol. https://www.anthropic.com/news/model-context-protocol, November 2024. Accessed: 2025-04-18.

  2. [2]

    Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Landscape, security threats, and future research directions, 2025.

  3. [3]

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proc. 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023.

  4. [4]

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024.

  5. [5]

    Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. In Advances in Neural Information Processing Systems 37 (NeurIPS), 2024.

  6. [6]

    Carl Franzen. Sam Altman calls for ‘AI privilege’ as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions. https://venturebeat.com/ai/sam-altman-calls-for-ai-privilege-as-openai-clarifies-court-order-to-retain-temporary-and-deleted-chatgpt-sessions, June 2025. Accessed: 2026-04-24.

  7. [7]

    Konstantinos Chatzikokolakis, Miguel E. Andrés, Nicolás Emilio Bordenabe, and Catuscia Palamidessi. Broadening the scope of differential privacy using metrics. In Privacy Enhancing Technologies (PETS 2013), volume 7981 of Lecture Notes in Computer Science, pages 82–102. Springer, 2013.

  8. [8]

    Amrita Roy Chowdhury, David Glukhov, Divyam Anshumaan, Prasad Chalasani, Nicolas Papernot, Somesh Jha, and Mihir Bellare. Prϵϵmpt: Sanitizing sensitive prompts for llms, 2025.

  9. [9]

    Haoqi Wu, Wei Dai, Li Wang, and Qiang Yan. Cape: Context-aware prompt perturbation mechanism with differential privacy, 2025.

  10. [10]

    Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC 2006), volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006.

  11. [11]

    Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.

  12. [12]

    OpenAI. Introducing GPT-5.4 mini and nano. https://openai.com/index/introducing-gpt-5-4-mini-and-nano/, March 2026. Accessed: 2026-04-29.

  13. [13]

    National Center for Health Statistics. National health and nutrition examination survey, 2017–2018: Examination data, 2020. Accessed: 2026-04-12.

  14. [14]

    Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’07), pages 94–103. IEEE, 2007.

  15. [15]

    Quan Geng, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The staircase mechanism in differential privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1176–1184, 2015.

  16. [16]

    Rob J. Hyndman and Anne B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006.

  17. [17]

    Stephan Kolassa and Wolfgang Schütz. Advantages of the MAD/Mean ratio over the MAPE. Foresight: The International Journal of Applied Forecasting, (6):40–43, 2007.

  18. [18]

    Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, and Sherman S. M. Chow. Differential privacy for text analytics via natural text sanitization. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3853–3866, Online, August 2021. Association for Computational Linguistics.

  19. [19]

    Sai Chen, Fengran Mo, Yanhao Wang, Cen Chen, Jian-Yun Nie, Chengyu Wang, and Jamie Cui. A customized text sanitization mechanism with differential privacy. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5747–5758, Toronto, Canada, July 2023. Association for Computational Linguistics.

  20. [20]

    Stephen Meisenbacher, Maulik Chevli, Juraj Vladika, and Florian Matthes. DP-MLM: Differentially private text rewriting using masked language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 9314–9328, Bangkok, Thailand, August 2024. Association for Computational Linguistics.

  22. [22]

    Stephen Meisenbacher, Chaeeun Joy Lee, and Florian Matthes. Spend your budget wisely: Towards an intelligent distribution of the privacy budget in differentially private text rewriting. In Proceedings of the Fifteenth ACM Conference on Data and Application Security and Privacy, CODASPY ’25, page 84–95, New York, NY, USA, 2025. Association for Computing Machinery.

  24. [24]

    Kennedy Edemacu and Xintao Wu. Privacy preserving prompt engineering: A survey. ACM Computing Surveys, 2025.

  25. [25]

    Daniel Kifer and Ashwin Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems, 39(1):3:1–3:36, 2014.

  26. [26]

    Changchang Liu, Supriyo Chakraborty, and Prateek Mittal. Dependence makes you vulnerable: Differential privacy under dependent tuples. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS). Internet Society, 2016.

  27. [27]

    Jun Zhao. Composition properties of Bayesian differential privacy. arXiv preprint arXiv:1911.00763, 2019.

  28. [28]

    Shuang Song, Yizhen Wang, and Kamalika Chaudhuri. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD), pages 1291–1306. ACM, 2017.

  29. [29]

    Clément Pierquin, Aurélien Bellet, Marc Tommasi, and Matthieu Boussard. Rényi Pufferfish privacy: General additive noise mechanisms and privacy amplification by iteration via shift reduction lemmas. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 of Proceedings of Machine Learning Research, pages 40762–40794....

  30. [30]

    Yifan Luo, Meng Zhang, Jin Xu, Junting Chen, and Jianwei Huang. Correlated-sequence differential privacy. In Proceedings of the 34th International Conference on Computer Communications and Networks (ICCCN). IEEE, 2025.

  31. [31]

    Yanzhe Zhang and Diyi Yang. Searching for privacy risks in LLM agents via simulation, 2025.

  32. [32]

    Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, and Daniel Ramage. AirGapAgent: Protecting privacy-conscious conversational agents. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS ’24), pages 3868–3882, New York, NY, USA, 2024. Association for Computing ...

  33. [33]

    Li Siyan, Vethavikashini Chithrra Raghuram, Omar Khattab, Julia Hirschberg, and Zhou Yu. PAPILLON: Privacy preservation from internet-based and local language model ensembles. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)...

  34. [34]

    Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design, 2025.

  35. [35]

    Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In 30th USENIX Security Symposium, pages 2633–2650, 2021.

  36. [36]

    Robin Staab, Mark Vero, Mislav Balunović, and Martin Vechev. Beyond memorization: Violating privacy via inference with large language models. In International Conference on Learning Representations (ICLR), 2024.

  37. [37]

    Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, and Yejin Choi. Can LLMs keep a secret? testing privacy implications of language models via contextual integrity theory. In International Conference on Learning Representations (ICLR), 2024.

  38. [38]

    Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, and Diyi Yang. PrivacyLens: Evaluating privacy norm awareness of language models in action. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Datasets and Benchmarks Track, 2024.

  39. [39]

    A. Victor Hoffbrand and David A. Steensma. Hoffbrand’s Essential Haematology. Wiley-Blackwell, 8th edition, 2019.

  40. [40]

    Mauro Buttarello and Mario Plebani. Automated blood cell counts: State of the art. American Journal of Clinical Pathology, 130(1):104–116, 2008.

  41. [41]

    Richard K. Sterling, Eduardo Lissen, Nathan Clumeck, et al. Development of a simple noninvasive index to predict significant fibrosis in patients with HIV/HCV coinfection. Hepatology, 43(6):1317–1325, 2006.

  42. [42]

    Stuart McPherson, Timothy Hardy, Jean-François Dufour, et al. Age as a confounding factor for the accurate non-invasive diagnosis of advanced NAFLD fibrosis. American Journal of Gastroenterology, 112(5):740–751, 2017.

  43. [43]

    William T. Friedewald, Robert I. Levy, and Donald S. Fredrickson. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clinical Chemistry, 18(6):499–502, 1972.

  44. [44]

    Milada Dobiasova and Jiri Frohlich. The plasma parameter log(TG/HDL-C) as an atherogenic index: Correlation with lipoprotein particle size and esterification rate in apoB-lipoprotein-depleted plasma. Clinical Biochemistry, 34(7):583–588, 2001.

  45. [45]

    Milada Dobiasova. Atherogenic index of plasma [log(triglycerides/HDL-cholesterol)]: Theoretical and practical implications. Clinical Chemistry, 50(7):1113–1115, 2004.

  46. [46]

    Rodolfo Valdez, Jacob C. Seidell, Young I. Ahn, and Kenneth M. Weiss. A new index of abdominal adiposity as an indicator of risk for cardiovascular disease: A cross-population study. International Journal of Obesity, 15(12):893–902, 1991.

  47. [47]

    Francisco José Gondim Pitanga and Inês Lessa. Sensitivity and specificity of the conicity index as a coronary risk predictor among adults in salvador, brazil. Revista Brasileira de Epidemiologia, 8(4):367–381, 2005.

  48. [48]

    Milda Kovaite, Sigitas Gražulis, Jurgita Venclovaite, and Aleksandras Laurinavičius. Arterial stiffness parameters and pulse pressure index. Medicina (Kaunas), 44(11):863–870, 2008.

  49. [49]

    Stanley S. Franklin. The importance of diastolic blood pressure in predicting cardiovascular risk. Journal of the American Society of Hypertension, 1(1):82–93, 2007.

  50. [50]

    Luis E. Simental-Mendía, Martha Rodríguez-Morán, and Fernando Guerrero-Romero. The product of fasting glucose and triglycerides as surrogate for identifying insulin resistance in apparently healthy subjects. Metabolic Syndrome and Related Disorders, 6(4):299–304, 2008.

  51. [51]

    Fernando Guerrero-Romero, Luis E. Simental-Mendía, Manuel González-Ortiz, Esperanza Martínez-Abundis, María G. Ramos-Zavala, Sandra O. Hernández-González, Omar Jacques-Camarena, and Martha Rodríguez-Morán. The product of triglycerides and glucose, a simple measure of insulin sensitivity: Comparison with the euglycemic-hyperinsulinemic clamp. Th...

  52. [52]

    David R. Matthews, Janet P. Hosker, Alan S. Rudenski, Barbara A. Naylor, David F. Treacher, and Robert C. Turner. Homeostasis model assessment: Insulin resistance and β-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia, 28(7):412–419, 1985.

  53. [53]

    Pilar Gayoso-Diz, Alfonso Otero-González, María Xosé Rodríguez-Álvarez, et al. Insulin resistance (HOMA-IR) cut-off values and the metabolic syndrome in a general adult population. European Journal of Internal Medicine, 24(8):818–823, 2013.

  54. [54]

    Roman Zahorec. Ratio of neutrophil to lymphocyte counts - rapid and simple parameter of systemic inflammation and stress in critically ill. Bratislavské Lekárske Listy, 102(1):5–14, 2001.

  55. [55]

    Patrice Forget, Cahrine Khalifa, Jean-Philippe Defour, Dominique Latinne, Marie-Céline Van Pel, and Marc De Kock. What is the normal value of the neutrophil-to-lymphocyte ratio? BMC Research Notes, 10(1):12, 2017.

A Appendix

A.1 Medical Profile Details

  56. [56]

    The target value MCHC (mean cor- puscular hemoglobin concentration, in g/dL) classifies chromicity: hypochromic (MCHC<32), normochromic (32≤MCHC≤36), or hyperchromic (MCHC>36) [38]

    Anemia Classification (MCHC) Red blood cell indices are derived from hemoglobin (Hb), hematocrit (Hct), and red blood cell count (RBC): MCV= Hct RBC ×10,MCH= Hb RBC ×10,MCHC= Hb Hct ×100(5) These are the standard red cell indices defined in clinical hematology [37]. The target value MCHC (mean cor- puscular hemoglobin concentration, in g/dL) classifies ch...

  57. [57]

    Risk thresholds: low risk/rule out (FIB-4<1.30), indeterminate (1.30≤FIB-4≤2.67), high risk/rule in (FIB-4>2.67)

    Liver Fibrosis (FIB-4). The FIB-4 index [39] is a validated non-invasive biomarker for hepatic fibrosis staging: $$\text{FIB-4} = \frac{\text{age} \times \mathrm{AST}}{\mathrm{PLT} \times \sqrt{\mathrm{ALT}}} \tag{6}$$ where AST and ALT are in U/L and PLT is in $10^9$/L. Risk thresholds: low risk/rule out (FIB-4 < 1.30), indeterminate (1.30 ≤ FIB-4 ≤ 2.67), high risk/rule in (FIB-4 > 2.67). These cutoffs are from Sterling et al. [39] and vali...
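A minimal sketch of Eq. (6) with the three risk bands quoted above. The function names are hypothetical, and age is assumed to be in years.

```python
import math


def fib4(age_years: float, ast: float, alt: float, plt: float) -> float:
    """FIB-4 per Eq. (6): AST and ALT in U/L, PLT in 10^9/L."""
    return (age_years * ast) / (plt * math.sqrt(alt))


def classify_fib4(score: float) -> str:
    """Risk bands from the text: <1.30, 1.30 to 2.67, >2.67."""
    if score < 1.30:
        return "low risk / rule out"
    if score <= 2.67:
        return "indeterminate"
    return "high risk / rule in"
```

With age 50, AST 40 U/L, ALT 25 U/L, and PLT 200 × 10^9/L, the score is 2000 / (200 × 5) = 2.0, which falls in the indeterminate band.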

  58. [58]

    The atherogenic index of plasma [42] is: $$\mathrm{AIP} = \log_{10}\!\left(\frac{\mathrm{TG}}{\mathrm{HDL}}\right) \tag{7}$$ where TG and HDL are in mg/dL

    Atherogenic Index of Plasma (AIP). Lipid-derived values include non-HDL cholesterol (TC − HDL) and Friedewald LDL (TC − HDL − TG/5) [41]. The atherogenic index of plasma [42] is: $$\mathrm{AIP} = \log_{10}\!\left(\frac{\mathrm{TG}}{\mathrm{HDL}}\right) \tag{7}$$ where TG and HDL are in mg/dL. Risk classification: low cardiovascular risk (AIP < 0.11), intermediate (0.11 ≤ AIP ≤ 0.21), high (AIP > 0.21) [43]
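The lipid-derived values and Eq. (7) can be sketched as follows. The function and key names are illustrative only, and all inputs are taken in mg/dL as the text specifies.

```python
import math


def lipid_derived(tc: float, hdl: float, tg: float) -> dict:
    """Derived lipid values; tc, hdl, tg all in mg/dL."""
    return {
        "non_hdl": tc - hdl,                  # non-HDL cholesterol
        "ldl_friedewald": tc - hdl - tg / 5,  # Friedewald LDL estimate
        "aip": math.log10(tg / hdl),          # Eq. (7)
    }


def classify_aip(aip: float) -> str:
    """Risk bands from the text: <0.11, 0.11 to 0.21, >0.21."""
    if aip < 0.11:
        return "low"
    if aip <= 0.21:
        return "intermediate"
    return "high"
```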

  59. [59]

    The denominator represents the circumference of a cylinder with the same height and mass

    Conicity Index (Obesity). Body composition is assessed via BMI ($\mathrm{wt}/\mathrm{ht}^2$), waist-to-height ratio ($\mathrm{waist}/\mathrm{ht}$), and the conicity index [44]: $$\mathrm{CI} = \frac{\mathrm{waist}}{0.109 \times \sqrt{\mathrm{wt}/\mathrm{ht}}} \tag{8}$$ where waist and height are in meters, and weight is in kilograms. The denominator represents the circumference of a cylinder with the same height and mass. Central obesity is indicated when CI > 1.25 [45]
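A short sketch of the body-composition values and Eq. (8), using the units given in the text (meters and kilograms). The function name and dictionary keys are assumptions for illustration.

```python
import math


def body_composition(waist_m: float, height_m: float, weight_kg: float) -> dict:
    """BMI, waist-to-height ratio, and conicity index (Eq. 8)."""
    ci = waist_m / (0.109 * math.sqrt(weight_kg / height_m))
    return {
        "bmi": weight_kg / height_m ** 2,
        "whtr": waist_m / height_m,
        "ci": ci,
        "central_obesity": ci > 1.25,  # threshold quoted in the text
    }
```

For a waist of 0.90 m, height of 1.75 m, and weight of 70 kg, the conicity index is roughly 1.31, above the 1.25 threshold even though the BMI (about 22.9) is unremarkable.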

  60. [60]

    The pulse pressure index [46]: $$\mathrm{PPI} = \frac{\mathrm{PP}}{\mathrm{SBP}} = \frac{\mathrm{SBP} - \mathrm{DBP}}{\mathrm{SBP}} \tag{9}$$ High arterial stiffness is indicated when PPI > 0.60 [47]

    Vascular Stiffness (Pulse Pressure Index). Blood pressure derivatives include pulse pressure (PP = SBP − DBP), mean arterial pressure (MAP = DBP + PP/3), and mid-blood pressure (MBP = (SBP + DBP)/2). The pulse pressure index [46]: $$\mathrm{PPI} = \frac{\mathrm{PP}}{\mathrm{SBP}} = \frac{\mathrm{SBP} - \mathrm{DBP}}{\mathrm{SBP}} \tag{9}$$ High arterial stiffness is indicated when PPI > 0.60 [47].
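The blood-pressure derivatives and Eq. (9) translate directly into code. This is a sketch with hypothetical names; the pressures are assumed to be in mmHg, which the excerpt does not state.

```python
def blood_pressure_derived(sbp: float, dbp: float) -> dict:
    """Derived blood-pressure values; sbp and dbp assumed in mmHg."""
    pp = sbp - dbp  # pulse pressure
    ppi = pp / sbp  # pulse pressure index, Eq. (9)
    return {
        "pp": pp,
        "map": dbp + pp / 3,      # mean arterial pressure
        "mbp": (sbp + dbp) / 2,   # mid-blood pressure
        "ppi": ppi,
        "high_stiffness": ppi > 0.60,  # threshold quoted in the text
    }
```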

  61. [61]

    Insulin resistance is indicated when TyG > 8.5 [49]

    Triglyceride-Glucose Index (TyG). The TyG index [48] is a surrogate marker for insulin resistance: $$\mathrm{TyG} = \ln\!\left(\frac{\mathrm{TG} \times \mathrm{Glu}}{2}\right) \tag{10}$$ where TG is in mg/dL and Glu is fasting glucose in mg/dL. Insulin resistance is indicated when TyG > 8.5 [49]
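As a worked instance of Eq. (10), the sketch below evaluates the index and applies the 8.5 threshold. The function names are illustrative.

```python
import math


def tyg_index(tg_mg_dl: float, glu_mg_dl: float) -> float:
    """TyG per Eq. (10); both inputs in mg/dL."""
    return math.log(tg_mg_dl * glu_mg_dl / 2)


def tyg_insulin_resistant(tyg: float) -> bool:
    """Threshold quoted in the text: TyG > 8.5."""
    return tyg > 8.5
```

With TG = 150 mg/dL and fasting glucose = 100 mg/dL, the index is ln(7500), about 8.92, which exceeds the threshold.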

  62. [62]

    The constant 405 normalizes to conventional units

    HOMA-IR (Insulin Resistance). The Homeostatic Model Assessment for Insulin Resistance [50]: $$\text{HOMA-IR} = \frac{\mathrm{Glu} \times \mathrm{Ins}}{405} \tag{11}$$ where Glu is fasting glucose (mg/dL) and Ins is fasting insulin (µU/mL). The constant 405 normalizes to conventional units. Classification: insulin sensitive (HOMA < 1.0), normal (1.0 ≤ HOMA < 2.5), insulin resistant (HOMA ≥ 2.5) [51]
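Eq. (11) and its three classification bands can be sketched as below; the function names are hypothetical.

```python
def homa_ir(glu_mg_dl: float, ins_uU_ml: float) -> float:
    """HOMA-IR per Eq. (11): glucose in mg/dL, insulin in uU/mL."""
    return glu_mg_dl * ins_uU_ml / 405


def classify_homa_ir(value: float) -> str:
    """Bands from the text: <1.0, 1.0 to <2.5, >=2.5."""
    if value < 1.0:
        return "insulin sensitive"
    if value < 2.5:
        return "normal"
    return "insulin resistant"
```

Note that the upper boundary is inclusive on the resistant side: a value of exactly 2.5 is classified as insulin resistant, while exactly 1.0 is normal.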

  63. [63]

    Systemic inflammation or physiologic stress is indicated when NLR ≥ 3.0 [53]

    Neutrophil-to-Lymphocyte Ratio (NLR). The NLR is a marker of systemic inflammation [52]: $$\mathrm{NLR} = \frac{\text{Neutrophils}}{\text{Lymphocytes}} \tag{12}$$ with auxiliary values $\mathrm{NLR}_{\text{sum}} = \mathrm{Neu} + \mathrm{Lym}$ and $\mathrm{NLR}_{\text{diff}} = |\mathrm{Neu} - \mathrm{Lym}|$ included as intermediate nodes. Systemic inflammation or physiologic stress is indicated when NLR ≥ 3.0 [53].

A.2 Additional Baseline Details

A.2.1 Discrete Exponential Mechanis...

  64. [64]

    Draw a sign $S \in \{-1, +1\}$ uniformly at random

  65. [65]

    The integer $G$ selects which “step” the noise lands on: the noise magnitude will fall in the interval $[G, G+1)$ in index units

    Draw $G \sim \mathrm{Geom}(1 - e^{-\epsilon_i})$, with $\Pr[G = g] = (1 - e^{-\epsilon_i})\, e^{-\epsilon_i g}$ for $g = 0, 1, 2, \ldots$ The integer $G$ selects which “step” the noise lands on: the noise magnitude will fall in the interval $[G, G+1)$ in index units

  66. [66]

    With probability $1 - \gamma$: set $\hat{\eta}_i = S \cdot (G + 1 - U)$

    With probability $\gamma = 1/(1 + e^{\epsilon_i/2})$: set $\hat{\eta}_i = S \cdot (G + U)$. With probability $1 - \gamma$: set $\hat{\eta}_i = S \cdot (G + 1 - U)$. The geometric variable $G$ determines the step, with each successive step having probability $e^{-\epsilon_i}$ times the previous, the same decay rate as the Laplace. The uniform variable $U$ places the noise within the step. The $\gamma$-branch orients $U$ toward the inner edge (closer to zero)...
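The sampling steps listed above can be sketched as follows. The distribution of $U$ is defined in a step that falls outside this excerpt, so the sketch does not guess it and instead takes a caller-supplied `draw_u` callable; that parameter, the function name, and the inverse-CDF construction of $G$ are assumptions of this example rather than details from the paper.

```python
import math
import random
from typing import Callable


def sample_step_noise(
    eps_i: float,
    draw_u: Callable[[random.Random], float],
    rng: random.Random,
) -> float:
    """Draw one noise value (in index units) following the listed steps.

    draw_u is a hypothetical hook that returns U; its distribution is
    not specified in the excerpt and must be supplied by the caller.
    """
    # Sign S, uniform on {-1, +1}.
    sign = rng.choice((-1, 1))

    # G with Pr[G = g] = (1 - e^{-eps}) e^{-eps g}, g = 0, 1, 2, ...
    # Inverse-CDF sampling: for V uniform on (0, 1],
    # floor(-ln V / eps) has exactly this geometric law.
    v = 1.0 - rng.random()
    g = math.floor(-math.log(v) / eps_i)

    # U places the noise within the step [G, G + 1).
    u = draw_u(rng)

    # Branch probability gamma = 1 / (1 + e^{eps / 2}).
    gamma = 1.0 / (1.0 + math.exp(eps_i / 2.0))
    if rng.random() < gamma:
        magnitude = g + u          # gamma-branch: G + U
    else:
        magnitude = g + 1.0 - u    # complementary branch: G + 1 - U
    return sign * magnitude
```

Passing a seeded `random.Random` instance keeps draws reproducible during testing; a deployment would need a cryptographically secure source of randomness instead.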

  67. [67]

    Set all root values to their population means: $x = \mu$

  68. [68]

    Propagate values forward through the DAG in topological order, computing all intermediate node values

  69. [69]

    Propagate gradient seeds forward using the chain rule with exact analytical local partial derivatives at each node.

    [Figure: wMAPE (%) on a logarithmic y-axis against x-axis ticks from 0.005 to 5; panels "HOMA (target: homa)" and "NLR (target: nl..."; series Exp-All, Exp-Roots, Exp-Opt, BLap-All, BLap-Roots, BLap-Opt, Stair-All, Stair-Roots, Stair-Opt.]

  70. [70]

    Since target formulas may be nonlinear, $\partial g/\partial x_i$ can depend on the values of other roots; evaluating at $\mu$ captures these cross-dependencies at a representative operating point

    Take absolute values: $h_i = |\partial g/\partial x_i(\mu)|$. Since target formulas may be nonlinear, $\partial g/\partial x_i$ can depend on the values of other roots; evaluating at $\mu$ captures these cross-dependencies at a representative operating point. The population means are domain knowledge; in our experiments, they are computed from the NHANES [13] reference population (excluding test data). If the...
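The four steps above amount to forward-mode differentiation evaluated at the population means. The sketch below illustrates this on the HOMA-IR target from Eq. (11) using a minimal dual-number type. The names `Dual`, `root`, and `sensitivities_at_mean` are hypothetical, and the means in the usage note are illustrative placeholders, not NHANES values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dual:
    """A node value plus its partial derivatives w.r.t. every root."""
    value: float
    grad: Tuple[float, ...]

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: d(ab) = a * db + b * da.
        grad = tuple(
            self.value * db + other.value * da
            for da, db in zip(self.grad, other.grad)
        )
        return Dual(self.value * other.value, grad)

    def scale(self, c: float) -> "Dual":
        # Multiplication by a constant scales value and gradient alike.
        return Dual(self.value * c, tuple(c * d for d in self.grad))


def root(value: float, index: int, n_roots: int) -> Dual:
    """Root node: gradient seed is the unit vector for this root."""
    seed = tuple(1.0 if j == index else 0.0 for j in range(n_roots))
    return Dual(value, seed)


def sensitivities_at_mean(mu_glu: float, mu_ins: float) -> Tuple[float, ...]:
    """h_i = |dg/dx_i(mu)| for g = Glu * Ins / 405."""
    glu = root(mu_glu, 0, 2)                      # set roots to their means
    ins = root(mu_ins, 1, 2)
    target = (glu * ins).scale(1.0 / 405.0)       # propagate values and seeds
    return tuple(abs(d) for d in target.grad)     # take absolute values
```

With the placeholder means Glu = 100 and Ins = 10, `sensitivities_at_mean(100.0, 10.0)` gives approximately (0.0247, 0.2469). The insulin root has the larger sensitivity because its partial derivative is Glu/405, which shows the cross-dependency on other roots that the text describes.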