Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation

Alexander Nemecek; Debargha Ganguly; Erman Ayday; Osama Zafar; Vikash Singh; Vipin Chaudhary; Wenbiao Li; Yiqian Zhang

arxiv: 2605.17034 · v1 · pith:IZ3TAFUZnew · submitted 2026-05-16 · 💻 cs.LG · cs.AI · cs.CR

Osama Zafar, Alexander Nemecek, Yiqian Zhang, Wenbiao Li, Debargha Ganguly, Vikash Singh, Vipin Chaudhary, Erman Ayday

Pith reviewed 2026-05-19 20:22 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR

keywords privacy policy enforcement · retrieval augmented generation · one-class classification · contextual data leakage · synthetic data · RAG guardrails · anomaly detection

The pith

Dual one-class density estimators detect contextual privacy leaks in RAG with over 0.93 AUROC on borderline cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard PII filters miss contextual data leakage in RAG systems, where attributes that are not individually regulated can together identify a person. This paper introduces a Privacy Policy Enforcement framework built on dual one-class density estimators that use fused text embeddings and include a calibrated abstain region for out-of-distribution inputs. The authors generate training data with an axis-stratified multi-LLM pipeline spanning medicine, finance, and law so that it includes both safe and borderline-safe cases. The resulting T3+OCSVM detector is reported to exceed 0.93 AUROC on hard borderline tests, cut false positives by 44-55 percentage points relative to Gaussian Mixture baselines, and run in milliseconds, which the authors argue makes it more practical than supervised MLP classifiers or 14B-parameter LLM judges.
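To make the idea of contextual leakage concrete, here is a minimal sketch of how attributes that are harmless on their own can single out one person once combined. The records, field names, and the `min_group_size` helper are all hypothetical and are not taken from the paper; the group-size check is borrowed from the k-anonymity literature the paper cites, not from the paper's detector.

```python
from collections import Counter

# Hypothetical toy records: no field is a direct identifier such as a name or SSN.
RECORDS = [
    {"zip3": "441", "age_band": "60-69", "job": "nurse"},
    {"zip3": "441", "age_band": "60-69", "job": "teacher"},
    {"zip3": "441", "age_band": "30-39", "job": "nurse"},
    {"zip3": "442", "age_band": "60-69", "job": "nurse"},
    {"zip3": "442", "age_band": "30-39", "job": "teacher"},
    {"zip3": "442", "age_band": "30-39", "job": "teacher"},
]


def min_group_size(records, attributes):
    """Return the size of the smallest group of records sharing the same
    values on `attributes` (the k in k-anonymity for that attribute set)."""
    groups = Counter(tuple(r[a] for a in attributes) for r in records)
    return min(groups.values())


# Each attribute alone leaves every person hidden in a group of three.
single_attribute_k = {a: min_group_size(RECORDS, [a]) for a in ("zip3", "age_band", "job")}

# The three attributes together isolate at least one individual (group of one).
combined_k = min_group_size(RECORDS, ["zip3", "age_band", "job"])

print(single_attribute_k, combined_k)
```

A regex or NER filter sees nothing to redact in any of these fields, which is the gap the Layer-2 detector is meant to cover.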

Core claim

The central claim is that training a T3+OCSVM detector on safe and borderline-safe synthetic data lets it identify privacy policy violations in RAG queries with a borderline AUROC of 0.93 or more. The paper reports a 44-55 percentage point reduction in false positives relative to Gaussian Mixture models, with inference at millisecond latency. The authors present the method as more operationally viable than supervised MLP classifiers, which abstain too often, or 14B-parameter LLM judges, which are too slow and poorly calibrated.
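A reduction stated in percentage points is an absolute difference between two rates, not a relative change. The sketch below shows the arithmetic on invented anomaly scores for borderline-safe inputs; the scores, the threshold, and the helper name `false_positive_rate` are illustrative assumptions and do not reproduce the paper's data.

```python
def false_positive_rate(safe_scores, threshold):
    """Fraction of safe inputs wrongly flagged, i.e. anomaly score above threshold."""
    flagged = sum(1 for s in safe_scores if s > threshold)
    return flagged / len(safe_scores)


# Invented anomaly scores for ten borderline-safe queries (higher = more suspicious).
baseline_scores = [0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.75, 0.65]
detector_scores = [0.60, 0.30, 0.20, 0.10, 0.40, 0.30, 0.20, 0.10, 0.25, 0.15]
THRESHOLD = 0.5  # hypothetical operating point

baseline_fpr = false_positive_rate(baseline_scores, THRESHOLD)
detector_fpr = false_positive_rate(detector_scores, THRESHOLD)

# Absolute difference in percentage points versus relative change in percent.
reduction_pp = (baseline_fpr - detector_fpr) * 100
relative_reduction_pct = (baseline_fpr - detector_fpr) / baseline_fpr * 100

print(f"{reduction_pp:.1f} pp, {relative_reduction_pct:.1f}% relative")
```

On this toy data the same improvement reads as 50 points in absolute terms and about 83% in relative terms, which is why the unit matters when comparing against the reported 44-55 pp figure.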

What carries the argument

The argument rests on dual one-class density estimators (specifically T3+OCSVM) applied to fused text embeddings. These model the density of safe queries and add a calibrated abstain region for out-of-distribution inputs.
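The routing logic of such a detector can be sketched as follows. This is one plausible reading, not the paper's implementation: the page does not say what each of the two estimators models, so the sketch assumes one is fitted to safe queries and one to policy-violating queries, and it substitutes a toy distance-to-centroid scorer for the OCSVM. The class `OneClassScorer`, the function `route`, the 0.95 quantile, and the 2-D "embeddings" are all hypothetical.

```python
import math


def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OneClassScorer:
    """Toy one-class model (NOT an OCSVM): a point is 'typical' if it lies
    within a radius calibrated as a quantile of training distances."""

    def __init__(self, train, quantile=0.95):
        self.center = centroid(train)
        dists = sorted(distance(v, self.center) for v in train)
        idx = min(len(dists) - 1, int(quantile * len(dists)))
        self.radius = dists[idx]

    def is_typical(self, v):
        return distance(v, self.center) <= self.radius


def route(v, safe_model, violating_model):
    """Allow or block only when exactly one model claims the input;
    abstain when neither does (out of distribution) or both do (ambiguous)."""
    in_safe = safe_model.is_typical(v)
    in_violating = violating_model.is_typical(v)
    if in_safe and not in_violating:
        return "allow"
    if in_violating and not in_safe:
        return "block"
    return "abstain"


# Hypothetical 2-D stand-ins for fused text embeddings.
SAFE_TRAIN = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
VIOLATING_TRAIN = [[1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1]]

safe_model = OneClassScorer(SAFE_TRAIN)
violating_model = OneClassScorer(VIOLATING_TRAIN)

for query in ([0.05, 0.05], [1.05, 1.05], [5.0, -3.0]):
    print(query, route(query, safe_model, violating_model))
```

The design point is that the abstain outcome comes from the geometry rather than from a separate classifier: an input far from both training distributions is never forced into an allow or block decision.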

If this is right

RAG systems gain a practical tool to block contextual leaks without high computational overhead.
Synthetic data from multi-LLM pipelines can serve as a reliable proxy for training privacy detectors across domains.
Gaussian Mixture baselines are inadequate for borderline cases because they latch onto linguistic register instead of semantic content.
The framework sets a standard for stress-testing any classifier trained on synthetic privacy data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar density-based approaches might help in other areas like detecting misinformation or bias in generated outputs.
Real-world deployment could involve feedback loops where abstained cases are reviewed to refine the model.
The method's focus on borderline cases suggests it could generalize to evolving privacy regulations by updating the training distribution.

Load-bearing premise

The axis-stratified multi-LLM synthetic data pipeline creates borderline-safe examples that have the same privacy leakage properties as those in real RAG deployments in medicine, finance, and law.

What would settle it

Running the T3+OCSVM detector on real production RAG queries from medical, financial, or legal applications and verifying if the borderline AUROC stays above 0.93 with comparable false positive reductions.
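For readers who want to run this check on their own labeled queries, AUROC can be computed directly from detector scores as the probability that a violating query scores higher than a safe one. The sketch below uses the pairwise (Mann-Whitney) formulation in plain Python; the function name `auroc` and the example scores are illustrative, not from the paper.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive (violating) example scores higher
    than a random negative (safe) example; ties count as half a win."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Invented anomaly scores: higher means the detector finds the query more suspicious.
violating = [0.9, 0.8, 0.4]
borderline_safe = [0.5, 0.3, 0.2]

print(round(auroc(violating, borderline_safe), 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small audit set; a rank-based implementation would be preferable for large logs.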

Figures

Figures reproduced from arXiv: 2605.17034 by Alexander Nemecek, Debargha Ganguly, Erman Ayday, Osama Zafar, Vikash Singh, Vipin Chaudhary, Wenbiao Li, Yiqian Zhang.

**Figure 1.** Layered privacy enforcement for RAG. Layer-1 catches direct identifiers via regex/NER; Layer-2 (this work) detects contextual QI-cluster leakage

**Figure 2.** Axis-stratified multi-LLM data generation pipeline, instantiated per domain with the QI taxonomy of Table

**Figure 3.** Layer-2 detector architecture. Three frozen encoders produce a

**Figure 4.** Case-style confound and its remediation. (a) AUROC performance

Original abstract

Standard PII filters often miss contextual data leakage in RAG systems, such as non-regulated attribute clusters that collectively identify individuals. We introduce a Privacy Policy Enforcement (PPE) framework using dual one-class density estimators with fused text embeddings and a calibrated abstain region for out-of-distribution inputs. Using an axis-stratified, multi-LLM synthetic data pipeline across medicine, finance, and law, we found that traditional Gaussian Mixture baselines fail on borderline-safe stress tests by focusing on linguistic register rather than content. Our proposed T3+OCSVM detector, trained on safe and borderline-safe data, achieves a borderline AUROC of 0.93+ while reducing false positives by 44-55 percentage points and maintaining millisecond latency. Compared to supervised MLP classifiers or 14B-parameter LLM judges, our framework offers superior operational suitability, as the former suffers from high abstention rates and the latter from latency and calibration issues. This methodology provides a robust stress-testing standard for any synthetic-data-trained classifier.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical dual one-class estimator setup for spotting contextual privacy leaks in RAG that basic filters miss, but the gains rest on synthetic data whose match to real deployments is unproven.

The main takeaway is that this work builds a Privacy Policy Enforcement guardrail for RAG by running two one-class density estimators on fused embeddings and adding a calibrated abstain zone for out-of-distribution inputs. The authors train on safe plus borderline-safe synthetic examples and report an AUROC above 0.93 on borderline cases along with a 44-55 percentage point false-positive drop versus Gaussian mixtures, all at millisecond latency.

Referee Report

3 major / 2 minor

Summary. The paper introduces a Privacy Policy Enforcement (PPE) framework for RAG systems that employs dual one-class density estimators (specifically T3+OCSVM) on fused text embeddings together with a calibrated abstain region for OOD inputs. Using an axis-stratified multi-LLM synthetic data generation pipeline spanning medicine, finance, and law, the authors report that their detector achieves a borderline AUROC above 0.93, reduces false positives by 44-55 percentage points relative to Gaussian Mixture baselines, and operates at millisecond latency while outperforming supervised MLP classifiers and large LLM judges on operational metrics.

Significance. If the central performance claims hold under more rigorous validation, the work offers a practical, low-latency guardrail for contextual privacy leakage in RAG deployments that standard PII filters miss. The emphasis on one-class learning trained on safe and borderline-safe examples, combined with an explicit stress-testing protocol for synthetic-data classifiers, could influence guardrail design in regulated domains.

major comments (3)

Abstract and Evaluation section: the reported borderline AUROC of 0.93+ and 44-55 pp false-positive reductions are given without error bars, number of runs, or ablation studies on embedding fusion and abstain-region calibration; these omissions prevent assessment of whether the gains over Gaussian Mixture, MLP, and LLM baselines are statistically robust or sensitive to hyper-parameters.
Synthetic data pipeline description (Abstract and §3): the central operational claims rest on the assumption that axis-stratified multi-LLM generated borderline-safe examples reproduce the content-based privacy leakage distributions encountered in real RAG systems across medicine, finance, and law. No cross-validation against real query logs or attribute-cluster statistics is presented, leaving open the possibility that reported AUROC and FP reductions reflect LLM artifacts rather than transferable leakage patterns.
Comparison to baselines (Evaluation): the superiority claims versus 14B-parameter LLM judges cite latency and calibration issues, yet no quantitative latency measurements or calibration plots (e.g., ECE or reliability diagrams) are referenced for the proposed T3+OCSVM detector itself, making the operational-suitability argument incomplete.
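The calibration metric named in the third comment, Expected Calibration Error, can be sketched in a few lines. This assumes the detector's raw scores have already been mapped to confidences in [0, 1], a step the page does not describe; the function name, the ten equal-width bins, and the sample values are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: the sample-weighted average gap between
    mean confidence and empirical accuracy within each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(n_bins - 1, int(conf * n_bins))  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Invented example: four predictions at 95% confidence, three of them correct.
overconfident = expected_calibration_error([0.95] * 4, [True, True, True, False])
print(round(overconfident, 3))
```

A reliability diagram plots the same per-bin `avg_conf` against `accuracy`, so the two artifacts the referee asks for can come from one pass over the predictions.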

minor comments (2)

Clarify the precise definition of 'borderline-safe' examples and the axis stratification criteria used in the synthetic pipeline; a short table or pseudocode would improve reproducibility.
The manuscript should state the embedding model and dimensionality explicitly when describing the fused text embeddings.
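Why the dimensionality matters becomes clearer when two common fusion strategies are written out. The sketch below contrasts concatenation with averaging over L2-normalized encoder outputs; the page does not state which strategy the paper uses, and the function names and toy vectors here are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_concat(embeddings):
    """Concatenate normalized encoder outputs; output size is the sum of input sizes."""
    fused = []
    for e in embeddings:
        fused.extend(l2_normalize(e))
    return fused


def fuse_average(embeddings):
    """Average normalized encoder outputs; requires all encoders to share one size."""
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("averaging needs embeddings of equal dimensionality")
    normed = [l2_normalize(e) for e in embeddings]
    return [sum(e[i] for e in normed) / len(normed) for i in range(dim)]


# Hypothetical outputs of three encoders for one query (2-D for readability).
encoder_outputs = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]

concat_vec = fuse_concat(encoder_outputs)
average_vec = fuse_average(encoder_outputs)
print(len(concat_vec), len(average_vec))
```

Concatenation keeps each encoder's signal separate at the cost of a larger input to the density estimator, while averaging only works when the encoders share a dimensionality, which is why the referee's request for explicit model names and sizes bears on reproducibility.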

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. We plan to incorporate several revisions to address the concerns raised.

Referee: Abstract and Evaluation section: the reported borderline AUROC of 0.93+ and 44-55 pp false-positive reductions are given without error bars, number of runs, or ablation studies on embedding fusion and abstain-region calibration; these omissions prevent assessment of whether the gains over Gaussian Mixture, MLP, and LLM baselines are statistically robust or sensitive to hyper-parameters.

Authors: We agree that the current presentation lacks sufficient statistical detail. In the revised version, we will report results averaged over 10 independent training and evaluation runs, including error bars representing standard deviation for both AUROC and false positive reduction metrics. We will also include ablation studies examining the impact of different embedding fusion methods (e.g., concatenation vs. averaging) and variations in the abstain region calibration thresholds, demonstrating the robustness of our performance gains. revision: yes
Referee: Synthetic data pipeline description (Abstract and §3): the central operational claims rest on the assumption that axis-stratified multi-LLM generated borderline-safe examples reproduce the content-based privacy leakage distributions encountered in real RAG systems across medicine, finance, and law. No cross-validation against real query logs or attribute-cluster statistics is presented, leaving open the possibility that reported AUROC and FP reductions reflect LLM artifacts rather than transferable leakage patterns.

Authors: This is a valid concern regarding the generalizability of our synthetic data approach. We designed the multi-LLM, axis-stratified pipeline specifically to generate diverse and challenging borderline-safe examples across the specified domains. However, we do not have access to proprietary real-world RAG query logs for cross-validation, as such data would contain sensitive information. We will expand §3 with additional details on the generation process and add a new subsection on limitations, explicitly discussing the potential influence of LLM artifacts and the need for future validation on real data where possible. revision: partial
Referee: Comparison to baselines (Evaluation): the superiority claims versus 14B-parameter LLM judges cite latency and calibration issues, yet no quantitative latency measurements or calibration plots (e.g., ECE or reliability diagrams) are referenced for the proposed T3+OCSVM detector itself, making the operational-suitability argument incomplete.

Authors: We will enhance the Evaluation section by providing explicit quantitative latency measurements for the T3+OCSVM detector, including average and percentile inference times on standard hardware. Additionally, we will include calibration analysis with Expected Calibration Error (ECE) values and reliability diagrams for our detector to enable a complete comparison with the LLM-based baselines. revision: yes
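The reporting promised in these responses (mean and standard deviation over repeated runs, plus average and percentile latency) can be sketched with the Python standard library. The AUROC values, the stand-in `score_fn`, and the helper names are hypothetical; the nearest-rank percentile is one common convention among several.

```python
import math
import statistics
import time


def summarize_runs(values):
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(values), statistics.stdev(values)


def nearest_rank(sorted_values, percentile):
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(percentile / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_percentiles(score_fn, inputs, percentiles=(50, 95, 99)):
    """Time one call of score_fn per input and report latency in milliseconds."""
    samples_ms = []
    for x in inputs:
        start = time.perf_counter()
        score_fn(x)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    return {p: nearest_rank(samples_ms, p) for p in percentiles}


# Invented per-run AUROC values, for illustration only.
mean_auroc, std_auroc = summarize_runs([0.90, 0.92, 0.94])
print(f"AUROC {mean_auroc:.3f} +/- {std_auroc:.3f}")

# Stand-in for a detector's scoring call.
latencies = latency_percentiles(lambda x: sum(v * v for v in x), [[0.1] * 64] * 200)
print(latencies)
```

Reporting tail percentiles alongside the mean matters for a guardrail that sits on the request path, since a slow tail affects users even when the average is in the millisecond range.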

standing simulated objections not resolved

Validation of the synthetic data pipeline through cross-validation against real query logs or attribute-cluster statistics from production RAG systems.

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out synthetic data with no derivations or self-referential definitions

The manuscript presents an applied ML framework for privacy policy enforcement in RAG systems. It trains a T3+OCSVM detector on axis-stratified synthetic safe and borderline-safe examples and reports standard evaluation metrics (borderline AUROC 0.93+, FP reduction) on held-out portions of that data. No equations, mathematical derivations, parameter-fitting steps that are then relabeled as predictions, or load-bearing self-citations appear in the abstract or described methodology. The central claims are falsifiable empirical performance numbers rather than quantities defined in terms of themselves. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review limited to abstract; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that synthetic data distributions capture real contextual leakage patterns and that one-class density estimation on embeddings is sufficient to separate safe from borderline content.

pith-pipeline@v0.9.0 · 5731 in / 1310 out tokens · 57643 ms · 2026-05-19T20:22:58.470375+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each entry lists the theorem, followed by a tag describing the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: dual one-class density estimator over fused text embeddings with calibrated abstain region... T3+OCSVM detector... borderline-safe stress test

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: axis-stratified multi-LLM synthetic data pipeline across medicine, finance, and law

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 6 internal anchors


[1] [1]

Sankar, B

D. Gangulyet al., “Trust the typical,”arXiv preprint arXiv:2602.04581, 2026

work page arXiv 2026

[2] [2]

k-anonymity: A model for protecting privacy,

L. Sweeney, “k-anonymity: A model for protecting privacy,”Interna- tional Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, 2002

work page 2002

[3] [3]

Extracting training data from large language models,

N. Carliniet al., “Extracting training data from large language models,” in30th USENIX Security Symposium, 2021

work page 2021

[4] [4]

Quantifying memorization across neural language models,

——, “Quantifying memorization across neural language models,” in International Conference on Learning Representations (ICLR), 2023

work page 2023

[5] [5]

Scalable Extraction of Training Data from (Production) Language Models

M. Nasret al., “Scalable extraction of training data from (production) language models,”arXiv preprint arXiv:2311.17035, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[6] [6]

Membership inference attacks against machine learning models,

R. Shokriet al., “Membership inference attacks against machine learning models,” inIEEE Symposium on Security and Privacy (SP), 2017

work page 2017

[7] [7]

Membership inference attacks against language models via neighbourhood comparison,

J. Matternet al., “Membership inference attacks against language models via neighbourhood comparison,” inFindings of the ACL, 2023

work page 2023

[8] [8]

Exploring membership inference vulnerabilities in clinical large language models,

A. Nemecek, Z. Yun, Z. Rahmani, Y . Harel, V . Chaudhary, M. Sharif, and E. Ayday, “Exploring membership inference vulnerabilities in clinical large language models,”arXiv preprint arXiv:2510.18674, 2025

work page arXiv 2025

[9] [9]

Retrieval-augmented generation for knowledge-intensive nlp tasks,

P. Lewiset al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” inAdvances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020

[10] [10]

Realm: Retrieval-augmented language model pre- training,

K. Guuet al., “Realm: Retrieval-augmented language model pre- training,” inProceedings of the 37th International Conference on Machine Learning (ICML), 2020

work page 2020

[11] [11]

The good and the bad: Exploring privacy issues in retrieval-augmented generation,

Y . Zenget al., “The good and the bad: Exploring privacy issues in retrieval-augmented generation,” inFindings of the ACL 2024, 2024

work page 2024

[12] [12]

Cohen, R

S. Cohenet al., “Compromptmized: Unleashing zero-click worms that target genai-powered applications,”arXiv preprint arXiv:2403.02817, 2024

work page arXiv 2024

[13] [13]

Circumventing steerability in retrieval-augmented genera- tion,

Z. Qiet al., “Circumventing steerability in retrieval-augmented genera- tion,”arXiv preprint arXiv:2403.04832, 2024

work page arXiv 2024

[14] S. Yao et al., “React: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations (ICLR), 2023

[15] T. Schick et al., “Toolformer: Language models can teach themselves to use tools,” in Advances in Neural Information Processing Systems (NeurIPS), 2023

[16] N. Anderson et al., “Membership inference attacks on retrieval-augmented generation,” arXiv preprint arXiv:2406.12031, 2024

[17] M. Namazi, A. Nemecek, and E. Ayday, “Zkprov: A zero-knowledge approach to dataset provenance for large language models,” arXiv preprint arXiv:2506.20915, 2025

[18] US Congress, “Health insurance portability and accountability act of 1996 (hipaa),” Pub. L. 104-191, 1996

[19] A. Machanavajjhala et al., “l-diversity: Privacy beyond k-anonymity,” ACM Transactions on Knowledge Discovery from Data (TKDD), 2007

[20] N. Li et al., “t-closeness: Privacy beyond k-anonymity and l-diversity,” in IEEE ICDE, 2007

[21] K. El Emam et al., “A systematic review of re-identification attacks on health data,” PLoS ONE, 2011

[22] L. Rocher et al., “Estimating the success of re-identifications in incomplete datasets using generative models,” Nature Communications, 2019

[23] I. Pilán et al., “The text anonymization benchmark (tab): A specialized corpus for measuring the effectiveness of de-identification,” Computational Linguistics, 2022

[24] N. Lukas et al., “Analyzing leakage of personally identifiable information in language models,” in IEEE Symposium on Security and Privacy (SP), 2023

[25] L. Ouyang et al., “Training language models to follow instructions with human feedback,” NeurIPS, 2022

[26] Y. Bai et al., “Constitutional ai: Harmlessness from ai feedback,” arXiv preprint arXiv:2212.08073, 2022

[27] H. Inan et al., “Llama guard: Llm-based input-output safeguard for human-ai conversations,” arXiv preprint arXiv:2312.06674, 2023

[28] T. Rebedea et al., “Nemo guardrails: A toolkit for controllable and safe llm applications,” in EMNLP System Demonstrations, 2023

[29] S. Han et al., “Wildguard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of llms,” Advances in Neural Information Processing Systems (NeurIPS), 2024

[30] A. Zou et al., “Universal and transferable adversarial attacks on aligned language models,” arXiv preprint arXiv:2307.15043, 2023

[31] A. Wei et al., “Jailbroken: How does llm safety training fail?” in NeurIPS, 2023

[32] G. McLachlan and K. Basford, Mixture Models: Inference and Applications to Clustering. Marcel Dekker, 1988

[33] B. Schölkopf et al., “Support vector method for novelty detection,” in NeurIPS, 1999

[34] L. Ruff et al., “Deep one-class classification,” in Proceedings of the 35th ICML, 2018

[35] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. John Wiley and Sons, 2006

[36] E. Nalisnick et al., “Detecting out-of-distribution inputs to deep generative models using typicality,” arXiv preprint arXiv:1906.02994, 2019

[37] W. Morningstar, C. Ham, A. Gallagher, B. Lakshminarayanan, A. Alemi, and J. Dillon, “Density of states estimation for out of distribution detection,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 3232–3240

[38] D. Ganguly, W. R. Morningstar, A. S. Yu, and V. Chaudhary, “Forte: Finding outliers with representation typicality estimation,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=7XNgVPxCiA

[39] W. Chen, V. Singh, Z. Rahmani, D. Ganguly, M. Hariri, and V. Chaudhary, “$K^4$: Online log anomaly detection via unsupervised typicality learning,” arXiv preprint arXiv:2507.20051, 2025

[40] D. Ganguly, S. Kumar, I. Balappanawar, W. Chen, S. Kambhatla, S. Iyengar, S. Kalyanaraman, P. Kumaraguru, and V. Chaudhary, “Labeling copilot: A deep research agent for automated data curation in computer vision,” arXiv preprint arXiv:2509.22631, 2025

[41] M. Lu, P. K. Tripathi, M. Shteyn, D. Ganguly, R. H. French, V. Chaudhary, and Y. Wu, “Context determines optimal architecture in materials segmentation,” arXiv preprint arXiv:2602.04154, 2026

[42] J. Ren, P. J. Liu, E. Fertig, J. Snoek, R. Poplin, M. Depristo, J. Dillon, and B. Lakshminarayanan, “Likelihood ratios for out-of-distribution detection,” Advances in Neural Information Processing Systems, vol. 32, 2019

[43] A. Ben Abacha and D. Demner-Fushman, “A question-entailment approach to question answering,” BMC Bioinformatics, vol. 20, no. 511, 2019, dataset: https://github.com/abachaa/MedQuAD

[44] adrianf12, “Medical Q&A vignettes (adrianf12),” HuggingFace Datasets, https://huggingface.co/datasets/adrianf12/healthcare conversational prompt completion 10k, 2024

[45] kabatubare, “Medical conversational Q&A (kabatubare),” HuggingFace Datasets, https://huggingface.co/datasets/Kabatubare/medical/viewer/default/train, 2024

[46] P. Islam, A. Kannappan, D. Kiela, R. Qian, N. Scherrer, and B. Vidgen, “FinanceBench: A new benchmark for financial question answering,” arXiv preprint arXiv:2311.11944, 2023, dataset: https://huggingface.co/datasets/PatronusAI/financebench

[47] Z. Chen, W. Chen, C. Smiley, S. Shah, I. Borova, D. Langdon, R. Moussa, M. Beane, T.-H. Huang, B. Routledge, and W. Y. Wang, “FinQA: A dataset of numerical reasoning over financial data,” in Empirical Methods in Natural Language Processing (EMNLP), 2021, dataset: https://github.com/czyssrs/FinQA

[48] Stack Exchange, Inc., “Money Stack Exchange data dump,” Internet Archive Stack Exchange Collection, https://archive.org/download/stackexchange/money.stackexchange.com.7z, 2024, community Q&A under CC BY-SA 4.0; site: https://money.stackexchange.com

[49] D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt, “Measuring massive multitask language understanding,” in International Conference on Learning Representations (ICLR), 2021, dataset (subset professional_law): https://huggingface.co/datasets/cais/mmlu/viewer/professional_law

[50] U. Butler, “Open Australian legal Q&A,” HuggingFace Datasets, https://huggingface.co/datasets/umarbutler/open-australian-legal-qa, 2023

[51] dzunggg, “LegalQA-v1: Legal question answering dataset,” HuggingFace Datasets, https://huggingface.co/datasets/dzunggg/legal-qa-v1, 2023

[52] Stack Exchange, Inc., “Law Stack Exchange data dump,” Internet Archive Stack Exchange Collection, https://archive.org/download/stackexchange/law.stackexchange.com.7z, 2024, community Q&A under CC BY-SA 4.0; site: https://law.stackexchange.com

[53] D. Aumiller, A. Chouhan, and M. Gertz, “EUR-Lex-Sum: A multi- and cross-lingual dataset for long-form summarization in the legal domain,” in Empirical Methods in Natural Language Processing (EMNLP), 2022, dataset: https://huggingface.co/datasets/dennlinger/eur-lex-sum.

APPENDIX

This appendix collects per-domain texture that did not fit in the merged main s...