{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:U5D6HC2VS2GCHQZBKOXHVRDGSN","short_pith_number":"pith:U5D6HC2V","schema_version":"1.0","canonical_sha256":"a747e38b55968c23c32153ae7ac466934b7c05785097fe084e950df749c2c5cb","source":{"kind":"arxiv","id":"2603.26074","version":2},"attestation_state":"computed","paper":{"title":"Not All Entities are Created Equal: A Dynamic Anonymization Framework for Privacy-Preserving Retrieval-Augmented Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"TRIP-RAG dynamically selects only high-risk entities for anonymization in RAG knowledge bases by scoring marginal privacy risk, knowledge divergence, and topical relevance.","cross_cats":[],"primary_cat":"cs.CR","authors_text":"Enye Wang, Guo Jia, Qingkai Zeng, Ruijie Wang, Ruiqi He, Xinyuan Zhu, Zekun Fei, Zheli Liu","submitted_at":"2026-03-27T05:03:24Z","abstract_excerpt":"Retrieval-Augmented Generation (RAG) enhances the utility of Large Language Models (LLMs) by retrieving external documents. Since the knowledge databases in RAG are predominantly utilized via cloud services, private data in sensitive domains such as finance and healthcare faces the risk of personal information leakage. Thus, effectively anonymizing knowledge bases is crucial for privacy preservation. Existing studies equate the privacy risk of text to the linear superposition of the privacy risks of individual, isolated sensitive entities. The \"one-size-fits-all\" full processing of all sensiti"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2603.26074","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CR","submitted_at":"2026-03-27T05:03:24Z","cross_cats_sorted":[],"title_canon_sha256":"9ae9ca29dec8f34b6d995c3d1caedfcc8522b8df91994811a3f297269d23e04d","abstract_canon_sha256":"e3e1a88f01cfd624ccf1609bcc28ba042780ce37c6275038a1b03d51f77a4496"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T02:44:30.631876Z","signature_b64":"M6qBOxs5ht13pdYYognfBUPF8orpNEJ+5CvTipWL5QPoxC/bnGl/OlfHLuXuPM1xmNSIBATPCL7EAEVRH1G6Bw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"a747e38b55968c23c32153ae7ac466934b7c05785097fe084e950df749c2c5cb","last_reissued_at":"2026-05-18T02:44:30.631449Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T02:44:30.631449Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Not All Entities are Created Equal: A Dynamic Anonymization Framework for Privacy-Preserving Retrieval-Augmented Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"TRIP-RAG dynamically selects only high-risk entities for anonymization in RAG knowledge bases by scoring marginal privacy risk, knowledge divergence, and topical relevance.","cross_cats":[],"primary_cat":"cs.CR","authors_text":"Enye Wang, Guo Jia, Qingkai Zeng, Ruijie Wang, Ruiqi He, Xinyuan Zhu, Zekun Fei, Zheli Liu","submitted_at":"2026-03-27T05:03:24Z","abstract_excerpt":"Retrieval-Augmented Generation (RAG) enhances the utility of Large Language Models (LLMs) by retrieving external documents. Since the knowledge databases in RAG are predominantly utilized via cloud services, private data in sensitive domains such as finance and healthcare faces the risk of personal information leakage. Thus, effectively anonymizing knowledge bases is crucial for privacy preservation. Existing studies equate the privacy risk of text to the linear superposition of the privacy risks of individual, isolated sensitive entities. The \"one-size-fits-all\" full processing of all sensiti"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"While maintaining privacy protection comparable to full anonymization, TRIP-RAG's Recall@k decreases by less than 35% compared to the original data, and the generation quality improves by up to 56% over existing baselines; theoretical analysis and experiments indicate it can effectively reduce context inference risks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the three context-aware metrics (marginal privacy risk, knowledge divergence, topical relevance) can be reliably quantified and combined to correctly identify only the truly high-risk entities without missing leaks or over-anonymizing useful content.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"TRIP-RAG dynamically anonymizes only high-risk entities in RAG knowledge bases via three context-aware metrics, achieving privacy comparable to full anonymization with under 35% recall drop and up to 56% better generation quality than baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"TRIP-RAG dynamically selects only high-risk entities for anonymization in RAG knowledge bases by scoring marginal privacy risk, knowledge divergence, and topical relevance.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8f0f43bc247de2bb515e87e1521a59fa8515d17c355c63c33495e065276e9dd4"},"source":{"id":"2603.26074","kind":"arxiv","version":2},"verdict":{"id":"f331b49e-503a-4f51-b85a-b2a0561064c9","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T23:25:05.693032Z","strongest_claim":"While maintaining privacy protection comparable to full anonymization, TRIP-RAG's Recall@k decreases by less than 35% compared to the original data, and the generation quality improves by up to 56% over existing baselines; theoretical analysis and experiments indicate it can effectively reduce context inference risks.","one_line_summary":"TRIP-RAG dynamically anonymizes only high-risk entities in RAG knowledge bases via three context-aware metrics, achieving privacy comparable to full anonymization with under 35% recall drop and up to 56% better generation quality than baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the three context-aware metrics (marginal privacy risk, knowledge divergence, topical relevance) can be reliably quantified and combined to correctly identify only the truly high-risk entities without missing leaks or over-anonymizing useful content.","pith_extraction_headline":"TRIP-RAG dynamically selects only high-risk entities for anonymization in RAG knowledge bases by scoring marginal privacy risk, knowledge divergence, and topical relevance."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2603.26074","created_at":"2026-05-18T02:44:30.631501+00:00"},{"alias_kind":"arxiv_version","alias_value":"2603.26074v2","created_at":"2026-05-18T02:44:30.631501+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2603.26074","created_at":"2026-05-18T02:44:30.631501+00:00"},{"alias_kind":"pith_short_12","alias_value":"U5D6HC2VS2GC","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"U5D6HC2VS2GCHQZB","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"U5D6HC2V","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN","json":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN.json","graph_json":"https://pith.science/api/pith-number/U5D6HC2VS2GCHQZBKOXHVRDGSN/graph.json","events_json":"https://pith.science/api/pith-number/U5D6HC2VS2GCHQZBKOXHVRDGSN/events.json","paper":"https://pith.science/paper/U5D6HC2V"},"agent_actions":{"view_html":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN","download_json":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN.json","view_paper":"https://pith.science/paper/U5D6HC2V","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2603.26074&json=true","fetch_graph":"https://pith.science/api/pith-number/U5D6HC2VS2GCHQZBKOXHVRDGSN/graph.json","fetch_events":"https://pith.science/api/pith-number/U5D6HC2VS2GCHQZBKOXHVRDGSN/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN/action/timestamp_anchor","attest_storage":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN/action/storage_attestation","attest_author":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN/action/author_attestation","sign_citation":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN/action/citation_signature","submit_replication":"https://pith.science/pith/U5D6HC2VS2GCHQZBKOXHVRDGSN/action/replication_record"}},"created_at":"2026-05-18T02:44:30.631501+00:00","updated_at":"2026-05-18T02:44:30.631501+00:00"}