{"paper":{"title":"Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Domain-aware LLM prompting detects PII in math tutoring dialogues while preserving instructional numbers.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Bakhtawar Ahtisham, Chris Shaw, Daryl Hedley, Doug Pietrzak, Jinsook Lee, Jorge Dias, Kirk Vanacore, René F. Kizilcec, Ruth Schäfer, Zhuqian Zhou","submitted_at":"2026-02-18T16:12:46Z","abstract_excerpt":"Large-scale sharing of dialogue data is key to advancing the science of teaching and learning, yet rigorous de-identification remains a major barrier. In mathematics tutoring transcripts, numeric expressions frequently resemble structured identifiers (e.g., dates or IDs), leading generic Personally Identifiable Information (PII) detection systems to over-redact core instructional content and reduce data utility. This work asks how to detect PII while preserving educational utility, focusing on this \"numeric ambiguity\" problem. 
We introduce MathEd-PII, the first benchmark dataset for PII detect"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Domain-aware prompting, including both math-aware (F1: 0.802) and segment-aware versions (F1: 0.821), substantially outperforms the baseline (F1: 0.379) while reducing numeric false positives.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the human-in-the-loop LLM annotation process produces reliable ground-truth PII labels that generalize to unseen math tutoring dialogues without introducing systematic bias toward certain numeric patterns.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"The MathEd-PII benchmark shows that math-aware and segment-aware LLM prompting raises PII detection F1 from 0.379 to 0.821 while cutting false redactions of instructional numbers.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Domain-aware LLM prompting detects PII in math tutoring dialogues while preserving instructional numbers.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"4263b66280bd753a995280dbc6f5fb7b5d13b58d288e87fa12b3997affb1b238"},"source":{"id":"2602.16571","kind":"arxiv","version":3},"verdict":{"id":"2739c85f-4ef1-4018-8163-80c0628bb2ba","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T21:12:34.060180Z","strongest_claim":"Domain-aware prompting, including both math-aware (F1: 0.802) and segment-aware versions (F1: 0.821), substantially outperforms the baseline (F1: 0.379) while reducing numeric false positives.","one_line_summary":"The MathEd-PII benchmark shows that math-aware and segment-aware LLM prompting raises PII detection F1 from 0.379 to 0.821 
while cutting false redactions of instructional numbers.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the human-in-the-loop LLM annotation process produces reliable ground-truth PII labels that generalize to unseen math tutoring dialogues without introducing systematic bias toward certain numeric patterns.","pith_extraction_headline":"Domain-aware LLM prompting detects PII in math tutoring dialogues while preserving instructional numbers."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2602.16571/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}