Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset
Pith reviewed 2026-05-15 21:12 UTC · model grok-4.3
The pith
Domain-aware LLM prompting detects PII in math tutoring dialogues while preserving instructional numbers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generic PII detection systems over-redact numeric expressions in math tutoring dialogues due to ambiguity with structured identifiers, but domain-aware prompting strategies for LLMs, including math-aware and segment-aware variants, substantially improve detection accuracy on the new MathEd-PII benchmark dataset while reducing numeric false positives and thereby preserving educational utility.
What carries the argument
Density-based segmentation to locate math-dense regions, paired with math-aware and segment-aware prompting of LLMs to distinguish instructional numbers from PII.
If this is right
- Generic PII detectors are inadequate for domain-specific educational dialogues because they cannot resolve numeric ambiguity.
- Domain-aware methods enable larger-scale sharing of de-identified math tutoring data without destroying core instructional content.
- Segment-aware prompting delivers the highest accuracy by incorporating local density information during detection.
- Human-in-the-loop annotation offers a practical route to building reliable domain-specific PII benchmarks.
Where Pith is reading between the lines
- The same numeric ambiguity problem likely appears in other quantitative tutoring domains such as physics or chemistry dialogues.
- Direct integration of mathematical expression parsers into detection pipelines could reduce false positives beyond what prompting alone achieves.
- These techniques could support standardized privacy practices for releasing learning analytics datasets from schools and platforms.
- Testing the prompting strategies on real-time live tutoring sessions rather than transcripts would reveal additional practical constraints.
Load-bearing premise
The human-in-the-loop LLM annotation process yields reliable ground-truth PII labels that generalize to unseen math tutoring dialogues without systematic bias in numeric patterns.
What would settle it
Re-annotating a fresh held-out collection of math tutoring transcripts with independent human reviewers and measuring whether segment-aware prompting still achieves F1 above 0.8 with low numeric false positives.
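The settling test hinges on recomputing F1 against independently re-annotated labels. A minimal span-level scorer might look like the sketch below; exact-match on (start, end, type) spans is an assumption here, since the review does not state the paper's matching criterion (token-level or partial-overlap scoring would give different numbers).

```python
# Minimal span-level PII scorer over exact-match (start, end, type) spans.
# Hypothetical data; the paper's own matching criterion may differ.

def span_f1(gold, pred):
    """Precision, recall, F1 over sets of (start, end, type) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(0, 5, "PERSON"), (20, 30, "DATE_OF_BIRTH")]
pred = [(0, 5, "PERSON"), (40, 44, "DATE_OF_BIRTH")]  # one hit, one false positive
p, r, f = span_f1(gold, pred)
print(round(f, 3))  # 0.5
```

With fresh annotations as `gold` and segment-aware detections as `pred`, the proposed bar is simply `f > 0.8` on the held-out set.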
Original abstract
Large-scale sharing of dialogue data is key to advancing the science of teaching and learning, yet rigorous de-identification remains a major barrier. In mathematics tutoring transcripts, numeric expressions frequently resemble structured identifiers (e.g., dates or IDs), leading generic Personally Identifiable Information (PII) detection systems to over-redact core instructional content and reduce data utility. This work asks how to detect PII while preserving educational utility, focusing on this "numeric ambiguity" problem. We introduce MathEd-PII, the first benchmark dataset for PII detection in math tutoring dialogues, built with human-in-the-loop LLM annotation. Using density-based segmentation, we show that false PII redactions cluster in math-dense regions, confirming numeric ambiguity as a key failure mode. We then compare four detection strategies: a Presidio baseline and three LLM-based approaches with basic, math-aware, and segment-aware prompting. Domain-aware prompting, including both math-aware (F1: 0.802) and segment-aware versions (F1: 0.821), substantially outperforms the baseline (F1: 0.379) while reducing numeric false positives, demonstrating that de-identification must incorporate domain context to preserve analytic utility. This work provides a new benchmark and evidence that utility-preserving de-identification for tutoring data requires domain-aware modeling.
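The "numeric ambiguity" failure mode the abstract describes is easy to reproduce with a toy detector. The sketch below is illustrative only: the regex, the cue vocabulary, and both helper functions are invented here, not taken from the paper or from Presidio. A generic pattern that flags digit runs as candidate identifiers also fires on instructional arithmetic, while a crude math-context check suppresses those hits.

```python
import re

# Toy illustration of numeric ambiguity (not the paper's pipeline):
# a generic detector that treats digit runs as candidate identifiers
# will also flag instructional numbers in math tutoring turns.

ID_LIKE = re.compile(r"\b\d{3,}\b")  # naive "structured identifier" pattern
MATH_CUES = {"solve", "multiply", "sum", "equals", "plus", "divide", "="}

def generic_hits(text):
    """Generic detector: every 3+ digit run is a candidate identifier."""
    return ID_LIKE.findall(text)

def math_aware_hits(text):
    """Suppress numeric hits when the message carries math-context cues,
    loosely mirroring the paper's idea of conditioning on math context."""
    tokens = set(text.lower().split())
    if tokens & MATH_CUES:
        return []
    return generic_hits(text)

instructional = "Now multiply 365 by 24 to get the total hours."
pii_like = "My student ID is 904112."

print(generic_hits(instructional))     # ['365'] -- false positive
print(math_aware_hits(instructional))  # []
print(math_aware_hits(pii_like))       # ['904112']
```

The paper's actual methods are LLM prompting strategies, not regex rules; this stand-in only shows why domain context changes the false-positive profile on numbers.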
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MathEd-PII, the first benchmark dataset for PII detection in mathematics tutoring dialogues, constructed via human-in-the-loop LLM annotation. It uses density-based segmentation to demonstrate that false PII redactions cluster in math-dense regions, confirming numeric ambiguity as a failure mode for generic detectors. The work then evaluates four strategies—Presidio baseline plus three LLM prompting variants (basic, math-aware, segment-aware)—and reports that domain-aware prompting yields substantial gains (math-aware F1 0.802, segment-aware F1 0.821) over the baseline (F1 0.379) while reducing numeric false positives, arguing that utility-preserving de-identification requires domain context.
Significance. If the empirical results hold under proper validation, the paper supplies a needed domain-specific benchmark and concrete evidence that generic PII tools over-redact instructional content in math dialogues. This could directly support safer large-scale sharing of tutoring transcripts for learning-science research while preserving analytic utility.
major comments (2)
- [Abstract] The headline F1 gains (0.802 and 0.821 vs. 0.379) are presented without any mention of dataset size, inter-annotator agreement, statistical significance tests, or error analysis on numeric false positives; these omissions make it impossible to assess whether the reported outperformance is robust or merely an artifact of the annotation process.
- [Dataset construction] The human-in-the-loop LLM annotation used to create MathEd-PII ground truth introduces a circularity risk because the same class of models is later employed for detection; without a purely human baseline, IAA metrics, or targeted error analysis on ambiguous numeric expressions, the claimed reduction in numeric false positives cannot be confidently attributed to the prompting strategies rather than annotation bias.
minor comments (1)
- [Abstract] The phrase 'density-based segmentation' is used without a one-sentence definition or citation, which may hinder readers who are not already familiar with the technique.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of robustness and potential biases in our evaluation. We address each major comment below and have made targeted revisions to the manuscript to improve transparency and strengthen the claims.
Point-by-point responses
-
Referee: [Abstract] The headline F1 gains (0.802 and 0.821 vs. 0.379) are presented without any mention of dataset size, inter-annotator agreement, statistical significance tests, or error analysis on numeric false positives; these omissions make it impossible to assess whether the reported outperformance is robust or merely an artifact of the annotation process.
Authors: We agree that the abstract should provide sufficient context for readers to evaluate result robustness. The full manuscript already reports dataset size, inter-annotator agreement, significance testing, and numeric error analysis in the Dataset Construction and Results sections. We have revised the abstract to explicitly include dataset size, IAA metrics, and a brief reference to the error analysis on numeric false positives, while respecting the required length constraints. revision: yes
-
Referee: [Dataset construction] The human-in-the-loop LLM annotation used to create MathEd-PII ground truth introduces a circularity risk because the same class of models is later employed for detection; without a purely human baseline, IAA metrics, or targeted error analysis on ambiguous numeric expressions, the claimed reduction in numeric false positives cannot be confidently attributed to the prompting strategies rather than annotation bias.
Authors: We acknowledge the circularity concern inherent to LLM-assisted annotation. The process was strictly human-in-the-loop, with human annotators reviewing, correcting, and finalizing all labels; IAA metrics are reported in the Dataset Construction section to quantify annotator reliability. We have expanded the targeted error analysis on ambiguous numeric expressions in the revised Results section to better attribute performance gains to the prompting strategies. A purely human baseline at this scale was not feasible due to annotation cost, but the independent density-based segmentation analysis (showing false-positive clustering in math-dense regions) provides supporting evidence independent of the detection models. revision: partial
- A complete purely human-annotated baseline for the full MathEd-PII dataset is not available and would require substantial additional resources beyond the current study.
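The rebuttal leans on IAA metrics to establish annotator reliability. For two reviewers making parallel per-token PII / non-PII judgments, Cohen's kappa is the standard statistic; the sketch below is a stdlib implementation with invented labels (the paper's actual IAA measure and label scheme are not given in this review).

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' parallel label sequences."""
    assert len(a) == len(b) and a, "need equal-length, non-empty sequences"
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical per-token labels from two reviewers of the same transcript.
r1 = ["PII", "O", "O", "PII", "O", "O", "O", "PII"]
r2 = ["PII", "O", "O", "O",   "O", "O", "O", "PII"]
print(round(cohens_kappa(r1, r2), 3))  # 0.714
```

A purely human baseline would report exactly this kind of number over independently produced label sequences, which is what the referee's circularity concern asks for.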
Circularity Check
No circularity: empirical benchmark with direct F1 measurements
Full rationale
The paper introduces the MathEd-PII benchmark via human-in-the-loop LLM annotation and reports direct empirical F1 scores (baseline 0.379, math-aware 0.802, segment-aware 0.821) for prompting strategies on numeric ambiguity detection. No equations, parameter fits, derivations, or self-citations appear in the provided text that reduce any claimed result to its own inputs by construction. The performance numbers are straightforward held-out measurements rather than quantities defined or predicted from the authors' prior work, satisfying the self-contained empirical standard with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Human-in-the-loop LLM annotation produces sufficiently accurate PII labels for benchmarking.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION The Educational Data Mining (EDM) community has long relied on large-scale open source data sets from digital learning platforms [14, 18]. Many of these data sources can be easily de-identified because they are composed of action logs that, for the most part, do not typically contain personally identifiable information (PII). As more and ...
-
[2]
RELATED WORK 2.1 Policy Context De-identification is a prerequisite for sharing educational interaction data at scale, but what counts as “sufficient” de-identification depends on the legal regime and the assumed adversary model. [Footnote 2: The benchmark dataset will be available following review.] In the United States, the Children’s On- line Privacy Protection ...
-
[3]
OVERVIEW OF THE CURRENT RESEARCH To answer the research questions outlined above, we follow a three-phase research workflow moving from dataset construction to analytical validation and finally to evaluation of domain-aware de-identification strategies. Phase 1: Dataset Preparation involves constructing MathEd-PII, a benchmark dataset for PII detectio...
-
[4]
PHASE 1: DATASET PREPARATION Reference datasets for PII detection in math education are currently lacking. To enable rigorous evaluation, we constructed a benchmark dataset, MathEd-PII, from a PII-redacted large corpus. 4.1 Source Corpus Our source corpus comprises 1,000 math tutoring sessions (115,620 messages; 769,628 tokens) from a U.S.-based tutor- ...
-
[5]
PHASE 2: MATH SEGMENTATION AND NUMERIC AMBIGUITY [Footnote 5: https://anonymized for blind review] Table 2: PII Statistics Comparison between Source Corpus and MathEd-PII (ordered by the most common PII to the least in MathEd-PII). Category (Source Corpus / MathEd-PII): Transcripts 1,000 / 1,000; Messages 115,620 / 115,620; PII Labels (Total) 5,263 / 1,995; PERSON 1,915 / 1,424; URL 24...
-
[6]
PHASE 3: PII DETECTION AND EVALUATION 6.1 PII Detection Methods 6.1.1 Baseline: Microsoft Presidio As a baseline, we deployed Microsoft Presidio (v2.2), an industry-standard open-source software development kit (SDK) for PII detection. We utilized its default analyzer which orchestrates a set of predefined recognizers. For high-structure entities (e.g...
-
[7]
DISCUSSION AND CONCLUSION This study investigated the challenge of utility-preserving de-identification in the context of math tutoring transcripts, focusing on the phenomenon of numeric ambiguity. By introducing MathEd-PII, the first benchmark dataset for this domain, we provided a rigorous foundation for evaluating PII detection methods that balance p...
-
[8]
A. Caines, H. Yannakoudakis, H. Allen, P. Pérez-Paredes, B. Byrne, and P. Buttery. The teacher-student chatroom corpus version 2: more lessons, new annotation, automatic detection of sequence shifts. In D. Alfter, E. Volodina, T. François, P. Desmet, F. Cornillie, A. Jönsson, and E. Rennes, editors, Proceedings of the 11th Workshop on NLP for Compute...
2022
-
[9]
D. S. Carrell, B. Malin, J. Aberdeen, S. Bayer, and C. Clark. Hiding in plain sight: Use of realistic surrogates to reduce exposure of protected health information in clinical text. Journal of the American Medical Informatics Association, pages 342–348, 2013
-
[10]
G. Deacon and G. Chojnacki. Impacts of upchieve on-demand tutoring on students’ math knowledge and perceptions. middle years math grantee report series. Mathematica, 2023
-
[11]
Federal Trade Commission. Children’s online privacy protection act (coppa) guidance. https://www.ftc.gov/business-guidance/resources/complying-coppa-frequently-asked-questions,
-
[12]
Accessed: 2026-02-10
- [13]
-
[14]
S. L. Garfinkel. De-identification of personal information. NIST Interagency Report 8053, National Institute of Standards and Technology, Oct. 2015
-
[15]
S. L. Garfinkel. De-identifying government data sets: Techniques and governance. NIST Special Publication 800-188, National Institute of Standards and Technology, Sept. 2023
- [16]
-
[17]
L. Holmes, S. Crossley, N. Hayes, D. Kuehl, A. Trumbore, and G. Gutu-Robu. De-identification of student writing in technologically mediated educational settings. In Polyphonic Construction of Smart Learning Ecosystems: Improving Inclusive Digital Education, pages 177–189, Singapore, 2023. Springer Nature Singapore
- [18]
-
[19]
L. Holmes, S. Crossley, J. Wang, and W. Zhang. The cleaned repository of annotated personally identifiable information. In P. Benjamin and D. E. Carrie, editors, Proceedings of the 17th International Conference on Educational Data Mining, pages 790–796, Atlanta, Georgia, USA, July 2024
-
[20]
M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd. spacy: Industrial-strength natural language processing in python. https://doi.org/10.5281/zenodo.1212303, 2020
-
[21]
Z. Huang, W. Xu, and K. Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015
-
[22]
K. R. Koedinger, R. S. Baker, K. Cunningham, A. Skogsholm, B. Leber, and J. Stamper. A data repository for the edm community: The pslc datashop. Handbook of educational data mining, 43:43–56, 2010
-
[23]
Learning Commons Initiative. Learning components. https://docs.learningcommons.org/knowledge-graph/entity-and-relationship-reference/learning-components, 2024. Accessed: 2026-02-05
-
[24]
Microsoft. Presidio: Data protection and de-identification sdk. https://github.com/microsoft/presidio, 2020. GitHub repository. Accessed 2026-01-20
-
[25]
Microsoft. presidio-research: Research utilities for Presidio (including synthetic text generation). https://github.com/microsoft/presidio-research,
- [26]
-
[27]
M. C. Mihaescu and P. S. Popescu. Review on publicly available datasets for educational data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 11(3):e1403, 2021
- [28]
- [29]
-
[30]
I. Neamatullah, M. M. Douglass, L.-W. H. Lehman, A. Reisner, M. Villarroel, W. J. Long, P. Szolovits, G. B. Moody, R. G. Mark, and G. D. Clifford. Automated de-identification of free-text medical records. BMC Medical Informatics and Decision Making, 8(32), 2008
- [31]
-
[32]
M. Savkin, T. Ionov, and V. Konovalov. SPY: Enhancing privacy with synthetic PII detection dataset. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 236–246. Association for Computational Linguistics, 2025
- [33]
-
[34]
K. Singhal, J. Zambrano, L. Pankiewicz, and R. Baker. Educational data de-identification with large language models. In Proceedings of the 17th International Conference on Educational Data Mining (EDM), pages 559–565, 2024
- [35]
-
[36]
A. Stubbs and Ö. Uzuner. Automated systems for the de-identification of longitudinal clinical narratives: Overview of the 2014 i2b2/uthealth shared task track 1. Journal of Biomedical Informatics, 58:S11–S19, 2015
- [37]
-
[38]
U.S. Department of Education, Privacy Technical Assistance Center (PTAC). Data de-identification: An overview of basic terms. https://studentprivacy.ed.gov/sites/default/files/resource_document/file/data_deidentification_terms_0.pdf, 2012. Updated May 2013. Accessed 2026-01-20
-
[39]
U.S. Department of Health and Human Services, Office for Civil Rights. Guidance regarding methods for de-identification of protected health information in accordance with the HIPAA privacy rule. https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf, 2012. Accessed 2026-01-20
-
[40]
J. Zambrano, K. Singhal, L. Pankiewicz, R. Baker, L. Porter, and L. Liu. De-identifying student personally identifying information in discussion forum posts with large language models. Information and Learning Sciences, 126(5/6):401–424, 2025
-
[41]
M. Zent, D. Smith, and S. Woodhead. PIIvot: A lightweight NLP anonymization framework for question-anchored tutoring dialogues. arXiv, 2025
-
[42]
APPENDIX 1: THE PROMPT USED FOR PII QUALITY EVALUATION AND SURROGATE GENERATION Role: You are a Senior PII (Personally Identifiable Information) Analyst and Data Sanitization Expert specializing in math tutoring transcripts. Objective: Analyze transcripts to identify unredacted PII, validate existing redactions, and generate high-quality, context-awa...
-
[43]
Detection: Scan each message for PII from the taxonomy. Some PII has been redacted. Some has not. For every PII instance, identify which PII type in the taxonomy it belongs to
-
[44]
Evaluation: For every PII instance (pre-redacted or newly found), use at least a window of 3 messages above and 3 messages below to determine if the tag is valid. Label as ”PII”, ”Not PII”, or ”Uncertain”
-
[45]
Redaction: If unredacted PII is found, provide the message with the PII replaced by the tag (e.g., <PERSON>)
-
[46]
Surrogation: 4.1. If ”PII” or ”Uncertain”: Generate a specific, realistic surrogate that fits the PII type (e.g., replace <SCHOOL> with ”Northview High”, not ”the school”). Keep the entity name consistent in a transcript. Meanwhile, do not reuse the same names or places across the transcript. If the original PII is known, the generated surrogate should be...
-
[47]
pii type: The identified category from the 17 types of each message containing PII
-
[48]
ai redacted content: The message with <PII TYPE> (only for newly discovered PII; otherwise leave blank)
-
[49]
pii evaluation: ”PII”, ”Not PII”, or ”Uncertain”
-
[50]
surrogate: The specific replacement value for the tag
-
[51]
APPENDIX 2: MATH VOCABULARY The following vocabulary list, categorized by mathematical domain and grade level, was used to calculate the Math Density (Dmath) of messages in Phase 2. Operations: operation, add, addition, adding, sum, total, plus, subtract, subtraction, subtracting, minus, difference, multiply, multiplication, multiplying, times, product, d...
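Appendix 2 defines Math Density (Dmath) through a vocabulary list, but the excerpt above does not reproduce the formula. The sketch below assumes the simplest plausible definition, the fraction of tokens that appear in the math vocabulary; the paper's actual Dmath may weight terms or use a sliding window, and the vocabulary here is only a small subset of the appendix list.

```python
import re

# Assumed definition: D_math = (# math-vocabulary tokens) / (# tokens).
# MATH_VOCAB is a small subset of the paper's Appendix 2 list; the
# paper's actual D_math formula may differ (weighting, windowing).
MATH_VOCAB = {
    "operation", "add", "addition", "sum", "total", "plus",
    "subtract", "minus", "difference", "multiply", "times",
    "product", "divide",
}

def math_density(message):
    """Fraction of alphabetic tokens drawn from the math vocabulary."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not tokens:
        return 0.0
    return sum(t in MATH_VOCAB for t in tokens) / len(tokens)

print(math_density("add the product to the total"))  # 0.5
print(math_density("my phone number is 555-0134"))   # 0.0
```

Messages with high density would fall into the math-dense regions where, per Phase 2, false redactions cluster.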
-
[52]
APPENDIX 3: SEGMENTATION OPTIMIZATION This appendix documents the procedure and results of the threshold optimization used for math segmentation. We conducted a grid search to examine the sensitivity of math-segmentation outcomes to two parameters: the Anchor Threshold (T_anchor) and the Similarity Threshold (T_sim). The goal of this analysis was to assess...
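The grid-search procedure Appendix 3 describes can be sketched as follows. The excerpt does not give the segmentation routine, the objective, or the grid values, so `segment_and_score` is a placeholder objective and both grids are invented for illustration.

```python
import itertools

# Skeleton of an Appendix 3-style grid search over the Anchor Threshold
# (T_anchor) and Similarity Threshold (T_sim). `segment_and_score` is a
# stand-in for the paper's segmentation + quality metric, and the grid
# values are illustrative, not the paper's.

def segment_and_score(t_anchor, t_sim):
    # Placeholder objective peaking at (0.6, 0.4); replace with a real
    # segmentation-quality measure over held-out transcripts.
    return 1.0 - abs(t_anchor - 0.6) - abs(t_sim - 0.4)

anchor_grid = [0.4, 0.5, 0.6, 0.7]
sim_grid = [0.2, 0.3, 0.4, 0.5]

best = max(
    itertools.product(anchor_grid, sim_grid),
    key=lambda ts: segment_and_score(*ts),
)
print(best)  # (0.6, 0.4)
```

Reporting the score surface over the full grid, rather than just the argmax, is what lets the appendix speak to sensitivity.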
-
[53]
APPENDIX 4: THE BASIC PROMPT USED FOR PII DETECTION You are a specialist in PII (Personally Identifiable Information) detection. Your task is to identify ALL PII in the provided message content. PII Types to detect: AGE; COURSE: must be a subject or its acronym with a multi-digit number, e.g., algebra 300, geometry 101, CS 503; only a subject name without ...
-
[54]
APPENDIX 5: THE MATH-AWARE PROMPT USED FOR PII DETECTION You are a specialist in PII (Personally Identifiable Information) detection. Your task is to identify ALL PII in the provided message content that comes from math tutoring sessions. Pay attention that general math content should not be annotated as PII, e.g., math subjects, concepts, symbols, eq...
-
[55]
APPENDIX 6: THE SEGMENT-AWARE PROMPT USED FOR PII DETECTION You are a specialist in PII (Personally Identifiable Information) detection. Your task is to identify ALL PII in the provided message content. If the message is likely to be about mathematics, its “math label” field will have the value “MATH”. Otherwise, the “math label” will be “NON-MATH”. N...
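The segment-aware prompt conditions the detector on a per-message "math label". Wiring that label into the prompt might look like the sketch below; the message schema, the 0.3 threshold, and the template wording are all assumptions (the template paraphrases Appendix 6, and the density score would come from the Phase 2 segmentation).

```python
# Sketch: attaching the "math_label" field that a segment-aware prompt
# reads. The threshold (0.3) and message schema are assumptions; the
# density score stands in for the paper's Phase 2 segmentation output.

def label_message(message, density, threshold=0.3):
    """Tag a message MATH / NON-MATH from its (precomputed) math density."""
    return {
        "content": message,
        "math_label": "MATH" if density >= threshold else "NON-MATH",
    }

# Paraphrase of the Appendix 6 idea, not the paper's exact prompt text.
SEGMENT_AWARE_TEMPLATE = (
    "You are a specialist in PII detection. Identify ALL PII in the "
    "message below. Its \"math_label\" field is {math_label}; numbers in "
    "MATH messages are usually instructional content, not PII.\n"
    "Message: {content}"
)

msg = label_message("Now add 365 and 24.", density=0.4)
prompt = SEGMENT_AWARE_TEMPLATE.format(**msg)
print(msg["math_label"])  # MATH
```

Passing the label explicitly is what distinguishes this variant from the math-aware prompt, which relies on the model to infer math context on its own.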