Evaluating LLM-based Personal Information Extraction and Countermeasures
Pith reviewed 2026-05-23 22:05 UTC · model grok-4.3
The pith
Large language models extract personal information from public profiles more accurately than traditional methods, but prompt injection reduces their advantage.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
What carries the argument
Framework for LLM-based extraction attacks and prompt injection mitigation strategy, benchmarked on ten LLMs and five datasets including synthetic and manually labeled real-world ones.
If this is right
- Attackers obtain a stronger tool for large-scale personal information gathering that supports targeted attacks such as spear phishing.
- Traditional extraction techniques prove insufficient when facing capable LLMs.
- Prompt injection serves as a deployable defense that removes the performance edge of LLM attacks.
- Results hold across a synthetic GPT-4 dataset and three real-world labeled datasets covering eight categories of personal information.
Where Pith is reading between the lines
- Platforms that host public profiles may need to explore automated text modifications as a standard privacy layer.
- The same prompt injection tactic could be adapted to limit LLM processing in other user-content scenarios.
- Attackers could experiment with varied prompt formats, so the defense requires repeated testing against new models.
Load-bearing premise
The manually labeled real-world datasets accurately represent the distribution and variety of personal information in actual public profiles, and the tested LLMs and prompt formats generalize to real attacker usage.
What would settle it
A test on a fresh collection of real profiles where LLM accuracy falls to or below traditional methods, or where prompt injection no longer limits LLM performance, would disprove the central claims.
Figures
read the original abstract
Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods -- such as regular expression, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a systematic measurement study benchmarking LLM-based attacks for extracting eight categories of personal information (name, phone, email, etc.) from public profiles. It presents an attack framework, collects one GPT-4-generated synthetic dataset plus three manually labeled real-world datasets, proposes prompt injection as a novel mitigation, and evaluates ten LLMs against traditional baselines (regex, keyword search, entity detection), claiming LLMs achieve higher accuracy, outperform baselines, and that prompt injection reduces LLM attacks to the effectiveness of traditional methods.
Significance. If the datasets prove representative and results generalize beyond the tested profiles and models, the work supplies concrete empirical data on LLM misuse for privacy attacks and a deployable defense, informing both attacker capabilities and platform countermeasures in security research.
major comments (1)
- [Dataset section] Dataset section: the three manually labeled real-world datasets lack any reported inter-annotator agreement, sampling methodology across platforms, or validation that the eight-category label distribution matches broader public-profile statistics. These omissions are load-bearing for the central claims of LLM outperformance and prompt-injection effectiveness, because labeling noise or sampling bias could produce the observed results as artifacts of the evaluation set rather than intrinsic properties.
minor comments (1)
- [Abstract] Abstract: states collection of 'four datasets' but then reports benchmarking 'using ten LLMs and five datasets'; the inconsistency should be corrected for clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the major comment on the dataset section below.
read point-by-point responses
-
Referee: [Dataset section] Dataset section: the three manually labeled real-world datasets lack any reported inter-annotator agreement, sampling methodology across platforms, or validation that the eight-category label distribution matches broader public-profile statistics. These omissions are load-bearing for the central claims of LLM outperformance and prompt-injection effectiveness, because labeling noise or sampling bias could produce the observed results as artifacts of the evaluation set rather than intrinsic properties.
Authors: We agree that these details are important to include. In the revised manuscript we will report inter-annotator agreement (e.g., Cohen's kappa) for the manual labeling of the three real-world datasets, describe the sampling methodology used across platforms, and add a comparison of the observed eight-category label distributions against available public-profile statistics (or note limitations where such benchmarks are unavailable). These additions will directly address potential concerns about labeling noise or sampling bias. revision: yes
Circularity Check
Empirical benchmark study with no derivations or self-referential fitting
full rationale
This is a measurement study that collects four datasets (one synthetic via GPT-4, three manually labeled real-world), benchmarks ten LLMs against regex/keyword/entity baselines on eight personal-information categories, and evaluates a prompt-injection mitigation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. Results rest on external models, external datasets, and direct comparisons rather than internal definitions or author-prior uniqueness theorems, so the evaluation chain is self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Manual labeling of eight categories of personal information in real-world datasets is accurate and unbiased.
Forward citations
Cited by 1 Pith paper
-
Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
LLM agents can reconstruct high-fidelity personal profiles from minimal PII seeds with over 90% accuracy in under 10 minutes at less than $3 cost, exposing three escalating tiers of privacy risks.
Reference graph
Works this paper leans on
-
[1]
spaCy: Industrial-strength NLP. https://github. com/explosion/spaCy, 2019
work page 2019
-
[2]
https://github.com/lorey/mlscraper/ tree/master, 2020
mlscraper: Scrape data from HTML pages automat- ically. https://github.com/lorey/mlscraper/ tree/master, 2020
work page 2020
-
[3]
https://pypi.org/project/htmldocx/, 2021
htmldocx. https://pypi.org/project/htmldocx/, 2021
work page 2021
-
[4]
https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023
Common regex used to extract data from Html. https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023
work page 2023
-
[5]
https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023
GPT-4 leaks. https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023
work page 2023
-
[6]
https://github.com/jarrekk/imgkit, 2023
imgkit. https://github.com/jarrekk/imgkit, 2023
work page 2023
-
[7]
https://github.com/InternLM/ InternLM, 2023
Internlm. https://github.com/InternLM/ InternLM, 2023
work page 2023
-
[8]
https://pypi.org/project/ pyhtml2pdf/, 2023
pyhtml2pdf. https://pypi.org/project/ pyhtml2pdf/, 2023
work page 2023
-
[9]
https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024
19th-century American physicians. https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024
work page 2024
-
[10]
List of Top 100 Famous People. https://www. biographyonline.net/people/famous-100.html, 2024
work page 2024
-
[11]
https://github.com/ matthewwithanm/python-markdownify, 2024
python-markdownify. https://github.com/ matthewwithanm/python-markdownify, 2024
work page 2024
-
[12]
Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems
Sahar Abdelnabi and Mario Fritz. Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems. In USENIX Security, 2023
work page 2023
-
[13]
FLAIR: An easy-to-use framework for state-of-the-art NLP
Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland V ollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL, 2019
work page 2019
-
[14]
Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al
Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al. Palm 2 technical report. arXiv, 2023
work page 2023
-
[15]
Rat- gpt: Turning online llms into proxies for malware at- tacks
Mika Beckerich, Laura Plein, and Sergio Coronado. Rat- gpt: Turning online llms into proxies for malware at- tacks. arXiv, 2023
work page 2023
-
[16]
Mazal Bethany, Athanasios Galiopoulos, Emet Bethany, Mohammad Bahrami Karkevandi, Nishant Vishwamitra, and Peyman Najafirad. Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings. arXiv, 2024
work page 2024
-
[17]
Language models are few-shot learners
Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020
work page 2020
-
[18]
Sparks of artificial general intelligence: Early experi- ments with gpt-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experi- ments with gpt-4. arXiv, 2023
work page 2023
-
[19]
A llm assisted exploitation of ai- guardian
Nicholas Carlini. A llm assisted exploitation of ai- guardian. arXiv, 2023
work page 2023
-
[20]
Forbes: Five novel phishing tactics
Perry Carpenter. Forbes: Five novel phishing tactics. https://www.forbes.com/councils/forbesbusinesscouncil /2025/01/23/five-novel-phishing-tactics-to-beware-of- and-how-to-protect-your-company/, 2025
work page 2025
-
[21]
Can llm-generated misinfor- mation be detected? arXiv, 2023
Canyu Chen and Kai Shu. Can llm-generated misinfor- mation be detected? arXiv, 2023
work page 2023
-
[22]
BERT: Pre-training of deep bidi- rectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidi- rectional transformers for language understanding. In NAACL-HLT, 2019
work page 2019
-
[23]
On the effect of pretraining corpora on in- context learning by a large-scale language model
Shin et al. On the effect of pretraining corpora on in- context learning by a large-scale language model. In NAACL, 2022
work page 2022
-
[24]
Llama 2: Open foundation and fine-tuned chat models
Touvron et al. Llama 2: Open foundation and fine-tuned chat models. arXiv, 2023
work page 2023
-
[25]
Judging llm-as-a-judge with mt-bench and chatbot arena
Zheng et al. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv, 2023
work page 2023
-
[26]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. In AISec, 2023
work page 2023
-
[27]
A data- driven analysis of workers’ earnings on amazon mechan- ical turk
Kotaro Hara, Abi Adams, Kristy Milland, Saiph Sav- age, Chris Callison-Burch, and Jeffrey Bigham. A data- driven analysis of workers’ earnings on amazon mechan- ical turk. In CHI, 2018
work page 2018
-
[28]
Piilo: an open-source system for personally identifiable information labeling and obfus- cation
Langdon Holmes, Scott Crossley, Harshvardhan Sikka, and Wesley Morris. Piilo: an open-source system for personally identifiable information labeling and obfus- cation. Information and Learning Sciences, 2023
work page 2023
-
[29]
Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts
Microsoft Threat Intelligence. Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts. https://www.microsoft.com/en- us/security/blog/2025/01/16/new-star-blizzard-spear- phishing-campaign-targets-whatsapp-accounts/, 2025
work page 2025
-
[30]
Baseline defenses for adversarial attacks against aligned language models
Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models. arXiv, 2023
work page 2023
-
[31]
Tele- com fraud detection via hawkes-enhanced sequence model
Yan Jiang, Guannan Liu, Junjie Wu, and Hao Lin. Tele- com fraud detection via hawkes-enhanced sequence model. IEEE TKDE, 2023
work page 2023
-
[32]
Textwash – automated open-source text anonymisation
Bennett Kleinberg, Toby Davies, and Maximilian Mozes. Textwash – automated open-source text anonymisation. arXiv, 2022
work page 2022
-
[33]
ROUGE: A package for automatic eval- uation of summaries
Chin-Yew Lin. ROUGE: A package for automatic eval- uation of summaries. In Text Summarization Branches Out, 2004
work page 2004
-
[34]
Formalizing and benchmark- ing prompt injection attacks and defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. In USENIX Security, 2024
work page 2024
-
[35]
Ana- lyzing leakage of personally identifiable information in language models
Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Ana- lyzing leakage of personally identifiable information in language models. In IEEE S&P, 2023
work page 2023
-
[36]
Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettle- moyer. Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022
work page 2022
-
[37]
Prompting with pseudo-code instructions
Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V au2, Danish Contractor, and Srikanth Tamil- selvam. Prompting with pseudo-code instructions. arXiv, 2023
work page 2023
-
[38]
Maximilian Mozes, Xuanli He, Bennett Kleinberg, and Lewis D. Griffin. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities. arXiv, 2023
work page 2023
-
[39]
PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding
Krishna Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, and Xuebing Zhou. PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding. In PrivNLP, 2024
work page 2024
-
[40]
OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[41]
Misinformation in the Age of AI
Merav Ozair. Misinformation in the Age of AI. https://www.nasdaq.com/articles/misinformation-in- the-age-of-artificial-intelligence-and-what-it-means- for-the-markets, 2023
work page 2023
-
[42]
The empirical impact of data sanitization on language models
Anwesan Pal, Radhika Bhargava, Kyle Hinsz, Jacques Esterhuizen, and Sudipta Bhattacharya. The empirical impact of data sanitization on language models. arXiv, 2024
work page 2024
-
[43]
On the risk of misinformation pollution with large language models
Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. On the risk of misinformation pollution with large language models. In EMNLP Findings, 2023
work page 2023
-
[44]
Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal
Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal. Teach LLMs to phish: Stealing private information from language models. In ICLR, 2024
work page 2024
-
[45]
Constantinos Patsakis and Nikolaos Lykousas. Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models. Scientific Reports, 2023
work page 2023
-
[46]
Data quality of platforms and panels for online behavioral research
Eyal Peer, David Rothschild, Andrew Gordon, Zak Ev- ernden, and Ekaterina Damer. Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 2022
work page 2022
-
[47]
Jatmo: Prompt injection defense by task-specific finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. arXiv, 2024
work page 2024
-
[48]
Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Pa- padopoulou, David Sánchez, and Montserrat Batet. The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization. Computational Linguistics, 2022
work page 2022
-
[49]
Attia Qammar, Hongmei Wang, Jianguo Ding, Abde- nacer Naouri, Mahmoud Daneshmand, and Huansheng Ning. Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations. arXiv, 2023
work page 2023
-
[50]
Llm driven web profile extraction for identical names
Prateek Sancheti, Kamalakar Karlapalem, and Kavita Vemuri. Llm driven web profile extraction for identical names. In WWW, 2024
work page 2024
-
[51]
Digital deception: Generative artificial intelligence in social engineering and phishing
Marc Schmitt and Ivan Flechais. Digital deception: Generative artificial intelligence in social engineering and phishing. arXiv, 2023
work page 2023
-
[52]
Beyond memorization: Violating privacy via inference with large language models
Robin Staab, Mark Vero, Mislav Balunovi´c, and Martin Vechev. Beyond memorization: Violating privacy via inference with large language models. In ICLR, 2024
work page 2024
-
[53]
Xuchen Suo. Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions. arXiv, 2024
work page 2024
-
[54]
Context-tuning: Learning contextualized prompts for natural language generation
Tianyi Tang, Junyi Li, Wayne Xin Zhao, and Ji-Rong Wen. Context-tuning: Learning contextualized prompts for natural language generation. In ICCL, 2022
work page 2022
-
[55]
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler. Ul2: Unifying language learning paradigms. In ICLR, 2023
work page 2023
-
[56]
Gemini: A family of highly capable mul- timodal models
Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. Gemini: A family of highly capable mul- timodal models. arXiv, 2023
work page 2023
-
[57]
Users really do answer telephone scams
Huahong Tu, Adam Doupé, Ziming Zhao, and Gail- Joon Ahn. Users really do answer telephone scams. In USENIX Security, 2019
work page 2019
-
[58]
Finetuned language models are zero- shot learners
Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. Finetuned language models are zero- shot learners. In ICLR, 2022
work page 2022
-
[59]
Jules White, Quchen Fu, Sam Hays, Michael Sand- born, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt. A prompt pat- tern catalog to enhance prompt engineering with chatgpt. arXiv, 2023
work page 2023
-
[60]
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. In ICLR, 2020
work page 2020
-
[61]
Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G Parker, and Munmun De Choudhury. Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions. In CHI, 2023
work page 2023
-
[62]
Context-faithful prompting for large lan- guage models
Wenxuan Zhou, Sheng Zhang, Hoifung Poon, and Muhao Chen. Context-faithful prompting for large lan- guage models. In Findings of the Association for Com- putational Linguistics: EMNLP 2023, 2023
work page 2023
-
[63]
Hong Zhu, Shengzhi Zhang, and Kai Chen. Ai-guardian: Defeating adversarial attacks using backdoors. In IEEE S&P, 2023. Table 12: Summary of different prompt styles. <personal_profile> is a placeholder for the profile from which the attacker aims to extract information. Style Brief Example Direct Directly request the model to answer with the information th...
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.