Evaluating LLM-based Personal Information Extraction and Countermeasures

Jinyuan Jia; Neil Zhenqiang Gong; Yupei Liu; Yuqi Jia

arxiv: 2408.07291 · v4 · submitted 2024-08-14 · 💻 cs.CR

Evaluating LLM-based Personal Information Extraction and Countermeasures

Yupei Liu , Yuqi Jia , Jinyuan Jia , Neil Zhenqiang Gong This is my paper

Pith reviewed 2026-05-23 22:05 UTC · model grok-4.3

classification 💻 cs.CR

keywords personal information extractionLLM attacksprompt injectioncountermeasurespublic profilesspear phishinginformation security

0 comments

The pith

Large language models extract personal information from public profiles more accurately than traditional methods, but prompt injection reduces their advantage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures how effectively large language models can pull details such as names, phone numbers, and email addresses from publicly posted profiles. It finds that LLMs succeed at higher rates than regular expressions, keyword search, or entity detection. The authors test a prompt injection approach that lowers LLM performance back to the level of those older methods. This matters because accurate large-scale extraction supports follow-on attacks like spear phishing. The benchmarks cover ten different LLMs and five datasets, three of them real-world profiles with eight labeled categories.

Core claim

LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.

What carries the argument

Framework for LLM-based extraction attacks and prompt injection mitigation strategy, benchmarked on ten LLMs and five datasets including synthetic and manually labeled real-world ones.

If this is right

Attackers obtain a stronger tool for large-scale personal information gathering that supports targeted attacks such as spear phishing.
Traditional extraction techniques prove insufficient when facing capable LLMs.
Prompt injection serves as a deployable defense that removes the performance edge of LLM attacks.
Results hold across a synthetic GPT-4 dataset and three real-world labeled datasets covering eight categories of personal information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Platforms that host public profiles may need to explore automated text modifications as a standard privacy layer.
The same prompt injection tactic could be adapted to limit LLM processing in other user-content scenarios.
Attackers could experiment with varied prompt formats, so the defense requires repeated testing against new models.

Load-bearing premise

The manually labeled real-world datasets accurately represent the distribution and variety of personal information in actual public profiles, and the tested LLMs and prompt formats generalize to real attacker usage.

What would settle it

A test on a fresh collection of real profiles where LLM accuracy falls to or below traditional methods, or where prompt injection no longer limits LLM performance, would disprove the central claims.

Figures

Figures reproduced from arXiv: 2408.07291 by Jinyuan Jia, Neil Zhenqiang Gong, Yupei Liu, Yuqi Jia.

**Figure 2.** Figure 2: Impact of the number of in-context learning [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Impact of the personal profile complexity (mea [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Impact of different prompts to generate personal profiles in the synthetic dataset. [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗

**Figure 5.** Figure 5: An example profile from the synthetic dataset after rendering. The left one has no injected prompt and the [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗

**Figure 6.** Figure 6: The prompt used to generate personal profiles [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗

**Figure 7.** Figure 7: How we perform prompt injection for docu [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗

read the original abstract

Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods -- such as regular expression, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LLMs beat regex and keywords at pulling personal details from profiles and prompt injection cuts the success rate, but the real-world datasets lack the validation needed to trust the numbers.

read the letter

The paper's core contribution is a measurement study showing that current LLMs can extract eight categories of personal information from public profiles more accurately than traditional regex, keyword, or entity methods, plus a prompt-injection defense that brings LLM performance back down to those weaker baselines. They built a framework for the attacks, generated one synthetic dataset with GPT-4, manually labeled three real-world ones, and ran the whole thing across ten LLMs and five datasets total. That systematic comparison is the useful part; it gives concrete evidence that the attack surface has shifted with widely available models and that a simple mitigation can blunt it without heavy changes to the profiles themselves. The work is straightforward and stays on the empirical question rather than overclaiming generality. The main weakness is the evaluation data. The three manually labeled real-world sets are central to the outperformance and defense claims, yet the paper gives no inter-annotator agreement figures, no sampling details across platforms, and no check that the label distributions match broader public profiles. Without those, it's hard to know whether the reported gains are stable or tied to the particular profiles chosen. The abstract is also light on error analysis or statistical tests, which makes it tougher to judge how much the results would move under different conditions. This is the kind of paper that belongs in a security or privacy venue where measurement studies are common. Readers working on online data exposure or LLM misuse would get practical takeaways from the benchmarks. It deserves a serious referee because the topic is timely and the setup is reproducible in principle, but the review should focus on tightening the dataset documentation and adding basic validation stats before acceptance.

Referee Report

1 major / 1 minor

Summary. The paper conducts a systematic measurement study benchmarking LLM-based attacks for extracting eight categories of personal information (name, phone, email, etc.) from public profiles. It presents an attack framework, collects one GPT-4-generated synthetic dataset plus three manually labeled real-world datasets, proposes prompt injection as a novel mitigation, and evaluates ten LLMs against traditional baselines (regex, keyword search, entity detection), claiming LLMs achieve higher accuracy, outperform baselines, and that prompt injection reduces LLM attacks to the effectiveness of traditional methods.

Significance. If the datasets prove representative and results generalize beyond the tested profiles and models, the work supplies concrete empirical data on LLM misuse for privacy attacks and a deployable defense, informing both attacker capabilities and platform countermeasures in security research.

major comments (1)

[Dataset section] Dataset section: the three manually labeled real-world datasets lack any reported inter-annotator agreement, sampling methodology across platforms, or validation that the eight-category label distribution matches broader public-profile statistics. These omissions are load-bearing for the central claims of LLM outperformance and prompt-injection effectiveness, because labeling noise or sampling bias could produce the observed results as artifacts of the evaluation set rather than intrinsic properties.

minor comments (1)

[Abstract] Abstract: states collection of 'four datasets' but then reports benchmarking 'using ten LLMs and five datasets'; the inconsistency should be corrected for clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment on the dataset section below.

read point-by-point responses

Referee: [Dataset section] Dataset section: the three manually labeled real-world datasets lack any reported inter-annotator agreement, sampling methodology across platforms, or validation that the eight-category label distribution matches broader public-profile statistics. These omissions are load-bearing for the central claims of LLM outperformance and prompt-injection effectiveness, because labeling noise or sampling bias could produce the observed results as artifacts of the evaluation set rather than intrinsic properties.

Authors: We agree that these details are important to include. In the revised manuscript we will report inter-annotator agreement (e.g., Cohen's kappa) for the manual labeling of the three real-world datasets, describe the sampling methodology used across platforms, and add a comparison of the observed eight-category label distributions against available public-profile statistics (or note limitations where such benchmarks are unavailable). These additions will directly address potential concerns about labeling noise or sampling bias. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark study with no derivations or self-referential fitting

full rationale

This is a measurement study that collects four datasets (one synthetic via GPT-4, three manually labeled real-world), benchmarks ten LLMs against regex/keyword/entity baselines on eight personal-information categories, and evaluates a prompt-injection mitigation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. Results rest on external models, external datasets, and direct comparisons rather than internal definitions or author-prior uniqueness theorems, so the evaluation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical measurement study with no free parameters fitted to results, no new mathematical axioms, and no invented entities. Relies on standard assumptions about data labeling quality.

axioms (1)

domain assumption Manual labeling of eight categories of personal information in real-world datasets is accurate and unbiased.
The evaluation depends on these labels to measure extraction accuracy.

pith-pipeline@v0.9.0 · 5704 in / 1175 out tokens · 58509 ms · 2026-05-23T22:05:25.647102+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
cs.CR 2026-05 unverdicted novelty 6.0

LLM agents can reconstruct high-fidelity personal profiles from minimal PII seeds with over 90% accuracy in under 10 minutes at less than $3 cost, exposing three escalating tiers of privacy risks.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

https://github

spaCy: Industrial-strength NLP. https://github. com/explosion/spaCy, 2019

work page 2019
[2]

https://github.com/lorey/mlscraper/ tree/master, 2020

mlscraper: Scrape data from HTML pages automat- ically. https://github.com/lorey/mlscraper/ tree/master, 2020

work page 2020
[3]

https://pypi.org/project/htmldocx/, 2021

htmldocx. https://pypi.org/project/htmldocx/, 2021

work page 2021
[4]

https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023

Common regex used to extract data from Html. https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023

work page 2023
[5]

https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023

GPT-4 leaks. https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023

work page 2023
[6]

https://github.com/jarrekk/imgkit, 2023

imgkit. https://github.com/jarrekk/imgkit, 2023

work page 2023
[7]

https://github.com/InternLM/ InternLM, 2023

Internlm. https://github.com/InternLM/ InternLM, 2023

work page 2023
[8]

https://pypi.org/project/ pyhtml2pdf/, 2023

pyhtml2pdf. https://pypi.org/project/ pyhtml2pdf/, 2023

work page 2023
[9]

https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024

19th-century American physicians. https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024

work page 2024
[10]

https://www

List of Top 100 Famous People. https://www. biographyonline.net/people/famous-100.html, 2024

work page 2024
[11]

https://github.com/ matthewwithanm/python-markdownify, 2024

python-markdownify. https://github.com/ matthewwithanm/python-markdownify, 2024

work page 2024
[12]

Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems

Sahar Abdelnabi and Mario Fritz. Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems. In USENIX Security, 2023

work page 2023
[13]

FLAIR: An easy-to-use framework for state-of-the-art NLP

Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland V ollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL, 2019

work page 2019
[14]

Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al

Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al. Palm 2 technical report. arXiv, 2023

work page 2023
[15]

Rat- gpt: Turning online llms into proxies for malware at- tacks

Mika Beckerich, Laura Plein, and Sergio Coronado. Rat- gpt: Turning online llms into proxies for malware at- tacks. arXiv, 2023

work page 2023
[16]

Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings

Mazal Bethany, Athanasios Galiopoulos, Emet Bethany, Mohammad Bahrami Karkevandi, Nishant Vishwamitra, and Peyman Najafirad. Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings. arXiv, 2024

work page 2024
[17]

Language models are few-shot learners

Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020

work page 2020
[18]

Sparks of artificial general intelligence: Early experi- ments with gpt-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experi- ments with gpt-4. arXiv, 2023

work page 2023
[19]

A llm assisted exploitation of ai- guardian

Nicholas Carlini. A llm assisted exploitation of ai- guardian. arXiv, 2023

work page 2023
[20]

Forbes: Five novel phishing tactics

Perry Carpenter. Forbes: Five novel phishing tactics. https://www.forbes.com/councils/forbesbusinesscouncil /2025/01/23/five-novel-phishing-tactics-to-beware-of- and-how-to-protect-your-company/, 2025

work page 2025
[21]

Can llm-generated misinfor- mation be detected? arXiv, 2023

Canyu Chen and Kai Shu. Can llm-generated misinfor- mation be detected? arXiv, 2023

work page 2023
[22]

BERT: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidi- rectional transformers for language understanding. In NAACL-HLT, 2019

work page 2019
[23]

On the effect of pretraining corpora on in- context learning by a large-scale language model

Shin et al. On the effect of pretraining corpora on in- context learning by a large-scale language model. In NAACL, 2022

work page 2022
[24]

Llama 2: Open foundation and fine-tuned chat models

Touvron et al. Llama 2: Open foundation and fine-tuned chat models. arXiv, 2023

work page 2023
[25]

Judging llm-as-a-judge with mt-bench and chatbot arena

Zheng et al. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv, 2023

work page 2023
[26]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. In AISec, 2023

work page 2023
[27]

A data- driven analysis of workers’ earnings on amazon mechan- ical turk

Kotaro Hara, Abi Adams, Kristy Milland, Saiph Sav- age, Chris Callison-Burch, and Jeffrey Bigham. A data- driven analysis of workers’ earnings on amazon mechan- ical turk. In CHI, 2018

work page 2018
[28]

Piilo: an open-source system for personally identifiable information labeling and obfus- cation

Langdon Holmes, Scott Crossley, Harshvardhan Sikka, and Wesley Morris. Piilo: an open-source system for personally identifiable information labeling and obfus- cation. Information and Learning Sciences, 2023

work page 2023
[29]

Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts

Microsoft Threat Intelligence. Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts. https://www.microsoft.com/en- us/security/blog/2025/01/16/new-star-blizzard-spear- phishing-campaign-targets-whatsapp-accounts/, 2025

work page 2025
[30]

Baseline defenses for adversarial attacks against aligned language models

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models. arXiv, 2023

work page 2023
[31]

Tele- com fraud detection via hawkes-enhanced sequence model

Yan Jiang, Guannan Liu, Junjie Wu, and Hao Lin. Tele- com fraud detection via hawkes-enhanced sequence model. IEEE TKDE, 2023

work page 2023
[32]

Textwash – automated open-source text anonymisation

Bennett Kleinberg, Toby Davies, and Maximilian Mozes. Textwash – automated open-source text anonymisation. arXiv, 2022

work page 2022
[33]

ROUGE: A package for automatic eval- uation of summaries

Chin-Yew Lin. ROUGE: A package for automatic eval- uation of summaries. In Text Summarization Branches Out, 2004

work page 2004
[34]

Formalizing and benchmark- ing prompt injection attacks and defenses

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. In USENIX Security, 2024

work page 2024
[35]

Ana- lyzing leakage of personally identifiable information in language models

Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Ana- lyzing leakage of personally identifiable information in language models. In IEEE S&P, 2023

work page 2023
[36]

Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettle- moyer. Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022

work page 2022
[37]

Prompting with pseudo-code instructions

Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V au2, Danish Contractor, and Srikanth Tamil- selvam. Prompting with pseudo-code instructions. arXiv, 2023

work page 2023
[38]

Maximilian Mozes, Xuanli He, Bennett Kleinberg, and Lewis D. Griffin. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities. arXiv, 2023

work page 2023
[39]

PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Krishna Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, and Xuebing Zhou. PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding. In PrivNLP, 2024

work page 2024
[40]

GPT-4 Technical Report

OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[41]

Misinformation in the Age of AI

Merav Ozair. Misinformation in the Age of AI. https://www.nasdaq.com/articles/misinformation-in- the-age-of-artificial-intelligence-and-what-it-means- for-the-markets, 2023

work page 2023
[42]

The empirical impact of data sanitization on language models

Anwesan Pal, Radhika Bhargava, Kyle Hinsz, Jacques Esterhuizen, and Sudipta Bhattacharya. The empirical impact of data sanitization on language models. arXiv, 2024

work page 2024
[43]

On the risk of misinformation pollution with large language models

Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. On the risk of misinformation pollution with large language models. In EMNLP Findings, 2023

work page 2023
[44]

Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal

Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal. Teach LLMs to phish: Stealing private information from language models. In ICLR, 2024

work page 2024
[45]

Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models

Constantinos Patsakis and Nikolaos Lykousas. Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models. Scientific Reports, 2023

work page 2023
[46]

Data quality of platforms and panels for online behavioral research

Eyal Peer, David Rothschild, Andrew Gordon, Zak Ev- ernden, and Ekaterina Damer. Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 2022

work page 2022
[47]

Jatmo: Prompt injection defense by task-specific finetuning

Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. arXiv, 2024

work page 2024
[48]

The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization

Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Pa- padopoulou, David Sánchez, and Montserrat Batet. The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization. Computational Linguistics, 2022

work page 2022
[49]

Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations

Attia Qammar, Hongmei Wang, Jianguo Ding, Abde- nacer Naouri, Mahmoud Daneshmand, and Huansheng Ning. Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations. arXiv, 2023

work page 2023
[50]

Llm driven web profile extraction for identical names

Prateek Sancheti, Kamalakar Karlapalem, and Kavita Vemuri. Llm driven web profile extraction for identical names. In WWW, 2024

work page 2024
[51]

Digital deception: Generative artificial intelligence in social engineering and phishing

Marc Schmitt and Ivan Flechais. Digital deception: Generative artificial intelligence in social engineering and phishing. arXiv, 2023

work page 2023
[52]

Beyond memorization: Violating privacy via inference with large language models

Robin Staab, Mark Vero, Mislav Balunovi´c, and Martin Vechev. Beyond memorization: Violating privacy via inference with large language models. In ICLR, 2024

work page 2024
[53]

Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions

Xuchen Suo. Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions. arXiv, 2024

work page 2024
[54]

Context-tuning: Learning contextualized prompts for natural language generation

Tianyi Tang, Junyi Li, Wayne Xin Zhao, and Ji-Rong Wen. Context-tuning: Learning contextualized prompts for natural language generation. In ICCL, 2022

work page 2022
[55]

Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler

Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler. Ul2: Unifying language learning paradigms. In ICLR, 2023

work page 2023
[56]

Gemini: A family of highly capable mul- timodal models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. Gemini: A family of highly capable mul- timodal models. arXiv, 2023

work page 2023
[57]

Users really do answer telephone scams

Huahong Tu, Adam Doupé, Ziming Zhao, and Gail- Joon Ahn. Users really do answer telephone scams. In USENIX Security, 2019

work page 2019
[58]

Finetuned language models are zero- shot learners

Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. Finetuned language models are zero- shot learners. In ICLR, 2022

work page 2022
[59]

Jules White, Quchen Fu, Sam Hays, Michael Sand- born, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt. A prompt pat- tern catalog to enhance prompt engineering with chatgpt. arXiv, 2023

work page 2023
[60]

Weinberger, and Yoav Artzi

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. In ICLR, 2020

work page 2020
[61]

Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions

Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G Parker, and Munmun De Choudhury. Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions. In CHI, 2023

work page 2023
[62]

Context-faithful prompting for large lan- guage models

Wenxuan Zhou, Sheng Zhang, Hoifung Poon, and Muhao Chen. Context-faithful prompting for large lan- guage models. In Findings of the Association for Com- putational Linguistics: EMNLP 2023, 2023

work page 2023
[63]

none”. “<personal_profile>

Hong Zhu, Shengzhi Zhang, and Kai Chen. Ai-guardian: Defeating adversarial attacks using backdoors. In IEEE S&P, 2023. Table 12: Summary of different prompt styles. <personal_profile> is a placeholder for the profile from which the attacker aims to extract information. Style Brief Example Direct Directly request the model to answer with the information th...

work page 2023

[1] [1]

https://github

spaCy: Industrial-strength NLP. https://github. com/explosion/spaCy, 2019

work page 2019

[2] [2]

https://github.com/lorey/mlscraper/ tree/master, 2020

mlscraper: Scrape data from HTML pages automat- ically. https://github.com/lorey/mlscraper/ tree/master, 2020

work page 2020

[3] [3]

https://pypi.org/project/htmldocx/, 2021

htmldocx. https://pypi.org/project/htmldocx/, 2021

work page 2021

[4] [4]

https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023

Common regex used to extract data from Html. https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023

work page 2023

[5] [5]

https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023

GPT-4 leaks. https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023

work page 2023

[6] [6]

https://github.com/jarrekk/imgkit, 2023

imgkit. https://github.com/jarrekk/imgkit, 2023

work page 2023

[7] [7]

https://github.com/InternLM/ InternLM, 2023

Internlm. https://github.com/InternLM/ InternLM, 2023

work page 2023

[8] [8]

https://pypi.org/project/ pyhtml2pdf/, 2023

pyhtml2pdf. https://pypi.org/project/ pyhtml2pdf/, 2023

work page 2023

[9] [9]

https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024

19th-century American physicians. https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024

work page 2024

[10] [10]

https://www

List of Top 100 Famous People. https://www. biographyonline.net/people/famous-100.html, 2024

work page 2024

[11] [11]

https://github.com/ matthewwithanm/python-markdownify, 2024

python-markdownify. https://github.com/ matthewwithanm/python-markdownify, 2024

work page 2024

[12] [12]

Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems

Sahar Abdelnabi and Mario Fritz. Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems. In USENIX Security, 2023

work page 2023

[13] [13]

FLAIR: An easy-to-use framework for state-of-the-art NLP

Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland V ollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL, 2019

work page 2019

[14] [14]

Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al

Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al. Palm 2 technical report. arXiv, 2023

work page 2023

[15] [15]

Rat- gpt: Turning online llms into proxies for malware at- tacks

Mika Beckerich, Laura Plein, and Sergio Coronado. Rat- gpt: Turning online llms into proxies for malware at- tacks. arXiv, 2023

work page 2023

[16] [16]

Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings

Mazal Bethany, Athanasios Galiopoulos, Emet Bethany, Mohammad Bahrami Karkevandi, Nishant Vishwamitra, and Peyman Najafirad. Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings. arXiv, 2024

work page 2024

[17] [17]

Language models are few-shot learners

Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020

work page 2020

[18] [18]

Sparks of artificial general intelligence: Early experi- ments with gpt-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experi- ments with gpt-4. arXiv, 2023

work page 2023

[19] [19]

A llm assisted exploitation of ai- guardian

Nicholas Carlini. A llm assisted exploitation of ai- guardian. arXiv, 2023

work page 2023

[20] [20]

Forbes: Five novel phishing tactics

Perry Carpenter. Forbes: Five novel phishing tactics. https://www.forbes.com/councils/forbesbusinesscouncil /2025/01/23/five-novel-phishing-tactics-to-beware-of- and-how-to-protect-your-company/, 2025

work page 2025

[21] [21]

Can llm-generated misinfor- mation be detected? arXiv, 2023

Canyu Chen and Kai Shu. Can llm-generated misinfor- mation be detected? arXiv, 2023

work page 2023

[22] [22]

BERT: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidi- rectional transformers for language understanding. In NAACL-HLT, 2019

work page 2019

[23] [23]

On the effect of pretraining corpora on in- context learning by a large-scale language model

Shin et al. On the effect of pretraining corpora on in- context learning by a large-scale language model. In NAACL, 2022

work page 2022

[24] [24]

Llama 2: Open foundation and fine-tuned chat models

Touvron et al. Llama 2: Open foundation and fine-tuned chat models. arXiv, 2023

work page 2023

[25] [25]

Judging llm-as-a-judge with mt-bench and chatbot arena

Zheng et al. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv, 2023

work page 2023

[26] [26]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. In AISec, 2023

work page 2023

[27] [27]

A data- driven analysis of workers’ earnings on amazon mechan- ical turk

Kotaro Hara, Abi Adams, Kristy Milland, Saiph Sav- age, Chris Callison-Burch, and Jeffrey Bigham. A data- driven analysis of workers’ earnings on amazon mechan- ical turk. In CHI, 2018

work page 2018

[28] [28]

Piilo: an open-source system for personally identifiable information labeling and obfus- cation

Langdon Holmes, Scott Crossley, Harshvardhan Sikka, and Wesley Morris. Piilo: an open-source system for personally identifiable information labeling and obfus- cation. Information and Learning Sciences, 2023

work page 2023

[29] [29]

Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts

Microsoft Threat Intelligence. Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts. https://www.microsoft.com/en- us/security/blog/2025/01/16/new-star-blizzard-spear- phishing-campaign-targets-whatsapp-accounts/, 2025

work page 2025

[30] [30]

Baseline defenses for adversarial attacks against aligned language models

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models. arXiv, 2023

work page 2023

[31] [31]

Tele- com fraud detection via hawkes-enhanced sequence model

Yan Jiang, Guannan Liu, Junjie Wu, and Hao Lin. Tele- com fraud detection via hawkes-enhanced sequence model. IEEE TKDE, 2023

work page 2023

[32] [32]

Textwash – automated open-source text anonymisation

Bennett Kleinberg, Toby Davies, and Maximilian Mozes. Textwash – automated open-source text anonymisation. arXiv, 2022

work page 2022

[33] [33]

ROUGE: A package for automatic eval- uation of summaries

Chin-Yew Lin. ROUGE: A package for automatic eval- uation of summaries. In Text Summarization Branches Out, 2004

work page 2004

[34] [34]

Formalizing and benchmark- ing prompt injection attacks and defenses

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. In USENIX Security, 2024

work page 2024

[35] [35]

Ana- lyzing leakage of personally identifiable information in language models

Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Ana- lyzing leakage of personally identifiable information in language models. In IEEE S&P, 2023

work page 2023

[36] [36]

Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettle- moyer. Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022

work page 2022

[37] [37]

Prompting with pseudo-code instructions

Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V au2, Danish Contractor, and Srikanth Tamil- selvam. Prompting with pseudo-code instructions. arXiv, 2023

work page 2023

[38] [38]

Maximilian Mozes, Xuanli He, Bennett Kleinberg, and Lewis D. Griffin. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities. arXiv, 2023

work page 2023

[39] [39]

PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Krishna Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, and Xuebing Zhou. PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding. In PrivNLP, 2024

work page 2024

[40] [40]

GPT-4 Technical Report

OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[41] [41]

Misinformation in the Age of AI

Merav Ozair. Misinformation in the Age of AI. https://www.nasdaq.com/articles/misinformation-in- the-age-of-artificial-intelligence-and-what-it-means- for-the-markets, 2023

work page 2023

[42] [42]

The empirical impact of data sanitization on language models

Anwesan Pal, Radhika Bhargava, Kyle Hinsz, Jacques Esterhuizen, and Sudipta Bhattacharya. The empirical impact of data sanitization on language models. arXiv, 2024

work page 2024

[43] [43]

On the risk of misinformation pollution with large language models

Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. On the risk of misinformation pollution with large language models. In EMNLP Findings, 2023

work page 2023

[44] [44]

Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal

Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal. Teach LLMs to phish: Stealing private information from language models. In ICLR, 2024

work page 2024

[45] [45]

Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models

Constantinos Patsakis and Nikolaos Lykousas. Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models. Scientific Reports, 2023

work page 2023

[46] [46]

Data quality of platforms and panels for online behavioral research

Eyal Peer, David Rothschild, Andrew Gordon, Zak Ev- ernden, and Ekaterina Damer. Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 2022

work page 2022

[47] [47]

Jatmo: Prompt injection defense by task-specific finetuning

Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. arXiv, 2024

work page 2024

[48] [48]

The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization

Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Pa- padopoulou, David Sánchez, and Montserrat Batet. The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization. Computational Linguistics, 2022

work page 2022

[49] [49]

Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations

Attia Qammar, Hongmei Wang, Jianguo Ding, Abde- nacer Naouri, Mahmoud Daneshmand, and Huansheng Ning. Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations. arXiv, 2023

work page 2023

[50] [50]

Llm driven web profile extraction for identical names

Prateek Sancheti, Kamalakar Karlapalem, and Kavita Vemuri. Llm driven web profile extraction for identical names. In WWW, 2024

work page 2024

[51] [51]

Digital deception: Generative artificial intelligence in social engineering and phishing

Marc Schmitt and Ivan Flechais. Digital deception: Generative artificial intelligence in social engineering and phishing. arXiv, 2023

work page 2023

[52] [52]

Beyond memorization: Violating privacy via inference with large language models

Robin Staab, Mark Vero, Mislav Balunovi´c, and Martin Vechev. Beyond memorization: Violating privacy via inference with large language models. In ICLR, 2024

work page 2024

[53] [53]

Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions

Xuchen Suo. Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions. arXiv, 2024

work page 2024

[54] [54]

Context-tuning: Learning contextualized prompts for natural language generation

Tianyi Tang, Junyi Li, Wayne Xin Zhao, and Ji-Rong Wen. Context-tuning: Learning contextualized prompts for natural language generation. In ICCL, 2022

work page 2022

[55] [55]

Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler

Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler. Ul2: Unifying language learning paradigms. In ICLR, 2023

work page 2023

[56] [56]

Gemini: A family of highly capable mul- timodal models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. Gemini: A family of highly capable mul- timodal models. arXiv, 2023

work page 2023

[57] [57]

Users really do answer telephone scams

Huahong Tu, Adam Doupé, Ziming Zhao, and Gail- Joon Ahn. Users really do answer telephone scams. In USENIX Security, 2019

work page 2019

[58] [58]

Finetuned language models are zero- shot learners

Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. Finetuned language models are zero- shot learners. In ICLR, 2022

work page 2022

[59] [59]

Jules White, Quchen Fu, Sam Hays, Michael Sand- born, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt. A prompt pat- tern catalog to enhance prompt engineering with chatgpt. arXiv, 2023

work page 2023

[60] [60]

Weinberger, and Yoav Artzi

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. In ICLR, 2020

work page 2020

[61] [61]

Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions

Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G Parker, and Munmun De Choudhury. Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions. In CHI, 2023

work page 2023

[62] [62]

Context-faithful prompting for large lan- guage models

Wenxuan Zhou, Sheng Zhang, Hoifung Poon, and Muhao Chen. Context-faithful prompting for large lan- guage models. In Findings of the Association for Com- putational Linguistics: EMNLP 2023, 2023

work page 2023

[63] [63]

none”. “<personal_profile>

Hong Zhu, Shengzhi Zhang, and Kai Chen. Ai-guardian: Defeating adversarial attacks using backdoors. In IEEE S&P, 2023. Table 12: Summary of different prompt styles. <personal_profile> is a placeholder for the profile from which the attacker aims to extract information. Style Brief Example Direct Directly request the model to answer with the information th...

work page 2023