pith. sign in

arxiv: 2408.07291 · v4 · submitted 2024-08-14 · 💻 cs.CR

Evaluating LLM-based Personal Information Extraction and Countermeasures

Pith reviewed 2026-05-23 22:05 UTC · model grok-4.3

classification 💻 cs.CR
keywords personal information extractionLLM attacksprompt injectioncountermeasurespublic profilesspear phishinginformation security
0
0 comments X

The pith

Large language models extract personal information from public profiles more accurately than traditional methods, but prompt injection reduces their advantage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures how effectively large language models can pull details such as names, phone numbers, and email addresses from publicly posted profiles. It finds that LLMs succeed at higher rates than regular expressions, keyword search, or entity detection. The authors test a prompt injection approach that lowers LLM performance back to the level of those older methods. This matters because accurate large-scale extraction supports follow-on attacks like spear phishing. The benchmarks cover ten different LLMs and five datasets, three of them real-world profiles with eight labeled categories.

Core claim

LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.

What carries the argument

Framework for LLM-based extraction attacks and prompt injection mitigation strategy, benchmarked on ten LLMs and five datasets including synthetic and manually labeled real-world ones.

If this is right

  • Attackers obtain a stronger tool for large-scale personal information gathering that supports targeted attacks such as spear phishing.
  • Traditional extraction techniques prove insufficient when facing capable LLMs.
  • Prompt injection serves as a deployable defense that removes the performance edge of LLM attacks.
  • Results hold across a synthetic GPT-4 dataset and three real-world labeled datasets covering eight categories of personal information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platforms that host public profiles may need to explore automated text modifications as a standard privacy layer.
  • The same prompt injection tactic could be adapted to limit LLM processing in other user-content scenarios.
  • Attackers could experiment with varied prompt formats, so the defense requires repeated testing against new models.

Load-bearing premise

The manually labeled real-world datasets accurately represent the distribution and variety of personal information in actual public profiles, and the tested LLMs and prompt formats generalize to real attacker usage.

What would settle it

A test on a fresh collection of real profiles where LLM accuracy falls to or below traditional methods, or where prompt injection no longer limits LLM performance, would disprove the central claims.

Figures

Figures reproduced from arXiv: 2408.07291 by Jinyuan Jia, Neil Zhenqiang Gong, Yupei Liu, Yuqi Jia.

Figure 1
Figure 1. Figure 1: LLM-based personal information extraction. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Impact of the number of in-context learning [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Impact of the personal profile complexity (mea [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Impact of different prompts to generate personal profiles in the synthetic dataset. [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: An example profile from the synthetic dataset after rendering. The left one has no injected prompt and the [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: The prompt used to generate personal profiles [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: How we perform prompt injection for docu [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
read the original abstract

Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods -- such as regular expression, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper conducts a systematic measurement study benchmarking LLM-based attacks for extracting eight categories of personal information (name, phone, email, etc.) from public profiles. It presents an attack framework, collects one GPT-4-generated synthetic dataset plus three manually labeled real-world datasets, proposes prompt injection as a novel mitigation, and evaluates ten LLMs against traditional baselines (regex, keyword search, entity detection), claiming LLMs achieve higher accuracy, outperform baselines, and that prompt injection reduces LLM attacks to the effectiveness of traditional methods.

Significance. If the datasets prove representative and results generalize beyond the tested profiles and models, the work supplies concrete empirical data on LLM misuse for privacy attacks and a deployable defense, informing both attacker capabilities and platform countermeasures in security research.

major comments (1)
  1. [Dataset section] Dataset section: the three manually labeled real-world datasets lack any reported inter-annotator agreement, sampling methodology across platforms, or validation that the eight-category label distribution matches broader public-profile statistics. These omissions are load-bearing for the central claims of LLM outperformance and prompt-injection effectiveness, because labeling noise or sampling bias could produce the observed results as artifacts of the evaluation set rather than intrinsic properties.
minor comments (1)
  1. [Abstract] Abstract: states collection of 'four datasets' but then reports benchmarking 'using ten LLMs and five datasets'; the inconsistency should be corrected for clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment on the dataset section below.

read point-by-point responses
  1. Referee: [Dataset section] Dataset section: the three manually labeled real-world datasets lack any reported inter-annotator agreement, sampling methodology across platforms, or validation that the eight-category label distribution matches broader public-profile statistics. These omissions are load-bearing for the central claims of LLM outperformance and prompt-injection effectiveness, because labeling noise or sampling bias could produce the observed results as artifacts of the evaluation set rather than intrinsic properties.

    Authors: We agree that these details are important to include. In the revised manuscript we will report inter-annotator agreement (e.g., Cohen's kappa) for the manual labeling of the three real-world datasets, describe the sampling methodology used across platforms, and add a comparison of the observed eight-category label distributions against available public-profile statistics (or note limitations where such benchmarks are unavailable). These additions will directly address potential concerns about labeling noise or sampling bias. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark study with no derivations or self-referential fitting

full rationale

This is a measurement study that collects four datasets (one synthetic via GPT-4, three manually labeled real-world), benchmarks ten LLMs against regex/keyword/entity baselines on eight personal-information categories, and evaluates a prompt-injection mitigation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. Results rest on external models, external datasets, and direct comparisons rather than internal definitions or author-prior uniqueness theorems, so the evaluation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical measurement study with no free parameters fitted to results, no new mathematical axioms, and no invented entities. Relies on standard assumptions about data labeling quality.

axioms (1)
  • domain assumption Manual labeling of eight categories of personal information in real-world datasets is accurate and unbiased.
    The evaluation depends on these labels to measure extraction accuracy.

pith-pipeline@v0.9.0 · 5704 in / 1175 out tokens · 58509 ms · 2026-05-23T22:05:25.647102+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    LLM agents can reconstruct high-fidelity personal profiles from minimal PII seeds with over 90% accuracy in under 10 minutes at less than $3 cost, exposing three escalating tiers of privacy risks.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    https://github

    spaCy: Industrial-strength NLP. https://github. com/explosion/spaCy, 2019

  2. [2]

    https://github.com/lorey/mlscraper/ tree/master, 2020

    mlscraper: Scrape data from HTML pages automat- ically. https://github.com/lorey/mlscraper/ tree/master, 2020

  3. [3]

    https://pypi.org/project/htmldocx/, 2021

    htmldocx. https://pypi.org/project/htmldocx/, 2021

  4. [4]

    https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023

    Common regex used to extract data from Html. https://gist.github.com/olavarrieta/ 1761f4e3097a382f07a57795dc1eb8ce, 2023

  5. [5]

    https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023

    GPT-4 leaks. https://the-decoder.com/gpt-4- architecture-datasets-costs-and-more-leaked, 2023

  6. [6]

    https://github.com/jarrekk/imgkit, 2023

    imgkit. https://github.com/jarrekk/imgkit, 2023

  7. [7]

    https://github.com/InternLM/ InternLM, 2023

    Internlm. https://github.com/InternLM/ InternLM, 2023

  8. [8]

    https://pypi.org/project/ pyhtml2pdf/, 2023

    pyhtml2pdf. https://pypi.org/project/ pyhtml2pdf/, 2023

  9. [9]

    https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024

    19th-century American physicians. https: //en.wikipedia.org/wiki/Category: 19th-century_American_physicians, 2024

  10. [10]

    https://www

    List of Top 100 Famous People. https://www. biographyonline.net/people/famous-100.html, 2024

  11. [11]

    https://github.com/ matthewwithanm/python-markdownify, 2024

    python-markdownify. https://github.com/ matthewwithanm/python-markdownify, 2024

  12. [12]

    Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems

    Sahar Abdelnabi and Mario Fritz. Fact-saboteurs: A taxonomy of evidence manipulation attacks against fact- verification systems. In USENIX Security, 2023

  13. [13]

    FLAIR: An easy-to-use framework for state-of-the-art NLP

    Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland V ollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL, 2019

  14. [14]

    Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al

    Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin John- son, Dmitry Lepikhin, Alexandre Passos, et al. Palm 2 technical report. arXiv, 2023

  15. [15]

    Rat- gpt: Turning online llms into proxies for malware at- tacks

    Mika Beckerich, Laura Plein, and Sergio Coronado. Rat- gpt: Turning online llms into proxies for malware at- tacks. arXiv, 2023

  16. [16]

    Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings

    Mazal Bethany, Athanasios Galiopoulos, Emet Bethany, Mohammad Bahrami Karkevandi, Nishant Vishwamitra, and Peyman Najafirad. Large language model lateral spear phishing: A comparative study in large-scale orga- nizational settings. arXiv, 2024

  17. [17]

    Language models are few-shot learners

    Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020

  18. [18]

    Sparks of artificial general intelligence: Early experi- ments with gpt-4

    Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experi- ments with gpt-4. arXiv, 2023

  19. [19]

    A llm assisted exploitation of ai- guardian

    Nicholas Carlini. A llm assisted exploitation of ai- guardian. arXiv, 2023

  20. [20]

    Forbes: Five novel phishing tactics

    Perry Carpenter. Forbes: Five novel phishing tactics. https://www.forbes.com/councils/forbesbusinesscouncil /2025/01/23/five-novel-phishing-tactics-to-beware-of- and-how-to-protect-your-company/, 2025

  21. [21]

    Can llm-generated misinfor- mation be detected? arXiv, 2023

    Canyu Chen and Kai Shu. Can llm-generated misinfor- mation be detected? arXiv, 2023

  22. [22]

    BERT: Pre-training of deep bidi- rectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidi- rectional transformers for language understanding. In NAACL-HLT, 2019

  23. [23]

    On the effect of pretraining corpora on in- context learning by a large-scale language model

    Shin et al. On the effect of pretraining corpora on in- context learning by a large-scale language model. In NAACL, 2022

  24. [24]

    Llama 2: Open foundation and fine-tuned chat models

    Touvron et al. Llama 2: Open foundation and fine-tuned chat models. arXiv, 2023

  25. [25]

    Judging llm-as-a-judge with mt-bench and chatbot arena

    Zheng et al. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv, 2023

  26. [26]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. In AISec, 2023

  27. [27]

    A data- driven analysis of workers’ earnings on amazon mechan- ical turk

    Kotaro Hara, Abi Adams, Kristy Milland, Saiph Sav- age, Chris Callison-Burch, and Jeffrey Bigham. A data- driven analysis of workers’ earnings on amazon mechan- ical turk. In CHI, 2018

  28. [28]

    Piilo: an open-source system for personally identifiable information labeling and obfus- cation

    Langdon Holmes, Scott Crossley, Harshvardhan Sikka, and Wesley Morris. Piilo: an open-source system for personally identifiable information labeling and obfus- cation. Information and Learning Sciences, 2023

  29. [29]

    Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts

    Microsoft Threat Intelligence. Microsoft: New Star Blizzard spear-phishing campaign targets WhatsApp accounts. https://www.microsoft.com/en- us/security/blog/2025/01/16/new-star-blizzard-spear- phishing-campaign-targets-whatsapp-accounts/, 2025

  30. [30]

    Baseline defenses for adversarial attacks against aligned language models

    Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models. arXiv, 2023

  31. [31]

    Tele- com fraud detection via hawkes-enhanced sequence model

    Yan Jiang, Guannan Liu, Junjie Wu, and Hao Lin. Tele- com fraud detection via hawkes-enhanced sequence model. IEEE TKDE, 2023

  32. [32]

    Textwash – automated open-source text anonymisation

    Bennett Kleinberg, Toby Davies, and Maximilian Mozes. Textwash – automated open-source text anonymisation. arXiv, 2022

  33. [33]

    ROUGE: A package for automatic eval- uation of summaries

    Chin-Yew Lin. ROUGE: A package for automatic eval- uation of summaries. In Text Summarization Branches Out, 2004

  34. [34]

    Formalizing and benchmark- ing prompt injection attacks and defenses

    Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. In USENIX Security, 2024

  35. [35]

    Ana- lyzing leakage of personally identifiable information in language models

    Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Ana- lyzing leakage of personally identifiable information in language models. In IEEE S&P, 2023

  36. [36]

    Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022

    Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettle- moyer. Rethinking the role of demonstrations: What makes in-context learning work? In EMNLP, 2022

  37. [37]

    Prompting with pseudo-code instructions

    Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V au2, Danish Contractor, and Srikanth Tamil- selvam. Prompting with pseudo-code instructions. arXiv, 2023

  38. [38]

    Maximilian Mozes, Xuanli He, Bennett Kleinberg, and Lewis D. Griffin. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities. arXiv, 2023

  39. [39]

    PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding

    Krishna Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, and Xuebing Zhou. PII-compass: Guiding LLM training data extraction prompts towards the target PII via grounding. In PrivNLP, 2024

  40. [40]

    GPT-4 Technical Report

    OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  41. [41]

    Misinformation in the Age of AI

    Merav Ozair. Misinformation in the Age of AI. https://www.nasdaq.com/articles/misinformation-in- the-age-of-artificial-intelligence-and-what-it-means- for-the-markets, 2023

  42. [42]

    The empirical impact of data sanitization on language models

    Anwesan Pal, Radhika Bhargava, Kyle Hinsz, Jacques Esterhuizen, and Sudipta Bhattacharya. The empirical impact of data sanitization on language models. arXiv, 2024

  43. [43]

    On the risk of misinformation pollution with large language models

    Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. On the risk of misinformation pollution with large language models. In EMNLP Findings, 2023

  44. [44]

    Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal

    Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal. Teach LLMs to phish: Stealing private information from language models. In ICLR, 2024

  45. [45]

    Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models

    Constantinos Patsakis and Nikolaos Lykousas. Man vs the machine in the struggle for effective text anonymi- sation in the age of large language models. Scientific Reports, 2023

  46. [46]

    Data quality of platforms and panels for online behavioral research

    Eyal Peer, David Rothschild, Andrew Gordon, Zak Ev- ernden, and Ekaterina Damer. Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 2022

  47. [47]

    Jatmo: Prompt injection defense by task-specific finetuning

    Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. arXiv, 2024

  48. [48]

    The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization

    Ildikó Pilán, Pierre Lison, Lilja Øvrelid, Anthi Pa- padopoulou, David Sánchez, and Montserrat Batet. The text anonymization benchmark (tab): A dedicated cor- pus and evaluation framework for text anonymization. Computational Linguistics, 2022

  49. [49]

    Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations

    Attia Qammar, Hongmei Wang, Jianguo Ding, Abde- nacer Naouri, Mahmoud Daneshmand, and Huansheng Ning. Chatbots to chatgpt in a cybersecurity space: Evo- lution, vulnerabilities, attacks, challenges, and future recommendations. arXiv, 2023

  50. [50]

    Llm driven web profile extraction for identical names

    Prateek Sancheti, Kamalakar Karlapalem, and Kavita Vemuri. Llm driven web profile extraction for identical names. In WWW, 2024

  51. [51]

    Digital deception: Generative artificial intelligence in social engineering and phishing

    Marc Schmitt and Ivan Flechais. Digital deception: Generative artificial intelligence in social engineering and phishing. arXiv, 2023

  52. [52]

    Beyond memorization: Violating privacy via inference with large language models

    Robin Staab, Mark Vero, Mislav Balunovi´c, and Martin Vechev. Beyond memorization: Violating privacy via inference with large language models. In ICLR, 2024

  53. [53]

    Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions

    Xuchen Suo. Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applica- tions. arXiv, 2024

  54. [54]

    Context-tuning: Learning contextualized prompts for natural language generation

    Tianyi Tang, Junyi Li, Wayne Xin Zhao, and Ji-Rong Wen. Context-tuning: Learning contextualized prompts for natural language generation. In ICCL, 2022

  55. [55]

    Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler

    Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Gar- cia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Sia- mak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler. Ul2: Unifying language learning paradigms. In ICLR, 2023

  56. [56]

    Gemini: A family of highly capable mul- timodal models

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. Gemini: A family of highly capable mul- timodal models. arXiv, 2023

  57. [57]

    Users really do answer telephone scams

    Huahong Tu, Adam Doupé, Ziming Zhao, and Gail- Joon Ahn. Users really do answer telephone scams. In USENIX Security, 2019

  58. [58]

    Finetuned language models are zero- shot learners

    Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. Finetuned language models are zero- shot learners. In ICLR, 2022

  59. [59]

    Jules White, Quchen Fu, Sam Hays, Michael Sand- born, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt. A prompt pat- tern catalog to enhance prompt engineering with chatgpt. arXiv, 2023

  60. [60]

    Weinberger, and Yoav Artzi

    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. In ICLR, 2020

  61. [61]

    Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions

    Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G Parker, and Munmun De Choudhury. Synthetic lies: Understanding ai-generated misinformation and evalu- ating algorithmic and human solutions. In CHI, 2023

  62. [62]

    Context-faithful prompting for large lan- guage models

    Wenxuan Zhou, Sheng Zhang, Hoifung Poon, and Muhao Chen. Context-faithful prompting for large lan- guage models. In Findings of the Association for Com- putational Linguistics: EMNLP 2023, 2023

  63. [63]

    none”. “<personal_profile>

    Hong Zhu, Shengzhi Zhang, and Kai Chen. Ai-guardian: Defeating adversarial attacks using backdoors. In IEEE S&P, 2023. Table 12: Summary of different prompt styles. <personal_profile> is a placeholder for the profile from which the attacker aims to extract information. Style Brief Example Direct Directly request the model to answer with the information th...