Evaluating LLM-based Personal Information Extraction and Countermeasures

2024 · cs.CR · arXiv 2408.07291

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Automatically extracting personal information -- such as name, phone number, and email address -- from publicly available profiles at a large scale is a stepping stone to many other security attacks, including spear phishing. Traditional methods -- such as regular expressions, keyword search, and entity detection -- achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets, including a synthetic dataset generated by GPT-4 and three real-world datasets manually labeled with eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLMs can be misused by attackers to accurately extract various personal information from personal profiles; LLMs outperform traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.

representative citing papers

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

LLM agents can reconstruct high-fidelity personal profiles from minimal PII seeds with over 90% accuracy in under 10 minutes at less than $3 cost, exposing three escalating tiers of privacy risks.

citing papers explorer

Showing 1 of 1 citing paper.

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

cs.CR · 2026-05-07 · unverdicted · none · ref 22 · internal anchor
