{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:JUXQCDAH6DD6QOU5S5GRAHY5CV","short_pith_number":"pith:JUXQCDAH","schema_version":"1.0","canonical_sha256":"4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e","source":{"kind":"arxiv","id":"2308.03825","version":2},"attestation_state":"computed","paper":{"title":"\"Do Anything Now\": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.","cross_cats":["cs.LG"],"primary_cat":"cs.CR","authors_text":"Michael Backes, Xinyue Shen, Yang Zhang, Yun Shen, Zeyuan Chen","submitted_at":"2023-08-07T16:55:20Z","abstract_excerpt":"The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt in"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2308.03825","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CR","submitted_at":"2023-08-07T16:55:20Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"c5717085b3c9718de16aa4cd67ced5c7a868ded0d8bb8e4113999ea62b24d0ef","abstract_canon_sha256":"2ec07f8bc40c9d26f66dc397dafce609b4911c07d8b55a00594e0c5e0e44747c"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:14.561505Z","signature_b64":"UEwvNRuj/TFjWvstMnwC2Y7NEyRBCpp6SfvmoHmwlCIGCNHVSfIkheYvjUtI8CmfqjZSHatEx6yInQ9iWD3vBg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e","last_reissued_at":"2026-05-17T23:38:14.560748Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:14.560748Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"\"Do Anything Now\": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.","cross_cats":["cs.LG"],"primary_cat":"cs.CR","authors_text":"Michael Backes, Xinyue Shen, Yang Zhang, Yun Shen, Zeyuan Chen","submitted_at":"2023-08-07T16:55:20Z","abstract_excerpt":"The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt in"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"our experiments on six popular LLMs show that their safeguards cannot adequately defend jailbreak prompts in all scenarios. Particularly, we identify five highly effective jailbreak prompts that achieve 0.95 attack success rates on ChatGPT (GPT-3.5) and GPT-4","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The 1,405 collected prompts and the 107,250-question set across 13 scenarios are representative enough to support broad conclusions about the inadequacy of safeguards on all LLMs.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"4baa3b4ed8eb101189a43c0a17875ec17036929bf8888959893376f413598192"},"source":{"id":"2308.03825","kind":"arxiv","version":2},"verdict":{"id":"d9b54ae9-1ad6-4d96-aa37-235348d72816","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T08:29:12.747903Z","strongest_claim":"our experiments on six popular LLMs show that their safeguards cannot adequately defend jailbreak prompts in all scenarios. Particularly, we identify five highly effective jailbreak prompts that achieve 0.95 attack success rates on ChatGPT (GPT-3.5) and GPT-4","one_line_summary":"Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The 1,405 collected prompts and the 107,250-question set across 13 scenarios are representative enough to support broad conclusions about the inadequacy of safeguards on all LLMs.","pith_extraction_headline":""},"references":{"count":98,"sample":[{"doi":"","year":null,"title":"https: //assets.publishing.service.gov.uk/government/ uploads/system/uploads/attachment_data/file/ 1146542/a_pro-innovation_approach_to_AI_ regulation.pdf","work_id":"502f913d-e96e-4d5c-84b9-631c5717f429","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"https://www.aiprm.com/","work_id":"bd0c8e0b-3f35-46da-bf1e-8ff70918fa6b","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"https://huggingface.co/ datasets/fka/awesome-chatgpt-prompts","work_id":"37765adf-2f9e-4180-a3c6-4e8fdd03fd5c","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"https://chat.openai.com/chat","work_id":"0aa94ec8-2047-4b5c-a599-16735e25fbcf","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"https://disboard.org/","work_id":"bb04aafd-7abb-42b4-957f-cb9e3dcf5f05","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":98,"snapshot_sha256":"971c24f219f515eac5b68e35c04af7e43987f29f97844424a7eb75e4afa1e4bf","internal_anchors":9},"formal_canon":{"evidence_count":2,"snapshot_sha256":"5b18a39d923688a8b2e5d4e48025090c0b621f21d6f0f60104e3a21c6f0a31e8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2308.03825","created_at":"2026-05-17T23:38:14.560883+00:00"},{"alias_kind":"arxiv_version","alias_value":"2308.03825v2","created_at":"2026-05-17T23:38:14.560883+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2308.03825","created_at":"2026-05-17T23:38:14.560883+00:00"},{"alias_kind":"pith_short_12","alias_value":"JUXQCDAH6DD6","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"JUXQCDAH6DD6QOU5","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"JUXQCDAH","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":26,"internal_anchor_count":26,"sample":[{"citing_arxiv_id":"2409.18169","citing_title":"Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey","ref_index":135,"is_internal_anchor":true},{"citing_arxiv_id":"2412.16720","citing_title":"OpenAI o1 System Card","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2504.20984","citing_title":"ACE: A Security Architecture for LLM-Integrated App Systems","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2507.04227","citing_title":"Mobile GUI Agents under Real-world Threats: Are We There Yet?","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2508.20325","citing_title":"GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2401.05561","citing_title":"TrustLLM: Trustworthiness in Large Language Models","ref_index":251,"is_internal_anchor":true},{"citing_arxiv_id":"2404.01833","citing_title":"Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2511.02356","citing_title":"ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2310.02446","citing_title":"Low-Resource Languages Jailbreak GPT-4","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2310.06987","citing_title":"Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2402.10260","citing_title":"A StrongREJECT for Empty Jailbreaks","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2405.14782","citing_title":"Lessons from the Trenches on Reproducible Evaluation of Language Models","ref_index":234,"is_internal_anchor":true},{"citing_arxiv_id":"2404.01318","citing_title":"JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2407.04295","citing_title":"Jailbreak Attacks and Defenses Against Large Language Models: A Survey","ref_index":79,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12869","citing_title":"Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2309.00614","citing_title":"Baseline Defenses for Adversarial Attacks Against Aligned Language Models","ref_index":50,"is_internal_anchor":true},{"citing_arxiv_id":"2402.17177","citing_title":"Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models","ref_index":112,"is_internal_anchor":true},{"citing_arxiv_id":"2406.11717","citing_title":"Refusal in Language Models Is Mediated by a Single Direction","ref_index":183,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03441","citing_title":"Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2412.05579","citing_title":"LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods","ref_index":199,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04901","citing_title":"On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference","ref_index":161,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00267","citing_title":"Jailbroken Frontier Models Retain Their Capabilities","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06833","citing_title":"FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06154","citing_title":"Exclusive Unlearning","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15717","citing_title":"Into the Gray Zone: Domain Contexts Can Blur LLM Safety Boundaries","ref_index":5,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV","json":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV.json","graph_json":"https://pith.science/api/pith-number/JUXQCDAH6DD6QOU5S5GRAHY5CV/graph.json","events_json":"https://pith.science/api/pith-number/JUXQCDAH6DD6QOU5S5GRAHY5CV/events.json","paper":"https://pith.science/paper/JUXQCDAH"},"agent_actions":{"view_html":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV","download_json":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV.json","view_paper":"https://pith.science/paper/JUXQCDAH","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2308.03825&json=true","fetch_graph":"https://pith.science/api/pith-number/JUXQCDAH6DD6QOU5S5GRAHY5CV/graph.json","fetch_events":"https://pith.science/api/pith-number/JUXQCDAH6DD6QOU5S5GRAHY5CV/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV/action/timestamp_anchor","attest_storage":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV/action/storage_attestation","attest_author":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV/action/author_attestation","sign_citation":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV/action/citation_signature","submit_replication":"https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV/action/replication_record"}},"created_at":"2026-05-17T23:38:14.560883+00:00","updated_at":"2026-05-17T23:38:14.560883+00:00"}