COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

Changjia Zhu; Junjie Xiong; Junyu Wang; Lingyao Li; Mingkui Wei; Xu He; Yuanbo Zhou

arxiv: 2512.02318 · v3 · submitted 2025-12-02 · 💻 cs.CR · cs.AI

Pith reviewed 2026-05-17 03:19 UTC · model grok-4.3

keywords: CAPTCHA security, multimodal LLMs, visual puzzles, automated solvers, defense mechanisms, localization tasks

The pith

Certain visual CAPTCHA designs using fine-grained localization and implicit counting reduce state-of-the-art MLLM solver success rates from over 95% to zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates how multimodal large language models undermine visual CAPTCHA security by evaluating seven leading MLLMs across eighteen real-world task types. Models solve recognition and low-interaction tasks reliably at human-like cost and speed, but struggle with tasks demanding fine-grained localization, multi-step spatial reasoning, or cross-frame consistency. The authors analyze reasoning traces to derive defense guidelines and validate them through a case study where hardening a CAPTCHA with localization and implicit counting drops success rates to zero. This shows that targeted structural changes can restore CAPTCHA effectiveness against current automated solvers.

Core claim

Multimodal LLMs can solve many visual CAPTCHA tasks effectively, yet incorporating fine-grained localization and implicit counting into task design reduces their success rate from over 95% to 0%, providing a concrete way to strengthen defenses.

What carries the argument

Fine-grained localization and implicit counting, which require models to perform precise spatial analysis and enumeration within the CAPTCHA puzzle.
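
To make the two ingredients concrete, here is a minimal sketch of how a server might score a challenge that combines them. It is a hypothetical design, not the authors' hardened CAPTCHA: the function name `verify_response`, the pixel tolerance, and the click-based interface are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def verify_response(clicks: List[Point], targets: List[Point], tolerance_px: float) -> bool:
    """Hypothetical check combining implicit counting and fine-grained localization.

    Implicit counting: the challenge never states how many targets exist, so a
    response with the wrong number of clicks is rejected.
    Fine-grained localization: every click must land within tolerance_px of a
    distinct target centre.
    The matching is greedy, which is adequate when target centres are more than
    2 * tolerance_px apart (an assumption of this sketch).
    """
    if len(clicks) != len(targets):
        return False
    unmatched = list(targets)
    for cx, cy in clicks:
        # Pick the nearest target that has not been claimed by an earlier click.
        nearest = min(unmatched, key=lambda t: math.hypot(t[0] - cx, t[1] - cy))
        if math.hypot(nearest[0] - cx, nearest[1] - cy) > tolerance_px:
            return False
        unmatched.remove(nearest)
    return True


# Hypothetical target centres and responses, in pixels.
targets = [(10.0, 10.0), (50.0, 50.0)]
print(verify_response([(11.0, 9.0), (49.0, 52.0)], targets, tolerance_px=5.0))
print(verify_response([(11.0, 9.0)], targets, tolerance_px=5.0))
```

Under this kind of check a solver that recognizes the right object class but miscounts the instances, or clicks a few pixels outside the tolerance, fails outright, which is the behaviour the claim above depends on.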

If this is right

Platform operators should prioritize CAPTCHA tasks that demand localization and counting to counter MLLM threats.
Analysis of model reasoning traces can guide the selection and strengthening of specific CAPTCHA types.
Current MLLMs remain limited on tasks involving multi-step spatial reasoning or cross-frame consistency.
Prompt engineering and few-shot examples boost solver performance on vulnerable tasks but not on hardened ones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future advancements in MLLMs could eventually overcome these defenses, suggesting the need for continuous CAPTCHA evolution.
These defense principles might extend to other visual security challenges beyond CAPTCHAs.
Testing against a broader range of models and real-world deployments would further validate the approach.

Load-bearing premise

The seven evaluated MLLMs and eighteen task types represent the threat surface for visual CAPTCHAs deployed in the wild.

What would settle it

Demonstrating that an advanced MLLM or new prompt technique can solve the hardened CAPTCHA with high success rate would falsify the claim of effective defense.

Figures

Figures reproduced from arXiv: 2512.02318 by Changjia Zhu, Junjie Xiong, Junyu Wang, Lingyao Li, Mingkui Wei, Xu He, Yuanbo Zhou.

**Figure 1.** CAPTCHA robustness evaluation framework against MLLMs.

**Figure 3.** Cross-model Pass@1 distributions per task type in …

**Figure 4.** Heatmap of CAPTCHA task difficulty in Exp2 (opti…

**Figure 5.** Cross-model Pass@1 distributions per task type …

**Figure 6.** Per-task Pass@1 for GPT-5 (Medium) in Exp1 (origi…

**Figure 8.** Expected number of API calls until the first success …

**Figure 9.** Cost and latency trade-offs for GPT-5 (Medium) across CAPTCHA task types.

Original abstract

This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 leading commercial and open-source MLLMs across 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further analyze the impact of task-specific prompt engineering and few-shot demonstrations on solver effectiveness. We reveal that MLLMs can reliably solve recognition-oriented and low-interaction CAPTCHA tasks at human-like cost and latency, whereas tasks requiring fine-grained localization, multi-step spatial reasoning, or cross-frame consistency remain significantly harder for current models. By examining the reasoning traces of such MLLMs, we investigate the underlying mechanisms of why models succeed/fail on specific CAPTCHA puzzles and use these insights to derive defense-oriented guidelines for selecting and strengthening CAPTCHA tasks. To validate these principles, we perform a case study by hardening a vulnerable CAPTCHA type using our guidelines. We demonstrate that incorporating fine-grained localization and implicit counting reduces the success rate of state-of-the-art MLLMs from over 95% to 0%, confirming that structural changes can effectively mitigate the threat. We conclude by discussing the implications for platform operators who deploy CAPTCHA as part of their abuse-mitigation pipeline.

Code availability: https://anonymous.4open.science/r/Captcha-465E/
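
The metrics named in the abstract (single-shot accuracy, success under limited retries, and per-solve cost) can be related with a short calculation. The sketch below is illustrative only: it assumes every attempt is an independent trial with the same success probability, which the abstract does not state, and the function names and example numbers are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def pass_at_1(outcomes: Sequence[bool]) -> float:
    """Fraction of single-shot attempts that solved the CAPTCHA."""
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


def success_within_retries(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    return 1.0 - (1.0 - p) ** k


def expected_calls_until_success(p: float) -> float:
    """Mean of a geometric distribution: expected attempts until the first success."""
    if p <= 0.0:
        return float("inf")
    return 1.0 / p


def expected_cost_per_solve(p: float, cost_per_call: float) -> float:
    """Expected spend for one solve when every call costs the same amount."""
    return expected_calls_until_success(p) * cost_per_call


# Hypothetical outcomes: 8 successes in 10 single-shot attempts.
outcomes = [True] * 8 + [False] * 2
p = pass_at_1(outcomes)
print(p, success_within_retries(p, 3), expected_calls_until_success(p))
```

Under the independence assumption, a task with low but non-zero single-shot accuracy remains solvable at a cost that scales as 1/p, whereas a task with p = 0 cannot be solved by retrying at all.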

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper maps MLLM performance across many CAPTCHA types and shows one structural fix that drops success to zero, but it is unclear if that fix was tested against the stronger prompts the authors document elsewhere.

The main thing to know is that current multimodal models can solve a wide range of visual CAPTCHAs at low cost and latency, yet the authors give evidence that adding fine-grained localization and implicit counting can push success rates down to zero on at least one previously easy task type. They evaluated seven models on eighteen real-world task types, measured single-shot accuracy plus retries, and tracked latency and cost. They also tested how prompt engineering and few-shot examples improve solver results, then looked at the models' reasoning traces to extract patterns about what makes tasks hard. From those patterns they derived guidelines and applied them in a case study that hardened one vulnerable type with the localization and counting changes. The scale of the comparison and the direct move from failure analysis to a concrete defense example are the parts that stand out as useful. The work supplies practical numbers that matter for anyone running abuse mitigation on the web.

The clearest soft spot is the defense result. The paper shows that task-specific prompts and few-shot examples raise accuracy on recognition tasks, but the case-study description does not say whether those stronger attack settings were used on the hardened version. If the zero percent figure came only from default prompts, the structural change has not been checked against the full attack surface the authors themselves measured. The abstract also leaves out details on exact prompt templates, retry protocols, and statistical controls, which makes it harder to judge how reproducible the numbers are. The seven models and eighteen tasks give a reasonable snapshot but cannot cover every future model or every deployed CAPTCHA variant.

This paper is for web-security engineers and CAPTCHA designers who need current data on automated solvers. Readers who want concrete comparisons and a worked example of hardening will get value from it. It has enough new empirical ground and a testable defense claim to deserve a serious referee.

Referee Report

1 major / 1 minor

Summary. The paper evaluates seven commercial and open-source MLLMs across 18 real-world visual CAPTCHA task types, reporting single-shot accuracy, retry success, latency, and cost. It examines the effects of task-specific prompt engineering and few-shot demonstrations, identifies why models succeed or fail via reasoning traces, derives defense guidelines, and validates them in a case study showing that adding fine-grained localization and implicit counting reduces SOTA MLLM success from >95% to 0%.

Significance. If the central empirical results hold under consistent attack conditions, the work is significant for abuse-mitigation practice: it supplies concrete measurements of MLLM threat levels on recognition versus reasoning-heavy tasks and demonstrates that modest structural hardening can neutralize current solvers at human-like cost. The multi-model, multi-task design and explicit cost/latency data strengthen its utility for platform operators.

major comments (1)

[Case study section] Case study / abstract claim: the headline result that fine-grained localization plus implicit counting drops success from >95% to 0% is load-bearing for the defense contribution. The manuscript separately demonstrates that task-specific prompt engineering and few-shot demonstrations materially raise solver accuracy on recognition-oriented tasks. It is not stated whether the same optimized prompting regime was applied when evaluating the hardened variant. If the 0% figure reflects only default or weaker prompts, the structural defense has not been stress-tested against the attack surface the authors themselves document.

minor comments (1)

The code-availability statement points to an anonymous repository; the manuscript would benefit from a brief reproducibility note on prompt templates, retry protocols, and statistical controls even if the repository remains anonymous.
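
One statistical control relevant to a 0% result is an upper confidence bound on the true success rate. The sketch below is not from the paper: it uses the standard exact one-sided bound for zero successes in n independent trials, and the trial counts are hypothetical because the text above does not report how many attempts were made on the hardened variant.

```python
def zero_success_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the success rate after 0 successes in n trials.

    Solves (1 - p) ** n = 1 - confidence for p, which is the Clopper-Pearson
    bound in the special case of zero observed successes.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# Hypothetical trial counts, chosen only to show how the bound tightens.
for n in (10, 50, 100):
    print(n, round(zero_success_upper_bound(n), 4))
```

At 95% confidence the bound shrinks roughly as 3/n (the "rule of three"), so a 0% figure measured over a small number of attempts is still compatible with a non-trivial true success rate.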

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The feedback on ensuring the case study is evaluated under the strongest documented attack conditions is well taken, and we address it directly below.

Referee: [Case study section] Case study / abstract claim: the headline result that fine-grained localization plus implicit counting drops success from >95% to 0% is load-bearing for the defense contribution. The manuscript separately demonstrates that task-specific prompt engineering and few-shot demonstrations materially raise solver accuracy on recognition-oriented tasks. It is not stated whether the same optimized prompting regime was applied when evaluating the hardened variant. If the 0% figure reflects only default or weaker prompts, the structural defense has not been stress-tested against the attack surface the authors themselves document.

Authors: We thank the referee for identifying this important clarification point. The case study evaluations of the hardened variant were performed using the same task-specific prompt engineering and few-shot demonstrations that maximized solver accuracy on the corresponding recognition-oriented tasks in the main evaluation. This choice was made precisely to stress-test the structural defense against the strongest attack surface we document. We agree, however, that the manuscript does not explicitly state the prompting regime used for the hardened variant. We will revise the case study section (and the corresponding abstract claim) to make this explicit, including a direct reference to the optimized prompting results from the earlier analysis. No changes to the reported numbers or experimental data are required.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation and case study with direct measurements

full rationale

The paper is an empirical study measuring MLLM success rates on 18 CAPTCHA task types across 7 models, analyzing prompt engineering effects, and validating defense guidelines via a single case study that hardens one task type. No equations, closed-form derivations, fitted parameters, or self-citation chains are present in the provided text. Reported accuracies (e.g., >95% to 0%) are direct experimental outcomes from the described evaluations and hardening, not reductions of predictions to inputs by construction. The work is self-contained against external benchmarks of MLLM performance on visual tasks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on empirical measurement rather than theoretical derivation. The paper assumes off-the-shelf MLLMs can be prompted effectively for CAPTCHA solving and that the chosen 18 task types cover the relevant attack surface.

axioms (1)

Domain assumption: off-the-shelf MLLMs with standard prompting can be treated as representative automated CAPTCHA solvers.
Invoked throughout the evaluation of 7 models and the analysis of reasoning traces.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: "incorporating fine-grained localization and implicit counting reduces the success rate of state-of-the-art MLLMs from over 95% to 0%"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
