Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-15 06:32 UTC · model grok-4.3
The pith
AI-Sinkhole uses quantized LLMs and DNS to dynamically discover and block new chatbot services during exams.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI-Sinkhole is an AI-agent augmented, DNS-based framework that dynamically discovers, semantically classifies, and temporarily blocks, network-wide, emerging LLM chatbot services during proctored exams. Classification is explainable, carried out by quantized LLMs, and achieves robust cross-lingual performance with an F1-score above 0.83.
What carries the argument
The AI-Sinkhole framework, which augments DNS blocking with AI agents for real-time discovery and semantic classification of LLM services.
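The discover, classify, block loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyword stub stands in for the quantized-LLM classifier, and all domain names and page texts are hypothetical.

```python
# Sketch of the discover -> classify -> block pipeline.
# The keyword stub stands in for the paper's quantized-LLM classifier;
# domains and thresholds here are illustrative only.

def discover_candidates(dns_log):
    """Collect unique domains seen in DNS queries (discovery step)."""
    return sorted({query["domain"] for query in dns_log})

def classify(domain, page_text):
    """Stub classifier returning (label, explanation).
    A real deployment would prompt a local quantized LLM instead."""
    keywords = ("chatbot", "assistant", "llm", "gpt")
    hits = [k for k in keywords if k in page_text.lower()]
    if hits:
        return "llm-service", f"matched keywords: {hits}"
    return "benign", "no LLM-service indicators found"

def build_blocklist(dns_log, fetch_text):
    """Return hosts-format sinkhole entries for domains classified as LLM services."""
    entries = []
    for domain in discover_candidates(dns_log):
        label, why = classify(domain, fetch_text(domain))
        if label == "llm-service":
            entries.append(f"0.0.0.0 {domain}  # {why}")
    return entries

# Example with synthetic data:
log = [{"domain": "chat.example-ai.com"}, {"domain": "library.example.edu"}]
pages = {"chat.example-ai.com": "Your friendly LLM chatbot",
         "library.example.edu": "Search the library catalog"}
blocklist = build_blocklist(log, pages.get)
```

Keeping the explanation string alongside each entry mirrors the explainability claim: an administrator can see why a domain was sinkholed.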
If this is right
- Dynamic discovery removes the need for manual updates to blocklists as new services appear.
- Explainable outputs from the quantized LLMs allow administrators to review blocking decisions.
- Temporary network-wide blocks can be applied only during exam windows and lifted afterward.
- Cross-lingual F1 performance above 0.83 supports use in settings with multiple languages.
- Open release of code and blocklist enables institutions to adapt the system locally.
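The third bullet, time-boxed blocking, can be sketched as a gate that serves sinkhole entries only while an exam window is open. Times and the blocklist contents below are illustrative assumptions, not values from the paper.

```python
# Sketch of time-boxed blocking: sinkhole entries apply only inside an
# exam window and lift automatically afterward. All values are illustrative.
from datetime import datetime

EXAM_WINDOWS = [  # (start, end) pairs, local time
    (datetime(2026, 5, 15, 9, 0), datetime(2026, 5, 15, 11, 0)),
]

BLOCKLIST = ["0.0.0.0 chat.example-ai.com"]  # hypothetical domain

def active_blocklist(now):
    """Return the blocklist during an exam window, else an empty list."""
    in_window = any(start <= now <= end for start, end in EXAM_WINDOWS)
    return BLOCKLIST if in_window else []

during = active_blocklist(datetime(2026, 5, 15, 10, 0))  # exam in progress
after = active_blocklist(datetime(2026, 5, 15, 12, 0))   # access restored
```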
Where Pith is reading between the lines
- The same classification pipeline could be extended to other categories of AI tools if the prompt engineering is generalized.
- Integration with existing proctoring software would allow the blocks to activate automatically when an exam starts.
- Over time the approach might shift institutional policy from blanket AI bans toward timed, context-specific restrictions.
- The method highlights a trade-off between blocking access and preserving legitimate uses of AI outside evaluation periods.
Load-bearing premise
Quantized LLMs will continue to classify emerging LLM services accurately in real time without high false-positive rates that block legitimate educational resources.
What would settle it
A deployment test during simulated exams: if the system frequently blocks non-LLM sites such as research databases or educational portals, the classification step fails to meet the required reliability.
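Such an audit reduces to comparing classifier labels against ground truth for a mix of LLM and legitimate educational domains, then reporting the false-positive rate and F1. The data below are synthetic; note how a single wrongly blocked portal already drags F1 below the paper's 0.83 threshold.

```python
# Sketch of the audit: compute false-positive rate and F1 from labeled
# classification outcomes. True = "classified/actually an LLM service".

def audit(predictions, truth):
    tp = sum(1 for d, p in predictions.items() if p and truth[d])
    fp = sum(1 for d, p in predictions.items() if p and not truth[d])
    fn = sum(1 for d, p in predictions.items() if not p and truth[d])
    tn = sum(1 for d, p in predictions.items() if not p and not truth[d])
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return fpr, f1

truth = {"chat.a.example": True, "gpt.b.example": True,
         "library.example.edu": False, "journals.example.org": False}
preds = {"chat.a.example": True, "gpt.b.example": True,
         "library.example.edu": False, "journals.example.org": True}
fpr, f1 = audit(preds, truth)  # one legitimate portal wrongly blocked
```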
Original abstract
The transformative potential of large language models (LLMs) in education, such as improving accessibility and personalized learning, is being eclipsed by significant challenges. These challenges stem from concerns that LLMs undermine academic assessment by enabling bypassing of critical thinking, leading to increased cognitive offloading. This emerging trend stresses the dual imperative of harnessing AI's educational benefits while safeguarding critical thinking and academic rigor in the evolving AI ecosystem. To this end, we introduce AI-Sinkhole, an AI-agent augmented DNS-based framework that dynamically discovers, semantically classifies, and temporarily network-wide blocks emerging LLM chatbot services during proctored exams. AI-Sinkhole offers explainable classification via quantized LLMs (Llama 3, DeepSeek-R1, Qwen-3) and dynamic DNS blocking with Pi-hole. We also share our observations in using LLMs as explainable classifiers, which achieved robust cross-lingual performance (F1-score > 0.83). To support future research and development in this domain, initial code with a readily deployable 'AI-Sinkhole' blocklist is available at https://github.com/AIMLEdu/ai-sinkhole.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AI-Sinkhole, an AI-agent augmented DNS-based framework that dynamically discovers emerging LLM chatbot services, uses quantized LLMs (Llama 3, DeepSeek-R1, Qwen-3) for explainable semantic classification, and applies Pi-Hole for temporary network-wide blocking during proctored exams; it reports robust cross-lingual performance with F1-score > 0.83 and releases initial code and a blocklist on GitHub.
Significance. If the performance claims are substantiated with proper datasets and protocols, the work would provide a practical, deployable system at the intersection of AI classification and network enforcement for preserving academic integrity, with the open-source release and use of quantized models for explainability as notable strengths that could enable reproducibility and extension by others.
major comments (2)
- [Abstract] The central claim of cross-lingual F1-score > 0.83 is presented without any description of the labeled corpus, the construction or hold-out of 'emerging' LLM services, the prompt templates, the evaluation protocol, or baselines, rendering the metric uninterpretable as evidence of robustness.
- [Evaluation] The evaluation (implied by the performance reporting) supplies no empirical results on false-positive rates against non-LLM educational domains or on real-time behavior with previously unseen services, leaving unsupported the key assumption that quantized LLMs will classify accurately without blocking legitimate resources.
minor comments (1)
- [Abstract] The GitHub repository link is mentioned but its contents (e.g., exact blocklist format or deployment scripts) are not described in the text, which would aid reproducibility.
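On the blocklist-format point: Pi-hole consumes standard hosts-format lists, so a plausible shape for the repository's blocklist, with hypothetical domains standing in for the actual contents, is sketched below along with a parser.

```python
# Hosts-format blocklist as Pi-hole consumes it. The domains are
# hypothetical placeholders, not the actual repository contents.
RAW = """\
# AI-Sinkhole blocklist (illustrative)
0.0.0.0 chat.example-ai.com
0.0.0.0 assistant.example-llm.net
"""

def parse_blocklist(text):
    """Extract blocked domains, skipping comments and blank lines."""
    domains = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sink_ip, domain = line.split()
        assert sink_ip == "0.0.0.0"  # sinkhole address, not a real host
        domains.append(domain)
    return domains
```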
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the clarity and rigor of our work. We address each major comment below and commit to revisions that provide the requested details and additional empirical support.
Point-by-point responses
- Referee: [Abstract] The central claim of cross-lingual F1-score > 0.83 is presented without any description of the labeled corpus, the construction or hold-out of 'emerging' LLM services, the prompt templates, the evaluation protocol, or baselines, rendering the metric uninterpretable as evidence of robustness.
Authors: We agree that the abstract would benefit from additional context to make the performance claim interpretable on its own. In the revised manuscript we will expand the abstract with concise descriptions of the labeled corpus construction, the hold-out procedure used for emerging LLM services, the prompt templates, the evaluation protocol, and the baselines. These additions will be kept brief to preserve abstract length while directly addressing the concern. Revision: yes
- Referee: [Evaluation] No empirical results are supplied on false-positive rates against non-LLM educational domains or on real-time behavior with previously unseen services, leaving unsupported the key assumption that quantized LLMs will classify accurately without blocking legitimate resources.
Authors: We acknowledge the value of these additional evaluations. While the current manuscript emphasizes cross-lingual LLM classification performance, we will incorporate new experiments in the revised Evaluation section that report false-positive rates on non-LLM educational domains and real-time classification results for previously unseen LLM services. These results will be obtained using the same quantized models and will directly support the claim of accurate classification with minimal impact on legitimate resources. Revision: yes
Circularity Check
No significant circularity; observational metrics only
full rationale
The paper describes an AI-Sinkhole framework for DNS-based blocking of LLM services using quantized LLMs for semantic classification and reports an F1-score > 0.83 as an observation from system use. No equations, derivations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. The central claims rest on empirical deployment observations rather than any self-referential reduction or ansatz smuggled via prior work, rendering the content self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Quantized LLMs can perform reliable semantic classification of web services across languages.
- domain assumption: DNS-level blocking via Pi-hole can temporarily prevent access to identified services network-wide.
invented entities (1)
- AI-Sinkhole framework (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost uniqueness, washburn_uniqueness_aczel, reality_from_one_distinction): tagged unclear.
The relation between the paper passage and the cited Recognition theorem is unclear. Cited passage: "AI-Sinkhole offers explainable classification via quantized LLMs (LLama 3, DeepSeek-R1, Qwen-3) and dynamic DNS blocking with Pi-Hole... F1-score > 0.83"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
- [2] Q. Wen, J. Liang, C. Sierra, R. Luckin, R. Tong, Z. Liu, P. Cui, J. Tang, in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2024), pp. 6743–6744
- [3] D.R. Cotton, P.A. Cotton, J.R. Shipway, Chatting and cheating: Ensuring academic integrity in the era of ChatGPT, Innovations in Education and Teaching International 61(2), 228 (2024)
- [4] M. Gerlich, AI tools in society: Impacts on cognitive offloading and the future of critical thinking, Societies 15(1), 6 (2025)
- [5] D. Weber-Wulff, et al., Testing of detection tools for AI-generated text, International Journal for Educational Integrity 19(1), 1 (2023)
- [7] A. Carr, A. Alam, J. Allison, Monitoring malicious DNS queries: An experimental case study of utilising the National Cyber Security Centre's Protective DNS within a UK public sector organisation (2023)
- [8] The Pi-hole Project, Pi-hole: Network-wide ad blocking. https://pi-hole.net (2025). Accessed: 2025-11-13
- [9] Y. Ye, Z. Zhang, T. Ma, Z. Wang, Y. Li, S. Hou, W. Sun, K. Shi, Y. Ma, W. Song, et al., LLMs4All: A systematic review of large language models across academic disciplines, arXiv e-prints, arXiv–2509 (2025)
- [10] C. White, S. Dooley, M. Roberts, A. Pal, B. Feuer, S. Jain, R. Shwartz-Ziv, N. Jain, K. Saifullah, S. Naidu, et al., LiveBench: A challenging, contamination-free LLM benchmark, arXiv preprint arXiv:2406.19314 (2024)
- [11] S. Sosnovsky, P. Brusilovsky, A. Lan, Intelligent textbooks, International Journal of Artificial Intelligence in Education, pp. 1–20 (2025)
- [12] S. Schmidgall, Y. Su, Z. Wang, X. Sun, J. Wu, X. Yu, J. Liu, M. Moor, Z. Liu, E. Barsoum, Agent Laboratory: Using LLM agents as research assistants, arXiv preprint arXiv:2501.04227 (2025)
- [13] M. Bernabei, S. Colabianchi, A. Falegnami, F. Costantino, Students' use of large language models in engineering education: A case study on technology acceptance, perceptions, efficacy, and detection chances, Computers and Education: Artificial Intelligence 5, 100172 (2023)
- [14]
- [15]
- [16]
- [17] M. Perkins, J. Roe, D. Postma, J. McGaughran, D. Hickerson, Game of tones: Faculty detection of GPT-4 generated content in university assessments, arXiv preprint arXiv:2305.18081 (2023)
- [18] B. Borges, et al., Could ChatGPT get an engineering degree? Evaluating higher education vulnerability to AI assistants, Proceedings of the National Academy of Sciences 121(49), e2414955121 (2024)
- [19] J. Singh, A.K. Mishra, L. Chopra, G. Agarwal, M. Diwakar, P. Singh, in International Conference on Electrical and Electronics Engineering (Springer, 2023), pp. 173–185
- [20] T. Potluri, V.P.K. Sistla, in 2022 International Conference on Recent Trends in Microelectronics, Automation, Computing and Communications Systems (ICMACC) (IEEE, 2022), pp. 407–411
- [21] J. Uramova, P. Segec, M. Moravčík, in 2024 International Conference on Emerging eLearning Technologies and Applications (ICETA) (IEEE, 2024), pp. 1–7
- [22] J. Yablonski, Laws of UX (O'Reilly Media, Inc., 2024)
- [23] UncleCode, Crawl4AI: Open-source LLM friendly web crawler & scraper. https://github.com/unclecode/crawl4ai (2024)
- [24] S. Bozzolan, S. Calzavara, L. Cazzaro, LLM-assisted web measurements, arXiv preprint arXiv:2510.08101 (2025)
- [25] V. Singh, Y. Kassa, A. Kale, B. Ricks, R. Gandhi, ModelCart: A Machine Learning Meta-Framework with Explainability and Human-in-the-Loop (Springer, 2025), pp. 316–326. DOI 10.1007/978-3-031-89063-5_27
- [26] Ollama team, Ollama: Large language models locally. https://ollama.com/ (2025)
- [27]
- [28] S.M. Farjad, S.R. Patllola, Y. Kassa, G. Grispos, R. Gandhi, in Proceedings of the 2025 ACM Southeast Conference (ACMSE 2025) (Association for Computing Machinery, New York, NY, USA, 2025), pp. 145–154. DOI 10.1145/3696673.3723074
- [29]
- [30] R. Melcarne, S. Yucel, S. Vitale, D. De Benedictis, E. Leopardi, C. Violani, G. Familiari, M. Maranghi, G. D'Amati, et al., in AMEE 2025 (AMEE, 2025)
discussion (0)