uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
Pith reviewed 2026-05-19 15:42 UTC · model grok-4.3
The pith
uGen generates functionally correct microarchitectural attack PoCs by using multi-agent retrieval to fill LLM knowledge gaps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
uGen is the first LLM-driven framework for automated microarchitectural attack code generation. A systematic study reveals that LLMs frequently misgenerate or misplace critical attack primitives. Guided by this, uGen uses a retrieval-augmented, multi-agent design to inject missing domain knowledge and synthesize functionally correct PoCs for cache-based and speculative-execution attacks tailored to defender requirements across diverse microarchitectures and vulnerable functions.
What carries the argument
A retrieval-augmented, multi-agent design that supplies the attack-primitive knowledge LLMs lack, so that the generated PoC code is functionally correct.
If this is right
- Up to 100% success rate for generating Spectre-v1 PoCs using Claude Sonnet-4.
- 80% success rate for Prime+Probe PoCs using Qwen3-Coder.
- Successful PoC generation at a cost of $1.25 in under four minutes.
- Applicable across a diverse set of microarchitectures, vulnerable functions, and execution environments.
Where Pith is reading between the lines
- Defenders could use similar frameworks to quickly test new processor models for emerging attack vectors.
- Extending the systematic gap analysis to additional attack types such as Rowhammer could broaden automated security testing.
- The low cost suggests potential for integration into continuous integration pipelines for hardware security validation.
Load-bearing premise
The method of studying LLM gaps and using retrieval-augmented agents will consistently yield working and portable attack code for different processors and settings.
What would settle it
Generate PoC code with uGen, run it on a CPU known to be vulnerable, and check whether it demonstrates the attack as expected.
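One way to make such a check concrete is to plant a known secret, let each generated PoC try to recover it, and count a run as successful only if enough bytes come back correctly. The sketch below is illustrative only: the function names, the 0.9 accuracy cutoff, and the idea of scoring by byte accuracy are assumptions made here, not the paper's stated procedure.

```python
from typing import Sequence


def byte_accuracy(recovered: bytes, secret: bytes) -> float:
    """Fraction of secret bytes recovered at the correct position."""
    if not secret:
        raise ValueError("secret must be non-empty")
    # zip stops at the shorter input, so missing bytes count as wrong.
    matches = sum(r == s for r, s in zip(recovered, secret))
    return matches / len(secret)


def run_succeeded(recovered: bytes, secret: bytes,
                  min_accuracy: float = 0.9) -> bool:
    """Hypothetical criterion: a run succeeds if accuracy meets the cutoff."""
    return byte_accuracy(recovered, secret) >= min_accuracy


def success_rate(runs: Sequence[bytes], secret: bytes,
                 min_accuracy: float = 0.9) -> float:
    """Share of PoC runs that met the success criterion."""
    if not runs:
        raise ValueError("at least one run is required")
    wins = sum(run_succeeded(r, secret, min_accuracy) for r in runs)
    return wins / len(runs)


if __name__ == "__main__":
    secret = b"secret"
    print(success_rate([b"secret", b"xxxxxx"], secret))  # 0.5
```

Scoring against a planted secret separates "the program ran" from "the program leaked data", which is the distinction the settlement test depends on.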
Original abstract
Microarchitectural attacks continue to evolve, uncovering new exploitation vectors in modern processors. From a defensive perspective, assessing a system's susceptibility to such attacks remains challenging. Developing functional attack implementations is labor-intensive, requires deep microarchitectural expertise, and is highly sensitive to execution environments. Consequently, existing attacks often lack portability, limiting systematic and scalable vulnerability assessment. Recent advances in large language models (LLMs) suggest a potential avenue for lowering these barriers. However, it remains unclear whether LLMs can reliably generate functionally correct microarchitectural attack code suitable for rigorous vulnerability testing. In this work, we present uGen, the first LLM-driven framework for automated microarchitectural attack code generation. A key challenge we address is identifying attack-specific knowledge gaps in LLMs. Through a systematic study of state-of-the-art models (GPT, Claude, and Qwen3), we find that LLMs frequently misgenerate or misplace critical attack primitives. Guided by this analysis, uGen employs a retrieval-augmented, multi-agent design that injects missing domain knowledge to synthesize functionally correct microarchitectural attack PoCs tailored to defender requirements. We evaluate uGen on cache-based and speculative-execution attacks across a diverse set of microarchitectures, vulnerable functions, and LLM platforms. In the deployment stage, uGen achieves up to a 100% success rate for Spectre-v1 (Claude Sonnet-4) and 80% for Prime+Probe (Qwen3-Coder). Finally, we demonstrate that uGen can generate a successful PoC at a cost of $1.25 in under four minutes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces uGen, a retrieval-augmented multi-agent LLM framework designed to generate functionally correct microarchitectural attack PoCs (e.g., Spectre-v1 and Prime+Probe) by first systematically identifying knowledge gaps in models like GPT, Claude, and Qwen3, then injecting missing attack primitives. It evaluates the system across multiple LLMs, microarchitectures, and vulnerable functions, reporting success rates up to 100% for Spectre-v1 (Claude Sonnet-4) and 80% for Prime+Probe (Qwen3-Coder) in a deployment stage, along with a demonstration of low-cost generation ($1.25 in under four minutes).
Significance. If the reported success rates reflect verified microarchitectural side effects (such as measurable speculative leakage or cache timing differentials) rather than syntactic or runtime validity alone, uGen could meaningfully reduce the expertise barrier for defenders performing portable vulnerability assessments. The systematic gap analysis and multi-agent design provide a concrete, reproducible method that could be extended to other side-channel attacks; the cross-model and cross-architecture evaluation adds practical value if the functional-correctness claims are substantiated.
major comments (2)
- [Abstract] Abstract and Evaluation section: The success rates (100% for Spectre-v1, 80% for Prime+Probe) are presented without an explicit definition of the success metric or verification procedure. It is unclear whether a PoC is deemed successful upon compilation, execution without crash, or only after confirming actual attack effects (e.g., branch-predictor mistraining leakage exceeding noise for Spectre-v1 or statistically significant cache-eviction timing for Prime+Probe). This directly affects whether the central claim of producing 'functionally correct' PoCs for vulnerability assessment holds.
- [Evaluation] Evaluation section: No details are supplied on controls for environmental sensitivity, hardware-specific timing noise, statistical significance testing, or how portability across microarchitectures was validated. Without these, the high success rates cannot be distinguished from basic executability, weakening the evidence that the retrieval-augmented multi-agent approach reliably produces usable attack code.
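To illustrate the kind of timing-noise control the second major comment asks about, the sketch below calibrates a cache hit/miss threshold from two sets of latency samples and reports how often that threshold misclassifies them. It is a minimal sketch under stated assumptions: the function names, the median-midpoint rule, and the sample values are hypothetical and do not come from the paper.

```python
from statistics import median
from typing import Sequence


def calibrate_threshold(hit_cycles: Sequence[float],
                        miss_cycles: Sequence[float]) -> float:
    """Midpoint between the median hit and median miss latency.

    Medians are used so that a few outliers (interrupts, context
    switches) do not drag the threshold around.
    """
    hit_med = median(hit_cycles)
    miss_med = median(miss_cycles)
    if hit_med >= miss_med:
        raise ValueError("hits are not faster than misses; cannot calibrate")
    return (hit_med + miss_med) / 2


def misclassification_rate(hit_cycles: Sequence[float],
                           miss_cycles: Sequence[float],
                           threshold: float) -> float:
    """Fraction of samples that fall on the wrong side of the threshold."""
    wrong_hits = sum(c >= threshold for c in hit_cycles)
    wrong_misses = sum(c < threshold for c in miss_cycles)
    return (wrong_hits + wrong_misses) / (len(hit_cycles) + len(miss_cycles))


if __name__ == "__main__":
    hits = [40, 42, 41, 300]        # one noisy outlier among the hits
    misses = [200, 210, 190, 205]
    t = calibrate_threshold(hits, misses)
    print(t, misclassification_rate(hits, misses, t))  # 122.0 0.125
```

Reporting the misclassification rate alongside a success rate would show how much of a result could be explained by timing noise on a given machine.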
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a brief table summarizing the exact attack primitives injected by the retrieval component for each evaluated attack.
- [Section 3] Notation for agent roles and retrieval sources could be made more consistent across figures and text to improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our paper. The comments raise valid points regarding the clarity of our success metrics and evaluation details. We address each major comment below and will revise the manuscript accordingly to improve transparency.
- Referee: [Abstract] Abstract and Evaluation section: The success rates (100% for Spectre-v1, 80% for Prime+Probe) are presented without an explicit definition of the success metric or verification procedure. It is unclear whether a PoC is deemed successful upon compilation, execution without crash, or only after confirming actual attack effects (e.g., branch-predictor mistraining leakage exceeding noise for Spectre-v1 or statistically significant cache-eviction timing for Prime+Probe). This directly affects whether the central claim of producing 'functionally correct' PoCs for vulnerability assessment holds.
  Authors: We acknowledge that the definition of success was not sufficiently explicit in the abstract and evaluation sections. In our experiments, a PoC is considered successful only if it produces the expected microarchitectural side effect, verified by measuring actual leakage or timing differentials that exceed noise thresholds, rather than just successful compilation or execution. We will revise the manuscript to include an explicit definition of the success metric and a detailed description of the verification procedure in the Evaluation section.
  Revision: yes
- Referee: [Evaluation] Evaluation section: No details are supplied on controls for environmental sensitivity, hardware-specific timing noise, statistical significance testing, or how portability across microarchitectures was validated. Without these, the high success rates cannot be distinguished from basic executability, weakening the evidence that the retrieval-augmented multi-agent approach reliably produces usable attack code.
  Authors: We agree that more details on experimental controls are necessary to substantiate the claims. In the revised version, we will add information on how we controlled for environmental factors, such as using isolated execution environments, performing multiple runs to account for timing noise, applying statistical significance tests (e.g., t-tests on timing measurements), and validating portability by testing on multiple microarchitectures with documented hardware specifications and any necessary code adaptations.
  Revision: yes
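A t-test on timing measurements, as mentioned in the response above, could be sketched as follows. This is a hedged illustration rather than the authors' procedure: it uses Welch's unequal-variance t-test, the function names are hypothetical, and the |t| >= 4.5 cutoff is a convention borrowed from side-channel leakage assessment that is assumed here, not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two samples")
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def distinguishable(a: Sequence[float], b: Sequence[float],
                    t_cutoff: float = 4.5) -> bool:
    """Assumed decision rule: timings differ if |t| reaches the cutoff."""
    t, _ = welch_t(a, b)
    return abs(t) >= t_cutoff


if __name__ == "__main__":
    evicted = [200, 210, 190, 205, 195]   # hypothetical probe latencies
    cached = [40, 42, 41, 43, 39]
    print(welch_t(evicted, cached), distinguishable(evicted, cached))
```

Welch's variant is chosen here because hit and miss latencies usually have very different variances, which the classic pooled-variance t-test does not handle well.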
Circularity Check
No significant circularity; claims rest on empirical validation
The paper describes an empirical LLM-based framework for generating microarchitectural attack PoCs. Success rates (e.g., 100% for Spectre-v1) are measured by executing and testing generated code on target hardware for functional correctness and side-channel effects, rather than being defined internally or fitted to inputs. The systematic study of LLM gaps and the retrieval-augmented multi-agent design are presented as engineering choices validated externally through experiments across models and microarchitectures, with no equations, self-definitional loops, or load-bearing self-citations that reduce the central results to the inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs frequently misgenerate or misplace critical attack primitives that can be systematically identified.
- domain assumption: Retrieval-augmented multi-agent design can supply the missing domain knowledge to produce correct PoCs.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: uGen employs a retrieval-augmented, multi-agent design that injects missing domain knowledge to synthesize functionally correct microarchitectural attack PoCs... achieves up to 100% success rate for Spectre-v1 (Claude Sonnet-4) and 80% for Prime+Probe
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Prime+Probe Attack... Spectre attacks exploit branch prediction units... controlled delay... cache hit/miss threshold
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.