In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27491–27499

· 2025 · arXiv 2511.10714

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Red Teaming Large Reasoning Models

cs.CR · 2025-11-29 · unverdicted · novelty 7.0

RT-LRM benchmark finds Large Reasoning Models more fragile than standard LLMs to risks like CoT-hijacking and prompt-induced issues.

RecurGuard: Runtime Monitoring for Reasoning-Token Consumption Attacks

cs.CR · 2026-06-06 · unverdicted · novelty 6.0

RecurGuard monitors recurrence rate, volume growth, and query progress in exposed reasoning traces to terminate generation on token-consumption attacks, reporting 99% detection on OverThink and 92% on ExtendAttack with near-zero false positives.

OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents

cs.LG · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

OTora is a two-stage framework that generates insertion-aware adversarial triggers and ICL-guided genetic payloads to induce reasoning-level denial-of-service in tool-augmented LLM agents across multiple backbones while preserving task correctness.

Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models

cs.AI · 2026-03-26 · unverdicted · novelty 6.0

An external zero-shot monitor detects nine unsafe reasoning behaviors in LLMs at 87% step-level accuracy with low false positives and low latency.

citing papers explorer


Red Teaming Large Reasoning Models · cs.CR · 2025-11-29 · unverdicted · none · ref 2
RecurGuard: Runtime Monitoring for Reasoning-Token Consumption Attacks · cs.CR · 2026-06-06 · unverdicted · none · ref 30
OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents · cs.LG · 2026-05-09 · unverdicted · none · ref 7 · 2 links
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models · cs.AI · 2026-03-26 · unverdicted · none · ref 19
