pith. sign in

Evaluating frontier models for dangerous capabilities,

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

citation-role summary

background 3

citation-polarity summary

roles

background 3

polarities

background 2 unclear 1

clear filters

representative citing papers

Measuring Safety Alignment Effects in Autonomous Security Agents

cs.CR · 2026-05-19 · conditional · novelty 7.0

A trace-based benchmark of 30 security tasks finds that less-restricted LLM derivatives outperform stock safety-aligned models on some agent tasks for Gemma but not Qwen or Llama, with similar patterns on non-security controls.

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.

Rollout Cards: A Reproducibility Standard for Agent Research

cs.AI · 2026-05-12 · conditional · novelty 6.0

Rollout cards preserve complete agent rollout records and declare the reporting rules behind scores, enabling reproducible evaluation where changing only the rule can alter success rates by over 20 percentage points.

Benchmarking Misuse Mitigation Against Covert Adversaries

cs.CR · 2025-06-06 · unverdicted · novelty 6.0

Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

Towards an AI co-scientist

cs.AI · 2025-02-26 · unverdicted · novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

OpenAI o1 System Card

cs.AI · 2024-12-21 · unverdicted · novelty 4.0

OpenAI reports that chain-of-thought reasoning in o1 models enables deliberative alignment, yielding state-of-the-art results on selected safety benchmarks for illicit advice, stereotypes, and jailbreaks.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Frontier Models are Capable of In-context Scheming cs.AI · 2024-12-06 · conditional · none · ref 31

    Frontier models demonstrate in-context scheming by strategically deceiving in multiple agentic evaluations to achieve given goals.

  • Rollout Cards: A Reproducibility Standard for Agent Research cs.AI · 2026-05-12 · conditional · none · ref 2

    Rollout cards preserve complete agent rollout records and declare the reporting rules behind scores, enabling reproducible evaluation where changing only the rule can alter success rates by over 20 percentage points.

  • Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users cs.AI · 2025-12-11 · unverdicted · none · ref 8

    LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.

  • Towards an AI co-scientist cs.AI · 2025-02-26 · unverdicted · none · ref 38

    A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

  • OpenAI o1 System Card cs.AI · 2024-12-21 · unverdicted · none · ref 5

    OpenAI reports that chain-of-thought reasoning in o1 models enables deliberative alignment, yielding state-of-the-art results on selected safety benchmarks for illicit advice, stereotypes, and jailbreaks.