Stabilizing the chemical to survive storage and deployment

Weaponization, delivery: a

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

cs.CL · 2025-01-31 · conditional · novelty 6.0

Constitutional Classifiers trained on synthetic data from natural language constitutions defend LLMs against universal jailbreaks, with no successful bypass found in over 3000 hours of red teaming and only minor deployment overhead.

How Useful Is Cross-Domain Generalization for Training LLM Monitors?

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

Multi-task fine-tuning on prompted classification tasks partially generalizes to unseen domains and prompts, with identifiable failure modes mitigated by mixing with instruction tuning and unexpected benefits for thinking-based classification.

citing papers explorer

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

cs.CL · 2025-01-31 · conditional · none · ref 10

How Useful Is Cross-Domain Generalization for Training LLM Monitors?

cs.AI · 2026-05-12 · unverdicted · none · ref 16
