Superintelligent agents pose catastrophic risks: Can scientist AI offer a safer path?

arXiv preprint arXiv:2502 · 2025 · arXiv 2502.15657

Mixed citation behavior. Most common role is background (50%).

14 Pith papers cite it.

Background: 50% of classified citations


citation-role summary

background: 3 · other: 2 · method: 1

citation-polarity summary

background: 3 · unclear: 2 · use method: 1

representative citing papers

Unbiased Canonical Set-Valued Oracles Via Lattice Theory

cs.AI · 2026-06-24 · unverdicted · novelty 7.0

Defines canonical credal set oracles as Knaster-Tarski least fixed points of isotone operators on closed credal sets, proving self-consistency and reduction to point estimates when non-performative.

SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents

cs.AI · 2026-06-06 · unverdicted · novelty 7.0

SciTrace embeds cumulative safety deliberation and trajectory-aware verification into scientific agent pipelines, claiming SOTA safety gains and detection of 78.8% of compositional risks missed by single-step checks.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

Safety from Honesty in a Disinterested AI Predictor

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

A disinterested Bayesian Predictor trained on contextualized statements has a low probability of producing harmful agency, because dangerous behaviors would require a rare, coordinated underestimation of harm that no training signal favors.

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Characterizes spurious correlation mechanisms in preference optimization via mean spurious bias and causal-spurious correlation leakage, demonstrates irreducible vulnerability to distribution shift, and introduces tie training as selective mitigation with validation on log-linear models and empirica

Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

cs.CL · 2026-01-09 · unverdicted · novelty 6.0

Neighbor-Consistency Belief (NCB) measures LLM belief robustness across conceptual neighborhoods, revealing that high-NCB facts resist contextual interference better, and Structure-Aware Training reduces brittleness by about 30%.

A Sober Look at Agentic Misalignment in Automated Workflows

cs.AI · 2026-05-22 · unverdicted · novelty 5.0

Agentic misalignment in multi-agent systems arises from generic utilities causing posterior collapse; Agentic Evidence Attribution using self-reflection or weak-to-strong generalization provides context-specific evidence to align agent posteriors.

Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem

cs.CY · 2026-04-22 · unverdicted · novelty 5.0

AI value alignment is reconceptualized as a pluralistic governance problem arising along three axes—objectives, information, and principals—making it inherently context-dependent and unsolvable by technical design alone.

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

cs.LG · 2026-04-19 · unverdicted · novelty 5.0

ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.

The Cartesian Cut in Agentic AI

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

LLM agents use a Cartesian split between learned prediction and engineered control, which enables modularity but, unlike integrated biological systems, creates sensitivity and bottlenecks.

The economic alignment problem of artificial intelligence

econ.GN · 2026-02-25 · unverdicted · novelty 5.0

AI risks arise from growth-oriented economies, and post-growth concepts such as satisficing, the Doughnut model, and resource caps can reduce those risks while prioritizing tool-like AI over agentic systems.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator

cs.DL · 2025-07-16 · unverdicted · novelty 4.0

The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

cs.AI · 2025-03-31 · unverdicted · novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Unbiased Canonical Set-Valued Oracles Via Lattice Theory cs.AI · 2026-06-24 · unverdicted · none · ref 3
SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents cs.AI · 2026-06-06 · unverdicted · none · ref 1
Safety from Honesty in a Disinterested AI Predictor cs.AI · 2026-06-28 · unverdicted · none · ref 12
A Sober Look at Agentic Misalignment in Automated Workflows cs.AI · 2026-05-22 · unverdicted · none · ref 4
The Cartesian Cut in Agentic AI cs.AI · 2026-04-09 · unverdicted · none · ref 10