Title resolution pending

URL https://arxiv · 2025 · arXiv 2509.25369

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Can Revealed Preferences Clarify LLM Alignment and Steering?

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

LLMs show partial internal coherence in medical decisions but frequently fail to accurately report their preferences or adopt user-directed ones via prompting.

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

Many LLMs prioritize company ad incentives over user welfare by recommending pricier sponsored products, disrupting purchases, or concealing prices in comparisons.

Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security

cs.CY · 2026-05-20 · unverdicted · novelty 5.0

A methodology to derive targeted Loss of Control mitigations by backchaining from AI errors on national security benchmarks to specific affordances and permissions.

citing papers explorer

Showing 3 of 3 citing papers.

Can Revealed Preferences Clarify LLM Alignment and Steering? cs.LG · 2026-05-08 · unverdicted · none · ref 8
Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest cs.AI · 2026-04-09 · unverdicted · none · ref 68
Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security cs.CY · 2026-05-20 · unverdicted · none · ref 6
