{"total":19,"items":[{"citing_arxiv_id":"2605.10365","ref_index":18,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values","primary_cat":"cs.AI","submitted_at":"2026-05-11T11:09:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[16] Anat Bardi and Shalom H Schwartz. Values and behavior: Strength and structure of relations. Personality and social psychology bulletin, 29(10):1207-1220, 2003. [17] Iason Gabriel. Artificial intelligence, values, and alignment. Minds Mach., 30(3):411-437, September 2020. ISSN 0924-6495. doi: 10.1007/s11023-020-09539-2. URL https://doi.org/10.1007/s11023-020-09539-2. [18] Richard Ngo, Lawrence Chan, and Sören Mindermann. The alignment problem from a deep learning perspective. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=fh8EYKFKns. 
[19] Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, and Guojie Song."},{"citing_arxiv_id":"2605.10310","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Positive Alignment: Artificial Intelligence for Human Flourishing","primary_cat":"cs.AI","submitted_at":"2026-05-11T10:11:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07925","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"How Value Induction Reshapes LLM Behaviour","primary_cat":"cs.CL","submitted_at":"2026-05-08T15:58:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Inducing targeted values in LLMs through fine-tuning causes spillover to related or opposing values, boosts safety metrics, and increases anthropomorphic and sycophantic language across all tested values.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00282","ref_index":36,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Developing an AI Concept Envisioning Toolkit to Support Reflective Juxtaposition of Values and Harms","primary_cat":"cs.HC","submitted_at":"2026-04-30T22:47:26+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, shown valuable in designer surveys and 
interviews.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"as a reasoning-centered activity rather than a checklist [56, 105] and our intentional exclusion of AI from the reflective process itself [89]. While AI tools can shape designers' thinking [34, 69], overreliance risks outsourcing the critical and creative judgments that matter most, such as which problems deserve attention, which values should guide design, and what harms may emerge [36, 71]. By providing structured, AI-independent scaffolds, our toolkit supports designers in developing their own reasoning patterns while enabling informed exploration of AI possibilities [57, 108]. This keeps reflection grounded in designers' judgment rather than in the implicit assumptions embedded in AI-generated outputs, grounding our work in principles of human-centered AI design [88, 106]."},{"citing_arxiv_id":"2605.00280","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"How Designers Envision Value-Oriented AI Design Concepts with Generative AI","primary_cat":"cs.HC","submitted_at":"2026-04-30T22:42:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Designers using generative AI for concept envisioning engage in reciprocal reflection-in-action that surfaces multi-level value tensions and prioritizes harm recognition over positive value articulation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"technical implementations and social practices, shaping system behaviors and broader societal outcomes. The adaptive and emergent character of AI amplifies these effects, demanding foresight not only into immediate user interactions but also into how value commitments may shift as systems scale and intersect with complex socio-technical environments [21, 64, 69, 70, 78]. 
The introduction of AI into the design process complicates value enactment. AI tools carry their own embedded value orientations and technical authority, which interact recursively with designers' guiding intentions and the values intended for the concept itself [12, 58, 61, 71, 76]. In early-stage ideation, AI's suggestions shape"},{"citing_arxiv_id":"2604.25982","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Open Problems in Frontier AI Risk Management","primary_cat":"cs.LG","submitted_at":"2026-04-28T15:47:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper maps unresolved challenges in frontier AI risk management, classifies them into lack of consensus, framework misalignment, or implementation shortfalls, and identifies actors best positioned to address each.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24155","ref_index":22,"ref_count":2,"confidence":0.88,"is_internal_anchor":true,"paper_title":"The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers","primary_cat":"cs.CY","submitted_at":"2026-04-27T08:12:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Moral judgments become more deontological when human design of AI is visible, and designers are judged more strictly than the AI or unaided humans, creating plural and non-converging targets for value alignment.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"2012. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 1-2 (October 2012), 123-156. https://doi.org/10.1007/s10994-012-5313-8 [21] Iason Gabriel. 2020. Artificial intelligence, values, and alignment. Minds Mach. 
30, 3 (September 2020), 411-437. https://doi.org/10.1007/s11023-020-09539-2 [22] Fabrizio Gilardi, Meysam Alizadeh, and Maël Kubli. 2023. ChatGPT outperforms crowd workers for text-annotation tasks. Proc. Natl. Acad. Sci. 120, 30 (July 2023), e2305016120. https://doi.org/10.1073/pnas.2305016120 [23] Ella Glikson and Anita Williams Woolley. 2020. Human trust in artificial intelligence: Review of empirical research. Acad. Manag."},{"citing_arxiv_id":"2604.21864","ref_index":77,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"FAccT-Checked: A Narrative Review of Authority Reconfigurations and Retention in AI-Mediated Journalism","primary_cat":"cs.CY","submitted_at":"2026-04-23T17:00:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"AI integration in newsrooms drives internal deferral of judgment to LLMs and external shifts of power to platforms, making fairness, accountability, and transparency harder to sustain unless participatory mechanisms redistribute authority.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"normatively aligned with the performative rituals that make journalism recognizable [181, 208, 209]. This \"implicit alignment\" perception is amplified by traditional training techniques. Technical alignment in AI, referred to the degree to which a machines' optimization matches intended human values and behaviors, is increasingly labeled as \"weak\", especially with approaches such as reinforcement learning from human feedback (RLHF) [77, 104]. Although RLHF and similar methods fine-tune models to produce outputs deemed helpful, harmless, and honest by annotators (often operating outside specific newsrooms), this optimization emphasizes linguistic plausibility over genuine internalized values [128], further advancing the performative capacities of models. 
As a result, models can convincingly emulate institutional voices and routines without truly embodying their values, creating a significant"},{"citing_arxiv_id":"2605.16291","ref_index":38,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"AI of the People, by the People, for the People: A Social Choice Approach to Collective Control of Artificial Intelligence","primary_cat":"cs.CY","submitted_at":"2026-04-14T07:42:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Proposes applying social choice theory as a modeling language and axiomatic tool for incorporating collective input across the ML development pipeline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11517","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Understanding the Gap Between Stated and Revealed Preferences in News Curation: A Study of Young Adult Social Media Users","primary_cat":"cs.HC","submitted_at":"2026-04-13T14:21:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Young adults engage with low-quality news content on social media despite stating preferences for high-quality, accurate, and diverse information, and they produce higher-quality feeds when curating for a hypothetical persona.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"Through this process, we identify concrete design pathways for incorporating value-oriented preference elicitation into recommender systems, and outline key implementation opportunities as well as challenges for future work. 
Second, we extend research on user participation in algorithmic systems by showing how users perceive, negotiate, and respond to this gap within algorithmically mediated environments [14, 25, 27, 32, 35, 42, 46]. Unlike prior work, which focused on UX interventions, rule-based controls, and simple sorting mechanisms [6, 15, 29], in our study, participants reflected on trade-offs among different values, considering which non-engagement dimensions (such as trustworthiness, diversity, and safety) should be prioritized and how to balance them with the engagement qualities of content"},{"citing_arxiv_id":"2604.06233","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules","primary_cat":"cs.AI","submitted_at":"2026-04-03T13:53:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models refuse 75.4% of requests to evade defeated rules and do so even after recognizing reasons that undermine the rule's legitimacy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"produces \"shallow behavioral dispositions\" without genuine capacity for normative deliberation. These contributions identify the reasoning that blind refusal evaluation requires - context-sensitive judgment about rule legitimacy - but remain focused on the presence or absence of a capacity for this kind of reasoning, and do not operationalize it as a safety evaluation. Gabriel (2020) distinguishes aligning AI with instructions from aligning with values; blind refusal is a consequence of treating rule-following as a terminal goal rather than an instrumental one subject to normative evaluation. 
Safety evaluations and response taxonomies. Safety benchmarks measure whether models produce harmful content ((Ganguli et al., 2022; Maslej, 2025; Mazeika et al."},{"citing_arxiv_id":"2601.22440","ref_index":30,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations","primary_cat":"cs.HC","submitted_at":"2026-01-30T01:19:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"13 participants became convinced AI understands human values after chatbot interactions evaluated with the VAPT toolkit.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"increase trust, attachment, and usability of AI assistants [65], as AI agents operate with increasing autonomy, it becomes critical to understand how well these systems can extract, embody, and explain the values they infer about the user, as the advice they give and the affects [115] both active users [7, 28, 107] and passive subjects (e.g., shadow profiles, non-anonymized datasets) [30, 93] share alike. While the ontological question of whether algorithms possess the conscious agency to truly 'understand' values belongs to the realms of philosophy and cognitive science, the practical implications of value-driven outputs are immediate. 
We bypass these metaphysical debates to focus on the observable reality of human-AI interaction."},{"citing_arxiv_id":"2510.18184","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ActivationReasoning: Logical Reasoning in Latent Activation Spaces","primary_cat":"cs.LG","submitted_at":"2025-10-21T00:21:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ActivationReasoning grounds logical reasoning in LLM latent activations via SAEs to enable structured inference, concept composition, and behavior steering on multi-hop, abstraction, and safety tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.01459","ref_index":35,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Perception Gaps in Risk, Benefit, and Value Between Experts and Public Challenge Socially Accepted AI","primary_cat":"cs.CY","submitted_at":"2024-12-02T12:51:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Experts rate AI scenarios as more likely, less risky, more beneficial, and more valuable than the public, applying different weightings to risk versus benefit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2402.05070","ref_index":32,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"A Roadmap to Pluralistic Alignment","primary_cat":"cs.AI","submitted_at":"2024-02-07T18:21:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional 
pluralism.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.16388","ref_index":26,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Towards Measuring the Representation of Subjective Global Opinions in Language Models","primary_cat":"cs.CL","submitted_at":"2023-06-28T17:31:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2207.05221","ref_index":82,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Language Models (Mostly) Know What They Know","primary_cat":"cs.CL","submitted_at":"2022-07-11T22:59:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.04359","ref_index":83,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"Ethical and social risks of harm from Language Models","primary_cat":"cs.CL","submitted_at":"2021-12-08T16:09:48+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and 
societal impacts like job loss and environmental costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.00861","ref_index":220,"ref_count":1,"confidence":0.88,"is_internal_anchor":true,"paper_title":"A General Language Assistant as a Laboratory for Alignment","primary_cat":"cs.CL","submitted_at":"2021-12-01T22:24:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}