Pith · machine review for the scientific record

arXiv: 2601.15895 · v2 · submitted 2026-01-22 · 💻 cs.HC

Recognition: no theorem link

Co-Constructing Alignment: A Participatory Approach to Situate AI Values

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 12:23 UTC · model grok-4.3

classification: 💻 cs.HC
keywords: AI alignment · participatory design · human-AI interaction · value misalignment · situated practice · LLM users · co-construction

The pith

Alignment between users and AI is co-constructed through their ongoing interactions rather than preset in the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that treating alignment as a fixed model property misses how users actually encounter and respond to value mismatches during real use. It supports this through a participatory workshop in which researchers using large language models kept misalignment diaries and then used generative design activities to imagine ways to act on those experiences. Participants described misalignments mainly as unexpected outputs or breakdowns in tasks and social exchanges, not as abstract ethical problems. They proposed practical responses such as adjusting prompts, reinterpreting outputs, or choosing deliberate non-use. The work concludes that systems should be designed to treat alignment as a shared, situated practice that continues over time.

Core claim

Alignment is an interactional practice co-constructed during human-AI interaction. In a workshop that paired misalignment diaries with generative design activities, researchers using large language models as research assistants reported that misalignments appear as unexpected responses and task or social breakdowns. Participants described contributing to alignment through roles that include adjusting model behavior, interpreting outputs, and using deliberate non-engagement as a strategy.

What carries the argument

The participatory workshop combining misalignment diaries with generative design activities, which makes visible how users experience misalignments in context and how they envision acting on them.

If this is right

  • AI systems should provide interfaces that let users adjust or reinterpret outputs during use.
  • Designs should recognize deliberate non-engagement as one legitimate way users maintain alignment (see the sketch after this list).
  • Alignment support must be ongoing and tied to specific tasks and social contexts rather than delivered once.
  • User roles in alignment include active interpretation and response rather than passive reception of model values.
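
If designers take these roles seriously, a system has to treat them as first-class signals rather than noise. A minimal sketch of that idea in Python; the names (UserMove, record_move) are hypothetical illustrations, not anything the paper specifies:

    from enum import Enum, auto

    class UserMove(Enum):
        """Alignment moves users described making, kept as first-class signals."""
        ADJUST_PROMPT = auto()    # steer the model by rephrasing or re-prompting
        REINTERPRET = auto()      # keep the output but reframe what it means for the task
        NON_ENGAGEMENT = auto()   # deliberately decline to use the model at all

    def record_move(session_log: list[UserMove], move: UserMove) -> None:
        """Append the move; non-engagement is logged as alignment work, not as churn."""
        session_log.append(move)

    # Example: a session in which declining to use the model is itself a recorded signal.
    log: list[UserMove] = []
    record_move(log, UserMove.ADJUST_PROMPT)
    record_move(log, UserMove.NON_ENGAGEMENT)
    print(sum(m is UserMove.NON_ENGAGEMENT for m in log))  # -> 1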

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interfaces could add lightweight logging features so users can note and revisit misalignments without extra effort (a minimal sketch follows this list).
  • The same diary-plus-design method could be tested with other user groups such as educators or clinicians to see if patterns repeat.
  • Over time, systems that treat alignment as co-construction might reduce the need for repeated retraining by letting users steer behavior in context.
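
On the logging point above: one way to keep the effort low is to capture a flagged interaction in a single call, so diary entries accrue as a side effect of normal use. A minimal sketch, assuming hypothetical MisalignmentEntry and MisalignmentLog types that do not come from the paper:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class MisalignmentEntry:
        """One user-flagged mismatch, captured in the flow of normal use."""
        prompt: str
        response: str
        kind: str          # e.g. "unexpected_output", "task_breakdown", "social_breakdown"
        note: str = ""     # optional free-text note, in the spirit of a diary entry
        timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @dataclass
    class MisalignmentLog:
        entries: list[MisalignmentEntry] = field(default_factory=list)

        def flag(self, prompt: str, response: str, kind: str, note: str = "") -> None:
            """Record a misalignment in one call, keeping friction low."""
            self.entries.append(MisalignmentEntry(prompt, response, kind, note))

        def by_kind(self, kind: str) -> list[MisalignmentEntry]:
            """Revisit past misalignments of a given kind, diary-style."""
            return [e for e in self.entries if e.kind == kind]

    # Example: a user flags an unexpected output without leaving their workflow.
    log = MisalignmentLog()
    log.flag(
        prompt="Summarise the methods section",
        response="(an off-topic summary)",
        kind="unexpected_output",
        note="Summarised the introduction instead of the methods.",
    )
    print(len(log.by_kind("unexpected_output")))  # -> 1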

Load-bearing premise

That experiences shared in one workshop with researchers using large language models reflect how alignment works for wider groups of users and in actual daily practice.

What would settle it

A longitudinal study that logs real LLM interactions, records observed misalignments, and compares them against users' later self-reports would show whether the workshop descriptions match day-to-day dynamics.
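
For concreteness, one way to quantify that comparison, under the assumption (hypothetical, not from the paper) that logged incidents and later self-reports can be keyed to the same sessions and categories:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Incident:
        session_id: str
        kind: str  # e.g. "unexpected_output", "task_breakdown", "social_breakdown"

    def recall_rate(observed: set[Incident], self_reported: set[Incident]) -> float:
        """Fraction of logged misalignments that users later reported themselves.

        A low rate would suggest diary-style self-reports miss much of what
        happens in day-to-day use; a high rate would support them.
        """
        if not observed:
            return 1.0  # nothing observed, so nothing was missed
        return len(observed & self_reported) / len(observed)

    # Hypothetical data: three logged incidents, two later self-reported.
    observed = {
        Incident("s1", "unexpected_output"),
        Incident("s2", "task_breakdown"),
        Incident("s3", "social_breakdown"),
    }
    reported = {
        Incident("s1", "unexpected_output"),
        Incident("s2", "task_breakdown"),
    }
    print(f"recall = {recall_rate(observed, reported):.2f}")  # -> recall = 0.67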

Figures

Figures reproduced from arXiv: 2601.15895 by Anne Arzberger, Enrico Liscio, Inigo Martinez de Rituerto de Troya, Jie Yang, Maria Luce Lupetti.

Figure 1. Overview of the three workshop phases. Phase 0 uses a diary to sensitise participants to different forms of misalignment in AI.
Figure 2. Example of a misaligned interaction traced from initial diary entry to concern and values at stake across Phases 0–2. …
Figure 3. From a shared alignment goal to an action metaphor in Steps 4–6. P7 and Group 3 define an alignment goal emphasising …
Figure 4. P7-envisioned interface for Step 7, supporting reflexive alignment through visible model positionality.
Original abstract

As AI systems become embedded in everyday practice, value misalignment has emerged as a pressing concern. Yet, dominant alignment approaches remain model-centric, treating users as passive recipients of prespecified values rather than as epistemic agents who encounter and respond to misalignment during interactions. Drawing on situated perspectives, we frame alignment as an interactional practice co-constructed during human-AI interaction. We investigate how users understand and wish to contribute to this process through a participatory workshop that combines misalignment diaries with generative design activities. We surface how misalignments materialise in practice and how users envision acting on them, grounded in the context of researchers using Large Language Models as research assistants. Our findings show that misalignments are experienced less as abstract ethical violations than as unexpected responses, and task or social breakdowns. Participants articulated roles ranging from adjusting and interpreting model behaviour to deliberate non-engagement as an alignment strategy. We conclude with implications for designing systems that support alignment as an ongoing, situated, and shared practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that dominant model-centric approaches to AI value alignment overlook users as epistemic agents who actively encounter and respond to misalignment in practice. Drawing on situated perspectives, it reframes alignment as an interactional, co-constructed practice. This is investigated via a participatory workshop combining misalignment diaries and generative design activities with researchers who use LLMs as research assistants. The study surfaces misalignments as unexpected responses or task/social breakdowns rather than abstract ethical violations, identifies user roles such as adjusting, interpreting, and deliberate non-engagement, and derives design implications for systems that support alignment as an ongoing, situated, and shared process.

Significance. If the empirical patterns hold, the work offers a useful counterpoint to technical alignment research by grounding value alignment in everyday HCI practices. It provides concrete examples of how users already manage misalignment through interactional strategies, which could inform the design of more responsive AI interfaces and participatory alignment methods. The participatory approach itself demonstrates a method for eliciting user perspectives on alignment that may be adaptable to other domains.

major comments (1)
  1. The central framing of alignment as a general interactional practice rests on data from a single participatory workshop with a narrow, self-selected sample of LLM researchers. The manuscript must clarify whether the surfaced misalignment types and roles are presented as context-specific to academic LLM use or as evidence for broader dynamics; without this scoping or additional validation, the leap to design implications for general systems risks overgeneralization.
minor comments (1)
  1. The abstract omits basic methodological details (participant count, recruitment, analysis procedure) that are standard for qualitative HCI papers and would help readers assess the findings' grounding.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for this constructive comment on scoping and generalizability. We agree that the single-workshop design with a specific participant group requires clearer boundaries in the manuscript and have revised accordingly to avoid overgeneralization while retaining the value of the exploratory insights.

Point-by-point responses
  1. Referee: The central framing of alignment as a general interactional practice rests on data from a single participatory workshop with a narrow, self-selected sample of LLM researchers. The manuscript must clarify whether the surfaced misalignment types and roles are presented as context-specific to academic LLM use or as evidence for broader dynamics; without this scoping or additional validation, the leap to design implications for general systems risks overgeneralization.

    Authors: We agree that the empirical basis is a single participatory workshop with a self-selected group of LLM researchers and that this constrains claims to broader populations. The original manuscript already situates the work in the specific context of academic researchers using LLMs as research assistants (see abstract and Section 3), but we acknowledge that the transition to design implications could be read as implying wider applicability. In the revision we have added explicit scoping language in the introduction, findings, and conclusion: the observed misalignment types (unexpected responses, task/social breakdowns) and user roles (adjusting, interpreting, deliberate non-engagement) are presented as patterns identified within this academic LLM-assistant setting rather than as universal. Design implications are now framed as context-informed suggestions that illustrate how systems might support ongoing, situated alignment practices, with an explicit caveat that further validation across other user groups and domains is needed. We have also strengthened the limitations section to discuss sample characteristics and the exploratory nature of the participatory method. These changes directly address the risk of overgeneralization without requiring new data collection.

    revision: yes

Circularity Check

0 steps flagged

No circularity in qualitative framing or derivation

Full rationale

The paper is a qualitative participatory study that frames alignment as co-constructed based on workshop findings with LLM researchers. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the derivation. The central claim is inductively supported by the described misalignment diaries and design activities rather than reducing to its inputs by construction. External situated-perspective literature is invoked without smuggling in ansatzes or uniqueness theorems from the authors' prior work. This is an honest non-finding, as expected for a non-mathematical empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the domain assumption that participatory methods reliably surface situated user understandings of misalignment; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption Situated perspectives frame alignment as interactional practice
    Invoked in the opening framing of alignment as co-constructed during human-AI interaction.

pith-pipeline@v0.9.0 · 5485 in / 1126 out tokens · 26647 ms · 2026-05-16T12:23:17.919094+00:00 · methodology

discussion (0)

