Pith · machine review for the scientific record

arXiv: 2601.15895 · v2 · submitted 2026-01-22 · 💻 cs.HC

Recognition: no theorem link

Co-Constructing Alignment: A Participatory Approach to Situate AI Values

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 12:23 UTC · model grok-4.3

classification: 💻 cs.HC
keywords: AI alignment · participatory design · human-AI interaction · value misalignment · situated practice · LLM users · co-construction

The pith

Alignment between users and AI is co-constructed through their ongoing interactions rather than preset in the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that treating alignment as a fixed model property misses how users actually encounter and respond to value mismatches during real use. It supports this through a participatory workshop in which researchers using large language models kept misalignment diaries and then used generative design activities to imagine ways to act on those experiences. Participants described misalignments mainly as unexpected outputs or breakdowns in tasks and social exchanges, not as abstract ethical problems. They proposed practical responses such as adjusting prompts, reinterpreting outputs, or choosing deliberate non-use. The work concludes that systems should be designed to treat alignment as a shared, situated practice that continues over time.

Core claim

Alignment is an interactional practice co-constructed during human-AI interaction. In a workshop that paired misalignment diaries with generative design activities, researchers using large language models as research assistants reported that misalignments appear as unexpected responses and task or social breakdowns. Participants described contributing to alignment through roles that include adjusting model behavior, interpreting outputs, and using deliberate non-engagement as a strategy.

What carries the argument

The participatory workshop combining misalignment diaries with generative design activities, which makes visible how users experience misalignments in context and how they envision acting on them.

If this is right

  • AI systems should provide interfaces that let users adjust or reinterpret outputs during use.
  • Designs should recognize deliberate non-engagement as one legitimate way users maintain alignment (see the sketch after this list).
  • Alignment support must be ongoing and tied to specific tasks and social contexts rather than delivered once.
  • User roles in alignment include active interpretation and response rather than passive reception of model values.
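
If designers take these roles seriously, a system has to treat them as first-class signals rather than noise. A minimal sketch of that idea in Python; the names (UserMove, record_move) are hypothetical illustrations, not anything the paper specifies:

    from enum import Enum, auto

    class UserMove(Enum):
        """Alignment moves users described making, kept as first-class signals."""
        ADJUST_PROMPT = auto()    # steer the model by rephrasing or re-prompting
        REINTERPRET = auto()      # keep the output but reframe what it means for the task
        NON_ENGAGEMENT = auto()   # deliberately decline to use the model at all

    def record_move(session_log: list[UserMove], move: UserMove) -> None:
        """Append the move; non-engagement is logged as alignment work, not as churn."""
        session_log.append(move)

    # Example: a session in which declining to use the model is itself a recorded signal.
    log: list[UserMove] = []
    record_move(log, UserMove.ADJUST_PROMPT)
    record_move(log, UserMove.NON_ENGAGEMENT)
    print(sum(m is UserMove.NON_ENGAGEMENT for m in log))  # -> 1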

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interfaces could add lightweight logging features so users can note and revisit misalignments without extra effort (a minimal sketch follows this list).
  • The same diary-plus-design method could be tested with other user groups such as educators or clinicians to see if patterns repeat.
  • Over time, systems that treat alignment as co-construction might reduce the need for repeated retraining by letting users steer behavior in context.
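
On the logging point above: one way to keep the effort low is to capture a flagged interaction in a single call, so diary entries accrue as a side effect of normal use. A minimal sketch, assuming hypothetical MisalignmentEntry and MisalignmentLog types that do not come from the paper:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class MisalignmentEntry:
        """One user-flagged mismatch, captured in the flow of normal use."""
        prompt: str
        response: str
        kind: str          # e.g. "unexpected_output", "task_breakdown", "social_breakdown"
        note: str = ""     # optional free-text note, in the spirit of a diary entry
        timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @dataclass
    class MisalignmentLog:
        entries: list[MisalignmentEntry] = field(default_factory=list)

        def flag(self, prompt: str, response: str, kind: str, note: str = "") -> None:
            """Record a misalignment in one call, keeping friction low."""
            self.entries.append(MisalignmentEntry(prompt, response, kind, note))

        def by_kind(self, kind: str) -> list[MisalignmentEntry]:
            """Revisit past misalignments of a given kind, diary-style."""
            return [e for e in self.entries if e.kind == kind]

    # Example: a user flags an unexpected output without leaving their workflow.
    log = MisalignmentLog()
    log.flag(
        prompt="Summarise the methods section",
        response="(an off-topic summary)",
        kind="unexpected_output",
        note="Summarised the introduction instead of the methods.",
    )
    print(len(log.by_kind("unexpected_output")))  # -> 1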

Load-bearing premise

That experiences shared in one workshop with researchers using large language models reflect how alignment works for wider groups of users and in actual daily practice.

What would settle it

A longitudinal study that logs real LLM interactions, records observed misalignments, and compares them against users' later self-reports would show whether the workshop descriptions match day-to-day dynamics.
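
For concreteness, one way to quantify that comparison, under the assumption (hypothetical, not from the paper) that logged incidents and later self-reports can be keyed to the same sessions and categories:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Incident:
        session_id: str
        kind: str  # e.g. "unexpected_output", "task_breakdown", "social_breakdown"

    def recall_rate(observed: set[Incident], self_reported: set[Incident]) -> float:
        """Fraction of logged misalignments that users later reported themselves.

        A low rate would suggest diary-style self-reports miss much of what
        happens in day-to-day use; a high rate would support them.
        """
        if not observed:
            return 1.0  # nothing observed, so nothing was missed
        return len(observed & self_reported) / len(observed)

    # Hypothetical data: three logged incidents, two later self-reported.
    observed = {
        Incident("s1", "unexpected_output"),
        Incident("s2", "task_breakdown"),
        Incident("s3", "social_breakdown"),
    }
    reported = {
        Incident("s1", "unexpected_output"),
        Incident("s2", "task_breakdown"),
    }
    print(f"recall = {recall_rate(observed, reported):.2f}")  # -> recall = 0.67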

Figures

Figures reproduced from arXiv: 2601.15895 by Anne Arzberger, Enrico Liscio, Inigo Martinez de Rituerto de Troya, Jie Yang, Maria Luce Lupetti.

Figure 1. Overview of the three workshop phases. Phase 0 uses a diary to sensitise participants to different forms of misalignment in AI.
Figure 2. Example of a misaligned interaction traced from initial diary entry to concern and values at stake across Phases 0–2. …
Figure 3. From a shared alignment goal to an action metaphor in Steps 4–6. P7 and Group 3 define an alignment goal emphasising …
Figure 4. P7-envisioned interface for Step 7, supporting reflexive alignment through visible model positionality.
Original abstract

As AI systems become embedded in everyday practice, value misalignment has emerged as a pressing concern. Yet, dominant alignment approaches remain model-centric, treating users as passive recipients of prespecified values rather than as epistemic agents who encounter and respond to misalignment during interactions. Drawing on situated perspectives, we frame alignment as an interactional practice co-constructed during human-AI interaction. We investigate how users understand and wish to contribute to this process through a participatory workshop that combines misalignment diaries with generative design activities. We surface how misalignments materialise in practice and how users envision acting on them, grounded in the context of researchers using Large Language Models as research assistants. Our findings show that misalignments are experienced less as abstract ethical violations than as unexpected responses, and task or social breakdowns. Participants articulated roles ranging from adjusting and interpreting model behaviour to deliberate non-engagement as an alignment strategy. We conclude with implications for designing systems that support alignment as an ongoing, situated, and shared practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that dominant model-centric approaches to AI value alignment overlook users as epistemic agents who actively encounter and respond to misalignment in practice. Drawing on situated perspectives, it reframes alignment as an interactional, co-constructed practice. This is investigated via a participatory workshop combining misalignment diaries and generative design activities with researchers who use LLMs as research assistants. The study surfaces misalignments as unexpected responses or task/social breakdowns rather than abstract ethical violations, identifies user roles such as adjusting, interpreting, and deliberate non-engagement, and derives design implications for systems that support alignment as an ongoing, situated, and shared process.

Significance. If the empirical patterns hold, the work offers a useful counterpoint to technical alignment research by grounding value alignment in everyday HCI practices. It provides concrete examples of how users already manage misalignment through interactional strategies, which could inform the design of more responsive AI interfaces and participatory alignment methods. The participatory approach itself demonstrates a method for eliciting user perspectives on alignment that may be adaptable to other domains.

major comments (1)
  1. The central framing of alignment as a general interactional practice rests on data from a single participatory workshop with a narrow, self-selected sample of LLM researchers. The manuscript must clarify whether the surfaced misalignment types and roles are presented as context-specific to academic LLM use or as evidence for broader dynamics; without this scoping or additional validation, the leap to design implications for general systems risks overgeneralization.
minor comments (1)
  1. The abstract omits basic methodological details (participant count, recruitment, analysis procedure) that are standard for qualitative HCI papers and would help readers assess the findings' grounding.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for this constructive comment on scoping and generalizability. We agree that the single-workshop design with a specific participant group requires clearer boundaries in the manuscript and have revised accordingly to avoid overgeneralization while retaining the value of the exploratory insights.

Point-by-point responses
  1. Referee: The central framing of alignment as a general interactional practice rests on data from a single participatory workshop with a narrow, self-selected sample of LLM researchers. The manuscript must clarify whether the surfaced misalignment types and roles are presented as context-specific to academic LLM use or as evidence for broader dynamics; without this scoping or additional validation, the leap to design implications for general systems risks overgeneralization.

    Authors: We agree that the empirical basis is a single participatory workshop with a self-selected group of LLM researchers and that this constrains claims to broader populations. The original manuscript already situates the work in the specific context of academic researchers using LLMs as research assistants (see abstract and Section 3), but we acknowledge that the transition to design implications could be read as implying wider applicability. In the revision we have added explicit scoping language in the introduction, findings, and conclusion: the observed misalignment types (unexpected responses, task/social breakdowns) and user roles (adjusting, interpreting, deliberate non-engagement) are presented as patterns identified within this academic LLM-assistant setting rather than as universal. Design implications are now framed as context-informed suggestions that illustrate how systems might support ongoing, situated alignment practices, with an explicit caveat that further validation across other user groups and domains is needed. We have also strengthened the limitations section to discuss sample characteristics and the exploratory nature of the participatory method. These changes directly address the risk of overgeneralization without requiring new data collection.

    revision: yes

Circularity Check

0 steps flagged

No circularity in qualitative framing or derivation

Full rationale

The paper is a qualitative participatory study that frames alignment as co-constructed based on workshop findings with LLM researchers. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the derivation. The central claim is inductively supported by the described misalignment diaries and design activities rather than reducing to its inputs by construction. External situated-perspective literature is invoked without smuggling in ansatzes or uniqueness theorems from the authors' prior work. This is an honest non-finding, as expected for a non-mathematical empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the domain assumption that participatory methods reliably surface situated user understandings of misalignment; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption Situated perspectives frame alignment as interactional practice
    Invoked in the opening framing of alignment as co-constructed during human-AI interaction.

pith-pipeline@v0.9.0 · 5485 in / 1126 out tokens · 26647 ms · 2026-05-16T12:23:17.919094+00:00 · methodology

discussion (0)

