Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts

Emilio Barkett; Niklas Weller

arxiv: 2605.25256 · v2 · pith:M64WHEBGnew · submitted 2026-05-24 · 💻 cs.AI


Pith reviewed 2026-06-30 00:17 UTC · model grok-4.3


keywords: LLM alignment, organizational decision making, process alignment, decision policy capturing, steerable pluralism, ECHR decisions, consumer credit, policy auditing


The pith

Process alignment measurement is needed to determine when LLM calibration to an organization's decision policy is desirable or requires auditing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that organizations using LLMs for decisions need models that match their specific decision policy at the process level, not just in outcomes. It shows that alignment varies across models independently of performance metrics and across different organizational contexts like legal and credit decisions. A key insight is that the same measurement technique can either improve alignment or audit it based on whether the historical policy is normatively good. This matters because blindly maximizing alignment could perpetuate biases in data like credit decisions. The work shifts focus from individual or group perspectives to organizational ones in alignment research.

Core claim

We rely on a decision-policy capturing method to measure process alignment in organizational settings, assessing whether an LLM faithfully reproduces the organization's decision policy rather than merely reaching the same conclusions. We find heterogeneity along two axes. Across models, baseline alignment varies strongly and tracks neither pricing nor general benchmark performance. Across organizations, the structure of alignment changes. In ECHR Article 6 decisions, process alignment predicts output accuracy, and making the organization's past decision policy explicit improves poorly aligned models. In consumer credit decisions, process alignment is low overall but varies more than output a

What carries the argument

Decision-policy capturing method, which extracts an organization's underlying decision policy from historical data to compare against LLM behavior at the process level rather than output matching alone.
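The comparison described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a policy is captured as the weight vector of an L2-regularized logistic regression (the simulated rebuttal on this page mentions regularized logistic regression) and that process alignment is the cosine similarity between the organization's and the model's weight vectors (the Figure 1 caption reports cosine similarity). The synthetic cases, the three-feature setup, and the names `fit_policy`, `cosine`, and `decide` are all hypothetical.

```python
import math
import random

def fit_policy(X, y, l2=0.05, lr=0.5, epochs=300):
    """Capture a decision policy as logistic-regression weights (hypothetical helper)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient step with L2 shrinkage on the weights (intercept is not penalized).
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w

def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def decide(weights, x):
    """Synthetic decider: approve (1) when the weighted sum of features is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

# Synthetic stand-in for historical cases; real features would come from case records.
random.seed(0)
cases = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(300)]
org_decisions = [decide([2.0, -1.0, 0.0], x) for x in cases]
llm_same = [decide([2.0, -1.0, 0.0], x) for x in cases]   # model using the same policy
llm_other = [decide([0.0, 0.0, 1.0], x) for x in cases]   # model using a different cue

org_policy = fit_policy(cases, org_decisions)
align_same = cosine(org_policy, fit_policy(cases, llm_same))
align_other = cosine(org_policy, fit_policy(cases, llm_other))
print(round(align_same, 3), round(align_other, 3))
```

A model that relies on a different cue than the organization receives a much lower alignment score here even though each decider is internally consistent, which is the sense in which the measurement targets the process rather than the output.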

If this is right

Baseline process alignment varies strongly across models and does not track pricing or general benchmark performance.
In ECHR Article 6 decisions, process alignment predicts output accuracy with r = 0.85 and explicit policy statements improve poorly aligned models.
In consumer credit decisions, process alignment is low overall but varies more than output accuracy, and models resist the organization's weighting of protected attributes.
Process-level measurement is necessary because the same procedure can calibrate a model when the target policy is desirable or audit it when the policy encodes undesirable patterns.
Organizational alignment is a pluralistic problem because deciding which policy to align to, and whether higher alignment is feasible or desirable, must be addressed separately.
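The relationship between process alignment and output accuracy in the list above is a correlation across (model, condition) points. A minimal sketch of that computation follows, assuming a plain Pearson coefficient; the six data points are invented for illustration and are not values from the paper, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-model scores: process alignment (cosine) and output accuracy.
alignment = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
accuracy = [0.52, 0.55, 0.61, 0.60, 0.70, 0.74]

r = pearson_r(alignment, accuracy)
print(f"r = {r:.2f}")
```

A high `r` of this kind says that better-aligned models tend to be more accurate in that domain; it does not by itself show that the two measures are interchangeable, which is why the credit results are reported separately.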

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Organizations may need mechanisms to periodically review and select which historical policy version to align models to as external norms evolve.
Process alignment checks could be extended beyond organizations to settings like individual user preference modeling or multi-stakeholder systems.
Resistance to protected attribute weightings in credit models suggests that explicit policy injection may require domain-specific adjustments rather than uniform application.

Load-bearing premise

The decision-policy capturing method accurately extracts and represents the organization's true underlying decision policy from historical data without introducing artifacts from data selection, labeling, or modeling choices.

What would settle it

If replacing the decision-policy capturing method with an alternative extraction technique on the same historical data produces inconsistent policies, or if models with high process alignment fail to match the extracted policy on new held-out cases.
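Both checks named above can be sketched on synthetic data. The example below is an illustration only: it assumes two simple extraction techniques (a difference of class-conditional feature means, and a regularized logistic regression), measures their consistency by cosine similarity, and then checks how often the captured policy reproduces decisions on held-out cases. The data, the choice of techniques, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_difference_policy(X, y):
    """Technique A: difference between feature means of approved and rejected cases."""
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == 0]
    return [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
            for j in range(len(X[0]))]

def logistic_policy(X, y, l2=0.05, lr=0.5, epochs=300):
    """Technique B: L2-regularized logistic regression fitted by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w

def agreement(weights, X, decisions):
    """Share of cases where the sign of the weighted sum matches the recorded decision."""
    hits = sum((1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0) == d
               for x, d in zip(X, decisions))
    return hits / len(X)

# Synthetic history generated from a known linear policy (an assumption for the demo).
random.seed(1)
true_w = [1.5, -1.0, 0.5]
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [1 if sum(w * xi for w, xi in zip(true_w, x)) > 0 else 0 for x in X]
fit_X, fit_y, held_X, held_y = X[:300], y[:300], X[300:], y[300:]

policy_a = mean_difference_policy(fit_X, fit_y)
policy_b = logistic_policy(fit_X, fit_y)
consistency = cosine(policy_a, policy_b)                # check 1: do the techniques agree?
held_out_agreement = agreement(policy_b, held_X, held_y)  # check 2: does it hold on new cases?
print(round(consistency, 3), round(held_out_agreement, 3))
```

On real archives, a low `consistency` or a low `held_out_agreement` would be the kind of evidence the paragraph above describes as undermining the premise.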

Figures

Figures reproduced from arXiv: 2605.25256 by Emilio Barkett, Niklas Weller.

**Figure 1.** Policy alignment (cosine similarity) vs. output accuracy across both domains. Left: ECHR Article 6 decisions show a strong positive relationship (r = 0.85, p < .001, n = 10 models × 3 conditions). The dotted line marks the court regression linear ceiling. Right: German Credit decisions (balanced subset) show low but varied policy alignment overall (cosine ∈ [−0.25, +0.21]) with accuracy remaining near chan…

Original abstract

Steerable pluralism requires a model to faithfully represent one specified perspective. Organizations are a natural setting for this demand, since they deploy LLMs to make decisions that must reflect their own policy. Yet, most existing work fixes that perspective at the level of individuals or demographic groups. We rely on a decision-policy capturing method to measure process alignment in organizational settings, assessing whether an LLM faithfully reproduces the organization's decision policy rather than merely reaching the same conclusions. We find heterogeneity along two axes. Across models, baseline alignment varies strongly and tracks neither pricing nor general benchmark performance. Across organizations, the structure of alignment changes. In ECHR Article 6 decisions, process alignment predicts output accuracy ($r = 0.85$, $p < .001$), and making the organization's past decision policy explicit improves poorly aligned models. In consumer credit decisions, process alignment is low overall but varies more than output accuracy, and the models resist adopting the organization's weighting of protected attributes. Because historical credit decisions encode potentially discriminatory patterns, higher alignment there is not always desirable. Process-level measurement is therefore necessary, and depending on whether the target policy is normatively desirable, the same procedure can calibrate or audit a model. Deciding which policy to align to, and whether higher alignment is feasible or desirable, makes organizational alignment a pluralistic problem in its own right.
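The calibrate-or-audit point can be made concrete with a per-attribute comparison of captured weights. The sketch below is an illustration under assumptions: the feature names and weights are invented, the tolerance `tol` is arbitrary, and `compare_protected_weights` is a hypothetical helper, not a function from the paper.

```python
def compare_protected_weights(org_policy, llm_policy, protected, tol=0.05):
    """For each protected attribute, report whether the organization's captured
    policy uses it and whether the model reproduces that use (same sign, above tol)."""
    report = {}
    for name in protected:
        org_w = org_policy.get(name, 0.0)
        llm_w = llm_policy.get(name, 0.0)
        org_uses = abs(org_w) > tol
        reproduces = org_uses and abs(llm_w) > tol and (org_w > 0) == (llm_w > 0)
        report[name] = {
            "org_weight": org_w,
            "llm_weight": llm_w,
            "org_uses_attribute": org_uses,
            "llm_reproduces_it": reproduces,
        }
    return report

# Invented weights for illustration only.
org_policy = {"credit_history": 0.8, "loan_amount": -0.5, "age": 0.3, "foreign_worker": -0.4}
llm_policy = {"credit_history": 0.7, "loan_amount": -0.6, "age": 0.02, "foreign_worker": 0.0}

report = compare_protected_weights(org_policy, llm_policy, ["age", "foreign_worker"])
flagged = [name for name, row in report.items()
           if row["org_uses_attribute"] and not row["llm_reproduces_it"]]
print(flagged)
```

The same `flagged` list reads two ways. If the organization's policy is taken as the target, these are gaps to close by calibration; if the historical policy is suspected of encoding discriminatory patterns, they are attributes on which the model declines to follow it, which is an audit finding about the policy rather than a defect of the model.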

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper distinguishes process alignment from output alignment in two organizational domains and notes that more alignment isn't always better, but the results stand or fall on an unvalidated policy-capturing method.


The main thing to know is that this work applies process alignment measurement to organizational decision policies and finds it behaves differently in ECHR Article 6 cases versus consumer credit, with the added claim that alignment can be undesirable when the underlying policy encodes bias. That distinction and the normative point are the actual contribution.

What the paper does is take the idea of faithfully reproducing an organization's decision process rather than just matching final outputs, then compare two real domains. In the legal setting they report a strong correlation between process alignment and output accuracy plus some improvement when the policy is made explicit. In credit decisions they see low overall process alignment, more variation than in outputs, and resistance to protected-attribute weightings. They correctly note that historical credit data can embed discriminatory patterns, so pushing for higher alignment there is not automatically the right goal. This is a practical observation that prior alignment work focused on individuals has not stressed.

The soft spot is exactly the one in the stress-test note. Every result depends on their decision-policy capturing method recovering the organization's true latent policy from historical data. The abstract gives no sample sizes, no validation against held-out decisions or expert review, no description of feature construction or modeling choices, and no error analysis. Without that, the r=0.85 correlation, the domain differences, and the calibration-versus-audit framing are hard to interpret. If the extracted policy is partly an artifact of data selection or labeling, the downstream claims do not follow.

This is for readers working on LLM deployment in regulated or policy-bound organizations who already care about auditing fidelity. It raises a real deployment question worth exploring. The paper deserves a serious referee because the question is grounded and the distinction is useful, but only after the authors supply the missing methods and validation details so the evidence can be assessed.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that process alignment—whether an LLM faithfully reproduces an organization's decision policy from historical data—is a distinct and necessary dimension of alignment beyond output alignment. Using a decision-policy capturing method, it examines heterogeneity across models and two organizational contexts (ECHR Article 6 decisions and consumer credit decisions), reporting a strong correlation (r=0.85) between process alignment and output accuracy in the ECHR case, low overall process alignment in credit decisions with resistance to protected attribute weightings, and concludes that the same method can serve for calibration or auditing depending on the normative desirability of the target policy.

Significance. If the capturing method is shown to be valid, the work would advance the field by demonstrating that alignment targets in organizational settings are pluralistic and that process-level metrics provide actionable distinctions not captured by output accuracy alone. The empirical heterogeneity findings and the normative framing of calibration vs. audit are potentially impactful for practical deployment of LLMs in decision-making roles.

major comments (2)

[Abstract] Abstract: The decision-policy capturing method is invoked as the basis for all empirical results and the central claim that process-level measurement is necessary, yet the abstract provides no information on data selection, labeling procedures, feature construction, model specification, regularization, or any validation against held-out decisions or expert judgment. This omission renders the reported correlation (r=0.85) and heterogeneity claims uninterpretable without confirmation that the extracted policies reflect true organizational processes rather than methodological artifacts.
[Abstract] Abstract and implied Methods: The assumption that the capturing method accurately recovers the latent decision policy without selection bias, labeling artifacts, or modeling choices is load-bearing for the distinction between process and output alignment and for the calibration/audit application. No controls for confounds, sample sizes, or error analysis are mentioned, which directly undermines the soundness of the heterogeneity results across organizations.

minor comments (1)

[Abstract] Abstract: The p-value is reported as p < .001 but without specifying the exact statistical test or degrees of freedom.
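For reference, if r is a Pearson correlation tested against zero with the usual t-test (an assumption, since the abstract does not name the test), the statistic and degrees of freedom follow directly from r and n. Taking n = 30 from the Figure 1 caption (10 models × 3 conditions):

```latex
t \;=\; \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  \;=\; \frac{0.85\,\sqrt{28}}{\sqrt{1-0.7225}}
  \;\approx\; 8.54,
\qquad \mathrm{df} \;=\; n-2 \;=\; 28 .
```

A value of this size is consistent with the reported p < .001. The 30 points are not independent, however, because each model appears under three conditions, so the nominal degrees of freedom may overstate the evidence.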

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the need for greater methodological transparency in the abstract. The comments correctly note that abstracts must balance brevity with sufficient context for interpretability of key claims. We will revise the abstract to include concise references to data sources, validation procedures, and sample details while preserving its focus on findings. Point-by-point responses follow.


Referee: [Abstract] Abstract: The decision-policy capturing method is invoked as the basis for all empirical results and the central claim that process-level measurement is necessary, yet the abstract provides no information on data selection, labeling procedures, feature construction, model specification, regularization, or any validation against held-out decisions or expert judgment. This omission renders the reported correlation (r=0.85) and heterogeneity claims uninterpretable without confirmation that the extracted policies reflect true organizational processes rather than methodological artifacts.

Authors: The abstract prioritizes results over procedural detail due to length limits, with full specifications (ECHR and credit decision datasets, expert labeling of historical cases, feature sets from case attributes, regularized logistic regression for policy capture, and validation via held-out prediction accuracy plus expert agreement checks) appearing in the Methods section. We agree this creates an interpretability gap in the abstract alone and will add one sentence noting the method's out-of-sample validation and sample sizes to support the r=0.85 claim and heterogeneity findings.
revision: yes
Referee: [Abstract] Abstract and implied Methods: The assumption that the capturing method accurately recovers the latent decision policy without selection bias, labeling artifacts, or modeling choices is load-bearing for the distinction between process and output alignment and for the calibration/audit application. No controls for confounds, sample sizes, or error analysis are mentioned, which directly undermines the soundness of the heterogeneity results across organizations.

Authors: The Methods section details random sampling from organizational archives (n>400 per context), explicit controls for feature selection and regularization to mitigate artifacts, and error analysis confirming low bias in recovered policies. These support the process-output distinction and the calibration/audit framing. We concur that the abstract should reference these elements briefly and will revise to include a note on validation against held-out decisions, thereby strengthening the cross-organization heterogeneity results without altering the core claims.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical correlations from external data, no self-referential derivations


The paper reports empirical measurements of process vs. output alignment using a decision-policy capturing method on ECHR and credit datasets, including a correlation (r=0.85) and heterogeneity findings. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or ansatzes are present in the provided text. The central claim that process-level measurement is necessary rests on these measurements rather than reducing to a definition or prior self-result by construction. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central measurement depends on the unvalidated assumption that the decision-policy capturing method faithfully recovers the organization's policy; no free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption: The decision-policy capturing method faithfully extracts the organization's true decision policy from historical data.
This premise underpins all claims about process alignment and its relation to accuracy.


Reference graph

Works this paper leans on

27 extracted references · 14 canonical work pages · 1 internal anchor

[2] Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D., and Lampos, V. Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective. PeerJ Computer Science, 2: e93, 2016. doi:10.7717/peerj-cs.93

[3] Binz, M. and Schulz, E. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences, 120(6): e2218523120, 2023. doi:10.1073/pnas.2218523120

[4] Brunswik, E. The Conceptual Framework of Psychology, volume 1 of International Encyclopedia of Unified Science. University of Chicago Press, Chicago, 1952.

[5] Chalkidis, I., Fergadiotis, M., Tsarapatsanis, D., Aletras, N., Androutsopoulos, I., and Malakasiotis, P. Paragraph-level rationale extraction through regularization: A case study on European Court of Human Rights cases. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Te... 2021.

[6] Dolata, M., Feuerriegel, S., and Schwabe, G. A sociotechnical view of algorithmic fairness. Information Systems Journal, 32(4): 754–818, 2022. doi:10.1111/isj.12370

[7] Gabriel, I. Artificial intelligence, values, and alignment. Minds and Machines, 30(3): 411–437, 2020. doi:10.1007/s11023-020-09539-2

[8] Geirhos, R., Jacobsen, J.-H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., and Wichmann, F. A. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11): 665–673, 2020. doi:10.1038/s42256-020-00257-z

[9] Grant, R. M. Toward a knowledge-based theory of the firm. Strategic Management Journal, 17: 109–122, 1996. ISSN 01432095, 10970266.

[10] Hammond, K. R., Hursch, C. J., and Todd, F. J. Analyzing the components of clinical inference. Psychological Review, 71(6): 438–456, 1964. doi:10.1037/h0040736

[11] Hofmann, H. Statlog (German Credit Data). UCI Machine Learning Repository, 1994. Dataset.

[12] Karelaia, N. and Hogarth, R. M. Determinants of linear judgment: A meta-analysis of lens model studies. Psychological Bulletin, 134(3): 404–426, 2008. doi:10.1037/0033-2909.134.3.404. See also published correction in Psychological Bulletin, 2008.

[13] Karren, R. J. and Barringer, M. W. A review and analysis of the policy-capturing methodology in organizational research: Guidelines for research and practice. Organizational Research Methods, 5(4): 337–361, 2002. doi:10.1177/109442802237115

[14] Lanham, T., Chen, A., Radhakrishnan, A., Steiner, B., Denison, C., Hernandez, D., Li, D., Durmus, E., Hubinger, E., Kernion, J., Lukošiūtė, K., Nguyen, K., Cheng, N., Joseph, N., Schiefer, N., Rausch, O., Larson, R., McCandlish, S., Kundu, S., Kadavath, S., Yang, S., Henighan, T., Maxwell, T., Telleen-Lawton, T., Hume, T., Hatfield-Dodds, Z., Ka... 2023.

[15] Lebovitz, S., Levina, N., and Lifshitz-Assaf, H. Is AI ground truth really true? The dangers of training and evaluating AI tools based on experts' know-what. MIS Quarterly, 45(3): 1501–1525, 2021. doi:10.25300/MISQ/2021/16564

[16] Nonaka, I. A dynamic theory of organizational knowledge creation. Organization Science, 5(1): 14–37, 1994. doi:10.1287/orsc.5.1.14

[17] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., and Lowe, R. Training language models to follow instructions with human feedback. In Advances in Neural Information Proces... 2022.

[18] Pakarinen, P. and Huising, R. Relational expertise: What machines can't know. Journal of Management Studies, 62(5): 2053–2082, 2025. doi:10.1111/joms.12915

[19] Polanyi, M. The Tacit Dimension. Doubleday, Garden City, NY, 1966.

[20] Russell, S. J. Human Compatible: Artificial Intelligence and the Problem of Control. Viking, 2019.

[21] Selbst, A. D., Boyd, d., Friedler, S. A., Venkatasubramanian, S., and Vertesi, J. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 59–68. Association for Computing Machinery, 2019. doi:10.1145/3287560.3287598

[22] Smith, L., Gilhooly, K., and Walker, A. Factors influencing prescribing decisions in the treatment of depression: A social judgement theory approach. Applied Cognitive Psychology, 17(1): 51–63, 2003. doi:10.1002/acp.844

[23] Sorensen, T., Moore, J., Fisher, J., Gordon, M., Mireshghallah, N., Rytting, C. M., Ye, A., Jiang, L., Lu, X., Dziri, N., Althoff, T., and Choi, Y. Position: A roadmap to pluralistic alignment. In Proceedings of the 41st International Conference on Machine Learning, 2024.

[24] Turpin, M., Michael, J., Perez, E., and Bowman, S. R. Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. In Advances in Neural Information Processing Systems, volume 36, 2023.

[25] Tyler, T. R. Why People Obey the Law. Yale University Press, New Haven, CT, 1990.

[26] Walsh, J. P. and Ungson, G. R. Organizational memory. The Academy of Management Review, 16(1): 57–91, 1991. ISSN 03637425.

[27] Wieringa, M. What to account for when accounting for algorithms: A systematic literature review on algorithmic accountability. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 1–18. Association for Computing Machinery, 2020. doi:10.1145/3351095.3372833
