arxiv: 2604.04333 · v2 · submitted 2026-04-06 · 💻 cs.CY

What is Human in Judgment? Comparing Automation Bias and Algorithm Aversion Between the United States Military Academy and the General Public

Pith reviewed 2026-05-10 20:23 UTC · model grok-4.3

classification 💻 cs.CY
keywords automation bias · algorithm aversion · military AI · decision support systems · West Point cadets · human-AI interaction · target identification · cognitive bias

The pith

West Point cadets display better calibrated trust in algorithmic advice than the general public in a target identification task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how military training affects human interaction with AI decision support systems by comparing 236 West Point cadets to a demographically similar public sample. Participants completed a target identification task, received advice from either an algorithm or a human analyst, and could then revise their initial identification. Cadets relied on the algorithmic advice more appropriately than the public sample, showing less over-trust and less undue skepticism. This suggests that military education may help mitigate cognitive biases that could lead to errors in AI-assisted conflict decisions.

Core claim

West Point cadets are less prone to cognitive distortion than members of the general public, displaying better calibrated trust in algorithmic decision support systems. The experiment directly measured changes in identification after receiving advice, revealing that cadets adjusted their assessments in line with the quality of the input more effectively than civilians.

What carries the argument

Survey experiment with a target identification task where participants receive advice from an algorithm or human analyst and have the chance to reassess their initial judgment.
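A minimal sketch of how a design like this could be turned into switching rates per sample and advice source. The `Response` fields, the labels, and the toy records are hypothetical and are not taken from the paper; the paper's own operationalization of automation bias and algorithm aversion may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    # All field names and labels are illustrative assumptions.
    sample: str   # e.g. "cadet" or "public"
    source: str   # advice source: "algorithm" or "human"
    initial: str  # identification before advice
    advice: str   # identification suggested by the advisor
    final: str    # identification after advice


def switching_rates(responses):
    """Among respondents whose initial answer disagreed with the advice,
    return the share who switched to the advised answer, keyed by
    (sample, source)."""
    switched = defaultdict(int)
    eligible = defaultdict(int)
    for r in responses:
        if r.initial == r.advice:
            continue  # no disagreement, so switching is undefined
        key = (r.sample, r.source)
        eligible[key] += 1
        if r.final == r.advice:
            switched[key] += 1
    return {key: switched[key] / eligible[key] for key in eligible}


# Toy records, invented purely to exercise the function.
toy = [
    Response("cadet", "algorithm", "hostile", "friendly", "friendly"),
    Response("cadet", "algorithm", "hostile", "friendly", "hostile"),
    Response("public", "algorithm", "hostile", "friendly", "friendly"),
    Response("public", "human", "hostile", "friendly", "hostile"),
    Response("public", "human", "friendly", "friendly", "friendly"),
]
rates = switching_rates(toy)
```

Comparing the algorithm and human rates within each sample is one way to read reliance on each advice source. Whether a given rate counts as bias or aversion also depends on whether the advice was correct, which this sketch does not model.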

If this is right

  • Military personnel may be less likely to err due to automation bias or algorithm aversion when using AI in operational settings.
  • AI integration in militaries could be managed with lower risk of miscalculation if training emphasizes calibrated trust.
  • Exposure to AI through education influences how humans interact with decision support in high-stakes environments.
  • The role of human judgment in war may evolve differently in professional military forces than in civilian contexts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar training approaches could be adapted for other high-stakes AI users like doctors or pilots to improve calibration.
  • These findings point to education as a lever for shaping AI's impact on international security.
  • Further studies could test if the effect holds in more complex, realistic military scenarios beyond the lab task.

Load-bearing premise

The target identification task and survey responses capture real-world susceptibility to automation bias and algorithm aversion in military decision-making.

What would settle it

A study observing actual military operators using AI decision support in field exercises or simulations to check if their trust calibration matches the cadet results.

Figures

Figures reproduced from arXiv: 2604.04333 by Laura Resnick Samotin, Lauren Kahn, Michael C. Horowitz.

Figure 1: AI Indicator Values, By Sample
Figure 2: Views of AI, By Sample
Figure 3: Switching Rate Across Samples and Experimental Conditions
Figure 4: Automation Bias and Algorithm Aversion, By Sample
Figure 5: Predicted Probabilities of Automation Bias
Original abstract

Human judgment has always been central to conflict and escalation, but how will a world of artificial intelligence (AI) change the role of humans in war? As militaries increasingly adopt AI-enabled decision-support systems (DSS), including the United States in the war against Iran, concerns about automation bias -- over-reliance on algorithmic recommendations -- and algorithm aversion -- premature distrust of automated outputs -- raise fears that relying on AI too much could increase the risk of error, miscalculation, and accidents. Yet existing evidence on how militaries actually interact with AI remains limited. We test theories about the susceptibility of militaries to automation bias by comparing the results from a survey experiment conducted with 236 cadets at the United States Military Academy at West Point to a demographically similar cross-national public sample. Respondents completed a target identification task and then received advice from either an algorithm or a human analyst and had the opportunity to re-assess their initial identification, allowing direct measurement of automation bias and algorithm aversion. We find that West Point cadets are less prone to cognitive distortion than members of the general public, displaying better calibrated trust in algorithmic decision support systems. While the findings are limited, they suggest that military education and exposure to AI can meaningfully shape how AI influences international politics in matters of war and peace.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports results from a survey experiment with 236 US Military Academy cadets and a demographically matched public sample. Participants performed a target-identification task, received advice labeled as either algorithmic or human, and were allowed to revise their initial judgment. The central finding is that cadets exhibited better-calibrated trust (lower automation bias and algorithm aversion) than the public sample, suggesting that military education and AI exposure can reduce cognitive distortions in interactions with decision-support systems.

Significance. If the result holds, the work provides rare empirical evidence on how military training shapes human-AI interaction in a domain with high stakes for international security. It directly addresses a gap in the literature on automation bias and algorithm aversion within professional military populations and offers a falsifiable claim that education can produce more calibrated reliance on algorithmic advice.

major comments (2)
  1. [Methods and Results (target identification task description)] The headline claim that cadets show reduced cognitive distortion rests on the assumption that the low-stakes target-identification task elicits the same mechanisms that operate under operational time pressure, accountability, and lethal consequences. No manipulation checks, high-fidelity simulation arm, or within-cadet correlation with actual training/deployment experience are reported to support this mapping.
  2. [Abstract and Discussion] The abstract states that 'the findings are limited,' yet the manuscript provides no explicit discussion of how the artificial setting, absence of real-world consequences, or lack of demographic matching on military-specific variables (e.g., prior AI exposure, command experience) might artifactually produce the observed cadet-public difference.
minor comments (2)
  1. [Abstract] The abstract reports the cadet sample size (N=236), but the exact statistical tests, effect sizes, and confidence intervals for the key cadet-public comparison are not summarized in the abstract or early sections, making it difficult to assess the precision of the 'better calibrated trust' claim.
  2. [Experimental design] The paper does not report whether the algorithmic advice was actually more accurate than human advice in the task, which is necessary to distinguish calibrated trust from simple accuracy following.
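The kind of precision summary requested in minor comment 1 could look like the sketch below: a difference in switching proportions between two groups with a Wald 95% confidence interval. The function name and the counts are hypothetical, chosen only for illustration. The paper's actual tests and estimates are not reported in this review, and a Wald interval is just one simple option.

```python
import math


def diff_in_proportions(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 between two independent proportions, with a
    Wald confidence interval (z=1.96 gives roughly 95% coverage)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Standard error of the difference under independent binomial sampling.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 50 of 100 switched in one group, 30 of 100 in the other.
diff, (lo, hi) = diff_in_proportions(50, 100, 30, 100)
```

Reporting the interval alongside the point estimate shows how precisely a cadet-public gap is estimated. With small cells or proportions near 0 or 1, a Wilson or exact interval would be a safer choice than the Wald form used here.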

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments, which highlight important considerations regarding the generalizability of our findings. We address each major comment below and have revised the manuscript accordingly where feasible.

point-by-point responses
  1. Referee: [Methods and Results (target identification task description)] The headline claim that cadets show reduced cognitive distortion rests on the assumption that the low-stakes target-identification task elicits the same mechanisms that operate under operational time pressure, accountability, and lethal consequences. No manipulation checks, high-fidelity simulation arm, or within-cadet correlation with actual training/deployment experience are reported to support this mapping.

    Authors: We acknowledge that the target identification task is a controlled, low-stakes survey experiment and does not replicate operational conditions such as time pressure or lethal consequences. The design was chosen to isolate the effects of advice source (algorithm vs. human) on judgment revision in a standardized manner across both samples, enabling a direct comparison of automation bias and algorithm aversion. No manipulation checks for perceived stakes or high-fidelity simulation arms were included, as the study was a survey-based experiment focused on population differences rather than ecological validity. We will add an expanded limitations subsection in the Discussion to explicitly address the assumptions required to map these results to high-stakes military contexts and note the absence of within-cadet correlations with training experience.

    revision: partial

  2. Referee: [Abstract and Discussion] The abstract states that 'the findings are limited,' yet the manuscript provides no explicit discussion of how the artificial setting, absence of real-world consequences, or lack of demographic matching on military-specific variables (e.g., prior AI exposure, command experience) might artifactually produce the observed cadet-public difference.

    Authors: We agree that while the abstract notes the findings are limited, the Discussion would benefit from more explicit treatment of these potential artifacts. We will revise the Discussion to include a dedicated paragraph addressing how the artificial setting and lack of real-world consequences could influence results, as well as the incomplete matching on military-specific variables such as prior AI exposure and command experience. This will clarify possible alternative explanations for the observed differences without overstating generalizability.

    revision: yes

standing simulated objections not resolved
  • We cannot add manipulation checks, a high-fidelity simulation arm, or within-cadet correlations with deployment experience, as these would require new data collection beyond the existing survey experiment.

Circularity Check

0 steps flagged

No circularity: direct empirical comparison with no derivations or self-referential steps

full rationale

The paper reports results from a survey experiment comparing West Point cadets and a demographically matched public sample on a target identification task with algorithmic or human advice. No mathematical derivations, equations, parameter fitting presented as predictions, uniqueness theorems, or self-citation chains appear in the work. The central finding (better-calibrated trust among cadets) follows directly from observed differences in pre- and post-advice identification changes, without any reduction of outputs to inputs by construction. External validity concerns are separate from circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on untested assumptions about what the experimental task measures and how well the samples represent their populations. No free parameters or invented entities are evident from the abstract.

axioms (2)
  • domain assumption: The target identification task accurately measures susceptibility to automation bias and algorithm aversion.
    This links the behavioral measure to the theoretical concepts of cognitive distortion in AI-assisted decisions.
  • domain assumption: The cadet sample and demographically similar public sample allow valid causal inference about the effect of military education.
    The abstract relies on this comparability for the group difference claim.

pith-pipeline@v0.9.0 · 5537 in / 1271 out tokens · 46684 ms · 2026-05-10T20:23:33.719955+00:00 · methodology
