pith. machine review for the scientific record.

arxiv: 2604.05658 · v1 · submitted 2026-04-07 · 💻 cs.HC

How Much Trust is Enough? Towards Calibrating Trust in Technology

Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3

classification 💻 cs.HC
keywords trust in technology · Human-Computer Trust Scale · trust propensity · HCI · trust calibration · empirical study · interpretation guidelines · human-computer interaction

The pith

The Human-Computer Trust Scale offers an initial measure of trust propensity but requires contextual interpretation to be useful.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on an empirical study that develops guidelines for interpreting scores from the Human-Computer Trust Scale. The study aims to help users and designers calibrate appropriate levels of trust in technology systems. A sympathetic reader would care because, as technology becomes more autonomous and opaque, mismatched trust can lead to underuse or overreliance. The authors argue that trust assessment is not straightforward and requires reflection within the specific interaction context. Their approach provides a practical starting point for evaluating propensity to trust.

Core claim

The paper's central claim is that the HCTS is a promising tool for an initial evaluation of propensity to trust, but such an assessment requires reflection and interpretation that should be considered within the context of the interaction. The authors present the process used to develop a guideline for interpreting the instrument's results and explain the rationale for their decisions, advocating for calibrating trust in technology within HCI.

What carries the argument

The Human-Computer Trust Scale (HCTS) as a tool for initial trust propensity evaluation, supported by newly developed interpretation guidelines derived from empirical data.

If this is right

  • Designers can use HCTS scores early to adjust system transparency and reduce mismatched expectations.
  • Users can apply the guidelines to reflect on their trust tendencies before committing to new technologies.
  • HCI researchers obtain a structured method for turning abstract trust concepts into actionable assessments.
  • Trust calibration shifts from an ideal to a practical process integrated into system development.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These guidelines could be adapted for high-stakes domains such as medical or autonomous systems to mitigate risks from misplaced trust.
  • The focus on context implies that static trust metrics alone may fall short for rapidly evolving technologies.
  • Training users to apply the interpretation process might increase long-term adoption rates of complex systems.

Load-bearing premise

That the empirical study provides sufficient evidence to create reliable, generalizable interpretation guidelines for the HCTS without further validation or details on study methods, sample, or limitations.

What would settle it

A replication study that applies the developed HCTS interpretation guidelines in a different interaction context and finds that the resulting trust level predictions do not align with observed user behaviors or system outcomes.

Figures

Figures reproduced from arXiv: 2604.05658 by David Lamas, Debora F. de Souza, Gabriela Beltrão, Sonia Sousa.

Figure 2: Overview of the procedure adopted for defining…
Figure 3: Visual representation of proposed interpretation range…
Figure 4: Overview of the workflow across studies, from ad…

Original abstract

The role of trust within Human-Computer Interaction is being redefined. With the increasing omnipresence, autonomy, and opacity of technology, users often struggle to understand the capabilities and limitations of systems. In this article, we present the results of an empirical study designed to provide a practical, evidence-based interpretation of trust propensity assessment using the Human-Computer Trust Scale (HCTS). We outline the process used to develop a guideline for interpreting the instrument's results and explain the rationale for our decisions, advocating for calibrating trust in technology within HCI. Our findings demonstrate that the HCTS is a promising tool for conducting an initial evaluation of propensity to trust, but that such an assessment requires reflection and interpretation that should be considered within the context of the interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents results from an empirical study intended to yield a practical, evidence-based guideline for interpreting scores on the Human-Computer Trust Scale (HCTS) as a measure of users' propensity to trust technology. The authors describe the guideline-development process, rationalize their decisions, and conclude that the HCTS is a promising instrument for initial propensity assessment provided that interpretation remains contextual and reflective.

Significance. If the underlying empirical evidence is shown to be robust, the work would offer a timely contribution to HCI by supplying a concrete tool for calibrating trust in increasingly autonomous and opaque systems. The explicit caveat that assessment requires contextual reflection is a constructive strength that guards against over-interpretation. Transparent reporting of the decision rationale during guideline construction is also a positive feature.

major comments (2)
  1. [§3] §3 (Empirical Study / Methods): The manuscript provides no information on participant count, demographics, recruitment, task design, or statistical procedures used to derive the interpretation guidelines. These omissions make it impossible to evaluate whether the guidelines are supported by adequate evidence or influenced by post-hoc choices.
  2. [§4] §4 (Results / Guideline Presentation): No quantitative findings, tables, or validation metrics (e.g., reliability coefficients, inter-rater agreement, or cross-validation results) are reported to justify the specific score thresholds or interpretive categories in the proposed guideline.
minor comments (2)
  1. [Abstract] The abstract could include a brief statement of sample size and primary quantitative outcomes to help readers gauge the strength of the claims at first reading.
  2. [Discussion] The limitations paragraph should explicitly discuss the scope of generalizability given the (currently unreported) participant pool and study context.
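
One of the validation metrics requested in major comment 2, a reliability coefficient, can be illustrated with Cronbach's alpha. The sketch below is not taken from the manuscript: the function name `cronbach_alpha` and the sample ratings are hypothetical, and the paper does not state which coefficient, if any, the authors computed. It only shows the kind of quantity a revised §4 could report for a multi-item Likert instrument.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item ratings.

    `responses` is a list of rows, one row per participant, each row
    holding one numeric rating per scale item (hypothetical layout).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every participant must answer every item")

    # Sample variance of each item across participants.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of the participants' summed scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Made-up ratings for illustration only, not data from the study.
    sample = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 4, 3]]
    print(f"alpha = {cronbach_alpha(sample):.3f}")
```

Reporting such a coefficient alongside the sample size would let readers judge whether the score thresholds rest on an internally consistent measurement.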

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments identify key omissions in the description of our empirical study that must be addressed to allow proper evaluation of the work. We respond to each major comment below and will revise the manuscript to incorporate the requested details.

  1. Referee: [§3] §3 (Empirical Study / Methods): The manuscript provides no information on participant count, demographics, recruitment, task design, or statistical procedures used to derive the interpretation guidelines. These omissions make it impossible to evaluate whether the guidelines are supported by adequate evidence or influenced by post-hoc choices.

    Authors: We acknowledge that the methods section omits these critical details. This was an oversight during manuscript preparation. In the revised version we will expand §3 to report the exact participant count, full demographics, recruitment procedures, task design, and the statistical analyses used to derive the interpretation guidelines. These additions will make the empirical foundation transparent and allow readers to assess whether the guidelines rest on adequate evidence.

    revision: yes

  2. Referee: [§4] §4 (Results / Guideline Presentation): No quantitative findings, tables, or validation metrics (e.g., reliability coefficients, inter-rater agreement, or cross-validation results) are reported to justify the specific score thresholds or interpretive categories in the proposed guideline.

    Authors: We agree that the results section currently lacks quantitative findings, tables, and validation metrics. The submitted manuscript focused on the guideline and its rationale but did not present the supporting data. We will revise §4 to include the relevant quantitative results, tables summarizing participant responses, reliability coefficients, and any validation metrics (such as inter-rater agreement or cross-validation) that justify the chosen thresholds and categories.

    revision: yes

Circularity Check

0 steps flagged

No circularity: empirical guideline development does not reduce to self-referential inputs

The paper presents results from an empirical study to develop an interpretation guideline for the Human-Computer Trust Scale (HCTS). No mathematical derivations, equations, parameter fitting, or predictive models are described that could reduce by construction to the study's own inputs. The central claim rests on the outcomes of the empirical process itself, which is presented as external evidence rather than a self-defining loop. Self-citations are not invoked as load-bearing uniqueness theorems or ansatzes. The work is self-contained as a descriptive empirical contribution without the circular patterns enumerated in the analysis criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a non-mathematical empirical study focused on scale interpretation guidelines; no free parameters, axioms, or invented entities are identifiable from the abstract.

pith-pipeline@v0.9.0 · 5430 in / 1071 out tokens · 58377 ms · 2026-05-10T18:48:48.341654+00:00 · methodology

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

  1. [1]

    Questionnaire construction manual

    Bettina A Babbitt and Charles O Nystrom. 1989. Questionnaire construction manual. Technical Report

  2. [2]

    Tita Alissa Bach, Amna Khan, Harry Hallock, Gabriela Beltrão, and Sonia Sousa. A Systematic Literature Review of User Trust in AI-Enabled Systems: An HCI Perspective. International Journal of Human–Computer Interaction (2022), 1–16

  4. [4]

    Aaron Bangor, Philip Kortum, and James Miller. 2009. Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of usability studies 4, 3 (2009), 114–123

  5. [5]

    Aaron Bangor, Philip T Kortum, and James T Miller. 2008. An empirical evaluation of the system usability scale. Intl. Journal of Human–Computer Interaction 24, 6 (2008), 574–594

  6. [6]

    Gabriela Beltrão and Sonia Sousa. 2021. Factors Influencing Trust in WhatsApp: A Cross-Cultural Study. In International Conference on Human-Computer Interaction. Springer, 495–508

  7. [7]

    Gabriela Beltrão, Sonia Sousa, and David Lamas. 2023. Trust in Facial Recognition Systems: A Perspective from the Users. In IFIP Conference on Human-Computer Interaction. Springer, 379–388

  8. [8]

    Gabriela Beltrão, Sonia Sousa, and David Lamas. 2025. Assessing the Measurement Invariance of the Human–Computer Trust Scale. Electronics 14, 9 (2025), 1806

  9. [9]

    Anol Bhattacherjee. 2002. Individual trust in online firms: Scale development and initial test. Journal of management information systems 19, 1 (2002), 211–241

  10. [10]

    Susanne Bødker. 2006. When second wave HCI meets third wave challenges. In Proceedings of the 4th Nordic conference on Human-computer interaction: changing roles. 1–8

  11. [11]

    Susanne Bødker. 2015. Third-wave HCI, 10 years later—participation and sharing. interactions 22, 5 (2015), 24–31.

    CHI ’26, April 13–17, 2026, Barcelona, Spain. Beltrão et al.

  12. [12]

    John Brooke et al. 1996. SUS-A quick and dirty usability scale. Usability evaluation in industry 189, 194 (1996), 4–7

  13. [13]

    Debora Firmino de Souza, Sonia Sousa, Kadri Kristjuhan-Ling, Olga Dunajeva, Mare Roosileht, Avar Pentel, Mati Mõttus, Mustafa Can Özdemir, and Žanna Gratšjova. 2025. Trust and Trustworthiness from Human-Centered Perspective in HRI – A Systematic Literature Review. arXiv:2501.19323 [cs.HC] https://arxiv.org/abs/2501.19323

  14. [14]

    Ewart J De Visser, Marieke MM Peeters, Malte F Jung, Spencer Kohn, Tyler H Shaw, Richard Pak, and Mark A Neerincx. 2020. Towards a theory of longitudinal trust calibration in human–robot teams. International journal of social robotics 12, 2 (2020), 459–478

  15. [15]

    Stuart Carter Dodd and Thomas R Gerbrick. 1960. Word scales for degrees of opinion. Language and Speech 3, 1 (1960), 18–31

  16. [16]

    Fred E Emery and Eric L Trist. 1960. Socio-technical systems. Management science, models and techniques 2 (1960), 83–97

  17. [17]

    Kairi Fimberg and Sonia Sousa. 2020. The Impact of Website Design on Users’ Trust Perceptions. In International Conference on Applied Human Factors and Ergonomics. Springer, 267–274

  18. [18]

    Kerstin Fischer, Hanna Mareike Weigelin, and Leon Bodenhagen. 2018. Increasing trust in human–robot medical interactions: effects of transparency and adaptability. Paladyn, Journal of Behavioral Robotics 9, 1 (2018), 95–109. doi:10.1515/pjbr-2018-0007

  19. [19]

    Ilaria Gaudiello, Elisabetta Zibetti, Sébastien Lefort, Mohamed Chetouani, and Serena Ivaldi. 2016. Trust as indicator of robot functional and social acceptance. An experimental study on user conformation to iCub answers. Computers in Human Behavior 61 (2016), 633–655

  20. [20]

    David Gefen, Elena Karahanna, and Detmar W Straub. 2003. Trust and TAM in online shopping: An integrated model. MIS quarterly (2003), 51–90

  21. [21]

    Harjinder Gill, Kathleen Boies, Joan E Finegan, and Jeffrey McNally. 2005. Antecedents of trust: Establishing a boundary condition for the relation between propensity to trust and intention to trust. Journal of business and psychology 19 (2005), 287–302

  22. [22]

    Siddharth Gulati, Sonia Sousa, and David Lamas. 2017. Modelling trust: An empirical assessment. In IFIP Conference on Human-Computer Interaction. Springer, 40–61

  23. [23]

    Siddharth Gulati, Sonia Sousa, and David Lamas. 2019. Design, development and evaluation of a human-computer trust scale. Behaviour & Information Technology 38, 10 (2019), 1004–1015

  24. [24]

    Peter A. Hancock, Deborah R. Billings, Kristin E. Schaefer, Jessie Y. C. Chen, Ewart J. de Visser, and Raja Parasuraman. 2011. A Meta-Analysis of Factors Affecting Trust in Human-Robot Interaction. Human Factors: The Journal of Human Factors and Ergonomics Society 53, 5 (2011), 517–527. doi:10.1177/0018720811417254

  25. [25]

    Kevin Anthony Hoff and Masooda Bashir. 2014. Trust in Automation. Human Factors: The Journal of Human Factors and Ergonomics Society 57, 3 (2014), 407–434. doi:10.1177/0018720814547570

  26. [26]

    Alexandra D Kaplan, Theresa T Kessler, J Christopher Brill, and Peter A Hancock. Trust in artificial intelligence: Meta-analytic findings. Human factors 65, 2 (2023), 337–359

  28. [28]

    Spencer C Kohn, Ewart J De Visser, Eva Wiese, Yi-Ching Lee, and Tyler H Shaw. Measurement of trust in automation: A narrative review and reference guide. Frontiers in psychology 12 (2021), 604977

  30. [30]

    Bing Cai Kok and Harold Soh. 2020. Trust in robots: Challenges and opportunities. Current Robotics Reports 1, 4 (2020), 297–309

  31. [31]

    How many subjects?: Statistical power analysis in research

    Helena Chmura Kraemer and Christine Blasey. 2015. How many subjects?: Statistical power analysis in research. Sage publications

  32. [32]

    John Lee and Neville Moray. 1992. Trust, control strategies and allocation of function in human-machine systems. Ergonomics 35, 10 (1992), 1243–1270

  33. [33]

    John D Lee and Katrina A See. 2004. Trust in automation: Designing for appropriate reliance. Human factors 46, 1 (2004), 50–80

  34. [34]

    J. David Lewis and Andrew Weigert. 1985. Trust as a Social Reality. Social Forces 63, 4 (1985), 967–985. doi:10.1093/sf/63.4.967

  35. [35]

    Mei Yii Lim, David A. Robb, Bruce W. Wilson, and Helen Hastie. 2023. Feeding the Coffee Habit: A Longitudinal Study of a Robo-Barista. 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) 00 (2023), 1983–1990. doi:10.1109/ro-man57019.2023.10309621

  36. [36]

    The perceived favorableness of selected scale anchors and response alternatives

    Josephine L Matthews, Calvin E Wright, Kenneth L Yudowitch, James Geddie, and RL Palmer. 1978. The perceived favorableness of selected scale anchors and response alternatives. Technical Report

  37. [37]

    Roger C Mayer, James H Davis, and F David Schoorman. 1995. An integrative model of organizational trust. Academy of management review 20, 3 (1995), 709–734

  38. [38]

    D Harrison McKnight, Vivek Choudhury, and Charles Kacmar. 2002. Developing and validating trust measures for e-commerce: An integrative typology. Information systems research 13, 3 (2002), 334–359

  39. [39]

    Linda Miller, Johannes Kraus, Franziska Babel, and Martin Baumann. 2021. More Than a Feeling—Interrelation of Trust Layers in Human-Robot Interaction and the Role of User Dispositions and State Anxiety. Frontiers in Psychology 12 (2021). doi:10.3389/fpsyg.2021.592711

  40. [40]

    Bonnie M Muir. 1987. Trust between humans and machines, and the design of decision aids. International journal of man-machine studies 27, 5-6 (1987), 527–539

  41. [41]

    Bonnie M Muir and Neville Moray. 1996. Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics 39, 3 (1996), 429–460

  42. [42]

    Triin Oper and Sonia Sousa. 2020. User attitudes towards Facebook: perception and reassurance of trust (Estonian Case Study). In International Conference on Human-Computer Interaction. Springer, 224–230

  43. [43]

    Tim O’Reilly. 2005. What is web 2.0. In Online Communication and Collaboration: A Reader, Helen Donelan, Karen Kear, and Magnus Ramage (Eds.). Routledge

  44. [44]

    Raja Parasuraman and Victor Riley. 1997. Humans and automation: Use, misuse, disuse, abuse. Human factors 39, 2 (1997), 230–253

  45. [45]

    Paul A Pavlou and David Gefen. 2004. Building effective online marketplaces with institution-based trust. Information systems research 15, 1 (2004), 37–59

  46. [46]

    Ana Pinto, Sonia Sousa, Cristóvão Silva, and Pedro Coelho. 2020. Adaptation and validation of the HCTM scale into human-robot interaction Portuguese context: a study of measuring trust in human-robot interactions. In Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society. 1–4

  47. [47]

    Ana Pinto, Sónia Sousa, Ana Simões, Joana Santos, et al. [n. d.]. A Trust Scale for Human-Robot Interaction: Translation, Adaptation, and Validation of a Human Computer Trust Scale. Human Behavior and Emerging Technologies 2022 ([n. d.])

  48. [48]

    Felix Schoeller, Mark Miller, Roy Salomon, and Karl J Friston. 2021. Trust as extended control: Human-machine interactions as active inference. Frontiers in Systems Neuroscience 15 (2021), 669810

  49. [49]

    Sonia Sousa, David Lamas, and Paulo Dias. 2014. A model for Human-computer Trust. In International Conference on Learning and Collaboration Technologies. Springer, 128–137

  50. [50]

    Nathan Tenhundfeld, Mustafa Demir, and Ewart de Visser. 2022. Assessment of Trust in Automation in the “Real World”: Requirements for New Trust in Automation Measurement Techniques for Use by Practitioners. Journal of Cognitive Engineering and Decision Making 16, 2 (2022), 101–118. doi:10.1177/15553434221096261

  51. [51]

    Hong Wang, Natalia Calvo-Barajas, Katie Winkle, and Ginevra Castellano. 2025. “Who Should I Believe?”: User Interpretation and Decision-Making When a Family Healthcare Robot Contradicts Human Memory. arXiv (2025). doi:10.48550/arxiv.2506.21322

  52. [52]

    Larry Wasserman. 2013. All of statistics: a concise course in statistical inference. Springer Science & Business Media.

    A How to use the HCTS and interpret the results

    The Human Computer Trust scale (HCTS) is a quick-and-dirty psychometric scale that assesses individuals’ predisposition to trust a technological artifact. This instrument can be used for e...