arxiv: 2604.05658 · v1 · submitted 2026-04-07 · 💻 cs.HC

How Much Trust is Enough? Towards Calibrating Trust in Technology

Gabriela Beltrão, Debora F. de Souza, Sonia Sousa, David Lamas

Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3

classification 💻 cs.HC

keywords: trust in technology, Human-Computer Trust Scale, trust propensity, HCI, trust calibration, empirical study, interpretation guidelines, human-computer interaction


The pith

The Human-Computer Trust Scale offers an initial measure of trust propensity but requires contextual interpretation to be useful.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on an empirical study that develops guidelines for interpreting scores from the Human-Computer Trust Scale. The study aims to help users and designers calibrate appropriate levels of trust in technology systems. A sympathetic reader would care because, as technology becomes more autonomous and opaque, mismatched trust can lead to underuse or overreliance. The authors argue that trust assessment is not straightforward and needs reflection within the specific interaction context. Their approach provides a practical starting point for evaluating propensity to trust.

Core claim

The paper's central claim is that the HCTS is a promising tool for an initial evaluation of propensity to trust, but such an assessment requires reflection and interpretation that should be considered within the context of the interaction. The authors present the process used to develop a guideline for interpreting the instrument's results and explain the rationale for their decisions, advocating for calibrating trust in technology within HCI.

What carries the argument

The Human-Computer Trust Scale (HCTS) as a tool for initial trust propensity evaluation, supported by newly developed interpretation guidelines derived from empirical data.
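To make the idea of an interpretation guideline concrete, here is a minimal sketch of how a scale score might be mapped to an interpretive band. Everything specific in it is an assumption for illustration: the 1-5 Likert range, the use of a simple mean, the function names, and above all the cut-offs in `HYPOTHETICAL_BANDS`, which are placeholders and not the thresholds proposed in the paper.

```python
from statistics import mean

# Placeholder cut-offs on an assumed 1-5 mean score.
# These are NOT the paper's thresholds; they only illustrate the mechanism.
HYPOTHETICAL_BANDS = [
    (2.5, "low propensity to trust"),
    (3.5, "moderate propensity to trust"),
    (5.0, "high propensity to trust"),
]


def hcts_mean_score(responses, scale_min=1, scale_max=5):
    """Average Likert responses after checking they lie on the assumed scale."""
    if not responses:
        raise ValueError("responses must not be empty")
    for r in responses:
        if not scale_min <= r <= scale_max:
            raise ValueError(f"response {r} is outside {scale_min}-{scale_max}")
    return mean(responses)


def interpret(score, bands=HYPOTHETICAL_BANDS):
    """Return the label of the first band whose upper bound covers the score."""
    for upper, label in bands:
        if score <= upper:
            return label
    raise ValueError("score is above the highest band")


# Made-up responses from one participant, for demonstration only.
responses = [4, 3, 5, 4, 2, 4]
score = hcts_mean_score(responses)
print(round(score, 2), interpret(score))
```

The label is only a starting point. Per the paper's core claim, it still has to be read against the specific interaction context rather than treated as a standalone verdict.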

If this is right

Designers can use HCTS scores early to adjust system transparency and reduce mismatched expectations.
Users can apply the guidelines to reflect on their trust tendencies before committing to new technologies.
HCI researchers obtain a structured method for turning abstract trust concepts into actionable assessments.
Trust calibration shifts from an ideal to a practical process integrated into system development.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These guidelines could be adapted for high-stakes domains such as medical or autonomous systems to mitigate risks from misplaced trust.
The focus on context implies that static trust metrics alone may fall short for rapidly evolving technologies.
Training users to apply the interpretation process might increase long-term adoption rates of complex systems.

Load-bearing premise

That the empirical study provides sufficient evidence to create reliable, generalizable interpretation guidelines for the HCTS without further validation or details on study methods, sample, or limitations.

What would settle it

A replication study that applies the developed HCTS interpretation guidelines in a different interaction context and finds that the resulting trust level predictions do not align with observed user behaviors or system outcomes.

Figures

Figures reproduced from arXiv: 2604.05658 by David Lamas, Debora F. de Souza, Gabriela Beltrão, Sonia Sousa.

**Figure 3.** Visual representation of proposed interpretation range

**Figure 4.** Overview of the workflow across studies, from ad

Original abstract

The role of trust within Human-Computer Interaction is being redefined. With the increasing omnipresence, autonomy, and opacity of technology, users often struggle to understand the capabilities and limitations of systems. In this article, we present the results of an empirical study designed to provide a practical, evidence-based interpretation of trust propensity assessment using the Human-Computer Trust Scale (HCTS). We outline the process used to develop a guideline for interpreting the instrument's results and explain the rationale for our decisions, advocating for calibrating trust in technology within HCI. Our findings demonstrate that the HCTS is a promising tool for conducting an initial evaluation of propensity to trust, but that such an assessment requires reflection and interpretation that should be considered within the context of the interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper creates interpretation guidelines for the existing HCTS trust scale from an empirical study, but the abstract gives almost no information on the study methods or data, so the guidelines are hard to evaluate.


The main takeaway is that the authors ran an empirical study to turn the Human-Computer Trust Scale into something more usable by adding practical guidelines for interpreting its scores. They argue the scale can serve as a starting point for assessing how much someone tends to trust technology, but any real use requires thinking about the specific interaction context rather than treating the numbers as standalone facts.

Referee Report

2 major / 2 minor

Summary. The manuscript presents results from an empirical study intended to yield a practical, evidence-based guideline for interpreting scores on the Human-Computer Trust Scale (HCTS) as a measure of users' propensity to trust technology. The authors describe the guideline-development process, rationalize their decisions, and conclude that the HCTS is a promising instrument for initial propensity assessment provided that interpretation remains contextual and reflective.

Significance. If the underlying empirical evidence is shown to be robust, the work would offer a timely contribution to HCI by supplying a concrete tool for calibrating trust in increasingly autonomous and opaque systems. The explicit caveat that assessment requires contextual reflection is a constructive strength that guards against over-interpretation. Transparent reporting of the decision rationale during guideline construction is also a positive feature.

major comments (2)

[§3] §3 (Empirical Study / Methods): The manuscript provides no information on participant count, demographics, recruitment, task design, or statistical procedures used to derive the interpretation guidelines. These omissions make it impossible to evaluate whether the guidelines are supported by adequate evidence or influenced by post-hoc choices.
[§4] §4 (Results / Guideline Presentation): No quantitative findings, tables, or validation metrics (e.g., reliability coefficients, inter-rater agreement, or cross-validation results) are reported to justify the specific score thresholds or interpretive categories in the proposed guideline.
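As an illustration of one of the reliability coefficients the second comment refers to, the sketch below computes Cronbach's alpha, a standard internal-consistency measure for multi-item scales. It does not reproduce any analysis from the manuscript: the function name, the data layout (one row of item responses per participant), and the sample data are all assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`, one list of item responses per participant.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(rows) < 2:
        raise ValueError("need at least two participants")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every participant must answer the same number of items")

    # Sample variance of each item across participants.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses (3 participants x 2 items), for demonstration only.
demo = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(demo), 3))
```

Reporting a coefficient of this kind alongside the sample size would give readers one way to judge whether the proposed score thresholds rest on a consistent measurement.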

minor comments (2)

[Abstract] The abstract could include a brief statement of sample size and primary quantitative outcomes to help readers gauge the strength of the claims at first reading.
[Discussion] The limitations paragraph should explicitly discuss the scope of generalizability given the (currently unreported) participant pool and study context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments identify key omissions in the description of our empirical study that must be addressed to allow proper evaluation of the work. We respond to each major comment below and will revise the manuscript to incorporate the requested details.


Referee: [§3] §3 (Empirical Study / Methods): The manuscript provides no information on participant count, demographics, recruitment, task design, or statistical procedures used to derive the interpretation guidelines. These omissions make it impossible to evaluate whether the guidelines are supported by adequate evidence or influenced by post-hoc choices.

Authors: We acknowledge that the methods section omits these critical details. This was an oversight during manuscript preparation. In the revised version we will expand §3 to report the exact participant count, full demographics, recruitment procedures, task design, and the statistical analyses used to derive the interpretation guidelines. These additions will make the empirical foundation transparent and allow readers to assess whether the guidelines rest on adequate evidence. revision: yes

Referee: [§4] §4 (Results / Guideline Presentation): No quantitative findings, tables, or validation metrics (e.g., reliability coefficients, inter-rater agreement, or cross-validation results) are reported to justify the specific score thresholds or interpretive categories in the proposed guideline.

Authors: We agree that the results section currently lacks quantitative findings, tables, and validation metrics. The submitted manuscript focused on the guideline and its rationale but did not present the supporting data. We will revise §4 to include the relevant quantitative results, tables summarizing participant responses, reliability coefficients, and any validation metrics (such as inter-rater agreement or cross-validation) that justify the chosen thresholds and categories. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical guideline development does not reduce to self-referential inputs

full rationale

The paper presents results from an empirical study to develop an interpretation guideline for the Human-Computer Trust Scale (HCTS). No mathematical derivations, equations, parameter fitting, or predictive models are described that could reduce by construction to the study's own inputs. The central claim rests on the outcomes of the empirical process itself, which is presented as external evidence rather than a self-defining loop. Self-citations are not invoked as load-bearing uniqueness theorems or ansatzes. The work is self-contained as a descriptive empirical contribution without the circular patterns enumerated in the analysis criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a non-mathematical empirical study focused on scale interpretation guidelines; no free parameters, axioms, or invented entities are identifiable from the abstract.



A. How to use the HCTS and interpret the results

The Human Computer Trust scale (HCTS) is a quick-and-dirty psychometric scale that assesses individuals’ predisposition to trust a technological artifact. This instrument can be used for e...