pith. sign in

arxiv: 2606.30980 · v1 · pith:BK7LQDVRnew · submitted 2026-06-29 · 💻 cs.HC

Ethics and Social Responsibility in AI-Assisted Interviewing: An LLM-in-the-Loop Study of AI-Generated Follow-Up Questions

Pith reviewed 2026-07-01 00:53 UTC · model grok-4.3

classification 💻 cs.HC
keywords AI ethicsLLM-assisted interviewingWizard-of-Oz studyfollow-up questionssocial responsibilityprivacy risksqualitative methodsdiscriminatory language
0
0 comments X

The pith

A Wizard-of-Oz study finds five interlocking ethical concerns when LLMs generate follow-up questions during live interviews.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a simulated AI follow-up assistant, powered by GPT-4o and mediated by a human co-interviewer, surfaces five specific worries among 17 interviewers of varying expertise. These worries center on harmful language, diminished respect for interviewees, unequal participation, unclear accountability, and privacy exposures. A sympathetic reader would care because semi-structured interviews are widely used in research, hiring, and social services, where AI tools are increasingly proposed to reduce cognitive load. The study translates the concerns directly into recommendations for design choices and governance structures that could make such systems safer.

Core claim

In an LLM-in-the-loop Wizard-of-Oz study, a human co-interviewer selectively relayed and could edit real-time AI-generated follow-up questions produced by GPT-4o. Across 17 participants, five interlocking concerns emerged: (1) harmful or discriminatory language and unpredictable interaction harms, (2) undermining interviewees' sense of respect through divided attention and missing nonverbal cues, (3) technology-based participation inequality, (4) unclear responsibility when harms occur, and (5) privacy, disclosure, and compliance risks when AI listens, records, or transcribes sensitive content. The authors translate these concerns into design and governance implications for safer, more respe

What carries the argument

The selective-relay Wizard-of-Oz setup in which a human co-interviewer edits and relays GPT-4o-generated questions during live interviews while preserving human oversight.

If this is right

  • Systems must incorporate safeguards against discriminatory or harmful language in generated questions.
  • Designs should minimize interviewer distraction to preserve respect and attention toward interviewees.
  • Clear accountability protocols are required to assign responsibility when AI outputs cause harm.
  • Privacy and compliance features must handle sensitive content when AI records or transcribes interviews.
  • Solutions for technology access are needed to avoid creating participation inequality among interviewees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same five concerns could appear in other AI-mediated qualitative data collection settings such as focus groups or oral histories.
  • A side-by-side comparison of fully automated versus human-mediated versions would test whether the observed concerns are artifacts of the Wizard-of-Oz mediation.
  • Policy makers could use these concerns as a starting list when drafting domain-specific guidelines for AI tools in human-subject research.
  • Training programs for interviewers might reduce some risks by teaching effective collaboration with AI assistants.

Load-bearing premise

The selective-relay Wizard-of-Oz setup with a human co-interviewer editing AI-generated questions produces concerns that accurately reflect those that would arise in actual deployed AI-assisted interviewing systems without full automation.

What would settle it

Deploy a fully automated version of the same AI follow-up system with no human editing or relay step, then measure whether participants still report the same five concerns in the same proportions.

Figures

Figures reproduced from arXiv: 2606.30980 by He Zhang, Jie Cai, John M. Carroll, Xin Guan, Yueyan Liu.

Figure 1
Figure 1. Figure 1: LLM-in-the-loop Wizard-of-Oz study design for AI-assisted qualitative interviewing (study schematic). The lead [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
read the original abstract

Semi-structured interviews rely on timely, context-sensitive follow-up questions, yet interviewers' cognitive load and limited domain familiarity can constrain probing depth. We report findings from an LLM-in-the-loop Wizard-of-Oz (WoZ) study that simulates an AI follow-up assistant in live interviewing while preserving human oversight. In our setup, a co-interviewer selectively relayed and could edit AI-generated follow-up questions (AGQs) produced in real time by GPT-4o, enabling a realistic approximation of deployment without fully automating the interaction. Across 17 interviewers with varied qualitative-method expertise, participants raised five interlocking concerns: (1) harmful or discriminatory language and unpredictable interaction harms, (2) undermining interviewees' sense of respect through divided attention and missing nonverbal cues, (3) technology-based participation inequality, (4) unclear responsibility when harms occur, and (5) privacy, disclosure, and compliance risks when AI listens, records, or transcribes sensitive content. We translate these concerns into design and governance implications for safer, more respectful, and more accountable AI-assisted interviewing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript reports findings from an LLM-in-the-loop Wizard-of-Oz study simulating AI-assisted semi-structured interviewing. Using GPT-4o to generate follow-up questions in real time, with a human co-interviewer selectively relaying and editing them, the study with 17 interviewers of varying expertise identifies five interlocking participant concerns—harmful language, undermined respect, participation inequality, unclear responsibility, and privacy risks—and translates these into design and governance implications.

Significance. If the concerns are robustly supported and representative of deployed systems, the work offers timely empirical grounding for ethical considerations in AI tools for qualitative interviewing and human-computer interaction, highlighting risks that could inform safer system design and policy.

major comments (2)
  1. [Study Setup (Abstract)] Study Setup (Abstract): The selective-relay WoZ design permits the co-interviewer to edit or reject AI-generated questions before relay. This filtering mechanism may reduce the observed incidence of harmful or discriminatory language (concern 1) compared to fully automated deployment. The manuscript does not report the fraction of AGQs edited or rejected, leaving unquantified how well the observed concerns map to real AI-assisted systems without human mediation.
  2. [Methods] Methods: No details are provided on participant recruitment, interview protocol, data analysis method, or the process for coding and validating the five concerns. This absence limits evaluation of whether the reported concerns are robustly derived from the data.
minor comments (1)
  1. [Abstract] Abstract: Consider adding the total number of interviews conducted or AGQs generated to contextualize the scale of the qualitative findings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. We address each of the major comments below.

read point-by-point responses
  1. Referee: [Study Setup (Abstract)] Study Setup (Abstract): The selective-relay WoZ design permits the co-interviewer to edit or reject AI-generated questions before relay. This filtering mechanism may reduce the observed incidence of harmful or discriminatory language (concern 1) compared to fully automated deployment. The manuscript does not report the fraction of AGQs edited or rejected, leaving unquantified how well the observed concerns map to real AI-assisted systems without human mediation.

    Authors: We acknowledge the referee's point that the selective-relay WoZ design introduces human mediation, which could potentially mitigate some of the observed concerns, particularly harmful language, relative to a fully automated system. This design was chosen to ethically simulate AI assistance while maintaining human oversight, consistent with the 'LLM-in-the-Loop' framing of the study. Although we did not log the precise fraction of AI-generated questions that were edited or rejected (our data collection prioritized interviewer reflections over quantitative system metrics), we agree that this limits direct mapping to unmediated deployments. In the revised manuscript, we will add a dedicated discussion of this limitation, including how the five concerns might manifest differently without human filtering, and suggest avenues for future fully automated studies under appropriate safeguards. revision: yes

  2. Referee: [Methods] Methods: No details are provided on participant recruitment, interview protocol, data analysis method, or the process for coding and validating the five concerns. This absence limits evaluation of whether the reported concerns are robustly derived from the data.

    Authors: We appreciate this feedback and agree that the Methods section requires greater detail for transparency and replicability. The original manuscript includes descriptions of these elements, but they may not have been sufficiently prominent or detailed. We will revise the Methods section to explicitly cover: (1) participant recruitment via university networks and professional lists, yielding 17 interviewers with diverse expertise levels; (2) the semi-structured interview protocol, including topics discussed; (3) the data analysis approach using reflexive thematic analysis; and (4) the coding and validation process for the five concerns, involving multiple coders and consensus-building. These revisions will ensure the robustness of our findings is clear. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical qualitative study with no derivations or self-referential predictions

full rationale

The paper reports participant concerns from a Wizard-of-Oz study with 17 interviewers; its claims rest on transcribed feedback rather than any equations, fitted parameters, or predictions that reduce to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support a central result. The setup assumption (human editing approximates deployment) is an acknowledged methodological limit but does not create definitional or statistical circularity in the reported findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an empirical qualitative study and introduces no free parameters, mathematical axioms, or invented entities. It rests on standard domain assumptions about the validity of self-reported concerns in simulated technology-use scenarios.

axioms (1)
  • domain assumption Participant self-reports in a simulated AI-assisted interview setting can surface meaningful and generalizable ethical concerns about real deployment.
    The study treats the five concerns as actionable for design and governance without additional validation data.

pith-pipeline@v0.9.1-grok · 5733 in / 1268 out tokens · 43685 ms · 2026-07-01T00:53:56.067805+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 14 canonical work pages

  1. [1]

    W. C. Adams. 2015. Conducting semi-structured interviews. InHandbook of Practical Program Evaluation, K. E. Newcomer, H. P. Hatry, and J. S. Wholey (Eds.). Jossey-Bass, 492–505. https://doi.org/10.1002/9781119171386.ch19

  2. [2]

    Uwe. Flick. 2006.An Introduction to Qualitative Research. SAGE Publications. https://books.google.com/books?id=t45GmKMZp0MC

  3. [3]

    National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research

    United States. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. 1978.The Belmont Report: Ethical Princi- ples and Guidelines for the Protection of Human Subjects of Research : Appendix. Number v. 2 in DHEW publication ; no. (OS) 78-0013. Department of Health, Edu- cation, and Welfare, National Commission for ...

  4. [4]

    Koji Inoue, Kohei Hara, Divesh Lala, Kenta Yamamoto, Shizuka Nakamura, Katsuya Takanashi, and Tatsuya Kawahara. 2020. Job Interviewer Android with Elaborate Follow-up Question Generation. InProceedings of the 2020 In- ternational Conference on Multimodal Interaction(Virtual Event, Netherlands) (ICMI ’20). Association for Computing Machinery, New York, NY,...

  5. [5]

    Shazia Jamshed. 2014. Qualitative research method-interviewing and observation. Journal of basic and clinical pharmacy5, 4 (2014), 87. https://doi.org/10.4103/0976- 0105.141942

  6. [6]

    Orit Karnieli-Miller, Roni Strier, and Liat Pessach. 2009. Power relations in qualitative research.Qualitative health research19, 2 (2009), 279–289. https: //doi.org/10.1177/1049732308329306

  7. [7]

    Nigel King, Joanna Brooks, and Christine Horrocks. 2019. Interviews in qualitative research. (2019). https://doi.org/10.4135/9781036234881

  8. [8]

    Steinar Kvale. 2006. Dominance through interviews and dialogues.Qualitative inquiry12, 3 (2006), 480–500. https://doi.org/10.1177/1077800406286235

  9. [9]

    Chee Wee Leong, Navaneeth Jawahar, Vinay Basheerabad, Torsten Wörtwein, Andrew Emerson, and Guy Sivan. 2024. Combining Generative and Discrimi- native AI for High-Stakes Interview Practice. InCompanion Proceedings of the 26th International Conference on Multimodal Interaction(San Jose, Costa Rica) (ICMI Companion ’24). Association for Computing Machinery,...

  10. [10]

    Bingjie Liu, Lewen Wei, Mu Wu, and Tianyi Luo. 2023. Speech production under uncertainty: how do job applicants experience and communicate with an AI interviewer?Journal of Computer-Mediated Communication28, 4 (2023), zmad028. https://doi.org/10.1093/jcmc/zmad028

  11. [11]

    Zhe Liu, Jiamin Dai, Cristina Conati, and Joanna McGrenere. 2025. Envisioning AI Support during Semi-Structured Interviews Across the Expertise Spectrum. Proceedings of the ACM on Human-Computer Interaction9, 2, Article CSCW011 (May 2025), 29 pages. https://doi.org/10.1145/3710909

  12. [12]

    Marshall and G.B

    C. Marshall and G.B. Rossman. 2014.Designing Qualitative Research. SAGE Publications. https://books.google.com/books?id=-zncBQAAQBAJ

  13. [13]

    Kevin R. McKee. 2024. Human Participants in AI Research: Ethics and Trans- parency in Practice.IEEE Transactions on Technology and Society5, 3 (2024), 279–288. https://doi.org/10.1109/TTS.2024.3446183

  14. [14]

    Roulston

    K. Roulston. 2010.Reflective Interviewing: A Guide to Theory and Practice. SAGE. https://doi.org/10.4135/9781446288009

  15. [15]

    Alexander Spangher, Michael Lu, Sriya Kalyan, Hyundong Justin Cho, Tenghao Huang, Weiyan Shi, and Jonathan May. 2025. NewsInterview: a Dataset and a Playground to Evaluate LLMs’ Grounding Gap via Informational Interviews. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Wanxiang Che, Joyce ...

  16. [16]

    Kuldeep Yadav, Animesh Seemendra, Abhishek Singhania, Sagar Bora, Pratyaksh Dubey, and Varun Aggarwal. 2023. Interviewing the Interviewer: AI-generated Insights to Help Conduct Candidate-centric Interviews. InProceedings of the 28th International Conference on Intelligent User Interfaces(Sydney, NSW, Australia) (IUI ’23). Association for Computing Machine...

  17. [17]

    He Zhang, Yueyan Liu, Xin Guan, Jie Cai, and John M. Carroll. 2025. Harnessing the Power of AI in Qualitative Research: Role Assignment, Engagement, and User Perceptions of AI-Generated Follow-Up Questions in Semi-Structured Interviews. arXiv:2509.12709 [cs.HC] https://arxiv.org/abs/2509.12709