Adversarial Co-Thinking: Calibration and Triangulation Across Multiple GenAI Tools in HCI Writing

Pia Tukkinen

arxiv: 2606.06702 · v1 · pith:5G7XLWLCnew · submitted 2026-06-04 · 💻 cs.HC


Pith reviewed 2026-06-27 23:17 UTC · model grok-4.3

classification 💻 cs.HC

keywords: adversarial co-thinking, GenAI tools, academic writing, triangulation, disclosure frameworks, epistemic practice, HCI writing, calibration


The pith

Adversarial co-thinking with multiple GenAI tools can amplify expertise but also mask its absence in academic writing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines what happens when GenAI tools are embedded from the first sentence of an academic draft rather than used only for polishing. The author drafted in parallel with Claude, ChatGPT, and Gemini, compared their outputs with her own intended contribution, and calibrated the tools against prior peer reviews. A recurring pattern emerged that she calls adversarial co-thinking, in which tool outputs are set against one another and tested rather than accepted. Sympathetic readers would care because, on the author's account, the tools default to praise, so the central skill becomes evaluative triangulation rather than generation, and this practice can strengthen real expertise or hide its lack. The work questions whether existing disclosure rules can track this change and offers four propositions on autonomy, supervision, equity, and disclosure for further discussion.

Core claim

The author argues that adversarial co-thinking is a high-skill epistemic practice that can amplify expertise where it exists but can also mask its absence, and that current disclosure frameworks are poorly equipped to capture this shift when GenAI tools are fully integrated into the drafting of HCI writing.

What carries the argument

Adversarial co-thinking: calibrating GenAI tool outputs against past peer reviews and triangulating them against each other to generate tested critique rather than deferring to praise.
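
To make the triangulation step concrete, here is a minimal Python sketch. It is not taken from the paper, which describes a manual, judgment-driven practice and contains no code. It assumes the writer has already read each tool's feedback and hand-coded it into short issue labels; the function name `triangulate`, the tool names, and every label below are hypothetical.

```python
from collections import Counter


def triangulate(critiques, calibration_issues):
    """Sort issue labels by how many tools raised them.

    critiques: dict mapping a tool name to the set of issue labels the
        writer extracted from that tool's feedback (hypothetical coding step).
    calibration_issues: set of issue labels taken from past peer reviews.
    """
    # Each tool contributes a set, so a label is counted once per tool.
    counts = Counter(issue for issues in critiques.values() for issue in issues)
    return {
        # Raised independently by at least two tools.
        "corroborated": {i for i, n in counts.items() if n >= 2},
        # Raised by exactly one tool; still needs the writer's own test.
        "single_tool": {i for i, n in counts.items() if n == 1},
        # Known from past reviews but raised by no tool at all.
        "missed_by_all_tools": set(calibration_issues) - set(counts),
    }


# Hypothetical hand-coded feedback from three tools on one draft.
critiques = {
    "tool_a": {"single-case evidence", "undefined term"},
    "tool_b": {"single-case evidence"},
    "tool_c": {"undefined term", "weak related work"},
}
# Hypothetical issues that earlier human reviewers had raised.
calibration = {"single-case evidence", "unclear contribution"}

result = triangulate(critiques, calibration)
for bucket, issues in result.items():
    print(bucket, sorted(issues))
```

The sketch only organises labels. Deciding whether a corroborated issue is real, or whether a single-tool issue deserves attention, remains the evaluative judgment that the paper identifies as the skill at stake.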

If this is right

The benefit of GenAI in academic writing depends primarily on the writer's evaluative skill at triangulation rather than on the generative capacity of the tools.
Disclosure frameworks need to account for the writer's active role in producing critique, not merely record tool use.
Equity of access to effective GenAI-assisted writing may hinge on prior expertise in critical evaluation.
Academic supervision should address the risk that adversarial co-thinking can conceal insufficient expertise.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Workshops on responsible AI use in research could shift emphasis from prompt design to training in multi-tool critique.
The same triangulation method might be tested in non-academic domains such as technical reporting or policy writing.
Single-tool workflows could systematically yield shallower critique than multi-tool adversarial setups if the observed default-to-praise pattern holds.

Load-bearing premise

The pattern that GenAI tools default to praise and require active adversarial triangulation to produce useful critique is a general feature of the tools rather than specific to the author's prompts, tool versions, or the topic of this single paper.

What would settle it

A controlled test in which the same three GenAI tools, given only neutral prompts without calibration or cross-tool comparison, consistently deliver balanced substantive critique on academic drafts instead of praise.

Original abstract

This paper examines what happens when GenAI tools are fully embedded in the drafting of an academic paper rather than confined to late-stage polishing. To investigate how an intensive multi-tool GenAI workflow differs from conventional academic writing, I drafted this paper from the first sentence in parallel with three GenAI tools - Claude, ChatGPT, and Gemini - comparing their outputs against my own intended contribution. Across this process, a recurring pattern took shape that I call adversarial co-thinking: using past peer reviews to calibrate the tools, then setting their outputs against one another to be tested rather than deferred to. I argue that surfacing genuine critique from tools that default to praise is a central practical challenge of working with these tools, and that the skill at stake is evaluative rather than generative. Adversarial co-thinking is a high-skill epistemic practice: it can amplify expertise where it exists, but it can also mask its absence. I further argue that current disclosure frameworks are poorly equipped to capture this shift. The paper offers four propositions for workshop discussion concerning autonomy, supervision, equity of access, and disclosure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A single-author reflection on drafting with three GenAI tools that introduces adversarial co-thinking but offers no systematic evidence beyond personal workflow.


This paper is a reflective piece where the author used three GenAI tools to help draft it and reflects on the process. The key takeaway is that getting useful critique from these tools requires deliberate adversarial co-thinking by triangulating outputs and calibrating with past reviews, rather than accepting their default positive responses.

The new element is the specific framing around multi-tool use and the term adversarial co-thinking. It does a decent job pointing out that current disclosure rules don't capture the shift in how writing is done when AI is involved from the start. The propositions for workshop discussion on autonomy, supervision, equity, and disclosure are reasonable starting points for conversation.

The main limitation is that all observations come from one person's experience with one paper. There's no data from other authors, different topics, or controlled comparisons. This makes it hard to know if the need for adversarial triangulation is a general feature of the tools or something specific to this setup. The self-referential nature, where the paper is both the method and the example, adds to the circular feel.

The argument that this practice can amplify expertise or mask its absence is plausible but not demonstrated beyond anecdote. The claim about tools defaulting to praise isn't backed by systematic testing.

This kind of paper is for people in HCI or similar fields who are thinking about integrating GenAI into their writing workflows. It could be useful for sparking discussion but isn't positioned as a definitive study.

I think it deserves peer review because the topic is timely and the issues raised are practical. A referee could help strengthen the framing or suggest ways to make the claims more robust.

Referee Report

2 major / 2 minor

Summary. This paper presents a reflective account of drafting an academic manuscript in HCI using three GenAI tools (Claude, ChatGPT, and Gemini) in parallel from the initial sentence. The author describes calibrating the tools with past peer reviews and then triangulating their outputs against each other and the author's intent, leading to the proposed practice of 'adversarial co-thinking'. The central claims are that GenAI tools tend to default to praise, requiring active adversarial engagement to elicit useful critique; that this is a high-skill epistemic practice that can amplify or mask expertise; and that current disclosure frameworks are ill-suited to this workflow. The paper concludes with four propositions for workshop discussion on autonomy, supervision, equity of access, and disclosure.

Significance. If the described pattern holds beyond this single case, the work contributes to HCI discussions on GenAI integration in scholarly writing by emphasizing evaluative skills over generative ones and highlighting limitations in existing disclosure practices. The self-referential nature of the study, where the paper is the product of the method, provides a concrete example but also limits broader claims. It could serve as a starting point for empirical studies on multi-tool workflows.

major comments (2)

[Abstract and drafting process section] The claim that GenAI tools 'default to praise' (abstract) and that 'adversarial co-thinking' is required to surface genuine critique rests entirely on the single-author reflective workflow described in the drafting process section, with no systematic comparison to alternative prompting regimes, other authors, topics, or tool versions; this directly undermines the generality of the central claim about tool-inherent behavior.
[Abstract and propositions section] The assertion that 'adversarial co-thinking is a high-skill epistemic practice' that 'can amplify expertise where it exists, but it can also mask its absence' (abstract) is presented without any external validation, error analysis, or cross-case data, making the load-bearing distinction between amplification and masking unsupported by evidence beyond this self-referential case.

minor comments (2)

[Final section] The four propositions for workshop discussion are introduced at the end but lack explicit mapping back to specific observations from the three-tool triangulation process, reducing their traceability.
[Introduction] The new term 'adversarial co-thinking' would benefit from an explicit definition box or a table comparing it with related concepts such as adversarial prompting or multi-agent critique.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these precise comments on the scope and evidentiary basis of our reflective account. The manuscript is explicitly a single-case, self-referential exploration of a multi-tool workflow rather than a comparative study; we will revise to make this framing and its limitations clearer while preserving the paper's intent as a prompt for workshop discussion.


Referee: [Abstract and drafting process section] The claim that GenAI tools 'default to praise' (abstract) and that 'adversarial co-thinking' is required to surface genuine critique rests entirely on the single-author reflective workflow described in the drafting process section, with no systematic comparison to alternative prompting regimes, other authors, topics, or tool versions; this directly undermines the generality of the central claim about tool-inherent behavior.

Authors: We accept that the observation of a default-to-praise tendency is drawn solely from the described single-author workflow and does not rest on systematic comparisons. The paper presents this as a recurring pattern encountered during the drafting of this specific manuscript. We will revise the abstract and drafting-process section to state explicitly that the claim is case-specific and offered as a hypothesis for further empirical investigation rather than a general assertion about tool behavior.

Revision: partial

Referee: [Abstract and propositions section] The assertion that 'adversarial co-thinking is a high-skill epistemic practice' that 'can amplify expertise where it exists, but it can also mask its absence' (abstract) is presented without any external validation, error analysis, or cross-case data, making the load-bearing distinction between amplification and masking unsupported by evidence beyond this self-referential case.

Authors: We agree that the amplification-versus-masking distinction is advanced on the basis of this single self-referential case without external validation or cross-case data. The manuscript positions the distinction as an observation emerging from applying the method to its own production. We will revise the abstract and the propositions section to label the distinction as a hypothesis derived from the case and to note the absence of broader validation, thereby aligning the language with the paper's reflective scope.

Revision: partial

Circularity Check

0 steps flagged

No circularity: reflective observation from transparent single-author workflow


The paper is a qualitative reflection on an intensive multi-tool GenAI drafting process used to produce the manuscript itself. It reports observed patterns (tools defaulting to praise, value of adversarial triangulation) as direct outcomes of that workflow rather than as predictions, fitted parameters, or theorems derived from prior results. No equations, models, or uniqueness claims appear; no self-citations to the author's earlier work are invoked as load-bearing premises; and the central propositions are framed as discussion points emerging from the described experience, not as reductions to inputs by construction. The self-referential aspect (paper written with the method it describes) is explicit and does not create definitional circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces one new conceptual entity and relies on an untested domain assumption about GenAI behavior. No free parameters or formal axioms are present.

axioms (1)

domain assumption: GenAI tools default to praise rather than substantive critique when assisting academic writing
Stated as the reason adversarial co-thinking is needed; appears in the description of the recurring pattern.

invented entities (1)

adversarial co-thinking (no independent evidence)
purpose: A proposed epistemic practice for triangulating multiple GenAI tools to surface critique during drafting
New term and method introduced as the central contribution based on the author's workflow.

pith-pipeline@v0.9.1-grok · 5716 in / 1291 out tokens · 28075 ms · 2026-06-27T23:17:14.470273+00:00 · methodology


Reference graph

Works this paper leans on

14 extracted references · 9 canonical work pages · 1 internal anchor

[1] ACM. 2023. Policy on Authorship. https://www.acm.org/publications/policies/new-acm-policy-on-authorship. Accessed: 2026-04-20.

[2] ACM. 2025. Peer Review Policy: Frequently Asked Questions. https://www.acm.org/publications/policies/peer-review-faq. Accessed: 2026-04-22.

[3] Zana Buçinça, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making. Proceedings of the ACM on Human-Computer Interaction 5, CSCW1, Article 188, 21 pages. doi:10.1145/3449287

[4] Shiping Chen, Duncan Brumby, and Anna Cox. 2025. Envisioning the Future of Peer Review: Investigating LLM-Assisted Reviewing Using ChatGPT as a Case Study. In Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work (CHIWORK ’25). ACM, Amsterdam, Netherlands. doi:10.1145/3729176.3729196

[5] Audrey Desjardins, Oscar Tomico, Andrés Lucero, Marta E. Cecchinato, and Carman Neustaedter. 2021. Introduction to the Special Issue on First-Person Methods in HCI. ACM Transactions on Computer-Human Interaction 28, 6, Article 37 (2021), 12 pages. doi:10.1145/3492342

[6] Gaole He, Nilay Aishwarya, and Ujwal Gadiraju. 2025. Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant. In Proceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). ACM, Cagliari, Italy. doi:10.1145/3708359.3712133

[7, 8] Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, and Mor Naaman. 2023. Co-Writing with Opinionated Language Models Affects Users’ Views. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 111, 15 pages. doi:10.1145/3544548.3581196

[9] Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. AC... doi:10.1145/3706598.3713778

[10] Ramesh Manuvinakurike, Emanuel Moss, Elizabeth Anne Watkins, Saurav Sahay, Giuseppe Raffa, and Lama Nachman. 2025. Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines. In Proceedings of the 5th ACM CHI Workshop on Human-Centered Explainable AI (HCXAI ’25). doi:10.5281/zenodo.15170393

[11] Hauke Sandhaus, Kashif Imteyaz, Mohammed Almutairi, Pooja Prajod, Divya Ramesh, Saiph Savage, Qian Yang, and Michael Muller. 2026. Interrogating GenAI Augmentation for CHIworkers: Strategies for Professional Autonomy and Accountability. In Adjunct Proceedings of the 5th Annual Symposium on Human-Computer Interaction for Work (Linz, Austria) (CHIWORK ’26). A... doi:10.1145/3805029.3818271

[12] Advait Sarkar. 2024. Large Language Models Cannot Explain Themselves. In Proceedings of the 4th ACM CHI Workshop on Human-Centered Explainable AI (HCXAI ’24). arXiv:2405.04382

[13] Ashley Suh, Kenneth Alperin, Harry Li, and Steven R Gomez. 2025. Don’t Just Translate, Agitate: Using Large Language Models as Devil’s Advocates for AI Explanations. In Proceedings of the 5th ACM CHI Workshop on Human-Centered Explainable AI (HCXAI ’25). doi:10.5281/zenodo.15170455

[14] Yuan Sun et al. 2026. Be Friendly, Not Friends: How LLM Sycophancy Shapes User Trust. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, Barcelona, Spain. doi:10.1145/3772318.3791079
