A Context-Aware Dataset for Stance Detection in Bioethical Controversies on Reddit
Pith reviewed 2026-06-27 06:46 UTC · model grok-4.3
The pith
BioStance supplies 39,600 Reddit post-comment pairs, each with its conversational context preserved, labeled for stance on bioethical controversies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present BioStance, a context-aware dataset of 39,600 annotated Post-Comment pairs from Reddit bioethical discussions. BioStance covers six controversial targets across three dimensions of bioethical controversy: fundamental value conflicts, individual liberty versus collective responsibility, and technological uncertainty. Each instance preserves hierarchical conversational context and is labeled by three independent annotators using a three-class stance scheme: Favor, Against, and None. The annotations achieve a mean Krippendorff's α of 0.82, indicating substantial reliability.
What carries the argument
The argument rests on the BioStance dataset itself, which supplies the hierarchical conversational context for each post-comment pair so that stance can be detected in context.
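As an illustration of what a context-preserving instance might look like, here is a minimal sketch in Python. The class name `StanceInstance`, every field name, and the `[SEP]` separator are assumptions made for this example; the paper's actual release schema is not described in the material reviewed here.

```python
from dataclasses import dataclass
from typing import List, Literal

# The three-class stance scheme reported in the abstract.
Stance = Literal["Favor", "Against", "None"]


@dataclass
class StanceInstance:
    """Hypothetical record layout; field names are illustrative only."""
    target: str                # one of the six controversial targets
    post: str                  # the original Reddit post
    ancestors: List[str]       # comments between the post and the labeled comment
    comment: str               # the comment whose stance is labeled
    annotations: List[Stance]  # one label per independent annotator

    def model_input(self, sep: str = " [SEP] ") -> str:
        """Flatten target, post, ancestor chain, and comment into one string."""
        return sep.join([self.target, self.post, *self.ancestors, self.comment])


example = StanceInstance(
    target="example-target",   # placeholder, not a target from the paper
    post="P",
    ancestors=["A1", "A2"],
    comment="C",
    annotations=["Favor", "Favor", "None"],
)
print(example.model_input())
```

Flattening the thread top-down is only one possible design; a model could equally consume the ancestor chain as separate segments.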
If this is right
- The dataset enables training and evaluation of context-aware stance detection models on bioethical topics.
- It supports argument mining research by providing structured conversational threads.
- It facilitates computational analysis of how bioethical discourse unfolds on social media platforms.
- The three dimensions and six targets allow systematic comparison across different types of controversy.
Where Pith is reading between the lines
- Similar context-preserving collection methods could be applied to stance datasets in other polarized domains such as climate policy or economic regulation.
- The annotation reliability suggests the three-class scheme may transfer to related tasks like detecting neutrality in ethical debates.
- Downstream models trained on BioStance might reveal patterns in how context shifts stance that single-post datasets miss.
Load-bearing premise
The hierarchical conversational context preserved in each instance meaningfully supports context-aware stance detection modeling, and the chosen targets and dimensions adequately represent bioethical controversies on Reddit.
What would settle it
Re-annotating a random sample of the pairs with new annotators and obtaining a Krippendorff's α below 0.67, the value commonly cited as the lower bound for drawing even tentative conclusions, would undermine the reliability of the resource.
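The check described above can be sketched in a few lines of Python. This is a from-scratch implementation of Krippendorff's α for nominal labels using the standard coincidence-matrix formulation; the function name, the input layout, and the toy labels are assumptions for illustration, not the authors' code or data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list of labels per item, one label per annotator.
    A Python None marks a missing rating (distinct from the string "None",
    which is a stance class).
    """
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings carry no pair information
        # Every ordered pair of ratings from different annotators, weighted 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Toy re-annotation sample with two annotators and two labels.
sample = [["Favor", "Favor"], ["Favor", "Against"],
          ["Against", "Against"], ["Against", "Against"]]
alpha = krippendorff_alpha_nominal(sample)
print(f"alpha = {alpha:.3f}; below 0.67 threshold: {alpha < 0.67}")
```

On real data the sample would hold three labels per pair, and the resulting α would be compared against both the 0.67 threshold and the reported mean of 0.82.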
Original abstract
Bioethical debates increasingly unfold on social media, yet stance detection research lacks large-scale, domain-specific resources for modeling such context-dependent discourse. We present BioStance, a context-aware dataset of 39,600 annotated Post-Comment pairs from Reddit bioethical discussions. BioStance covers six controversial targets across three dimensions of bioethical controversy: fundamental value conflicts, individual liberty versus collective responsibility, and technological uncertainty. Each instance preserves hierarchical conversational context and is labeled by three independent annotators using a three-class stance scheme: Favor, Against, and None. The annotations achieve a mean Krippendorff's α of 0.82, indicating substantial reliability. By combining thematic diversity, conversational structure, and high-quality human annotation, BioStance supports research on context-aware stance detection, argument mining, and computational analysis of bioethical discourse.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents BioStance, a dataset of 39,600 annotated Post-Comment pairs drawn from Reddit discussions on bioethical topics. It covers six targets across three dimensions (fundamental value conflicts, individual liberty vs. collective responsibility, technological uncertainty), preserves hierarchical conversational context in each instance, and reports annotations by three independent annotators using a three-class scheme (Favor, Against, None) with mean Krippendorff's α = 0.82.
Significance. If the construction details hold, the dataset would fill a documented gap in domain-specific, context-rich resources for stance detection and argument mining in bioethics on social media. The reported inter-annotator agreement and conversational structure are strengths that could support modeling of context-dependent discourse.
major comments (2)
- [Dataset Construction] Dataset Construction section: The manuscript provides the final size (39,600 pairs) and thematic coverage but does not detail the subreddit selection criteria, search terms, or sampling procedure used to identify bioethical discussions. This information is required to evaluate selection bias and the claim that the targets adequately represent bioethical controversies on Reddit.
- [Annotation] Annotation section: While the mean Krippendorff's α of 0.82 is reported, the paper does not describe the annotation guidelines given to annotators, the label distribution across the three dimensions, or any adjudication process. These details are load-bearing for the claim of 'high-quality human annotation' and the dataset's utility for downstream modeling.
minor comments (2)
- [Abstract] Abstract: The three dimensions are named but not illustrated with example targets; adding one concrete example per dimension would improve immediate readability.
- [Related Work] Related Work: The positioning against existing stance datasets would be strengthened by a brief quantitative comparison (e.g., size, domain, context preservation) rather than only qualitative statements.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of BioStance's potential contribution. We address each major comment below and will revise the manuscript to improve transparency and reproducibility.
Point-by-point responses
- Referee: [Dataset Construction] Dataset Construction section: The manuscript provides the final size (39,600 pairs) and thematic coverage but does not detail the subreddit selection criteria, search terms, or sampling procedure used to identify bioethical discussions. This information is required to evaluate selection bias and the claim that the targets adequately represent bioethical controversies on Reddit.
  Authors: We agree that these procedural details are necessary for assessing selection bias and representativeness. In the revised manuscript we will expand the Dataset Construction section with the specific subreddits chosen, the exact search terms and filters applied per target, and the sampling procedure (including how threads were filtered for sufficient context and how the final 39,600 pairs were obtained).
  Revision: yes
- Referee: [Annotation] Annotation section: While the mean Krippendorff's α of 0.82 is reported, the paper does not describe the annotation guidelines given to annotators, the label distribution across the three dimensions, or any adjudication process. These details are load-bearing for the claim of 'high-quality human annotation' and the dataset's utility for downstream modeling.
  Authors: We concur that these elements are essential. The revised version will include the full annotation guidelines (as an appendix), a breakdown of label distributions (Favor/Against/None) by dimension and target, and a description of the adjudication procedure used when the three annotators disagreed.
  Revision: yes
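To make the requested breakdown concrete, the sketch below shows one way adjudicated labels and a per-target label distribution could be computed. Majority voting is an assumption: the paper does not state its adjudication procedure, and the function names, the `UNRESOLVED` bucket, and the target names are all hypothetical.

```python
from collections import Counter


def adjudicate(labels):
    """Return the strict majority label, or None when the top labels tie.

    Majority voting is an assumed rule, not the authors' documented procedure.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # e.g. a three-way split among three annotators
    return counts[0][0]


def label_distribution(rows):
    """rows: iterable of (target, [labels]); returns {target: Counter of labels}."""
    distribution = {}
    for target, labels in rows:
        gold = adjudicate(labels)
        key = gold if gold is not None else "UNRESOLVED"
        distribution.setdefault(target, Counter())[key] += 1
    return distribution


# Placeholder targets and labels, not data from BioStance.
rows = [
    ("target-A", ["Favor", "Favor", "Against"]),
    ("target-A", ["Favor", "Against", "None"]),
    ("target-B", ["None", "None", "None"]),
]
for target, counts in label_distribution(rows).items():
    print(target, dict(counts))
```

Reporting the unresolved count separately keeps three-way disagreements visible instead of silently assigning them a class.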
Circularity Check
No significant circularity
Full rationale
This is a dataset release paper whose central contribution is the construction and annotation of BioStance (39,600 post-comment pairs, six targets, three dimensions, three-class labels, reported Krippendorff's α = 0.82). No mathematical derivations, fitted parameters, predictions, or uniqueness theorems appear in the abstract or described construction. The reported statistics are direct measurements of the annotation process itself rather than outputs derived from prior fitted values or self-citations. The paper therefore contains no load-bearing steps that reduce to their own inputs by construction.