arxiv: 1907.03975 · v1 · pith:VK7F2RIWnew · submitted 2019-07-09 · 💻 cs.CL

Implicit Discourse Relation Identification for Open-domain Dialogues

Pith reviewed 2026-05-25 00:52 UTC · model grok-4.3

classification 💻 cs.CL
keywords implicit discourse relations · open-domain dialogues · discourse parsing · dialogue systems · automatic extraction · feature ablation · corpus creation

The pith

A pipeline automatically extracts implicit discourse relation pairs from open-domain dialogues and uses dialogue features to improve identification models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a new approach to identifying implicit discourse relations specifically for open-domain dialogues rather than formal written text. It does this by designing a pipeline that automatically extracts relation argument pairs and labels from sequences of dialogic turns to build a dedicated corpus. The work then incorporates unique dialogue features into existing models and uses ablation to demonstrate gains. A sympathetic reader would care because current systems struggle with the informal and topic-shifting nature of real conversations, and this could make dialogue agents more coherent.

Core claim

We designed a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. We firstly propose a method to automatically extract the implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse. Moreover, we have taken the first steps to leverage the dialogue features unique to our task to further improve the identification of such relations by performing feature ablation and incorporating dialogue features to enhance the state-of-the-art model.

What carries the argument

The novel discourse relation identification pipeline tuned for open-domain dialogue systems, which includes automatic extraction of argument pairs from dialogic turns.
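The text available here does not say how the argument pairs are extracted, so the following is only an illustrative sketch of one heuristic known from work on written text, not the authors' method: walk over adjacent turns and, when the second turn opens with an explicit connective, strip the connective and use its typical sense as the label. The connective table, the function name `extract_pairs`, and the sample turns are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical connective-to-sense table (illustrative only; not from the paper).
# The senses are the four top-level PDTB classes.
CONNECTIVE_SENSES = {
    "but": "Comparison",
    "because": "Contingency",
    "so": "Contingency",
    "also": "Expansion",
    "then": "Temporal",
}


def extract_pairs(turns: List[str]) -> List[Tuple[str, str, str]]:
    """Return (arg1, arg2, label) triples from adjacent turns.

    A pair is kept only when the second turn starts with a known
    connective; the connective is removed so the relation is left implicit.
    """
    pairs = []
    for arg1, arg2 in zip(turns, turns[1:]):
        words = arg2.strip().split()
        if len(words) < 2:
            continue  # nothing would remain after removing the connective
        first = words[0].lower().strip(",.!?")
        label = CONNECTIVE_SENSES.get(first)
        if label is None:
            continue  # no known connective, so no label can be assigned
        pairs.append((arg1.strip(), " ".join(words[1:]), label))
    return pairs


# Made-up dialogue turns for illustration.
turns = [
    "I love hiking in the mountains.",
    "But the weather has been terrible lately.",
    "Do you have a favorite trail?",
    "So I stayed home last weekend.",
]
pairs = extract_pairs(turns)
for arg1, arg2, label in pairs:
    print(f"{label}: {arg1!r} -> {arg2!r}")
```

A heuristic of this kind is cheap to run but noisy, since a connective such as "so" can carry more than one sense; that noise is exactly what the load-bearing premise below is concerned with.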

If this is right

  • The resulting corpus enables training models on dialogic rather than formal text data.
  • Incorporating dialogue features improves performance on identifying relations in conversations.
  • Feature ablation reveals which dialogue aspects contribute most to the task.
  • This addresses the unsuitability of news and journal corpora for dialogue nuances and topics.
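Feature ablation, as named in the third bullet, can be sketched as a leave-one-group-out loop: score the model with all feature groups, then rescore with each group removed and report the drop. The feature groups, toy data, and nearest-centroid classifier below are hypothetical stand-ins, since the text available here names neither the features nor the model.

```python
from typing import Dict, List, Tuple

# (feature groups, relation label); all values are made up for illustration.
Example = Tuple[Dict[str, List[float]], str]

TRAIN: List[Example] = [
    ({"lexical": [1.0, 0.0], "dialogue": [1.0]}, "Comparison"),
    ({"lexical": [0.5, 0.0], "dialogue": [1.0]}, "Comparison"),
    ({"lexical": [0.0, 1.0], "dialogue": [0.0]}, "Expansion"),
    ({"lexical": [0.0, 0.5], "dialogue": [0.0]}, "Expansion"),
]
TEST: List[Example] = [
    ({"lexical": [1.0, 0.0], "dialogue": [1.0]}, "Comparison"),
    ({"lexical": [0.5, 0.25], "dialogue": [0.0]}, "Expansion"),
]


def flatten(groups: Dict[str, List[float]], keep: List[str]) -> List[float]:
    """Concatenate the kept feature groups into one vector."""
    return [value for name in keep for value in groups[name]]


def centroids(train: List[Example], keep: List[str]) -> Dict[str, List[float]]:
    """Mean feature vector per label, using only the kept groups."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for groups, label in train:
        vec = flatten(groups, keep)
        total = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def accuracy(train: List[Example], test: List[Example], keep: List[str]) -> float:
    """Nearest-centroid accuracy on the test set."""
    cents = centroids(train, keep)
    correct = 0
    for groups, label in test:
        vec = flatten(groups, keep)
        pred = min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, cents[c])))
        correct += pred == label
    return correct / len(test)


def ablate(train: List[Example], test: List[Example], group_names: List[str]):
    """Return full accuracy and the accuracy drop when each group is removed."""
    full = accuracy(train, test, group_names)
    drops = {}
    for name in group_names:
        keep = [g for g in group_names if g != name]
        drops[name] = full - accuracy(train, test, keep)
    return full, drops


full, drops = ablate(TRAIN, TEST, ["lexical", "dialogue"])
print(full, drops)
```

A larger drop for a group suggests the model leans on it more. On real data the drops would need to be compared against run-to-run variance before drawing conclusions.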

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future dialogue systems could use this method to maintain better discourse coherence across turns.
  • The extraction technique might apply to other conversational datasets to create more resources.
  • Improved relation identification could help in tasks like response generation that require understanding connections between utterances.

Load-bearing premise

The automatic extraction method from dialogic turns produces sufficiently accurate implicit discourse relation labels and pairs without substantial noise or mislabeling that would invalidate downstream model improvements.

What would settle it

A manual review of a sample of extracted pairs showing low agreement with the assigned labels would indicate the method does not produce reliable training data.
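One way to quantify the agreement mentioned above is Cohen's kappa between the pipeline's labels and a human annotator's labels on the same sampled pairs. This is a minimal sketch; the label sequences are invented for illustration, and the text available here reports no such sample.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sample: labels from the pipeline vs. labels from a manual review.
pipeline_labels = ["Comparison", "Comparison", "Expansion", "Expansion"]
manual_labels = ["Comparison", "Expansion", "Expansion", "Expansion"]

kappa = cohen_kappa(pipeline_labels, manual_labels)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for chance, which matters when one relation label dominates the sample.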

Original abstract

Discourse relation identification has been an active area of research for many years, and the challenge of identifying implicit relations remains largely an unsolved task, especially in the context of an open-domain dialogue system. Previous work primarily relies on a corpora of formal text which is inherently non-dialogic, i.e., news and journals. This data however is not suitable to handle the nuances of informal dialogue nor is it capable of navigating the plethora of valid topics present in open-domain dialogue. In this paper, we designed a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. We firstly propose a method to automatically extract the implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse. Moreover, we have taken the first steps to leverage the dialogue features unique to our task to further improve the identification of such relations by performing feature ablation and incorporating dialogue features to enhance the state-of-the-art model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to introduce a novel pipeline for implicit discourse relation identification tailored to open-domain dialogues. It describes an automatic method to extract implicit relation argument pairs and labels from dialogic turns, yielding the first such corpus, followed by feature ablation and incorporation of dialogue-specific features to improve a state-of-the-art model.

Significance. If the extraction method yields a sufficiently clean corpus and the reported improvements hold under validation, the work would address a clear gap by moving beyond formal-text corpora (e.g., PDTB-style news) to informal dialogue, potentially benefiting dialogue systems. The emphasis on dialogue-unique features is a constructive direction. At present, however, the absence of any quantitative results, label validation, or error analysis prevents assessment of whether the claimed gains are substantive.

major comments (2)
  1. [Abstract] Abstract: the description of the pipeline and feature addition supplies no quantitative results, validation of extracted labels, or error analysis; the central claim of improvement therefore cannot be assessed from available text.
  2. [Extraction method (as described in Abstract)] Extraction pipeline: the automatic extraction method from dialogic turns produces the corpus on which all subsequent ablation and SOTA-enhancement claims rest, yet no precision/recall figures, inter-annotator agreement, or sample analysis against PDTB-style gold labels are reported, leaving open the possibility that observed gains are artifacts of label noise rather than genuine discourse signal.
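The precision and recall figures requested in the second comment could be computed per relation label once a manually labeled sample exists. The sketch below assumes such a sample; the function name and both label lists are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def per_label_precision_recall(
    predicted: Sequence[str], gold: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} for extracted vs. gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    scores = {}
    for label in sorted(set(predicted) | set(gold)):
        true_pos = sum(p == label and g == label for p, g in zip(predicted, gold))
        n_pred = sum(p == label for p in predicted)
        n_gold = sum(g == label for g in gold)
        # Define the score as 0.0 when the denominator is empty.
        precision = true_pos / n_pred if n_pred else 0.0
        recall = true_pos / n_gold if n_gold else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical labels: automatic extraction vs. a manually labeled sample.
extracted = ["Contingency", "Contingency", "Temporal", "Temporal"]
gold = ["Contingency", "Temporal", "Temporal", "Temporal"]

scores = per_label_precision_recall(extracted, gold)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Reporting the figures per label rather than as one aggregate shows whether noise is concentrated in particular relation types.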

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that quantitative results and validation details are needed to support the claims. We address each point below and will make the corresponding revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the description of the pipeline and feature addition supplies no quantitative results, validation of extracted labels, or error analysis; the central claim of improvement therefore cannot be assessed from available text.

    Authors: We agree that the abstract provides no quantitative results. The full manuscript reports experimental results on the extracted corpus, feature ablations, and improvements to the state-of-the-art model. We will revise the abstract to summarize key metrics including corpus size, baseline performance, and the gains from dialogue features.

    Revision: yes

  2. Referee: [Extraction method (as described in Abstract)] Extraction pipeline: the automatic extraction method from dialogic turns produces the corpus on which all subsequent ablation and SOTA-enhancement claims rest, yet no precision/recall figures, inter-annotator agreement, or sample analysis against PDTB-style gold labels are reported, leaving open the possibility that observed gains are artifacts of label noise rather than genuine discourse signal.

    Authors: We acknowledge that the current manuscript does not report precision/recall, IAA, or error analysis for the extraction pipeline. Because no PDTB-style gold standard exists for open-domain dialogues, direct comparison is not possible. We will add a validation subsection with manual sampling, inter-annotator agreement on extracted pairs, and qualitative error analysis to demonstrate corpus quality and rule out label noise as the source of gains.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external data and ablation testing

Full rationale

The paper constructs a new corpus by automatic extraction from an external dialogue dataset and evaluates improvements via feature ablation on an enhanced SOTA model. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the provided text. The central claims rest on empirical ablation results against external benchmarks rather than reducing to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no information on free parameters, axioms, or invented entities; it does not describe the extraction method or the model in detail.
