arxiv: 2605.15304 · v1 · pith:NFXWO2F6new · submitted 2026-05-14 · 💻 cs.CL

DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations

Pith reviewed 2026-05-19 15:57 UTC · model grok-4.3

classification 💻 cs.CL
keywords discourse relations · multilingual · web interface · DISRPT · connectives · visualization · cross-lingual

The pith

DiscoExplorer is an open web interface that makes discourse relation datasets from 16 languages available for local exploration and comparison.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DiscoExplorer to overcome the complexity of discourse data and the absence of accessible tools that have hindered comparisons of relations such as cause or concession across languages. It releases datasets from the DISRPT Shared Task covering 16 languages through a locally runnable open-source web interface. The tool supplies a query language, search functions, and visualizations for relations and signaling devices like connectives, along with example studies. If successful, this removes the main practical barriers to cross-lingual discourse research in computational linguistics.

Core claim

The authors introduce DiscoExplorer, an open source web interface that runs on local computers and provides access to DISRPT Shared Task datasets on discourse relation classification across 16 languages, together with facilities for querying relations, signaling devices such as connectives, and visualizing the results.

What carries the argument

The DiscoExplorer web interface, which includes a dedicated query language plus search and visualization tools for discourse relations and connectives.

If this is right

  • Cross-language comparisons of discourse relations become feasible without specialized data handling skills.
  • Studies of how connectives signal specific relations can be run directly across the 16 languages in one interface.
  • Local execution removes dependency on remote servers for exploring the full DISRPT collections.
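The first two points can be sketched outside the interface. The example below is a minimal illustration, assuming relation records are already available as (language, label, connective) tuples; the record layout, the function names, and every value in `RECORDS` are invented for this sketch and are neither DiscoExplorer's query language nor the DISRPT file format.

```python
from collections import Counter, defaultdict

# Hypothetical relation records: (language, unified label, connective or None).
# All values are made up for illustration; they are not taken from DISRPT.
RECORDS = [
    ("eng", "cause", "because"),
    ("eng", "concession", "although"),
    ("eng", "cause", None),
    ("deu", "cause", "weil"),
    ("deu", "concession", "obwohl"),
    ("deu", "concession", "obwohl"),
]


def label_shares(records):
    """Return {language: {label: share of that language's relations}}."""
    counts = defaultdict(Counter)
    for lang, label, _connective in records:
        counts[lang][label] += 1
    shares = {}
    for lang, counter in counts.items():
        total = sum(counter.values())
        shares[lang] = {label: n / total for label, n in counter.items()}
    return shares


def connectives_for(records, label):
    """Return {language: Counter of connectives that signal `label`}."""
    found = defaultdict(Counter)
    for lang, rec_label, connective in records:
        if rec_label == label and connective is not None:
            found[lang][connective] += 1
    return dict(found)


shares = label_shares(RECORDS)
concession = connectives_for(RECORDS, "concession")
```

Shares rather than raw counts are compared here because corpora differ in size, so raw counts would mostly reflect how much data each language has.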

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar interfaces could be built for other standardized linguistic resources to lower barriers in multilingual NLP.
  • The query language might be adapted to support automatic pattern discovery or integration with annotation tools.
  • Educational settings could use the visualizations to teach discourse analysis across different languages.

Load-bearing premise

Standardizing discourse relation inventories across datasets is enough to support meaningful cross-language comparisons once an easy-to-use interface exists.

What would settle it

If researchers using the tool produce no new cross-lingual findings or report that the standardized inventories miss important language-specific differences, the claim that the interface enables such studies would not hold.

Figures

Figures reproduced from arXiv: 2605.15304 by Amir Zeldes.

Figure 1: DiscoExplorer search interface: Users can input a query and select filters. Underlines show query matches.
Figure 2: Frequency breakdown of DISRPT labels.
Figure 4: Relation labels in GUM vs. GENTLE. The figure shows that CONJUNCTION is more common in GENTLE (in orange), which is primarily due to genres containing many lists, such as medical notes and syllabuses. The ELABORATION label, by contrast, is very similar in prevalence. As with frequencies, numerical variables receive side-by-side boxplots.
Figure 3: Association of explicitness vs. label in PDTB.
Figure 5: Signals per MODE relation compared.
Figure 6: Original eRST graph fragment for a CONCESSION relation, visualized using rstWeb (Gessler et al., 2019) and the corresponding output in DiscoExplorer.
Figure 7: Frequencies of TEMPORAL ‘when’ clauses in left-to-right vs. right-to-left directions. Although the DISRPT relation labels do not include subtypes, we can get breakdowns for relation subtypes if the original labels of the underlying dataset include them, which is the case here. The biggest disparity is the preference of EXPLANATION-EVIDENCE relations to be marked by ‘f…
Figure 8: Contingency tables for signal types and subtypes in …
Original abstract

The relations connecting propositions in discourse such as cause (A because B) or concession (A although B) are a subject of intense interest in Computational Linguistics and Pragmatics, but challenging to study and compare across languages. Recent progress in standardizing discourse relation inventories across datasets offers the potential to facilitate such studies, but is hindered by the complexity of relevant data and the lack of easily accessible interfaces to analyze it. In this paper we present DiscoExplorer, a new open source web interface, capable of running on local computers, which we use to make datasets from the DISRPT Shared Task on discourse relation classification publicly available, covering 16 different languages. We present the query language, search and visualization facilities for relations and signaling devices such as connectives, as well as some example studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents DiscoExplorer, a new open-source web interface that runs locally, exposing datasets from the DISRPT Shared Task on discourse relation classification for 16 languages. It includes a query language along with search and visualization facilities for discourse relations and signaling devices such as connectives, plus example studies demonstrating its use for multilingual analysis.

Significance. If implemented as described, the contribution has moderate significance for computational linguistics by lowering barriers to accessing and querying standardized multilingual discourse datasets. The local execution model and open-source release are clear strengths that support accessibility and reproducibility without requiring external servers.

major comments (2)
  1. [Abstract and §4 (example studies)] The abstract and introduction identify complexity of data and lack of interfaces as main hindrances, yet the manuscript provides no quantitative assessment of query performance, error rates, or coverage across the 16 languages (e.g., no table reporting successful query counts or dataset sizes per language). This weakens the central claim that the tool facilitates studies.
  2. [Description of the query language] The standardization of inventories is presented as enabling cross-language studies, but no concrete mapping or alignment details are given for how relations are unified across datasets; without this, cross-lingual queries risk inconsistent results.
minor comments (3)
  1. Add at least one figure or screenshot of the web interface to illustrate the search and visualization components.
  2. Ensure the query language syntax is documented with a complete grammar or BNF in an appendix or dedicated subsection.
  3. Clarify the exact mechanism for local installation and any dependencies required to run the tool.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract and §4 (example studies)] The abstract and introduction identify complexity of data and lack of interfaces as main hindrances, yet the manuscript provides no quantitative assessment of query performance, error rates, or coverage across the 16 languages (e.g., no table reporting successful query counts or dataset sizes per language). This weakens the central claim that the tool facilitates studies.

    Authors: We agree that a quantitative overview of dataset coverage would strengthen support for the claim that the tool facilitates studies. The manuscript prioritizes interface description and example analyses, but we will add a table in the revised version reporting per-language statistics including number of documents, relations, and connectives. Query performance and error rates are not reported because DiscoExplorer is an interactive exploration interface rather than an automated classifier; queries rely on standard database operations that scale efficiently to the dataset sizes involved. We will add a clarifying sentence on this point. revision: yes

  2. Referee: [Description of the query language] The standardization of inventories is presented as enabling cross-language studies, but no concrete mapping or alignment details are given for how relations are unified across datasets; without this, cross-lingual queries risk inconsistent results.

    Authors: The relation inventories are unified according to the DISRPT Shared Task annotation guidelines, which we cite. To address the concern directly, we will expand the query language section in the revision with a short description of the alignment approach and explicit reference to the mapping tables provided in the DISRPT overview papers, ensuring readers can verify consistency of cross-lingual queries. revision: yes
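The per-language table promised in the first response could be assembled along the following lines. This is only a sketch under stated assumptions: the input rows, the language codes, and the function name `coverage_table` are hypothetical, and the "connectives" column counts relations that carry an explicit connective, which may differ from how the authors would count them.

```python
from collections import defaultdict

# Hypothetical input: one row per relation, as (language, document id, connective or None).
# The rows are invented for illustration and are not drawn from the DISRPT data.
ROWS = [
    ("eng", "doc1", "because"),
    ("eng", "doc1", None),
    ("eng", "doc2", "although"),
    ("tur", "doc9", "ama"),
]


def coverage_table(rows):
    """Return {language: {"documents": n, "relations": n, "connectives": n}}."""
    docs = defaultdict(set)
    relations = defaultdict(int)
    connectives = defaultdict(int)
    for lang, doc_id, connective in rows:
        docs[lang].add(doc_id)          # distinct documents per language
        relations[lang] += 1            # every row is one relation
        if connective is not None:      # count only explicitly signaled relations
            connectives[lang] += 1
    return {
        lang: {
            "documents": len(docs[lang]),
            "relations": relations[lang],
            "connectives": connectives[lang],
        }
        for lang in sorted(docs)
    }


table = coverage_table(ROWS)
for lang, stats in table.items():
    print(lang, stats["documents"], stats["relations"], stats["connectives"])
```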

Circularity Check

0 steps flagged

No significant circularity

The manuscript is a software tool paper whose central contribution is the description and release of DiscoExplorer, an open-source local web interface exposing DISRPT discourse-relation datasets for 16 languages together with a query language and visualizations. No equations, fitted parameters, predictions, or derivations appear anywhere in the text. The standardization claim is presented only as background motivation for providing the interface, not as a result derived from the tool or from self-citation. The work is therefore self-contained against external benchmarks with no load-bearing step that reduces to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a tool-presentation paper with no mathematical model, derivations, or empirical fitting; therefore no free parameters, axioms, or invented entities are involved in the central claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

  1. [1]

    William C. Mann and Sandra A. Thompson. Rhetorical. 1988.

  2. [2]

    Logics of Conversation

    Nicholas Asher and Alex Lascarides. Logics of Conversation.

  3. [3]

    Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation

    Prasad, Rashmi and Webber, Bonnie and Joshi, Aravind. Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation. Computational Linguistics. 2014. doi:10.1162/COLI_a_00204

  4. [4]

    A Dependency Perspective on RST Discourse Parsing and Evaluation

    Morey, Mathieu and Muller, Philippe and Asher, Nicholas. A Dependency Perspective on RST Discourse Parsing and Evaluation. Computational Linguistics. 2018. doi:10.1162/COLI_a_00314

  5. [5]

    When Collaborative Treebank Curation Meets Graph Grammars

    Guibon, Ga. When Collaborative Treebank Curation Meets Graph Grammars. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020

  6. [6]

    Graph Querying for Semantic Annotations

    Amblard, Maxime and Guillaume, Bruno and Pavlova, Siyana and Perrier, Guy. Graph Querying for Semantic Annotations. Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022. 2022

  7. [7]

    DISRPT: A Multilingual, Multi-domain, Cross-framework Benchmark for Discourse Processing

    Braud, Chloé and Zeldes, Amir and Rivière, Laura and Liu, Yang Janet and Muller, Philippe and Sileo, Damien and Aoyama, Tatsuya. DISRPT: A Multilingual, Multi-domain, Cross-framework Benchmark for Discourse Processing. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLIN...

  8. [8]

    The DISRPT 2025 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification

    Braud, Chloé and Zeldes, Amir and Li, Chuyuan and Liu, Yang Janet and Muller, Philippe. The DISRPT 2025 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification. Proceedings of the 4th Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2025). 2025. doi:10.18653/v1/2025.disrpt-1.1

  9. [9]

    Universal Dependencies

    de Marneffe, Marie-Catherine and Manning, Christopher D. and Nivre, Joakim and Zeman, Daniel. Universal Dependencies. Computational Linguistics. 2021. doi:10.1162/coli_a_00402

  10. [10]

    Iria da Cunha and Juan. The. Recent Advances in Natural Language Processing. 2011.

  11. [11]

    Iruskieta, Mikel and Aranzabe, Mar. The. Anais do IV Workshop “A RST e os Estudos do Texto”. 2013.

  12. [12]

    Digital Scholarship in the Humanities

    Thomas Krause and Amir Zeldes. Digital Scholarship in the Humanities.

  13. [13]

    Zeldes, Amir. The. 2017.

  14. [14]

    eRST: A Signaled Graph Theory of Discourse Relations and Organization

    Zeldes, Amir and Aoyama, Tatsuya and Liu, Yang Janet and Peng, Siyao and Das, Debopam and Gessler, Luke. eRST: A Signaled Graph Theory of Discourse Relations and Organization. Computational Linguistics. 2025. doi:10.1162/coli_a_00538

  15. [15]

    A Discourse Signal Annotation System for RST Trees

    Gessler, Luke and Liu, Yang and Zeldes, Amir. A Discourse Signal Annotation System for RST Trees. Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019. 2019. doi:10.18653/v1/W19-2708

  16. [16]

    DeDisCo at the DISRPT 2025 Shared Task: A System for Discourse Relation Classification

    Ju, Zhuoxuan and Wu, Jingni and Purushothama, Abhishek and Zeldes, Amir. DeDisCo at the DISRPT 2025 Shared Task: A System for Discourse Relation Classification. Proceedings of the 4th Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2025). 2025. doi:10.18653/v1/2025.disrpt-1.4

  17. [17]

    GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing

    Peng, Siyao and Liu, Yang Janet and Zeldes, Amir. GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2022. doi:10....

  18. [18]

    Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21–23, 2022, Proceedings

    Mendes, Am\'. Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21–23, 2022, Proceedings. 2022. doi:10.1007/978-3-030-98305-5_8

  19. [19]

    A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging

    Behzad, Shabnam and Zeldes, Amir. A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging. Proceedings of the 12th Web as Corpus Workshop. 2020

  20. [20]

    GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

    Aoyama, Tatsuya and Behzad, Shabnam and Gessler, Luke and Levine, Lauren and Lin, Jessica and Liu, Yang Janet and Peng, Siyao and Zhu, Yilun and Zeldes, Amir. GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation. Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII). 2023. doi:10.18653/v1/2023.law-1.17

  21. [21]

    GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

    Liu, Yang Janet and Aoyama, Tatsuya and Scivetti, Wesley and Zhu, Yilun and Behzad, Shabnam and Levine, Lauren Elizabeth and Lin, Jessica and Tiwari, Devika and Zeldes, Amir. GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains. Proceedings of the 2024 Conference on Empirical Methods in Natural Langua...

  22. [22]

    Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs

    Bauer, John and Kiddon, Chloé and Yeh, Eric and Shan, Alex and Manning, Christopher D. Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs. Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023). 2023

  23. [23]

    Semgrex-Plus: a Tool for Automatic Dependency-Graph Rewriting

    Tamburini, Fabio. Semgrex-Plus: a Tool for Automatic Dependency-Graph Rewriting. Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017). 2017

  24. [24]

    Potsdam Commentary Corpus 2.0: Annotation for Discourse Research

    Stede, Manfred and Neumann, Arne. Potsdam Commentary Corpus 2.0: Annotation for Discourse Research. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC '14). 2014

  25. [25]

    Out-of-Domain Discourse Dependency Parsing via Bootstrapping: An Empirical Analysis on Its Effectiveness and Limitation

    Nishida, Noriki and Matsumoto, Yuji. Out-of-Domain Discourse Dependency Parsing via Bootstrapping: An Empirical Analysis on Its Effectiveness and Limitation. Transactions of the Association for Computational Linguistics. 2022. doi:10.1162/tacl_a_00451

  26. [26]

    SciDTB: Discourse Dependency TreeBank for Scientific Abstracts

    Yang, An and Li, Sujian. SciDTB: Discourse Dependency TreeBank for Scientific Abstracts. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. doi:10.18653/v1/P18-2071

  27. [27]

    Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory

    Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. Current and New Directions in Discourse and Dialogue. 2003.

  28. [28]

    Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus

    Asher, Nicholas and Hunter, Julie and Morey, Mathieu and Farah, Benamara and Afantenos, Stergos. Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC '16). 2016

  29. [29]

    The RST Basque TreeBank

    Mikel Iruskieta and María Jesús Aranzabe and Arantza Diaz de Ilarraza and Itziar Gonzalez-Dios and Mikel Lersundi and Oier Lopez de Lacalle. The RST Basque TreeBank

  30. [30]

    Sara Shahmohammadi and Hadi Veisi and Ali Darzi. 2021. 2106.13833

  31. [31]

    An empirical resource for discovering cognitive principles of discourse organisation: The ANNODIS corpus

    Afantenos, Stergos and Asher, Nicholas and Benamara, Farah and Bras, Myriam and Fabre, Cécile and Ho-dac, Mai and Draoulec, Anne Le and Muller, Philippe and Péry-Woodley, Marie-Paule and Prévot, Laurent and Rebeyrolles, Josette and Tanguy, Ludovic and Vergez-Couret, Marianne and Vieu, Laure. An empirical resource for discovering cognitive prin...

  32. [32]

    Multi-Layer Discourse Annotation of a Dutch Text Corpus

    Redeker, Gisela and Berzlánovich, Ildikó and van der Vliet, Nynke and Bouma, Gosse and Egg, Markus. Multi-Layer Discourse Annotation of a Dutch Text Corpus. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC '12). 2012

  33. [33]

    Paula C. F. Cardoso and Erick G. Maziero and Maria Luc. Anais do III Workshop “A RST e os Estudos do Texto”. 2011.

  34. [34]

    Proceedings of the 23rd International Conference on Computational Linguistics and Intellectual Technologies “Dialogue-2017”

    Dina Pisarevskaya and Margarita Ananyeva and Maria Kobozeva and Alexander Nasedkin and Sofia Nikiforova and Irina Pavlova and Alexey Shelepov. Proceedings of the 23rd International Conference on Computational Linguistics and Intellectual Technologies “Dialogue-2017”.

  35. [35]

    The RST Spanish-Chinese Treebank

    Cao, Shuyuan and da Cunha, Iria and Iruskieta, Mikel. The RST Spanish-Chinese Treebank. Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018). 2018

  36. [36]

    Unifying Discourse Resources with Dependency Framework

    Yi, Cheng and Sujian, Li and Yueyuan, Li. Unifying Discourse Resources with Dependency Framework. Proceedings of the 20th Chinese National Conference on Computational Linguistics. 2021

  37. [37]

    TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style

    Deniz Zeyrek and Am. TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style. 2020. doi:10.1007/s10579-019-09445-9

  38. [38]

    LUNA Corpus Discourse Data Set

    Tonelli, Sara and Riccardi, Giuseppe and Prasad, Rashmi and Joshi, Aravind and Stepanov, Evgeny A. and Chowdhury, Shammur Absar. LUNA Corpus Discourse Data Set

  39. [39]

    TDB 1.1: Extensions on Turkish Discourse Bank

    Zeyrek, Deniz and Kurfalı, Murathan. TDB 1.1: Extensions on Turkish Discourse Bank. Proceedings of the 11th Linguistic Annotation Workshop. 2017. doi:10.18653/v1/W17-0809

  40. [40]

    Chinese Discourse Treebank 0.5

    Yuping Zhou and Jill Lu and Jennifer Zhang and Nianwen Xue. Chinese Discourse Treebank 0.5

  41. [41]

    Developing a Rhetorical Structure Theory Treebank for Czech

    Poláková, Lucie and Mírovský, Jiří and Zikánová, Šárka and Hajičová, Eva. Developing a Rhetorical Structure Theory Treebank for Czech. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024

  42. [42]

    The Internet and Higher Education

    Andrew Potter. The Internet and Higher Education.

  43. [43]

    Rhetorical Strategies in the UN Security Council: Rhetorical Structure Theory and Conflicts

    Zaczynska, Karolina and Stede, Manfred. Rhetorical Strategies in the UN Security Council: Rhetorical Structure Theory and Conflicts. Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2024. doi:10.18653/v1/2024.sigdial-1.2

  44. [44]

    Discourse Structure for the Minecraft Corpus

    Thompson, Kate and Hunter, Julie and Asher, Nicholas. Discourse Structure for the Minecraft Corpus. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024

  45. [45]

    Scholman, Merel CJ and Marchal, Marian and Brown, AriaRay and Demberg, Vera. 2025.

  46. [46]

    Polish Discourse Corpus (PDC): Corpus Design, ISO-Compliant Annotation, Data Highlights, and Parser Development

    Ogrodniczuk, Maciej and Tomaszewska, Aleksandra and Ziembicki, Daniel and Żurowski, Sebastian and Tuora, Ryszard and Zwierzchowska, Aleksandra. Polish Discourse Corpus (PDC): Corpus Design, ISO-Compliant Annotation, Data Highlights, and Parser Development. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language...

  47. [47]

    Prasertsom, Ponrawee and Jaroonpol, Apiwat and Rutherford, Attapol T. Transactions of the Association for Computational Linguistics. 2024. doi:10.1162/tacl_a_00650