DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations
Pith reviewed 2026-05-19 15:57 UTC · model grok-4.3
The pith
DiscoExplorer is an open web interface that makes discourse relation datasets from 16 languages available for local exploration and comparison.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce DiscoExplorer, an open-source web interface that runs on local computers and provides access to the DISRPT Shared Task datasets on discourse relation classification across 16 languages. It offers facilities for querying relations and signaling devices such as connectives, and for visualizing the results.
What carries the argument
The DiscoExplorer web interface, which includes a dedicated query language plus search and visualization tools for discourse relations and connectives.
If this is right
- Cross-language comparisons of discourse relations become feasible without specialized data handling skills.
- Studies of how connectives signal specific relations can be run directly across the 16 languages in one interface.
- Local execution removes dependency on remote servers for exploring the full DISRPT collections.
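To make the first two implications concrete, here is a minimal Python sketch of the kind of tally such a comparison involves: relation labels per language, and the connectives that signal one chosen relation. It is not DiscoExplorer's query language and does not read real DISRPT files. The records, language codes, labels, and function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (language, relation label, connective or None).
# Real DISRPT data has its own file format; these tuples are illustrative only.
RECORDS = [
    ("eng", "causal", "because"),
    ("eng", "concession", "although"),
    ("eng", "causal", "since"),
    ("deu", "causal", "weil"),
    ("deu", "concession", "obwohl"),
    ("deu", "causal", None),  # a relation with no explicit connective
]


def relation_counts(records):
    """Count relation labels separately for each language."""
    counts = defaultdict(Counter)
    for lang, label, _conn in records:
        counts[lang][label] += 1
    return counts


def connectives_for(records, label):
    """Count, per language, the connectives that signal the given label."""
    found = defaultdict(Counter)
    for lang, rel, conn in records:
        if rel == label and conn is not None:
            found[lang][conn] += 1
    return found


counts = relation_counts(RECORDS)
causal = connectives_for(RECORDS, "causal")
for lang in sorted(counts):
    print(lang, dict(counts[lang]), dict(causal[lang]))
```

The comparison is only meaningful if "causal" denotes the same category in every language, which is the standardization premise the summary treats as load-bearing.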
Where Pith is reading between the lines
- Similar interfaces could be built for other standardized linguistic resources to lower barriers in multilingual NLP.
- The query language might be adapted to support automatic pattern discovery or integration with annotation tools.
- Educational settings could use the visualizations to teach discourse analysis across different languages.
Load-bearing premise
Standardizing discourse relation inventories across datasets is enough to support meaningful cross-language comparisons once an easy-to-use interface exists.
What would settle it
If researchers using the tool produce no new cross-lingual findings or report that the standardized inventories miss important language-specific differences, the claim that the interface enables such studies would not hold.
Original abstract
The relations connecting propositions in discourse such as cause (A because B) or concession (A although B) are a subject of intense interest in Computational Linguistics and Pragmatics, but challenging to study and compare across languages. Recent progress in standardizing discourse relation inventories across datasets offers the potential to facilitate such studies, but is hindered by the complexity of relevant data and the lack of easily accessible interfaces to analyze it. In this paper we present DiscoExplorer, a new open source web interface, capable of running on local computers, which we use to make datasets from the DISRPT Shared Task on discourse relation classification publicly available, covering 16 different languages. We present the query language, search and visualization facilities for relations and signaling devices such as connectives, as well as some example studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents DiscoExplorer, a new open-source web interface that runs locally, exposing datasets from the DISRPT Shared Task on discourse relation classification for 16 languages. It includes a query language along with search and visualization facilities for discourse relations and signaling devices such as connectives, plus example studies demonstrating its use for multilingual analysis.
Significance. If implemented as described, the contribution has moderate significance for computational linguistics by lowering barriers to accessing and querying standardized multilingual discourse datasets. The local execution model and open-source release are clear strengths that support accessibility and reproducibility without requiring external servers.
major comments (2)
- [Abstract and §4 (example studies)] The abstract and introduction identify complexity of data and lack of interfaces as main hindrances, yet the manuscript provides no quantitative assessment of query performance, error rates, or coverage across the 16 languages (e.g., no table reporting successful query counts or dataset sizes per language). This weakens the central claim that the tool facilitates studies.
- [Description of the query language] The standardization of inventories is presented as enabling cross-language studies, but no concrete mapping or alignment details are given for how relations are unified across datasets; without this, cross-lingual queries risk inconsistent results.
minor comments (3)
- Add at least one figure or screenshot of the web interface to illustrate the search and visualization components.
- Ensure the query language syntax is documented with a complete grammar or BNF in an appendix or dedicated subsection.
- Clarify the exact mechanism for local installation and any dependencies required to run the tool.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment below.
- Referee: [Abstract and §4 (example studies)] The abstract and introduction identify complexity of data and lack of interfaces as main hindrances, yet the manuscript provides no quantitative assessment of query performance, error rates, or coverage across the 16 languages (e.g., no table reporting successful query counts or dataset sizes per language). This weakens the central claim that the tool facilitates studies.
  Authors: We agree that a quantitative overview of dataset coverage would strengthen support for the claim that the tool facilitates studies. The manuscript prioritizes interface description and example analyses, but we will add a table in the revised version reporting per-language statistics including number of documents, relations, and connectives. Query performance and error rates are not reported because DiscoExplorer is an interactive exploration interface rather than an automated classifier; queries rely on standard database operations that scale efficiently to the dataset sizes involved. We will add a clarifying sentence on this point.
  Revision: yes
- Referee: [Description of the query language] The standardization of inventories is presented as enabling cross-language studies, but no concrete mapping or alignment details are given for how relations are unified across datasets; without this, cross-lingual queries risk inconsistent results.
  Authors: The relation inventories are unified according to the DISRPT Shared Task annotation guidelines, which we cite. To address the concern directly, we will expand the query language section in the revision with a short description of the alignment approach and explicit reference to the mapping tables provided in the DISRPT overview papers, ensuring readers can verify consistency of cross-lingual queries.
  Revision: yes
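The alignment concern in the second exchange can be illustrated with a small Python sketch of a label mapping table. This is an assumption-laden illustration, not the DISRPT mapping: the corpus names, source labels, shared labels, and helper functions below are all invented, and the actual mapping tables are the ones the authors point to in the DISRPT overview papers.

```python
# Hypothetical mapping from (corpus, corpus-specific label) to a shared label.
# Every name here is invented for illustration; it is not the DISRPT mapping.
LABEL_MAP = {
    ("corpus_a", "cause-result"): "causal",
    ("corpus_a", "antithesis"): "adversative",
    ("corpus_b", "Contingency.Cause"): "causal",
    ("corpus_b", "Comparison.Concession"): "concession",
}


def unify(corpus, label):
    """Return the shared label for a corpus-specific label, or raise if unmapped."""
    try:
        return LABEL_MAP[(corpus, label)]
    except KeyError:
        raise KeyError(f"no mapping for {label!r} in {corpus!r}") from None


def unmapped(pairs):
    """List the (corpus, label) pairs that the mapping table does not cover."""
    return sorted({pair for pair in pairs if pair not in LABEL_MAP})


observed = [
    ("corpus_a", "cause-result"),
    ("corpus_b", "Contingency.Cause"),
    ("corpus_b", "Expansion.Conjunction"),  # deliberately missing from the table
]
print(unify("corpus_a", "cause-result"))
print(unmapped(observed))
```

Raising on an unmapped label, rather than passing it through unchanged, keeps a cross-lingual query from silently mixing unified and corpus-specific labels. That silent mixing is the inconsistency the referee warns about.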
Circularity Check
No significant circularity
The manuscript is a software tool paper whose central contribution is the description and release of DiscoExplorer, an open-source local web interface exposing DISRPT discourse-relation datasets for 16 languages together with a query language and visualizations. No equations, fitted parameters, predictions, or derivations appear anywhere in the text. The standardization claim is presented only as background motivation for providing the interface, not as a result derived from the tool or from self-citation. The work is therefore self-contained against external benchmarks with no load-bearing step that reduces to its own inputs.