Overview and Results: CL-SciSumm Shared Task 2019
Pith reviewed 2026-05-24 17:28 UTC · model grok-4.3
The pith
The CL-SciSumm Shared Task establishes the first medium-scale benchmark for scientific document summarization in computational linguistics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSummNet dataset. All papers are from the open access research papers in the CL domain.
What carries the argument
The three-task shared task structure for citation context analysis and summary generation, run on the CL-SciSumm 2018 corpus augmented with SciSummNet papers.
If this is right
- Systems from multiple teams can now be directly compared on the same citation summarization tasks.
- The dataset and evaluation scripts are released publicly for further research.
- ROUGE is applied as an evaluation metric alongside another metric for the summaries.
- The task results provide a baseline for performance on discourse facet classification and abstractive summary generation in the CL domain.
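ROUGE, named above as one of the evaluation metrics, scores a generated summary by its n-gram overlap with a reference summary. A minimal sketch of that idea follows, assuming whitespace tokenization and lowercasing. The function names are illustrative, and this is not the official ROUGE package or the task's released evaluation scripts, which apply their own preprocessing and options.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=2):
    """Simplified ROUGE-N between two strings (assumption: whitespace tokens).

    Returns precision, recall, and F1 over clipped n-gram overlap.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # which is the clipped overlap used by ROUGE-N.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Calling `rouge_n(system_summary, reference_summary, n=2)` gives a bigram-overlap score. Because the score rewards shared surface n-grams only, an abstractive summary that paraphrases the reference can score low despite being faithful, which is one reason the use of ROUGE is open to discussion.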
Where Pith is reading between the lines
- Larger or multi-domain datasets could test whether the current benchmark generalizes beyond computational linguistics.
- Better facet classification might directly improve the quality of generated summaries in follow-on work.
- The public scripts enable reproducible comparisons that could accelerate progress on citation-based summarization.
Load-bearing premise
The 40 annotated sets from the 2018 corpus plus 1000 papers from SciSummNet form a representative and sufficient testbed for evaluating systems on citation-based scientific summarization.
What would settle it
A new collection of CL papers annotated in the same way, on which top systems from the shared task perform much worse than on the provided dataset, would show the testbed is not representative.
Original abstract
The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics~(CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSummNet dataset. All papers are from the open access research papers in the CL domain. This overview describes the participation and the official results of the CL-SciSumm 2019 Shared Task, organized as a part of the 42nd Annual Conference of the Special Interest Group in Information Retrieval (SIGIR), held in Paris, France in July 2019. We compare the participating systems in terms of two evaluation metrics and discuss the use of ROUGE as an evaluation metric. The annotated dataset used for this shared task and the scripts used for evaluation can be accessed and used by the community at: https://github.com/WING-NUS/scisumm-corpus.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides an overview of the CL-SciSumm 2019 Shared Task on scientific document summarization in the computational linguistics domain. It defines three tasks—(1A) identifying relationships between citing and reference documents, (1B) classifying discourse facets, and (2) generating abstractive summaries—using a dataset of 40 annotated sets from the 2018 corpus plus 1000 papers from SciSummNet. The paper reports participation details, official results, system comparisons via two evaluation metrics, a discussion of ROUGE, and links to the publicly available annotated dataset and evaluation scripts on GitHub.
Significance. If the reported results and dataset composition hold, the overview is significant as a community resource that documents a shared task on citation-based scientific summarization and supplies reusable data and scripts. The GitHub release of the annotated corpus and evaluation code is a concrete strength that supports reproducibility and follow-on work in the CL domain.
minor comments (2)
- [Abstract] The notation 'computational linguistics~(CL)' appears to be a LaTeX artifact: the tilde is a non-breaking space. Replace it with an ordinary space, or rephrase, for plain-text readability.
- The discussion of ROUGE would be clearer if it included a brief quantitative comparison (e.g., how ROUGE scores aligned or diverged from the two primary metrics) rather than a general statement.
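One way to make the comparison requested in the second comment quantitative is a rank correlation between the scores each system receives under two metrics. The sketch below computes Spearman's rho in plain Python, with ties given their average rank. It is an illustration only: the function names are hypothetical, and the source does not say that the organizers ran this analysis.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(scores_a, scores_b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(scores_a), average_ranks(scores_b))
```

Passing each system's ROUGE score and its score under a second metric, listed in the same system order, yields a value near 1 when the two metrics rank systems alike and a value near 0 or below when they diverge.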
Simulated Author's Rebuttal
We thank the referee for the positive review and recommendation to accept the manuscript. The provided summary accurately reflects the paper's description of the CL-SciSumm 2019 shared task, its tasks, dataset, results, and public resources.
Circularity Check
No significant circularity
The manuscript is a purely descriptive overview of a shared-task organization, participation statistics, and evaluation results. It contains no equations, derivations, fitted parameters, predictions, or technical claims that could reduce to self-definition or self-citation. The single positioning statement (that the task is the 'first medium-scale' one) is a historical/factual assertion whose support lies outside the paper; the dataset description is presented as the resource actually used for the 2019 event, without any claim that it is exhaustive or that the results generalize beyond the reported runs. No load-bearing steps exist that match any of the enumerated circularity patterns.