
arXiv: 1907.09854 · v1 · pith:ZWH5WCWPnew · submitted 2019-07-23 · 💻 cs.CL · cs.DL · cs.IR

Overview and Results: CL-SciSumm Shared Task 2019

Pith reviewed 2026-05-24 17:28 UTC · model grok-4.3

classification 💻 cs.CL · cs.DL · cs.IR
keywords scientific summarization · shared task · computational linguistics · citation context · abstractive summary · discourse facets · ROUGE

The pith

The CL-SciSumm Shared Task establishes the first medium-scale benchmark for scientific document summarization in computational linguistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on the CL-SciSumm 2019 Shared Task, which ran three subtasks on citation-based summarization of scientific papers in computational linguistics. The tasks cover identifying relationships between citing and cited papers, classifying discourse facets in citations, and generating abstractive summaries. It draws on a dataset of 40 annotated sets plus 1000 papers from SciSummNet, all open-access CL papers. The overview details participation, results using two metrics, and the public release of the dataset and evaluation scripts for community use.

Core claim

The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSummNet dataset. All papers are from the open access research papers in the CL domain.
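To make the three tasks concrete, here is a minimal sketch of how one annotated set could be held in memory. It is not the corpus's actual file format: the class names, field names, and the `cited_sentences` helper are all hypothetical, and the facet is kept as a free-form string because the label set is not given here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CitanceAnnotation:
    """One citing sentence linked to the reference paper (hypothetical layout)."""
    citing_paper_id: str
    citance_text: str
    # Task 1A target: indices of the reference-paper sentences being cited.
    reference_sentence_ids: List[int] = field(default_factory=list)
    # Task 1B target: discourse facet label (label set not specified here).
    facet: str = ""


@dataclass
class ReferencePaperSet:
    """A reference paper, its citing sentences, and a summary (Task 2 target)."""
    reference_paper_id: str
    sentences: List[str]
    citances: List[CitanceAnnotation] = field(default_factory=list)
    summary: str = ""

    def cited_sentences(self) -> List[str]:
        """Distinct reference sentences cited by any citance, in document order."""
        ids = sorted({i for c in self.citances for i in c.reference_sentence_ids})
        # Out-of-range indices are skipped rather than raising an error.
        return [self.sentences[i] for i in ids if 0 <= i < len(self.sentences)]
```

Under this layout, a Task 1A system fills `reference_sentence_ids`, a Task 1B system fills `facet`, and a Task 2 system writes `summary`, possibly starting from `cited_sentences()`.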

What carries the argument

The three-task shared task structure for citation context analysis and summary generation, run on the CL-SciSumm 2018 corpus augmented with SciSummNet papers.

If this is right

  • Systems from multiple teams can now be directly compared on the same citation summarization tasks.
  • The dataset and evaluation scripts are released publicly for further research.
  • Systems are compared on two evaluation metrics; for Task 1A, the Figure 1 caption names them as sentence overlap and ROUGE.
  • The task results provide a baseline for performance on discourse facet classification and abstractive summary generation in the CL domain.
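The two kinds of metric mentioned above can be sketched as follows. This is a simplified illustration, not the official evaluation script from the shared task repository: the function names are hypothetical, sentence overlap is assumed to be set-based precision, recall and F1 over sentence IDs, and the ROUGE variant is a bare clipped n-gram recall on whitespace tokens with no stemming or stopword handling.

```python
from collections import Counter
from typing import List, Set, Tuple


def sentence_overlap_prf(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over sets of sentence IDs."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams; empty when the text is shorter than n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram recall of candidate against reference (simplified ROUGE-N)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

The contrast is the point of the sketch: sentence overlap rewards only exact matches of sentence IDs, while an n-gram measure gives partial credit to a system that selects a different sentence with similar wording.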

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Larger or multi-domain datasets could test whether the current benchmark generalizes beyond computational linguistics.
  • Better facet classification might directly improve the quality of generated summaries in follow-on work.
  • The public scripts enable reproducible comparisons that could accelerate progress on citation-based summarization.

Load-bearing premise

The 40 annotated sets from the 2018 corpus plus 1000 papers from SciSummNet form a representative and sufficient testbed for evaluating systems on citation-based scientific summarization.

What would settle it

A new collection of CL papers annotated in the same way, on which top systems from the shared task perform much worse than on the provided dataset, would show the testbed is not representative.

Figures

Figures reproduced from arXiv: 1907.09854 by Dayne Freitag, Dragomir Radev, Michihiro Yasunaga, Min-Yen Kan, Muthu Kumar Chandrasekaran.

Figure 1: Performances on (a) Task 1A in terms of sentence overlap and ROUGE

Original abstract

The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics~(CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSummNet dataset. All papers are from the open access research papers in the CL domain. This overview describes the participation and the official results of the CL-SciSumm 2019 Shared Task, organized as a part of the 42nd Annual Conference of the Special Interest Group in Information Retrieval (SIGIR), held in Paris, France in July 2019. We compare the participating systems in terms of two evaluation metrics and discuss the use of ROUGE as an evaluation metric. The annotated dataset used for this shared task and the scripts used for evaluation can be accessed and used by the community at: https://github.com/WING-NUS/scisumm-corpus.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript provides an overview of the CL-SciSumm 2019 Shared Task on scientific document summarization in the computational linguistics domain. It defines three tasks—(1A) identifying relationships between citing and reference documents, (1B) classifying discourse facets, and (2) generating abstractive summaries—using a dataset of 40 annotated sets from the 2018 corpus plus 1000 papers from SciSummNet. The paper reports participation details, official results, system comparisons via two evaluation metrics, a discussion of ROUGE, and links to the publicly available annotated dataset and evaluation scripts on GitHub.

Significance. If the reported results and dataset composition hold, the overview is significant as a community resource that documents a shared task on citation-based scientific summarization and supplies reusable data and scripts. The GitHub release of the annotated corpus and evaluation code is a concrete strength that supports reproducibility and follow-on work in the CL domain.

minor comments (2)
  1. [Abstract] The notation 'computational linguistics~(CL)' appears to be a LaTeX artifact (the tilde is a non-breaking space); replace it with a plain space for plain-text readability.
  2. The discussion of ROUGE would be clearer if it included a brief quantitative comparison (e.g., how ROUGE scores aligned with or diverged from the sentence-overlap scores) rather than a general statement.
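One way to produce the quantitative comparison asked for in the second comment is a rank correlation between system scores under the two metrics. The sketch below is an illustration only, not something the paper reports: it assumes one score per system under each metric, the function names are hypothetical, and Spearman correlation is a choice made here rather than the authors' method.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1 (largest) downwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # All systems tied under one metric: correlation is undefined.
    return cov / (var_x * var_y) ** 0.5
```

A value near 1 would mean the two metrics rank the systems almost identically, while a low or negative value would indicate that the choice of metric changes which systems look best.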

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept the manuscript. The provided summary accurately reflects the paper's description of the CL-SciSumm 2019 shared task, its tasks, dataset, results, and public resources.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The manuscript is a purely descriptive overview of a shared-task organization, participation statistics, and evaluation results. It contains no equations, derivations, fitted parameters, predictions, or technical claims that could reduce to self-definition or self-citation. The single positioning statement (that the task is the 'first medium-scale' one) is a historical/factual assertion whose support lies outside the paper; the dataset description is presented as the resource actually used for the 2019 event, without any claim that it is exhaustive or that the results generalize beyond the reported runs. No load-bearing steps exist that match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities because the document is a descriptive overview of a shared task with no theoretical claims or derivations.


