Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

Bela Gipp; Michael Karmer; Moritz Schubotz; Norman Meuschke; Vincent Stange

arxiv: 1906.11761 · v1 · pith:4KJNFWO4new · submitted 2019-06-27 · 💻 cs.DL · cs.IR

Pith reviewed 2026-05-25 13:39 UTC · model grok-4.3

keywords: academic plagiarism detection · mathematical content analysis · citation analysis · STEM documents · similarity measures · concealed plagiarism · two-stage detection

The pith

Combining math content and citation similarity with text analysis improves detection of concealed plagiarism in STEM documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper advances a two-stage detection process that first measures similarity in mathematical expressions and citation patterns, then integrates those signals with conventional text comparison to surface concealed cases such as strong paraphrases, translations, and idea reuse. It introduces new similarity measures for mathematical features that take feature order into account and evaluates the full approach against confirmed plagiarism instances before running it over 102,000 STEM documents. A sympathetic reader would care because current text-only tools reliably miss many forms of misconduct that dominate in fields where equations and references carry substantial intellectual content. The work shows that these non-text features supply usable additional signals for identifying suspicious documents.

Core claim

The authors establish that a two-stage process combining assessments of mathematical content similarity, academic citation similarity, and text similarity, using newly developed order-sensitive measures for mathematical features, outperforms text-only approaches in identifying confirmed cases of academic plagiarism and can flag suspicious documents within a collection of 102,000 STEM publications.
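
A minimal sketch of how such a two-stage process could be wired together, in Python. It is not the authors' implementation: the Jaccard overlap, the `candidate_threshold` value, the dictionary layout of a document, and all function names are assumptions made for illustration. Stage one filters on math and citation overlap only; stage two scores the surviving candidates on math, citations, and text.

```python
from typing import Dict, List, Tuple

# Hypothetical document layout: feature type -> list of extracted features.
Doc = Dict[str, List[str]]


def jaccard(a: List[str], b: List[str]) -> float:
    """Order-agnostic set overlap, used here as a cheap stand-in measure."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def two_stage(
    query: Doc,
    corpus: Dict[str, Doc],
    candidate_threshold: float = 0.3,  # assumed value, not from the paper
) -> List[Tuple[str, Dict[str, float]]]:
    # Stage 1: keep documents whose math OR citation overlap is high enough.
    candidates = [
        doc_id
        for doc_id, doc in corpus.items()
        if max(
            jaccard(query["math"], doc["math"]),
            jaccard(query["cites"], doc["cites"]),
        )
        >= candidate_threshold
    ]

    # Stage 2: score each candidate on all three feature types.
    results = []
    for doc_id in candidates:
        doc = corpus[doc_id]
        scores = {k: jaccard(query[k], doc[k]) for k in ("math", "cites", "text")}
        results.append((doc_id, scores))

    # Rank by the strongest single signal so a reviewer sees the clearest hits first.
    results.sort(key=lambda r: max(r[1].values()), reverse=True)
    return results


query = {"math": ["x", "y", "z"], "cites": ["r1", "r2"], "text": ["a", "b"]}
corpus = {
    "d1": {"math": ["x", "y", "z"], "cites": ["r9"], "text": ["c"]},
    "d2": {"math": ["q"], "cites": ["r8"], "text": ["a", "b"]},
}
ranked = two_stage(query, corpus)
```

In this toy layout a document that matches only on text never reaches stage two. A production system would presumably also run a text-based first stage; the sketch leaves it out to keep the two non-text signals visible.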

What carries the argument

The two-stage detection process integrating math-based, citation-based, and text-based similarity measures, with new measures that incorporate the order of mathematical features.
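
To make "order-aware" concrete, the sketch below contrasts a set-based overlap with a sequence-based one over the identifiers extracted from two documents. The longest common subsequence is a stand-in chosen for illustration; this summary does not spell out the paper's actual measures, and the function names are hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_aware_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length normalized by the longer sequence; 1.0 means same features in same order."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def set_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap; ignores order entirely."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Same identifiers, reversed order: the set measure cannot tell them apart.
doc_a = ["E", "m", "c"]
doc_b = ["c", "m", "E"]
unordered = set_similarity(doc_a, doc_b)        # 1.0
ordered = order_aware_similarity(doc_a, doc_b)  # 1/3
```

The gap between the two scores is the point: two documents that happen to share a common set of symbols score high on the set measure, while only a shared sequence scores high on the ordered one.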

If this is right

The new order-aware similarity measures for mathematical features outperform the measures from prior work.
Combined math and citation analysis identifies potentially suspicious cases inside a large collection of 102K STEM documents.
Math-based and citation-based features serve as a supplement to text-based detection for concealed plagiarism.
Direct comparison on confirmed cases shows measurable gains from the multi-feature approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Detection systems could incorporate domain-specific non-text features like equations as a standard layer for technical literature.
Similar ordered-feature analysis might be applied to diagrams, tables, or data sets to address additional reuse patterns.
Large-scale screening of submissions could become feasible if the method proves efficient on production collections.

Load-bearing premise

The confirmed cases of academic plagiarism used for evaluation are representative of concealed forms such as strong paraphrases, translations, and idea reuse.

What would settle it

A new test set of confirmed plagiarism cases in which the combined math-plus-citation approach flags no additional instances beyond those already caught by text analysis alone would falsify the improvement claim.
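
The check described above reduces to a set difference. The sketch below assumes each detector returns a set of case identifiers and that a list of confirmed cases is available; the identifiers and the function name are hypothetical.

```python
from typing import Iterable, Set


def additional_detections(
    text_flagged: Iterable[str],
    combined_flagged: Iterable[str],
    confirmed: Iterable[str],
) -> Set[str]:
    """Confirmed cases caught by the combined approach but missed by text analysis alone."""
    return (set(combined_flagged) & set(confirmed)) - set(text_flagged)


# Hypothetical case identifiers, for illustration only.
confirmed = {"c1", "c2", "c3", "c4"}
text_flagged = {"c1", "c2"}
combined_flagged = {"c1", "c2", "c3", "x9"}  # x9 is a false positive

extra = additional_detections(text_flagged, combined_flagged, confirmed)
# An empty result on a fresh test set would count against the improvement claim.
```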

Figures

Figures reproduced from arXiv: 1906.11761 by Bela Gipp, Michael Karmer, Moritz Schubotz, Norman Meuschke, Vincent Stange.

**Figure 2.** Similarity scores in 1M random document pairs.

Original abstract

Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. Both are promising approaches for improving the detection of concealed academic plagiarism primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The order-aware math measures and two-stage pipeline add a workable signal for STEM plagiarism checks on the tested cases, but the confirmed instances may not show it handles the concealed forms the abstract targets.

This paper extends the authors' prior work with new math similarity measures that account for feature order and a two-stage process that layers math, citation, and text checks. They evaluate the approaches on confirmed plagiarism cases and run the combined system over 102K STEM documents to identify suspicious ones. The order-aware measures reportedly outperform their earlier versions, and the combined run surfaces cases text alone would miss.

The scale of the corpus and use of external confirmed cases are concrete strengths; the work stays grounded in real data rather than synthetic tests.

The main limitation is the evaluation target. The abstract stresses value for concealed plagiarism such as strong paraphrases, translations, and idea reuse, yet the confirmed cases used for comparison could be dominated by verbatim or lightly altered copies. If so, the reported gains do not directly demonstrate effectiveness against the harder concealed regime the paper claims to address. The large-corpus scan is useful for spotting candidates but lacks ground truth for those hard forms.

Readers building or tuning detection tools for STEM literature would get the most from the specific measures and pipeline design. The paper shows clear engineering effort and honest use of external data, so it deserves peer review even if the claims on concealed cases need tighter support.

Referee Report

2 major / 2 minor

Summary. The manuscript extends prior work on plagiarism detection in STEM documents by proposing a two-stage process that integrates similarity measures for mathematical content (including new order-aware features), academic citations, and text. It claims these new math measures outperform prior versions, that the combined math/citation/text approaches are effective when evaluated on confirmed plagiarism cases, and that applying the math+citation combination to a 102K-document STEM collection identifies suspicious cases. The central claim is that math and citation analysis provides a striking supplement to conventional text-based methods specifically for detecting concealed forms of plagiarism such as strong paraphrases, translations, and idea reuse.

Significance. If the evaluation holds, the work would meaningfully advance detection of non-textual and concealed plagiarism in STEM by exploiting domain-specific signals (ordered math expressions and citation patterns) that are harder to disguise than text. The scale of the 102K-document demonstration and the focus on order-aware math measures are positive elements that could inform practical systems if the representativeness of the ground-truth cases is established.

major comments (2)

[Contribution (iii) and evaluation section] Contribution (iii) and the associated evaluation section: the claim that the math-based and citation-based approaches supplement text-based detection for concealed plagiarism rests on performance differences observed on 'confirmed cases of academic plagiarism.' The manuscript does not report the breakdown of these cases by concealment type (verbatim/light rewording vs. strong paraphrases, translations, or idea reuse), which is load-bearing for the central claim; if the confirmed set is dominated by easily detectable verbatim copies, the comparative results do not establish added value in the concealed-plagiarism regime highlighted in the abstract and skeptic note.
[Two-stage process and math similarity measures section] Section describing the two-stage detection process and new order-aware math measures: the outperformance of the new measures over prior work is asserted, but without explicit reporting of statistical significance tests, effect sizes, or controls for post-hoc threshold selection on the confirmed cases, it is unclear whether the gains are robust or depend on dataset-specific tuning.
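
One way to act on the second major comment is a paired test over the confirmed cases. The sketch below uses an exact McNemar test, which is this report's suggestion and not something the manuscript is said to contain. It takes the two discordant counts: `b`, the cases retrieved only by the new measure, and `c`, the cases retrieved only by the prior measure. The function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant case is equally likely to
    favor either measure, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical counts: the new measure wins on 8 cases, the old one on none.
p_value = mcnemar_exact(8, 0)
```

With a small set of confirmed cases the test has little power, so a non-significant result would not by itself show that the two measures perform equally.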

minor comments (2)

[Abstract] The abstract and introduction use 'striking supplement' without quantifying the improvement (e.g., precision/recall deltas); a concrete metric comparison would strengthen the presentation.
[Mathematical content similarity section] Notation for the order-aware math features should be defined more explicitly when first introduced to aid reproducibility.
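
The metric comparison requested in the first minor comment could be reported along the following lines. This is a sketch under assumed inputs: sets of flagged document identifiers per approach and a set of known relevant cases, all hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(flagged: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of a flagged set against known relevant cases."""
    flagged_set, relevant_set = set(flagged), set(relevant)
    true_positives = len(flagged_set & relevant_set)
    precision = true_positives / len(flagged_set) if flagged_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def deltas(baseline: Tuple[float, float], candidate: Tuple[float, float]) -> Tuple[float, float]:
    """Precision and recall change of the candidate approach over the baseline."""
    return candidate[0] - baseline[0], candidate[1] - baseline[1]


# Hypothetical identifiers, for illustration only.
relevant = {"c1", "c2", "c3", "c4"}
text_only = precision_recall({"c1", "c2"}, relevant)
combined = precision_recall({"c1", "c2", "c3", "x9"}, relevant)
precision_delta, recall_delta = deltas(text_only, combined)
```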

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of our evaluation that we will address through revisions to strengthen the presentation of our results.

Referee: [Contribution (iii) and evaluation section] Contribution (iii) and the associated evaluation section: the claim that the math-based and citation-based approaches supplement text-based detection for concealed plagiarism rests on performance differences observed on 'confirmed cases of academic plagiarism.' The manuscript does not report the breakdown of these cases by concealment type (verbatim/light rewording vs. strong paraphrases, translations, or idea reuse), which is load-bearing for the central claim; if the confirmed set is dominated by easily detectable verbatim copies, the comparative results do not establish added value in the concealed-plagiarism regime highlighted in the abstract and skeptic note.

Authors: We agree that explicitly reporting the breakdown of confirmed cases by concealment type would strengthen support for the central claim regarding concealed plagiarism. We will revise the evaluation section to include this breakdown based on the available case metadata. revision: yes
Referee: [Two-stage process and math similarity measures section] Section describing the two-stage detection process and new order-aware math measures: the outperformance of the new measures over prior work is asserted, but without explicit reporting of statistical significance tests, effect sizes, or controls for post-hoc threshold selection on the confirmed cases, it is unclear whether the gains are robust or depend on dataset-specific tuning.

Authors: We acknowledge the need for statistical rigor. We will add significance tests and effect sizes to the revised manuscript. We will also clarify the threshold selection procedure and add any necessary controls to demonstrate it was not performed post-hoc on the evaluation cases. revision: yes

Circularity Check

0 steps flagged

No significant circularity; evaluation relies on external ground truth

The paper's contributions consist of a two-stage detection process, new order-aware similarity measures for math features, and empirical comparisons on confirmed external plagiarism cases plus an independent 102K-document collection. No equations or derivations reduce by construction to fitted parameters or self-referential definitions. Self-citation to prior work on math/citation analysis is present but not load-bearing, as the effectiveness claims are validated against independent confirmed cases rather than derived from the cited prior results. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied empirical paper on detection methods with no mathematical derivations, free parameters, axioms, or invented entities described in the abstract.

Venue: JCDL'19, Jun. 2019, Urbana-Champaign, IL, USA