The pipeline identifies all template nodes and sorts them by length to ensure that larger, more complex templates are handled first

Template processing: Templates are a central feature of Wikimedia markup, used for formatting, metadata, or inserting standardised content
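The longest-first ordering in the excerpt can be illustrated with a minimal sketch. It assumes templates are plain `{{...}}` spans and that "handling" a template means deleting it; the function names `find_templates` and `strip_templates` are hypothetical, and the cited pipeline may rely on a proper wikitext parser rather than a hand-written scanner like this one.

```python
def find_templates(text: str) -> list[str]:
    """Return every {{...}} span in text, including nested ones."""
    spans = []
    stack = []  # start offsets of templates that are currently open
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans


def strip_templates(text: str) -> str:
    """Delete templates, longest first (illustrative assumption: delete, not expand)."""
    # Removing an outer template first also removes everything nested inside it,
    # so the later, shorter replacements for its children become no-ops.
    for template in sorted(find_templates(text), key=len, reverse=True):
        text = text.replace(template, "")
    return text


sample = "Intro {{Infobox|name={{lang|sr|Београд}}|pop=1}} body {{cite}} end."
cleaned = strip_templates(sample)
```

Processing the outer `{{Infobox ...}}` before the nested `{{lang ...}}` means the inner template is never handled on its own, which is one plausible reason for sorting by length. The sketch does not handle triple-brace parameters such as `{{{1}}}`.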

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Wiki Dumps to Training Corpora: South Slavic Case

cs.CL · 2026-04-28 · unverdicted · novelty 4.0 · 2 refs

A two-phase pipeline extracts clean text from Wikimedia dumps and applies n-gram filtering to remove repetitive low-quality articles for South Slavic language corpora.


Wiki Dumps to Training Corpora: South Slavic Case cs.CL · 2026-04-28 · unverdicted · none · ref 22 · 2 links