A focused crawling system uses fine-tuned Transformer models on URLs to predict language and parallelism, enabling faster discovery of parallel documents compared to brute-force methods.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4555–4567
1 Pith paper cites this work (field: cs.CL; year: 2024; verdict: unverdicted).
Smart Bilingual Focused Crawling of Parallel Documents