arxiv: 1710.00803 · v1 · pith:JIFMNQJPnew · submitted 2017-10-02 · 💻 cs.CL

Compiling and Processing Historical and Contemporary Portuguese Corpora

classification 💻 cs.CL
keywords corpora, portuguese, published, historical, methods, presents, processing, report

This technical report describes the framework used for processing three large Portuguese corpora. Two corpora contain texts from newspapers, one published in Brazil and the other published in Portugal. The third corpus is Colonia, a historical Portuguese collection containing texts written between the 16th and the early 20th century. The report presents pre-processing methods, segmentation, and annotation of the corpora as well as indexing and querying methods. Finally, it presents published research papers using the corpora.

