Nonsymbolic Text Representation
classification
💻 cs.CL
keywords
textmodelrepresentationnonsymbolictrainingappliesapplyingattempts
read the original abstract
We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.