arXiv: 1410.7182 · v1 · pith:3VU7Q7GN · submitted 2014-10-27 · cs.CL

Analysis of Named Entity Recognition and Linking for Tweets

classification: cs.CL
keywords: entity, tweets, disambiguation, named, recognition, analysis, information, language

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
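
The staged pipeline mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's system or any of the evaluated tools: it covers only tokenisation, recognition and disambiguation (language identification and part-of-speech tagging are omitted), and the token pattern, the `GAZETTEER` table and all function names are hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Callable, Dict, List

# Tweet-aware token pattern (an assumption, not the paper's tokeniser):
# URLs, then @mentions and #hashtags, then words, then any other single symbol.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")

# Hypothetical toy gazetteer: lower-cased surface form -> DBpedia resource URI.
GAZETTEER: Dict[str, str] = {
    "paris": "http://dbpedia.org/resource/Paris",
    "london": "http://dbpedia.org/resource/London",
}


def _key(token: str) -> str:
    """Normalise a token for lookup: drop a leading '#' and lower-case it."""
    return token.lstrip("#").lower()


def tokenise(doc: dict) -> dict:
    """Stage 1: split the raw tweet text into tokens."""
    doc["tokens"] = TOKEN_RE.findall(doc["text"])
    return doc


def recognise(doc: dict) -> dict:
    """Stage 2 (toy NER): mark tokens whose normalised form is in the gazetteer."""
    doc["mentions"] = [
        (i, tok) for i, tok in enumerate(doc["tokens"]) if _key(tok) in GAZETTEER
    ]
    return doc


def disambiguate(doc: dict) -> dict:
    """Stage 3 (toy linking): map each recognised mention to a DBpedia URI."""
    doc["links"] = [(tok, GAZETTEER[_key(tok)]) for _, tok in doc["mentions"]]
    return doc


# Consecutive stages: each one consumes the output of the previous one.
PIPELINE: List[Callable[[dict], dict]] = [tokenise, recognise, disambiguate]


def run_pipeline(text: str) -> dict:
    doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    print(run_pipeline("omg #Paris is gr8 http://t.co/abc @bob")["links"])
```

Because each stage reads what the previous one wrote, a mistake made early (for example, a hashtag split into two tokens) carries through to recognition and linking, which is one reason noisy tweet text is hard for pipelines of this kind.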
