
arXiv: cs/0011001 · v1 · submitted 2000-11-02 · cs.CL

Utilizing the World Wide Web as an Encyclopedia: Extracting Term Descriptions from Semi-Structured Texts

keywords: descriptions, method, encyclopedia, extract, term, text, wide, world

In this paper, we propose a method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia. We use linguistic patterns and HTML text structures to extract text fragments containing term descriptions. We also use a language model to discard extraneous descriptions, and a clustering method to summarize resultant descriptions. We show the effectiveness of our method by way of experiments.
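The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual linguistic patterns or HTML structures used, so the `<dt>`/`<dd>` rule, the "X is a ..." pattern, and all names below (`DefinitionListParser`, `extract_by_pattern`, `extract_descriptions`) are assumptions chosen for illustration. The language-model filtering and clustering stages are not covered here.

```python
import re
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Collect (term, description) pairs from <dt>/<dd> elements.

    Assumption: definition lists are one HTML structure that may hold
    term descriptions; the paper's real structural rules are not given
    in the abstract.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None   # "dt", "dd", or None
        self._term = ""    # text of the most recent <dt>
        self._buf = []     # text collected for the open element

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # tolerate omitted closing tags
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def _flush(self):
        text = " ".join("".join(self._buf).split())
        if self._tag == "dt":
            self._term = text
        elif self._tag == "dd" and text:
            self.pairs.append((self._term, text))
        self._tag = None
        self._buf = []


def extract_by_pattern(term, text):
    """Return sentences of the hypothetical form '<term> is/are a|an|the ...'."""
    pattern = re.compile(
        r"\b" + re.escape(term) + r"\s+(?:is|are)\s+(?:a|an|the)\s+[^.]*\.",
        re.IGNORECASE,
    )
    return [m.group(0).strip() for m in pattern.finditer(text)]


def extract_descriptions(term, html):
    """Combine the structural rule and the linguistic pattern for one page."""
    parser = DefinitionListParser()
    parser.feed(html)
    parser.close()
    found = [d for t, d in parser.pairs if t.lower() == term.lower()]

    # Crude tag stripping, adequate only for this illustration.
    plain = " ".join(re.sub(r"<[^>]+>", " ", html).split())
    found.extend(extract_by_pattern(term, plain))
    return found


page = (
    "<p>Data mining is a step in knowledge discovery.</p>"
    "<dl><dt>Data mining</dt>"
    "<dd>Finding patterns in large data sets.</dd></dl>"
)
descriptions = extract_descriptions("data mining", page)
```

Here `descriptions` holds the `<dd>` text first, followed by the pattern-matched sentence. In a pipeline like the one the abstract outlines, such candidates would then be filtered and grouped by later stages.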

