
arXiv: 1808.07228 · v1 · submitted 2018-08-22 · 💻 cs.CL · cs.AI · cs.IR


Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents

classification: cs.CL · cs.AI · cs.IR
keywords: knowledgeable, documents, snippets, method, entities, extracting, model, multiple

In this study, we focus on extracting knowledgeable snippets and annotating knowledgeable documents from a Web corpus consisting of documents from social media and We-media. Informally, knowledgeable snippets are text that describes concepts, properties of entities, or relations among entities, while knowledgeable documents are those containing enough knowledgeable snippets. These snippets and documents can be helpful in multiple applications, such as knowledge base construction and knowledge-oriented services. Previous studies extracted knowledgeable snippets using pattern-based methods. Here, we propose a semantic-based method for this task. Specifically, a CNN-based model is developed to extract knowledgeable snippets and annotate knowledgeable documents simultaneously. Additionally, a "low-level sharing, high-level splitting" CNN structure is designed to handle documents from different content domains. Compared with building multiple domain-specific CNNs, this joint model not only substantially reduces training time but also noticeably improves prediction accuracy. The superiority of the proposed method is demonstrated on a real dataset from the Wechat public platform.
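The "low-level sharing, high-level splitting" idea can be illustrated with a minimal forward-pass sketch. This is not the authors' architecture, which the abstract does not specify. It assumes one shared bank of 1-D convolution filters with max-over-time pooling as the low level, and separate per-domain logistic heads for snippets and documents as the high level. The class name `SharedSplitCNN`, the domain names, the dimensions, and the random untrained weights are all hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_vector(n, rng):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


class SharedSplitCNN:
    """Hypothetical sketch: shared convolution filters, per-domain output heads."""

    def __init__(self, domains, emb_dim=4, n_filters=3, width=2, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Low level (shared): one filter bank used for every domain.
        self.filters = [rand_vector(width * emb_dim, rng) for _ in range(n_filters)]
        # High level (split): one snippet head and one document head per domain.
        self.snippet_heads = {d: rand_vector(n_filters, rng) for d in domains}
        self.document_heads = {d: rand_vector(n_filters, rng) for d in domains}

    def encode_snippet(self, tokens):
        """Shared 1-D convolution, ReLU, then max-over-time pooling.

        `tokens` is a list of embedding vectors with len(tokens) >= width.
        """
        windows = [
            sum(tokens[i:i + self.width], [])  # concatenate `width` embeddings
            for i in range(len(tokens) - self.width + 1)
        ]
        return [max(max(0.0, dot(f, w)) for w in windows) for f in self.filters]

    def predict(self, document, domain):
        """Return (per-snippet probabilities, document probability)."""
        snippet_vecs = [self.encode_snippet(s) for s in document]
        snippet_probs = [
            sigmoid(dot(self.snippet_heads[domain], v)) for v in snippet_vecs
        ]
        # Assumption: the document vector is the element-wise max over snippets.
        doc_vec = [max(col) for col in zip(*snippet_vecs)]
        doc_prob = sigmoid(dot(self.document_heads[domain], doc_vec))
        return snippet_probs, doc_prob


# Usage: a toy document with 2 snippets of 3 tokens each (random embeddings).
rng = random.Random(1)
toy_doc = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)] for _ in range(2)]
model = SharedSplitCNN(["health", "finance"])
snippet_probs, doc_prob = model.predict(toy_doc, "health")
```

In this sketch only the heads are duplicated per domain, so the convolution filters would be trained once on all domains. That is one plausible reading of why a joint model could train faster than several domain-specific CNNs.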
