Unsupervised Extraction of Representative Concepts from Scientific Literature

Adit Krishnan; Aravind Sankar; Jiawei Han; Shi Zhi

arxiv: 1710.02271 · v3 · pith:OAVQB6G7new · submitted 2017-10-06 · 💻 cs.IR

keywords: scientific, concept, extraction, propose, academic, algorithm, aspect-typed, concepts

This paper studies the automated categorization and extraction of scientific concepts from the titles of scientific articles, in order to gain a deeper understanding of their key contributions and to facilitate the construction of a generic academic knowledge base. Toward this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm that types key concept mentions and extracts them into aspects of interest (e.g., Techniques, Applications). In the first phase, we propose PhraseType, a probabilistic generative model that exploits textual features and limited POS tags to broadly segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually enhance each other. In the second phase, we propose an approach based on adaptor grammars that extracts fine-grained concept mentions from the aspect-typed phrases in a purely data-driven manner, without any external resources or human effort. We apply our technique to literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results obtained.
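
The two-phase structure described in the abstract can be illustrated with a toy sketch. It does not implement PhraseType or adaptor grammars: both phases are replaced by simple hand-written heuristics, and the cue words, filler words, and function names (`CUES`, `FILLER`, `type_phrases`, `extract_concept`, `extract`) are hypothetical, chosen only to show how aspect typing (phase 1) feeds concept extraction (phase 2).

```python
from typing import List, Tuple

# Hypothetical cue words: (aspect of the phrase before the cue, aspect after it).
# The paper learns aspect assignments with a generative model instead.
CUES = {
    "for": ("Technique", "Application"),
    "using": ("Application", "Technique"),
}

# Hypothetical filler words; the paper uses adaptor grammars, not a fixed list.
FILLER = {"a", "an", "the", "novel", "efficient", "approach", "method"}


def type_phrases(title: str) -> List[Tuple[str, str]]:
    """Phase 1 stand-in: split a title at the first cue word and type both sides."""
    words = title.lower().split()
    for i, word in enumerate(words):
        if word in CUES and 0 < i < len(words) - 1:
            before, after = CUES[word]
            return [(before, " ".join(words[:i])), (after, " ".join(words[i + 1:]))]
    # No cue word found: leave the whole title untyped.
    return [("Unknown", " ".join(words))]


def extract_concept(phrase: str) -> str:
    """Phase 2 stand-in: drop filler words to leave a finer-grained mention."""
    return " ".join(w for w in phrase.split() if w not in FILLER)


def extract(title: str) -> List[Tuple[str, str]]:
    """Run both stand-in phases on one title."""
    return [(aspect, extract_concept(p)) for aspect, p in type_phrases(title)]


if __name__ == "__main__":
    print(extract("A Novel Topic Model for Sentiment Analysis"))
```

The point of the sketch is only the division of labor: the first step assigns coarse aspect labels to phrases, and the second step refines each typed phrase into a concept mention. In the paper both steps are learned from data rather than driven by fixed word lists.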

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Alternatives to the Laplacian for Scalable Spectral Clustering with Group Fairness Constraints
cs.LG 2025-10 unverdicted novelty 4.0

Fair-SMW uses SMW identity and alternative Laplacians to produce group-fair spectral clustering that is twice as fast and twice as balanced as prior methods on SBM and real network data.