3 David Arthur and Sergei Vassilvitskii
8 Pith papers cite this work.
citing papers explorer
- Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
  LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
- PRISM: LLM-Guided Semantic Clustering for High-Precision Topics
  PRISM distills sparse LLM labels into a fine-tuned embedding model for thresholded clustering that separates fine-grained topics better than prior local models or raw frontier embeddings.
- A Comparative Evaluation of Structural Topic Models and BERTopic for Short, Open-Ended Survey Responses
  BERTopic with contextual augmentation outperforms STM on topic coherence and interpretability for short survey responses, but STM better supports inferential covariate analysis.
- Traditional statistical representations outperform generative AI in identifying expert peer reviewers
  TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.
- Evolution of Research Method Usage Across the Academic Careers of Library and Information Science Scholars
  Bibliometric methods rise from 19.61% to 31.81% usage as LIS scholars age, method diversity increases then declines, and scholars increasingly combine conventional and unconventional methods.
- Granite Embedding Multilingual R2 Models
  Granite Embedding Multilingual R2 releases 311M and 97M parameter bi-encoder models that achieve state-of-the-art retrieval performance on multilingual text, code, long-document, and reasoning datasets.
- Intelligent Knowledge Mining Framework: Bridging AI Analysis and Trustworthy Preservation
  IKMF introduces a dual-stream architecture that converts raw data into semantically rich knowledge via AI mining while maintaining integrity, provenance, and reproducibility through parallel archiving.
- Much of Geospatial Web Search Is Beyond Traditional GIS