Blei, Andrew Y
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
GCTM-OT extracts goal candidates with an LLM, then uses goal-prompted contrastive learning and optimal transport to discover topics that are more coherent, diverse, and aligned with human intent than prior methods on subreddit data.
- Discovery-Oriented Faceting: From Coverage to Blind-Spot Discovery
DOF ranks document categories by distinctiveness instead of size to promote blind-spot discovery, surfacing different content than coverage-based methods across four domains.
- Programming Language Co-Usage Patterns on Stack Overflow: Analysis of the Developer Ecosystem
Three data-mining methods applied to Stack Overflow co-usage data identify tight language clusters, 25 developer profiles, and three macro-communities with Java as the central connector, with all methods converging on the same structure.
- Paper Espresso: From Paper Overload to Research Insight
Paper Espresso deploys LLMs to summarize and analyze trends across 13,300+ arXiv papers over 35 months, releasing metadata that shows non-saturating topic growth and higher engagement for novel topics.