Computational Linguistics 34(4), 555–596 (2008)
6 Pith papers cite this work.
Citation roles: background (2)
Citation polarities: background (2)
Representative citing papers
citing papers explorer
- What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework
  A corpus-centric framework diagnoses scale, structure, overlap, metadata, and terminology properties across nine biomedical NER/EL corpora, showing substantial differences that common statistics fail to capture.
- How Do LLMs See Charts? A Comparative Study on High-Level Visualization Comprehension in Humans and LLMs
  LLMs show consistent structural enumeration of data comparisons and ranges across prompts, unlike humans who synthesize visualizations into trend-centric narratives.
- FLAME: A New Dataset on FLemish Accounts of Momentary Experiences
  Introduces a 25k-narrative Flemish corpus and finds that BERTopic yields more coherent and culturally relevant topics than LDA or K-Means according to human raters, despite LDA scoring higher on automated coherence metrics.
- A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents
  MODEE is a multimodal system that integrates graphs with LLM embeddings to outperform prior open-domain event extraction methods on large datasets.
- Commonsense Knowledge with Negation: A Resource to Enhance Negation Understanding
  Augmenting commonsense knowledge corpora with negation produces over 2M new triples that benefit LLM negation understanding when used for pre-training.
- Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
  This survey paper identifies opportunities for LLMs in low-resource language humanities research along with challenges in data accessibility, model adaptability, and cultural sensitivity.