Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

Tianyang Zhong, Zhenyuan Yang, Zhengliang Liu, Ruidong Zhang, Weihang You, Yiheng Liu, Haiyang Sun, Yi Pan, Yiwei Li, Yifan Zhou, Hanqi Jiang, Junhao Chen, Xiang Li, Tianming Liu

classification: cs.CL, cs.AI

keywords: cultural, challenges, languages, linguistic, low-resource, research, historical, language

Abstract

Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations, which hinder their comprehensive study and preservation. Recent advancements in large language models (LLMs) offer transformative opportunities for addressing these challenges, enabling innovative methodologies in linguistic, historical, and cultural research. This study systematically evaluates the applications of LLMs in low-resource language research, encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis. By analyzing technical frameworks, current methodologies, and ethical considerations, this paper identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity. Given the cultural, historical, and linguistic richness inherent in low-resource languages, this work emphasizes interdisciplinary collaboration and the development of customized models as promising avenues for advancing research in this domain. By underscoring the potential of integrating artificial intelligence with the humanities to preserve and study humanity's linguistic and cultural heritage, this study fosters global efforts towards safeguarding intellectual diversity.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family: A Theoretical Framework for Low-Resource Language Models
cs.CL 2026-03 unverdicted novelty 7.0

The paper introduces the Turkic Transfer Coefficient (TTC) as a theoretical measure of transfer potential and a scaling model linking adaptation performance to model capacity, data size, and adaptation module expressi...

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG 2026-04 unverdicted novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs
cs.AI 2026-05 unverdicted novelty 4.0

A research plan to analyze language distribution in LOD knowledge graphs and explore cross-lingual transfer plus analogical reasoning to improve coverage for low-resource languages.