Multilingual machine translation with large language models: Empirical results and analysis
8 Pith papers cite this work. Polarity classification is still indexing.
roles: background 3
polarities: background 3
representative citing papers
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
LLM translation quality reaches acceptable human scores for Hausa but remains poor for Fongbe; automatic metrics correlate weakly with human judgments, especially for Hausa, owing to neural embedding collapse; and at least 2,500 sentences are required for stable system rankings.
MOSAIC uses an Integer Linear Program scheduler for expert placement and prompt assignment plus adaptive aggregation to achieve 1.7-2.3x end-to-end speedup on 4-GPU MoA workloads while keeping accuracy within 0.1pp.
The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
citing papers explorer
- CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl
A pipeline extracts 1,416 human-annotated GPX tracks from Common Crawl to produce a multimodal geospatial dataset.
- Video models are zero-shot learners and reasoners
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
- Evaluating Large Language Models for Hausa and Fongbe Machine Translation: Benchmarks, Failures, and Metric Reliability
LLM translation quality reaches acceptable human scores for Hausa but remains poor for Fongbe; automatic metrics correlate weakly with human judgments, especially for Hausa, owing to neural embedding collapse; and at least 2,500 sentences are required for stable system rankings.
- MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency
MOSAIC uses an Integer Linear Program scheduler for expert placement and prompt assignment plus adaptive aggregation to achieve 1.7-2.3x end-to-end speedup on 4-GPU MoA workloads while keeping accuracy within 0.1pp.
- Benchmark Data Contamination of Large Language Models: A Survey
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.