hub Mixed citations

Datacomp: In search of the next generation of multimodal datasets

Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan B · 2023 · arXiv 2304.14108

Mixed citation behavior. Most common role is background (40%).

15 Pith papers citing it

Background 40% of classified citations

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 baseline 1 other 1

citation-polarity summary

background 2 unclear 2 baseline 1

representative citing papers

Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs

cs.CV · 2026-05-10 · unverdicted · novelty 7.0

PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

MixAtlas uses CLIP-based decomposition and Gaussian process optimization on small proxies to discover data mixtures that improve multimodal benchmark performance by up to 17.6% and transfer to larger models with faster convergence.

FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

cs.CV · 2025-04-14 · unverdicted · novelty 7.0

FLARE is a vision-language model family using text-guided vision encoding, context-aware alignment decoding, dual-semantic mapping loss, and text-driven VQA synthesis to achieve deep cross-modal integration, outperforming larger models with only 630 vision tokens at 3B scale.

Objaverse-XL: A Universe of 10M+ 3D Objects

cs.CV · 2023-07-11 · accept · novelty 7.0

Objaverse-XL supplies over 10 million diverse 3D objects that, when used to render 100 million views, improve zero-shot novel-view synthesis in models such as Zero123.

Prior-Aligned Data Cleaning for Tabular Foundation Models

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.

QuiLL: An LLM-Based Vulnerability Assessment Framework for the Wild

cs.CR · 2025-10-05 · unverdicted · novelty 6.0

QuiLL is a new evaluation pipeline that uses optimized LLM prompts, dynamic in-context learning from an NVD vector store, and a novel accuracy-plus-reasoning metric to benchmark vulnerability detection in real code.

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

cs.CV · 2025-05-29 · unverdicted · novelty 6.0 · 2 refs

Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.

DataComp-LM: In search of the next generation of training sets for language models

cs.LG · 2024-06-17 · unverdicted · novelty 6.0

DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

cs.CV · 2023-11-21 · conditional · novelty 6.0

A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

cs.CV · 2023-07-13 · unverdicted · novelty 6.0

InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

cs.CV · 2024-08-09 · unverdicted · novelty 5.0

mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video benchmarks.

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

cs.CL · 2023-11-07 · unverdicted · novelty 5.0

mPLUG-Owl2 presents a modular MLLM architecture that enables modality collaboration via shared functional modules and modality-adaptive components, achieving SOTA on both text and multi-modal tasks with one generic model.

From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint

cs.CY · 2026-05-06 · unverdicted · novelty 4.0

A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

cs.CV · 2023-08-02 · unverdicted · novelty 4.0

OpenFlamingo provides open-source autoregressive vision-language models that achieve 80-89% of Flamingo performance on seven vision-language datasets.

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

cs.LG · 2026-05-19

citing papers explorer

Showing 15 of 15 citing papers.

Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs cs.CV · 2026-05-10 · unverdicted · none · ref 12
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining cs.LG · 2026-04-03 · unverdicted · none · ref 8
MixAtlas uses CLIP-based decomposition and Gaussian process optimization on small proxies to discover data mixtures that improve multimodal benchmark performance by up to 17.6% and transfer to larger models with faster convergence.
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding cs.CV · 2025-04-14 · unverdicted · none · ref 18
FLARE is a vision-language model family using text-guided vision encoding, context-aware alignment decoding, dual-semantic mapping loss, and text-driven VQA synthesis to achieve deep cross-modal integration, outperforming larger models with only 630 vision tokens at 3B scale.
Objaverse-XL: A Universe of 10M+ 3D Objects cs.CV · 2023-07-11 · accept · none · ref 20
Objaverse-XL supplies over 10 million diverse 3D objects that, when used to render 100 million views, improve zero-shot novel-view synthesis in models such as Zero123.
Prior-Aligned Data Cleaning for Tabular Foundation Models cs.LG · 2026-04-28 · unverdicted · none · ref 11
L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.
QuiLL: An LLM-Based Vulnerability Assessment Framework for the Wild cs.CR · 2025-10-05 · unverdicted · none · ref 50
QuiLL is a new evaluation pipeline that uses optimized LLM prompts, dynamic in-context learning from an NVD vector store, and a novel accuracy-plus-reasoning metric to benchmark vulnerability detection in real code.
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence cs.CV · 2025-05-29 · unverdicted · none · ref 26 · 2 links
Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.
DataComp-LM: In search of the next generation of training sets for language models cs.LG · 2024-06-17 · unverdicted · none · ref 66
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions cs.CV · 2023-11-21 · conditional · none · ref 16
A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation cs.CV · 2023-07-13 · unverdicted · none · ref 71
InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models cs.CV · 2024-08-09 · unverdicted · none · ref 211
mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video benchmarks.
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration cs.CL · 2023-11-07 · unverdicted · none · ref 18
mPLUG-Owl2 presents a modular MLLM architecture that enables modality collaboration via shared functional modules and modality-adaptive components, achieving SOTA on both text and multi-modal tasks with one generic model.
From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint cs.CY · 2026-05-06 · unverdicted · none · ref 38
A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models cs.CV · 2023-08-02 · unverdicted · none · ref 9
OpenFlamingo provides open-source autoregressive vision-language models that achieve 80-89% of Flamingo performance on seven vision-language datasets.
ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison cs.LG · 2026-05-19 · unreviewed · ref 9

Datacomp: In search of the next generation of multimodal datasets

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer