The llama 3 herd of models.arXiv e-prints, pages arXiv–2407

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

browse 9 citing papers

representative citing papers

VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing

cs.CV · 2026-02-04 · unverdicted · novelty 8.0

VLRS-Bench is the first benchmark dedicated to complex vision-language reasoning in remote sensing, with 2000 QA pairs across 14 tasks in cognition, decision, and prediction dimensions.

M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation

cs.CL · 2025-12-23 · unverdicted · novelty 7.0

M³KG-RAG improves multimodal reasoning in large language models by constructing multi-hop knowledge graphs and selectively pruning retrieved context with GRASP.

ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos

cs.CV · 2025-12-03 · conditional · novelty 7.0

ProcObject-10K is the first benchmark for object-centric procedural reasoning in videos that exposes a large gap where models answer questions plausibly but fail to ground their answers in the correct video segments.

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SWIM aligns cross-attention maps from object nouns to ground-truth masks during training on the new NL-Refer dataset to enable text-only fine-grained video object understanding in MLLMs.

Repurposing 3D Generative Model for Autoregressive Layout Generation

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.

Specificity-aware reinforcement learning for fine-grained open-world classification

cs.CV · 2026-03-03 · unverdicted · novelty 6.0

SpeciaRL applies a dynamic verifier-based reward in reinforcement learning to steer reasoning LMMs toward correct and specific predictions on fine-grained open-world image classification tasks.

Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification

cs.CV · 2026-01-10 · unverdicted · novelty 6.0

A three-stage pipeline uses few-shot VLM action parsing, sliding-window segmentation, and LLM sequence classification with peer context to measure student engagement from classroom videos.

Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback

cs.CV · 2025-12-21 · unverdicted · novelty 6.0

An RL-trained lightweight agent uses MLLM perceptual rewards to perform efficient label-free image restoration, matching SOTA on full-reference metrics and surpassing prior work on no-reference metrics.

MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

cs.CV · 2026-05-21 · conditional · novelty 5.0

MuKV adds multi-grained KV cache compression at patch-frame-segment levels plus semi-hierarchical retrieval to raise accuracy and cut memory in long video question-answering.

citing papers explorer

Showing 9 of 9 citing papers.

VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing cs.CV · 2026-02-04 · unverdicted · none · ref 9
VLRS-Bench is the first benchmark dedicated to complex vision-language reasoning in remote sensing, with 2000 QA pairs across 14 tasks in cognition, decision, and prediction dimensions.
M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation cs.CL · 2025-12-23 · unverdicted · none · ref 10
M³KG-RAG improves multimodal reasoning in large language models by constructing multi-hop knowledge graphs and selectively pruning retrieved context with GRASP.
ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos cs.CV · 2025-12-03 · conditional · none · ref 9
ProcObject-10K is the first benchmark for object-centric procedural reasoning in videos that exposes a large gap where models answer questions plausibly but fail to ground their answers in the correct video segments.
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding cs.CV · 2026-05-18 · unverdicted · none · ref 18
SWIM aligns cross-attention maps from object nouns to ground-truth masks during training on the new NL-Refer dataset to enable text-only fine-grained video object understanding in MLLMs.
Repurposing 3D Generative Model for Autoregressive Layout Generation cs.CV · 2026-04-17 · unverdicted · none · ref 16
LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.
Specificity-aware reinforcement learning for fine-grained open-world classification cs.CV · 2026-03-03 · unverdicted · none · ref 12
SpeciaRL applies a dynamic verifier-based reward in reinforcement learning to steer reasoning LMMs toward correct and specific predictions on fine-grained open-world image classification tasks.
Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification cs.CV · 2026-01-10 · unverdicted · none · ref 13
A three-stage pipeline uses few-shot VLM action parsing, sliding-window segmentation, and LLM sequence classification with peer context to measure student engagement from classroom videos.
Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback cs.CV · 2025-12-21 · unverdicted · none · ref 9
An RL-trained lightweight agent uses MLLM perceptual rewards to perform efficient label-free image restoration, matching SOTA on full-reference metrics and surpassing prior work on no-reference metrics.
MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering cs.CV · 2026-05-21 · conditional · none · ref 10
MuKV adds multi-grained KV cache compression at patch-frame-segment levels plus semi-hierarchical retrieval to raise accuracy and cut memory in long video question-answering.

The llama 3 herd of models.arXiv e-prints, pages arXiv–2407

fields

years

verdicts

representative citing papers

citing papers explorer