A survey on multimodal large language models

· 2024

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

representative citing papers

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

cs.SE · 2026-04-30 · unverdicted · novelty 8.0

MLLMs exhibit a Mirage effect by bypassing circuit diagrams in favor of header semantics for Verilog generation; VeriGround with identifier anonymization and D-ORPO training reaches 46% Functional Pass@1 while refusing blank images at >92%.

UniPPTBench: A Unified Benchmark for Presentation Generation Across Diverse Input Settings

cs.CV · 2026-05-17 · conditional · novelty 7.0

The paper presents UniPPTBench and UniPPTEval, a unified benchmark and scenario-aware evaluation framework for presentation generation from vague prompts, long documents, multimodal documents, and multi-source inputs.

InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

InfoTok uses mutual information constraints to regularize shared visual tokenization in unified MLLMs, improving both understanding and generation performance without extra training data.

Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms

cs.RO · 2026-05-17 · unverdicted · novelty 4.0

A survey proposing a hierarchical taxonomy for multimodal tactile fusion datasets and methods across perception, generation, and interaction in embodied intelligence.

Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild

cs.CV · 2026-03-05 · unverdicted · novelty 4.0

Zero-shot MLLMs on ShanghaiTech and CHAD exhibit strong conservative bias with high precision but collapsed recall; class-specific prompts raise peak F1 from 0.09 to 0.64 yet recall remains the bottleneck.

citing papers explorer

Showing 5 of 5 citing papers.

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation cs.SE · 2026-04-30 · unverdicted · none · ref 17
MLLMs exhibit a Mirage effect by bypassing circuit diagrams in favor of header semantics for Verilog generation; VeriGround with identifier anonymization and D-ORPO training reaches 46% Functional Pass@1 while refusing blank images at >92%.
UniPPTBench: A Unified Benchmark for Presentation Generation Across Diverse Input Settings cs.CV · 2026-05-17 · conditional · none · ref 2
The paper presents UniPPTBench and UniPPTEval, a unified benchmark and scenario-aware evaluation framework for presentation generation from vague prompts, long documents, multimodal documents, and multi-source inputs.
InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs cs.LG · 2026-02-02 · unverdicted · none · ref 25
InfoTok uses mutual information constraints to regularize shared visual tokenization in unified MLLMs, improving both understanding and generation performance without extra training data.
Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms cs.RO · 2026-05-17 · unverdicted · none · ref 38
A survey proposing a hierarchical taxonomy for multimodal tactile fusion datasets and methods across perception, generation, and interaction in embodied intelligence.
Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild cs.CV · 2026-03-05 · unverdicted · none · ref 38
Zero-shot MLLMs on ShanghaiTech and CHAD exhibit strong conservative bias with high precision but collapsed recall; class-specific prompts raise peak F1 from 0.09 to 0.64 yet recall remains the bottleneck.

A survey on multimodal large language models

fields

years

verdicts

representative citing papers

citing papers explorer