A survey on multimodal large language models

Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu + 1 more · 2024 · National Science Review · DOI 10.1093/nsr/nwae403

19 Pith papers cite this work, alongside 504 external citations (Crossref). Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

Large language models achieve macro F1 scores above 0.85 on binary nominal-versus-danger classification from CTAF radio transcripts and METAR weather data using a new synthetic dataset with a 12-category hazard taxonomy.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

Scouting By Reward: VLM-TO-IRL-Driven Player Selection For Esports

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

A multimodal VLM-TO-IRL framework with GAIL learns professional-specific reward functions from telemetry and tactical commentary to rank esports players by stylistic alignment.

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

cs.CV · 2025-09-26 · unverdicted · novelty 7.0

MultiMat shows multimodal large models plus constrained search produce higher-quality procedural material graphs than text-only baselines on a new production dataset.

Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework

cs.HC · 2026-06-29 · unverdicted · novelty 6.0

Introduces a benchmark for MLLM-based chart data extraction from unlabeled images and a human-centered training framework that reaches SOTA numerical accuracy with a 7B model.

One Generator, Any Process: LLM-Conditioning for the LHC

hep-ph · 2026-06-22 · unverdicted · novelty 6.0 · 2 refs

LLM embeddings condition a generative transformer to enable faster convergence, better performance, and generalization to unseen LHC processes using a single model.

Polaris: Scaling Up Instruction-Guided Image Generation Towards Millions of Personalized Style Needs

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

Polaris retrieves and integrates relevant models from a large library of checkpoints and adapters to enable scalable instruction-guided image generation and editing without additional training.

Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis

eess.SP · 2026-05-16 · unverdicted · novelty 6.0

Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.

Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

Performing modality-specific reasoning separately before fusion reduces hallucinations and improves accuracy in audio-visual LLMs by enforcing isolated reasoning traces and then integrating the evidence.

EmoMM: Benchmarking and Steering MLLM for Multimodal Emotion Recognition under Conflict and Missingness

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

EmoMM benchmark reveals Video Contribution Collapse in MLLMs for emotion recognition under modality conflict and missingness, mitigated by CHASE head-level attention steering.

Recommending Usability Improvements with Multimodal Large Language Models

cs.SE · 2026-04-28 · unverdicted · novelty 6.0

Multimodal LLMs can detect usability issues from screen recordings, explain them via Nielsen's heuristics, and rank improvement recommendations, with engineer feedback indicating practical usefulness for teams lacking experts.

AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

AICA-Bench evaluates 23 VLMs on affective image analysis, identifies weak intensity calibration and shallow descriptions as limitations, and proposes training-free Grounded Affective Tree Prompting to improve performance.

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

cs.AI · 2026-03-19 · unverdicted · novelty 6.0

MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.

Usability Analysis of Configurator User Interfaces with Multimodal Large Language Models

cs.SE · 2026-05-28 · unverdicted · novelty 5.0

Multimodal LLMs applied to 16 real-world configurators using 18 synthesized criteria can identify usability issues and generate actionable suggestions, with human review confirming reliability.

Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

cs.AI · 2026-03-12 · unverdicted · novelty 5.0

Introduces Explicit Logic Channel (ELC) with LLM, VFM and probabilistic inference for validating, selecting and enhancing MLLMs on zero-shot tasks using Consistency Rate and cross-channel integration.

LLM-based Multimodal Personality Recognition via Facial Action Unit-Text Semantic Fusion

cs.CV · 2026-06-29 · unverdicted · novelty 4.0

LLM framework converts facial action unit sequences to text, fuses them with responses, and regresses to personality scores, reporting lower errors and higher correlations than baselines on AVI-6.

Automated Detection of Mutual Gaze and Joint Attention in Dual-Camera Settings via Dual-Stream Transformers

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

A dual-stream Transformer using frozen GazeLLE backbones and custom token fusion detects mutual gaze and joint attention from dual-camera recordings, outperforming CNN baselines and a multimodal LLM on caregiver-infant data.

Investigating Multimodal Large Language Models to Support Usability Evaluation

cs.SE · 2025-08-22 · unverdicted · novelty 4.0

The study compares MLLM-generated usability evaluations against expert assessments on prioritization of issues and introduces an interactive visualization tool for reviewing model outputs.

Generative AI Technologies, Techniques & Tensions: A Primer

cs.CY · 2026-04-19 · unverdicted · novelty 2.0

Generative AI systems arise from statistical data processing that produces human-like outputs, creating a mismatch with traditional computer expectations and positioning educational researchers to lead in studying and applying them.

