Large language models achieve macro F1 scores above 0.85 on binary nominal-versus-danger classification from CTAF radio transcripts and METAR weather data using a new synthetic dataset with a 12-category hazard taxonomy.
A survey on multimodal large language models
19 Pith papers cite this work, alongside 504 external citations. Polarity classification is still indexing.
verdicts: UNVERDICTED 19
roles: background 2
polarities: background 2
representative citing papers
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
A multimodal VLM-TO-IRL framework with GAIL learns professional-specific reward functions from telemetry and tactical commentary to rank esports players by stylistic alignment.
MultiMat shows multimodal large models plus constrained search produce higher-quality procedural material graphs than text-only baselines on a new production dataset.
Introduces a benchmark for MLLM-based chart data extraction from unlabeled images and a human-centered training framework that reaches SOTA numerical accuracy with a 7B model.
LLM embeddings condition a generative transformer to enable faster convergence, better performance, and generalization to unseen LHC processes using a single model.
Polaris retrieves and integrates relevant models from a large library of checkpoints and adapters to enable scalable instruction-guided image generation and editing without additional training.
Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.
Separate modality-specific reasoning before fusion reduces hallucinations and improves accuracy in audio-visual LLMs by enforcing isolated traces then integrating evidence.
EmoMM benchmark reveals Video Contribution Collapse in MLLMs for emotion recognition under modality conflict and missingness, mitigated by CHASE head-level attention steering.
Multimodal LLMs can detect usability issues from screen recordings, explain them via Nielsen's heuristics, and rank improvement recommendations, with engineer feedback indicating practical usefulness for teams lacking experts.
AICA-Bench evaluates 23 VLMs on affective image analysis, identifies weak intensity calibration and shallow descriptions as limitations, and proposes training-free Grounded Affective Tree Prompting to improve performance.
MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.
Multimodal LLMs applied to 16 real-world configurators using 18 synthesized criteria can identify usability issues and generate actionable suggestions, with human review confirming reliability.
Introduces Explicit Logic Channel (ELC) with LLM, VFM and probabilistic inference for validating, selecting and enhancing MLLMs on zero-shot tasks using Consistency Rate and cross-channel integration.
LLM framework converts facial action unit sequences to text, fuses with responses, and regresses to personality scores, reporting lower errors and higher correlations than baselines on AVI-6.
A dual-stream Transformer using frozen GazeLLE backbones and custom token fusion detects mutual gaze and joint attention from dual-camera recordings, outperforming CNN baselines and a multimodal LLM on caregiver-infant data.
The study compares MLLM-generated usability evaluations against expert assessments on prioritization of issues and introduces an interactive visualization tool for reviewing model outputs.
Generative AI systems arise from statistical data processing that produces human-like outputs, creating a mismatch with traditional computer expectations and positioning educational researchers to lead in studying and applying them.
citing papers explorer
- Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis
Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.