In: International conference on machine learning

Li, J · 2023

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

browse 9 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts

cs.CV · 2026-05-12 · unverdicted · novelty 8.0 · 2 refs

An MLLM-guided architecture with a mixture of frequency experts and relational alignment loss achieves state-of-the-art all-in-one image restoration, outperforming prior methods by up to 1.35 dB on the CDD11 dataset.

Membership Inference Attacks Against Video Large Language Models

cs.CR · 2026-04-29 · unverdicted · novelty 7.0

A temperature-perturbed black-box attack infers video training membership in VideoLLMs with 0.68 AUC by exploiting sharper generation behavior on member samples.

DualTrack: Sensorless 3D Ultrasound needs Local and Global Context

cs.CV · 2025-09-11 · unverdicted · novelty 7.0

DualTrack uses decoupled local spatiotemporal and global anatomical encoders with a fusion module to estimate probe trajectories from 2D ultrasound sequences, achieving sub-5 mm average reconstruction error on public benchmarks.

Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Hi-GaTA is a hierarchical gated temporal aggregation adapter that uses short-to-long temporal pyramids and gated fusion to enable surgical video report generation, backed by a new 214-video benchmark and a surgical ViViT pretrained on 40,000 minutes of video.

SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

SemEnrich enriches radiology reports with positive/neutral findings via self-supervised semantic clustering, yielding average gains of 5-7% on COMET, BERT score, Sentence BLEU, CheXbert-F1 and RadGraph-F1 after fine-tuning, plus further gains when cluster info is added to GRPO rewards.

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

cs.CV · 2026-03-18 · unverdicted · novelty 6.0

Motion-MLLM integrates IMU egomotion data into MLLMs using cascaded filtering and asymmetric fusion to ground visual content in physical trajectories for scale-aware 3D understanding, achieving competitive accuracy at higher speed.

Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs

cs.CL · 2026-04-28 · unverdicted · novelty 5.0

A Medical Entity Tree organizes medical knowledge to engineer higher-quality training data that boosts general MLLMs on medical benchmarks.

Beyond Standard Benchmarks: A Systematic Audit of Vision-Language Model's Robustness to Natural Semantic Variation Across Diverse Tasks

cs.CV · 2026-04-06 · unverdicted · novelty 4.0

Robust CLIP models amplify vulnerabilities to natural adversarial scenarios while standard CLIP shows large performance drops on natural language-induced adversarial examples in zero-shot classification, segmentation, and VQA.

U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025

cs.CV · 2026-05-22 · unverdicted · novelty 3.0

U-CESE integrates three CESE modules into a unified clip-based pipeline with DAKE keyframe extraction and ReCap captioning to support consistent multimodal event retrieval across video sources.

citing papers explorer

Showing 9 of 9 citing papers.

Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts cs.CV · 2026-05-12 · unverdicted · none · ref 24 · 2 links
An MLLM-guided architecture with a mixture of frequency experts and relational alignment loss achieves state-of-the-art all-in-one image restoration, outperforming prior methods by up to 1.35 dB on the CDD11 dataset.
Membership Inference Attacks Against Video Large Language Models cs.CR · 2026-04-29 · unverdicted · none · ref 21
A temperature-perturbed black-box attack infers video training membership in VideoLLMs with 0.68 AUC by exploiting sharper generation behavior on member samples.
DualTrack: Sensorless 3D Ultrasound needs Local and Global Context cs.CV · 2025-09-11 · unverdicted · none · ref 7
DualTrack uses decoupled local spatiotemporal and global anatomical encoders with a fusion module to estimate probe trajectories from 2D ultrasound sequences, achieving sub-5 mm average reconstruction error on public benchmarks.
Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation cs.CV · 2026-05-11 · unverdicted · none · ref 9 · 2 links
Hi-GaTA is a hierarchical gated temporal aggregation adapter that uses short-to-long temporal pyramids and gated fusion to enable surgical video report generation, backed by a new 214-video benchmark and a surgical ViViT pretrained on 40,000 minutes of video.
SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning cs.LG · 2026-04-10 · unverdicted · none · ref 14
SemEnrich enriches radiology reports with positive/neutral findings via self-supervised semantic clustering, yielding average gains of 5-7% on COMET, BERT score, Sentence BLEU, CheXbert-F1 and RadGraph-F1 after fine-tuning, plus further gains when cluster info is added to GRPO rewards.
Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding cs.CV · 2026-03-18 · unverdicted · none · ref 34
Motion-MLLM integrates IMU egomotion data into MLLMs using cascaded filtering and asymmetric fusion to ground visual content in physical trajectories for scale-aware 3D understanding, achieving competitive accuracy at higher speed.
Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs cs.CL · 2026-04-28 · unverdicted · none · ref 15
A Medical Entity Tree organizes medical knowledge to engineer higher-quality training data that boosts general MLLMs on medical benchmarks.
Beyond Standard Benchmarks: A Systematic Audit of Vision-Language Model's Robustness to Natural Semantic Variation Across Diverse Tasks cs.CV · 2026-04-06 · unverdicted · none · ref 16
Robust CLIP models amplify vulnerabilities to natural adversarial scenarios while standard CLIP shows large performance drops on natural language-induced adversarial examples in zero-shot classification, segmentation, and VQA.
U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025 cs.CV · 2026-05-22 · unverdicted · none · ref 16
U-CESE integrates three CESE modules into a unified clip-based pipeline with DAKE keyframe extraction and ReCap captioning to support consistent multimodal event retrieval across video sources.

In: International conference on machine learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer