Title resolution pending

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks

cs.CV · 2026-04-14 · unverdicted · novelty 8.0

MSLA is the first physically deployable attack that uses adversarial lighting to break semantic alignment in VLMs such as CLIP, LLaVA, and BLIP, causing classification failures and hallucinations in real scenes.

Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery

cs.MM · 2026-04-16 · unverdicted · novelty 7.0

Geo2Sound generates geographically realistic soundscapes from satellite imagery via geospatial attribute modeling, semantic hypothesis expansion, and geo-acoustic alignment, achieving SOTA FAD of 1.765 on a new 20k-pair benchmark.

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

AIM applies modality-specific masks to balance stability and plasticity in asymmetric VLMs, achieving SOTA average performance and reduced forgetting on continual VQA v2 and GQA while preserving generalization to novel compositions.

MAP4TS: A Multi-Aspect Prompting Framework for Time-Series Forecasting with Large Language Models

cs.CL · 2025-10-27 · unverdicted · novelty 6.0

MAP4TS combines global, local, statistical, and temporal prompts derived from classical time-series analysis with raw embeddings via cross-modality alignment to improve LLM forecasting performance across eight datasets.

Fall into a Pit, Gain in a Wit: Cognitive-Guided Harmful Meme Detection via Misjudgment Risk Pattern Retrieval

cs.LG · 2025-10-10 · unverdicted · novelty 5.0

PatMD improves harmful meme detection by retrieving misjudgment risk patterns to guide MLLMs, reporting 8.30% average F1 and 7.71% accuracy gains on 6,626 memes across 5 tasks.

Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection

cs.CV · 2026-04-18 · unverdicted · novelty 4.0

Vision-language grounding shows high prompt sensitivity, with different wordings for the same object leading to distinct instance selections and text embeddings explaining only 34% of the disagreement.

citing papers explorer

Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks cs.CV · 2026-04-14 · unverdicted · none · ref 20
Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery cs.MM · 2026-04-16 · unverdicted · none · ref 38
AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning cs.CV · 2026-04-16 · unverdicted · none · ref 28
MAP4TS: A Multi-Aspect Prompting Framework for Time-Series Forecasting with Large Language Models cs.CL · 2025-10-27 · unverdicted · none · ref 16
Fall into a Pit, Gain in a Wit: Cognitive-Guided Harmful Meme Detection via Misjudgment Risk Pattern Retrieval cs.LG · 2025-10-10 · unverdicted · none · ref 29
Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection cs.CV · 2026-04-18 · unverdicted · none · ref 18
