In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

· 2025 · arXiv 1701.2025

3 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

LMM-Track4D formulates a trajectory-grounded dialogue task, releases Track4D-Bench with 526 samples, and proposes RTGE encoding, TRK state token, and OSK-RA decoder to elicit better 4D spatiotemporal reasoning in LMMs.

SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

SpectralEarth-FM is a multisensor hierarchical transformer pretrained on a 40TB co-located HSI-MSI-SAR dataset using a JEPA-style objective and reports state-of-the-art results on hyperspectral and standard EO benchmarks.

MooD: Perception-Enhanced Efficient Affective Image Editing via Continuous Valence-Arousal Modeling

cs.CV · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

MooD introduces continuous valence-arousal modeling with VA-aware retrieval and perception-enhanced guidance for efficient, controllable affective image editing, plus a new AffectSet dataset.

citing papers explorer

Showing 3 of 3 citing papers.

LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue
cs.CV · 2026-05-19 · unverdicted · none · ref 55
SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
cs.CV · 2026-05-20 · unverdicted · none · ref 38 · 2 links
MooD: Perception-Enhanced Efficient Affective Image Editing via Continuous Valence-Arousal Modeling
cs.CV · 2026-05-04 · unverdicted · none · ref 5 · 2 links
