In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4
years
2026 4
roles
background 1
polarities
background 1
citing papers explorer
- Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning
  A multimodal pipeline decodes EEG into 3D meshes via EEG-to-image, MLLM reasoning, diffusion, and single-image-to-3D conversion, reporting 85.4% 10-way accuracy and 0.648 CLIPScore.
- StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
  StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.
- Multimodal LLMs under Pairwise Modalities
  A two-stage framework enables multimodal LLMs to learn shared latent representations from pairwise modality data and achieve cross-modal generation when incorporating new modalities.
- LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
  LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.