MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
arXiv preprint arXiv:2305.11147 , year=
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
SpatialFusion internalizes 3D geometric awareness into unified image generation models by pairing an MLLM with a spatial transformer that produces depth maps to constrain diffusion generation.
PiERN proposes token-level routing of physically-isolated experts to embed high-precision computation directly into LLMs, reporting higher accuracy and lower latency, token count, and energy use than fine-tuning or multi-agent baselines.
Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.
IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.
citing papers explorer
-
MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling
MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
-
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
-
SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness
SpatialFusion internalizes 3D geometric awareness into unified image generation models by pairing an MLLM with a spatial transformer that produces depth maps to constrain diffusion generation.
-
PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
PiERN proposes token-level routing of physically-isolated experts to embed high-precision computation directly into LLMs, reporting higher accuracy and lower latency, token count, and energy use than fine-tuning or multi-agent baselines.
-
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.
-
IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations
IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.