Lotus: Diffusion-based visual foundation model for high-quality dense prediction

Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Liu, Bingbing Liu, Ying-Cong Chen · 2024 · arXiv 2409.18124

13 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

cs.CV · 2026-05-12 · unverdicted · novelty 8.0

TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.

CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

CDPR integrates polarization priors into a diffusion-based monocular depth estimator via shared latent space and adaptive gating, outperforming RGB-only methods in challenging scenes.

How to Spin an Object: First, Get the Shape Right

cs.CV · 2024-12-13 · unverdicted · novelty 7.0

Camera-Relative Object Coordinates (CROCS) as an intermediate geometry representation in two-stage image-to-3D models yields superior novel-view quality, geometric accuracy, and multiview consistency over depth maps, visual features, and other pointmap alternatives.

Open-Source Image Editing Models Are Zero-Shot Vision Learners

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Open-source image-editing models show competitive zero-shot performance on monocular depth, surface normals, and semantic segmentation, sometimes matching tuned models.

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

Diffusion Model as a Generalist Segmentation Learner

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

DiGSeg repurposes diffusion U-Nets as generalist segmentation learners by conditioning on image-mask latents and multi-scale CLIP text features, achieving strong cross-domain performance.

Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion

cs.CV · 2026-03-11 · unverdicted · novelty 6.0

Marigold-SSD delivers zero-shot depth completion via single-step diffusion with late fusion, achieving fast inference after only 4.5 GPU days of training while showing strong cross-domain results on indoor and outdoor benchmarks.

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

cs.CV · 2026-02-08 · unverdicted · novelty 6.0

Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.

Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

cs.CV · 2025-11-30 · unverdicted · novelty 6.0

Lotus-2 is a two-stage deterministic adaptation of diffusion priors that achieves state-of-the-art monocular depth estimation with only 59K training samples.

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

cs.CV · 2025-08-20 · unverdicted · novelty 6.0

Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.

The Midas Touch for Metric Depth

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

MTD turns relative depth into metric depth via segment-wise sparse graph optimization and discontinuity-aware geodesic pixel refinement, claiming better accuracy and generalization than prior depth methods.

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

cs.CV · 2025-01-05 · unverdicted · novelty 5.0

DepthMaster proposes a single-step diffusion model with Feature Alignment and Fourier Enhancement modules in a two-stage training process to improve generalization and detail preservation in monocular depth estimation over prior diffusion methods.

Image Generators are Generalist Vision Learners

cs.CV · 2026-04-22 · 2 refs

citing papers explorer

Showing 13 of 13 citing papers.

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
cs.CV · 2026-05-12 · unverdicted · none · ref 21

CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation
cs.CV · 2026-04-13 · unverdicted · none · ref 23

How to Spin an Object: First, Get the Shape Right
cs.CV · 2024-12-13 · unverdicted · none · ref 14

Open-Source Image Editing Models Are Zero-Shot Vision Learners
cs.CV · 2026-05-06 · unverdicted · none · ref 19

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
cs.CV · 2026-05-01 · unverdicted · none · ref 42

Diffusion Model as a Generalist Segmentation Learner
cs.CV · 2026-04-27 · unverdicted · none · ref 34

Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion
cs.CV · 2026-03-11 · unverdicted · none · ref 22

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
cs.CV · 2026-02-08 · unverdicted · none · ref 32

Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
cs.CV · 2025-11-30 · unverdicted · none · ref 22

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
cs.CV · 2025-08-20 · unverdicted · none · ref 21

The Midas Touch for Metric Depth
cs.CV · 2026-05-12 · unverdicted · none · ref 20

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
cs.CV · 2025-01-05 · unverdicted · none · ref 20

Image Generators are Generalist Vision Learners
cs.CV · 2026-04-22 · unreviewed · ref 12 · 2 links
