13 Pith papers cite this work.
-
Single-Shot HDR Recovery via a Video Diffusion Prior
Single-shot HDR is achieved by conditioning a video diffusion model on an LDR input to generate an exposure bracket and fusing the bracket with per-pixel weights from a lightweight UNet.
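The fusion step described above can be sketched as a softmax-weighted per-pixel blend. The bracket shapes, the softmax normalization, and the toy inputs below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def fuse_bracket(bracket, weight_logits):
    """Fuse an exposure bracket into one image with per-pixel weights.

    bracket:       (N, H, W, C) stack of N exposures (e.g. generated LDR frames).
    weight_logits: (N, H, W, 1) per-pixel logits, e.g. from a lightweight UNet.
    """
    # Normalize the per-pixel weights across the N exposures with a softmax
    # so they sum to 1 at every pixel.
    w = np.exp(weight_logits - weight_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # Weighted per-pixel blend of the bracket.
    return (w * bracket).sum(axis=0)

# Toy usage: three 4x4 single-channel exposures.
rng = np.random.default_rng(0)
bracket = rng.uniform(size=(3, 4, 4, 1))
logits = rng.normal(size=(3, 4, 4, 1))
hdr = fuse_bracket(bracket, logits)
print(hdr.shape)  # (4, 4, 1)
```

Because the weights are a convex combination at each pixel, the fused value always stays within the range spanned by the bracket at that pixel.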
-
HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis
HairGPT reframes 3D hairstyle synthesis as dual-decoupled autoregressive strand sequence modeling with geometric tokenization for semantic control and rare style generation.
-
Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval
A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.
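Augmenting late-interaction scoring with a global term might look like the sketch below: the per-token MaxSim sum follows ColBERT/ColPali-style late interaction, while `alpha` and the single layout embedding per page are assumptions about how the global signal is combined:

```python
import numpy as np

def maxsim_score(query_toks, patch_embs):
    # ColBERT-style late interaction: each query token takes its best match
    # among the page's patch embeddings; scores are summed over query tokens.
    sims = query_toks @ patch_embs.T            # (Q, P) cosine similarities
    return float(sims.max(axis=1).sum())

def score_page(query_toks, query_global, patch_embs, layout_emb, alpha=0.5):
    # Local patch matching plus a global layout term (alpha is a made-up knob).
    return maxsim_score(query_toks, patch_embs) + alpha * float(query_global @ layout_emb)

# Toy usage with unit-normalized embeddings.
rng = np.random.default_rng(1)
q = rng.normal(size=(8, 16));  q /= np.linalg.norm(q, axis=1, keepdims=True)
p = rng.normal(size=(32, 16)); p /= np.linalg.norm(p, axis=1, keepdims=True)
qg = rng.normal(size=16); qg /= np.linalg.norm(qg)
lg = rng.normal(size=16); lg /= np.linalg.norm(lg)
print(score_page(q, qg, p, lg))
```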
-
Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries
ProCompNav improves success rate and shortens user responses in ambiguous instance navigation by using comparative binary questions that prune a candidate pool rather than requesting detailed descriptions.
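The pruning idea can be illustrated with a toy loop: each binary comparison splits the candidate pool and keeps the half containing the target. The halving rule and the `prefers_first` callable standing in for the user's answer are illustrative assumptions:

```python
def prune_candidates(candidates, prefers_first):
    """Narrow a candidate pool with comparative binary questions.

    candidates:    list of candidate instances.
    prefers_first: callable(group_a, group_b) -> True if the target is in
                   group_a (stands in for the user's yes/no answer).
    Returns the single remaining candidate and the number of questions asked.
    """
    pool = list(candidates)
    questions = 0
    while len(pool) > 1:
        mid = len(pool) // 2
        a, b = pool[:mid], pool[mid:]
        questions += 1
        pool = a if prefers_first(a, b) else b
    return pool[0], questions

# Toy usage: the "user" knows the target is "red mug".
items = ["red mug", "blue mug", "red bowl", "blue bowl", "green cup"]
target = "red mug"
found, n = prune_candidates(items, lambda a, b: target in a)
print(found, n)  # red mug 2
```

Halving the pool at each question is what keeps the dialogue short: the number of questions grows logarithmically with the pool size rather than linearly, as it would if the agent asked about candidates one by one.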
-
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
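Converting discrete text-defined levels into a continuous score is typically done by softmaxing the logits of the level tokens and taking a probability-weighted average; the five level names and the 1-5 value mapping below are common conventions, not necessarily the paper's exact choice:

```python
import numpy as np

LEVELS = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}

def levels_to_score(level_logits):
    """level_logits: logits for the level tokens, in LEVELS order."""
    z = np.asarray(level_logits, dtype=float)
    p = np.exp(z - z.max())
    p /= p.sum()                                   # closed-set softmax
    values = np.array(list(LEVELS.values()), dtype=float)
    return float(p @ values)                       # probability-weighted score

print(levels_to_score([0.0] * 5))  # uniform levels -> 3.0
print(round(levels_to_score([-2.0, -1.0, 0.0, 1.0, 2.0]), 3))
```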
-
Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
MoCA, a new RL method with Perception Verification, rewards perceptual fidelity as an independent signal to improve both seeing and thinking in VLMs.
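placeholder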
-
DocAtlas: Multilingual Document Understanding Across 80+ Languages
DocAtlas builds multilingual document datasets across 82 languages and shows that DPO with rendered ground truth improves accuracy by 1.7-1.9% without degrading base-language performance, unlike supervised fine-tuning.
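The DPO objective mentioned here (preferring responses matching the rendered ground truth over rejected ones) has a standard form; this sketch shows only the loss on given log-probabilities, with `beta` and the toy numbers as assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)),
    # where each log-ratio is policy log-prob minus reference log-prob.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy prefers the chosen answer more than the reference does,
# the loss drops below -log(0.5) ~= 0.693.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # positive margin -> loss < 0.693
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))  # zero margin -> loss = -log(0.5)
```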
-
Enhancing Consistency Models for Multi-Agent Trajectory Prediction
ECTraj enhances consistency models for multi-agent trajectory prediction via improved student-teacher supervision and conditional top-K generation, yielding faster inference and competitive accuracy on Argoverse 2.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
-
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
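A rectified-flow training step with biased (logit-normal) timestep sampling can be sketched as follows; the logit-normal location/scale and the toy zero-predicting "model" are assumptions standing in for the paper's transformer:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_timesteps(n, loc=0.0, scale=1.0):
    # Biased sampling: a logit-normal distribution puts more mass on
    # mid-range timesteps than uniform sampling does.
    return 1.0 / (1.0 + np.exp(-rng.normal(loc, scale, size=n)))

def rf_loss(model, x0):
    n = x0.shape[0]
    t = sample_timesteps(n).reshape(-1, 1)
    eps = rng.normal(size=x0.shape)
    x_t = (1.0 - t) * x0 + t * eps           # linear interpolation path
    target = eps - x0                        # rectified-flow velocity target
    return float(np.mean((model(x_t, t) - target) ** 2))

# Toy usage: a "model" that always predicts zero velocity.
x0 = rng.normal(size=(64, 8))
loss = rf_loss(lambda x_t, t: np.zeros_like(x_t), x0)
print(loss)
```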
-
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging
A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classification tasks.
-
UAV-Assisted Scan-to-Simulation for Landslides Using Physics-Informed Gaussian Splatting
A UAV-to-3DGS-to-MPM pipeline reconstructs real landslide sites with photorealistic visuals and runs physics-based simulations, validated on a Hong Kong event.
-
Low-Cost Neural Radiance Fields
A comparative study of DS-NeRF, TensoRF, and HashNeRF with depth supervision and architectural variants finds that no method conclusively outperforms the others under equal training time, but identifies which design choices transfer to low-data, low-compute regimes.