Canonical reference

Title resolution pending

Jonathan Ho, Ajay Jain, Pieter Abbeel · 2020

Canonical reference. 100% of citing Pith papers cite this work as background.

30 Pith papers citing it

Background 100% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Towards Generalized Image Manipulation Localization via Score-based Model

cs.CV · 2026-05-16 · conditional · novelty 7.0

DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.

ZODIAC: Zero-shot Offline Diffusion for Inferring Multi-xApps Conflicts in Open Radio Access Networks

cs.NI · 2026-04-21 · unverdicted · novelty 7.0

ZODIAC enables zero-shot inference of conflict-inducing conditions in O-RAN xApps from marginal offline data alone via uncertainty-penalized compositional diffusion reasoning.

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

DFAlign uses diffusion-based denoising to generate foreground knowledge prompts that improve cross-modal alignment for detecting unseen actions in untrimmed videos, reporting state-of-the-art results on OV-TAD benchmarks.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.

GeoLink: A 3D-Aware Framework Towards Better Generalization in Cross-View Geo-Localization

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

GeoLink uses offline 3D scene reconstruction to guide 2D feature refinement and relation distillation for improved generalization in cross-view geo-localization.

Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory budgets without storing historical images.

IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

IAD-Unify unifies industrial anomaly segmentation, region-grounded language understanding, and mask-guided generation in one framework using DINOv2 token injection into Qwen3.5, supported by the new Anomaly-56K dataset of 59,916 images.

SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis

cs.CV · 2026-03-23 · conditional · novelty 7.0

SHARP applies a spectrum-aware dynamic RoPE scaling schedule that promotes resolution more strongly in early denoising stages and relaxes it later, outperforming static baselines on quality metrics for remote sensing images.

Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation

cs.RO · 2026-02-18 · unverdicted · novelty 7.0

PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.

Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations

cs.CV · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Memorization in diffusion models is detected via latent update norm instability and mitigated on-the-fly, yielding AUC over 0.999 and zero memorization rate on Stable Diffusion 1.4.

SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SENSE is a controllable diffusion model that jointly generates realistic urban satellite imagery and aligned building energy consumption and height maps from road networks and density inputs, improving downstream tasks with under 20% labeled data.

MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

MAEPose is a masked autoencoder that learns spatiotemporal representations from unlabeled mmWave radar videos to estimate human poses, outperforming baselines by up to 22.1% in MPJPE.

Latent Denoising Improves Visual Alignment in Large Multimodal Models

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

A latent denoising objective with saliency-aware corruption and contrastive distillation improves visual alignment and corruption robustness in large multimodal models.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

Semi-Supervised Flow Matching for Mosaiced and Panchromatic Fusion Imaging

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

A two-stage semi-supervised flow matching framework with random voting and conflict-free guidance fuses mosaiced hyperspectral and panchromatic images to generate superior high-resolution hyperspectral results on benchmarks.

Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Long-CODE isolates long-context video evaluation with a new benchmark dataset and shot-dynamics metric that correlates better with human judgments on narrative richness and global consistency than short-video metrics.

Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.

ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

ARGen generates high-fidelity dynamic facial expression videos using affective semantic injection and adaptive reinforcement diffusion to improve emotion recognition models facing data scarcity and long-tail distributions.

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.

InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories

cs.AI · 2026-04-05 · unverdicted · novelty 6.0

InsTraj generates realistic, instruction-faithful GPS trajectories by using an LLM to parse natural-language travel intent and a multimodal diffusion transformer to produce the paths.

User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation

cs.IR · 2026-04-03 · unverdicted · novelty 6.0

GTC improves multi-modal recommendation by using user-conditional diffusion-based feature filtering and total correlation optimization, achieving up to 28.3% gains in NDCG@5 on benchmarks.

Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration

cs.AI · 2026-02-03 · unverdicted · novelty 6.0

A diffusion model with dynamic modality gating and cross-modal mutual learning restores missing features in VLMs bi-directionally while preserving the original model's generalization.

citing papers explorer

Showing 30 of 30 citing papers.

Towards Generalized Image Manipulation Localization via Score-based Model
cs.CV · 2026-05-16 · conditional · none · ref 11
DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.

ZODIAC: Zero-shot Offline Diffusion for Inferring Multi-xApps Conflicts in Open Radio Access Networks
cs.NI · 2026-04-21 · unverdicted · none · ref 16
ZODIAC enables zero-shot inference of conflict-inducing conditions in O-RAN xApps from marginal offline data alone via uncertainty-penalized compositional diffusion reasoning.

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
cs.CV · 2026-04-21 · unverdicted · none · ref 16
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
cs.CV · 2026-04-20 · unverdicted · none · ref 13
DFAlign uses diffusion-based denoising to generate foreground knowledge prompts that improve cross-modal alignment for detecting unseen actions in untrimmed videos, reporting state-of-the-art results on OV-TAD benchmarks.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
cs.CV · 2026-04-17 · unverdicted · none · ref 18
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces
cs.AI · 2026-04-16 · unverdicted · none · ref 20
SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.

GeoLink: A 3D-Aware Framework Towards Better Generalization in Cross-View Geo-Localization
cs.CV · 2026-04-14 · unverdicted · none · ref 11
GeoLink uses offline 3D scene reconstruction to guide 2D feature refinement and relation distillation for improved generalization in cross-view geo-localization.

Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection
cs.CV · 2026-04-14 · unverdicted · none · ref 10
A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory budgets without storing historical images.

IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation
cs.CV · 2026-04-14 · unverdicted · none · ref 13
IAD-Unify unifies industrial anomaly segmentation, region-grounded language understanding, and mask-guided generation in one framework using DINOv2 token injection into Qwen3.5, supported by the new Anomaly-56K dataset of 59,916 images.

SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis
cs.CV · 2026-03-23 · conditional · none · ref 12
SHARP applies a spectrum-aware dynamic RoPE scaling schedule that promotes resolution more strongly in early denoising stages and relaxes it later, outperforming static baselines on quality metrics for remote sensing images.

Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation
cs.RO · 2026-02-18 · unverdicted · none · ref 19
PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.

Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations
cs.CV · 2026-05-21 · unverdicted · none · ref 17 · 2 links
Memorization in diffusion models is detected via latent update norm instability and mitigated on-the-fly, yielding AUC over 0.999 and zero memorization rate on Stable Diffusion 1.4.

SENSE: Satellite-based ENergy Synthesis for Sustainable Environment
cs.CV · 2026-05-18 · unverdicted · none · ref 18
SENSE is a controllable diffusion model that jointly generates realistic urban satellite imagery and aligned building energy consumption and height maps from road networks and density inputs, improving downstream tasks with under 20% labeled data.

MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video
cs.CV · 2026-04-30 · unverdicted · none · ref 13
MAEPose is a masked autoencoder that learns spatiotemporal representations from unlabeled mmWave radar videos to estimate human poses, outperforming baselines by up to 22.1% in MPJPE.

Latent Denoising Improves Visual Alignment in Large Multimodal Models
cs.CV · 2026-04-23 · unverdicted · none · ref 34
A latent denoising objective with saliency-aware corruption and contrastive distillation improves visual alignment and corruption robustness in large multimodal models.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
cs.CV · 2026-04-22 · unverdicted · none · ref 15
Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

Semi-Supervised Flow Matching for Mosaiced and Panchromatic Fusion Imaging
cs.CV · 2026-04-22 · unverdicted · none · ref 18
A two-stage semi-supervised flow matching framework with random voting and conflict-free guidance fuses mosaiced hyperspectral and panchromatic images to generate superior high-resolution hyperspectral results on benchmarks.

Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation
cs.CV · 2026-04-19 · unverdicted · none · ref 9
Long-CODE isolates long-context video evaluation with a new benchmark dataset and shot-dynamics metric that correlates better with human judgments on narrative richness and global consistency than short-video metrics.

Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing
cs.LG · 2026-04-17 · unverdicted · none · ref 16
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.

ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception
cs.CV · 2026-04-14 · unverdicted · none · ref 15
ARGen generates high-fidelity dynamic facial expression videos using affective semantic injection and adaptive reinforcement diffusion to improve emotion recognition models facing data scarcity and long-tail distributions.

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis
cs.CV · 2026-04-08 · unverdicted · none · ref 14
VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.

InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories
cs.AI · 2026-04-05 · unverdicted · none · ref 14
InsTraj generates realistic, instruction-faithful GPS trajectories by using an LLM to parse natural-language travel intent and a multimodal diffusion transformer to produce the paths.

User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation
cs.IR · 2026-04-03 · unverdicted · none · ref 10
GTC improves multi-modal recommendation by using user-conditional diffusion-based feature filtering and total correlation optimization, achieving up to 28.3% gains in NDCG@5 on benchmarks.

Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration
cs.AI · 2026-02-03 · unverdicted · none · ref 17
A diffusion model with dynamic modality gating and cross-modal mutual learning restores missing features in VLMs bi-directionally while preserving the original model's generalization.

Generative Bid Shading in Real-Time Bidding Advertising
cs.GT · 2025-08-06 · unverdicted · none · ref 14
GBS replaces two-stage bid landscape modeling with an autoregressive generative model plus reward-aligned policy optimization to improve short- and long-term advertiser surplus in real-time bidding.

From Snapshots to Trajectories: Learning Single-Cell Gene Expression Dynamics via Conditional Flow Matching
cs.LG · 2026-05-21 · unverdicted · none · ref 21
scFM learns bidirectional velocity fields from entropically regularized OT couplings between snapshots, with added alignment and regularization to reduce drift in long-horizon predictions of single-cell trajectories.

SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation
cs.LG · 2026-04-14 · unverdicted · none · ref 18
SubFlow restores full mode coverage in one-step flow matching by conditioning on sub-modes from semantic clustering, yielding higher diversity on ImageNet-256 while preserving FID.

Sampling Parallelism for Fast and Efficient Bayesian Learning
cs.LG · 2026-04-06 · unverdicted · none · ref 17
Sampling parallelism distributes Bayesian sample evaluations across GPUs for near-perfect scaling, lower memory use, and faster convergence via per-GPU data augmentations, outperforming pure data parallelism in diversity.

TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Guided Optimization
cs.CV · 2026-03-26 · unverdicted · none · ref 21
TIGFlow-GRPO uses a Trajectory-Interaction-Graph in conditional flow matching plus Flow-GRPO optimization to produce more accurate, socially compliant, and physically feasible trajectory forecasts on ETH/UCY and SDD datasets.

Harnessing AI for Inverse Partial Differential Equation Problems: Past, Present, and Prospects
cs.AI · 2026-05-16 · unverdicted · none · ref 101
A survey organizing AI methods for inverse PDE problems into inverse problems, inverse design, and control categories, covering applications and future challenges like physics-informed models and uncertainty quantification.
