Title resolution pending
7 Pith papers cite this work. Polarity classification is still being indexed.
Years: 2026 (7)
Citation roles: background (3)
Citation polarities: background (3)
Citing papers explorer
- Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation
SCOPE accelerates autoregressive video diffusion up to 4.73x by using a tri-modal cache-predict-recompute scheduler with Taylor extrapolation and selective active-frame computation while preserving output quality.
- Bridging the Micro–Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization
FASA bridges low-level forensic frequency signals and high-level semantic consistency to achieve state-of-the-art localization of both conventional and diffusion-generated image manipulations.
- CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement
CAGE uses LLM-generated code for label-correct diagrams followed by ControlNet-conditioned diffusion refinement to produce both accurate and visually engaging educational graphics, backed by the new EduDiagram-2K dataset.
- InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories
InsTraj generates realistic, instruction-faithful GPS trajectories by using an LLM to parse natural-language travel intent and a multimodal diffusion transformer to produce the paths.
- SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation
SubFlow restores full mode coverage in one-step flow matching by conditioning on sub-modes from semantic clustering, yielding higher diversity on ImageNet-256 while preserving FID.
- TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Guided Optimization
TIGFlow-GRPO uses a Trajectory-Interaction-Graph in conditional flow matching plus Flow-GRPO optimization to produce more accurate, socially compliant, and physically feasible trajectory forecasts on ETH/UCY and SDD datasets.
- Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
Face-D²CL fuses spatial and frequency features and uses dual continual learning to reduce forgetting while adapting to new DeepFakes, cutting average error rates by 60.7% and raising unseen-domain AUC by 7.9% over prior SOTA.