R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
hub
Proceedings of the AAAI conference on artificial intelligence , volume=
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.
A graph-based neural operator trained on expert-validated race-car CFD data reaches accuracy levels usable for early-stage interactive aerodynamic design exploration.
Observational and counterfactual distributions are linked by identical support and invariant features, enabling a flow-matching estimator with semiparametric efficiency correction to generate debiased counterfactuals from observations.
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.
In a two-agent Almgren-Chriss liquidation game, deep RL agents given intra-episode history of prices and own actions achieve supra-competitive outcomes more frequently and persistently than agents without such memory.
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.
citing papers explorer
-
PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.