SetFlow is a flow-matching generative model for permutation-invariant multiple-instance learning (MIL) bags in representation space. Its synthetic data improves classification performance and makes training on synthetic data alone possible.
FiLM: Visual reasoning with a general conditioning layer
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 9
roles: method 3
polarities: use method 3
representative citing papers
PSG-UIENet fuses Retinex physics with CLIP-derived text semantics and a new multimodal dataset to enhance underwater images, claiming better results than fifteen prior methods.
GRAIL trains graph predictors via imitation learning by modeling generation as sequential decisions on partial graph embeddings, matching or exceeding prior methods on 18 benchmarks.
Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.
Diffusion models for in-context meta-learning of robot dynamics outperform deterministic Transformers in robustness to distribution shifts while enabling real-time operation via warm-started sampling.
TGPNet unifies denoising, cloud removal, shadow removal, deblurring, and SAR despeckling into one model via task-guided prompting and reports state-of-the-art results on a new multi-modal benchmark.
A hierarchical tactile-aware policy combines human-demonstration training for contact cue prediction with sim-to-real reinforcement learning to improve quadrupedal loco-manipulation performance by 28.54% over vision baselines on contact-rich tasks.
SACF discretizes target direction and distance from audio-visual cues then applies conditioned fusion to improve navigation efficiency and generalization to unheard sounds.
An instance-centric representation with local frames, relative positional encodings, and adaptive reward transformation in adversarial IRL yields scalable, accurate, and robust behavior models for multi-agent driving simulation.
citing papers explorer
- Task-Guided Prompting for Unified Remote Sensing Image Restoration
  TGPNet unifies denoising, cloud removal, shadow removal, deblurring, and SAR despeckling into one model via task-guided prompting and reports state-of-the-art results on a new multi-modal benchmark.
- Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies
  A hierarchical tactile-aware policy combines human-demonstration training for contact cue prediction with sim-to-real reinforcement learning to improve quadrupedal loco-manipulation performance by 28.54% over vision baselines on contact-rich tasks.
- Spatial-Aware Conditioned Fusion for Audio-Visual Navigation
  SACF discretizes target direction and distance from audio-visual cues then applies conditioned fusion to improve navigation efficiency and generalization to unheard sounds.