FiLM: Visual reasoning with a general conditioning layer
7 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted 7
Citation-role summary: method 3
Citation-polarity summary: use method 3
citing papers explorer
- SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
  SetFlow is a flow-matching generative model for permutation-invariant MIL bags in representation space that produces synthetic data improving classification performance and enabling training on synthetic data alone.
- Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network
  PSG-UIENet fuses Retinex physics with CLIP-derived text semantics and a new multimodal dataset to enhance underwater images, claiming better results than fifteen prior methods.
- Diffusion Sequence Models for Generative In-Context Meta-Learning of Robot Dynamics
  Diffusion models for in-context meta-learning of robot dynamics outperform deterministic Transformers in robustness to distribution shifts while enabling real-time operation via warm-started sampling.
- Task-Guided Prompting for Unified Remote Sensing Image Restoration
  TGPNet unifies denoising, cloud removal, shadow removal, deblurring, and SAR despeckling into one model via task-guided prompting and reports state-of-the-art results on a new multi-modal benchmark.
- Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies
  A hierarchical tactile-aware policy combines human-demonstration training for contact cue prediction with sim-to-real reinforcement learning to improve quadrupedal loco-manipulation performance by 28.54% over vision baselines on contact-rich tasks.
- Spatial-Aware Conditioned Fusion for Audio-Visual Navigation
  SACF discretizes target direction and distance from audio-visual cues then applies conditioned fusion to improve navigation efficiency and generalization to unheard sounds.
- Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
  An instance-centric representation with local frames, relative positional encodings, and adaptive reward transformation in adversarial IRL yields scalable, accurate, and robust behavior models for multi-agent driving simulation.