In: ICLR (2019)
9 Pith papers cite this work.
Citing papers by year: 2026 (9).
Citing papers explorer
-
Exploring deep learning for Event-Based Saliency Prediction with a Transformer-based model
SEST is presented as the first deep learning model for event-based saliency prediction. It uses a pretrained Swin Transformer backbone, outperforms prior event-based methods on synthetic benchmarks, and transfers to real event streams.
-
HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA
HAC provides a parameter-efficient way to move CLIP into hyperbolic geometry, yielding consistent gains on zero-shot VQA benchmarks without any VQA training data overlap.
-
Metonymy in vision models undermines attention-based interpretability
Pretrained vision transformers exhibit strong intra-object leakage where each part representation encodes information from the entire object, undermining the faithfulness of attention-based part-centric interpretability methods.
-
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
Refinement via Regeneration (RvR) reformulates image refinement in unified multimodal models as conditional regeneration using prompt and semantic tokens from the initial image, yielding higher alignment scores than editing-based methods.
-
Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself
Free Geometry enables test-time self-improvement of 3D reconstruction models via cross-view consistency between full and masked observations, yielding average gains of 3.73% in pose accuracy and 2.88% in point maps.
-
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.
-
Spatial-Frequency Gated Swin Transformer for Remote Sensing Single-Image Super-Resolution
SFG-SwinSR improves PSNR to 45.19 dB and SSIM to 0.9852 on SpaceNet by adding a depthwise-blur plus gated spatial branch inside each Swin2SR feed-forward network.
-
StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
StratFormer uses a two-phase curriculum with dual-turn tokens and bucket-rate features to model and exploit opponents in Leduc Hold'em, gaining an average of +0.106 BB/hand over GTO play while maintaining near-equilibrium safety.
-
Weak-to-Strong Knowledge Distillation Accelerates Visual Learning
Weak-to-strong knowledge distillation, applied early in training and then turned off, accelerates convergence to target performance in visual learning tasks by 1.7x to 4.8x.
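The weak-to-strong distillation entry's idea of applying distillation early and then switching it off can be illustrated with a step-based loss schedule. This is a minimal sketch, not the paper's method: the hard cutoff at `cutoff_step`, the weight `alpha`, and every function name here are hypothetical assumptions chosen for illustration. It also shows how a convergence speedup factor such as 1.7x or 4.8x is usually computed, as the ratio of the steps each run needs to reach the same target metric.

```python
from typing import Optional, Sequence


def distill_weight(step: int, cutoff_step: int, alpha: float = 1.0) -> float:
    # Hypothetical hard cutoff: full distillation weight early, zero afterwards.
    return alpha if step < cutoff_step else 0.0


def total_loss(task_loss: float, distill_loss: float, step: int,
               cutoff_step: int, alpha: float = 1.0) -> float:
    # Task loss plus the distillation term, which vanishes once step >= cutoff_step.
    return task_loss + distill_weight(step, cutoff_step, alpha) * distill_loss


def steps_to_target(metric_per_step: Sequence[float], target: float) -> Optional[int]:
    # Number of steps (1-based) until the metric first reaches the target, or None.
    for n_steps, value in enumerate(metric_per_step, start=1):
        if value >= target:
            return n_steps
    return None


def convergence_speedup(baseline_steps: int, kd_steps: int) -> float:
    # Speedup factor: 4.8 means the distilled run reached the target 4.8x sooner.
    if baseline_steps <= 0 or kd_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / kd_steps
```

With toy numbers (not results from the paper), a baseline run that needs 48 steps to reach the target and a distilled run that needs 10 give `convergence_speedup(48, 10)`, which returns 4.8. A hard cutoff is the simplest reading of "turned off"; a gradual decay of `alpha` would be an equally plausible assumption.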