Mixed citation behavior. Most common role is background (60%).
Years: 2026 (15)
Representative citing papers
- SHED: Style-Homogenized Embedding Alignment for Domain Generalization
  SHED improves domain generalization in CLIP by aligning style-homogenized embeddings instead of raw ones, achieving state-of-the-art results on five benchmarks including a 4% gain on DomainNet.
- Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning
  A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.
- TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
  TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
- RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition
  RevealLayer decomposes natural images into multiple RGBA layers using diffusion models with region-aware attention, occlusion-guided adaptation, and a composite loss, outperforming prior methods on a new benchmark dataset.
- On Knowledge Compilation For Two-Variable First-Order Logic
  FO2 groundings can require 2^Ω(n) DNNF size, but a type-based compiler with residual caching often yields smaller circuits and faster runtimes than naive grounding.
- EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding
  EyeCue detects driver cognitive distraction by modeling gaze-visual context interactions in egocentric videos and achieves 74.38% accuracy on the new CogDrive dataset, outperforming 11 baselines.
- Tenability and Weak Semantics: Modeling Non-uniform Defense -- Extended Version
  Tenability defines when an argument can be maintained in debate against any conflict-free opponent attack using monotone commitment games, with three variants that differ from prior weak semantics on benchmarks like self-defeating attacks and floating assignments.
- Multi-Scale Generative Modeling with Heat Dissipation Flow Matching
  HDFM adds a continuous heat-dissipation (blur) process to flow matching, aligns an interpolated path to fix ill-posed inverse heat dissipation, and uses x-prediction to ease high-dimensional regression, yielding better performance than most baselines on image datasets.
- Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
  DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.
- Automated Approach for Solving Infinite-state Polynomial Reachability Games
  An automated sub-exponential algorithm computes winning strategies and ranking certificate witnesses for polynomial reachability games on infinite-state real-variable graphs.
- DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents
  DR-MMSearchAgent derives batch-wide trajectory advantages and uses differentiated Gaussian rewards to prevent premature collapse in multimodal agents, outperforming MMSearch-R1 by 8.4% on FVQA-test.
- Neurosymbolic Framework for Concept-Driven Logical Reasoning in Skeleton-Based Human Action Recognition
  A neurosymbolic framework grounds skeleton motion in learnable pose and dynamics concepts, then reasons over them with differentiable logic to recognize actions interpretably on the NTU and NW-UCLA benchmarks.
- Lightning Unified Video Editing via In-Context Sparse Attention
  ISA prunes low-saliency context tokens and routes queries by sharpness to either full or zeroth-order Taylor sparse attention, enabling LIVEditor to cut attention latency by about 60% while beating prior video editing methods on three benchmarks.
- Image-aware Layout Generation with User Constraints for Poster Design
  A deep learning model generates image-aware poster layouts that satisfy user-specified attribute constraints via Gaussian noise sampling and partial layout constraints via a dedicated loss and random mask, reaching state-of-the-art performance.
- Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning
  Derives novel generalization error bounds for multimodal pairwise metric learning, showing that fine-grained modality features reduce hypothesis space complexity via enhanced complementarity.