hub

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Swin transformer: Hierarchical vision transformer using shifted windows , author=

27 Pith papers cite this work. Polarity classification is still indexing.

27 Pith papers citing it

browse 27 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 2 method 1

citation-polarity summary

background 2 use method 1

representative citing papers

Scaling Limits of Long-Context Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.

A neurosymbolic Approach with Epistemic Deep Learning for Hierarchical Image Classification

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

A neurosymbolic model augments Swin Transformers with focal sets and fuzzy logic to produce calibrated hierarchical image classifications that respect logical constraints.

Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

cs.SD · 2026-05-11 · unverdicted · novelty 7.0

Polyphonia improves zero-shot stem-specific timbre transfer in polyphonic music by 15.5% target alignment via acoustic-informed attention calibration that uses probabilistic priors to set coarse boundaries.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

ALiBi bias is the expectation of positional LSH-induced block masks, yielding spectral and max-norm approximation bounds that reduce long-context biased attention to randomized short-context unbiased attention.

Empirical Evidence for Simply Connected Decision Regions in Image Classifiers

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

Empirical tests with quad-mesh filling indicate that decision regions in modern image classifiers are simply connected.

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

NodePFN pre-trains on synthetic graphs with controllable homophily and causal feature-label models to achieve 71.27 average accuracy on 23 node classification benchmarks without graph-specific training.

Discontinuous Galerkin Neural Operator for Pathology Defocus Deblurring

eess.IV · 2026-05-22 · unverdicted · novelty 6.0

DGNO parameterizes integral kernels with discontinuous Galerkin elements for heterogeneous defocus deblurring in pathology images and reports superior performance over prior methods.

ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

ST-TGExplainer disentangles stability and transition patterns in temporal graphs via a self-explainable TGNN guided by a disentangled information bottleneck objective to produce more faithful explanations.

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.

ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems

cs.LG · 2026-05-12 · conditional · novelty 6.0

ROMER cuts perplexity by up to 59% in noisy analog CIM environments for MoE LLMs via expert replacement and router recalibration calibrated on real-chip measurements.

FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation

eess.IV · 2026-05-12 · unverdicted · novelty 6.0

A frequency-enhanced Vision Transformer with FDSA, FGMLP, WAFF, and FCSB modules delivers superior volumetric medical image segmentation performance and efficiency over prior state-of-the-art methods.

Post-hoc Selective Classification for Reliable Synthetic Image Detection

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

ReSIDe generalizes logit-based confidence scores to intermediate layers of synthetic image detectors and uses preference optimization to aggregate them, cutting area under the risk-coverage curve by up to 69.55% under covariate shifts.

Large Vision-Language Models Get Lost in Attention

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.

Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Hallucinations in diffusion models are driven by local intrinsic dimension instabilities on the manifold, which Intrinsic Quenching corrects by deflating it.

HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

HEXST applies a hexagonal shifted-window Transformer with rotary positional encodings, contrast-sensitive training objectives, and single-cell priors to predict gene expression from histology slides, outperforming prior models on seven datasets while preserving spatial heterogeneity.

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

cs.LG · 2023-10-19 · conditional · novelty 6.0

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

PACD-Net uses pseudo-augmented contrastive distillation with a hybrid Swin Transformer-CNN backbone to estimate TAR, TIR, and TBR from sparse SMBG data and outperforms prior methods in accuracy and stability under sparse conditions.

STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

STAR-IOD applies scale-decoupled topology alignment and K-Means-based pseudo-label refinement to reduce catastrophic forgetting in remote sensing incremental object detection, reporting 1.7% and 2.1% mAP gains on new DIOR-IOD and DOTA-IOD datasets.

Semi-LAR: Semi-supervised Contrastive Learning with Linear Attention for Removal of Nighttime Flares

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Semi-LAR is a semi-supervised contrastive learning framework with linear attention for nighttime flare removal that refines pseudo-labels via quality assessment and uses flare-aware patch-level contrastive losses.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0 · 2 refs

TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.

Are Candidate Models Really Needed for Active Learning?

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

Active learning with randomly initialized models achieves comparable results to traditional candidate-model methods, with low-confidence sampling proving most effective.

Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.

citing papers explorer

Showing 27 of 27 citing papers.

Scaling Limits of Long-Context Transformers cs.LG · 2026-05-08 · unverdicted · none · ref 29
For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
A neurosymbolic Approach with Epistemic Deep Learning for Hierarchical Image Classification cs.CV · 2026-05-11 · unverdicted · none · ref 3
A neurosymbolic model augments Swin Transformers with focal sets and fuzzy logic to produce calibrated hierarchical image classifications that respect logical constraints.
Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration cs.SD · 2026-05-11 · unverdicted · none · ref 55
Polyphonia improves zero-shot stem-specific timbre transfer in polyphonic music by 15.5% target alignment via acoustic-informed attention calibration that uses probabilistic priors to set coarse boundaries.
Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization cs.LG · 2026-05-11 · unverdicted · none · ref 53
LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.
Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases cs.LG · 2026-05-10 · unverdicted · none · ref 33
ALiBi bias is the expectation of positional LSH-induced block masks, yielding spectral and max-norm approximation bounds that reduce long-context biased attention to randomized short-context unbiased attention.
Empirical Evidence for Simply Connected Decision Regions in Image Classifiers cs.CV · 2026-05-07 · unverdicted · none · ref 16
Empirical tests with quad-mesh filling indicate that decision regions in modern image classifiers are simply connected.
Benign Overfitting in Adversarial Training for Vision Transformers cs.LG · 2026-04-21 · unverdicted · none · ref 32
Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors cs.LG · 2026-04-21 · unverdicted · none · ref 200
NodePFN pre-trains on synthetic graphs with controllable homophily and causal feature-label models to achieve 71.27 average accuracy on 23 node classification benchmarks without graph-specific training.
Discontinuous Galerkin Neural Operator for Pathology Defocus Deblurring eess.IV · 2026-05-22 · unverdicted · none · ref 1
DGNO parameterizes integral kernels with discontinuous Galerkin elements for heterogeneous defocus deblurring in pathology images and reports superior performance over prior methods.
ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability cs.LG · 2026-05-19 · unverdicted · none · ref 113
ST-TGExplainer disentangles stability and transition patterns in temporal graphs via a self-explainable TGNN guided by a disentangled information bottleneck objective to produce more faithful explanations.
ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices cs.CV · 2026-05-15 · unverdicted · none · ref 72
ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems cs.LG · 2026-05-12 · conditional · none · ref 4
ROMER cuts perplexity by up to 59% in noisy analog CIM environments for MoE LLMs via expert replacement and router recalibration calibrated on real-chip measurements.
FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation eess.IV · 2026-05-12 · unverdicted · none · ref 25
A frequency-enhanced Vision Transformer with FDSA, FGMLP, WAFF, and FCSB modules delivers superior volumetric medical image segmentation performance and efficiency over prior state-of-the-art methods.
Post-hoc Selective Classification for Reliable Synthetic Image Detection cs.CV · 2026-05-09 · unverdicted · none · ref 58
ReSIDe generalizes logit-based confidence scores to intermediate layers of synthetic image detectors and uses preference optimization to aggregate them, cutting area under the risk-coverage curve by up to 69.55% under covariate shifts.
Large Vision-Language Models Get Lost in Attention cs.AI · 2026-05-07 · unverdicted · none · ref 71
In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models cs.CV · 2026-05-06 · unverdicted · none · ref 74
Hallucinations in diffusion models are driven by local intrinsic dimension instabilities on the manifold, which Intrinsic Quenching corrects by deflating it.
HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction cs.LG · 2026-05-06 · unverdicted · none · ref 17
HEXST applies a hexagonal shifted-window Transformer with rotary positional encodings, contrast-sensitive training objectives, and single-cell priors to predict gene expression from histology slides, outperforming prior models on seven datasets while preserving spatial heterogeneity.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation cs.LG · 2023-10-19 · conditional · none · ref 128
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG cs.LG · 2026-05-20 · unverdicted · none · ref 62
PACD-Net uses pseudo-augmented contrastive distillation with a hybrid Swin Transformer-CNN backbone to estimate TAR, TIR, and TBR from sparse SMBG data and outperforms prior methods in accuracy and stability under sparse conditions.
STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection cs.CV · 2026-05-20 · unverdicted · none · ref 204
STAR-IOD applies scale-decoupled topology alignment and K-Means-based pseudo-label refinement to reduce catastrophic forgetting in remote sensing incremental object detection, reporting 1.7% and 2.1% mAP gains on new DIOR-IOD and DOTA-IOD datasets.
Semi-LAR: Semi-supervised Contrastive Learning with Linear Attention for Removal of Nighttime Flares cs.CV · 2026-05-18 · unverdicted · none · ref 82
Semi-LAR is a semi-supervised contrastive learning framework with linear attention for nighttime flare removal that refines pseudo-labels via quality assessment and uses flare-aware patch-level contrastive losses.
Temporal Aware Pruning for Efficient Diffusion-based Video Generation cs.CV · 2026-05-18 · unverdicted · none · ref 110 · 2 links
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.
Are Candidate Models Really Needed for Active Learning? cs.CV · 2026-05-14 · unverdicted · none · ref 121
Active learning with randomly initialized models achieves comparable results to traditional candidate-model methods, with low-confidence sampling proving most effective.
Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice cs.CV · 2026-05-11 · unverdicted · none · ref 96
TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.
SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion cs.CV · 2026-05-02 · unverdicted · none · ref 24 · 2 links
SplAttN uses Gaussian soft splatting and attention to avoid sparse projection collapse in point cloud completion, achieving SOTA results and demonstrating genuine visual cue reliance on KITTI.
ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting cs.CV · 2026-04-20 · unverdicted · none · ref 37
ESsEN is a parameter-efficient two-tower vision-language transformer that matches larger models on discriminative tasks after training end-to-end with limited data and resources.
PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting cs.AI · 2026-05-09 · unreviewed · ref 183 · 2 links

Proceedings of the IEEE/CVF international conference on computer vision , pages=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer