Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.
Score-CAM: Score-weighted visual explanations for convolutional neural networks
Mixed citation behavior. Most common role is background (60%).
representative citing papers
CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.
PGU-Net is a deep unfolding network for blind cross-sensor spectral super-resolution that jointly reconstructs the HSI and learns the spectral transformation function via alternating optimization stages.
AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.
Non-Euclidean distance variants of harmonic loss improve accuracy, gradient stability, and energy efficiency over cross-entropy and Euclidean harmonic loss in vision backbones and large language models.
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
A GPU-optimized tensor method computes WECT and ECF for arbitrary-dimensional simplicial and cubical complexes with reported speedups over prior approaches and ships as the pyECT Python package.
ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.
RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.
Optimized 3x3 adversarial image filters based on edge detection generate transferable untargeted attacks on neural networks with 30-80% success using only one pass and far fewer parameters than prior methods.
Uncertainty estimation and regularization on weak positive pairs improve mAP by 3.06%, 3.55%, and 6.94% on CUHK-PEDES, RSTPReid, and ICFG-PEDES, respectively.
UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.
Make-A-Video achieves state-of-the-art text-to-video generation by decomposing temporal U-Net and attention structures to add space-time modeling to text-to-image models, trained without any paired text-video data.
The paper releases SignNet-1M, a 1M-scale augmented dataset for ASL, CSL and DGS with 3DGS and diffusion-based variations, plus benchmarks showing improved cross-shift generalization.
Prithvi-EO-2.0 shows environment-dependent flood detection limits, with highest accuracy in cropland (IoU 52%) and riverine events (F1 0.69) and near-zero performance in tree cover and built-up areas across 19 global events.
ReforMe is an interactive document digitization system using layout-aware propagation to generalize user corrections from natural language or direct edits, shown to improve efficiency in a 12-user study on real documents.
DisjunctiveNet represents input-dependent mixed-integer rules as disjunctive constraints and applies hierarchical convex relaxations to create tractable linear layers that enforce exact rule satisfaction inside end-to-end neural networks.
Gaussian and related cropping strategies for point cloud subclouds improve 3D neural network performance over spherical cropping on large outdoor scenes.
Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
StomaD2 integrates diffusion-based image restoration with a specialized rotated detection network to achieve high-accuracy stomatal phenotyping across more than 130 plant species.
Weak-to-strong knowledge distillation applied early and then turned off accelerates convergence to target performance in visual learning tasks by factors of 1.7-4.8x.
Synthetic historical maps are generated from modern vector data via style transfer and uncertainty emulation to train segmentation models for historical map corpora.
A cross-verification strategy using three YOLO models trained on distinct views of a 2134-sample 3D GPR dataset detects road subsurface distress with over 98.6 percent recall on field data.
citing papers explorer
- Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset
  CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.
- Almost for Free: Crafting Adversarial Examples with Convolutional Image Filters
  Optimized 3x3 adversarial image filters based on edge detection generate transferable untargeted attacks on neural networks with 30-80% success using only one pass and far fewer parameters than prior methods.
- Harnessing Weak Pair Uncertainty for Text-based Person Search
  Uncertainty estimation and regularization on weak positive pairs improve mAP by 3.06%, 3.55%, and 6.94% on CUHK-PEDES, RSTPReid, and ICFG-PEDES, respectively.