Mixed citations
Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J
Mixed citation behavior. Most common role is background (60%).
citing papers explorer
- A document is worth a structured record: Principled inductive bias design for document recognition
Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.
- AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps
AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
- Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.
- Rethinking the Harmonic Loss via Non-Euclidean Distance Layers
Non-Euclidean distance variants of harmonic loss improve accuracy, gradient stability, and energy efficiency over cross-entropy and Euclidean harmonic loss in vision backbones and large language models.
- MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
- Tensor Computation of Euler Characteristic Functions and Transforms
A GPU-optimized tensor method computes WECT and ECF for arbitrary-dimensional simplicial and cubical complexes with reported speedups over prior approaches and ships as the pyECT Python package.
- Almost for Free: Crafting Adversarial Examples with Convolutional Image Filters
Optimized 3x3 adversarial image filters based on edge detection generate transferable untargeted attacks on neural networks with 30-80% success using only one pass and far fewer parameters than prior methods.
- Harnessing Weak Pair Uncertainty for Text-based Person Search
Uncertainty estimation and regularization on weak positive pairs improve mAP by 3.06%, 3.55%, and 6.94% on CUHK-PEDES, RSTPReid, and ICFG-PEDES, respectively.
- Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition
UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.
- Make-A-Video: Text-to-Video Generation without Text-Video Data
Make-A-Video achieves state-of-the-art text-to-video generation by decomposing temporal U-Net and attention structures to add space-time modeling to text-to-image models, trained without any paired text-video data.
- From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments
Gaussian and related cropping strategies for point cloud subclouds improve 3D neural network performance over spherical cropping on large outdoor scenes.
- A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning
Single-seed CRPS estimates in limited-data BDL show high variance, with peaks for heteroscedastic methods; local variance correlates with single-seed error at above 0.96.
- StomaD2: An All-in-One System for Intelligent Stomatal Phenotype Analysis via Diffusion-Based Restoration Detection Network
StomaD2 integrates diffusion-based image restoration with a specialized rotated detection network to achieve high-accuracy stomatal phenotyping across more than 130 plant species.
- Weak-to-Strong Knowledge Distillation Accelerates Visual Learning
Weak-to-strong knowledge distillation applied early and then turned off accelerates convergence to target performance in visual learning tasks by factors of 1.7-4.8x.
- Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation
Synthetic historical maps are generated from modern vector data via style transfer and uncertainty emulation to train segmentation models for historical map corpora.
- Automatic Road Subsurface Distress Recognition from Ground Penetrating Radar Images using Deep Learning-based Cross-verification
A cross-verification strategy using three YOLO models trained on distinct views of a 2134-sample 3D GPR dataset detects road subsurface distress with over 98.6 percent recall on field data.
- Secondary Bounded Rationality: A Theory of How Algorithms Reproduce Structural Inequality in AI Hiring
Secondary bounded rationality describes how AI recruitment algorithms reproduce structural inequality by optimizing for biased proxies of competence drawn from cultural and social capital disparities.
- Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation
A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.
- Uncertainty-aware classification and triage of structural heart disease using electrocardiography and echocardiography metrics
Bayesian neural networks match or exceed frequentist performance on SHD classification from the EchoNext dataset while providing more robust uncertainty estimates for clinical triage.