Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
Canonical reference. 85% of citing Pith papers cite this work as background.
citing papers explorer
-
A document is worth a structured record: Principled inductive bias design for document recognition
Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.
-
NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework
A taxonomy of SNN training algorithms is presented with the release of NeuroTrain, an open benchmarking framework for reproducible comparisons across datasets and architectures.
-
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
-
Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model
The paper introduces the VODA setting for domain adaptation from scratch using vision-language models and presents TS-DRD, which achieves competitive performance on standard benchmarks without source models.
-
Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention
Introduces the ViTextCaps dataset and the PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, and shows that cross-modal graph edges harm performance.
-
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees
DSR is a neuro-symbolic autoformalization framework using operator trees that achieves new state-of-the-art results on the PRIME benchmark of 156 Lean 4 theorems.
-
Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark
Presents the ev-CIVIL dataset and benchmark showing that event-based cameras can support real-time detection of cracks and spalling in civil infrastructure under challenging lighting.
-
Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation
Decomposed Vision-Language Alignment framework factorizes prompts into concept and attribute tokens with Feature-Gated Cross-Attention for better compositional generalization in fine-grained open-vocabulary segmentation.
-
A Data Efficiency Study of Synthetic Fog for Object Detection Using the Clear2Fog Pipeline
Clear2Fog generates realistic synthetic fog from clear scenes, enabling mixed-density training that outperforms full fixed-density data and improves real-world performance by 1.67 mAP after learning-rate adjustment.
-
MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution
MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.
-
Fast Model-guided Instance-wise Adaptation Framework for Real-world Pansharpening with Fidelity Constraints
FMG-Pan is a model-guided instance-wise adaptation framework for real-world pansharpening that adds physical fidelity constraints to deliver state-of-the-art fusion quality with training and inference completed in seconds on single image pairs.
-
Binno: A 1st-order method for Bi-level Nonconvex Nonsmooth Optimization for Matrix Factorizations
Binno is a proximal-gradient first-order algorithm for nonconvex nonsmooth bi-level optimization, shown on sparse low-rank matrix factorization and regularized market-clearing problems with reported gains over baselines.
-
Measuring the stability and plasticity of recommender systems
A new offline protocol to profile recommender algorithms by stability in retaining past patterns and plasticity in adapting to changes upon retraining, with preliminary results on the GoodReads dataset.
-
Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation
Synthetic experiments reveal that class-dependent effects appear in both perturbation-based and ground-truth evaluations of time series feature attributions, often producing contradictory rankings of attribution quality due to differences in feature amplitude or temporal extent between classes.
-
NFR: Neural Feature-Guided Non-Rigid Shape Registration
NFR combines neural features with dynamic consistency-filtered geometric registration to achieve robust non-rigid 3D shape matching without annotated correspondences.
-
ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction
ProtoPathway fuses prototype-based histopathology encoding with pathway-aware graph neural networks for multimodal cancer survival prediction and native biological attribution.
-
Shortcut Mitigation via Spurious-Positive Samples
A method uses spurious-positive samples to identify and regularize neurons that rely on spurious features, improving model robustness without extra annotations or balanced data.
-
Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior
Embodied LLM agents exhibit emergent collaborative behaviors indicating mental models of partners in a color-matching game, detected via LLM judges and supported by positive user feedback.
-
Metric Unreliability in Multimodal Machine Unlearning: A Systematic Analysis and Principled Unified Score
Standard unlearning metrics disagree in multimodal settings, but a correlation-weighted Unified Quality Score delivers consistent method rankings across benchmarks.
-
Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration
A visual analytics workbench enables scientists to explore, query, and verify embedding-based similarity searches on weather and climate data by tracing results back to physical evidence.
-
Cross Paradigm Representation and Alignment Transformer for Image Deraining
CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.
-
Real-Time Frame- and Event-based Object Detection with Spiking Neural Networks on Edge Neuromorphic Hardware: Design, Deployment and Benchmark
SNNs deployed on Loihi 2 achieve real-time object detection with the lowest dynamic energy per inference and recover 87-100% of ANN accuracy via distillation-aware training.
-
CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation
CoRE aligns image tokens to a hierarchical concept library to simulate clinical reasoning for expert routing and demand-based growth in continual brain lesion segmentation, achieving SOTA on 12 tasks.
-
Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation
A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.
-
SCULPT: An Interactive Machine Learning Platform for Analyzing Multi-Particle Coincidence Data from Cold Target Recoil Ion Momentum Spectroscopy
SCULPT is an interactive machine learning platform combining UMAP, clustering, and adaptive confidence scoring for analyzing COLTRIMS multi-particle coincidence data.
-
Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions
A survey of trajectory prediction techniques for autonomous vehicles that proposes a taxonomy, overviews the prediction pipeline, and highlights remaining research gaps.
-
TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving
TwinLiteNet+ is a hybrid-encoder multi-task segmentation model with new UCB, USB, and PCAA modules that reports 92.9% mIoU on drivable area and 34.2% IoU on lane segmentation on BDD100K while using 11x fewer FLOPs than prior models.
-
Aesthetic Attributes Assessment of Images
The paper proposes the Aesthetic Multi-Attribute Network (AMAN) that jointly predicts captions and scores for five aesthetic attributes using a new weakly-labeled dataset created via knowledge transfer.
-
Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges
A survey that organizes methods for cross-domain object detection into a taxonomy, analyzes domain shift across detection stages, and outlines persistent challenges.
-
RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques
A survey of RGB-D object detection from traditional hand-crafted features with machine learning to deep learning techniques.
-
Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models