A ConvNet for the 2020s
7 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (7)
Roles: baseline (2)
Polarities: baseline (2)
citing papers explorer
- From Scene to Object: Text-Guided Dual-Gaze Prediction
  DualGaze-VLM uses text guidance and a new object-level dataset G-W3DA to predict driver attention, beating prior models by up to 17.8% in similarity metrics and passing human visual Turing tests at 88%.
- Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition
  Open-source neural network iris matchers (TripletIris using batch-hard triplet loss and ArcIris using ArcFace loss) plus compliant C++ implementations of HDBIF and CRYPTS are released, evaluated on IREX X and eight academic datasets, and accompanied by segmentation tools to lower entry barriers for
- Light-ResKAN: A Parameter-Sharing Lightweight KAN with Gram Polynomials for Efficient SAR Image Recognition
  Light-ResKAN reaches 99.09% accuracy on MSTAR SAR images with 82.9 times fewer FLOPs and 163.78 times fewer parameters than VGG16 by combining KAN convolutions, Gram polynomials, and channel-wise parameter sharing.
- Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation
  The CDMA speech depression model generalizes across languages, favors emotional speech, and aligns with EEG markers of emotional dysregulation.
- Universal Smoothness via Bernstein Polynomials: A Constructive Approximation Approach for Activation Functions
  BerLU constructs a C1-differentiable activation with Lipschitz constant 1 via Bernstein polynomial approximation, showing better performance and efficiency than baselines on image classification with ViTs and CNNs.
- UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations
  UniPASE extends the PASE framework with DeWavLM-Omni to convert degraded speech into high-fidelity, low-hallucination audio across sampling rates via phonetic enhancement, acoustic adaptation, and multi-rate vocoding.
- AI Powered Image Analysis for Phishing Detection
  ConvNeXt-Tiny outperforms ViT-Base with higher F1-score and better efficiency for image-based phishing detection from webpage screenshots when decision thresholds are optimized.