An Introduction to Convolutional Neural Networks

O’Shea, K · 2015 · cs.NE · arXiv 1511.08458

Canonical reference. 83% of citing Pith papers cite this work as background.

28 Pith papers citing it

abstract

The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the most impressive forms of ANN architecture is that of the Convolutional Neural Network (CNN). CNNs are primarily used to solve difficult image-driven pattern recognition tasks and, with their precise yet simple architecture, offer a simplified method of getting started with ANNs. This document provides a brief introduction to CNNs, discussing recently published papers and newly formed techniques in developing these brilliantly fantastic image recognition models. This introduction assumes you are familiar with the fundamentals of ANNs and machine learning.

citation-role summary

background 6

citation-polarity summary

background 5 · unclear 1

representative citing papers

Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.

OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.

Bridging Phase-Field Model and Deep Learning for Predicting 2D and 3D Microstructure Evolution in Ternary Alloys

cond-mat.mtrl-sci · 2026-06-20 · unverdicted · novelty 6.0

Hybrid phase-field and attention-based deep learning model predicts microstructure evolution in ternary alloys up to 400 timesteps with generalization to new compositions.

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

DECKER is a domain-invariant four-stage framework (keyboard normalization, adversarial disentanglement, cross-keyboard contrastive alignment, acoustic style randomization) plus LLM post-processing that improves keystroke inference over baselines on the new HEAR dataset, especially in cross-keyboard

When Do Diffusion Models learn to Generate Multiple Objects?

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

Using the mosaic controlled dataset framework, experiments show scene complexity dominates over concept imbalance in diffusion model failures for multi-object generation, with counting especially hard in low-data regimes and compositional generalization collapsing under held-out combinations.

SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data

q-bio.QM · 2025-08-04 · unverdicted · novelty 6.0

SpectraLLM is an LLM fine-tuned to predict small-molecule structures from single or multiple spectra, reporting state-of-the-art results on four public benchmarks with gains from multi-modal input.

QMC-Net: Data-Aware Quantum Representations for Remote Sensing Image Classification

quant-ph · 2026-04-10 · unverdicted · novelty 6.0

QMC-Net maps per-band statistics to customized quantum circuit hyperparameters and achieves 93.80% and 99.34% accuracy on EuroSAT and SAT-6, outperforming classical and monolithic quantum baselines.

Automated Attention Pattern Discovery at Scale in Large Language Models

cs.LG · 2026-04-04 · unverdicted · novelty 6.0

AP-MAE reconstructs masked attention patterns in LLMs with high accuracy, generalizes across models, predicts generation correctness at 55-70%, and enables 13.6% accuracy gains via targeted interventions.

DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data

cs.HC · 2026-06-02 · unverdicted · novelty 5.0

DiffUNet^2 is a bidirectional conditional diffusion model integrated with visual tools for probabilistic exploration of scientific time series across five evaluated datasets.

STARFISH: faST Accuracy Recovery in pruned networks From Internal State Healing

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

STARFISH recovers accuracy in pruned neural networks by optimizing internal state alignment to the original model with a minimal unlabeled calibration set, outperforming prior recovery methods especially at high pruning ratios.

A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning

stat.CO · 2026-05-15 · unverdicted · novelty 5.0

A recursive cubing framework identifies stable hyperparameter regions for MC dropout uncertainty quantification in spatial deep learning and produces competitive or superior predictive intervals versus a statistical baseline on simulations and land-surface temperature data.

AlayaLaser: Efficient Index Layout and Search Strategy for Large-scale High-dimensional Vector Similarity Search

cs.DB · 2026-02-26 · unverdicted · novelty 5.0

AlayaLaser uses a SIMD-optimized on-disk graph layout plus caching and search strategies to outperform prior on-disk ANNS systems and match or exceed in-memory performance on large high-dimensional datasets.

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

eess.SP · 2026-01-18 · unverdicted · novelty 5.0

Short-time averages within experiments plus temporal-preserving models like CNNs cut multiphase mass flow metering errors to 4.3% MAPE on air-water-oil data, outperforming single-averaged baselines.

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

q-bio.GN · 2025-09-13 · conditional · novelty 5.0

Genome-Factory is an open-source Python library that integrates data pipelines, model tuning, inference, benchmarks, and biological interpretation for genomic foundation models.

AdaProb: Efficient Machine Unlearning via Adaptive Probability

cs.LG · 2024-11-04 · unverdicted · novelty 5.0

AdaProb performs machine unlearning by substituting final-layer output probabilities with optimized uniform pseudo-probabilities and updating model weights.

Suppression of Neutron Background using Deep Neural Network and Fourier Frequency Analysis at the KOTO Experiment

hep-ex · 2023-09-21 · unverdicted · novelty 5.0

Using a deep CNN and Fourier frequency analysis on calorimeter data, the KOTO experiment suppressed neutron background by a factor of 5.6×10^5 while maintaining 70% efficiency for the signal decay.

Accurate, Efficient, and Explainable Deep Learning Approaches for Environmental Science Problems

cs.LG · 2026-05-19 · unverdicted · novelty 4.0

The work introduces WaLeF/FIDLAr for flood forecasting, CoDiCast for probabilistic weather, and Hypercube-RAG for explainable environmental QA, claiming superior accuracy, efficiency, and interpretability over baselines.

TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification

cs.CV · 2026-05-04 · unverdicted · novelty 4.0 · 2 refs

TemPose-TF-ASF adds adjacent-stroke fusion with two-stage bidirectional context to boost Accuracy and Macro-F1 in badminton stroke classification over baselines.

Advances in Compliance Detection: Novel Models Using Vision-Based Tactile Sensors

cs.CV · 2025-06-17 · unverdicted · novelty 4.0

LRCN and Transformer models using GelSight tactile images improve compliance prediction accuracy over baselines and show that objects harder than the sensor are harder to estimate.

HeartBERT: A Self-Supervised ECG Embedding Model for Efficient and Effective Medical Signal Analysis

eess.SP · 2024-11-08 · unverdicted · novelty 4.0

HeartBERT applies self-supervised pretraining on a RoBERTa architecture to ECG signals, producing embeddings that enable strong performance on sleep staging and heartbeat classification with smaller labeled datasets and fewer parameters than baselines.

SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs

cs.CV · 2026-04-04 · unverdicted · novelty 4.0

SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.

Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

cs.CV · 2026-04-01 · unverdicted · novelty 4.0

Pre-trained ViT representations combined with active learning and targeted design choices for annotations and selection improve object class retrieval in multi-object scenes.

EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

cs.AI · 2026-05-17 · unverdicted · novelty 3.0

EGI integrates four existing AI components for real-time multimodal emotion monitoring and feedback in simulated agile meetings, reporting 10% WER and improved self-awareness for Scrum Masters.

Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice Data

cs.SD · 2024-02-12 · unverdicted · novelty 3.0

HuBERT reaches 86% accuracy and 0.93 AUC detecting COVID-19 from 893 voice samples in the Cambridge COVID-19 Sound database.

citing papers explorer

Showing 28 of 28 citing papers.

Thermo-VL: Extending Vision-Language Models to Thermal Infrared Perception

cs.CV · 2026-05-21 · unverdicted · none · ref 50 · internal anchor

Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.

OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance

cs.CV · 2026-04-09 · unverdicted · none · ref 36

OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.

Bridging Phase-Field Model and Deep Learning for Predicting 2D and 3D Microstructure Evolution in Ternary Alloys

cond-mat.mtrl-sci · 2026-06-20 · unverdicted · none · ref 112 · internal anchor

Hybrid phase-field and attention-based deep learning model predicts microstructure evolution in ternary alloys up to 400 timesteps with generalization to new compositions.

DECKER: Domain-invariant Embedding for Cross-Keyboard Extraction and Recognition

cs.CR · 2026-05-05 · unverdicted · none · ref 22 · internal anchor

DECKER is a domain-invariant four-stage framework (keyboard normalization, adversarial disentanglement, cross-keyboard contrastive alignment, acoustic style randomization) plus LLM post-processing that improves keystroke inference over baselines on the new HEAR dataset, especially in cross-keyboard

When Do Diffusion Models learn to Generate Multiple Objects?

cs.CV · 2026-04-30 · unverdicted · none · ref 17 · internal anchor

Using the mosaic controlled dataset framework, experiments show scene complexity dominates over concept imbalance in diffusion model failures for multi-object generation, with counting especially hard in low-data regimes and compositional generalization collapsing under held-out combinations.

SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data

q-bio.QM · 2025-08-04 · unverdicted · none · ref 64 · internal anchor

SpectraLLM is an LLM fine-tuned to predict small-molecule structures from single or multiple spectra, reporting state-of-the-art results on four public benchmarks with gains from multi-modal input.

QMC-Net: Data-Aware Quantum Representations for Remote Sensing Image Classification

quant-ph · 2026-04-10 · unverdicted · none · ref 16

QMC-Net maps per-band statistics to customized quantum circuit hyperparameters and achieves 93.80% and 99.34% accuracy on EuroSAT and SAT-6, outperforming classical and monolithic quantum baselines.

Automated Attention Pattern Discovery at Scale in Large Language Models

cs.LG · 2026-04-04 · unverdicted · none · ref 22

AP-MAE reconstructs masked attention patterns in LLMs with high accuracy, generalizes across models, predicts generation correctness at 55-70%, and enables 13.6% accuracy gains via targeted interventions.

DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data

cs.HC · 2026-06-02 · unverdicted · none · ref 25 · internal anchor

DiffUNet^2 is a bidirectional conditional diffusion model integrated with visual tools for probabilistic exploration of scientific time series across five evaluated datasets.

STARFISH: faST Accuracy Recovery in pruned networks From Internal State Healing

cs.LG · 2026-05-31 · unverdicted · none · ref 24 · internal anchor

STARFISH recovers accuracy in pruned neural networks by optimizing internal state alignment to the original model with a minimal unlabeled calibration set, outperforming prior recovery methods especially at high pruning ratios.

A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning

stat.CO · 2026-05-15 · unverdicted · none · ref 242 · internal anchor

A recursive cubing framework identifies stable hyperparameter regions for MC dropout uncertainty quantification in spatial deep learning and produces competitive or superior predictive intervals versus a statistical baseline on simulations and land-surface temperature data.

AlayaLaser: Efficient Index Layout and Search Strategy for Large-scale High-dimensional Vector Similarity Search

cs.DB · 2026-02-26 · unverdicted · none · ref 43 · internal anchor

AlayaLaser uses a SIMD-optimized on-disk graph layout plus caching and search strategies to outperform prior on-disk ANNS systems and match or exceed in-memory performance on large high-dimensional datasets.

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

eess.SP · 2026-01-18 · unverdicted · none · ref 34 · internal anchor

Short-time averages within experiments plus temporal-preserving models like CNNs cut multiphase mass flow metering errors to 4.3% MAPE on air-water-oil data, outperforming single-averaged baselines.

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

q-bio.GN · 2025-09-13 · conditional · none · ref 22 · internal anchor

Genome-Factory is an open-source Python library that integrates data pipelines, model tuning, inference, benchmarks, and biological interpretation for genomic foundation models.

AdaProb: Efficient Machine Unlearning via Adaptive Probability

cs.LG · 2024-11-04 · unverdicted · none · ref 11 · internal anchor

AdaProb performs machine unlearning by substituting final-layer output probabilities with optimized uniform pseudo-probabilities and updating model weights.

Suppression of Neutron Background using Deep Neural Network and Fourier Frequency Analysis at the KOTO Experiment

hep-ex · 2023-09-21 · unverdicted · none · ref 10 · internal anchor

Using a deep CNN and Fourier frequency analysis on calorimeter data, the KOTO experiment suppressed neutron background by a factor of 5.6×10^5 while maintaining 70% efficiency for the signal decay.

Accurate, Efficient, and Explainable Deep Learning Approaches for Environmental Science Problems

cs.LG · 2026-05-19 · unverdicted · none · ref 85 · internal anchor

The work introduces WaLeF/FIDLAr for flood forecasting, CoDiCast for probabilistic weather, and Hypercube-RAG for explainable environmental QA, claiming superior accuracy, efficiency, and interpretability over baselines.

TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification

cs.CV · 2026-05-04 · unverdicted · none · ref 20 · 2 links · internal anchor

TemPose-TF-ASF adds adjacent-stroke fusion with two-stage bidirectional context to boost Accuracy and Macro-F1 in badminton stroke classification over baselines.

Advances in Compliance Detection: Novel Models Using Vision-Based Tactile Sensors

cs.CV · 2025-06-17 · unverdicted · none · ref 34 · internal anchor

LRCN and Transformer models using GelSight tactile images improve compliance prediction accuracy over baselines and show that objects harder than the sensor are harder to estimate.

HeartBERT: A Self-Supervised ECG Embedding Model for Efficient and Effective Medical Signal Analysis

eess.SP · 2024-11-08 · unverdicted · none · ref 28 · internal anchor

HeartBERT applies self-supervised pretraining on a RoBERTa architecture to ECG signals, producing embeddings that enable strong performance on sleep staging and heartbeat classification with smaller labeled datasets and fewer parameters than baselines.

SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs

cs.CV · 2026-04-04 · unverdicted · none · ref 17

SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.

Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

cs.CV · 2026-04-01 · unverdicted · none · ref 25

Pre-trained ViT representations combined with active learning and targeted design choices for annotations and selection improve object class retrieval in multi-object scenes.

EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

cs.AI · 2026-05-17 · unverdicted · none · ref 25 · internal anchor

EGI integrates four existing AI components for real-time multimodal emotion monitoring and feedback in simulated agile meetings, reporting 10% WER and improved self-awareness for Scrum Masters.

Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice Data

cs.SD · 2024-02-12 · unverdicted · none · ref 26 · internal anchor

HuBERT reaches 86% accuracy and 0.93 AUC detecting COVID-19 from 893 voice samples in the Cambridge COVID-19 Sound database.

Learning-Based Spectrum Cartography in Low Earth Orbit Satellite Networks: An Overview

cs.NI · 2026-05-11 · unverdicted · none · ref 27

The paper overviews attention-based learning methods for spectrum cartography in LEO satellite networks to enable adaptive fusion of heterogeneous measurements for inference and resource allocation.

Survey on Disaster Management Datasets for Remote Sensing Based Emergency Applications

cs.CV · 2026-05-05 · unverdicted · none · ref 134

A survey providing an overview of publicly available image-based datasets for ML/DL-based disaster management pipelines covering pre-disaster, during, and post-disaster phases.

Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects

cs.HC · 2025-10-11 · unverdicted · none · ref 277 · internal anchor

A holistic survey of affective computing for intelligent agents covering emotion understanding via multimodal data, affective cognition, emotional expression synthesis, key challenges, and future directions emphasizing generative technologies.

Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals

cs.SD · 2026-04-15 · unverdicted · none · ref 27

A 75 ms Gaussian window for segmenting phonocardiography signals yields the highest biLSTM classification accuracy among tested shapes and lengths, outperforming rectangular windows and a baseline method.
