Thermo-VL augments a frozen Molmo-7B VLM with a trainable thermal encoder and prompt-conditioned dual-attention fusion to improve cross-spectrum visual reasoning.
An Introduction to Convolutional Neural Networks
Canonical reference. 83% of citing Pith papers cite this work as background.
abstract
The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the most impressive forms of ANN architecture is that of the Convolutional Neural Network (CNN). CNNs are primarily used to solve difficult image-driven pattern recognition tasks and, with their precise yet simple architecture, offer a simplified method of getting started with ANNs. This document provides a brief introduction to CNNs, discussing recently published papers and newly formed techniques in developing these brilliantly fantastic image recognition models. This introduction assumes you are familiar with the fundamentals of ANNs and machine learning.
fields
cs.CV 8 cs.LG 4 cs.HC 2 cs.SD 2 eess.SP 2 cond-mat.mtrl-sci 1 cs.AI 1 cs.CR 1 cs.DB 1 cs.NI 1
roles
background 6
representative citing papers
OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.
Hybrid phase-field and attention-based deep learning model predicts microstructure evolution in ternary alloys up to 400 timesteps with generalization to new compositions.
DECKER is a domain-invariant four-stage framework (keyboard normalization, adversarial disentanglement, cross-keyboard contrastive alignment, acoustic style randomization) plus LLM post-processing that improves keystroke inference over baselines on the new HEAR dataset, especially in cross-keyboard
Using the mosaic controlled dataset framework, experiments show scene complexity dominates over concept imbalance in diffusion model failures for multi-object generation, with counting especially hard in low-data regimes and compositional generalization collapsing under held-out combinations.
SpectraLLM is an LLM fine-tuned to predict small-molecule structures from single or multiple spectra, reporting state-of-the-art results on four public benchmarks with gains from multi-modal input.
QMC-Net maps per-band statistics to customized quantum circuit hyperparameters and achieves 93.80% and 99.34% accuracy on EuroSAT and SAT-6, outperforming classical and monolithic quantum baselines.
AP-MAE reconstructs masked attention patterns in LLMs with high accuracy, generalizes across models, predicts generation correctness at 55-70%, and enables 13.6% accuracy gains via targeted interventions.
DiffUNet^2 is a bidirectional conditional diffusion model integrated with visual tools for probabilistic exploration of scientific time series across five evaluated datasets.
STARFISH recovers accuracy in pruned neural networks by optimizing internal state alignment to the original model with a minimal unlabeled calibration set, outperforming prior recovery methods especially at high pruning ratios.
A recursive cubing framework identifies stable hyperparameter regions for MC dropout uncertainty quantification in spatial deep learning and produces competitive or superior predictive intervals versus a statistical baseline on simulations and land-surface temperature data.
AlayaLaser uses a SIMD-optimized on-disk graph layout plus caching and search strategies to outperform prior on-disk ANNS systems and match or exceed in-memory performance on large high-dimensional datasets.
Short-time averages within experiments plus temporal-preserving models like CNNs cut multiphase mass flow metering errors to 4.3% MAPE on air-water-oil data, outperforming single-averaged baselines.
Genome-Factory is an open-source Python library that integrates data pipelines, model tuning, inference, benchmarks, and biological interpretation for genomic foundation models.
AdaProb performs machine unlearning by substituting final-layer output probabilities with optimized uniform pseudo-probabilities and updating model weights.
Using a deep CNN and Fourier frequency analysis on calorimeter data, the KOTO experiment suppressed neutron background by a factor of 5.6×10^5 while maintaining 70% efficiency for the signal decay.
The work introduces WaLeF/FIDLAr for flood forecasting, CoDiCast for probabilistic weather, and Hypercube-RAG for explainable environmental QA, claiming superior accuracy, efficiency, and interpretability over baselines.
TemPose-TF-ASF adds adjacent-stroke fusion with two-stage bidirectional context to boost accuracy and Macro-F1 in badminton stroke classification over baselines.
LRCN and Transformer models using GelSight tactile images improve compliance prediction accuracy over baselines and show that objects harder than the sensor are harder to estimate.
HeartBERT applies self-supervised pretraining on a RoBERTa architecture to ECG signals, producing embeddings that enable strong performance on sleep staging and heartbeat classification with smaller labeled datasets and fewer parameters than baselines.
SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
Pre-trained ViT representations combined with active learning and targeted design choices for annotations and selection improve object class retrieval in multi-object scenes.
EGI integrates four existing AI components for real-time multimodal emotion monitoring and feedback in simulated agile meetings, reporting 10% WER and improved self-awareness for Scrum Masters.
HuBERT reaches 86% accuracy and 0.93 AUC detecting COVID-19 from 893 voice samples in the Cambridge COVID-19 Sound database.
citing papers explorer
- Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice Data: HuBERT reaches 86% accuracy and 0.93 AUC detecting COVID-19 from 893 voice samples in the Cambridge COVID-19 Sound database.