Numerical Coordinate Regression with Convolutional Neural Networks

Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast

classification cs.CV

keywords dsnt, heatmap, convolutional, differentiable, existing, fully, layer, matching

We study deep learning approaches to inferring numerical coordinates for points of interest in an input image. Existing convolutional neural network-based solutions to this problem either take a heatmap matching approach or regress to coordinates with a fully connected output layer. Neither of these approaches is ideal, since the former is not entirely differentiable, and the latter lacks inherent spatial generalization. We propose our differentiable spatial to numerical transform (DSNT) to fill this gap. The DSNT layer adds no trainable parameters, is fully differentiable, and exhibits good spatial generalization. Unlike heatmap matching, DSNT works well with low heatmap resolutions, so it can be dropped in as an output layer for a wide range of existing fully convolutional architectures. Consequently, DSNT offers a better trade-off between inference speed and prediction accuracy compared to existing techniques. When used to replace the popular heatmap matching approach used in almost all state-of-the-art methods for pose estimation, DSNT gives better prediction accuracy for all model architectures tested.
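
The abstract does not spell out how a heatmap becomes a coordinate, so the following is only a minimal sketch under a stated assumption: that the transform returns the expected pixel-centre position under a heatmap normalized to sum to 1, with each axis scaled to the range (-1, 1). The function names `softmax2d` and `dsnt` and the choice of softmax as the normalization are illustrative, not taken from the text above. The sketch does reflect two properties the abstract claims: the operation has no trainable parameters, and it is a weighted sum, so it is differentiable.

```python
import math


def softmax2d(logits):
    """Normalize a 2D list of raw scores into a heatmap that sums to 1.

    Softmax is an assumed normalization; the abstract does not name one.
    """
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def dsnt(heatmap):
    """Return (x, y): the probability-weighted mean of pixel-centre coordinates.

    Assumed convention: column j (1-based) of an n-wide map has
    x = (2j - (n + 1)) / n, and likewise for rows, so values lie in (-1, 1).
    """
    m = len(heatmap)      # number of rows (height)
    n = len(heatmap[0])   # number of columns (width)
    x = sum(p * (2 * (j + 1) - (n + 1)) / n
            for row in heatmap for j, p in enumerate(row))
    y = sum(p * (2 * (i + 1) - (m + 1)) / m
            for i, row in enumerate(heatmap) for p in row)
    return x, y


if __name__ == "__main__":
    # A low-resolution 3x3 map of raw scores with a strong response at the right edge.
    scores = [[0.0, 0.0, 0.0],
              [0.0, 0.0, 8.0],
              [0.0, 0.0, 0.0]]
    print(dsnt(softmax2d(scores)))
```

Because the output is a continuous average rather than the index of the strongest pixel, the predicted coordinate is not quantized to the heatmap grid, which is consistent with the abstract's claim that the layer works well at low heatmap resolutions.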

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Domain Generalization through Spatial Relation Induction over Visual Primitives
cs.CV 2026-05 unverdicted novelty 7.0

PARSE improves domain generalization accuracy by factoring recognition into visual primitives and their spatial relational compositions learned end-to-end with differentiable predicates.

MARCO: Navigating the Unseen Space of Semantic Correspondence
cs.CV 2026-04 unverdicted novelty 6.0

MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on...

Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models
cs.CV 2026-04 unverdicted novelty 5.0

VANGUARD is a staged-training VLM framework that reports 94% ROC-AUC and 84% F1 on UCF-Crime while adding chain-of-thought reasoning and spatial grounding to video anomaly detection.