Universal Pooling -- A New Pooling Method for Convolutional Neural Networks
Pith reviewed 2026-05-24 15:59 UTC · model grok-4.3
The pith
Universal pooling learns to produce any pooling function for a given CNN problem and dataset, including existing methods as special cases while outperforming them on benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Universal pooling generates any pooling function depending on a given problem and dataset. It is inspired by attention methods and can be considered as a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network and it is shown that it includes the existing pooling methods. When applied to two benchmark problems, the proposed method outperformed the existing pooling methods and performed with the expected diversity, adapting to the given problem.
What carries the argument
Universal pooling, implemented as a channel-wise local spatial attention mechanism trained jointly with the network to produce task-specific pooling functions.
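To make "channel-wise local spatial attention" concrete, here is a minimal sketch, not the paper's implementation. It assumes the attention scores are already available as one logit per feature-map position; in a trained model they would come from a small learned sub-network, which is an assumption here because this page does not give the parameterization. The names `attention_pool`, `logits`, and `k` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, logits, k=2):
    """Pool a C x H x W nested list with non-overlapping k x k windows.

    `logits` has the same shape as `features` (assumed input, see lead-in).
    Each channel uses its own logits, so the weighting is channel-wise,
    and the softmax is taken inside each window, so it is spatially local.
    """
    pooled = []
    for chan, chan_logits in zip(features, logits):
        h, w = len(chan), len(chan[0])
        out = []
        for i in range(0, h - k + 1, k):
            row = []
            for j in range(0, w - k + 1, k):
                vals = [chan[i + a][j + b] for a in range(k) for b in range(k)]
                scores = [chan_logits[i + a][j + b] for a in range(k) for b in range(k)]
                weights = softmax(scores)
                row.append(sum(wt * v for wt, v in zip(weights, vals)))
            out.append(row)
        pooled.append(out)
    return pooled

# One channel, 2 x 4 map, all-zero logits: weights are uniform per window.
features = [[[1.0, 3.0, 0.0, 4.0],
             [5.0, 7.0, 8.0, 0.0]]]
zero_logits = [[[0.0] * 4, [0.0] * 4]]
pooled = attention_pool(features, zero_logits)
```

With all-zero logits every window is simply averaged; a learned model would instead produce different logits per channel and per position.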
If this is right
- Pooling no longer needs to be chosen in advance and can instead be generated to match the problem and dataset.
- Standard methods such as average pooling, max pooling, and stride pooling arise as particular cases within the universal approach.
- CNNs achieve higher accuracy on the tested benchmarks by allowing the pooling step to adapt during training.
- The learned pooling exhibits variety that aligns with the different characteristics of each problem.
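The claim that average, max, and stride pooling arise as special cases can be illustrated with a softmax-weighted sum over a single window. This is a sketch under an assumed softmax parameterization, not the construction from the paper; `weighted_pool` and `beta` are hypothetical names. Max and stride pooling are recovered only in the limit of large logits.

```python
import math

def weighted_pool(values, logits):
    """Softmax-weighted sum of one flattened pooling window."""
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))

window = [1.0, 3.0, 5.0, 7.0]  # a flattened 2 x 2 window, top-left first

# Average pooling: equal logits give uniform weights.
avg = weighted_pool(window, [0.0, 0.0, 0.0, 0.0])

# Max pooling (limit): logits proportional to the values, sharply scaled.
beta = 50.0  # hypothetical sharpness; larger is closer to a hard max
approx_max = weighted_pool(window, [beta * v for v in window])

# Stride pooling (limit): a large logit pinned to the top-left position.
approx_stride = weighted_pool(window, [50.0, 0.0, 0.0, 0.0])
```

Here `avg` is the window mean, `approx_max` is numerically indistinguishable from the largest value, and `approx_stride` from the top-left value.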
Where Pith is reading between the lines
- The same attention-based idea could be tested on other fixed operations inside networks, such as normalization layers.
- Learned pooling might reduce manual hyperparameter tuning when designing new CNN architectures for varied tasks.
- If the generated functions prove interpretable, they could reveal dataset-specific patterns in how features should be aggregated.
- Stability of the joint training might be checked by varying random seeds or dataset sizes to see whether gains persist.
Load-bearing premise
A jointly trained attention-style mechanism can reliably produce useful and stable pooling functions without introducing training instability or overfitting that would negate the reported gains.
What would settle it
Reproducing the benchmark experiments and finding either no performance gain over fixed pooling or no observable diversity in the learned pooling functions across the two problems.
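One way to make "observable diversity" measurable is to score each learned pooling window by the entropy of its weights. This metric is an illustrative assumption, not something taken from the paper, and `weight_entropy` is a hypothetical helper.

```python
import math

def weight_entropy(weights):
    """Shannon entropy (in nats) of one window's pooling weights."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]  # behaves like average pooling
one_hot = [0.0, 0.0, 0.0, 1.0]      # behaves like a hard selection

h_uniform = weight_entropy(uniform)  # log(4), the maximum for four weights
h_one_hot = weight_entropy(one_hot)  # 0, the minimum
```

If the entropy distributions of the learned weights were indistinguishable across the two problems, that would count against the diversity claim under this particular measure.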
Original abstract
Pooling is one of the main elements in convolutional neural networks. The pooling reduces the size of the feature map, enabling training and testing with a limited amount of computation. This paper proposes a new pooling method named universal pooling. Unlike the existing pooling methods such as average pooling, max pooling, and stride pooling with fixed pooling function, universal pooling generates any pooling function, depending on a given problem and dataset. Universal pooling was inspired by attention methods and can be considered as a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network and it is shown that it includes the existing pooling methods. Finally, when applied to two benchmark problems, the proposed method outperformed the existing pooling methods and performed with the expected diversity, adapting to the given problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes universal pooling, a trainable pooling operation for CNNs presented as a channel-wise form of local spatial attention. It claims that this method can generate any pooling function depending on the problem and dataset, subsumes existing fixed pooling operations (average, max, stride), is trained jointly with the network, and outperforms standard pooling on two benchmark problems while exhibiting adaptive diversity.
Significance. If the expressivity claim holds and the reported gains are reproducible and not due to added capacity alone, the work would offer a flexible, learnable alternative to fixed pooling layers, potentially improving CNN performance across tasks by allowing data-driven adaptation of spatial aggregation.
major comments (2)
- [Abstract] The central claim that universal pooling 'generates any pooling function' and 'includes the existing pooling methods' is unsupported by any equation, parameterization of the attention weights, or proof of expressivity. No construction is given showing how the channel-wise local spatial attention recovers arbitrary pooling operators or the listed special cases, making the universality assertion unverifiable.
- [Abstract] The performance claim (that the method 'outperformed the existing pooling methods' on two benchmark problems) is stated without reference to the benchmarks, network architectures, baselines, metrics, training protocol, or any quantitative results or error analysis. This absence prevents evaluation of whether gains are attributable to the proposed pooling or to increased model capacity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We agree that the abstract would benefit from greater specificity to support its claims and will revise it accordingly. Point-by-point responses to the major comments follow.
- Referee: [Abstract] The central claim that universal pooling 'generates any pooling function' and 'includes the existing pooling methods' is unsupported by any equation, parameterization of the attention weights, or proof of expressivity. No construction is given showing how the channel-wise local spatial attention recovers arbitrary pooling operators or the listed special cases, making the universality assertion unverifiable.
  Authors: The abstract serves as a concise summary; the parameterization of the channel-wise attention weights, the explicit constructions showing recovery of average, max, and stride pooling as special cases, and the demonstration that the operation adapts to generate problem-specific pooling functions are all provided in the main text (method and analysis sections). We will revise the abstract to briefly reference this parameterization and the special-case recoveries, making the claims more directly verifiable from the abstract while preserving its length.
  Revision: yes
- Referee: [Abstract] The performance claim (that the method 'outperformed the existing pooling methods' on two benchmark problems) is stated without reference to the benchmarks, network architectures, baselines, metrics, training protocol, or any quantitative results or error analysis. This absence prevents evaluation of whether gains are attributable to the proposed pooling or to increased model capacity.
  Authors: We acknowledge the abstract lacks these specifics. The experiments section details the two benchmarks, architectures, baselines, metrics, and quantitative results with error analysis. We will revise the abstract to name the benchmarks and report key performance deltas, enabling readers to assess the gains. The added capacity from the attention mechanism is minimal and fixed across experiments; we can add a clarifying clause in the revision if the editor deems it necessary.
  Revision: yes
Circularity Check
No circularity; the derivation rests on a self-contained trainable module.
The paper defines universal pooling as a jointly trained channel-wise local spatial attention module whose parameters are optimized end-to-end with the network. No equation or claim reduces a derived quantity to a fitted input by construction, nor does any load-bearing step rely on a self-citation chain or an ansatz imported from prior author work. The claim that the module 'includes the existing pooling methods' is presented as an empirical observation after joint training rather than a definitional identity. The derivation therefore stands on independent trainable components and reported benchmark results without self-referential reduction.