Universal Pooling -- A New Pooling Method for Convolutional Neural Networks

Euntai Kim; Hongje Seong; Junhyuk Hyun

arxiv: 1907.11440 · v1 · pith:76ZGDWSAnew · submitted 2019-07-26 · 💻 cs.CV

Junhyuk Hyun, Hongje Seong, Euntai Kim

Pith reviewed 2026-05-24 15:59 UTC · model grok-4.3

classification 💻 cs.CV

keywords universal pooling · convolutional neural networks · pooling methods · attention mechanism · adaptive pooling · feature map reduction · CNN architecture · benchmark evaluation

The pith

Universal pooling learns to produce any pooling function for a given CNN problem and dataset, including existing methods as special cases while outperforming them on benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes universal pooling as a replacement for fixed operations like average or max pooling in convolutional neural networks. It uses a jointly trained attention-style mechanism to create a pooling function that fits the specific task and data at hand. The method is presented as a channel-wise local spatial attention process that can recover traditional pooling approaches when appropriate. Experiments on two benchmark problems show it delivers better results and exhibits the expected variety in how it reduces feature maps.

Core claim

Universal pooling generates any pooling function depending on a given problem and dataset. It is inspired by attention methods and can be considered as a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network and it is shown that it includes the existing pooling methods. When applied to two benchmark problems, the proposed method outperformed the existing pooling methods and performed with the expected diversity, adapting to the given problem.

What carries the argument

Universal pooling, implemented as a channel-wise local spatial attention mechanism trained jointly with the network to produce task-specific pooling functions.
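
The mechanism named above can be sketched as a softmax-weighted sum inside each pooling block, which is what the Figure 7 caption ("Application of softmax within the pooling block") suggests. This is a minimal illustration for a single channel, not the authors' implementation: the function names `softmax` and `attention_pool` are hypothetical, and the `scores` are passed in as plain inputs, whereas in the paper they would come from a jointly trained sub-network whose exact form is not given in the abstract.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(feature, scores, k=2):
    """Pool one channel with softmax weights computed inside each k x k block.

    feature, scores: 2-D lists of equal shape (H x W), H and W divisible by k.
    Assumption: scores are given; a trained network would produce them.
    """
    h, w = len(feature), len(feature[0])
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    pooled = []
    for r in range(0, h, k):
        row = []
        for c in range(0, w, k):
            # Flatten the k x k block of features and its matching scores.
            block_x = [feature[r + i][c + j] for i in range(k) for j in range(k)]
            block_s = [scores[r + i][c + j] for i in range(k) for j in range(k)]
            weights = softmax(block_s)
            # The pooled value is a convex combination of the block entries.
            row.append(sum(wt * x for wt, x in zip(weights, block_x)))
        pooled.append(row)
    return pooled

feature = [[1.0, 2.0, 0.0, 4.0],
           [3.0, 6.0, 8.0, 0.0]]
zeros = [[0.0] * 4 for _ in range(2)]
# Equal scores give uniform weights, so each output is the block average.
print(attention_pool(feature, zeros))
```

Because the weights are computed per channel and per block, each channel can in principle settle on a different aggregation rule, which is the sense in which the text calls the method channel-wise.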

If this is right

Pooling no longer needs to be chosen in advance and can instead be generated to match the problem and dataset.
Standard methods such as average pooling, max pooling, and stride pooling arise as particular cases within the universal approach.
CNNs achieve higher accuracy on the tested benchmarks by allowing the pooling step to adapt during training.
The learned pooling exhibits variety that aligns with the different characteristics of each problem.
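
One way to read the special-case claim, assuming the pooled output is a weighted sum of the entries in each pooling block (as the Figure 5 caption suggests) with softmax-normalized weights. The notation here is introduced for illustration and is not taken from the paper: $x_{c,i}$ is the feature at position $i$ of block $B$ in channel $c$, $s_{c,i}$ is its score, $w_{c,i}$ its weight, and $i_0$ a fixed position in the block.

```latex
y_c = \sum_{i \in B} w_{c,i}\, x_{c,i},
\qquad
w_{c,i} = \frac{\exp(s_{c,i})}{\sum_{j \in B} \exp(s_{c,j})}

% Average pooling: constant scores over the block
w_{c,i} = \frac{1}{|B|}

% Max pooling (unique maximum assumed): s_{c,i} = \beta\, x_{c,i},\ \beta \to \infty
w_{c,i} \to \mathbb{1}\!\left[\, i = \arg\max_{j \in B} x_{c,j} \right]

% Stride pooling: s_{c,i_0} - s_{c,i} \to \infty \text{ for all } i \neq i_0
w_{c,i} \to \mathbb{1}\!\left[\, i = i_0 \right]
```

Under this reading, average pooling is recovered exactly, while max and stride pooling are reached only in the limit of increasingly peaked scores.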

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same attention-based idea could be tested on other fixed operations inside networks, such as normalization layers.
Learned pooling might reduce manual hyperparameter tuning when designing new CNN architectures for varied tasks.
If the generated functions prove interpretable, they could reveal dataset-specific patterns in how features should be aggregated.
Stability of the joint training might be checked by varying random seeds or dataset sizes to see whether gains persist.

Load-bearing premise

A jointly trained attention-style mechanism can reliably produce useful and stable pooling functions without introducing training instability or overfitting that would negate the reported gains.

What would settle it

Reproducing the benchmark experiments and finding either no performance gain over fixed pooling or no observable diversity in the learned pooling functions across the two problems.
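
The "observable diversity" part of this test needs some way to compare learned pooling weights. A minimal sketch of one option follows, using the Shannon entropy of a block's weights. This metric is an assumption chosen here for illustration and is not the paper's own diversity measure; `weight_entropy` is a hypothetical helper name.

```python
import math

def weight_entropy(weights):
    """Shannon entropy (in nats) of one block's pooling weights.

    Assumes the weights are non-negative and sum to 1, as softmax outputs do.
    Zero weights contribute nothing and are skipped to avoid log(0).
    """
    return sum(-w * math.log(w) for w in weights if w > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]   # average-pooling-like weights
one_hot = [1.0, 0.0, 0.0, 0.0]       # max- or stride-pooling-like weights
print(weight_entropy(uniform), weight_entropy(one_hot))
```

Uniform weights give the maximum entropy for the block size and one-hot weights give zero, so a spread of entropies across channels, layers, or datasets would be one sign that the learned pooling is not collapsing to a single fixed rule.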

Figures

Figures reproduced from arXiv: 1907.11440 by Euntai Kim, Hongje Seong, Junhyuk Hyun.

**Figure 3.** Average pooling averages the feature-map entries in each …

**Figure 4.** Stride convolution can be considered as stride pooling …

**Figure 5.** Standard pooling can be considered as a linear combina…

**Figure 7.** Application of softmax within the pooling block. Each …

**Figure 8.** Local and global pooling implemented by fully connected and convolutional layers. The red squares delineate the pooling blocks.

**Figure 9.** Examples from the CIFAR10 dataset, which comprises …

**Figure 10.** Boxplot of experiments on the CIFAR10 dataset, performed in (a) the VGG architecture and (b) the ResNet architecture. Red …

**Figure 11.** Examples from the Places2 dataset, which contains im…

**Figure 12.** Pooling weights trained by average pooling (top) and …

**Figure 13.** Pooling weights trained by flexible pooling (top) and …

**Figure 15.** As in Figure 14, but with the pooling features taken …

Original abstract

Pooling is one of the main elements in convolutional neural networks. The pooling reduces the size of the feature map, enabling training and testing with a limited amount of computation. This paper proposes a new pooling method named universal pooling. Unlike the existing pooling methods such as average pooling, max pooling, and stride pooling with fixed pooling function, universal pooling generates any pooling function, depending on a given problem and dataset. Universal pooling was inspired by attention methods and can be considered as a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network and it is shown that it includes the existing pooling methods. Finally, when applied to two benchmark problems, the proposed method outperformed the existing pooling methods and performed with the expected diversity, adapting to the given problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Universal pooling is an attention-style trainable layer that claims to subsume standard pooling and adapt to tasks, but the 'any function' claim has no supporting parameterization or proof.

The paper introduces universal pooling, a channel-wise local spatial attention module trained jointly with the CNN, and asserts that this module can generate any pooling function while including the usual max, average, and stride versions. It also reports better results than fixed pooling on two benchmarks, along with the expected adaptation behavior. That is the core contribution on offer.

The framing as attention is a reasonable way to make pooling adaptive rather than fixed, and the joint training setup is straightforward. If the experiments in the full paper show stable gains without extra overfitting, that would be a modest practical win for vision models that need task-specific downsampling.

The soft spot is the expressivity claim. The abstract states that the method generates any pooling function depending on the problem and dataset, yet it gives no equation, construction, or constraint showing how the attention weights recover arbitrary operators beyond the listed special cases. The stress-test note is correct on this: without that detail, the reported improvements could simply come from added model capacity rather than genuine universality. The abstract also gives no numbers, baselines, variance, or error analysis, so the outperformance statement cannot be evaluated from what is written.

This is the kind of incremental CNN tweak that might interest people who work on architecture details for image classification or detection. A reader already experimenting with attention or learnable pooling could pick up the idea and test it themselves. It is not a foundational shift, but the concept is clear enough on its own terms to deserve a look from referees who can check the actual attention equations, the training stability, and whether the benchmark results hold up under scrutiny. I would send it to peer review rather than desk reject.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes universal pooling, a trainable pooling operation for CNNs presented as a channel-wise form of local spatial attention. It claims that this method can generate any pooling function depending on the problem and dataset, subsumes existing fixed pooling operations (average, max, stride), is trained jointly with the network, and outperforms standard pooling on two benchmark problems while exhibiting adaptive diversity.

Significance. If the expressivity claim holds and the reported gains are reproducible and not due to added capacity alone, the work would offer a flexible, learnable alternative to fixed pooling layers, potentially improving CNN performance across tasks by allowing data-driven adaptation of spatial aggregation.

major comments (2)

[Abstract] The central claim that universal pooling 'generates any pooling function' and 'includes the existing pooling methods' is unsupported by any equation, parameterization of the attention weights, or proof of expressivity. No construction is given showing how the channel-wise local spatial attention recovers arbitrary pooling operators or the listed special cases, making the universality assertion unverifiable.
[Abstract] The performance claim ('outperformed the existing pooling methods on two benchmark problems') is stated without reference to the benchmarks, network architectures, baselines, metrics, training protocol, or any quantitative results or error analysis. This absence prevents evaluation of whether gains are attributable to the proposed pooling or to increased model capacity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract would benefit from greater specificity to support its claims and will revise it accordingly. Point-by-point responses to the major comments follow.

Referee: [Abstract] The central claim that universal pooling 'generates any pooling function' and 'includes the existing pooling methods' is unsupported by any equation, parameterization of the attention weights, or proof of expressivity. No construction is given showing how the channel-wise local spatial attention recovers arbitrary pooling operators or the listed special cases, making the universality assertion unverifiable.

Authors: The abstract serves as a concise summary; the parameterization of the channel-wise attention weights, the explicit constructions showing recovery of average, max, and stride pooling as special cases, and the demonstration that the operation adapts to generate problem-specific pooling functions are all provided in the main text (method and analysis sections). We will revise the abstract to briefly reference this parameterization and the special-case recoveries, making the claims more directly verifiable from the abstract while preserving its length.

Revision: yes

Referee: [Abstract] The performance claim ('outperformed the existing pooling methods on two benchmark problems') is stated without reference to the benchmarks, network architectures, baselines, metrics, training protocol, or any quantitative results or error analysis. This absence prevents evaluation of whether gains are attributable to the proposed pooling or to increased model capacity.

Authors: We acknowledge the abstract lacks these specifics. The experiments section details the two benchmarks, architectures, baselines, metrics, and quantitative results with error analysis. We will revise the abstract to name the benchmarks and report key performance deltas, enabling readers to assess the gains. The added capacity from the attention mechanism is minimal and fixed across experiments; we can add a clarifying clause in the revision if the editor deems it necessary.

Revision: yes

Circularity Check

0 steps flagged

No circularity; the derivation rests on a self-contained trainable module

The paper defines universal pooling as a jointly trained channel-wise local spatial attention module whose parameters are optimized end-to-end with the network. No equation or claim reduces a derived quantity to a fitted input by construction, nor does any load-bearing step rely on a self-citation chain or an ansatz imported from prior author work. The claim that the module 'includes the existing pooling methods' is presented as an empirical observation after joint training rather than a definitional identity. The derivation therefore stands on independent trainable components and reported benchmark results without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated.
