arxiv: 1907.11440 · v1 · pith:76ZGDWSAnew · submitted 2019-07-26 · 💻 cs.CV

Universal Pooling -- A New Pooling Method for Convolutional Neural Networks

Pith reviewed 2026-05-24 15:59 UTC · model grok-4.3

classification 💻 cs.CV
keywords universal pooling · convolutional neural networks · pooling methods · attention mechanism · adaptive pooling · feature map reduction · CNN architecture · benchmark evaluation

The pith

Universal pooling learns to produce any pooling function for a given CNN problem and dataset, including existing methods as special cases while outperforming them on benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes universal pooling as a replacement for fixed operations like average or max pooling in convolutional neural networks. It uses a jointly trained attention-style mechanism to create a pooling function that fits the specific task and data at hand. The method is presented as a channel-wise local spatial attention process that can recover traditional pooling approaches when appropriate. Experiments on two benchmark problems show it delivers better results and exhibits the expected variety in how it reduces feature maps.

Core claim

Universal pooling generates any pooling function depending on a given problem and dataset. It is inspired by attention methods and can be considered as a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network and it is shown that it includes the existing pooling methods. When applied to two benchmark problems, the proposed method outperformed the existing pooling methods and performed with the expected diversity, adapting to the given problem.

What carries the argument

Universal pooling, implemented as a channel-wise local spatial attention mechanism trained jointly with the network to produce task-specific pooling functions.
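As a rough illustration of the idea, and not the authors' implementation, the sketch below treats pooling as a softmax-weighted sum over each non-overlapping block of a feature map, computed separately per channel. The function names (`to_blocks`, `attention_pool`) and the `scores` input are hypothetical. In the paper the weights come from a jointly trained module whose parameterization is not given in this summary, so here the scores are simply supplied by hand.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def to_blocks(fmap, k):
    # Split a (C, H, W) array into non-overlapping k x k blocks,
    # returning shape (C, H // k, W // k, k * k). Assumes k divides H and W.
    c, h, w = fmap.shape
    blocks = fmap.reshape(c, h // k, k, w // k, k)
    return blocks.transpose(0, 1, 3, 2, 4).reshape(c, h // k, w // k, k * k)

def attention_pool(fmap, scores, k=2):
    # Hypothetical stand-in for a learned pooling layer: `scores` plays the
    # role of the attention logits (assumed here, not taken from the paper).
    x = to_blocks(fmap, k)
    w = softmax(to_blocks(scores, k), axis=-1)  # weights sum to 1 per block
    return (w * x).sum(axis=-1)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(3, 4, 4))

# Equal scores give uniform weights, i.e. average pooling.
avg = attention_pool(fmap, np.zeros_like(fmap))

# Scores proportional to the features, strongly scaled, approach max pooling.
mx = attention_pool(fmap, 1e3 * fmap)

# A dominant score at the top-left of each block selects that entry (stride pooling).
stride_scores = np.zeros_like(fmap)
stride_scores[:, ::2, ::2] = 1e3
strided = attention_pool(fmap, stride_scores)
```

Because softmax weights are strictly positive in exact arithmetic, max and stride pooling appear here only as limiting cases of very sharp scores. Whether the paper's construction recovers them exactly or in the limit is not stated in this summary.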

If this is right

  • Pooling no longer needs to be chosen in advance and can instead be generated to match the problem and dataset.
  • Standard methods such as average pooling, max pooling, and stride pooling arise as particular cases within the universal approach.
  • CNNs achieve higher accuracy on the tested benchmarks by allowing the pooling step to adapt during training.
  • The learned pooling exhibits variety that aligns with the different characteristics of each problem.
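One way to see how the fixed pooling methods could arise as particular cases, written in notation that is an assumption of this note rather than the paper's: let $x_{c,i}$ be the entries of channel $c$ inside a pooling block $R$, and let the pooled value $y_c$ be a weighted sum whose weights are a softmax over scores $s_{c,i}$.

```latex
\begin{align*}
y_c &= \sum_{i \in R} w_{c,i}\, x_{c,i},
\qquad
w_{c,i} = \frac{\exp(s_{c,i})}{\sum_{j \in R} \exp(s_{c,j})} \\[4pt]
% average pooling: all scores equal
s_{c,i} = \text{const}
  &\;\Rightarrow\; w_{c,i} = \frac{1}{|R|} \\
% max pooling: scores follow the features, sharpened by \beta
s_{c,i} = \beta\, x_{c,i},\ \beta \to \infty
  &\;\Rightarrow\; y_c \to \max_{i \in R} x_{c,i} \\
% stride pooling: one fixed position i_0 dominates
s_{c,i_0} - s_{c,i} \to \infty \ \ (i \neq i_0)
  &\;\Rightarrow\; y_c \to x_{c,i_0}
\end{align*}
```

Under this assumed form, average pooling is an exact special case, while max and stride pooling are reached as limits of increasingly sharp scores.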

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attention-based idea could be tested on other fixed operations inside networks, such as normalization layers.
  • Learned pooling might reduce manual hyperparameter tuning when designing new CNN architectures for varied tasks.
  • If the generated functions prove interpretable, they could reveal dataset-specific patterns in how features should be aggregated.
  • Stability of the joint training might be checked by varying random seeds or dataset sizes to see whether gains persist.

Load-bearing premise

A jointly trained attention-style mechanism can reliably produce useful and stable pooling functions without introducing training instability or overfitting that would negate the reported gains.

What would settle it

Reproducing the benchmark experiments and finding either no performance gain over fixed pooling or no observable diversity in the learned pooling functions across the two problems.

Figures

Figures reproduced from arXiv: 1907.11440 by Euntai Kim, Hongje Seong, Junhyuk Hyun.

Figure 2: Max pooling takes the maximum value within the pool…
Figure 3: Average pooling averages the feature-map entries in each…
Figure 4: Stride convolution can be considered as stride pooling…
Figure 5: Standard pooling can be considered as a linear combina…
Figure 7: Application of softmax within the pooling block. Each…
Figure 8: Local and global pooling implemented by fully connected and convolutional layers. The red squares delineate the pooling blocks.
Figure 9: Examples from the CIFAR10 dataset, which comprises…
Figure 10: Boxplot of experiments on the CIFAR10 dataset, performed in the (a) VGG architecture, and (b) the ResNet architecture. Red…
Figure 11: Examples from the Places2 dataset, which contains im…
Figure 12: Pooling weights trained by average pooling (top) and…
Figure 13: Pooling weights trained by flexible pooling (top) and…
Figure 15: As in Figure 14, but with the pooling features taken…

Original abstract

Pooling is one of the main elements in convolutional neural networks. The pooling reduces the size of the feature map, enabling training and testing with a limited amount of computation. This paper proposes a new pooling method named universal pooling. Unlike the existing pooling methods such as average pooling, max pooling, and stride pooling with fixed pooling function, universal pooling generates any pooling function, depending on a given problem and dataset. Universal pooling was inspired by attention methods and can be considered as a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network and it is shown that it includes the existing pooling methods. Finally, when applied to two benchmark problems, the proposed method outperformed the existing pooling methods and performed with the expected diversity, adapting to the given problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes universal pooling, a trainable pooling operation for CNNs presented as a channel-wise form of local spatial attention. It claims that this method can generate any pooling function depending on the problem and dataset, subsumes existing fixed pooling operations (average, max, stride), is trained jointly with the network, and outperforms standard pooling on two benchmark problems while exhibiting adaptive diversity.

Significance. If the expressivity claim holds and the reported gains are reproducible and not due to added capacity alone, the work would offer a flexible, learnable alternative to fixed pooling layers, potentially improving CNN performance across tasks by allowing data-driven adaptation of spatial aggregation.

major comments (2)
  1. [Abstract] The central claim that universal pooling 'generates any pooling function' and 'includes the existing pooling methods' is unsupported by any equation, parameterization of the attention weights, or proof of expressivity. No construction is given showing how the channel-wise local spatial attention recovers arbitrary pooling operators or the listed special cases, making the universality assertion unverifiable.
  2. [Abstract] The performance claim ('outperformed the existing pooling methods on two benchmark problems') is stated without reference to the benchmarks, network architectures, baselines, metrics, training protocol, or any quantitative results or error analysis. This absence prevents evaluation of whether gains are attributable to the proposed pooling or to increased model capacity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract would benefit from greater specificity to support its claims and will revise it accordingly. Point-by-point responses to the major comments follow.

  1. Referee: [Abstract] The central claim that universal pooling 'generates any pooling function' and 'includes the existing pooling methods' is unsupported by any equation, parameterization of the attention weights, or proof of expressivity. No construction is given showing how the channel-wise local spatial attention recovers arbitrary pooling operators or the listed special cases, making the universality assertion unverifiable.

    Authors: The abstract serves as a concise summary; the parameterization of the channel-wise attention weights, the explicit constructions showing recovery of average, max, and stride pooling as special cases, and the demonstration that the operation adapts to generate problem-specific pooling functions are all provided in the main text (method and analysis sections). We will revise the abstract to briefly reference this parameterization and the special-case recoveries, making the claims more directly verifiable from the abstract while preserving its length. revision: yes

  2. Referee: [Abstract] The performance claim ('outperformed the existing pooling methods on two benchmark problems') is stated without reference to the benchmarks, network architectures, baselines, metrics, training protocol, or any quantitative results or error analysis. This absence prevents evaluation of whether gains are attributable to the proposed pooling or to increased model capacity.

    Authors: We acknowledge the abstract lacks these specifics. The experiments section details the two benchmarks, architectures, baselines, metrics, and quantitative results with error analysis. We will revise the abstract to name the benchmarks and report key performance deltas, enabling readers to assess the gains. The added capacity from the attention mechanism is minimal and fixed across experiments; we can add a clarifying clause in the revision if the editor deems it necessary. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained trainable module

The paper defines universal pooling as a jointly trained channel-wise local spatial attention module whose parameters are optimized end-to-end with the network. No equation or claim reduces a derived quantity to a fitted input by construction, nor does any load-bearing step rely on a self-citation chain or an ansatz imported from prior author work. The claim that the module 'includes the existing pooling methods' is presented as an empirical observation after joint training rather than a definitional identity. The derivation therefore stands on independent trainable components and reported benchmark results without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5659 in / 1005 out tokens · 21105 ms · 2026-05-24T15:59:30.436524+00:00 · methodology

