arxiv: 1906.08819 · v1 · pith:I6LE46L2new · submitted 2019-06-20 · 💻 cs.DC

Performance Comparison Between OpenCV Built in CPU and GPU Functions on Image Processing Operations

Pith reviewed 2026-05-25 19:02 UTC · model grok-4.3

classification 💻 cs.DC
keywords image processing · OpenCV · GPU · CPU · CUDA · performance comparison · parallel processing

The pith

OpenCV's GPU functions using CUDA deliver greater speed than CPU functions for image processing operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares performance of common image processing operations such as convolution, Fourier transforms, and matrix operations using OpenCV's built-in CPU and GPU functions. It establishes that GPUs achieve higher speeds than CPUs due to parallel processing versus serial processing. This comparison matters because larger images and complex operations can exceed CPU capabilities. The tests rely on NVIDIA's CUDA platform implemented in the GPU functions. A sympathetic reader would care about when GPU acceleration becomes necessary for practical tasks.

Core claim

GPUs provide greater speed compared to CPUs for image processing operations because of their parallel processing nature, as measured using OpenCV's built-in CPU and GPU functions that use CUDA.

What carries the argument

OpenCV's built-in CPU and GPU functions using CUDA for operations including matrix inversion, transpose, derivative, convolution, and Fourier Transform.

If this is right

  • For the tested operations, developers can expect faster execution times by switching to the GPU functions.
  • As image sizes increase, the performance gap favors GPU parallel computation over CPU serial processing.
  • CUDA implementation in OpenCV enables practical acceleration for specialized digital signal processing tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The results could guide selection of hardware for real-time image pipelines beyond the specific operations tested.
  • Similar comparisons in other libraries might reveal consistent patterns in CUDA-based acceleration.
  • Extending tests to varied GPU architectures would show whether the speed advantage holds across devices.

Load-bearing premise

The built-in OpenCV GPU functions using CUDA are correctly implemented, representative of general performance, and the tested operations and image sizes reflect real-world usage without hardware-specific biases.

What would settle it

Measuring the same OpenCV operations on hardware where the GPU version shows no consistent speedup over the CPU version would falsify the central claim.

Figures

Figures reproduced from arXiv: 1906.08819 by Batuhan Hangün, Önder Eyecioğlu.

Figure 5: Thresholding function. In this paper, Otsu's method [9] was used for thresholding operations.
Figure 4: Original image resized to 25% of its original size. Adjacent text from Section 2.2.2 (Thresholding): In some applications it may be desirable to binarize digital images. The operation that creates binary images is called thresholding, and the value k, which determines which intensity values map to 1 and which map to 0, is called the threshold: $s = T(r) = \begin{cases} 0, & r < k \\ 1, & r \ge k \end{cases}$
Figure 9: Edge detection in an image.
Figure 11: OpenCV block diagram with supported operating systems [13]. Adjacent text from Section 2.3.2 (Compute Unified Device Architecture, CUDA): CUDA is an API (application programming interface) and a parallel computing platform model that was created by Nvidia in 2007 [14]. Since then it has gained popularity among people who need high computing power with parallelism. This platform is a software layer which provides direct access to GP…
Figure 15: Test image lena.jpg. Its original size is 225x225. The image was processed at sizes 112x112, 338x338, 450x450, 562x562, 675x675, 788x788, 900x900, 1012x1012, 1125x1125, 1238x1238, 1350x1350, 1462x1462, 1575x1575, 1688x1688, 1800x1800, 1912x1912, 2025x2025, 2138x2138.
Figure 16: Comparison of the built-in CPU and GPU functions that resize the image, in terms of NxM (total number of pixels) and T (time spent in milliseconds). Linear interpolation was used for resizing.
Figure 17: Comparison of the built-in CPU and GPU functions that threshold the image. Otsu's method was used for thresholding.
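The thresholding rule quoted with Figure 4 (0 where r < k, 1 where r ≥ k) and the Otsu criterion named in the Figure 5 and Figure 17 captions can be illustrated with a minimal pure-Python sketch. This is not the OpenCV implementation the paper benchmarks. The function names `binary_threshold` and `otsu_threshold`, the flat list of integer intensities, and the default of 256 grey levels are all assumptions made for illustration.

```python
from typing import List, Sequence


def binary_threshold(pixels: Sequence[int], k: int) -> List[int]:
    """Apply s = T(r): 0 where r < k, 1 where r >= k."""
    return [0 if r < k else 1 for r in pixels]


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the threshold k that maximises the between-class variance.

    Hypothetical helper: assumes integer intensities in [0, levels).
    Returns 0 when every pixel has the same intensity.
    """
    hist = [0] * levels
    for r in pixels:
        hist[r] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_k, best_variance = 0, -1.0
    count_below = 0  # number of pixels with r < k
    sum_below = 0    # sum of intensities with r < k
    for k in range(1, levels):
        count_below += hist[k - 1]
        sum_below += (k - 1) * hist[k - 1]
        count_above = total - count_below
        if count_below == 0 or count_above == 0:
            continue  # one class is empty, so there is nothing to separate
        mean_below = sum_below / count_below
        mean_above = (total_sum - sum_below) / count_above
        # Between-class variance up to a constant factor of 1 / total**2.
        variance = count_below * count_above * (mean_below - mean_above) ** 2
        if variance > best_variance:
            best_k, best_variance = k, variance
    return best_k
```

For a toy input such as `[10, 10, 10, 200, 200, 200]`, any k between 11 and 200 separates the two groups equally well, and this sketch returns the smallest such k because it only replaces the best candidate on a strict improvement.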
Original abstract

Image Processing is a specialized area of Digital Signal Processing which contains various mathematical and algebraic operations such as matrix inversion, transpose of matrix, derivative, convolution, Fourier Transform etc. Operations like those require higher computational capabilities than daily usage purposes of computers. At that point, with increased image sizes and more complex operations, CPUs may be unsatisfactory since they use Serial Processing by default. GPUs are the solution that come up with greater speed compared to CPUs because of their Parallel Processing/Computation nature. A parallel computing platform and programming model named CUDA was created by NVIDIA and implemented by the graphics processing units (GPUs) which were produced by them. In this paper, computing performance of some commonly used Image Processing operations will be compared on OpenCV's built in CPU and GPU functions that use CUDA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript compares the performance of common image processing operations (such as those involving matrix operations, convolution, and transforms) using OpenCV's built-in CPU functions versus its GPU functions that rely on CUDA. It argues that GPUs provide greater speed due to parallel processing and reports execution times to support this for the tested operations and image sizes.

Significance. If the measurements are valid, the work could supply concrete timing data to help practitioners decide when to invoke cv::cuda:: routines in OpenCV for image-processing pipelines. The empirical nature of the study means its value rests entirely on the reproducibility and correctness of the reported timings.

major comments (1)
  1. The central claim requires that the CPU and GPU implementations compute functionally equivalent results; otherwise the reported speed-ups compare unlike operations. The manuscript provides no description of any output-equivalence verification (pixel-wise difference, L2 norm, or visual inspection) between the cv:: and cv::cuda:: results for any of the tested operations. This verification is load-bearing for the performance comparison and is absent from the reported methodology.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need to verify functional equivalence between CPU and GPU results. We address the single major comment below.

Point-by-point responses
  1. Referee: The central claim requires that the CPU and GPU implementations compute functionally equivalent results; otherwise the reported speed-ups compare unlike operations. The manuscript provides no description of any output-equivalence verification (pixel-wise difference, L2 norm, or visual inspection) between the cv:: and cv::cuda:: results for any of the tested operations. This verification is load-bearing for the performance comparison and is absent from the reported methodology.

    Authors: We agree that the absence of documented equivalence verification weakens the central claim. The original manuscript does not describe any pixel-wise, norm-based, or visual checks between cv:: and cv::cuda:: outputs. In the revised version we will add a dedicated methodology subsection that reports the verification procedure and results (maximum absolute difference and L2 norm) for every operation and image size, confirming that the compared functions produce numerically equivalent results within floating-point tolerance. revision: yes
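The equivalence check discussed in this exchange (maximum absolute difference and L2 norm between the CPU and GPU outputs) might look roughly like the sketch below. It is an illustration only: the names `compare_outputs` and `outputs_equivalent`, the nested-list image representation, and the default tolerance of 1e-5 are assumptions, not values or procedures taken from the paper.

```python
import math
from typing import Sequence, Tuple

Image = Sequence[Sequence[float]]


def compare_outputs(cpu: Image, gpu: Image) -> Tuple[float, float]:
    """Return (maximum absolute difference, L2 norm of the difference)."""
    if len(cpu) != len(gpu) or any(len(a) != len(b) for a, b in zip(cpu, gpu)):
        raise ValueError("outputs must have identical shapes")
    # Flatten the element-wise differences of the two images.
    diffs = [a - b for row_a, row_b in zip(cpu, gpu) for a, b in zip(row_a, row_b)]
    max_abs = max((abs(d) for d in diffs), default=0.0)
    l2 = math.sqrt(sum(d * d for d in diffs))
    return max_abs, l2


def outputs_equivalent(cpu: Image, gpu: Image, tolerance: float = 1e-5) -> bool:
    """Treat the outputs as equivalent if no element differs by more than tolerance.

    The tolerance is an assumed placeholder; a real study would pick it per
    operation and data type.
    """
    max_abs, _ = compare_outputs(cpu, gpu)
    return max_abs <= tolerance
```

In practice the GPU result would first be copied back to host memory, and integer-valued operations such as thresholding could reasonably be checked with a tolerance of zero.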

Circularity Check

0 steps flagged

No circularity: empirical benchmark with no derivations or self-referential claims

Full rationale

The paper reports direct wall-clock timing measurements of OpenCV CPU (cv::) versus GPU (cv::cuda::) implementations for standard image-processing kernels on fixed test images. No equations, fitted parameters, uniqueness theorems, or ansatzes appear. The central claim (GPU faster due to parallelism) is an observed outcome of the benchmark, not a quantity derived from or fitted to itself. No self-citations are load-bearing; the work is self-contained against external timing data.
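A wall-clock benchmark of the kind described here can be sketched as follows. This is a generic harness under stated assumptions, not the paper's measurement code: the names `time_operation` and `speedup`, the warm-up runs, the repeat count, and the use of the median are all illustrative choices.

```python
import statistics
import time
from typing import Callable, List


def time_operation(operation: Callable[[], object], repeats: int = 10, warmup: int = 2) -> float:
    """Return the median wall-clock time of `operation` in milliseconds."""
    # Warm-up runs absorb one-off costs such as lazy initialisation.
    for _ in range(warmup):
        operation()
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(cpu_ms: float, gpu_ms: float) -> float:
    """Ratio of CPU time to GPU time; values above 1 favour the GPU."""
    return cpu_ms / gpu_ms
```

When the timed callable launches GPU work, it should also wait for that work to finish (for example by copying the result back to the host) so that asynchronous execution does not make the GPU path look faster than it is. Whether host-to-device transfers are counted inside the timed region is a separate choice that can change the reported speed-up.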

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard assumptions about hardware performance differences but introduces no new parameters, axioms, or entities.

pith-pipeline@v0.9.0 · 5665 in / 878 out tokens · 34314 ms · 2026-05-25T19:02:04.896666+00:00 · methodology


Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [2] Alan V. Oppenheim, Alan S. Willsky, S. Hamid Nawab, Signals and Systems (2nd Edition), Pearson, 1996, Pages 1-4

  2. [3] Steven W. Smith, A Scientist and Engineer's Guide to DSP, 1997, Pages 1-2

  3. [4] Lin-Ching Chang, Esam El-Araby, Vinh Q. Dang, Lam H. Dao, GPU acceleration of nonlinear diffusion tensor estimation using CUDA and MPI, Neurocomputing, Volume 135, 5 July 2014, Pages 328–338

  4. [5] Diptarup Saha, Karan Darji, Narendra Patel, Darshak Thakore, Implementation of Image Enhancement Algorithms and Recursive Ray Tracing using CUDA, 7th International Conference on Communication, Computing and Virtualization, 2016

  5. [6] Wu Xin, Zhang Jian-qi, Huang Xi, Liu De-lian, Separable convolution template (SCT) background prediction accelerated by CUDA for infrared small target detection, Infrared Physics & Technology an International Research Journal, 2013

  6. [7] In Kyu Park, Nitin Singhal, Man Hee Lee, Sungdae Cho and Chris W. Kim, Design and Performance Evaluation of Image Processing Algorithms on GPUs, IEEE Transactions on Parallel and Distributed Systems, Vol. 22, No. 1, January 2011. (Running page header captured mid-entry in the extraction: International Journal of Engineering Science and Application, Hangun and Eyecioglu, Vol. 1, No. 2, 2017, p. 41.)

  7. [8] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing (3rd Edition), Pearson, 2009, Pages 1-5 (Book)

  8. [9] Nobuyuki Otsu, A Threshold Selection Method from Gray-Level Histograms, IEEE Transactions on Systems, Man, and Cybernetics (Volume: 9, Issue: 1, Jan. 1979)

  9. [10] John Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: PAMI-8, Issue: 6, Nov. 1986)

  10. [12] Gary Bradski, Adrian Kaehler, Learning OpenCV 3 (1st Edition), O'Reilly, 2016, Pages 1-3

  11. [13] Gary Bradski, Adrian Kaehler, Learning OpenCV 3 (1st Edition), O'Reilly, 2016, Pages 9-10

  12. [14] Nvidia CUDA Home Page

  13. [15] Richard Forster, Agnes Fülöp, Yang-Mills Lattice on CUDA, The Journal of "Sapientia" Hungarian University of Transylvania (Volume 5, Issue: 2, Dec. 2013)

  14. [16] Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Kevin Skadron, A performance study of general-purpose applications on graphics processors using CUDA, Journal of Parallel and Distributed Computing, 2008

  15. [17] Antonella Galizia, Daniele D’Agostino, Andrea Clematis, An MPI–CUDA library for image processing on HPC architectures, Journal of Computational and Applied Mathematics, 2014

  16. [18] Zhiyi Yang, Yating Zhu, Yong Pu, Parallel Image Processing Based on CUDA, International Conference on Computer Science and Software Engineering