
arXiv: 1906.08911 · v2 · pith:GEBV3Y5Inew · submitted 2019-06-21 · 💻 cs.DC · cs.CL · cs.PF · cs.PL

Toward a Standard Interface for User-Defined Scheduling in OpenMP

Pith reviewed 2026-05-25 19:03 UTC · model grok-4.3

classification 💻 cs.DC · cs.CL · cs.PF · cs.PL
keywords OpenMP · user-defined scheduling · parallel loops · loop scheduling · standard interface · C · C++ · Fortran

The pith

OpenMP should add an interface letting users define custom strategies for scheduling parallel loops.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The three built-in loop-scheduling options in OpenMP are insufficient for some parallel programs, and the number of alternative strategies is too large to standardize each one. The paper identifies the core components any user-defined scheduler would need and offers two alternative interface designs that could be added to OpenMP. These designs aim to let programmers supply application-specific schedulers while remaining fully standard-compliant. The interfaces are compared conceptually with respect to the three host languages OpenMP supports: C, C++, and Fortran.
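For orientation, the three built-in options are presumably the `static`, `dynamic`, and `guided` kinds of the OpenMP `schedule` clause; that reading is an assumption, since the page does not name them. The sketch below runs the same reduction under each kind. The function names and the toy `work` function are illustrative only and do not come from the paper.

```cpp
#include <cassert>

// Toy per-iteration work. The cost is uniform, so the schedule kind changes
// only how iterations are handed to threads, never the result.
static long work(int i) { return (static_cast<long>(i) * i) % 7; }

// static: iterations are divided into blocks assigned to threads up front.
long sum_static(int n) {
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i) total += work(i);
    return total;
}

// dynamic: threads request fixed-size chunks (here 4 iterations) as they go.
long sum_dynamic(int n) {
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; ++i) total += work(i);
    return total;
}

// guided: like dynamic, but chunk sizes shrink as the loop progresses.
long sum_guided(int n) {
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(guided) reduction(+:total)
    for (int i = 0; i < n; ++i) total += work(i);
    return total;
}
```

If the compiler is not invoked with OpenMP enabled, the pragmas are ignored and the loops run serially with the same results. Only the distribution of iterations differs between the three functions, and that distribution is the part a user-defined scheduler would take over.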

Core claim

The principal components required by user-defined scheduling are analyzed and two competing interfaces are proposed as candidates for the OpenMP standard to enable standard-compliant application-specific scheduling.

What carries the argument

Two competing interfaces for user-defined scheduling that capture the principal components needed for custom loop schedulers.

If this is right

  • Programmers could supply application-specific loop schedulers while staying inside the OpenMP standard.
  • Performance gains become possible in cases where the three built-in options are inadequate.
  • The community gains concrete designs to discuss and prototype across C, C++, and Fortran.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption would shift focus from adding fixed strategies to providing a general extension point.
  • Further work could test whether the interfaces support schedulers from outside OpenMP runtimes.
  • The approach avoids the need to standardize every conceivable scheduling method individually.

Load-bearing premise

The two proposed interfaces capture the needs of real user-defined schedulers and can be implemented without breaking existing OpenMP semantics or performance in C, C++, and Fortran.

What would settle it

A working prototype of either interface that cannot support common custom scheduling behaviors or that changes the behavior or speed of existing OpenMP programs would show the designs fall short.

Figures

Figures reproduced from arXiv: 1906.08911 by Christian Iwainsky, Florina M. Ciorba, Jonas H. Müller Korndörfer, Michael Klemm, Vivek Kale.

Figure 1: Basic loop scheduler code structure. Adjacent text from Sec. 3, "Support for User-defined Scheduling Strategies": Let us consider what is needed to specify an arbitrary scheduling strategy for a parallel loop. The strategy can use a combination of shared data structures, a collection of low-overhead steal work queues, exclusive queues meant for each core, or shared queues from which multiple threads can dequeue tasks each representi…
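The figure itself is not reproduced here, so the following is only a rough sketch of one ingredient the excerpt mentions: a shared queue from which multiple threads dequeue work. It assumes the queue can be modelled as a single atomic cursor over the iteration space. `Chunk`, `SharedQueueScheduler`, and `run_loop` are hypothetical names, not the paper's interface, and the team of threads is simulated sequentially to keep the example deterministic.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

struct Chunk { int begin; int end; };  // half-open range [begin, end)

// Shared "queue" of loop iterations, represented by one atomic cursor.
class SharedQueueScheduler {
public:
    SharedQueueScheduler(int n, int chunk_size)
        : n_(n), chunk_(chunk_size), next_(0) {
        assert(n >= 0 && chunk_size > 0);
    }

    // Dequeue the next chunk; returns false once the iterations are exhausted.
    bool dequeue(Chunk& out) {
        int begin = next_.fetch_add(chunk_);
        if (begin >= n_) return false;
        out = {begin, std::min(begin + chunk_, n_)};
        return true;
    }

private:
    int n_;
    int chunk_;
    std::atomic<int> next_;
};

// Sequential stand-in for a thread team: simulated threads take turns
// dequeuing a chunk and running the loop body on it.
// Returns how many times each iteration was executed.
std::vector<int> run_loop(int n, int chunk_size, int num_threads) {
    std::vector<int> visits(n, 0);
    SharedQueueScheduler sched(n, chunk_size);  // set up shared scheduler state
    std::vector<bool> active(num_threads, true);
    int remaining = num_threads;
    while (remaining > 0) {
        for (int t = 0; t < num_threads; ++t) {
            if (!active[t]) continue;
            Chunk c;
            if (!sched.dequeue(c)) {  // this simulated thread is done
                active[t] = false;
                --remaining;
                continue;
            }
            for (int i = c.begin; i < c.end; ++i) ++visits[i];  // loop body
        }
    }
    return visits;
}
```

For example, `run_loop(10, 3, 4)` visits each of the ten iterations exactly once. In a real runtime the inner `for` over `t` would be replaced by actual threads calling `dequeue` concurrently, which the atomic cursor is meant to permit.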
Figure 2: Naive example for implementing the OpenMP static scheduling clause using both proposed UDS strategies. The left side presents the implementation following the lambda-style specification (Sec. 4.1), while the right side follows the declare-directives style (Sec. 4.2). A potential solution would allow the use of the lambda-style syntax for C++, and the UDR-style for C and Fortran codes. Our suggested UDS approach …
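The paper's actual syntax is not shown on this page, so the sketch below does not reproduce either proposed interface. It only illustrates the general idea of expressing a static schedule through user-supplied callables in plain C++. The driver `run_user_schedule`, the callback roles (setup, chunk request, teardown), and every other name are assumptions made for illustration, and the threads are again simulated one after another.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Range { int begin; int end; };  // half-open range [begin, end)

// Hypothetical callback roles: scheduler setup, per-thread chunk request,
// and teardown.
using InitFn = std::function<void(int n, int num_threads)>;
using NextFn = std::function<bool(int tid, Range& out)>;
using FiniFn = std::function<void()>;

// Hypothetical driver. Returns, for each iteration, the id of the simulated
// thread that executed it.
std::vector<int> run_user_schedule(int n, int num_threads,
                                   InitFn init, NextFn next, FiniFn fini) {
    std::vector<int> owner(n, -1);
    init(n, num_threads);
    for (int tid = 0; tid < num_threads; ++tid) {  // simulated threads
        Range r;
        while (next(tid, r)) {
            for (int i = r.begin; i < r.end; ++i) {
                assert(owner[i] == -1);  // no iteration may run twice
                owner[i] = tid;          // stand-in for the loop body
            }
        }
    }
    fini();
    return owner;
}

// A naive static schedule: each thread receives one contiguous block of
// roughly n / num_threads iterations, and nothing afterwards.
std::vector<int> static_schedule_demo(int n, int num_threads) {
    assert(n >= 0 && num_threads > 0);
    int total = 0;
    int team = 0;
    std::vector<bool> served;  // has thread tid already received its block?

    auto init = [&](int n_, int t_) {
        total = n_;
        team = t_;
        served.assign(t_, false);
    };
    auto next = [&](int tid, Range& out) -> bool {
        if (served[tid]) return false;
        served[tid] = true;
        long long lo = static_cast<long long>(total) * tid / team;
        long long hi = static_cast<long long>(total) * (tid + 1) / team;
        out = {static_cast<int>(lo), static_cast<int>(hi)};
        return lo < hi;  // an empty block means no work for this thread
    };
    auto fini = [&]() { served.clear(); };

    return run_user_schedule(n, num_threads, init, next, fini);
}
```

With 10 iterations and 4 simulated threads, the blocks come out as [0,2), [2,5), [5,7), and [7,10). The exact block boundaries of a real OpenMP `schedule(static)` may differ between implementations; the point of the sketch is only that the chunking policy lives entirely in the three callables.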
Original abstract

Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, it is infeasible to standardize each one. A more viable approach is to extend the OpenMP standard to allow for users to define loop scheduling strategies. The approach will enable standard-compliant application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces serve the OpenMP community as a basis for discussion and prototype implementation for user-defined scheduling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript analyzes the principal components required by user-defined scheduling in OpenMP and proposes two competing interfaces as candidates for the OpenMP standard. It conceptually compares the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran, serving as a basis for discussion and prototype implementation.

Significance. If the proposed interfaces prove implementable and compatible upon prototyping, the work could meaningfully extend OpenMP's scheduling flexibility beyond the current three options, enabling application-specific strategies while remaining standard-compliant. The conceptual comparison across host languages is a useful contribution to the standardization process.

minor comments (3)
  1. The abstract states that two interfaces are proposed but does not name or briefly characterize them (e.g., by key differences in callback style or language binding), which would improve immediate readability for readers scanning the contribution.
  2. The manuscript would benefit from an explicit enumeration (perhaps in a table) of the principal components identified in the analysis, to make the mapping from analysis to the two interface designs more transparent.
  3. A short discussion of how the proposed interfaces would interact with existing OpenMP loop-scheduling clauses (schedule, nowait, etc.) is missing and would strengthen the claim of non-breaking compatibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment. The report lists no major comments, so there is nothing to address point by point. We will make minor revisions to the manuscript as appropriate.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper performs an analysis of existing OpenMP loop scheduling options and proposes two candidate interfaces for user-defined scheduling. It contains no equations, derivations, fitted parameters, or predictions. The central output is framed explicitly as 'a basis for discussion and prototype implementation' rather than a derived result. No self-citations, uniqueness theorems, or ansatzes are load-bearing. The work is self-contained against external benchmarks (the current OpenMP specification) and does not reduce any claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal assumes that user-defined scheduling can be added to OpenMP without fundamental changes to its execution model and that the identified principal components are sufficient. No free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: OpenMP's current three scheduling options are insufficient for certain applications, and adding all possible strategies to the standard is infeasible.
    Stated in the abstract as the motivation for user-defined scheduling.
  • domain assumption: The proposed interfaces can be realized in a standard-compliant way for C, C++, and Fortran.
    Implicit in the claim that the interfaces serve as candidates for the OpenMP standard.

