Toward a Standard Interface for User-Defined Scheduling in OpenMP
Pith reviewed 2026-05-25 19:03 UTC · model grok-4.3
The pith
OpenMP should add an interface letting users define custom strategies for scheduling parallel loops.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The principal components required by user-defined scheduling are analyzed and two competing interfaces are proposed as candidates for the OpenMP standard to enable standard-compliant application-specific scheduling.
What carries the argument
Two competing interfaces for user-defined scheduling that capture the principal components needed for custom loop schedulers.
If this is right
- Programmers could supply application-specific loop schedulers while staying inside the OpenMP standard.
- Performance gains become possible in cases where the three built-in schedules (static, dynamic, and guided) are inadequate.
- The community gains concrete designs to discuss and prototype across C, C++, and Fortran.
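To make "application-specific loop scheduler" concrete, here is a minimal sketch in C of a scheduler written by hand today, outside any standard interface. The names `uds_state`, `uds_init`, `uds_next_chunk`, and `uds_sum` are hypothetical and are not taken from the paper's two proposals. The chunk-size rule (half of an even share of the remaining iterations, at least 1) is an illustrative choice, loosely in the spirit of decreasing-chunk self-scheduling. If the code is compiled without OpenMP support, the pragmas are ignored and it runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduler state: a shared cursor over the iteration space. */
typedef struct {
    int next;      /* first iteration not yet handed out */
    int end;       /* one past the last iteration */
    int nworkers;  /* number of threads assumed to share the loop */
} uds_state;

/* Set up the iteration space [begin, end) before the loop starts. */
void uds_init(uds_state *s, int begin, int end, int nworkers) {
    assert(s != NULL && begin <= end && nworkers > 0);
    s->next = begin;
    s->end = end;
    s->nworkers = nworkers;
}

/* Hand out the next chunk [*lo, *hi). Returns 0 once no iterations remain.
   The critical section serializes access to the shared cursor. */
int uds_next_chunk(uds_state *s, int *lo, int *hi) {
    int got = 0;
    #pragma omp critical(uds_dequeue)
    {
        int remaining = s->end - s->next;
        if (remaining > 0) {
            /* Illustrative rule: chunks shrink as the loop drains. */
            int chunk = remaining / (2 * s->nworkers);
            if (chunk < 1) chunk = 1;
            *lo = s->next;
            *hi = s->next + chunk;
            s->next = *hi;
            got = 1;
        }
    }
    return got;
}

/* Sum work[0..n-1]; each thread repeatedly asks the scheduler for a chunk. */
long uds_sum(const int *work, int n, int nworkers) {
    uds_state s;
    long total = 0;
    uds_init(&s, 0, n, nworkers);
    #pragma omp parallel num_threads(nworkers) reduction(+:total)
    {
        int lo, hi;
        while (uds_next_chunk(&s, &lo, &hi)) {
            for (int i = lo; i < hi; i++) total += work[i];
        }
    }
    return total;
}
```

The sketch has to replace the `for` worksharing construct with a manual `while` loop, which is the kind of workaround a standard user-defined scheduling interface would aim to make unnecessary.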
Where Pith is reading between the lines
- Adoption would shift focus from adding fixed strategies to providing a general extension point.
- Further work could test whether the interfaces support schedulers from outside OpenMP runtimes.
- The approach avoids the need to standardize every conceivable scheduling method individually.
Load-bearing premise
The two proposed interfaces capture the needs of real user-defined schedulers and can be implemented without breaking existing OpenMP semantics or performance in C, C++, and Fortran.
What would settle it
A working prototype of either interface that cannot support common custom scheduling behaviors or that changes the behavior or speed of existing OpenMP programs would show the designs fall short.
Original abstract
Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, it is infeasible to standardize each one. A more viable approach is to extend the OpenMP standard to allow for users to define loop scheduling strategies. The approach will enable standard-compliant application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces serve the OpenMP community as a basis for discussion and prototype implementation for user-defined scheduling.
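For reference, the three built-in options the abstract refers to are presumably the `static`, `dynamic`, and `guided` kinds of the existing `schedule` clause. The sketch below applies each to the same reduction loop. The function names and the chunk size of 4 are illustrative assumptions, not taken from the paper. Without OpenMP support enabled at compile time, the pragmas are ignored and the loops run serially.

```c
#include <assert.h>
#include <stddef.h>

/* static: iterations are divided among threads up front. */
long sum_static(const int *work, int n) {
    assert(n == 0 || work != NULL);
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; i++) total += work[i];
    return total;
}

/* dynamic: threads request fixed-size chunks (here 4 iterations) as they finish. */
long sum_dynamic(const int *work, int n) {
    assert(n == 0 || work != NULL);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) total += work[i];
    return total;
}

/* guided: threads request chunks whose size decreases as work runs out. */
long sum_guided(const int *work, int n) {
    assert(n == 0 || work != NULL);
    long total = 0;
    #pragma omp parallel for schedule(guided) reduction(+:total)
    for (int i = 0; i < n; i++) total += work[i];
    return total;
}
```

All three variants compute the same result. Only the assignment of iterations to threads differs, and that assignment is the decision a user-defined schedule would take over.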
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the principal components required by user-defined scheduling in OpenMP and proposes two competing interfaces as candidates for the OpenMP standard. It conceptually compares the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran, serving as a basis for discussion and prototype implementation.
Significance. If the proposed interfaces prove implementable and compatible upon prototyping, the work could meaningfully extend OpenMP's scheduling flexibility beyond the current three options, enabling application-specific strategies while remaining standard-compliant. The conceptual comparison across host languages is a useful contribution to the standardization process.
minor comments (3)
- The abstract states that two interfaces are proposed but does not name or briefly characterize them (e.g., by key differences in callback style or language binding), which would improve immediate readability for readers scanning the contribution.
- The manuscript would benefit from an explicit enumeration (perhaps in a table) of the principal components identified in the analysis, to make the mapping from analysis to the two interface designs more transparent.
- A short discussion of how the proposed interfaces would interact with existing OpenMP loop-scheduling clauses (schedule, nowait, etc.) is missing and would strengthen the claim of non-breaking compatibility.
Simulated Author's Rebuttal
We thank the referee for the positive summary, the significance assessment, and the recommendation of minor revision. The report lists no major comments, so there is nothing to address point by point. We will make the necessary minor revisions to the manuscript.
Circularity Check
No significant circularity identified
The paper performs an analysis of existing OpenMP loop scheduling options and proposes two candidate interfaces for user-defined scheduling. It contains no equations, derivations, fitted parameters, or predictions. The central output is framed explicitly as 'a basis for discussion and prototype implementation' rather than a derived result. No self-citations, uniqueness theorems, or ansatzes are load-bearing. The work is self-contained against external benchmarks (the current OpenMP specification) and does not reduce any claim to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: OpenMP's current three scheduling options are insufficient for certain applications and adding all possible strategies to the standard is infeasible.
- domain assumption: The proposed interfaces can be realized in a standard-compliant way for C, C++, and Fortran.