Toward a Standard Interface for User-Defined Scheduling in OpenMP

Christian Iwainsky; Florina M. Ciorba; Jonas H. Muller Korndorfer; Michael Klemm; Vivek Kale

arXiv: 1906.08911 · v2 · pith:GEBV3Y5Inew · submitted 2019-06-21 · cs.DC · cs.CL · cs.PF · cs.PL

Pith reviewed 2026-05-25 19:03 UTC · model grok-4.3

classification: cs.DC · cs.CL · cs.PF · cs.PL

keywords: OpenMP, user-defined scheduling, parallel loops, loop scheduling, standard interface, C, C++, Fortran


The pith

OpenMP should add an interface letting users define custom strategies for scheduling parallel loops.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The three built-in scheduling choices in OpenMP are insufficient for some parallel programs, and the number of other possible strategies is too large to standardize each one. The paper identifies the core components any user-defined scheduler would need and offers two alternative interface designs that could be added to OpenMP. These designs aim to let programmers supply application-specific schedulers while remaining fully standard-compliant. The interfaces are compared conceptually with respect to the three host languages OpenMP supports: C, C++, and Fortran.

Core claim

The principal components required by user-defined scheduling are analyzed and two competing interfaces are proposed as candidates for the OpenMP standard to enable standard-compliant application-specific scheduling.

What carries the argument

Two competing interfaces for user-defined scheduling that capture the principal components needed for custom loop schedulers.

If this is right

Programmers could supply application-specific loop schedulers while staying inside the OpenMP standard.
Performance gains become possible in cases where the three built-in options are inadequate.
The community gains concrete designs to discuss and prototype across C, C++, and Fortran.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adoption would shift focus from adding fixed strategies to providing a general extension point.
Further work could test whether the interfaces support schedulers from outside OpenMP runtimes.
The approach avoids the need to standardize every conceivable scheduling method individually.

Load-bearing premise

The two proposed interfaces capture the needs of real user-defined schedulers and can be implemented without breaking existing OpenMP semantics or performance in C, C++, and Fortran.

What would settle it

A working prototype of either interface that cannot support common custom scheduling behaviors or that changes the behavior or speed of existing OpenMP programs would show the designs fall short.

Figures

Figures reproduced from arXiv: 1906.08911 by Christian Iwainsky, Florina M. Ciorba, Jonas H. Muller Korndorfer, Michael Klemm, Vivek Kale.

**Figure 1.** Basic loop scheduler code structure. Surrounding text from Sec. 3 (Support for User-defined Scheduling Strategies): Let us consider what is needed to specify an arbitrary scheduling strategy for a parallel loop. The strategy can use a combination of shared data structures, a collection of low-overhead steal work queues, exclusive queues meant for each core, or shared queues from which multiple threads can dequeue tasks each representi…
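The figure itself is not reproduced on this page. As a rough illustration of the kind of structure the caption describes (set up shared state, repeatedly dequeue a chunk, stop when the loop is exhausted), here is a minimal C sketch. It assumes a single shared cursor instead of the queues mentioned above, and every name in it (`loop_sched`, `sched_init`, `sched_next`, `run_chunks`) is hypothetical, not taken from the paper or from any OpenMP runtime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared scheduler state: one atomic cursor over [begin, end). */
typedef struct {
    atomic_long next;  /* first iteration not yet handed out */
    long end;          /* one past the last iteration */
    long chunk;        /* fixed chunk size */
} loop_sched;

/* Init step: executed once before any thread asks for work. */
void sched_init(loop_sched *s, long begin, long end, long chunk) {
    assert(chunk > 0 && begin <= end);
    atomic_init(&s->next, begin);
    s->end = end;
    s->chunk = chunk;
}

/* Dequeue step: claim the next chunk [*lo, *hi).
 * Returns false once all iterations have been handed out. */
bool sched_next(loop_sched *s, long *lo, long *hi) {
    long start = atomic_fetch_add(&s->next, s->chunk);
    if (start >= s->end)
        return false;
    *lo = start;
    *hi = (start + s->chunk < s->end) ? start + s->chunk : s->end;
    return true;
}

/* Per-thread driver: keep dequeuing chunks and run the loop body on each.
 * The body here just sums array elements. */
long run_chunks(loop_sched *s, const long *a) {
    long lo, hi, sum = 0;
    while (sched_next(s, &lo, &hi))
        for (long i = lo; i < hi; ++i)
            sum += a[i];
    return sum;
}
```

In a threaded setting each thread would call `run_chunks` on the same `loop_sched` and the partial sums would be combined afterwards. The atomic fetch-and-add is what keeps two threads from claiming the same chunk.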

**Figure 2.** Naive example for implementing the OpenMP static scheduling clause using both proposed UDS strategies. The left side presents the implementation following the lambda-style specification (Sec. 4.1), while the right side follows the declare-directives style (Sec. 4.2). A potential solution would allow the use of the lambda-style syntax for C++, and the UDR-style for C and Fortran codes. Our suggested UDS approach …
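The proposed lambda-style and declare-directive syntax is not shown on this page, so the following does not attempt to reproduce it. It only sketches, in plain C, the arithmetic a naive static schedule has to perform: splitting `n` iterations into one contiguous block per thread. The function name `static_block` is hypothetical, and the way the remainder is distributed is one possible choice; implementations may divide the iterations differently.

```c
#include <assert.h>

/* Block partition of n iterations over nthreads threads.
 * Thread tid receives the half-open range [*lo, *hi).
 * The first (n % nthreads) threads get one extra iteration each. */
void static_block(long n, int nthreads, int tid, long *lo, long *hi) {
    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    long base  = n / nthreads;   /* iterations every thread gets */
    long extra = n % nthreads;   /* leftover iterations to spread */
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}
```

For example, 10 iterations on 4 threads yield the ranges [0,3), [3,6), [6,8), and [8,10). A user-defined scheduler expressing a static schedule would return exactly one such range per thread and then report that no work remains.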

Original abstract

Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, it is infeasible to standardize each one. A more viable approach is to extend the OpenMP standard to allow for users to define loop scheduling strategies. The approach will enable standard-compliant application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces serve the OpenMP community as a basis for discussion and prototype implementation for user-defined scheduling.
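For readers less familiar with the existing options, the sketch below shows how the three schedule kinds are commonly selected today with the `schedule` clause (`static`, `dynamic`, `guided`). The loop body `work` and the chunk size of 4 are placeholders chosen for illustration. Compiling with an OpenMP flag such as `-fopenmp` runs the loops in parallel; without it the pragmas are ignored and the loops run serially with the same result.

```c
#include <assert.h>

/* Hypothetical loop body: any per-iteration computation would do. */
static long work(long i) { return i % 7; }

/* Iterations are divided among threads up front. */
long sum_static(long n) {
    long s = 0;
    assert(n >= 0);
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (long i = 0; i < n; ++i) s += work(i);
    return s;
}

/* Threads request chunks of 4 iterations as they become idle. */
long sum_dynamic(long n) {
    long s = 0;
    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:s)
    for (long i = 0; i < n; ++i) s += work(i);
    return s;
}

/* Like dynamic, but chunk sizes shrink as the loop progresses. */
long sum_guided(long n) {
    long s = 0;
    assert(n >= 0);
    #pragma omp parallel for schedule(guided) reduction(+:s)
    for (long i = 0; i < n; ++i) s += work(i);
    return s;
}
```

All three functions compute the same sum; only the assignment of iterations to threads differs. A user-defined schedule would, in effect, add a way to select a scheduler other than these.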

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a concrete component breakdown and two interface sketches for user-defined OpenMP scheduling, but stays purely conceptual with no implementation or tests.

This paper identifies that OpenMP's three built-in schedulers fall short for some loops and proposes letting users supply their own. It breaks scheduling into components such as chunk selection, work distribution, and state management, then offers two interface designs and walks through how each would look in C, C++, and Fortran. That component list and the side-by-side language comparison are the actual new pieces; prior OpenMP papers had not laid out candidate interfaces at this level of detail. The work is grounded in the existing standard rather than inventing new abstractions from scratch, which makes the suggestions easier to evaluate against current semantics. The main limitation is the complete absence of any prototype, performance numbers, or even pseudocode that shows the interfaces can be realized without extra overhead or changes to existing programs. The authors correctly frame the output as a discussion starter, so the lack of validation is not hidden, but it does leave open whether the designs cover real scheduler needs or introduce hidden incompatibilities. Readers working on OpenMP runtime implementations or the language committee would find the component analysis useful as a checklist. A serious referee could help sharpen the interfaces and flag any language-specific pitfalls before any standardization effort begins. I would send it to review rather than desk-reject.
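To make the "chunk selection" component concrete, here is a small C sketch of one well-known rule, guided self-scheduling, in which each request receives the remaining iterations divided by the number of threads, rounded up. It is offered only as an example of the kind of policy a user-defined scheduler might encode. It is not taken from the paper's interfaces, and the names `gss_chunk` and `gss_count` are hypothetical.

```c
#include <assert.h>

/* Guided self-scheduling chunk size: ceil(remaining / nthreads). */
long gss_chunk(long remaining, int nthreads) {
    assert(remaining >= 0 && nthreads > 0);
    return (remaining + nthreads - 1) / nthreads;
}

/* Number of scheduling requests needed to hand out all n iterations. */
int gss_count(long n, int nthreads) {
    int count = 0;
    while (n > 0) {
        n -= gss_chunk(n, nthreads);
        ++count;
    }
    return count;
}
```

With 100 iterations and 4 threads the first chunks are 25, 19, and 14 iterations, and the chunks keep shrinking until single iterations remain. Large early chunks keep scheduling overhead low, while small late chunks help even out the finish times.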

Referee Report

0 major / 3 minor

Summary. The manuscript analyzes the principal components required by user-defined scheduling in OpenMP and proposes two competing interfaces as candidates for the OpenMP standard. It conceptually compares the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran, serving as a basis for discussion and prototype implementation.

Significance. If the proposed interfaces prove implementable and compatible upon prototyping, the work could meaningfully extend OpenMP's scheduling flexibility beyond the current three options, enabling application-specific strategies while remaining standard-compliant. The conceptual comparison across host languages is a useful contribution to the standardization process.

minor comments (3)

The abstract states that two interfaces are proposed but does not name or briefly characterize them (e.g., by key differences in callback style or language binding), which would improve immediate readability for readers scanning the contribution.
The manuscript would benefit from an explicit enumeration (perhaps in a table) of the principal components identified in the analysis, to make the mapping from analysis to the two interface designs more transparent.
A short discussion of how the proposed interfaces would interact with existing OpenMP loop-scheduling clauses (schedule, nowait, etc.) is missing and would strengthen the claim of non-breaking compatibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were listed in the report, so we have no specific points to address point-by-point. We will make any necessary minor revisions to the manuscript as appropriate.

Circularity Check

0 steps flagged

No significant circularity identified

The paper performs an analysis of existing OpenMP loop scheduling options and proposes two candidate interfaces for user-defined scheduling. It contains no equations, derivations, fitted parameters, or predictions. The central output is framed explicitly as 'a basis for discussion and prototype implementation' rather than a derived result. No self-citations, uniqueness theorems, or ansatzes are load-bearing. The work is self-contained against external benchmarks (the current OpenMP specification) and does not reduce any claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal assumes that user-defined scheduling can be added to OpenMP without fundamental changes to its execution model and that the identified principal components are sufficient. No free parameters or invented entities are introduced.

axioms (2)

domain assumption: OpenMP's current three scheduling options are insufficient for certain applications and adding all possible strategies to the standard is infeasible.
Basis: stated in the abstract as the motivation for user-defined scheduling.
domain assumption: The proposed interfaces can be realized in a standard-compliant way for C, C++, and Fortran.
Basis: implicit in the claim that the interfaces serve as candidates for the OpenMP standard.

pith-pipeline@v0.9.0 · 5693 in / 1277 out tokens · 17522 ms · 2026-05-25T19:03:48.683616+00:00 · methodology


