Reduced I/O Latency with Futures

I-Ting Angelina Lee; Kunal Agrawal; Kyle Singer

arxiv: 1906.08239 · v1 · pith:ZMUMTZZ5new · submitted 2019-06-19 · 💻 cs.DC

Reduced I/O Latency with Futures

Kyle Singer , Kunal Agrawal , I-Ting Angelina Lee This is my paper

Pith reviewed 2026-05-25 19:57 UTC · model grok-4.3

classification 💻 cs.DC

keywords task parallelismfuturesI/O latencyscheduling algorithmsruntime systemsCilkinteractive applications

0 comments

The pith

Representing I/O operations as futures lets schedulers hide their latency and prove better execution time guarantees than prior approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a scheduling method that represents pending I/O operations as futures in a task-parallel program. Once an I/O completes, the future becomes a ready task that the scheduler can execute without extra overhead. The authors prove that this approach yields stronger bounds on overall execution time than the only previous theoretical result for such interactive workloads. They also built a working implementation on the Cilk-F runtime and measured its performance on sample programs.

Core claim

By treating completed I/O futures as ordinary ready tasks, the scheduler can interleave computation and I/O without blocking, delivering execution time bounds superior to those of earlier latency-hiding techniques.

What carries the argument

The futures abstraction for I/O, which the scheduler treats identically to computation tasks once the I/O completes.

If this is right

The total execution time is bounded more tightly because idle time on processors is reduced.
Interactive applications can achieve higher throughput on parallel hardware without changing the programming model significantly.
Existing task-parallel runtimes can be extended with this technique to support I/O-bound workloads better.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar future-based representations could apply to other blocking operations such as network communication.
The approach might generalize to heterogeneous systems where some processors handle I/O differently.
Programmers could write code that mixes heavy computation and I/O without manual thread management.

Load-bearing premise

I/O operations can be represented as futures that incur no extra synchronization cost when they become ready.

What would settle it

A counter-example workload where the measured execution time with the new method exceeds the claimed bound or matches but does not improve on prior work.

Figures

Figures reproduced from arXiv: 1906.08239 by I-Ting Angelina Lee, Kunal Agrawal, Kyle Singer.

**Figure 1.** Figure 1: Distributed map and reduce pseudo code. immediately (e.g., the next package has not arrived on the network channel yet), however, we would like to put the request aside and process it later when the I/O device becomes ready (has more input to be consumed). In order to describe how the actual mechanism works, we need to briey discuss how I/O works on Linux. As mentioned earlier, any I/O device on Linux is … view at source ↗

**Figure 2.** Figure 2: Speedups of map-reduce running in Cilk-F, Cilk-F (O), and Cilk-L with latencies of (a) 500 ms, (b) 100 ms, (c) 50 ms, and (d) 1 ms (where ms is millisecond). The x-axis shows the core counts (P) and the y-axis shows the speedup compared to the one-core running time on Cilk-F. Cilk-F (O) oversubscribes the system by using 2P workers instead of P, for P number of cores (i.e. every core has two workers pinned… view at source ↗

**Figure 3.** Figure 3: The execution times, in seconds, of map-reduce using Cilk-L with dierent latencies (ms is milliseconds). The values in parentheses are overheads relative to the correspondingTP time of ideal, which runs the baseline of map-reduce on Cilk-F with (eectively) zero latency and incurs no system overhead for latency hiding. execution time of running the baseline version of map-reduce on Cilk-F. Similar to what… view at source ↗

**Figure 4.** Figure 4: The execution times, in seconds, of map-reduce with various configurations of Cilk-L with no I/O latency (i.e. reads do not block). The overheads are relative to the corresponding TP time of ideal, which uses Cilk-F without latency-hiding. Overhead in Latency-Hiding e use of IO futures in Cilk-L has some inherent overhead: 1) seing up and tear down of IO futures, 2) invoking the epoll mechanism, which ha… view at source ↗

read the original abstract

Task parallelism research has traditionally focused on optimizing computation-intensive applications. Due to the proliferation of commodity parallel processors, there has been recent interest in supporting interactive applications. Such interactive applications frequently rely on I/O operations that may incur significant latency. In order to increase performance, when a particular thread of control is blocked on an I/O operation, ideally we would like to hide this latency by using the processing resources to do other ready work instead of blocking or spin waiting on this I/O. There has been limited prior work on hiding this latency and only one result that provides a theoretical bound for interactive applications that use I/Os. In this work, we propose a method for hiding the latency of I/O operations by using the futures abstraction. We provide a theoretical analysis of our algorithm that shows our algorithm provides better execution time guarantees than prior work. We also implemented the algorithm in a practically efficient prototype library that runs on top of the Cilk-F runtime, a runtime system that supports futures within the context of the Cilk Plus language, and performed experiments that demonstrate the efficiency of our implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes a futures-based method to hide I/O latency in task-parallel programs by treating completed I/O operations as ordinary ready tasks. It claims a theoretical analysis establishing better execution-time guarantees than prior work, implements the approach as a prototype library atop the Cilk-F runtime (extending Cilk Plus), and reports experiments demonstrating practical efficiency.

Significance. If the claimed bounds hold under the stated task-parallel model, the work strengthens the theoretical foundation for latency hiding in interactive applications on parallel hardware. The Cilk-F prototype supplies an independent practical check, and the modeling choice of treating I/O futures as schedulable tasks is stated explicitly and used consistently for the comparison to prior work.

minor comments (2)

[Abstract] Abstract: the single prior theoretical result is referenced only generically; adding the citation would allow readers to locate the baseline bound being improved upon.
The description of the model (I/O futures becoming ordinary ready tasks with no extra synchronization) is clear in the abstract but should be restated with a short formal definition early in the methods section for precision.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report lists no major comments, so we have no specific points to address point-by-point. We will incorporate any minor editorial suggestions during revision.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper proposes a futures-based method for I/O latency hiding in task-parallel programs and claims a theoretical analysis yielding strictly better execution-time bounds than the single prior result on the topic. The model treats completed I/O futures as ordinary ready tasks; this assumption is stated explicitly in the abstract and forms the basis for the comparison. No equations, parameters, or uniqueness theorems are shown to reduce by construction to fitted inputs or to self-citations whose own justification is internal to the present work. The central bound is therefore independent of the implementation details and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields minimal ledger entries; the central claim rests on an unelaborated task-parallel execution model and the assumption that futures can be scheduled without additional model cost.

axioms (1)

domain assumption Task-parallel execution model in which pending I/O can be represented as futures that the scheduler treats equivalently to ordinary tasks
Invoked when the abstract claims the algorithm hides latency and provides better execution-time guarantees

pith-pipeline@v0.9.0 · 5713 in / 1183 out tokens · 25491 ms · 2026-05-25T19:57:27.050569+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1]

Linux Programmer’s Manual EPOLL(7)

2019a. Linux Programmer’s Manual EPOLL(7). h/t_tp://man7.org/linux/man-pages/man7/epoll.7.html. (2019). Accessed in January

work page 2019
[2]

Linux Programmer’s Manual EVENTFD(2)

2019b. Linux Programmer’s Manual EVENTFD(2). h/t_tp://man7.org/linux/man-pages/man2/eventfd.2.html. (2019). Accessed in January

work page 2019
[3]

Linux Programmer’s Manual TIMERFD _CREATE(2)

2019c. Linux Programmer’s Manual TIMERFD _CREATE(2). h/t_tp://man7.org/linux/man-pages/man2/timerfd_create.2.html. (2019). Accessed in January

work page 2019
[4]

Blelloch, and Robert Blumofe

Umut Acar, Guy E. Blelloch, and Robert Blumofe. 2002a. /T_he Data Locality of Work Stealing./T_heory of Computing Systems 35, 3 (2002). Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe

work page 2002
[5]

1–12. Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. 2002b. /T_he Data Locality of Work Stealing./T_heory Comput. Syst.35, 3 (2002), 321–347. Kunal Agrawal, Jeremy T. Fineman, Kefu Lu, Brendan Sheridan, Jim Sukha, and Robert U/t_terback

work page 2002
[6]

/T_heory of Computing Systems(2001), 115–144

/T_hread Scheduling for Multiprogrammed Multiprocessors. /T_heory of Computing Systems(2001), 115–144. Henry C. Baker, Jr. and Carl Hewi/t_t

work page 2001
[7]

/T_he incremental garbage collection of processes.SIGPLAN Notices 12, 8 (1977), 55–59. Rajkishore Barik, Zoran Budimli´c, Vincent Cav`e, Sanjay Cha/t_terjee, Yi Guo, David Peixo/t_to, Raghavan Raman, Jun Shirako, Sa˘gnak Tas ¸ırlar, Yonghong Yan, Yisheng Zhao, and Vivek Sarkar

work page 1977
[8]

JACM 46, 5 (1999), 720–748

Scheduling Multithreaded Computations by Work Stealing. JACM 46, 5 (1999), 720–748. Randal E. Bryant and David R. O’Hallaron

work page 1999
[9]

IEEE Computer 27, 8 (Aug

COOL: An Object-Based Language for Parallel Programming. IEEE Computer 27, 8 (Aug. 1994), 13–26. Philippe Charles, Christian Grothoﬀ, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar

work page 1994
[10]

Science of Computer Programming 63, 2 (Dec

Programming with exceptions in JCilk. Science of Computer Programming 63, 2 (Dec. 2008), 147–171. Ma/t_thew Fluet, Mike Rainey, John Reppy, and Adam Shaw

work page 2008
[11]

2010), 537–576

Implicitly /T_hreaded Parallelism in Manticore.Journal of Functional Programming 20, 5-6 (Nov. 2010), 537–576. h/t_tps://doi.org/10.1017/S0956796810000201 D.P. Friedman and D.S. Wise

work page doi:10.1017/s0956796810000201 2010
[12]

IEEE Trans

Aspects of Applicative Programming for Parallel Processing. IEEE Trans. Comput. C-27, 4 (1978), 289–296. Ma/t_teo Frigo, Charles E. Leiserson, and Keith H. Randall

work page 1978
[13]

ACM TOPLAS 7, 4 (Oct

Multilisp: A Language for Concurrent Symbolic Computation. ACM TOPLAS 7, 4 (Oct. 1985), 501–538. Shams Imam and Vivek Sarkar

work page 1985
[14]

In Proceedings of the 28th European Conference on ECOOP 2014 — Object-Oriented Programming - Volume

Cooperative Scheduling of Parallel Tasks with General Synchronization Pa/t_terns. In Proceedings of the 28th European Conference on ECOOP 2014 — Object-Oriented Programming - Volume

work page 2014
[15]

h/t_tps://doi.org/10.1007/978-3-662-44202-9_25 Intel

Springer-Verlag New York, Inc., New York, NY, USA, 618–643. h/t_tps://doi.org/10.1007/978-3-662-44202-9_25 Intel

work page doi:10.1007/978-3-662-44202-9_25
[16]

h/t_tps://www.cilkplus.org

Intel ® Cilk™ Plus. h/t_tps://www.cilkplus.org. (2013). 18 Intel Corporation

work page 2013
[17]

InProceedings of the 2014 ACM Symposium on Principles of Distributed Computing (PODC ’14)

/T_he Future(s) of Shared Data Structures. InProceedings of the 2014 ACM Symposium on Principles of Distributed Computing (PODC ’14) . ACM, Paris, France, 30–39. h/t_tp://doi.acm.org/10.1145/2611462.2611496 David A. Kranz, Robert H. Halstead, Jr., and Eric Mohr

work page doi:10.1145/2611462.2611496 2014
[18]

In ACM 2000 Conference on Java Grande

A Java fork/join framework. In ACM 2000 Conference on Java Grande . 36–43. I-Ting Angelina Lee, Silas Boyd-Wickizer, Zhiyi Huang, and Charles E. Leiserson

work page 2000
[19]

Supercomputing 51, 3 (2010), 244–257

/T_he Cilk++ Concurrency Platform.J. Supercomputing 51, 3 (2010), 244–257. Li Lu, Weixing Ji, and Michael L. Sco/t_t

work page 2010
[20]

In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19)

Proactive Work Stealing for Futures. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19) . ACM, New York, NY, USA, 257–271. h/t_tps: //doi.org/10.1145/3293883.3295735 Daniel Spoonhower, Guy E. Blelloch, Phillip B. Gibbons, and Robert Harper

work page doi:10.1145/3293883.3295735
[21]

In Proceedings of the Twenty-/f_irst Annual Symposium on Parallelism in Algorithms and Architectures (SPAA ’09)

Beyond Nested Parallelism: Tight Bounds on Work-stealing Overheads for Parallel Futures. In Proceedings of the Twenty-/f_irst Annual Symposium on Parallelism in Algorithms and Architectures (SPAA ’09). ACM, Calgary, AB, Canada, 91–100. h/t_tps://doi.org/10.1145/1583991.1584019 Daniel Spoonhower, Guy E. Blelloch, Robert Harper, and Phillip B. Gibbons

work page doi:10.1145/1583991.1584019
[22]

In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016)

Automatic Parallelization of Pure Method Calls via Conditional Future Synthesis. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). ACM, New York, NY, USA, 20–38. h/t_tps://doi.org/10.1145/2983990.2984035 Olivier Tardieu, Haichuan Wang, and Haibo Lin

work page doi:10.1145/2983990.2984035 2016
[23]

InProceedings of the 2011 International Conference on Parallel Processing (ICPP ’11)

Data-Driven Tasks and /T_heir Implementation. InProceedings of the 2011 International Conference on Parallel Processing (ICPP ’11) . IEEE Computer Society, Taipei City, Taiwan, 652–661. Robert U/t_terback, Kunal Agrawal, I-Ting Angelina Lee, and Milind Kulkarni

work page 2011
[24]

In Proceedings of the 22Nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’17)

Processor-Oblivious Record and Replay. In Proceedings of the 22Nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’17) . ACM, Austin, Texas, USA, 145–161. h/t_tps://doi.org/10.1145/3018743.3018764 Christopher S. Zakian, Timothy A. Zakian, Abhishek Kulkarni, Buddhika Chamith, and Ryan R. Newton

work page doi:10.1145/3018743.3018764

[1] [1]

Linux Programmer’s Manual EPOLL(7)

2019a. Linux Programmer’s Manual EPOLL(7). h/t_tp://man7.org/linux/man-pages/man7/epoll.7.html. (2019). Accessed in January

work page 2019

[2] [2]

Linux Programmer’s Manual EVENTFD(2)

2019b. Linux Programmer’s Manual EVENTFD(2). h/t_tp://man7.org/linux/man-pages/man2/eventfd.2.html. (2019). Accessed in January

work page 2019

[3] [3]

Linux Programmer’s Manual TIMERFD _CREATE(2)

2019c. Linux Programmer’s Manual TIMERFD _CREATE(2). h/t_tp://man7.org/linux/man-pages/man2/timerfd_create.2.html. (2019). Accessed in January

work page 2019

[4] [4]

Blelloch, and Robert Blumofe

Umut Acar, Guy E. Blelloch, and Robert Blumofe. 2002a. /T_he Data Locality of Work Stealing./T_heory of Computing Systems 35, 3 (2002). Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe

work page 2002

[5] [5]

1–12. Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. 2002b. /T_he Data Locality of Work Stealing./T_heory Comput. Syst.35, 3 (2002), 321–347. Kunal Agrawal, Jeremy T. Fineman, Kefu Lu, Brendan Sheridan, Jim Sukha, and Robert U/t_terback

work page 2002

[6] [6]

/T_heory of Computing Systems(2001), 115–144

/T_hread Scheduling for Multiprogrammed Multiprocessors. /T_heory of Computing Systems(2001), 115–144. Henry C. Baker, Jr. and Carl Hewi/t_t

work page 2001

[7] [7]

/T_he incremental garbage collection of processes.SIGPLAN Notices 12, 8 (1977), 55–59. Rajkishore Barik, Zoran Budimli´c, Vincent Cav`e, Sanjay Cha/t_terjee, Yi Guo, David Peixo/t_to, Raghavan Raman, Jun Shirako, Sa˘gnak Tas ¸ırlar, Yonghong Yan, Yisheng Zhao, and Vivek Sarkar

work page 1977

[8] [8]

JACM 46, 5 (1999), 720–748

Scheduling Multithreaded Computations by Work Stealing. JACM 46, 5 (1999), 720–748. Randal E. Bryant and David R. O’Hallaron

work page 1999

[9] [9]

IEEE Computer 27, 8 (Aug

COOL: An Object-Based Language for Parallel Programming. IEEE Computer 27, 8 (Aug. 1994), 13–26. Philippe Charles, Christian Grothoﬀ, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar

work page 1994

[10] [10]

Science of Computer Programming 63, 2 (Dec

Programming with exceptions in JCilk. Science of Computer Programming 63, 2 (Dec. 2008), 147–171. Ma/t_thew Fluet, Mike Rainey, John Reppy, and Adam Shaw

work page 2008

[11] [11]

2010), 537–576

Implicitly /T_hreaded Parallelism in Manticore.Journal of Functional Programming 20, 5-6 (Nov. 2010), 537–576. h/t_tps://doi.org/10.1017/S0956796810000201 D.P. Friedman and D.S. Wise

work page doi:10.1017/s0956796810000201 2010

[12] [12]

IEEE Trans

Aspects of Applicative Programming for Parallel Processing. IEEE Trans. Comput. C-27, 4 (1978), 289–296. Ma/t_teo Frigo, Charles E. Leiserson, and Keith H. Randall

work page 1978

[13] [13]

ACM TOPLAS 7, 4 (Oct

Multilisp: A Language for Concurrent Symbolic Computation. ACM TOPLAS 7, 4 (Oct. 1985), 501–538. Shams Imam and Vivek Sarkar

work page 1985

[14] [14]

In Proceedings of the 28th European Conference on ECOOP 2014 — Object-Oriented Programming - Volume

Cooperative Scheduling of Parallel Tasks with General Synchronization Pa/t_terns. In Proceedings of the 28th European Conference on ECOOP 2014 — Object-Oriented Programming - Volume

work page 2014

[15] [15]

h/t_tps://doi.org/10.1007/978-3-662-44202-9_25 Intel

Springer-Verlag New York, Inc., New York, NY, USA, 618–643. h/t_tps://doi.org/10.1007/978-3-662-44202-9_25 Intel

work page doi:10.1007/978-3-662-44202-9_25

[16] [16]

h/t_tps://www.cilkplus.org

Intel ® Cilk™ Plus. h/t_tps://www.cilkplus.org. (2013). 18 Intel Corporation

work page 2013

[17] [17]

InProceedings of the 2014 ACM Symposium on Principles of Distributed Computing (PODC ’14)

/T_he Future(s) of Shared Data Structures. InProceedings of the 2014 ACM Symposium on Principles of Distributed Computing (PODC ’14) . ACM, Paris, France, 30–39. h/t_tp://doi.acm.org/10.1145/2611462.2611496 David A. Kranz, Robert H. Halstead, Jr., and Eric Mohr

work page doi:10.1145/2611462.2611496 2014

[18] [18]

In ACM 2000 Conference on Java Grande

A Java fork/join framework. In ACM 2000 Conference on Java Grande . 36–43. I-Ting Angelina Lee, Silas Boyd-Wickizer, Zhiyi Huang, and Charles E. Leiserson

work page 2000

[19] [19]

Supercomputing 51, 3 (2010), 244–257

/T_he Cilk++ Concurrency Platform.J. Supercomputing 51, 3 (2010), 244–257. Li Lu, Weixing Ji, and Michael L. Sco/t_t

work page 2010

[20] [20]

In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19)

Proactive Work Stealing for Futures. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19) . ACM, New York, NY, USA, 257–271. h/t_tps: //doi.org/10.1145/3293883.3295735 Daniel Spoonhower, Guy E. Blelloch, Phillip B. Gibbons, and Robert Harper

work page doi:10.1145/3293883.3295735

[21] [21]

In Proceedings of the Twenty-/f_irst Annual Symposium on Parallelism in Algorithms and Architectures (SPAA ’09)

Beyond Nested Parallelism: Tight Bounds on Work-stealing Overheads for Parallel Futures. In Proceedings of the Twenty-/f_irst Annual Symposium on Parallelism in Algorithms and Architectures (SPAA ’09). ACM, Calgary, AB, Canada, 91–100. h/t_tps://doi.org/10.1145/1583991.1584019 Daniel Spoonhower, Guy E. Blelloch, Robert Harper, and Phillip B. Gibbons

work page doi:10.1145/1583991.1584019

[22] [22]

In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016)

Automatic Parallelization of Pure Method Calls via Conditional Future Synthesis. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). ACM, New York, NY, USA, 20–38. h/t_tps://doi.org/10.1145/2983990.2984035 Olivier Tardieu, Haichuan Wang, and Haibo Lin

work page doi:10.1145/2983990.2984035 2016

[23] [23]

InProceedings of the 2011 International Conference on Parallel Processing (ICPP ’11)

Data-Driven Tasks and /T_heir Implementation. InProceedings of the 2011 International Conference on Parallel Processing (ICPP ’11) . IEEE Computer Society, Taipei City, Taiwan, 652–661. Robert U/t_terback, Kunal Agrawal, I-Ting Angelina Lee, and Milind Kulkarni

work page 2011

[24] [24]

In Proceedings of the 22Nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’17)

Processor-Oblivious Record and Replay. In Proceedings of the 22Nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’17) . ACM, Austin, Texas, USA, 145–161. h/t_tps://doi.org/10.1145/3018743.3018764 Christopher S. Zakian, Timothy A. Zakian, Abhishek Kulkarni, Buddhika Chamith, and Ryan R. Newton

work page doi:10.1145/3018743.3018764