pith. sign in

arxiv: 0706.0403 · v1 · submitted 2007-06-04 · 🧮 math.PR

Asymptotic Behavior of Total Times For Jobs That Must Start Over If a Failure Occurs

classification 🧮 math.PR
keywords taskfailuredistributionasymptoticfailurestimetotalanalysis
0
0 comments X
read the original abstract

Many processes must complete in the presence of failures. Different systems respond to task failure in different ways. The system may resume a failed task from the failure point (or a saved checkpoint shortly before the failure point), it may give up on the task and select a replacement task from the ready queue, or it may restart the task. The behavior of systems under the first two scenarios is well documented, but the third ({\em RESTART}) has resisted detailed analysis. In this paper we derive tight asymptotic relations between the distribution of {\em task times} without failures to the {\em total time} when including failures, for any failure distribution. In particular, we show that if the task time distribution has an unbounded support then the total time distribution $H$ is always heavy-tailed. Asymptotic expressions are given for the tail of $H$ in various scenarios. The key ingredients of the analysis are the Cram\'er--Lundberg asymptotics for geometric sums and integral asymptotics, that in some cases are obtained via Tauberian theorems and in some cases by bare-hand calculations.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.