
arXiv: 1507.08492 · v1 · pith:2EYUZISS · submitted 2015-07-30 · 💻 cs.DB · cs.DC

Cost optimization of data flows based on task re-ordering

classification 💻 cs.DB cs.DC
keywords dataflows, execution, solutions, cost, flow, optimal, optimization
abstract

Analyzing big data in a highly dynamic environment is becoming increasingly critical because of the growing need for end-to-end processing of this data. Modern data flows are quite complex, and there are no efficient, cost-based, fully automated, scalable optimization solutions that can assist flow designers. State-of-the-art proposals fail to provide near-optimal solutions even for simple data flows. To tackle this problem, we introduce a set of approximate algorithms that determine the execution order of a data flow's constituent tasks so as to minimize its total execution cost. We also present the advantages of executing data flows in parallel. We validated our proposals both in a real tool and on synthetic flows, and the results show that we can achieve significant speed-ups and move much closer to optimal solutions.
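The abstract does not specify the cost model or the algorithms, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a linear flow in which each task has a hypothetical per-tuple cost and selectivity (the fraction of tuples it passes on), plus precedence constraints between tasks. A task's cost is then scaled by the selectivities of the tasks that run before it. The greedy heuristic here (a swapped-in, classic rank rule for ordering independent filters) repeatedly schedules the ready task with the highest rank (1 - selectivity) / cost, and an exhaustive search supplies the exact optimum for small flows as a baseline.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical task model (not from the paper): (name, cost per input tuple, selectivity).
Task = Tuple[str, float, float]
# Hypothetical precedence constraint (a, b): task a must run before task b.
Edge = Tuple[str, str]


def flow_cost(order: Sequence[Task]) -> float:
    """Total cost of a linear flow: each task pays its per-tuple cost
    times the fraction of tuples that survived all earlier tasks."""
    total, surviving = 0.0, 1.0
    for _, cost, sel in order:
        total += surviving * cost
        surviving *= sel
    return total


def respects(order: Sequence[Task], prec: List[Edge]) -> bool:
    """True if every precedence constraint holds in this order."""
    pos = {name: i for i, (name, _, _) in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in prec)


def greedy_order(tasks: List[Task], prec: List[Edge]) -> List[Task]:
    """Approximate ordering: among tasks whose predecessors have all run,
    pick the one with the highest rank (1 - selectivity) / cost.
    Assumes every cost is strictly positive; cyclic constraints leave
    no ready task, and min() then raises ValueError."""
    remaining: Dict[str, Task] = {t[0]: t for t in tasks}
    preds: Dict[str, Set[str]] = {name: set() for name in remaining}
    for a, b in prec:
        preds[b].add(a)
    order: List[Task] = []
    done: Set[str] = set()
    while remaining:
        ready = [t for name, t in remaining.items() if preds[name] <= done]
        # Highest (1 - sel) / cost is the same as lowest (sel - 1) / cost.
        best = min(ready, key=lambda t: (t[2] - 1.0) / t[1])
        order.append(best)
        done.add(best[0])
        del remaining[best[0]]
    return order


def optimal_order(tasks: List[Task], prec: List[Edge]) -> Tuple[Task, ...]:
    """Exact baseline by exhaustive search; only feasible for small flows."""
    valid = (p for p in permutations(tasks) if respects(p, prec))
    return min(valid, key=flow_cost)


if __name__ == "__main__":
    tasks = [("clean", 4.0, 0.9), ("filter", 1.0, 0.2), ("lookup", 3.0, 1.0)]
    prec = [("clean", "lookup")]  # hypothetical dependency
    greedy = greedy_order(tasks, prec)
    print([name for name, _, _ in greedy])  # ['filter', 'clean', 'lookup']
    print(round(flow_cost(greedy), 2), round(flow_cost(optimal_order(tasks, prec)), 2))
```

Without precedence constraints, the rank rule gives an optimal order for this particular cost model (an adjacent-swap argument shows it); with constraints it is only a heuristic. For instance, with A before B and tasks A (cost 1, selectivity 1.0), B (cost 1, selectivity 0.0) and C (cost 1, selectivity 0.6), the greedy order C, A, B costs 2.2, whereas A, B, C costs 2.0. This gap between fast heuristics and the exact optimum is what the approximate algorithms described above aim to narrow.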


