DOD-ETL: Distributed On-Demand ETL for Near Real-Time Business Intelligence

Adriano C. M. Pereira; Gustavo V. Machado; Ítalo Cunha; Leonardo B. Oliveira

arxiv: 1907.06723 · v1 · pith:3GTIJALRnew · submitted 2019-07-15 · 💻 cs.DC · cs.DB

Pith reviewed 2026-05-24 21:01 UTC · model grok-4.3

classification 💻 cs.DC cs.DB

keywords near real-time ETL · distributed data processing · stream processing · business intelligence · data pipeline · in-memory caching · data partitioning

The pith

DOD-ETL performs near real-time ETL up to 10 times faster than other stream processing frameworks through its on-demand distributed pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DOD-ETL as a solution to the slow ETL bottleneck that prevents timely business intelligence. It combines an on-demand streaming pipeline, distributed parallel processing, in-memory caching, and data partitioning into a technology-independent system. This setup delivers workloads up to 10 times faster than existing frameworks and was deployed in a steelworks to enable previously unavailable near real-time reports.

Core claim

DOD-ETL addresses the main bottleneck in Business Intelligence solutions, the Extract Transform Load process, by providing it in near real-time. It achieves this by combining an on-demand data stream pipeline with a distributed, parallel and technology-independent architecture with in-memory caching and efficient data partitioning. Comparisons with other Stream Processing frameworks show DOD-ETL executes workloads up to 10 times faster. Deployment in a large steelworks replaced its previous ETL solution and enabled near real-time reports previously unavailable.

What carries the argument

on-demand data stream pipeline with distributed parallel architecture, in-memory caching, and efficient data partitioning
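
As a rough illustration of how these pieces might fit together, the Python sketch below routes change records to workers by hashing a business key and enriches each record from a per-partition in-memory cache. Every name in it (`partition_for`, `PartitionWorker`, the `equipment_id` key, the `master_lookup` callable) is hypothetical and chosen for illustration only; this summary does not describe the paper's actual implementation, and the workers here run sequentially rather than on separate nodes.

```python
import zlib
from typing import Callable, Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a business key to a partition using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionWorker:
    """Handles one partition's records and caches master data in memory."""

    def __init__(self, master_lookup: Callable[[str], dict]):
        # master_lookup stands in for a slow query against the source system.
        self._master_lookup = master_lookup
        self._cache: Dict[str, dict] = {}
        self.lookups = 0  # number of cache misses that hit the slow source

    def transform(self, record: dict) -> dict:
        key = record["equipment_id"]  # hypothetical partitioning key
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._master_lookup(key)
        # Enrich the streamed record with the cached master data.
        return {**record, **self._cache[key]}


def run_pipeline(
    records: List[dict],
    num_partitions: int,
    master_lookup: Callable[[str], dict],
) -> Tuple[List[dict], List[PartitionWorker]]:
    """Route each record to the worker that owns its key, then transform it."""
    workers = [PartitionWorker(master_lookup) for _ in range(num_partitions)]
    output = []
    for record in records:
        idx = partition_for(record["equipment_id"], num_partitions)
        output.append(workers[idx].transform(record))
    return output, workers
```

Because a given key always lands on the same worker, each worker's cache only needs to hold the master data for its own share of keys, which is one plausible reason partitioning and caching are presented together.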

If this is right

ETL ceases to be the primary delay in turning data into actionable business information.
Existing stream processing tools can be replaced in large operations to support faster reporting.
The technology-independent design allows the same pipeline to run across varied computing setups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same architecture could be tested in other high-volume data environments such as finance or logistics to check if similar speed gains appear.
Further scaling experiments would clarify whether in-memory caching remains effective as data volumes grow beyond the steelworks case.
The partitioning method might reduce costs in cloud deployments by lowering the need for constant resource allocation.
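
One way to probe the caching question above is to replay a stream of lookup keys against a bounded cache and watch the hit rate as the key space grows. The sketch below is a minimal simulation under stated assumptions: it uses a least-recently-used eviction policy, which this summary does not attribute to DOD-ETL, and the function name `lru_hit_rate` is hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def lru_hit_rate(keys: Iterable[Hashable], capacity: int) -> float:
    """Replay keys against an LRU cache of the given capacity; return hit rate."""
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    hits = 0
    total = 0
    for key in keys:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0
```

Running it on a trace whose working set exceeds `capacity` shows how quickly the benefit of caching can fall away, which is the effect a scaling experiment would need to measure on real workloads.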

Load-bearing premise

The on-demand streaming pipeline with distributed architecture, caching, and partitioning can be realized in production without hidden bottlenecks or correctness issues, as shown only in one steelworks deployment.

What would settle it

A head-to-head performance test on a different large industrial dataset or workload where DOD-ETL does not achieve the reported speedup or encounters data errors would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 1907.06723 by Adriano C. M. Pereira, Gustavo V. Machado, Ítalo Cunha, Leonardo B. Oliveira.

**Figure 1.** Batch vs. near real-time ETL. Sabtu et al. [27] enumerate several problems related to near real-time ETL and, along with Ellis [8], they provide some directions and possible solutions to each problem. However, due to the complexity of these problems, ETL solutions do not always address them directly: to avoid affecting efficiency on transaction databases, ETL processes were usually run in batches and off-hours (…

**Figure 2.** DOD-ETL workflow step by step. All steps depend on configuration parameters to work properly. Thus, during DOD-ETL's deployment, it is imperative to go through a configuration process, where decisions are made to set the following parameters: tables to extract (define which tables data will be extracted from); table nature (from the defined tables, detail which ones are operational, i.e. constantly updated) and…

**Figure 3.** Data splitting working on metals industry context.

**Figure 4.** In-memory cache initialization overhead.

**Figure 5.** Scalability: Listener experiment result.

**Figure 6.** Scalability: Stream Processor experiment result.

Original abstract

The competitive dynamics of the globalized market demand information on the internal and external reality of corporations. Information is a precious asset and is responsible for establishing key advantages to enable companies to maintain their leadership. However, reliable, rich information is no longer the only goal. The time frame to extract information from data determines its usefulness. This work proposes DOD-ETL, a tool that addresses, in an innovative manner, the main bottleneck in Business Intelligence solutions, the Extract Transform Load process (ETL), providing it in near real-time. DOD-ETL achieves this by combining an on-demand data stream pipeline with a distributed, parallel and technology-independent architecture with in-memory caching and efficient data partitioning. We compared DOD-ETL with other Stream Processing frameworks used to perform near real-time ETL and found DOD-ETL executes workloads up to 10 times faster. We have deployed it in a large steelworks as a replacement for its previous ETL solution, enabling near real-time reports previously unavailable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DOD-ETL is a practical ETL system with a real deployment, but the 10x speedup rests on thin evidence with no benchmark details.

The paper describes DOD-ETL, a tool that uses on-demand streaming, distributed parallel execution, in-memory caching, and partitioning to handle ETL for near real-time BI. The main concrete outcome is its replacement of an existing system in a steelworks, which allowed reports that were previously unavailable. That deployment is the strongest part of the work because it shows the system running in production rather than just in a lab setting. The authors also position the approach as technology-independent, which could make it easier to adopt in different environments. The 10x speedup claim compared to other stream processing frameworks is the central result, but it comes without workload descriptions, hardware specs, configuration details, or error bars. A single production story does not establish that the architecture itself drove the gains versus implementation choices or data specifics. The ideas draw from existing stream processing and caching techniques, so the novelty sits in the packaged tool and the ETL focus rather than a new mechanism. This is an engineering paper aimed at practitioners who need faster BI pipelines and might want to evaluate or replicate the system. Researchers looking for new distributed systems results will find little to cite. The evidence is too light for strong claims, but the practical angle is clear enough that a serious editor should send it to referees who can ask for the missing benchmark methodology and more evaluation data. I would engage with it in review to see if the authors can fill those gaps.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes DOD-ETL, a tool for near real-time ETL in business intelligence. It uses an on-demand data stream pipeline combined with a distributed parallel architecture, in-memory caching, and data partitioning to overcome traditional ETL bottlenecks. The central claim is that DOD-ETL executes workloads up to 10 times faster than other stream processing frameworks, supported by a production deployment in a large steelworks that enabled previously unavailable near real-time reports.

Significance. If the speedup and production claims can be substantiated through controlled experiments, the approach could meaningfully advance practical near real-time BI systems in industrial environments by reducing ETL latency. The architecture elements address a recognized pain point, but the manuscript provides no reproducible evidence that the techniques deliver the attributed gains.

major comments (3)

[Abstract] Abstract: the claim that 'DOD-ETL executes workloads up to 10 times faster' is presented without naming the compared frameworks, describing the workloads, hardware/network configuration, measurement protocol, or any error bars, rendering the central performance result unverifiable and load-bearing for the paper's contribution.
[Abstract] Abstract (steelworks deployment paragraph): the replacement of the previous ETL solution is described only anecdotally with no quantitative before/after metrics, workload characteristics, or implementation details, so the assertion that the architecture enables 'near real-time reports previously unavailable' rests on a single uncontrolled case study.
[Abstract] Abstract: no discussion or evidence is supplied that the on-demand pipeline, in-memory caching, and partitioning avoid hidden bottlenecks or correctness issues in production, which is required to attribute any observed difference to the proposed techniques rather than implementation or data artifacts.
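
To make the request for a measurement protocol and error bars concrete, the sketch below shows one minimal way such a comparison could be reported: repeat each workload several times, summarize the spread, and derive the speedup from the mean durations. The helper names (`time_runs`, `summarize`, `speedup`) are hypothetical, and nothing here reflects how the paper's experiments were actually run.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(workload: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run the workload several times and collect wall-clock durations."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Report mean and sample standard deviation, the basis for error bars."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "n": len(durations)}


def speedup(baseline: List[float], candidate: List[float]) -> float:
    """Ratio of mean baseline duration to mean candidate duration."""
    return statistics.mean(baseline) / statistics.mean(candidate)
```

A claim such as "up to 10 times faster" would then correspond to the largest `speedup` value across workloads, ideally reported next to the `stdev` of both sides so readers can judge whether the gap exceeds run-to-run noise.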

minor comments (1)

[Abstract] The abstract contains several long, general sentences about market dynamics that could be shortened without loss of technical content.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed feedback on the abstract. We agree that greater specificity is required for the performance claims and will revise the abstract to improve verifiability. The production deployment description is constrained by confidentiality, limiting quantitative disclosure.

Referee: [Abstract] Abstract: the claim that 'DOD-ETL executes workloads up to 10 times faster' is presented without naming the compared frameworks, describing the workloads, hardware/network configuration, measurement protocol, or any error bars, rendering the central performance result unverifiable and load-bearing for the paper's contribution.

Authors: The abstract summarizes results whose details, including compared frameworks (Apache Spark Streaming and Apache Flink), workloads, hardware/network setup, measurement protocol, and error bars, are presented in Section 5. We will revise the abstract to name the frameworks and briefly note the experimental conditions to make the claim more self-contained.

revision: yes

Referee: [Abstract] Abstract (steelworks deployment paragraph): the replacement of the previous ETL solution is described only anecdotally with no quantitative before/after metrics, workload characteristics, or implementation details, so the assertion that the architecture enables 'near real-time reports previously unavailable' rests on a single uncontrolled case study.

Authors: Section 6 provides additional implementation context on the integration. Quantitative before/after metrics cannot be released due to non-disclosure agreements with the partner. We will partially revise the abstract to clarify the qualitative outcome (enabling previously unavailable reports due to latency) while noting the case-study nature.

revision: partial

Referee: [Abstract] Abstract: no discussion or evidence is supplied that the on-demand pipeline, in-memory caching, and partitioning avoid hidden bottlenecks or correctness issues in production, which is required to attribute any observed difference to the proposed techniques rather than implementation or data artifacts.

Authors: Sections 3 and 4 explain the design rationale for the on-demand pipeline, caching, and partitioning to mitigate bottlenecks and maintain correctness. We will add a brief reference in the abstract to these sections to better link observed gains to the techniques.

revision: yes

standing simulated objections not resolved

Quantitative before/after metrics and workload characteristics from the steelworks deployment, restricted by confidentiality agreements.

Circularity Check

0 steps flagged

No circularity; empirical claims only

The paper contains no equations, derivations, fitted parameters, or first-principles results. Its 10x speedup claim is stated as the outcome of direct empirical comparisons against other frameworks plus one production deployment; these are presented as measurements rather than quantities defined in terms of the paper's own inputs. No self-citation load-bearing steps, ansatzes, or renamings appear. The work is therefore self-contained against external benchmarks and receives the default non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no explicit free parameters, mathematical axioms, or newly postulated entities; the contribution is an engineering architecture whose correctness rests on unstated implementation assumptions.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] Apache. Apache Beam. https://beam.apache.org/, 2015

[2] B. Azvine, Z. Cui, D. D. Nauck, and B. Majeed. Real time business intelligence for the adaptive enterprise. In E-Commerce Technology, 2006. The 8th IEEE International Conference on and Enterprise Computing, E-Commerce, and E-Services, The 3rd IEEE International Conference on, pages 29–29. IEEE, 2006

[3] M. A. Bornea, A. Deligiannakis, Y. Kotidis, and V. Vassalos. Semi-streamed index join for near-real time execution of etl transformations. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 159–170. IEEE, 2011

[4] P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 36(4), 2015

[5] E. F. Codd, S. B. Codd, and C. T. Salley. Providing olap (on-line analytical processing) to user-analysts: An it mandate. Codd and Date, 32, 1993

[6] D. Cutting. Apache Avro. https://avro.apache.org/, 2009

[7] J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008

[8] B. Ellis. Real-time analytics: Techniques to analyze and visualize streaming data. John Wiley & Sons, 2014

[9] W. A. Giovinazzo. Object-oriented data warehouse design: building a star schema. Prentice Hall PTR, 2000

[10] Google. Google Dataflow. https://cloud.google.com/dataflow/, 2015

[11] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait-free coordination for internet-scale systems. In USENIX annual technical conference, volume 8, page 9. Boston, MA, USA, 2010

[12] International Society of Automation. Enterprise-Control System Integration Part 2: Object Model Attributes. ISA, 2001

[13] T. Jain, S. Rajasree, and S. Saluja. Refreshing datawarehouse in near real-time. International Journal of Computer Applications, 46(18):24–29, 2012

[14] A. Karakasidis, P. Vassiliadis, and E. Pitoura. Etl queues for active data warehousing. In Proceedings of the 2nd international workshop on Information quality in information systems, pages 28–39. ACM, 2005

[15] J. Kreps, N. Narkhede, J. Rao, et al. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB, pages 1–7, 2011

[16] Ö. Ljungberg. Measurement of overall equipment effectiveness as a basis for tpm activities. International Journal of Operations & Production Management, 18(5):495–507, 1998

[17] Y. Malhotra. From information management to knowledge management. beyond the 'hi-tech hidebound' systems. Knowledge management and business model innovation, pages 115–134, 2001

[18] M. Mesiti, L. Ferrari, S. Valtolina, G. Licari, G. Galliani, M. Dao, K. Zettsu, et al. Streamloader: an event-driven etl system for the on-line processing of heterogeneous sensor data. In Extending Database Technology, pages 628–631. OpenProceedings, 2016

[19] Microsoft. Azure Stream Analytics. https://azure.microsoft.com/en-us/services/stream-analytics/, 2015

[20] T. Mueller. H2 Database. http://www.h2database.com/, 2012

[21] M. A. Naeem, G. Dobbie, and G. Webber. An event-based near real-time data integration architecture. In Enterprise Distributed Object Computing Conference Workshops, 2008 12th, pages 401–404. IEEE, 2008

[22] M. A. Naeem, G. Dobbie, G. Weber, and S. Alam. R-meshjoin for near-real-time data warehousing. In Proceedings of the ACM 13th international workshop on Data warehousing and OLAP, pages 53–60. ACM, 2010

[23] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In Data Mining Workshops (ICDMW), 2010 IEEE International Conference on, pages 170–177. IEEE, 2010

[24] T. M. Nguyen, J. Schiefer, and A. M. Tjoa. Sense & response service architecture (saresa): an approach towards a real-time business intelligence solution and its use for a fraud detection application. In Proceedings of the 8th ACM international workshop on Data warehousing and OLAP, pages 77–86. ACM, 2005

[25] N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simitsis, and N.-E. Frantzell. Supporting streaming updates in an active data warehouse. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pages 476–485. IEEE, 2007

[26] N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simitsis, and N. Frantzell. Meshing streaming updates with persistent data in an active data warehouse. IEEE Transactions on Knowledge and Data Engineering, 20(7):976–991, 2008

[27] A. Sabtu, N. F. M. Azmi, N. N. A. Sjarif, S. A. Ismail, O. M. Yusop, H. Sarkan, and S. Chuprat. The challenges of extract, transform and loading (etl) system implementation for near real-time environment. In Research and Innovation in Information Systems (ICRIIS), 2017 International Conference on, pages 1–5. IEEE, 2017

[28] B. Sahay and J. Ranjan. Real time business intelligence in supply chain analytics. Information Management & Computer Security, 16(1):28–48, 2008

[29] D. Stamatis. The OEE Primer: Understanding Overall Equipment Effectiveness, Reliability, and Maintainability. Productivity Press, 1 pap/cdr edition, 6 2010. ISBN 9781439814062. URL http://amazon.com/o/ASIN/1439814066/

[30] T. Thalhammer, M. Schrefl, and M. Mohania. Active data warehouses: complementing olap with analysis rules. Data & Knowledge Engineering, 39(3):241–269, 2001

[31] A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, et al. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 147–156. ACM, 2014

[32] P. Vassiliadis and A. Simitsis. Near real time etl. In New trends in data warehousing and data analysis, pages 1–31. Springer, 2009

[33] F. Waas, R. Wrembel, T. Freudenreich, M. Thiele, C. Koncilia, and P. Furtado. On-demand elt architecture for right-time bi: extending the vision. International Journal of Data Warehousing and Mining (IJDWM), 9(2):21–38, 2013

[34] H. J. Watson and B. H. Wixom. The current state of business intelligence. Computer, 40(9), 2007

[35] A. Wibowo. Problems and available solutions on the stage of extract, transform, and loading in near real-time data warehousing (a literature study). In Intelligent Technology and Its Applications (ISITIA), 2015 International Seminar on, pages 345–350. IEEE, 2015

[36] M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica. Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters. HotCloud, 12:10–10, 2012

[37] F. Zhang, J. Cao, S. U. Khan, K. Li, and K. Hwang. A task-level adaptive mapreduce framework for real-time streaming data in healthcare applications. Future Generation Computer Systems, 43:149–160, 2015
