Hillview: A trillion-cell spreadsheet for big data

Han Kruiger; Lalith Suresh; Marcos K. Aguilera; Mihai Budiu; Parikshit Gopalan; Udi Wieder

arxiv: 1907.04827 · v1 · pith:TZW3RVUJnew · submitted 2019-07-10 · 💻 cs.DC

Mihai Budiu, Parikshit Gopalan, Lalith Suresh, Udi Wieder, Han Kruiger, Marcos K. Aguilera

Pith reviewed 2026-05-24 23:30 UTC · model grok-4.3

classification 💻 cs.DC

keywords distributed spreadsheets · big data visualization · vizketches · data sketching · interactive analytics · scalability · progressive rendering

The pith

Hillview lets users interactively explore spreadsheets with trillions of cells on just eight servers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hillview turns very large datasets into responsive distributed spreadsheets that support rapid navigation and visualization changes. It achieves this scaling through vizketches, compact summaries that merge data reduction algorithms with graphics rendering methods. The system delivers progressive results with accuracy bounds while keeping communication low enough to run on small clusters. A sympathetic reader would care because this removes the single-machine barrier that currently limits spreadsheet-style analysis of big data.

Core claim

Hillview shows that visualization sketches called vizketches can scale spreadsheet interactivity to tens of billions of rows and trillions of cells by parallelizing computation across servers, reducing communication, supporting progressive rendering, and providing precise accuracy guarantees.

What carries the argument

Vizketches: compact visualizations that combine algorithmic data summarization with computer graphics rendering principles to enable low-latency, accurate displays.
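
As a rough illustration only (not Hillview's actual code), a histogram vizketch can be sketched as two functions, following the paper's description that the summarize function outputs a vector of B bin counts and the merge function adds two vectors. The function signatures, the fixed value range, and the sample partitions below are assumptions made for this example.

```python
from typing import List, Sequence


def summarize(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count how many values of one partition fall into each of `bins` equal-width buckets."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bucket.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


def merge(a: List[int], b: List[int]) -> List[int]:
    """Combine two partial histograms by element-wise addition."""
    return [x + y for x, y in zip(a, b)]


# Two hypothetical partitions of the same column, held on different servers.
part1 = [0.5, 1.5, 2.5, 3.5]
part2 = [0.1, 0.2, 3.9, 4.0]

total = merge(summarize(part1, 0.0, 4.0, 4), summarize(part2, 0.0, 4.0, 4))
```

In this sketch each partition contributes only `bins` integers to the merge step, regardless of how many rows it holds, which is the property that keeps communication small.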

If this is right

Users can switch between many visualizations without reloading data.
Exploration remains feasible on datasets far larger than main memory.
Accuracy guarantees let analysts trust the displayed summaries for decisions.
Computation parallelizes across a modest number of servers.
Progressive rendering gives immediate feedback while the full-precision result is still being computed.
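
One way to picture progressive rendering is a running merge: every time a partial summary arrives, the display is redrawn from the sum so far. The sketch below assumes summaries are plain lists of bin counts; the name `progressive_merge` and the arrival data are hypothetical and not taken from the paper.

```python
from typing import Iterable, Iterator, List


def progressive_merge(partials: Iterable[List[int]]) -> Iterator[List[int]]:
    """Yield the running element-wise sum after each partial summary arrives."""
    running: List[int] = []
    for p in partials:
        if not running:
            running = list(p)
        else:
            running = [x + y for x, y in zip(running, p)]
        # Hand a copy to the renderer so later merges do not mutate an old frame.
        yield list(running)


# Hypothetical per-server bin counts arriving one at a time.
arrivals = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]
frames = list(progressive_merge(arrivals))
```

Each element of `frames` is a complete, drawable histogram; the last one equals the result of merging everything at once.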

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sketching approach might support live updates from streaming sources if incremental maintenance is added.
Similar compact summaries could improve interactivity in other visual analytics tools such as geographic maps or network diagrams.
Accuracy bounds might allow automatic query optimization by choosing sketch granularity based on display resolution.
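
To make the last extension concrete, here is a back-of-the-envelope sketch of how display resolution could drive sketch parameters. It is not the paper's formula. It assumes one bar per horizontal pixel at most, a vertical axis that maps the fraction of rows in a bin (0 to 1) onto the chart height, and a standard Hoeffding bound with a union bound over bins. The function name and the default failure probability `delta` are hypothetical.

```python
import math
from typing import Tuple


def histogram_parameters(width_px: int, height_px: int, delta: float = 0.01) -> Tuple[int, int]:
    """Pick a bin count and a sample size from the chart's pixel dimensions."""
    bins = width_px  # assumption: at most one bar per horizontal pixel
    eps = 1.0 / (2 * height_px)  # half a pixel on a 0..1 vertical axis
    # Hoeffding: P(|estimate - truth| >= eps) <= 2 * exp(-2 * n * eps^2) per bin;
    # a union bound over all bins gives n >= ln(2 * bins / delta) / (2 * eps^2).
    samples = math.ceil(math.log(2 * bins / delta) / (2 * eps ** 2))
    return bins, samples


bins, samples = histogram_parameters(width_px=100, height_px=200)
```

Under these assumptions the required sample size depends on the pixel dimensions and the failure probability, not on the number of rows in the dataset.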

Load-bearing premise

Vizketches can be computed and rendered with low enough latency and communication cost to preserve spreadsheet-style interactivity on arbitrary real-world data.

What would settle it

Measure end-to-end latency for a sequence of arbitrary user queries on a trillion-cell dataset and check whether response times stay under a few seconds with the published accuracy guarantees.

Figures

Figures reproduced from arXiv: 1907.04827 by Han Kruiger, Lalith Suresh, Marcos K. Aguilera, Mihai Budiu, Parikshit Gopalan, Udi Wieder.

**Figure 2.** Some clutter-free visualizations for large datasets. Visualizations cover a single variable (column) or multiple variables, up …

**Figure 3.** Charts in Hillview have an error of at most 1/2 pixel

**Figure 4.** Spreadsheet operations. The + indicates serial operations, while & indicates concurrent operation. Numerical data refers to integer or floating point.

**Figure 5.** End-to-end performance comparison. The top graph shows the response time to produce each visualization, while the …

**Figure 8.** Scalability as we add more servers and increase the …

**Figure 9.** Effort required to implement vizketches.

**Figure 11.** Number of actions and time in minutes:seconds

**Figure 10.** Questions used to evaluate the effectiveness of Hill…

**Figure 12.** Abstract computational model for vizketches.

**Figure 13.** Charts in Hillview have an error of at most one pixel or one color shade with high probability. (a) A cdf plot with dimension …

**Figure 14.** Using vizketches to implement specific spreadsheet …

Original abstract

Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, much beyond the published capabilities of competing systems.
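
The abstract's points about parallelizing computation and reducing communication can be illustrated with a small sketch. Because adding bin-count vectors is associative and commutative, partial summaries can be combined pairwise in any grouping and still give the same answer as a single flat merge. The abstract does not describe the aggregation topology; the pairwise tree, the name `tree_merge`, and the sample data below are assumptions for illustration.

```python
from functools import reduce
from typing import List


def merge(a: List[int], b: List[int]) -> List[int]:
    """Element-wise addition of two bin-count vectors."""
    return [x + y for x, y in zip(a, b)]


def tree_merge(summaries: List[List[int]]) -> List[int]:
    """Merge a non-empty list of summaries pairwise, level by level."""
    level = [list(s) for s in summaries]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(merge(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # odd one out moves up unchanged
        level = nxt
    return level[0]


# Hypothetical summaries from five workers.
leaves = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
flat = reduce(merge, leaves)
tree = tree_merge(leaves)
```

Both `flat` and `tree` hold the same totals, so the grouping can be chosen to suit the cluster layout rather than the data.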

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Hillview shows how vizketches can push interactive spreadsheets to trillions of cells on eight servers, but the performance claims need the full evaluation section to stand up.

The main point is that this systems paper describes Hillview, which uses vizketches to let analysts browse and visualize datasets with tens of billions of rows and trillions of cells while keeping spreadsheet-style interactivity. The approach runs on eight servers and claims to exceed what competing systems have shown in published work.

Vizketches combine data summarization algorithms with graphics rendering rules to cut communication, enable parallel work, support progressive views, and give accuracy bounds. That combination is the concrete new element here, and it looks like a straightforward but effective way to scale the core spreadsheet interaction model without forcing sampling or pre-aggregation. The paper does a reasonable job explaining the distributed architecture and the responsiveness goals.

The soft spot is that the abstract gives performance numbers and guarantees without baselines, error bars, or experimental setup details, so the central scaling result is difficult to judge from the summary alone. The assumption that these sketches stay fast enough on real data distributions for arbitrary queries is the part that would need the full measurements to confirm.

This work is aimed at people building or using interactive tools for large-scale data exploration. A reader focused on distributed systems or visualization would pick up usable ideas on how to keep latency low at scale. It deserves peer review because the system is complete enough and the scale claim is specific enough that referees can check the implementation and numbers directly.

Referee Report

1 major / 0 minor

Summary. Hillview is a distributed spreadsheet system for interactive exploration of datasets too large for a single machine. It introduces vizketches—compact visualization sketches that combine data summarization algorithms with graphics rendering principles—to enable parallel computation, reduced communication, progressive rendering, and accuracy guarantees while preserving spreadsheet-style interactivity. The central empirical claim is that the system, running on eight servers, supports navigation and visualization of tens of billions of rows and trillions of cells, exceeding published capabilities of competing systems.

Significance. If the reported scaling and latency results hold under the stated conditions, the work provides a concrete demonstration that spreadsheet interactivity can be extended to trillion-cell scales via targeted summarization techniques. This has potential impact on big-data analytics tools by showing how algorithmic sketches can be integrated with rendering to maintain responsiveness without sacrificing accuracy guarantees. The emphasis on progressive visualizations and precise error bounds is a constructive contribution to distributed systems for data exploration.

major comments (1)

[abstract] The central scaling claim (abstract) rests on empirical measurements of vizketches under real-world query workloads, yet the provided text supplies no experimental section, baselines, hardware details, or error-bar information. Without these, the load-bearing assumption that vizketches deliver low-latency interactivity on arbitrary data distributions cannot be evaluated.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify aspects of our work. We address the single major comment below, pointing to the relevant sections of the full manuscript.

Referee: [abstract] The central scaling claim (abstract) rests on empirical measurements of vizketches under real-world query workloads, yet the provided text supplies no experimental section, baselines, hardware details, or error-bar information. Without these, the load-bearing assumption that vizketches deliver low-latency interactivity on arbitrary data distributions cannot be evaluated.

Authors: The full manuscript includes Section 6 (Evaluation), which provides the requested details: hardware specifications for the eight-server cluster, descriptions of real-world workloads and datasets (including navigation and visualization tasks on tens of billions of rows), direct baselines against competing systems such as Spark-based tools and other distributed visualization frameworks, measured latencies, and accuracy guarantees with error bounds for the vizketches. These experiments support the abstract's scaling claims under the tested conditions. The abstract is a concise summary and does not duplicate the full experimental methodology or results, which appear in the body of the paper.

revision: no

Circularity Check

0 steps flagged

No significant circularity

This is a systems paper describing an implementation of a distributed spreadsheet (Hillview) and its vizketches mechanism. The abstract and provided text contain no equations, derivations, fitted parameters, or load-bearing self-citations that reduce a claimed result to its own inputs by construction. Performance claims rest on empirical measurements rather than any self-referential mathematical chain. No instances of the enumerated circularity patterns are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a systems description rather than a derivation; the abstract introduces no free parameters, mathematical axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 5680 in / 993 out tokens · 16153 ms · 2026-05-24T23:30:37.378461+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear
Paper passage: "Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering... compute only what you can display."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear
Paper passage: "The summarize function outputs a vector of B bin counts, and the merge function adds two vectors."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

The user interface in the browser is implemented in TypeScript [95], using parts of the D3 JavaScript library [11]

IMPLEMENTATION Hillview consists of 35000 lines of Java and 16000 lines of TypeScript code. The user interface in the browser is implemented in TypeScript [95], using parts of the D3 JavaScript library [11]. Graphics is done using SVG [25]. The web server runs the Apache Tomcat application server [4]. The browser gets progressive replies from web server u...

work page

[2]

EVALUATION Our evaluation goal is to determine whether Hillview provides interactive performance with large data sets, how Hillview compares to existing systems, how vizketches contribute to that goal, and how effective the spreadsheet is. Summary. We find the following results: • Hillview can handle spreadsheets with 130B rows and 1.4T cells using only...

[3]

RELATED WORK Hillview is the first spreadsheet to scale massively with interactive speed. Hillview borrows ideas from the algorithms and computer graphics literature, namely mergeable summaries [2] (or sketches) and visualization-driven computation; it relies on many techniques from databases (approximate query processing, on-line analytics), big-da...

[4]

CONCLUSION Hillview is a spreadsheet that supports a trillion cells even with a modest number of servers. Hillview introduces a new query execution engine specialized to render tabular views and charts for a spreadsheet. The new engine uses vizketches, a new but simple idea that parallelizes computation and calculates only what is needed for a good visu...

[5]

L. Abraham, J. Allen, O. Barykin, V. R. Borkar, B. Chopra, C. Gerea, D. Merl, J. Metzler, D. Reiss, S. Subramanian, J. L. Wiener, and O. Zed. Scuba: Diving into data at Facebook. PVLDB, 6(11):1057–1067, 2013

[6]

P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In ACM SIGMOD International conference on Management of data, pages 23–34, 2012

[7]

S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: Queries with bounded errors and bounded response times on very large data. In European Conference on Computer Systems (EuroSys), Prague, Czech Republic, 2013

[8]

Apache Tomcat. http://tomcat.apache.org. Retrieved March 2019

[9]

M. Barnett, B. Chandramouli, R. DeLine, S. Drucker, D. Fisher, J. Goldstein, P. Morrison, and J. Platt. Stat!: an interactive analytics environment for big data. In ACM SIGMOD International conference on Management of data, pages 1013–1016, 2013

[10]

L. Battle, R. Chang, and M. Stonebraker. Dynamic reduction of query result sets for interactive visualization. In IEEE International Conference on Big Data, pages 1–8, Oct 2013

[11]

L. Battle, R. Chang, and M. Stonebraker. Dynamic prefetching of data tiles for interactive visualization. In International Conference on Management of Data (SIGMOD ’16), pages 1363–1375, 2016

[12]

M. Behrisch, D. Streeb, F. Stoffel, D. Seebacher, B. Matejek, S. H. Weber, S. Mittelstaedt, H. Pfister, and D. Keim. Commercial visual analytics systems – advances in the big data analytics field. IEEE Transactions on Visualization and Computer Graphics, 2018

[13]

N. Bikakis. Big data visualization tools. In S. Sakr and A. Zomaya, editors, Encyclopedia of Big Data Technologies, pages 1–6. Springer International Publishing, Cham, 2018

[14]

N. Bikakis, G. Papastefanatos, M. Skourla, and T. Sellis. A hierarchical aggregation framework for efficient multilevel visual exploration and analysis. Semantic Web, 8(1):139–179, 2017

[15]

M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-driven documents. IEEE Trans. Visualization and Comp. Graphics (Proc. InfoVis), 2011

[16]

M. Brown. BigSheets for the common man. https://www.ibm.com/developerworks/library/bd-bigsheets/index.html, December 2013

[17]

M. Budiu, P. Gopalan, L. Suresh, U. Wieder, H. Kruiger, and M. K. Aguilera. Hillview: A trillion-cell spreadsheet for big data (extended version). http://github.com/vmware/hillview/tree/master/docs/paper.pdf, 2019

[18]

M. Budiu, R. Isaacs, D. Murray, G. Plotkin, P. Barham, S. Al-Kiswany, Y. Boshmaf, Q. Luo, and A. Andoni. Interacting with large distributed datasets using Sketch. In Eurographics Symposium on Parallel Graphics and Visualization, Groningen, Netherlands, June 6-7 2016

[19]

S. Chaudhuri, G. Das, and V. Narasayya. A robust, optimization-based approach for approximate answering of aggregate queries. In ACM SIGMOD International conference on Management of data, pages 295–306, 2001

[20]

J. Choo, C. Lee, H. Kim, H. Lee, C. Reddy, B. Drake, and H. Park. PIVE: Per-iteration visualization environment for supporting real-time interactions with computational methods. In Visual Analytics Science and Technology (VAST), 2014

[21]

R. Christopher and V. Krishnan. Optimizing your Amazon Redshift and Tableau software deployment for better performance v2. https://www.tableau.com/sites/default/files/whitepapers/optimizing_tableau_aws_redshift_whitepaper_v2.pdf, 2017

[22]

L. Chu, H. Tang, T. Yang, and K. Shen. Optimizing data aggregation for cluster-based Internet services. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 119–130, 2003

[23]

E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In ACM Symposium on Principles of Distributed Computing (PODC), pages 225–234, New York, NY, USA,

[24]

T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce online. In USENIX Conference on Networked Systems Design and Implementation (NSDI), 2010

[25]

G. Cormode. Data sketching. Communications of the ACM, 60(9):48–55, Aug. 2017

[26]

A. Crotty, A. Galakatos, K. Dursun, T. Kraska, C. Binnig, U. Cetintemel, and S. Zdonik. An architecture for compiling UDF-centric workflows. PVLDB, 8(12):1466–1477, Aug. 2015

[27]

A. Crotty, A. Galakatos, E. Zgraggen, C. Binnig, and T. Kraska. Vizdom: Interactive analytics through pen and touch. PVLDB, 8(12):2024–2027, Aug. 2015

[28]

A. Crotty, A. Galakatos, E. Zgraggen, C. Binnig, and T. Kraska. The case for interactive data exploration accelerators (IDEAs). In Human-In-the-Loop Data Analytics (HILDA), pages 11:1–11:6, 2016

[29]

E. Dahlström, P. Dengler, A. Grasso, C. Lilley, C. McCormack, D. Schepers, J. Watt, J. Ferraiolo, F. Jun, and D. Jackson. Scalable vector graphics (SVG) 1.1. https://www.w3.org/TR/SVG/, August 2011

[30]

K. de Jonge. DirectQuery in SQL Server 2016 Analysis Services. http://download.microsoft.com/download/F/6/F/F6FBC1FC-F956-49A1-80CD-2941C3B6E417/DirectQuery%20in%20Analysis%20Services%20-%20Whitepaper.pdf, January 2017

[31]

J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, December 2004

[32]

Ç. Demiralp, P. J. Haas, S. Parthasarathy, and T. Pedapati. Foresight: Recommending visual insights. PVLDB, 10(12):1937–1940, 2017

[33]

B. Ding, S. Huang, S. Chaudhuri, K. Chakrabarti, and C. Wang. Sample + seek: Approximating aggregates with distribution precision guarantee. In ACM SIGMOD International conference on Management of data, pages 679–694, 2016

[34]

A. Dix and G. Ellis. by chance: enhancing interaction with large data sets through statistical sampling. In Advanced Visual Interfaces, pages 167–176, 2002

[35]

M. El-Hindi, Z. Zhao, C. Binnig, and T. Kraska. VisTrees: fast indexes for interactive data exploration. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA), page 5, 2016

[36]

A. Eldawy, M. F. Mokbel, and C. Jonathan. HadoopViz: A MapReduce framework for extensible visualization of big spatial data. In International Conference on Data Engineering (ICDE), Helsinki, Finland, May 2016

[37]

N. Elmqvist and J. Fekete. Hierarchical aggregation for information visualization: Overview, techniques, and design guidelines. IEEE Transactions on Visualization and Computer Graphics, 16(3):439–454, May 2010

[38]

fastutil: Fast and compact type-specific collections for Java. http://fastutil.di.unimi.it. Retrieved October 2017

[39]

J.-D. Fekete and R. Primet. Progressive analytics: A computation paradigm for exploratory data analysis. https://arxiv.org/abs/1607.05162, 2016

[40]

J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein, and Z. Svitkina. On distributing symmetric streaming computations. ACM Trans. Algorithms, 6(4):66:1–66:19, 2010

[41]

I. Fette and A. Melnikov. The WebSocket protocol. IETF RFC 6455, December 2011

[42]

D. Fisher. Big data exploration requires collaboration between visualization and data infrastructures. In Human-In-the-Loop Data Analytics (HILDA), pages 16:1–16:5, 2016

[43]

D. Fisher, I. Popov, S. Drucker, and M. Schraefel. Trust me, I’m partially right: Incremental visualization lets analysts explore large datasets faster. In SIGCHI Conference on Human Factors in Computing Systems, pages 1673–1682, 2012

[44]

P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Conference on Analysis of Algorithms (AofA) DMTCS proc., pages 127–146, 2007

[45]

A. Ghosh, M. Nashaat, J. Miller, S. Quader, and C. Marston. A comprehensive review of tools for exploratory analysis of tabular industrial datasets. Visual Informatics, 2018

[46]

P. Godfrey, J. Gryz, and P. Lasek. Interactive visualization of large data sets. IEEE Transactions on Knowledge and Data Engineering, 28(8):2142–2157, 2016

[47]

P. Godfrey, J. Gryz, P. Lasek, and N. Razavi. Visualization through inductive aggregation. In International Conference on Extending Database Technology (EDBT), pages 600–603, 2016

[48]

gRPC: A high performance, open-source universal RPC framework. https://grpc.io/. Retrieved October 2017

[49]

A. Hall, O. Bachmann, R. Büssow, S. Gănceanu, and M. Nunkesser. Processing a trillion cells per mouse click. PVLDB, 5(11):1436–1446, July 2012

[50]

M. Hausenblas and J. Nadeau. Apache Drill: Interactive ad-hoc analysis at scale. IEEE Comput. Graph. Appl., 1(2), June 2013

[51]

J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In ACM SIGMOD International conference on Management of data, pages 171–182, 1997

[52]

J. F. Hughes, A. van Dam, M. McGuire, D. F. Sklar, J. D. Foley, S. K. Feiner, and K. Akeley. Computer Graphics: Principles and Practice (3rd Edition). Addison-Wesley Professional, 2013

[53]

J.-F. Im, K. Gopalakrishna, S. Subramaniam, M. Shrivastava, A. Tumbde, X. Jiang, J. Dai, S. Lee, N. Pawar, J. Li, and R. Aringunram. Pinot: Realtime OLAP for 530 million users. In International Conference on Management of Data (SIGMOD), pages 583–594, 2018

[54]

J.-F. Im, F. G. Villegas, and M. J. McGuffin. VisReduce: Fast and responsive incremental information visualization of large datasets. In IEEE International Conference on Big Data, pages 25–32, Oct 2013

[55]

J. Jo, W. Kim, S. Yoo, B. Kim, and J. Seo. SwiftTuna: Incrementally exploring large-scale multidimensional data. In IEEE VIS, Phoenix, AZ, October 2016

[56]

J. Jo, W. Kim, S. Yoo, B. Kim, and J. Seo. SwiftTuna: Responsive and incremental visual exploration of large-scale multidimensional data. In Pacific Visualization Symposium (PacificVis), pages 131–140, Seoul, Korea, 2017

[57]

U. Jugel, Z. Jerzak, G. Hackenbroich, and V. Markl. M4: A visualization-oriented time series data aggregation. PVLDB, 7(10):797–808, June 2014

[58]

U. Jugel, Z. Jerzak, G. Hackenbroich, and V. Markl. VDDA: Automatic visualization-driven data aggregation in relational databases. The VLDB Journal, 25(1):53–77, Feb. 2016

[59]

N. Kamat, P. Jayachandran, K. Tunga, and A. Nandi. Distributed interactive cube exploration. In International Conference on Data Engineering (ICDE), pages 472–483, March 2014

[60]

N. Kamat and A. Nandi. A session-based approach to fast-but-approximate interactive data cube exploration. ACM Trans. Knowl. Discov. Data, 12(1):1–26, Feb. 2018

[61]

S. Kandel, R. Parikh, A. Paepcke, J. Hellerstein, and J. Heer. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Advanced Visual Interfaces, 2012

[62]

A. Kim, E. Blais, A. Parameswaran, P. Indyk, S. Madden, and R. Rubinfeld. Rapid sampling for visualizations with ordering guarantees. PVLDB, 8(5):521–532, Jan. 2015

[63]

A. Kim, L. Xu, T. Siddiqui, S. Huang, S. Madden, and A. Parameswaran. Optimally leveraging density and locality for exploratory browsing and sampling. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA 18), HILDA, pages 7:1–7:7, 2018

[64]

M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for Hadoop. In Conference on Innovative Data Sys...

[65]

N. Laptev, K. Zeng, and C. Zaniolo. Early accurate results for advanced analytics on MapReduce. PVLDB, 5(10):1028–1039, June 2012

[66]

L. Lins, J. T. Klosowski, and C. Scheidegger. Nanocubes for real-time exploration of spatiotemporal datasets. IEEE Transactions on Visualization and Computer Graphics, 19(12):2456–2465, 2013

[67]

Z. Liu, B. Jiang, and J. Heer. imMens: Real-time visual querying of big data. Computer Graphics Forum (Proc. EuroVis), 32, 2013

[68]

E. Meijer. Your mouse is a database. ACM Queue, 10(3):20–33, Mar. 2012

[69]

S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive analysis of web-scale datasets. PVLDB, 3(1-2):330–339, Sept. 2010

[70]

Microsoft Corp. Tempe. http://research.microsoft.com/en-us/projects/tempe/. Retrieved January 2019

[71]

Microsoft PowerBI. https://powerbi.microsoft.com. Accessed October 2017

[72]

J. Misra and D. Gries. Finding repeated elements. Science of Computer Programming, 2:143–152, 1982

[73]

D. Moritz, D. Fisher, B. Ding, and C. Wang. Trust, but verify: Optimistic visualizations of approximate queries for exploring big data. In ACM Human Factors in Computing Systems (CHI), 2017
