arxiv: 1907.04827 · v1 · pith:TZW3RVUJnew · submitted 2019-07-10 · 💻 cs.DC

Hillview: A trillion-cell spreadsheet for big data

Pith reviewed 2026-05-24 23:30 UTC · model grok-4.3

classification 💻 cs.DC
keywords distributed spreadsheets · big data visualization · vizketches · data sketching · interactive analytics · scalability · progressive rendering

The pith

Hillview lets users interactively explore spreadsheets with trillions of cells on just eight servers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hillview turns very large datasets into responsive distributed spreadsheets that support rapid navigation and visualization changes. It achieves this scaling through vizketches, compact summaries that merge data reduction algorithms with graphics rendering methods. The system delivers progressive results with accuracy bounds while keeping communication low enough to run on small clusters. A sympathetic reader would care because this removes the single-machine barrier that currently limits spreadsheet-style analysis of big data.

Core claim

Hillview shows that visualization sketches called vizketches can scale spreadsheet interactivity to tens of billions of rows and trillions of cells by parallelizing computation across servers, reducing communication, supporting progressive rendering, and providing precise accuracy guarantees.

What carries the argument

Vizketches: compact visualizations that combine algorithmic data summarization with computer graphics rendering principles to enable low-latency, accurate displays.
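
As a rough illustration of the summarization half of this idea, the sketch below builds a fixed-size histogram per data partition and merges partitions by adding counts. It is not Hillview's API: the function names `histogram_sketch` and `merge`, the fixed value range, and the sample data are all assumptions made for the example.

```python
from typing import List, Sequence


def histogram_sketch(values: Sequence[float], lo: float, hi: float,
                     buckets: int) -> List[int]:
    """Summarize one partition as `buckets` counts over the range [lo, hi)."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v < hi:
            # Clamp the index to guard against floating-point rounding near hi.
            counts[min(buckets - 1, int((v - lo) / width))] += 1
    return counts


def merge(a: List[int], b: List[int]) -> List[int]:
    """Combine two summaries by element-wise addition.

    Addition is associative and commutative, so partitions can be merged
    in any order and on any server.
    """
    return [x + y for x, y in zip(a, b)]


# Two hypothetical partitions, e.g. held by different servers.
part1 = [1.0, 2.5, 7.9]
part2 = [0.2, 2.6, 9.9, 5.0]
merged = merge(histogram_sketch(part1, 0.0, 10.0, 5),
               histogram_sketch(part2, 0.0, 10.0, 5))
print(merged)
```

The summary size depends only on the number of buckets, not on the number of rows, which is what keeps communication small in this toy version.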

If this is right

  • Users can switch between many visualizations without reloading data.
  • Exploration remains feasible on datasets far larger than main memory.
  • Accuracy guarantees let analysts trust the displayed summaries for decisions.
  • Computation parallelizes across a modest number of servers.
  • Progressive rendering gives immediate feedback while full precision arrives.
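
The progressive-rendering point can be pictured as a running merge: each time a server reports a partial summary, the display is refreshed with everything merged so far. The following is a minimal sketch under that assumption; `progressive_merge` and the sample bucket counts are hypothetical and do not come from the paper.

```python
from typing import Iterable, Iterator, List


def progressive_merge(partials: Iterable[List[int]]) -> Iterator[List[int]]:
    """Yield the running merged summary after each partial result arrives."""
    total: List[int] = []
    for part in partials:
        if not total:
            total = list(part)
        else:
            total = [x + y for x, y in zip(total, part)]
        # Yield a copy so callers can keep earlier frames unchanged.
        yield list(total)


# Hypothetical per-server bucket counts arriving one at a time.
arrivals = [[3, 0, 1], [1, 2, 0], [0, 4, 4]]
frames = list(progressive_merge(arrivals))
for frame in frames:
    print(frame)  # each frame could be rendered as an intermediate chart
```

The last frame equals the full merge, so the intermediate frames cost nothing extra beyond the redraws.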

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sketching approach might support live updates from streaming sources if incremental maintenance is added.
  • Similar compact summaries could improve interactivity in other visual analytics tools such as geographic maps or network diagrams.
  • Accuracy bounds might allow automatic query optimization by choosing sketch granularity based on display resolution.
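
The last extension above could start from something as simple as tying the bucket count to the chart width. The helper below is only a sketch of that idea: `choose_buckets`, the minimum bar width of 2 pixels, and the cap of 1000 buckets are illustrative assumptions, not values taken from the paper.

```python
def choose_buckets(chart_width_px: int, min_bar_px: int = 2,
                   max_buckets: int = 1000) -> int:
    """Pick a histogram bucket count no finer than the display can show."""
    if chart_width_px <= 0 or min_bar_px <= 0 or max_buckets <= 0:
        raise ValueError("widths and bucket cap must be positive")
    # Never request more buckets than there are drawable bars,
    # and always return at least one bucket.
    return max(1, min(max_buckets, chart_width_px // min_bar_px))


print(choose_buckets(600))   # a 600-pixel-wide chart
print(choose_buckets(5000))  # a very wide chart hits the cap
```

A query planner could call such a helper before issuing the summary request, so that servers never compute detail the screen cannot display.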

Load-bearing premise

Vizketches can be computed and rendered with low enough latency and communication cost to preserve spreadsheet-style interactivity on arbitrary real-world data.

What would settle it

Measure end-to-end latency for a sequence of arbitrary user queries on a trillion-cell dataset and check whether response times stay under a few seconds with the published accuracy guarantees.
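
A measurement of this kind could record two times per query, the time to the first partial result and the time to the final result. The harness below is a minimal sketch: `measure` and `fake_query` are hypothetical names, the query is a stand-in that sleeps briefly, and the 2-second budget is an assumed threshold rather than a figure from the paper.

```python
import time
from typing import Callable, Iterator, Tuple


def measure(query: Callable[[], Iterator[object]]) -> Tuple[float, float]:
    """Return (seconds to first partial result, seconds to final result)."""
    start = time.perf_counter()
    first = None
    for _ in query():
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    # A query that yields nothing has no partial result; report total for both.
    return (first if first is not None else total, total)


def fake_query() -> Iterator[int]:
    """Stand-in for a real progressive query; yields three partial results."""
    for i in range(3):
        time.sleep(0.01)
        yield i


first_s, final_s = measure(fake_query)
within_budget = final_s < 2.0  # assumed interactivity budget in seconds
print(f"first partial: {first_s:.3f}s, final: {final_s:.3f}s, ok={within_budget}")
```

Running the same harness over a scripted sequence of real queries would give the distribution of response times that the test calls for.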

Figures

Figures reproduced from arXiv: 1907.04827 by Han Kruiger, Lalith Suresh, Marcos K. Aguilera, Mihai Budiu, Parikshit Gopalan, Udi Wieder.

Figure 1: Hillview is a spreadsheet for browsing big data. It…
Figure 2: Some clutter-free visualizations for large datasets. Visualizations cover a single variable (column) or multiple variables, up…
Figure 3: Charts in Hillview have an error of at most 1/2 pixel…
Figure 4: Spreadsheet operations. The + indicates serial operations, while & indicates concurrent operation. Numerical data refers to integer or floating point. […] interface of the web browser, and we measure two response times at the browser: first partial visualization and final visualization. For the Spark baseline, we start the measurement when the computation starts, and end the measurement when the query result …
Figure 5: End-to-end performance comparison. The top graph shows the response time to produce each visualization, while the…
Figure 8: Scalability as we add more servers and increase the…
Figure 9: Effort required to implement vizketches.
Figure 11: Number of actions and time in minutes:seconds…
Figure 10: Questions used to evaluate the effectiveness of Hill…
Figure 12: Abstract computational model for vizketches.
Figure 13: Charts in Hillview have an error of at most one pixel or one color shade with high probability. (a) A cdf plot with dimension…
Figure 14: Using vizketches to implement specific spreadsheet…
Original abstract

Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, much beyond the published capabilities of competing systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. Hillview is a distributed spreadsheet system for interactive exploration of datasets too large for a single machine. It introduces vizketches, compact visualization sketches that combine data summarization algorithms with graphics rendering principles, to enable parallel computation, reduced communication, progressive rendering, and accuracy guarantees while preserving spreadsheet-style interactivity. The central empirical claim is that the system, running on eight servers, supports navigation and visualization of tens of billions of rows and trillions of cells, exceeding published capabilities of competing systems.

Significance. If the reported scaling and latency results hold under the stated conditions, the work provides a concrete demonstration that spreadsheet interactivity can be extended to trillion-cell scales via targeted summarization techniques. This has potential impact on big-data analytics tools by showing how algorithmic sketches can be integrated with rendering to maintain responsiveness without sacrificing accuracy guarantees. The emphasis on progressive visualizations and precise error bounds is a constructive contribution to distributed systems for data exploration.

major comments (1)
  1. [abstract] The central scaling claim (abstract) rests on empirical measurements of vizketches under real-world query workloads, yet the provided text supplies no experimental section, baselines, hardware details, or error-bar information. Without these, the load-bearing assumption that vizketches deliver low-latency interactivity on arbitrary data distributions cannot be evaluated.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify aspects of our work. We address the single major comment below, pointing to the relevant sections of the full manuscript.

  1. Referee: [abstract] The central scaling claim (abstract) rests on empirical measurements of vizketches under real-world query workloads, yet the provided text supplies no experimental section, baselines, hardware details, or error-bar information. Without these, the load-bearing assumption that vizketches deliver low-latency interactivity on arbitrary data distributions cannot be evaluated.

    Authors: The full manuscript includes Section 6 (Evaluation), which provides the requested details: hardware specifications for the eight-server cluster, descriptions of real-world workloads and datasets (including navigation and visualization tasks on tens of billions of rows), direct baselines against competing systems such as Spark-based tools and other distributed visualization frameworks, measured latencies, and accuracy guarantees with error bounds for the vizketches. These experiments support the abstract's scaling claims under the tested conditions. The abstract is a concise summary and does not duplicate the full experimental methodology or results, which appear in the body of the paper.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity

This is a systems paper describing an implementation of a distributed spreadsheet (Hillview) and its vizketches mechanism. The abstract and provided text contain no equations, derivations, fitted parameters, or load-bearing self-citations that reduce a claimed result to its own inputs by construction. Performance claims rest on empirical measurements rather than any self-referential mathematical chain. No instances of the enumerated circularity patterns are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a systems description rather than a derivation; the abstract introduces no free parameters, mathematical axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 5680 in / 993 out tokens · 16153 ms · 2026-05-24T23:30:37.378461+00:00 · methodology

Reference graph

Works this paper leans on

108 extracted references · 108 canonical work pages · 2 internal anchors

  1. [1]

    The user interface in the browser is implemented in TypeScript [95], using parts of the D3 JavaScript library [11]

    IMPLEMENTATION Hillview consists of 35000 lines of Java and 16000 lines of TypeScript code. The user interface in the browser is implemented in TypeScript [95], using parts of the D3 JavaScript library [11]. Graphics is done using SVG [25]. The web server runs the Apache Tomcat application server [4]. The browser gets progressive replies from web server u...

  2. [2]

    Flight- Kx

    EVALUATION Our evaluation goal is to determine whether Hillview provides interactive performance with large data sets, how Hillview compares to existing systems, how vizketches contribute to that goal, and how effective the spreadsheet is. Summary. We find the following results: • Hillview can handle spreadsheets with 130B rows and 1.4T cells using only...

  3. [3]

    overview first, zoom and filter, details on demand

    RELATED WORK Hillview is the first spreadsheet to scale massively with interactive speed. Hillview borrows ideas from the algorithms and computer graphics literature, namely mergeable summaries [2] (or sketches) and visualization-driven computation; it relies on many techniques from databases (approximate query processing, on-line analytics), big-da...

  4. [4]

    Hillview introduces a new query execution engine specialized to render tabular views and charts for a spreadsheet

    CONCLUSION Hillview is a spreadsheet that supports a trillion cells even with a modest number of servers. Hillview introduces a new query execution engine specialized to render tabular views and charts for a spreadsheet. The new engine uses vizketches, a new but simple idea that parallelizes computation and calculates only what is needed for a good visu...

  5. [5]

    Abraham, J

    L. Abraham, J. Allen, O. Barykin, V. R. Borkar, B. Chopra, C. Gerea, D. Merl, J. Metzler, D. Reiss, S. Subramanian, J. L. Wiener, and O. Zed. Scuba: Diving into data at Facebook. PVLDB, 6(11):1057–1067, 2013

  6. [6]

    P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In ACM SIGMOD International conference on Management of data, pages 23–34, 2012

  7. [7]

    Agarwal, B

    S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: Queries with bounded errors and bounded response times on very large data. In European Conference on Computer Systems (EuroSys), Prague, Czech Republic, 2013

  8. [8]

    http://tomcat.apache.org

    Apache Tomcat. http://tomcat.apache.org. Retrieved March 2019

  9. [9]

    Barnett, B

    M. Barnett, B. Chandramouli, R. DeLine, S. Drucker, D. Fisher, J. Goldstein, P. Morrison, and J. Platt. Stat!: an interactive analytics environment for big data. In ACM SIGMOD International conference on Management of data, pages 1013–1016, 2013

  10. [10]

    Battle, R

    L. Battle, R. Chang, and M. Stonebraker. Dynamic reduction of query result sets for interactive visualization. In IEEE International Conference on Big Data, pages 1–8, Oct 2013

  11. [11]

    Battle, R

    L. Battle, R. Chang, and M. Stonebraker. Dynamic prefetching of data tiles for interactive visualization. In International Conference on Management of Data (SIGMOD ’16), pages 1363–1375, 2016

  12. [12]

    Behrisch, D

    M. Behrisch, D. Streeb, F. Stoffel, D. Seebacher, B. Matejek, S. H. Weber, S. Mittelstaedt, H. Pfister, and D. Keim. Commercial visual analytics systems – advances in the big data analytics field. IEEE Transactions on Visualization and Computer Graphics, 2018

  13. [13]

    N. Bikakis. Big data visualization tools. In S. Sakr and A. Zomaya, editors, Encyclopedia of Big Data Technologies, pages 1–6. Springer International Publishing, Cham, 2018

  14. [14]

    Bikakis, G

    N. Bikakis, G. Papastefanatos, M. Skourla, and T. Sellis. A hierarchical aggregation framework for efficient multilevel visual exploration and analysis. Semantic Web, 8(1):139–179, 2017

  15. [15]

    Bostock, V

    M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-driven documents. IEEE Trans. Visualization and Comp. Graphics (Proc. InfoVis), 2011

  16. [16]

    M. Brown. BigSheets for the common man. https://www.ibm.com/developerworks/library/bd-bigsheets/index.html, December 2013

  17. [17]

    Budiu, P

    M. Budiu, P. Gopalan, L. Suresh, U. Wieder, H. Kruiger, and M. K. Aguilera. Hillview: A trillion-cell spreadsheet for big data (extended version). http://github.com/vmware/hillview/tree/master/docs/paper.pdf, 2019

  18. [18]

    Budiu, R

    M. Budiu, R. Isaacs, D. Murray, G. Plotkin, P. Barham, S. Al-Kiswany, Y. Boshmaf, Q. Luo, and A. Andoni. Interacting with large distributed datasets using Sketch. In Eurographics Symposium on Parallel Graphics and Visualization, Groningen, Netherlands, June 6-7 2016

  19. [19]

    Chaudhuri, G

    S. Chaudhuri, G. Das, and V. Narasayya. A robust, optimization-based approach for approximate answering of aggregate queries. In ACM SIGMOD International conference on Management of data, pages 295–306, 2001

  20. [20]

    J. Choo, C. Lee, H. Kim, H. Lee, C. Reddy, B. Drake, and H. Park. PIVE: Per-iteration visualization environment for supporting real-time interactions with computational methods. In Visual Analytics Science and Technology (VAST), 2014

  21. [21]

    Christopher and V

    R. Christopher and V. Krishnan. Optimizing your Amazon Redshift and Tableau software deployment for better performance v2. https://www.tableau.com/sites/default/files/whitepapers/optimizing tableau aws redshift whitepaper v2.pdf, 2017

  22. [22]

    L. Chu, H. Tang, T. Yang, and K. Shen. Optimizing data aggregation for cluster-based Internet services. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 119–130, 2003

  23. [23]

    Cohen and H

    E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In ACM Symposium on Principles of Distributed Computing (PODC), pages 225–234, New York, NY, USA,

  24. [24]

    Condie, N

    T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce online. In USENIX Conference on Networked Systems Design and Implementation (NSDI), 2010

  25. [25]

    G. Cormode. Data sketching. Communications of the ACM, 60(9):48–55, Aug. 2017

  26. [26]

    Crotty, A

    A. Crotty, A. Galakatos, K. Dursun, T. Kraska, C. Binnig, U. Cetintemel, and S. Zdonik. An architecture for compiling UDF-centric workflows. PVLDB, 8(12):1466–1477, Aug. 2015

  27. [27]

    Crotty, A

    A. Crotty, A. Galakatos, E. Zgraggen, C. Binnig, and T. Kraska. Vizdom: Interactive analytics through pen and touch. PVLDB, 8(12):2024–2027, Aug. 2015

  28. [28]

    Crotty, A

    A. Crotty, A. Galakatos, E. Zgraggen, C. Binnig, and T. Kraska. The case for interactive data exploration accelerators (IDEAs). In Human-In-the-Loop Data Analytics (HILDA), pages 11:1–11:6, 2016

  29. [29]

    Dahlström, P

    E. Dahlström, P. Dengler, A. Grasso, C. Lilley, C. McCormack, D. Schepers, J. Watt, J. Ferraiolo, F. Jun, and D. Jackson. Scalable vector graphics (SVG) 1.1. https://www.w3.org/TR/SVG/, August 2011

  30. [30]

    de Jonge

    K. de Jonge. DirectQuery in SQL server 2016 analysis services. http://download.microsoft.com/download/F/6/F/F6FBC1FC-F956-49A1-80CD-2941C3B6E417/DirectQuery%20in%20Analysis%20Services%20-%20Whitepaper.pdf, January 2017

  31. [31]

    Dean and S

    J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, December 2004

  32. [32]

    Demiralp, P

    Ç. Demiralp, P. J. Haas, S. Parthasarathy, and T. Pedapati. Foresight: Recommending visual insights. PVLDB, 10(12):1937–1940, 2017

  33. [33]

    B. Ding, S. Huang, S. Chaudhuri, K. Chakrabarti, and C. Wang. Sample + seek: Approximating aggregates with distribution precision guarantee. In ACM SIGMOD International conference on Management of data, pages 679–694, 2016

  34. [34]

    Dix and G

    A. Dix and G. Ellis. by chance: enhancing interaction with large data sets through statistical sampling. In Advanced Visual Interfaces, pages 167–176, 2002

  35. [35]

    El-Hindi, Z

    M. El-Hindi, Z. Zhao, C. Binnig, and T. Kraska. VisTrees: fast indexes for interactive data exploration. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA), page 5, 2016

  36. [36]

    Eldawy, M

    A. Eldawy, M. F. Mokbel, and C. Jonathan. HadoopViz: A MapReduce framework for extensible visualization of big spatial data. In International Conference on Data Engineering (ICDE), Helsinki, Finland, May 2016

  37. [37]

    Elmqvist and J

    N. Elmqvist and J. Fekete. Hierarchical aggregation for information visualization: Overview, techniques, and design guidelines. IEEE Transactions on Visualization and Computer Graphics, 16(3):439–454, May 2010

  38. [38]

    http://fastutil.di.unimi.it

    fastutil: Fast and compact type-specific collections for Java. http://fastutil.di.unimi.it. Retrieved October 2017

  39. [39]

    Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis

    J.-D. Fekete and R. Primet. Progressive analytics: A computation paradigm for exploratory data analysis. https://arxiv.org/abs/1607.05162, 2016

  40. [40]

    Feldman, S

    J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein, and Z. Svitkina. On distributing symmetric streaming computations. ACM Trans. Algorithms, 6(4):66:1–66:19, 2010

  41. [41]

    Fette and A

    I. Fette and A. Melnikov. The WebSocket protocol. IETF RFC 6455, December 2011

  42. [42]

    D. Fisher. Big data exploration requires collaboration between visualization and data infrastructures. In Human-In-the-Loop Data Analytics (HILDA), pages 16:1–16:5, 2016

  43. [43]

    Fisher, I

    D. Fisher, I. Popov, S. Drucker, and M. Schraefel. Trust me, I’m partially right: Incremental visualization lets analysts explore large datasets faster. In SIGCHI Conference on Human Factors in Computing Systems, pages 1673–1682, 2012

  44. [44]

    Flajolet, Éric Fusy, O

    P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Conference on Analysis of Algorithms (AofA) DMTCS proc., pages 127–146, 2007

  45. [45]

    Ghosh, M

    A. Ghosh, M. Nashaat, J. Miller, S. Quader, and C. Marston. A comprehensive review of tools for exploratory analysis of tabular industrial datasets. Visual Informatics, 2018

  46. [46]

    Godfrey, J

    P. Godfrey, J. Gryz, and P. Lasek. Interactive visualization of large data sets. IEEE Transactions on Knowledge and Data Engineering, 28(8):2142–2157, 2016

  47. [47]

    Godfrey, J

    P. Godfrey, J. Gryz, P. Lasek, and N. Razavi. Visualization through inductive aggregation. In International Conference on Extending Database Technology (EDBT), pages 600–603, 2016

  48. [48]

    https://grpc.io/

    gRPC: A high performance, open-source universal RPC framework. https://grpc.io/. Retrieved October 2017

  49. [49]

    A. Hall, O. Bachmann, R. Büssow, S. Gănceanu, and M. Nunkesser. Processing a trillion cells per mouse click. PVLDB, 5(11):1436–1446, July 2012

  50. [50]

    Hausenblas and J

    M. Hausenblas and J. Nadeau. Apache Drill: Interactive ad-hoc analysis at scale. IEEE Comput. Graph. Appl., 1(2), June 2013

  51. [51]

    J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In ACM SIGMOD International conference on Management of data, pages 171–182, 1997

  52. [52]

    J. F. Hughes, A. van Dam, M. McGuire, D. F. Sklar, J. D. Foley, S. K. Feiner, and K. Akeley. Computer Graphics: Principles and Practice (3rd Edition). Addison-Wesley Professional, 2013

  53. [53]

    J.-F. Im, K. Gopalakrishna, S. Subramaniam, M. Shrivastava, A. Tumbde, X. Jiang, J. Dai, S. Lee, N. Pawar, J. Li, and R. Aringunram. Pinot: Realtime OLAP for 530 million users. In International Conference on Management of Data (SIGMOD), pages 583–594, 2018

  54. [54]

    J.-F. Im, F. G. Villegas, and M. J. McGuffin. VisReduce: Fast and responsive incremental information visualization of large datasets. In IEEE International Conference on Big Data, pages 25–32, Oct 2013

  55. [55]

    J. Jo, W. Kim, S. Yoo, B. Kim, and J. Seo. SwiftTuna: Incrementally exploring large-scale multidimensional data. In IEEE VIS, Phoenix, AZ, October 2016

  56. [56]

    J. Jo, W. Kim, S. Yoo, B. Kim, and J. Seo. SwiftTuna: Responsive and incremental visual exploration of large-scale multidimensional data. In Pacific Visualization Symposium (PacificVis), pages 131–140, Seoul, Korea, 2017

  57. [57]

    Jugel, Z

    U. Jugel, Z. Jerzak, G. Hackenbroich, and V. Markl. M4: A visualization-oriented time series data aggregation. PVLDB, 7(10):797–808, June 2014

  58. [58]

    Jugel, Z

    U. Jugel, Z. Jerzak, G. Hackenbroich, and V. Markl. VDDA: Automatic visualization-driven data aggregation in relational databases. The VLDB Journal, 25(1):53–77, Feb. 2016

  59. [59]

    Kamat, P

    N. Kamat, P. Jayachandran, K. Tunga, and A. Nandi. Distributed interactive cube exploration. In International Conference on Data Engineering (ICDE), pages 472–483, March 2014

  60. [60]

    Kamat and A

    N. Kamat and A. Nandi. A session-based approach to fast-but-approximate interactive data cube exploration. ACM Trans. Knowl. Discov. Data, 12(1):1–26, Feb. 2018

  61. [61]

    Kandel, R

    S. Kandel, R. Parikh, A. Paepcke, J. Hellerstein, and J. Heer. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Advanced Visual Interfaces, 2012

  62. [62]

    A. Kim, E. Blais, A. Parameswaran, P. Indyk, S. Madden, and R. Rubinfeld. Rapid sampling for visualizations with ordering guarantees. PVLDB, 8(5):521–532, Jan. 2015

  63. [63]

    A. Kim, L. Xu, T. Siddiqui, S. Huang, S. Madden, and A. Parameswaran. Optimally leveraging density and locality for exploratory browsing and sampling. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA 18), HILDA, pages 7:1–7:7, 2018

  64. [64]

    Kornacker, A

    M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for Hadoop. In Conference on Innovative Data Sys...

  65. [65]

    Laptev, K

    N. Laptev, K. Zeng, and C. Zaniolo. Early accurate results for advanced analytics on MapReduce. PVLDB, 5(10):1028–1039, June 2012

  66. [66]

    L. Lins, J. T. Klosowski, and C. Scheidegger. Nanocubes for real-time exploration of spatiotemporal datasets. IEEE Transactions on Visualization and Computer Graphics, 19(12):2456–2465, 2013

  67. [67]

    Z. Liu, B. Jiang, and J. Heer. imMens: Real-time visual querying of big data. Computer Graphics Forum (Proc. EuroVis), 32, 2013

  68. [68]

    E. Meijer. Your mouse is a database. ACM Queue, 10(3):20–33, Mar. 2012

  69. [69]

    Melnik, A

    S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive analysis of web-scale datasets. PVLDB, 3(1-2):330–339, Sept. 2010

  70. [70]

    Microsoft Corp. Tempe. http://research.microsoft.com/en-us/projects/tempe/. Retrieved January 2019

  71. [71]

    https://powerbi.microsoft.com

    Microsoft PowerBI. https://powerbi.microsoft.com. Accessed October 2017

  72. [72]

    Misra and D

    J. Misra and D. Gries. Finding repeated elements. Science of Computer Programming, 2:143–152, 1982

  73. [73]

    Moritz, D

    D. Moritz, D. Fisher, B. Ding, and C. Wang. Trust, but verify: Optimistic visualizations of approximate queries for exploring big data. In ACM Human Factors in Computing Systems (CHI), 2017

  74. [74]

    Muthukrishnan

    S. Muthukrishnan. Data Streams: Algorithms and Applications. Foundations and trends in theoretical computer science. Now Publishers, 2005

  75. [75]

    U.S. Department of Transportation. Airline on-time performance data. https://transtats.bts.gov/Tables.asp?DB_ID=120. Retrieved January 2019

  76. [76]

    https://www.omnisci.com, Retrieved October 2018

    OmniSci is the extreme analytics platform. https://www.omnisci.com, Retrieved October 2018

  77. [77]

    Project Nashorn

    Oracle Corp. Project Nashorn. http://openjdk.java.net/projects/nashorn/. Retrieved February 2018

  78. [78]

    C. A. L. Pahins, S. A. Stephens, C. Scheidegger, and J. L. D. Comba. Hashedcubes: Simple, low memory, real-time visual exploration of big data. IEEE Transactions on Visualization and Computer Graphics, 23(1):671–680, 2017

  79. [79]

    Pansare, V

    N. Pansare, V. R. Borkar, C. Jermaine, and T. Condie. Online aggregation for large MapReduce jobs. In PVLDB, Seattle, WA, August 2011

  80. [80]

    Y. Park, M. Cafarella, and B. Mozafari. Visualization-aware sampling for very large databases. In International Conference on Data Engineering (ICDE), pages 755–766. IEEE, 2016

Showing first 80 references.