pith.

arxiv: 1907.06946 · v1 · pith:3A6XU7BT · submitted 2019-07-16 · 💻 cs.DB

A Subjective Interestingness measure for Business Intelligence explorations

Pith reviewed 2026-05-24 20:38 UTC · model grok-4.3

classification 💻 cs.DB
keywords subjective interestingness · business intelligence · data exploration · random walk · user belief · multidimensional queries · data cube

The pith

A random walk over past BI queries infers user beliefs to measure subjective interestingness of new queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the problem of defining a subjective interestingness measure for business intelligence explorations by modeling the user's prior belief. It proposes inferring this belief automatically from the user's past interactions with a data cube, the cube schema, and other users' activities, expressed as a probability distribution over query parts learned via a random walk. This belief distribution then serves as the basis for scoring how interesting a multidimensional query is to the user. Experiments on simulated and real explorations show how the measure relates to prototypical and real user behaviors and indicate that query parts are a reasonable proxy for belief inference.

Core claim

The central discovery is that user belief can be modeled as a probability distribution over all potentially accessible query parts, learned through a random walk on past interactions, and this model enables a subjective interestingness measure for multidimensional queries without requiring direct user input on their beliefs.

What carries the argument

A random walk that learns a probability distribution over query parts from past user interactions over a data cube.
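The review does not spell out how the walk is built. Below is a minimal sketch, assuming a personalized PageRank-style random walk with restart (the paper cites Brin and Page [8], but this is not claimed to be the authors' exact formulation): query parts are nodes, edges are reinforced by co-occurrence within a query and by succession between consecutive queries of a session, and the walk restarts on the target user's own parts. The graph construction, `damping=0.85`, the iteration count, and the SSB-style part names are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_graph(sessions):
    """Weighted directed graph over query parts (hypothetical construction).

    An edge a -> b is reinforced when a and b co-occur in one query, or when
    a appears in a query and b in the next query of the same session.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for i, query in enumerate(session):
            pairs = list(product(query, query))
            if i + 1 < len(session):
                pairs += product(query, session[i + 1])
            for a, b in pairs:
                if a != b:
                    weights[a][b] += 1.0
    return weights


def random_walk_belief(sessions, user_parts, damping=0.85, iterations=100):
    """Random walk with restart (personalized PageRank) by power iteration."""
    weights = build_graph(sessions)
    nodes = sorted({p for session in sessions for query in session for p in query})
    known = [p for p in nodes if p in user_parts]
    # Restart on the target user's own parts; fall back to uniform if none.
    if known:
        restart = {p: (1.0 / len(known) if p in user_parts else 0.0) for p in nodes}
    else:
        restart = {p: 1.0 / len(nodes) for p in nodes}
    belief = {p: 1.0 / len(nodes) for p in nodes}
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * restart[p] for p in nodes}
        for a in nodes:
            out = weights.get(a, {})
            total = sum(out.values())
            if total == 0.0:
                # Dangling part: redistribute its mass along the restart vector.
                for p in nodes:
                    nxt[p] += damping * belief[a] * restart[p]
            else:
                for b, w in out.items():
                    nxt[b] += damping * belief[a] * w / total
        belief = nxt
    return belief  # sums to 1 (up to rounding): a distribution over query parts


sessions = [
    # Target user's session (hypothetical SSB-style query parts).
    [{"lo_revenue", "d_year"}, {"lo_revenue", "d_year", "c_region"}],
    # Another user's session, sharing the d_year part.
    [{"lo_quantity", "d_year"}, {"lo_quantity", "s_region"}],
]
belief = random_walk_belief(sessions, user_parts={"lo_revenue", "d_year", "c_region"})
for part, p in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(f"{part:12s} {p:.3f}")
```

In this sketch, parts the target user never used (`lo_quantity`, `s_region`) still receive some belief because another user's session links them to the shared `d_year` part; that is one way other users' activities can feed the inferred belief.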

Load-bearing premise

Past interactions over a data cube combined with a random walk can accurately infer the degree of belief the user holds in each element of their knowledge without any direct user input.

What would settle it

An experiment where users rate the interestingness of queries and the measure's scores are compared to those ratings for correlation; low correlation would falsify the claim.
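That test can be sketched as a rank correlation between the measure's scores and user ratings. This sketch assumes Spearman's coefficient (Pearson correlation of average ranks); the scores, the ratings, and any threshold for what counts as "low" correlation are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


si_scores = [0.9, 0.2, 0.5, 0.7]  # hypothetical SI scores of four queries
ratings = [5, 1, 3, 4]            # hypothetical user ratings (1-5 scale)
print(spearman(si_scores, ratings))  # prints 1.0
```

A rank correlation is used here because interestingness ratings are ordinal; only the ordering of queries, not the scale of the scores, is compared.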

Figures

Figures reproduced from arXiv: 1907.06946 by Alexandre Chanson, Ben Crulis, Nicolas Labroche, Patrick Marcel.

Figure 1. Toy SSB benchmark session. Example: consider the exploration over the schema of the Star Schema Benchmark [23], consisting of 3 queries, as illustrated in …
Figure 2. Envisioned use of belief and subjective interestingness measures in data exploration.
Figure 3. Aligned with De Bie’s framework, query parts can be seen as restrictions to the …
Figure 4. Exploration templates in CubeLoad (from [25]): …
Figure 5. Distribution of probabilities computed by our model for all 4 user profiles when …
Figure 6. Distribution of probabilities computed by our model for all 4 user profiles when …
Figure 7. Distribution of probabilities over the first (and most used) cube of the DOPAN …
Figure 8. Cumulated number of unique query parts by CubeLoad template for each query …
Figure 9. Subjective Interestingness for each CubeLoad profile.
Figure 10. Cumulated number of unique query parts by skill for each query index in the …
Figure 11. Average and confidence interval of SI by skill for each query index in the explo…
Original abstract

This paper addresses the problem of defining a subjective interestingness measure for BI exploration. Such a measure involves prior modeling of the belief of the user. The complexity of this problem lies in the impossibility of asking the user about the degree of belief in each element composing their knowledge prior to the writing of a query. To this end, we propose to automatically infer this user belief based on the user's past interactions over a data cube, the cube schema and other users' past activities. We express the belief under the form of a probability distribution over all the query parts potentially accessible to the user, and use a random walk to learn this distribution. This belief is then used to define a first Subjective Interestingness measure over multidimensional queries. Experiments conducted on simulated and real explorations show how this new subjective interestingness measure relates to prototypical and real user behaviors, and that query parts offer a reasonable proxy to infer user belief.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper addresses defining a subjective interestingness measure for BI explorations. It proposes automatically inferring a user's belief as a probability distribution over query parts via a random walk on the user's past interactions with a data cube, the cube schema, and other users' activities (without direct user input on beliefs). This distribution is then used to define the interestingness measure. Experiments on simulated and real explorations are claimed to show that the measure relates to prototypical and real user behaviors and that query parts are a reasonable proxy for inferring belief.

Significance. If the central claim holds, the work offers a practical approach to subjective interestingness in multidimensional data exploration by leveraging existing interaction logs and schema information. The random-walk formulation on combined user/schema/other-user data is a concrete technical contribution that could inform BI tool design. However, its significance hinges on whether the inferred beliefs are shown to be more than post-hoc correlates of activity patterns.

major comments (2)
  1. [Abstract/Experiments] Abstract and Experiments section: The load-bearing claim is that the random walk produces a distribution that 'accurately reflects' subjective belief and serves as a 'reasonable proxy.' No independent ground-truth comparison (e.g., direct elicitation of user belief degrees for held-out query parts) is described; the reported experiments relate the measure to behaviors but do not isolate whether the inferred distribution matches actual mental models or simply reproduces activity statistics.
  2. [Method] Method (belief inference): The belief distribution is learned from the same class of past interactions to which the interestingness measure is later applied. While the inclusion of cube schema and other users' activities supplies partial external signal, the paper must demonstrate that this does not introduce circular dependence that makes the interestingness score tautological with respect to the target user's own history.
minor comments (2)
  1. [Notation/Method] Clarify the precise definition of 'query parts' and the normalization of the learned probability distribution (e.g., how the random-walk stationary distribution is mapped to [0,1] beliefs).
  2. [Experiments] The abstract states only that experiments 'show how this new subjective interestingness measure relates to prototypical and real user behaviors'; the manuscript should report quantitative metrics (e.g., correlation coefficients or ablation results) rather than qualitative descriptions of how the measure relates to behaviors.
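To make the normalization question in minor comment 1 concrete, here are two hypothetical ways to turn raw random-walk scores into values in [0, 1]; the paper's actual choice is not stated in this review, and the part names and scores are illustrative only.

```python
def as_probability(scores):
    """Sum-normalize: beliefs sum to 1 (a proper distribution over query parts)."""
    total = sum(scores.values())
    return {part: s / total for part, s in scores.items()}


def as_relative_degree(scores):
    """Max-normalize: the most-believed part gets 1.0; values do not sum to 1."""
    top = max(scores.values())
    return {part: s / top for part, s in scores.items()}


raw = {"lo_revenue": 2.0, "d_year": 6.0, "s_region": 0.0}  # hypothetical walk scores
print(as_probability(raw))  # {'lo_revenue': 0.25, 'd_year': 0.75, 's_region': 0.0}
print(as_relative_degree(raw))
```

Sum-normalization is what information-theoretic quantities such as -log p(part) assume; max-normalization reads more naturally as a "degree of belief" but is not a probability, which is why the referee asks the authors to state which mapping they use.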

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, clarifying the role of behavioral validation and the external signals in the belief model while noting where revisions will strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract/Experiments] Abstract and Experiments section: The load-bearing claim is that the random walk produces a distribution that 'accurately reflects' subjective belief and serves as a 'reasonable proxy.' No independent ground-truth comparison (e.g., direct elicitation of user belief degrees for held-out query parts) is described; the reported experiments relate the measure to behaviors but do not isolate whether the inferred distribution matches actual mental models or simply reproduces activity statistics.

    Authors: We agree that the paper does not include direct elicitation of belief degrees from users. Such elicitation is impractical for every query part and was outside the scope of the work. The experiments instead demonstrate that the resulting interestingness scores align with both prototypical simulated behaviors and observed real-user exploration patterns (Section 5), thereby supporting the claim that query parts constitute a reasonable proxy. We will revise the Experiments and Discussion sections to explicitly acknowledge the lack of independent ground-truth validation and to articulate why behavioral correlation constitutes supporting evidence for the proxy approach. revision: partial

  2. Referee: [Method] Method (belief inference): The belief distribution is learned from the same class of past interactions to which the interestingness measure is later applied. While the inclusion of cube schema and other users' activities supplies partial external signal, the paper must demonstrate that this does not introduce circular dependence that makes the interestingness score tautological with respect to the target user's own history.

    Authors: The random-walk model combines the target user's history with the cube schema (a static external structure) and other users' activities (cross-user signal). The interestingness score is then computed as the divergence between a new query and the resulting belief distribution, allowing it to flag departures from the inferred prior even when the prior incorporates past activity. We will add a clarifying paragraph in the Method section that explicitly separates the sources of the belief distribution and discusses why the formulation avoids direct tautology with the target user's history alone. revision: partial
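One reading of "divergence between a new query and the resulting belief distribution" is sketched below, assuming KL divergence from a uniform distribution over the new query's parts to the belief distribution. The paper's exact formula is not given in this review; De Bie's framework, which the paper cites [3, 4], instead scores a pattern by its information content relative to its description length. The `floor` for unseen parts and the belief values are hypothetical.

```python
import math


def query_divergence(query_parts, belief, floor=1e-9):
    """KL(q || belief), where q is uniform over the parts of the new query.

    `floor` (hypothetical) keeps the score finite for parts with zero belief.
    """
    q = 1.0 / len(query_parts)
    return sum(q * math.log(q / max(belief.get(p, 0.0), floor)) for p in query_parts)


belief = {"lo_revenue": 0.40, "d_year": 0.40, "c_region": 0.15, "s_region": 0.05}
familiar = {"lo_revenue", "d_year"}
novel = {"lo_revenue", "s_region"}
print(query_divergence(familiar, belief) < query_divergence(novel, belief))  # True
```

Even though the belief is built from the user's own history, a query that combines low-belief parts scores higher than one that repeats familiar parts, which is the kind of departure from the prior the rebuttal describes.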

Circularity Check

0 steps flagged

No significant circularity; inference and validation are independent of the target measure

Full rationale

The derivation infers a belief distribution via random walk over past cube interactions, schema, and other users' activities, then defines the subjective interestingness measure from that distribution. Experiments on simulated and real explorations test the measure against user behaviors, providing external grounding. No step reduces by construction to its inputs (no self-definitional equations, no fitted parameter renamed as prediction, no load-bearing self-citation). The approach is self-contained against benchmarks and receives a normal non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two modeling choices that are introduced without independent evidence in the abstract: representing belief as a probability distribution and learning it via random walk. No free parameters or invented entities are explicitly named.

axioms (2)
  • domain assumption User belief about query parts can be represented as a probability distribution over all potentially accessible query parts
    Directly stated in the abstract as the form chosen for the belief model.
  • ad hoc to paper A random walk on past interactions, cube schema, and other users' activities can learn this belief distribution
    The inference technique proposed in the abstract without further justification.
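Both axioms take "query parts" as given. A minimal sketch, assuming (as is common in the OLAP session literature, but not confirmed by this review) that a query part is a grouping attribute, a selection predicate, or a measure. The column names come from the Star Schema Benchmark; the queries and the class and function names are hypothetical. The cumulative count mirrors the quantity plotted in Figures 8 and 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OlapQuery:
    """Hypothetical OLAP query: grouping attributes, selection predicates, measures."""
    group_by: frozenset = frozenset()
    filters: frozenset = frozenset()
    measures: frozenset = frozenset()

    def parts(self):
        """Tag each element with its role so equal names in different roles stay distinct."""
        return ({("group_by", a) for a in self.group_by}
                | {("filter", f) for f in self.filters}
                | {("measure", m) for m in self.measures})


def cumulated_unique_parts(session):
    """Number of distinct query parts seen up to and including each query index."""
    seen, counts = set(), []
    for query in session:
        seen |= query.parts()
        counts.append(len(seen))
    return counts


session = [
    OlapQuery(group_by=frozenset({"d_year"}), measures=frozenset({"lo_revenue"})),
    OlapQuery(group_by=frozenset({"d_year", "c_region"}), measures=frozenset({"lo_revenue"})),
    OlapQuery(group_by=frozenset({"d_year", "c_region"}),
              filters=frozenset({"c_region = 'ASIA'"}),
              measures=frozenset({"lo_revenue"})),
]
print(cumulated_unique_parts(session))  # [2, 3, 4]
```

A belief distribution in the sense of the first axiom would then assign one probability to each tagged tuple returned by `parts()`.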

pith-pipeline@v0.9.0 · 5682 in / 1410 out tokens · 23797 ms · 2026-05-24T20:38:29.680545+00:00 · methodology


Reference graph

Works this paper leans on

31 references

  [1] Julien Aligon, Enrico Gallinucci, Matteo Golfarelli, Patrick Marcel, and Stefano Rizzi. 2015. A collaborative filtering approach for recommending OLAP sessions. DSS 69 (2015), 20–30.

  [2] S. Alvarez. 2003. Chi-squared computation for association rules: preliminary results. Technical Report BC-CS-2003-01. Computer Science Dept., Boston College, Chestnut Hill, MA 02467 USA. 11 pages. http://www.cs.bc.edu/~alvarez/ChiSquare/chi2tr.pdf

  [3] Tijl De Bie. 2011. An information theoretic framework for data mining. In KDD. ACM, 564–572.

  [4] Tijl De Bie. 2013. Subjective Interestingness in Exploratory Data Mining. In Advances in Intelligent Data Analysis XII - 12th International Symposium, IDA 2013, London, UK, October 17-19, 2013. Proceedings. 19–31. https://doi.org/10.1007/978-3-642-41398-8_3

  [5] Tijl De Bie. 2014 (accessed on December 2018). The Science of Finding Interesting Patterns in Data. http://www.interesting-patterns.net/forsied/

  [6] Tijl De Bie. 2018. An information-theoretic framework for data exploration. From Itemsets to embeddings, from interestingness to privacy. In Keynote presentation given at IDEA’18 @ the KDD’18 conference. http://www.interesting-patterns.net/forsied/keynote-presentation-given-at-idea18-the-kdd18-conference/

  [7] Tom Brijs, Koen Vanhoof, and Geert Wets. 2004. Defining Interestingness for Association Rules. International Journal "Information Theories & Applications" 10 (2004), 370–375.

  [8] Sergey Brin and Lawrence Page. 2012. Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks 56, 18 (2012), 3825–3833. https://doi.org/10.1016/j.comnet.2012.10.007

  [9] Véronique Cariou, Jérôme Cubillé, Christian Derquenne, Sabine Goutier, Françoise Guisnel, and Henri Klajnmic. 2009. Embedded indicators to facilitate the exploration of a data cube. IJBIDM 4, 3/4 (2009), 329–349. https://doi.org/10.1504/IJBIDM.2009.029083

  [10] Alexandre Chanson, Ben Crulis, Krista Drushku, Nicolas Labroche, and Patrick Marcel. 2019. Profiling User Belief in BI Exploration for Measuring Subjective Interestingness. In Proceedings of the 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, co-located with EDBT/ICDT Joint Conference, DOLAP@EDBT/ICDT...

  [11] Mahfoud Djedaini, Krista Drushku, Nicolas Labroche, Patrick Marcel, Verónika Peralta, and Willeme Verdeaux. 2019. Automatic assessment of interactive OLAP explorations. Information Systems 82 (2019), 148–163.

  [12] Mahfoud Djedaini, Nicolas Labroche, Patrick Marcel, and Verónika Peralta. Detecting User Focus in OLAP Analyses. In ADBIS. 105–119.

  [13] Krista Drushku, Julien Aligon, Nicolas Labroche, Patrick Marcel, Verónika Peralta, and Bruno Dumant. 2017. User Interests Clustering in Business Intelligence Interactions. In Advanced Information Systems Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings. 144–158.

  [14] Carem C. Fabris and Alex Alves Freitas. 2001. Incorporating Deviation-Detection Functionality into the OLAP Paradigm. In XVI Simpósio Brasileiro de Banco de Dados, 1-3 Outubro 2001, Rio de Janeiro, Brasil, Anais/Proceedings. 274–285.

  [15] Liqiang Geng and Howard J. Hamilton. 2006. Interestingness measures for data mining: A survey. ACM Comput. Surv. 38, 3 (2006), 9.

  [16] Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, and Ed Lazowska. 2016. SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA, 281–293. https://doi.org/10.1145/2882903.2882957

  [17] Marius Kaminskas and Derek Bridge. 2017. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. TiiS 7, 1 (2017), 2:1–2:42.

  [18] M. Klemettinen, H. Mannila, and H. Toivonen. 1999. Interactive exploration of interesting findings in the Telecommunication Network Alarm Sequence Analyzer (TASA). Information and Software Technology 41, 9 (1999), 557–567.

  [19] Kleanthis-Nikolaos Kontonasios and Tijl De Bie. 2015. Subjectively interesting alternative clusterings. Machine Learning 98, 1-2 (2015), 31–56. https://doi.org/10.1007/s10994-013-5333-z

  [20] Navin Kumar, Aryya Gangopadhyay, Sanjay Bapna, George Karabatis, and Zhiyuan Chen. 2008. Measuring interestingness of discovered skewed patterns in data cubes. Decision Support Systems 46, 1 (2008), 429–439.

  [21] Jefrey Lijffijt, Bo Kang, Wouter Duivesteijn, Kai Puolamäki, Emilia Oikarinen, and Tijl De Bie. 2018. Subjectively Interesting Subgroup Discovery on Real-Valued Targets. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018. 1352–1355. https://doi.org/10.1109/ICDE.2018.00148

  [22] Wanyu Liu, Rafael Lucas D’Oliveira, Michel Beaudouin-Lafon, and Olivier Rioul. 2017. BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA, May 06-11, 2017. 5869–5880. https://doi.org/10.1145/3025453.3025524

  [23] Patrick E. O’Neil, Elizabeth J. O’Neil, Xuedong Chen, and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In Performance Evaluation and Benchmarking, First TPC Technology Conference, TPCTC 2009, Lyon, France, August 24-28, 2009, Revised Selected Papers. 237–252. https://doi.org/10.1007/978-3-642-10424-4_17

  [24] Kai Puolamäki, Emilia Oikarinen, Bo Kang, Jefrey Lijffijt, and Tijl De Bie. 2018. Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018. 1208–… https://doi.org/10.1109/ICDE.2018.00112

  [25] Stefano Rizzi and Enrico Gallinucci. 2014. CubeLoad: A Parametric Generator of Realistic OLAP Workloads. In Advanced Information Systems Engineering - 26th International Conference, CAiSE 2014, Thessaloniki, Greece, June 16-20, 2014. Proceedings. 610–624. https://doi.org/10.1007/978-3-319-07881-6_41

  [26] Carsten Sapia. 2000. PROMISE: Predicting Query Behavior to Enable Predictive Caching Strategies for OLAP Systems. In DaWaK. 224–233.

  [27] Sunita Sarawagi. 2000. User-Adaptive Exploration of Multidimensional Data. In VLDB. Morgan Kaufmann, 307–316.

  [28] Sunita Sarawagi. 2001. User-cognizant multidimensional analysis. VLDB J. 10, 2-3 (2001), 224–239. https://doi.org/10.1007/s007780100046

  [29] Abraham Silberschatz and Alexander Tuzhilin. 1995. On Subjective Measures of Interestingness in Knowledge Discovery. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada, August 20-21, 1995. 275–281. http://www.aaai.org/Library/KDD/1995/kdd95-032.php

  [30] Matthijs van Leeuwen, Tijl De Bie, Eirini Spyropoulou, and Cédric Mesnage. 2016. Subjective interestingness of subgraph patterns. Machine Learning 105, 1 (2016), 41–75. https://doi.org/10.1007/s10994-015-5539-3

  [31] Panos Vassiliadis and Patrick Marcel. 2018. The Road to Highlights is Paved with Good Intentions: Envisioning a Paradigm Shift in OLAP Modeling. In Proceedings of the 20th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with 10th EDBT/ICDT Joint Conference (EDBT/ICDT 2018), Vienna, Austria, ...