A Subjective Interestingness measure for Business Intelligence explorations
Pith reviewed 2026-05-24 20:38 UTC · model grok-4.3
The pith
A random walk over past BI queries infers user beliefs to measure subjective interestingness of new queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that user belief can be modeled as a probability distribution over all potentially accessible query parts, learned through a random walk on past interactions, and this model enables a subjective interestingness measure for multidimensional queries without requiring direct user input on their beliefs.
What carries the argument
A random walk that learns a probability distribution over query parts from past user interactions over a data cube.
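As an illustration of the mechanism, the following is a minimal sketch of a random walk with teleportation over a small graph of query parts. It is not the paper's algorithm: the graph construction (symmetric co-occurrence edges), the `damping` value, the function name `belief_from_random_walk`, and the example query parts are all assumptions made for this sketch.

```python
from collections import defaultdict

def belief_from_random_walk(edges, damping=0.85, iters=100):
    """Approximate the stationary distribution of a random walk with
    teleportation over query parts (hypothetical construction)."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    weights = defaultdict(dict)
    for src, dst, w in edges:
        # Assumption: co-occurrence in past queries is treated as symmetric.
        weights[src][dst] = weights[src].get(dst, 0.0) + w
        weights[dst][src] = weights[dst].get(src, 0.0) + w
    p = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Teleportation mass is spread uniformly over all query parts.
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            total = sum(weights[src].values())
            for dst, w in weights[src].items():
                nxt[dst] += damping * p[src] * w / total
        p = nxt
    return p

# Made-up interaction counts between query parts, for illustration only.
edges = [
    ("dim:year", "measure:sales", 3.0),
    ("dim:country", "measure:sales", 1.0),
    ("dim:year", "dim:country", 1.0),
]
belief = belief_from_random_walk(edges)
```

In this toy graph, the parts that co-occur most often in past queries end up with the largest probability mass, which is the sense in which the walk turns interaction logs into a belief distribution.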
Load-bearing premise
Past interactions over a data cube combined with a random walk can accurately infer the degree of belief the user holds in each element of their knowledge without any direct user input.
What would settle it
An experiment where users rate the interestingness of queries and the measure's scores are compared to those ratings for correlation; low correlation would falsify the claim.
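One way such a comparison could be carried out is a rank correlation between the measure's scores and user ratings. The sketch below computes Spearman's rho in plain Python; the choice of Spearman rather than another statistic, the helper names, and the sample numbers are assumptions for illustration and do not come from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: measure scores and user ratings for four queries.
scores = [0.2, 1.5, 0.9, 2.4]
ratings = [1, 4, 2, 5]
rho = spearman(scores, ratings)
```

A rho close to 1 would indicate that the measure orders queries the way users do; a value near 0 would be the low-correlation outcome described above.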
Original abstract
This paper addresses the problem of defining a subjective interestingness measure for BI exploration. Such a measure involves prior modeling of the belief of the user. The complexity of this problem lies in the impossibility to ask the user about the degree of belief in each element composing their knowledge prior to the writing of a query. To this aim, we propose to automatically infer this user belief based on the user's past interactions over a data cube, the cube schema and other users past activities. We express the belief under the form of a probability distribution over all the query parts potentially accessible to the user, and use a random walk to learn this distribution. This belief is then used to define a first Subjective Interestingness measure over multidimensional queries. Experiments conducted on simulated and real explorations show how this new subjective interestingness measure relates to prototypical and real user behaviors, and that query parts offer a reasonable proxy to infer user belief.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses defining a subjective interestingness measure for BI explorations. It proposes automatically inferring a user's belief as a probability distribution over query parts via a random walk on the user's past interactions with a data cube, the cube schema, and other users' activities (without direct user input on beliefs). This distribution is then used to define the interestingness measure. Experiments on simulated and real explorations are claimed to show that the measure relates to prototypical and real user behaviors and that query parts are a reasonable proxy for inferring belief.
Significance. If the central claim holds, the work offers a practical approach to subjective interestingness in multidimensional data exploration by leveraging existing interaction logs and schema information. The random-walk formulation on combined user/schema/other-user data is a concrete technical contribution that could inform BI tool design. However, its significance hinges on whether the inferred beliefs are shown to be more than post-hoc correlates of activity patterns.
major comments (2)
- [Abstract/Experiments] Abstract and Experiments section: The load-bearing claim is that the random walk produces a distribution that 'accurately reflects' subjective belief and serves as a 'reasonable proxy.' No independent ground-truth comparison (e.g., direct elicitation of user belief degrees for held-out query parts) is described; the reported experiments relate the measure to behaviors but do not isolate whether the inferred distribution matches actual mental models or simply reproduces activity statistics.
- [Method] Method (belief inference): The belief distribution is learned from the same class of past interactions to which the interestingness measure is later applied. While the inclusion of cube schema and other users' activities supplies partial external signal, the paper must demonstrate that this does not introduce circular dependence that makes the interestingness score tautological with respect to the target user's own history.
minor comments (2)
- [Notation/Method] Clarify the precise definition of 'query parts' and the normalization of the learned probability distribution (e.g., how the random-walk stationary distribution is mapped to [0,1] beliefs).
- [Experiments] The abstract states that experiments 'support the claims,' but the manuscript should include quantitative metrics (e.g., correlation coefficients or ablation results) rather than qualitative descriptions of relation to behaviors.
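For context on the normalization question, under the textbook definition of a stationary distribution (an assumption here, since this review does not state which formulation the paper uses), with $P$ a row-stochastic transition matrix over query parts and $\pi$ a row vector, each component already lies in $[0,1]$:

```latex
\pi = \pi P, \qquad \pi_i \ge 0, \qquad \sum_i \pi_i = 1
\quad\Longrightarrow\quad \pi_i \in [0, 1] \ \text{for every query part } i .
```

Whether the paper applies any further rescaling on top of this is exactly what the comment asks the authors to state.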
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, clarifying the role of behavioral validation and the external signals in the belief model while noting where revisions will strengthen the presentation.
- Referee: [Abstract/Experiments] Abstract and Experiments section: The load-bearing claim is that the random walk produces a distribution that 'accurately reflects' subjective belief and serves as a 'reasonable proxy.' No independent ground-truth comparison (e.g., direct elicitation of user belief degrees for held-out query parts) is described; the reported experiments relate the measure to behaviors but do not isolate whether the inferred distribution matches actual mental models or simply reproduces activity statistics.
  Authors: We agree that the paper does not include direct elicitation of belief degrees from users. Such elicitation is impractical for every query part and was outside the scope of the work. The experiments instead demonstrate that the resulting interestingness scores align with both prototypical simulated behaviors and observed real-user exploration patterns (Section 5), thereby supporting the claim that query parts constitute a reasonable proxy. We will revise the Experiments and Discussion sections to explicitly acknowledge the lack of independent ground-truth validation and to articulate why behavioral correlation constitutes supporting evidence for the proxy approach.
  Revision: partial
- Referee: [Method] Method (belief inference): The belief distribution is learned from the same class of past interactions to which the interestingness measure is later applied. While the inclusion of cube schema and other users' activities supplies partial external signal, the paper must demonstrate that this does not introduce circular dependence that makes the interestingness score tautological with respect to the target user's own history.
  Authors: The random-walk model combines the target user's history with the cube schema (a static external structure) and other users' activities (cross-user signal). The interestingness score is then computed as the divergence between a new query and the resulting belief distribution, allowing it to flag departures from the inferred prior even when the prior incorporates past activity. We will add a clarifying paragraph in the Method section that explicitly separates the sources of the belief distribution and discusses why the formulation avoids direct tautology with the target user's history alone.
  Revision: partial
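To make the "divergence between a new query and the belief distribution" concrete, here is a minimal sketch that scores a query by the Kullback-Leibler divergence from a uniform distribution over the query's parts to the belief. This specific formula, the function name `kl_interestingness`, the `floor` smoothing, and the example belief values are assumptions for illustration; the review does not give the paper's actual definition.

```python
import math

def kl_interestingness(query_parts, belief, floor=1e-9):
    """KL divergence from a uniform distribution over the query's parts
    to the belief distribution (hypothetical scoring rule)."""
    parts = sorted(set(query_parts))
    q = 1.0 / len(parts)
    # Parts absent from the belief get a small floor to avoid log(0).
    return sum(
        q * math.log(q / max(belief.get(part, 0.0), floor)) for part in parts
    )

# Made-up belief over query parts, for illustration only.
belief = {
    "dim:year": 0.40,
    "measure:sales": 0.40,
    "dim:country": 0.15,
    "dim:supplier": 0.05,
}
familiar = kl_interestingness(["dim:year", "measure:sales"], belief)
novel = kl_interestingness(["dim:supplier", "measure:sales"], belief)
```

Under this sketch, a query built from parts the user is already believed to know scores low, while a query touching a low-probability part scores higher, which matches the intuition that departures from the inferred prior are flagged as interesting.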
Circularity Check
No significant circularity; inference and validation are independent of the target measure
The derivation infers a belief distribution via random walk over past cube interactions, schema, and other users' activities, then defines the subjective interestingness measure from that distribution. Experiments on simulated and real explorations test the measure against user behaviors, providing external grounding. No step reduces by construction to its inputs (no self-definitional equations, no fitted parameter renamed as prediction, no load-bearing self-citation). The approach is self-contained against benchmarks and receives a normal non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: User belief about query parts can be represented as a probability distribution over all potentially accessible query parts.
- ad hoc to paper: A random walk on past interactions, cube schema, and other users' activities can learn this belief distribution.