arXiv: 2508.17403 · v3 · submitted 2025-08-24 · cs.LG · stat.AP

Mutual Information Surprise: Rethinking Unexpectedness in Autonomous Systems

Pith reviewed 2026-05-18 21:02 UTC · model grok-4.3

keywords: Mutual Information Surprise, epistemic growth, surprise measures, autonomous systems, adaptive behavior, reaction policy, mutual information, learning progression

The pith

Mutual Information Surprise redefines unexpectedness as a measure of epistemic growth to guide adaptive reactions in autonomous systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Mutual Information Surprise as an alternative to traditional surprise measures like Shannon or Bayesian surprise. It treats surprise as the effect of new observations on the mutual information between data and the system's internal model, turning it into a signal of how much the system is learning rather than a mere anomaly detector. A statistical test sequence detects when this measure warrants action, and a reaction policy uses it to adjust sampling rates and fork processes dynamically. Evaluations on synthetic tasks and a pollution map estimation problem indicate that systems following this policy maintain better stability, respond more appropriately, and achieve higher predictive accuracy than those using older surprise definitions. A sympathetic reader would care because this framing moves surprise from an immediate reaction trigger toward a tool for the system to monitor and steer its own learning trajectory.
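For orientation, the classical measures named above have standard textbook forms, and mutual information has its usual definition. The notation below (observation $y$, input $x$, model parameters $\theta$, past data $\mathcal{D}$) is assumed here for illustration and is not taken from the paper.

```latex
\begin{align*}
S_{\text{Shannon}}(y) &= -\log p(y \mid \mathcal{D}), \\
S_{\text{Bayes}}(y)   &= D_{\mathrm{KL}}\big(p(\theta \mid y, \mathcal{D}) \,\|\, p(\theta \mid \mathcal{D})\big), \\
I(x; y)               &= H(y) - H(y \mid x) = H(x) + H(y) - H(x, y).
\end{align*}
```

The first two score a single observation against the current model. Mutual information is a property of the whole data set, which is why a change in it can be read as a statement about learning progress.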

Core claim

Mutual Information Surprise quantifies the impact of new observations on the mutual information between those observations and the system's internal model. This quantity serves as a direct indicator of epistemic growth. The paper shows that a statistical test built around this quantity can trigger a reaction policy that governs system behavior by adjusting sampling and forking processes, and that systems controlled by this policy outperform those based on classical surprise measures in stability, responsiveness, and accuracy on both synthetic domains and a dynamic pollution map estimation task.

What carries the argument

Mutual Information Surprise, which computes how new data changes the mutual information shared with the internal model and thereby signals epistemic growth to drive sampling adjustments and process forking.
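A minimal sketch of the quantity, assuming discrete observations and the plug-in (MLE) estimator: the excerpt of Theorem 1 in the reference graph on this page speaks of the change in MLE-based mutual information estimates between n existing and n + m total observations, and the code computes that difference. The function names `entropy`, `plugin_mi`, and `mis` are hypothetical, and the paper's exact definition and test bounds are not reproduced here.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in (MLE) Shannon entropy in nats of a sequence of hashable symbols."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def plugin_mi(xs, ys):
    """Plug-in estimate of I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mis(xs_old, ys_old, xs_new, ys_new):
    """Change in the plug-in MI estimate after appending m new pairs to n old ones.

    ASSUMPTION: this difference stands in for the paper's MIS statistic.
    """
    i_n = plugin_mi(xs_old, ys_old)
    i_nm = plugin_mi(list(xs_old) + list(xs_new), list(ys_old) + list(ys_new))
    return i_nm - i_n


# Old data follow a deterministic map y = x on four symbols.
xs_old = [0, 1, 2, 3] * 5
ys_old = [0, 1, 2, 3] * 5

# New data consistent with the old map leave the estimate unchanged;
# new data that break the map lower it.
print(mis(xs_old, ys_old, [0, 1, 2, 3], [0, 1, 2, 3]))
print(mis(xs_old, ys_old, [0, 0, 0, 0], [1, 2, 3, 0]))
```

A drop in the estimate corresponds to the "violation from below" case described in the paper excerpts, although the thresholds that decide when a drop counts as a surprise come from the paper's test sequence and are not sketched here.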

If this is right

  • A system using the MIS-based reaction policy exhibits greater stability and responsiveness than systems driven by Shannon or Bayesian surprise.
  • The measure shifts surprise from a purely reactive signal to one that supports reflection on the system's own learning progression.
  • Dynamic adjustment of sampling and process forking under MIS leads to higher predictive accuracy in tasks such as pollution map estimation.
  • The approach supplies a concrete mechanism for autonomous systems to become more self-aware and adaptive in complex, changing environments.
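The reaction policy can be illustrated with a short sketch based on the paper excerpts [9] to [11] listed in the reference graph on this page: a change in input entropy calls for sampling adjustment, a change in conditional entropy calls for process forking, and a biased coin decides between them when both change. The excerpt truncates the formula for the coin's bias, so the normalization used below (the share of |ΔĤ(x)| in the sum of the two magnitudes) is an assumption, as are all function names.

```python
import math
import random
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (nats) of a sequence of hashable symbols."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def conditional_entropy(xs, ys):
    """Plug-in H(y | x) = H(x, y) - H(x)."""
    return entropy(list(zip(xs, ys))) - entropy(xs)


def entropy_changes(xs_old, ys_old, xs_new, ys_new):
    """Changes in H(x) and H(y | x) after appending the new pairs."""
    xs_all = list(xs_old) + list(xs_new)
    ys_all = list(ys_old) + list(ys_new)
    return {
        "dH_x": entropy(xs_all) - entropy(xs_old),
        "dH_y_given_x": conditional_entropy(xs_all, ys_all)
        - conditional_entropy(xs_old, ys_old),
    }


def choose_reaction(dH_x, dH_y_given_x, rng=random):
    """Biased coin toss between the two reactions.

    ASSUMPTION: p_adjust = |dH_x| / (|dH_x| + |dH_y_given_x|); the source
    excerpt is cut off before the denominator.
    """
    total = abs(dH_x) + abs(dH_y_given_x)
    if total == 0.0:
        return "no_reaction"
    p_adjust = abs(dH_x) / total
    if rng.random() < p_adjust:
        return "sampling_adjustment"
    return "process_forking"


changes = entropy_changes([0, 1, 2, 3] * 5, [0, 1, 2, 3] * 5,
                          [0, 0, 0, 0], [1, 2, 3, 0])
print(changes, choose_reaction(changes["dH_x"], changes["dH_y_given_x"]))
```

The sketch only selects a reaction. What each reaction then does (how exploration is moderated, how the forked subprocesses are compared) is described in the paper and is not modeled here.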

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the link between mutual information change and epistemic growth holds, MIS could be inserted into active learning loops to decide when to query new data based on information gain rather than prediction error alone.
  • The same quantity might help multi-agent systems coordinate by letting each agent share its current surprise level to align exploration priorities.
  • Testing MIS in reinforcement learning agents could reveal whether the reflective policy reduces unnecessary exploration in stable regimes while accelerating adaptation when the environment shifts.

Load-bearing premise

That changes in mutual information reliably indicate genuine epistemic growth and can be converted into a reaction policy that improves adaptation without creating fresh instabilities or demanding extensive tuning.
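One concrete reason this premise needs care is that plug-in mutual information estimates are biased upward when samples are scarce relative to the number of categories, a well-known effect analyzed in the estimation literature (for example Paninski, [57] in the reference graph on this page). The sketch below illustrates the effect on independent variables, whose true mutual information is zero. The helper names, alphabet size, and sample sizes are illustrative choices, not values from the paper.

```python
import math
import random
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (nats) of a sequence of hashable symbols."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def plugin_mi(xs, ys):
    """Plug-in estimate of I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mean_mi_independent(n, k, trials, seed=0):
    """Average plug-in MI over repeated draws of n independent uniform pairs.

    x and y are drawn independently from {0, ..., k-1}, so the true MI is 0
    and any positive average is estimator bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.randrange(k) for _ in range(n)]
        ys = [rng.randrange(k) for _ in range(n)]
        total += plugin_mi(xs, ys)
    return total / trials


small_sample = mean_mi_independent(n=50, k=10, trials=20)
large_sample = mean_mi_independent(n=5000, k=10, trials=20)
print(small_sample, large_sample)
```

Because the bias shrinks as observations accumulate, the estimate can drift even when nothing about the process has changed, so any threshold on its change has to account for that drift.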

What would settle it

Run the MIS policy and a classical surprise policy side-by-side on a rapidly changing environment where the internal model must be updated continuously; if the MIS version exhibits lower stability or worse long-term accuracy than the classical version, the claim that MIS produces superior reflective adaptation would be falsified.

Original abstract

A community of researchers appears to think that a machine can be surprised and have introduced various surprise measures, principally the Shannon Surprise and the Bayesian Surprise. The questions of what constitutes a surprise and how to react to one still elicit debates. In this work, we introduce Mutual Information Surprise (MIS), a new framework that redefines surprise not as anomaly measure, but as a signal of epistemic growth. Furthermore, we develop a statistical test sequence that could trigger a surprise reaction and propose a MIS-based reaction policy that dynamically governs system behavior through sampling adjustment and process forking. Empirical evaluations -- on both synthetic domains and a dynamic pollution map estimation task -- show that a system governed by the MIS-based reaction policy significantly outperforms those under classical surprise-based approaches in stability, responsiveness, and predictive accuracy. The important implication of our new proposal is that MIS quantifies the impact of new observations on mutual information, shifts surprise from reactive to reflective, enables reflection on learning progression, and thus offers a path toward self-aware and adaptive autonomous systems. We expect the new surprise measure to play a critical role in further advancing autonomous systems on their ability to learn and adapt in a complex and dynamic environment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Mutual Information Surprise (MIS) as a new framework that redefines surprise in autonomous systems as the impact of new observations on mutual information between data and the internal model, shifting it from a reactive anomaly measure to a reflective signal of epistemic growth. It develops a statistical test sequence to trigger reactions and a MIS-based reaction policy that dynamically adjusts sampling and performs process forking. Empirical evaluations on synthetic domains and a dynamic pollution map estimation task claim that systems governed by this policy significantly outperform classical surprise-based approaches in stability, responsiveness, and predictive accuracy, with implications for self-aware and adaptive autonomous systems.

Significance. If the empirical claims are substantiated, the work could meaningfully advance research on surprise measures and adaptive agents in machine learning by providing a mechanism for systems to reflect on their learning progression via mutual information. This addresses ongoing debates around Shannon and Bayesian surprise by emphasizing epistemic growth and could support more robust behavior in non-stationary environments.

major comments (2)
  1. [Abstract] The abstract asserts empirical superiority in stability, responsiveness, and accuracy on synthetic and pollution tasks, but supplies no details on baselines, statistical tests, data splits, or error analysis, so it is not possible to verify whether the data actually supports the central claims.
  2. [MIS-based reaction policy] The stability of the MIS reaction policy hinges on untested assumptions about MI estimation robustness in non-stationary environments; the manuscript does not address whether the test thresholds or forking logic were tuned per task or if performance degrades under modest changes to observation noise or model capacity.
minor comments (1)
  1. The manuscript would benefit from an explicit mathematical definition of MIS (e.g., how the impact on mutual information is computed) and a direct comparison to Bayesian surprise to clarify the claimed novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts empirical superiority in stability, responsiveness, and accuracy on synthetic and pollution tasks, but supplies no details on baselines, statistical tests, data splits, or error analysis, so it is not possible to verify whether the data actually supports the central claims.

    Authors: We agree that the abstract is concise and omits key experimental details. The full manuscript describes the baselines (Shannon and Bayesian surprise), the statistical test sequence, synthetic data generation procedures, the pollution map task setup including train/test splits, and error metrics for predictive accuracy. To improve verifiability, we will revise the abstract to include a short reference to these elements and add explicit pointers to the experimental sections.
    Revision: yes

  2. Referee: [MIS-based reaction policy] The stability of the MIS reaction policy hinges on untested assumptions about MI estimation robustness in non-stationary environments; the manuscript does not address whether the test thresholds or forking logic were tuned per task or if performance degrades under modest changes to observation noise or model capacity.

    Authors: The manuscript derives test thresholds from the statistical test sequence and evaluates the policy across synthetic domains and the pollution task to show consistent behavior. We did not include explicit sensitivity analysis for observation noise levels or model capacity changes. We will add a discussion of the underlying assumptions regarding MI estimation and include additional robustness experiments in the revised version.
    Revision: partial

Circularity Check

0 steps flagged

No significant circularity: MIS defined from standard mutual information with separate empirical validation

Full rationale

The paper defines Mutual Information Surprise (MIS) directly from the established mutual information quantity between observations and the internal model, then proposes a statistical test sequence and reaction policy (sampling adjustment and process forking) as downstream applications. Empirical results on synthetic domains and pollution mapping are presented as separate evaluations of the policy's performance in stability and accuracy. No equations or steps reduce the central claims to fitted parameters on the same data or to self-citations that bear the load of the derivation; the definition and policy remain independent of the reported outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based on abstract only; the central claim rests on the domain assumption that mutual information serves as a direct proxy for epistemic growth and that a derived reaction policy will improve system behavior without side effects.

axioms (1)
  • domain assumption: Mutual information between new observations and the system's model can be used as a reliable indicator of epistemic growth.
    This premise underpins the redefinition of surprise and the design of the reaction policy.

pith-pipeline@v0.9.0 · 5735 in / 1275 out tokens · 33434 ms · 2026-05-18T21:02:14.583297+00:00 · methodology

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages

  1. [1]

    This is true when we regard the initial observations as true system information

    We assume that the existing observations are typical in the sense of the Asymptotic Equipartition Property (44), meaning that empirical statistics computed from the data are representative of their corresponding expected values under the experimental design’s intended distribution, i.e., $\hat{I}_n \approx \mathbb{E}[\hat{I}_n]$. This is true when we regard the initial observati...

  2. [2]

    $n \ll |\mathcal{X}|, |\mathcal{Y}|$

    The number of existing observations $n$ is much smaller than the cardinality of the spaces $\mathcal{X}$ and $\mathcal{Y}$: $n \ll |\mathcal{X}|, |\mathcal{Y}|$.

  3. [3]

    The number of new observations $m$ is much smaller than the number of existing observations: $m \ll n$. Theorem 1. Consider a well-regulated autonomous system defined in Section 3.1, which satisfies the conditions in Assumption 1. With probability at least $1 - \rho$, the change in MLE-based mutual information estimates satisfies: $\hat{I}_{n+m} - \hat{I}_n \in (\log(m + n) - \log n)$ ...

  4. [4]

    Stagnation in Exploration: A downward shift driven by a decrease in input entropy $\Delta H(x) < 0$ suggests the system repeatedly samples in a limited region, thus gathering redundant data with minimal new information

  5. [5]

    Practically, this often signifies increased external noise or a fundamental change in the underlying process

    Increased Noise or Process Drift: A downward shift could also result from increased conditional entropy $\Delta H(y \mid x) > 0$, indicating greater uncertainty in predicting y given x. Practically, this often signifies increased external noise or a fundamental change in the underlying process. Violation from Above: Sudden Growth in Understanding. If MIS > MIS+, thi...

  6. [6]

    Aggressive Exploration: If the increase is driven by higher input entropy $\Delta H(x) > 0$, the system is likely exploring previously unvisited regions aggressively, potentially inflating knowledge gains without sufficient validation

  7. [7]

    Reduction in Noise: An increase due to reduced conditional entropy $\Delta H(y \mid x) < 0$ signals a desirable decrease in uncertainty, thus generally representing a beneficial development

  8. [8]

    Novel Discovery: An increase in output entropy $\Delta H(y) > 0$ suggests discovery of novel and previously rare outputs, particularly valuable in exploratory or scientific contexts. Summary Table (columns: Violation Type; Possible Causes; Trend in Mutual Information). Violation from Below: stagnation in exploration, ↓ $H(x)$ ⇒ ↓ $I(x, y)$; increased noise / process drift, ↑ $H(y \mid x)$...

  9. [9]

    If $\Delta \hat{H}(x) > 0$ dominates MIS, indicating overly aggressive exploration, the system should moderate exploration and emphasize exploitation to prevent fitting to noise

    Sampling Adjustment. The first policy addresses variations in input entropy $H(x)$. If $\Delta \hat{H}(x) > 0$ dominates MIS, indicating overly aggressive exploration, the system should moderate exploration and emphasize exploitation to prevent fitting to noise. Conversely, if $\Delta \hat{H}(x) < 0$, suggesting redundant sampling, the system should enhance exploration to restore...

  10. [10]

    The second policy responds to variations in conditional entropy $H(y \mid x)$, i.e., changes in function mapping

    Process Forking. The second policy responds to variations in conditional entropy $H(y \mid x)$, i.e., changes in function mapping. Upon surprise triggered by $\Delta \hat{H}(y \mid x)$, the system forks into two subprocesses, each consisting of $n$ existing observations and $m$ new observations divided at the surprise moment (Theorem 1). The two subprocesses represent the p...

  11. [11]

    the extra resources spent on deciding the nature of an observation

    Coin Toss Resolution. There are occasions where changes in $\Delta \hat{H}(x)$ and $\Delta \hat{H}(y \mid x)$ are comparable, making selecting a reaction policy challenging. Instead of arbitrarily favoring the slightly larger change, we always use a biased coin toss approach, stochastically selecting which entropy to address based on the magnitude of changes: $p_{\text{adjust}} = |\Delta \hat{H}(x)|$...

  12. [12]

    SR: The surprise-reactive sampling method (14) switches between exploration and exploitation modes based on observed Shannon or Bayesian Surprise. By default, SR operates in an exploration mode guided by the widely used space-filling principle (53), selecting new sampling locations via the min-max objective: $x^{*} = \arg\max_{x} \min_{x_i \in X} \lVert x - x_i \rVert_2$, where...

  13. [13]

    SC/E: The subtractive clustering/entropy active learning strategy (51) selects the next sampling location by maximizing a custom acquisition function. For an unseen region $X$ and a probabilistic predictive function $\hat{f}(x)$ trained on the observed data, the acquisition function is defined as: $a(x) = (1 - \eta)\, \mathbb{E}_{x' \in X}\left[e^{-\lVert x - x' \rVert_2}\right] + \eta H(\hat{f}(x))$, where $\eta$ is the ...

  14. [14]

    GS/QBC: The greedy search/query by committee active learning strategy (52) uses a different acquisition function. Given the set of seen observations $\{X, Y\}$ and a model committee $F$ composed of multiple predictive models trained on this data, the acquisition function is defined as: $a(x) = (1 - \eta) \min_{x', y' \in X, Y} \lVert x - x' \rVert_2 \, \lVert \hat{f}(x) - y' \rVert_2 + \eta \max_{\hat{f}(\cdot),\, \hat{f}'(\ldots}$

  15. [15]

    in one $X$-category and one $Y$-category the counts change by ±1 (all other marginal counts are unchanged)

  16. [16]

    in one joint cell the count changes by −1 and in another joint cell the count changes by +1. Step 1. How much can one empirical Shannon entropy change? Assume a single observation is moved from category $A$ to category $B$. Let the counts before the move be $A = a$ (with $a \geq 1$) and $B = b$ (with $b \geq 0$). After the move the counts become $a - 1$ and $b + 1$. Only these ...

  17. [17]

    A mobile robotic chemist,

    B. Burger, P. M. Maffettone, V. V. Gusev, C. M. Aitchison, Y. Bai, X. Wang, X. Li, B. M. Alston, B. Li, R. Clowes, N. Rankin, B. Harris, R. S. Sprick, and A. I. Cooper, “A mobile robotic chemist,” Nature, vol. 583, pp. 237–241, 2020

  18. [18]

    Scaling deep learning for materials discovery,

    A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon, and E. D. Cubuk, “Scaling deep learning for materials discovery,” Nature, vol. 624, pp. 80–85, 2023

  19. [19]

    An autonomous laboratory for the accelerated synthesis of novel materials,

    N. J. Szymanski, B. Rendy, Y. Fei, R. E. Kumar, T. He, D. Milsted, M. J. McDermott, M. Gallant, E. D. Cubuk, A. Merchant, H. Kim, A. Jain, C. J. Bartel, K. Persson, Y. Zeng, and G. Ceder, “An autonomous laboratory for the accelerated synthesis of novel materials,” Nature, vol. 624, pp. 86–91, 2023

  20. [20]

    Autonomous mobile robots for exploratory synthetic chemistry,

    T. Dai, S. Vijayakrishnan, F. T. Szczypiński, J.-F. Ayme, E. Simaei, T. Fellowes, R. Clowes, L. Kotopanov, C. E. Shields, Z. Zhou, J. W. Ward, and A. I. Cooper, “Autonomous mobile robots for exploratory synthetic chemistry,” Nature, vol. 635, pp. 890–897, 2024

  21. [21]

    Towards fully autonomous driving: Systems and algorithms,

    J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Z. Kolter, D. Langer, O. Pink, V. Pratt, M. Sokolsky, G. Stanek, D. Stavens, A. Teichman, M. Werling, and S. Thrun, “Towards fully autonomous driving: Systems and algorithms,” in Proceedings of the 2011 IEEE Intelligent Vehicles Symposium, (Baden-Baden, Germany), June 2011

  22. [22]

    Self-driving laboratory for accelerated discovery of thin-film materials,

    B. P. MacLeod, F. G. Parlane, T. D. Morrissey, F. Häse, L. M. Roch, K. E. Dettelbach, R. Moreira, L. P. Yunker, M. B. Rooney, and J. R. Deeth, “Self-driving laboratory for accelerated discovery of thin-film materials,” Science Advances, vol. 6, no. 20, p. eaaz8867, 2020

  23. [23]

    A survey of autonomous driving: Common practices and emerging technologies,

    E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, “A survey of autonomous driving: Common practices and emerging technologies,” IEEE Access, vol. 8, pp. 58443–58469, 2020

  24. [24]

    Anomaly detection in autonomous driving: A survey,

    D. Bogdoll, M. Nitsche, and J. M. Zöllner, “Anomaly detection in autonomous driving: A survey,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (New Orleans, USA), June 2022

  25. [25]

    An autonomous manufacturing system based on swarm of cognitive agents,

    H.-S. Park and N.-H. Tran, “An autonomous manufacturing system based on swarm of cognitive agents,” Journal of Manufacturing Systems, vol. 31, no. 3, pp. 337–348, 2012

  26. [26]

    Towards resilience in industry 5.0: A decentralized autonomous manufacturing paradigm,

    J. Leng, Y. Zhong, Z. Lin, K. Xu, D. Mourtzis, X. Zhou, P. Zheng, Q. Liu, J. L. Zhao, and W. Shen, “Towards resilience in industry 5.0: A decentralized autonomous manufacturing paradigm,” Journal of Manufacturing Systems, vol. 71, pp. 95–114, 2023

  27. [27]

    High-tech defense industries: Developing autonomous intelligent systems,

    J. Reis, Y. Cohen, N. Melão, J. Costa, and D. Jorge, “High-tech defense industries: Developing autonomous intelligent systems,” Applied Sciences, vol. 11, no. 11, p. 4920, 2021

  28. [28]

    Autonomy in materials research: A case study in carbon nanotube growth,

    P. Nikolaev, D. Hooper, F. Webber, R. Rao, K. Decker, M. Krein, J. Poleski, R. Barto, and B. Maruyama, “Autonomy in materials research: A case study in carbon nanotube growth,” NPJ Computational Materials, vol. 2, p. 16031, 2016

  29. [29]

    Efficient closed-loop maximization of carbon nanotube growth rate using Bayesian optimization,

    J. Chang, P. Nikolaev, J. Carpena-Núñez, R. Rao, K. Decker, A. E. Islam, J. Kim, M. A. Pitt, J. I. Myung, and B. Maruyama, “Efficient closed-loop maximization of carbon nanotube growth rate using Bayesian optimization,” Scientific Reports, vol. 10, p. 9040, 2020

  30. [30]

    Toward futuristic autonomous experimentation—a surprise-reacting sequential experiment policy,

    I. Ahmed, S. T. Bukkapatnam, B. Botcha, and Y. Ding, “Toward futuristic autonomous experimentation—a surprise-reacting sequential experiment policy,” IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 7912–7926, 2025

  31. [31]

    Continuous anomaly detection in satellite image time series based on z-scores of season-trend model residuals,

    Z.-G. Zhou and P. Tang, “Continuous anomaly detection in satellite image time series based on z-scores of season-trend model residuals,” in Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium, (Beijing, China), July 2016

  32. [32]

    Active hypothesis testing for anomaly detection,

    K. Cohen and Q. Zhao, “Active hypothesis testing for anomaly detection,” IEEE Transactions on Information Theory, vol. 61, no. 3, pp. 1432–1450, 2015

  33. [33]

    Null hypothesis test for anomaly detection,

    J. F. Kamenik and M. Szewc, “Null hypothesis test for anomaly detection,” Physics Letters B, vol. 840, p. 137836, 2023

  34. [34]

    A survey of distance and similarity measures used within network intrusion anomaly detection,

    D. J. Weller-Fahy, B. J. Borghetti, and A. A. Sodemann, “A survey of distance and similarity measures used within network intrusion anomaly detection,” IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 70–91, 2014

  35. [35]

    Artificial immune system via Euclidean distance minimization for anomaly detection in bearings,

    L. Montechiesi, M. Cocconcelli, and R. Rubini, “Artificial immune system via Euclidean distance minimization for anomaly detection in bearings,” Mechanical Systems and Signal Processing, vol. 76, pp. 380–393, 2016

  36. [36]

    Online anomaly detection for hard disk drives based on Mahalanobis distance,

    Y. Wang, Q. Miao, E. W. Ma, K.-L. Tsui, and M. G. Pecht, “Online anomaly detection for hard disk drives based on Mahalanobis distance,” IEEE Transactions on Reliability, vol. 62, no. 1, pp. 136–145, 2013

  37. [37]

    Mahalanobis distance based adversarial network for anomaly detection,

    Y. Hou, Z. Chen, M. Wu, C.-S. Foo, X. Li, and R. M. Shubair, “Mahalanobis distance based adversarial network for anomaly detection,” in Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, (Virtual), May 2020

  38. [38]

    F-anoGAN: Fast unsupervised anomaly detection with generative adversarial networks,

    T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth, “F-anoGAN: Fast unsupervised anomaly detection with generative adversarial networks,” Medical Image Analysis, vol. 54, pp. 30–44, 2019

  39. [39]

    Anomaly detection and correction of optimizing autonomous systems with inverse reinforcement learning,

    B. Lian, Y. Kartal, F. L. Lewis, D. G. Mikulski, G. R. Hudas, Y. Wan, and A. Davoudi, “Anomaly detection and correction of optimizing autonomous systems with inverse reinforcement learning,” IEEE Transactions on Cybernetics, vol. 53, no. 7, pp. 4555–4566, 2022

  40. [40]

    Novelty or surprise?,

    A. Barto, M. Mirolli, and G. Baldassarre, “Novelty or surprise?,” Frontiers in Psychology, vol. 4, p. 907, 2013

  41. [41]

    Bayesian surprise attracts human attention,

    L. Itti and P. Baldi, “Bayesian surprise attracts human attention,” Vision Research, vol. 49, no. 10, pp. 1295–1306, 2009

  42. [42]

    Learning in volatile environments with the Bayes factor surprise,

    V. Liakoni, A. Modirshanechi, W. Gerstner, and J. Brea, “Learning in volatile environments with the Bayes factor surprise,” Neural Computation, vol. 33, no. 2, pp. 269–340, 2021

  43. [43]

    Balancing new against old information: The role of puzzlement surprise in learning,

    M. Faraji, K. Preuschoff, and W. Gerstner, “Balancing new against old information: The role of puzzlement surprise in learning,” Neural Computation, vol. 30, no. 1, pp. 34–83, 2018

  44. [44]

    Anomaly detection for autonomous guided vehicles using Bayesian surprise,

    O. Çatal, S. Leroux, C. De Boom, T. Verbelen, and B. Dhoedt, “Anomaly detection for autonomous guided vehicles using Bayesian surprise,” in Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, (Las Vegas, USA), October 2020

  45. [45]

    A Bayesian surprise approach in designing cognitive radar for autonomous driving,

    Y. Zamiri-Jafarian and K. N. Plataniotis, “A Bayesian surprise approach in designing cognitive radar for autonomous driving,” Entropy, vol. 24, no. 5, p. 672, 2022

  46. [46]

    Measuring surprise in the wild,

    A. Dinparastdjadid, I. Supeene, and J. Engstrom, “Measuring surprise in the wild,” arXiv preprint arXiv:2305.07733, 2023

  47. [47]

    An augmented surprise-guided sequential learning framework for predicting the melt pool geometry,

    A. S. Raihan, H. Khosravi, T. H. Bhuiyan, and I. Ahmed, “An augmented surprise-guided sequential learning framework for predicting the melt pool geometry,” Journal of Manufacturing Systems, vol. 75, pp. 56–77, 2024

  48. [48]

    Autonomous experimentation systems and benefit of surprise-based Bayesian optimization,

    S. Jin, J. R. Deneault, B. Maruyama, and Y. Ding, “Autonomous experimentation systems and benefit of surprise-based Bayesian optimization,” in Proceedings of the 2022 International Symposium on Flexible Automation, (Yokohama, Japan), July 2022

  49. [49]

    A taxonomy of surprise definitions,

    A. Modirshanechi, J. Brea, and W. Gerstner, “A taxonomy of surprise definitions,” Journal of Mathematical Psychology, vol. 110, p. 102712, 2022

  50. [50]

    A computational theory of surprise,

    P. Baldi, “A computational theory of surprise,” in Information, Coding and Mathematics: Proceedings of Workshop Honoring Prof. Bob Mceliece on his 60th Birthday, pp. 1–25, 2002

  51. [51]

    Human inference in changing environments with temporal structure,

    A. Prat-Carrabin, R. C. Wilson, J. D. Cohen, and R. Azeredo da Silveira, “Human inference in changing environments with temporal structure,” Psychological Review, vol. 128, no. 5, pp. 879–912, 2021

  52. [52]

    Alternatives to the median absolute deviation,

    P. J. Rousseeuw and C. Croux, “Alternatives to the median absolute deviation,” Journal of the American Statistical Association, vol. 88, no. 424, pp. 1273–1283, 1993

  53. [53]

    Clustering and unsupervised anomaly detection with l-2 normalized deep auto-encoder representations,

    C. Aytekin, X. Ni, F. Cricri, and E. Aksu, “Clustering and unsupervised anomaly detection with l-2 normalized deep auto-encoder representations,” in Proceedings of the 2018 International Joint Conference on Neural Networks, (Rio de Janeiro, Brazil), October 2018

  54. [54]

    Anomaly detection with multiple-hypotheses predictions,

    D. T. Nguyen, Z. Lou, M. Klar, and T. Brox, “Anomaly detection with multiple-hypotheses predictions,” in Proceedings of the 36th International Conference on Machine Learning, (Long Beach, USA), June 2019

  55. [55]

    A computational analysis of the neural bases of Bayesian inference,

    A. Kolossa, B. Kopp, and T. Fingscheidt, “A computational analysis of the neural bases of Bayesian inference,” Neuroimage, vol. 106, pp. 222–237, 2015

  56. [56]

    A mathematical theory of communication,

    C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948

  57. [57]

    Estimation of entropy and mutual information,

    L. Paninski, “Estimation of entropy and mutual information,” Neural Computation, vol. 15, no. 6, pp. 1191–1253, 2003

  58. [58]

    The permutation test for feature selection by mutual information,

    D. François, V. Wertz, and M. Verleysen, “The permutation test for feature selection by mutual information,” in Proceedings of the 14th European Symposium on Artificial Neural Networks, (Bruges, Belgium), April 2006

  59. [59]

    Mutual information-based feature selection for multilabel classification,

    G. Doquire and M. Verleysen, “Mutual information-based feature selection for multilabel classification,” Neurocomputing, vol. 122, pp. 148–155, 2013

  60. [60]

    T. M. Cover, Elements of Information Theory. John Wiley & Sons, 1999

  61. [61]

    Exploration vs. exploitation in active learning: A Bayesian approach,

    A. Bondu, V. Lemaire, and M. Boullé, “Exploration vs. exploitation in active learning: A Bayesian approach,” in Proceedings of the 2010 International Joint Conference on Neural Networks, (Barcelona, Spain), July 2010

  62. [62]

    A unifying view on dataset shift in classification,

    J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, “A unifying view on dataset shift in classification,” Pattern Recognition, vol. 45, no. 1, pp. 521–530, 2012

  63. [63]

    Covariate shift adaptation by importance weighted cross validation,

    M. Sugiyama, M. Krauledat, and K.-R. Müller, “Covariate shift adaptation by importance weighted cross validation,” Journal of Machine Learning Research, vol. 8, no. 5, pp. 985–1005, 2007

  64. [64]

    Discriminative learning under covariate shift,

    S. Bickel, M. Brückner, and T. Scheffer, “Discriminative learning under covariate shift,” Journal of Machine Learning Research, vol. 10, no. 9, pp. 2137–2155, 2009

  65. [65]

    An overview of concept drift applications,

    I. Žliobaitė, M. Pechenizkiy, and J. Gama, “An overview of concept drift applications,” Big Data Analysis: New Algorithms for a New Society, vol. 16, pp. 91–114, 2016

  66. [66]

    Concept drift monitoring and diagnostics of supervised learning models via score vectors,

    K. Zhang, A. T. Bui, and D. W. Apley, “Concept drift monitoring and diagnostics of supervised learning models via score vectors,” Technometrics, vol. 65, no. 2, pp. 137–149, 2023

  67. [67]

    Active learning for object classification: From exploration to exploitation,

    N. Cebron and M. R. Berthold, “Active learning for object classification: From exploration to exploitation,” Data Mining and Knowledge Discovery, vol. 18, pp. 283–299, 2009

  68. [68]

    Dynamic exploration–exploitation trade-off in active learning regression with Bayesian hierarchical modeling,

    U. J. Islam, K. Paynabar, G. Runger, and A. S. Iquebal, “Dynamic exploration–exploitation trade-off in active learning regression with Bayesian hierarchical modeling,” IISE Transactions, vol. 57, no. 4, pp. 393–407, 2025

  69. [69]

    Space-filling designs for computer experiments: A review,

    V. R. Joseph, “Space-filling designs for computer experiments: A review,” Quality Engineering, vol. 28, no. 1, pp. 28–35, 2016

  70. [70]

    Generalization errors and learning curves for regression with multi-task Gaussian processes,

    K. Chai, “Generalization errors and learning curves for regression with multi-task Gaussian processes,” in Proceedings of the 23rd Advances in Neural Information Processing Systems, (Vancouver, Canada), December 2009

  71. [71]

    Entropy and information in neural spike trains,

    S. P. Strong, R. Koberle, R. R. D. R. Van Steveninck, and W. Bialek, “Entropy and information in neural spike trains,” Physical Review Letters, vol. 80, p. 197, 1998

  72. [72]

    On the method of bounded differences,

    C. McDiarmid, “On the method of bounded differences,” Surveys in Combinatorics, vol. 141, no. 1, pp. 148–188, 1989