pith. machine review for the scientific record.

arxiv: 2604.01021 · v2 · submitted 2026-04-01 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Transfer learning for nonparametric Bayesian networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:26 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords transfer learning · nonparametric Bayesian networks · structure learning · negative transfer · kernel density estimation · PC-stable algorithm · hill climbing · scarce data

The pith

Two transfer learning algorithms improve nonparametric Bayesian network estimation from limited data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PCS-TL, a constraint-based method, and HC-TL, a score-based method, to learn nonparametric Bayesian networks when only scarce target data is available. It adds specific metrics to detect and block negative transfer, where borrowing from a source domain would degrade performance, and uses log-linear pooling to combine parameters across domains. Evaluation on synthetic networks of varying sizes and UCI repository datasets, with added noise to simulate mismatch, shows these methods outperform learning from the target data alone. A Friedman test with Bergmann-Hommel post-hoc analysis supplies statistical evidence of the improvement. In practical terms this means models can be deployed faster in settings where collecting large amounts of domain-specific data is costly.
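The statistical step mentioned above is standard and easy to reproduce in outline. A minimal sketch with SciPy, using made-up per-dataset scores rather than the paper's numbers; the Bergmann-Hommel post-hoc correction needs a specialised package and is omitted here:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracy scores (illustrative only, not from the paper):
# rows = benchmark datasets, columns = competing algorithms.
scores = np.array([
    # PCS-TL  HC-TL  PC-stable  HC
    [0.81,   0.84,  0.72,      0.74],
    [0.78,   0.80,  0.69,      0.71],
    [0.85,   0.83,  0.75,      0.73],
    [0.77,   0.82,  0.70,      0.72],
    [0.80,   0.81,  0.68,      0.70],
])

# Friedman test: do the algorithms' within-dataset ranks differ
# beyond what chance would produce?
stat, p_value = friedmanchisquare(*scores.T)
```

A significant Friedman statistic only says the group of algorithms differs; the Bergmann-Hommel step then decides which pairwise differences survive multiple-comparison correction.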

Core claim

PCS-TL and HC-TL are reliable transfer learning procedures for nonparametric Bayesian networks: they raise structure-learning and parameter accuracy under scarce target data by selectively importing information from related source datasets, while dedicated metrics guard against negative transfer. Log-linear pooling combines the parameters across domains, and the gains are confirmed on both synthetic networks and real UCI data via statistical testing.

What carries the argument

PCS-TL (PC-stable transfer learning) and HC-TL (hill-climbing transfer learning) algorithms that embed negative-transfer detection metrics and apply log-linear pooling to parameter estimates.
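Log-linear pooling itself is a small, concrete operation: the pooled density is proportional to a weighted geometric mean of the per-domain densities. A minimal sketch with SciPy KDEs, assuming convex weights (the paper's weight-selection rule is not given in the abstract, so the weights here are arbitrary):

```python
import numpy as np
from scipy.stats import gaussian_kde

def log_linear_pool(log_densities, weights):
    """Log-linear (geometric) pooling: sum_i w_i * log p_i(x).

    The pooled density is proportional to prod_i p_i(x)^{w_i}; the
    normalising constant is omitted, which suffices for comparing
    candidate parameterisations at the same evaluation points.
    """
    log_densities = np.asarray(log_densities, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # force convex weights
    return weights @ log_densities

# Toy setting: pool a target-domain KDE (few samples) with a
# source-domain KDE (many samples) on a grid of evaluation points.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=30)      # scarce target data
source = rng.normal(0.3, 1.0, size=3000)    # related source data

kde_t = gaussian_kde(target)
kde_s = gaussian_kde(source)

x = np.linspace(-3, 3, 7)
pooled = log_linear_pool([kde_t.logpdf(x), kde_s.logpdf(x)],
                         weights=[0.4, 0.6])
```

Because the pool is a convex combination in log space, the pooled log-density always lies between the per-domain log-densities, so a weight near zero on the source recovers target-only learning.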

If this is right

  • Structure and parameter estimates for kernel-density-estimation Bayesian networks become more accurate than standard learning when target samples are few.
  • Negative-transfer metrics succeed in protecting performance when source and target distributions differ.
  • Statistical tests confirm the methods outperform non-transfer baselines across multiple dataset sizes and noise levels.
  • Deployment time for such networks in data-scarce industrial settings is reduced because less target data needs to be collected.
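For context, a kernel-density-estimation Bayesian network factorises the joint density into KDE-estimated conditionals along a DAG. A toy three-node sketch using SciPy's gaussian_kde, where each conditional is a ratio of a joint KDE to a marginal KDE (the paper's estimator, built on semiparametric BNs and PyBNesian, may differ in detail):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy data for the DAG a -> b, a -> c (b and c independent given a).
rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 500)
b = 0.5 * a + rng.normal(0.0, 1.0, 500)
c = -0.3 * a + rng.normal(0.0, 1.0, 500)

kde_a = gaussian_kde(a)                      # p(a)
kde_ab = gaussian_kde(np.vstack([a, b]))     # p(a, b)
kde_ac = gaussian_kde(np.vstack([a, c]))     # p(a, c)

def log_density(va, vb, vc):
    """log p(a,b,c) = log p(a) + log p(b|a) + log p(c|a), with each
    conditional formed as joint KDE minus marginal KDE in log space."""
    la = kde_a.logpdf([va])[0]
    lab = kde_ab.logpdf([[va], [vb]])[0]     # log p(a, b)
    lac = kde_ac.logpdf([[va], [vc]])[0]     # log p(a, c)
    return la + (lab - la) + (lac - la)
```

Structure learning decides which joint KDEs to build (i.e., which parent sets appear); scarce target data makes both that choice and the KDE bandwidths fragile, which is the gap the transfer methods aim to fill.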

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same negative-transfer safeguards could be adapted to other nonparametric density estimators beyond Bayesian networks.
  • If source datasets arrive incrementally, the pooling step could be updated online without restarting the structure search.
  • The approach may shorten model-building cycles in any domain where related but not identical data sources are easier to obtain than perfectly matched data.

Load-bearing premise

Suitable related source datasets exist and the proposed negative-transfer metrics can reliably detect and block harmful transfers without adding new biases.
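The paper's metrics are not spelled out in the abstract, but the kind of guard they imply can be sketched generically: accept a transfer only if the pooled model scores held-out target data at least as well as the target-only model. A hypothetical illustration (the function name, pooling weight, and acceptance rule are our own, not the paper's):

```python
import numpy as np
from scipy.stats import gaussian_kde

def transfer_helps(target_train, target_holdout, source, w_source=0.5):
    """Generic negative-transfer guard (illustrative only).

    Compares held-out target log-likelihood of a target-only KDE
    against a log-linear pool of target and source KDEs; transfer is
    accepted only when pooling does not degrade the score.
    """
    kde_t = gaussian_kde(target_train)
    kde_s = gaussian_kde(source)
    ll_target_only = kde_t.logpdf(target_holdout).sum()
    ll_pooled = ((1 - w_source) * kde_t.logpdf(target_holdout)
                 + w_source * kde_s.logpdf(target_holdout)).sum()
    return ll_pooled >= ll_target_only

rng = np.random.default_rng(1)
target = rng.normal(0.0, 1.0, 200)
related = rng.normal(0.1, 1.0, 2000)      # near-matching source
unrelated = rng.normal(6.0, 0.5, 2000)    # badly mismatched source

train, holdout = target[:40], target[40:]
```

A badly mismatched source drags the pooled score down sharply, so the guard rejects it; whether the paper's actual metrics behave this way under subtler mismatch is exactly the referee's open question below.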

What would settle it

On fresh scarce-data problems where the chosen sources are unrelated, either PCS-TL or HC-TL produces lower accuracy than learning from the target data alone or the metrics fail to flag the mismatch.

Figures

Figures reproduced from arXiv:2604.01021 by Concha Bielza, Pedro Larrañaga, Rafael Sojo.

Figure 1
Figure 1: Flowchart of PCS-TL (in orange) and HC-TL (in blue). In green, functions shared by both processes. [PITH_FULL_IMAGE:figures/full_fig_p006_1.png]
Figure 2
Figure 2: Structures of the synthetic SPBNs [56].
  Model             Nodes   Arcs   Max indegree
  Synthetic SPBN 1    7      10        3
  Synthetic SPBN 2   13      21        5
  Synthetic SPBN 3    8       7        1
  Synthetic SPBN 4   15      14        1
[PITH_FULL_IMAGE:figures/full_fig_p009_2.png]
Figure 3
Figure 3: Results for the synthetic SPBNs with two auxiliary source domains. 0% and 10% of arc modification. [PITH_FULL_IMAGE:figures/full_fig_p011_3.png]
Figure 4
Figure 4: Results for the synthetic SPBNs with three auxiliary source domains. 5%, 10% and 20% of arc modification. [PITH_FULL_IMAGE:figures/full_fig_p012_4.png]
Figure 5
Figure 5: Results for the bnlearn networks with two auxiliary source domains. 0% and 10% of arc modification. [PITH_FULL_IMAGE:figures/full_fig_p013_5.png]
Figure 6
Figure 6: Results for the bnlearn networks with three auxiliary source domains. 5%, 10% and 20% of arc modification. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png]
Figure 7
Figure 7: Results for the UCI datasets with two auxiliary source domains. 0% and 10% of arc modification. [PITH_FULL_IMAGE:figures/full_fig_p015_7.png]
Figure 8
Figure 8: Results for the UCI datasets with three auxiliary source domains. 5%, 10% and 20% of arc modification. [PITH_FULL_IMAGE:figures/full_fig_p016_8.png]
Figure 9
Figure 9: Average UCI structures for the HC and HC-TL algorithms with 25 target instances. Results for three auxiliary source domains. [PITH_FULL_IMAGE:figures/full_fig_p017_9.png]
Figure 11
Figure 11: Critical difference diagram for the network's DHD results. Evaluation for less than 525 target instances. [PITH_FULL_IMAGE:figures/full_fig_p019_11.png]
Figure 12
Figure 12: Significance heatmap with p-values for the network's DHD results. Evaluation for less than 525 target instances. [PITH_FULL_IMAGE:figures/full_fig_p019_12.png]
Figure 13
Figure 13: Critical difference diagram for the network's log-likelihood results. Evaluation for less than 525 target instances. [PITH_FULL_IMAGE:figures/full_fig_p020_13.png]
Figure 14
Figure 14: Significance heatmap with p-values for the network's log-likelihood results. Evaluation for less than 525 target instances. [PITH_FULL_IMAGE:figures/full_fig_p020_14.png]
read the original abstract

This paper introduces two transfer learning methodologies for estimating nonparametric Bayesian networks under scarce data. We propose two algorithms, a constraint-based structure learning method, called PC-stable-transfer learning (PCS-TL), and a score-based method, called hill climbing transfer learning (HC-TL). We also define particular metrics to tackle the negative transfer problem in each of them, a situation in which transfer learning has a negative impact on the model's performance. Then, for the parameters, we propose a log-linear pooling approach. For the evaluation, we learn kernel density estimation Bayesian networks, a type of nonparametric Bayesian network, and compare their transfer learning performance with the models alone. To do so, we sample data from small, medium and large-sized synthetic networks and datasets from the UCI Machine Learning repository. Then, we add noise and modifications to these datasets to test their ability to avoid negative transfer. To conclude, we perform a Friedman test with a Bergmann-Hommel post-hoc analysis to show statistical proof of the enhanced experimental behavior of our methods. Thus, PCS-TL and HC-TL demonstrate to be reliable algorithms for improving the learning performance of a nonparametric Bayesian network with scarce data, which in real industrial environments implies a reduction in the required time to deploy the network.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces PCS-TL, a constraint-based structure learning method, and HC-TL, a score-based method, for transfer learning in nonparametric Bayesian networks with scarce data. It defines metrics to address negative transfer, employs log-linear pooling for parameter estimation, and evaluates the approaches on synthetic networks of varying sizes and UCI datasets by adding noise and modifications, demonstrating statistical improvements via Friedman tests with Bergmann-Hommel post-hoc analysis.

Significance. If the proposed negative-transfer metrics are shown to be robust, the work could facilitate more efficient deployment of Bayesian network models in data-scarce industrial applications by leveraging related source datasets. The inclusion of statistical hypothesis testing provides a solid empirical foundation for the performance claims.

major comments (3)
  1. [Methods] PCS-TL and HC-TL definitions: The exact mathematical definitions and thresholds of the negative-transfer metrics are not specified. This is load-bearing for the central reliability claim, as the abstract states these metrics are used to avoid negative transfer; without explicit forms, it is impossible to assess whether they generalize beyond the tested noise additions or introduce new biases.
  2. [Evaluation] No ablation results are reported to separate the effect of the negative-transfer metrics from the log-linear pooling step or the base PC-stable/HC algorithms. The performance claims rest on high-level summaries of Friedman/Bergmann-Hommel tests without error bars or detailed cases of prevented negative transfer, weakening the assertion that the methods reliably improve learning under scarce data.
  3. [Experimental setup] The specific perturbations ('noise and modifications') applied to the synthetic networks and UCI datasets are not detailed (e.g., whether they affect higher-order moments or conditional independencies). This leaves open whether the metrics detect harmful transfers only under the tested conditions or more broadly, directly impacting the industrial deployment-time reduction claim.
minor comments (2)
  1. [Abstract] The abstract refers to 'particular metrics' without naming or briefly describing them; expanding this would improve clarity for readers.
  2. [Throughout] Implementation details are missing, such as code availability, exact hyperparameter settings for the kernel density estimation, and the precise form of the log-linear pooling weights; adding these would strengthen reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's detailed feedback, which highlights areas for improvement in clarity and empirical validation. We will make the suggested revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Methods] PCS-TL and HC-TL definitions: The exact mathematical definitions and thresholds of the negative-transfer metrics are not specified. This is load-bearing for the central reliability claim, as the abstract states these metrics are used to avoid negative transfer; without explicit forms, it is impossible to assess whether they generalize beyond the tested noise additions or introduce new biases.

    Authors: We thank the referee for pointing this out. Upon review, the definitions of the negative-transfer metrics for PCS-TL and HC-TL were described in prose but lacked explicit mathematical formulations and specific threshold values. In the revised version, we will include the precise equations for these metrics, such as the condition for detecting negative transfer based on performance degradation, and specify the thresholds (e.g., a 5% drop in accuracy or similar). This will allow better assessment of their robustness. revision: yes

  2. Referee: [Evaluation] No ablation results are reported to separate the effect of the negative-transfer metrics from the log-linear pooling step or the base PC-stable/HC algorithms. The performance claims rest on high-level summaries of Friedman/Bergmann-Hommel tests without error bars or detailed cases of prevented negative transfer, weakening the assertion that the methods reliably improve learning under scarce data.

    Authors: We acknowledge that no explicit ablation studies were presented to isolate the contributions of the negative-transfer metrics versus the log-linear pooling and the base algorithms. To address this, we will add ablation experiments in the revised manuscript, comparing variants with and without the metrics, to demonstrate their individual impacts. Additionally, we will include error bars in the performance summaries and detail specific cases where negative transfer was prevented. revision: yes

  3. Referee: [Experimental setup] The specific perturbations ('noise and modifications') applied to the synthetic networks and UCI datasets are not detailed (e.g., whether they affect higher-order moments or conditional independencies). This leaves open whether the metrics detect harmful transfers only under the tested conditions or more broadly, directly impacting the industrial deployment-time reduction claim.

    Authors: We agree that the specific perturbations applied to the datasets were not detailed sufficiently. In the revision, we will expand the experimental setup section to describe the exact noise additions (e.g., Gaussian noise with specific variances) and modifications (e.g., altering conditional probability tables or removing edges), including how they impact higher-order moments and conditional independencies. This will clarify the conditions under which the metrics operate. revision: yes

Circularity Check

0 steps flagged

Methods defined independently of evaluation data; central claims do not reduce to fitted inputs or self-citation chains

full rationale

The paper defines PCS-TL and HC-TL algorithms, negative-transfer metrics, and log-linear pooling explicitly in terms of structure learning and parameter estimation steps that operate on source and target datasets. Evaluation uses external UCI repository datasets and synthetic networks with added noise/modifications, followed by Friedman/Bergmann-Hommel tests. No equation or definition in the derivation chain equates a reported performance gain to a quantity fitted on the same validation data, nor does any load-bearing premise rest solely on prior self-citation without independent content. This yields only a minor self-citation score with no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard Bayesian network assumptions (the Markov condition and, for constraint-based learning, faithfulness) and introduces new algorithmic components; no new free parameters or invented entities are explicitly stated in the abstract.

axioms (1)
  • domain assumption Markov condition and faithfulness assumptions standard to constraint-based and score-based Bayesian network structure learning.
    Implicit foundation for both PCS-TL and HC-TL methods.

pith-pipeline@v0.9.0 · 5520 in / 1247 out tokens · 35057 ms · 2026-05-13T22:26:32.909834+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages

  1. [1] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22:1345–1359, 2010.
  2. [2] Wen Zhang, Lingfei Deng, Lei Zhang, and Dongrui Wu. A survey on negative transfer. IEEE/CAA Journal of Automatica Sinica, 10(2):305–329, 2023.
  3. [3] Wouter M. Kouw and Marco Loog. A review of domain adaptation without target labels. IEEE Transactions on Pattern Analysis & Machine Intelligence, 43(03):766–785, 2021.
  4. [4] Maryam Azarkesht and Fatemeh Afsari. Instance reweighting and dynamic distribution alignment for domain adaptation. Journal of Ambient Intelligence and Humanized Computing, 13(10):4967–4987, 2022.
  5. [5] Aurick Zhou and Sergey Levine. Bayesian adaptation for covariate shift. In Advances in Neural Information Processing Systems, volume 34, pages 914–927. Curran Associates, Inc., 2021.
  6. [6] Alexandru Niculescu-Mizil and Rich Caruana. Inductive transfer for Bayesian network structure learning. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, volume 27 of Proceedings of Machine Learning Research, pages 167–180. PMLR, 2012.
  7. [7] Diane Oyen and Terran Lane. Transfer learning for Bayesian discovery of multiple Bayesian networks. Knowledge and Information Systems, 43(1):1–28, 2015.
  8. [8] Sarah Benikhlef, Philippe Leray, Guillaume Raschia, Montassar Ben Messaoud, and Fayrouz Sakly. Multi-task transfer learning for Bayesian network structures. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 217–228. Springer, 2021.
  9. [9] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.
  10. [10] Peter Spirtes and Clark Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9:62–72, 1991.
  11. [11] Peter Spirtes, Clark Glymour, and Richard Scheines. Causality from probability. Technical report, Department of Philosophy, Carnegie Mellon University, 1989.
  12. [12] Diego Colombo and Marloes H. Maathuis. Order-independent constraint-based causal structure learning. Journal of Machine Learning Research, 15(116):3921–3962, 2014.
  13. [13] Gregory F. Cooper and Edward Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.
  14. [14] Remco R. Bouckaert. Properties of Bayesian belief network learning algorithms. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 102–109, 1994.
  15. [15] Fred Glover and Manuel Laguna. Tabu Search. John Wiley & Sons, 1997.
  16. [16] Neville Kenneth Kitson, Anthony C. Constantinou, Zhigao Guo, Yang Liu, and Kiattikun Chobtham. A survey of Bayesian network structure learning. Artificial Intelligence Review, 56(8):8721–8814, 2023.
  17. [17] Nir Friedman and Zohar Yakhini. On the sample complexity of learning Bayesian networks. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996), pages 274–282, 1996.
  18. [18] Sanjoy Dasgupta. The sample complexity of learning fixed-structure Bayesian networks. Machine Learning, 29(2):165–180, 1997.
  19. [19] Mayank Mishra, Paulo B. Lourenço, and G.V. Ramana. Structural health monitoring of civil engineering structures by using the internet of things: A review. Journal of Building Engineering, 48:103954, 2022.
  20. [20] Mohd Javaid, Abid Haleem, Ravi Pratap Singh, Rajiv Suman, and Shanay Rab. Significance of machine learning in healthcare: Features, pillars and applications. International Journal of Intelligent Networks, 3:58–73, 2022.
  21. [21] Roger Luis, Luis Sucar, and Eduardo Morales. Inductive transfer for learning Bayesian networks. Machine Learning, 79:227–255, 2010.
  22. [22] M. Stone. The opinion pool. The Annals of Mathematical Statistics, 32(4):1339–1342, 1961.
  23. [23] Lindsey J. Fiedler, L. Enrique Sucar, and Eduardo F. Morales. Transfer learning for temporal nodes Bayesian networks. Applied Intelligence, 43(3):578–597, 2015.
  24. [24] Hao Yan, Shiji Song, Fuli Wang, Dakuo He, and Jianjun Zhao. Operational adjustment modeling approach based on Bayesian network transfer learning for new flotation process under scarce data. Journal of Process Control, 128, 2023.
  25. [25] Hao Yan, Xinchun Jia, Kang Li, and Fuli Wang. A Bayesian network method using transfer learning for solving small data problems in abnormal condition diagnosis of fused magnesia smelting process. Control Engineering Practice, 147, 2024.
  26. [26] Ping Yuan, Yufeng Sun, Hui Li, Fuli Wang, and Hongru Li. Abnormal condition identification modeling method based on Bayesian network parameters transfer learning for the electro-fused magnesia smelting process. IEEE Access, 7:149764–149775, 2019.
  27. [27] Yongyan Hou, Ao Yang, Wenqiang Guo, Enrang Zheng, Qinkun Xiao, Zhigao Guo, and Zixuan Huang. Bearing fault diagnosis under small data set condition: A Bayesian network method with transfer learning for parameter estimation. IEEE Access, 10:35768–35783, 2022.
  28. [28] Alireza Karbalayghareh, Xiaoning Qian, and Edward R. Dougherty. Optimal Bayesian transfer learning. IEEE Transactions on Signal Processing, 66(14):3724–3739, 2018.
  29. [29] Yun Zhou, Timothy M. Hospedales, and Norman Fenton. When and where to transfer for Bayesian network parameter learning. Expert Systems with Applications, 55:361–373, 2016.
  30. [30] Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, and Jingge Zhu. A Bayesian approach to (online) transfer learning: Theory and algorithms. Artificial Intelligence, 324, 2023.
  31. [31] Thomas S. Ferguson. A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1(2):209–230, 1973.
  32. [32] Milan Papež and Anthony Quinn. Transferring model structure in Bayesian transfer learning for Gaussian process regression. Knowledge-Based Systems, 251:108875, 2022.
  33. [33] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
  34. [34] Omar Alotaibi and Antonia Papandreou-Suppappola. Bayesian nonparametric learning and knowledge transfer for object tracking under unknown time-varying conditions. Frontiers in Signal Processing, 2:868638, 2022.
  35. [35] Kai Wang, Jian Li, and Fugee Tsung. Distribution inference from early-stage stationary data streams by transfer learning. IISE Transactions, pages 1–25, 2021.
  36. [36] Lingquan Zeng, Junhua Zheng, Le Yao, and Zhiqiang Ge. Dynamic Bayesian networks for feature learning and transfer applications in remaining useful life estimation. IEEE Transactions on Instrumentation and Measurement, 72:1–12, 2023.
  37. [37] Jia-Qi Chen, Yu-Lin He, Ying-Chao Cheng, Philippe Fournier-Viger, and Joshua Zhexue Huang. A multiple kernel-based kernel density estimator for multimodal probability density functions. Engineering Applications of Artificial Intelligence, 132:107979, 2024.
  38. [38] David W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, Inc., 2015.
  39. [39] Gunther Koliander, Yousef El-Laham, Petar M. Djuric, and Franz Hlawatsch. Fusion of probability density functions. Proceedings of the IEEE, 110(4):404–453, 2022.
  40. [40] Eric V. Strobl, Kun Zhang, and Shyam Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1):20180017, 2019.
  41. [41] Reimar Hofmann and Volker Tresp. Discovering structure in continuous variables using Bayesian networks. Advances in Neural Information Processing Systems, 8:501–507, 1995.
  42. [42] Christian Genest. A characterization theorem for externally Bayesian groups. The Annals of Statistics, 12(3):1100–1105, 1984.
  43. [43] M. P. Wand. Error analysis for general multivariate kernel estimators. Journal of Nonparametric Statistics, 2(1):1–15, 1992.
  44. [44] Joseph Ramsey, Peter Spirtes, and Jiji Zhang. Adjacency-faithfulness and conservative causal inference. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006), pages 401–408, 2006.
  45. [45] Christopher Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 403–410, 1994.
  46. [46] D. Dor and M. Tarsi. A simple algorithm to construct a consistent extension of a partially oriented graph. Technical Report R-185, UCLA, Cognitive Systems Laboratory, 1992.
  47. [47] David Atienza, Concha Bielza, and Pedro Larrañaga. Semiparametric Bayesian networks. Information Sciences, 584:564–582, 2022.
  48. [48] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20 (NIPS 2007), pages 1177–1184, 2007.
  49. [49] Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pages 804–813. AUAI Press, 2011.
  50. [50] Dheeru Dua and Casey Graff. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml, 2017.
  51. [51] Salvador García and Francisco Herrera. An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9:2677–2694, 2008.
  52. [52] J. E. Chacón and T. Duong. Multivariate Kernel Smoothing and Its Applications. Chapman & Hall/CRC, 1st edition, 2018.
  53. [53] P. Wand and C. Jones. Kernel Smoothing. Chapman & Hall/CRC, 1st edition, 1994.
  54. [54] David Atienza, Concha Bielza, and Pedro Larrañaga. PyBNesian: An extensible Python package for Bayesian networks. Neurocomputing, 504:204–209, 2022.
  55. [55] Ross D. Shachter and C. Robert Kenley. Gaussian influence diagrams. Management Science, 35(5):527–550, 1989.
  56. [56] Rafael Sojo, Javier Díaz-Rozo, Concha Bielza, and Pedro Larrañaga. Binned semiparametric Bayesian networks for efficient kernel density estimation, 2025. https://arxiv.org/abs/2506.21997
  57. [57] Marco Scutari. Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software, 35(3):1–22, 2010.
  58. [58] Daniel Marín, Joshua Llano-Viles, Zouhair Haddi, Alexandre Perera-Lluna, and Jordi Fonollosa. Home monitoring for older singles: A gas sensor array system. Sensors and Actuators B: Chemical, 393:134036, 2023.
  59. [59] Robert J. Lyon, Ben W. Stappers, Sally Cooper, J. M. Brooke, and Joshua D. Knowles. Fifty years of pulsar candidate selection: From simple filters to a new principled real-time classification approach. Monthly Notices of the Royal Astronomical Society, 459:1104–1123, 2016.
  60. [61] R. Bock. MAGIC gamma telescope. UCI Machine Learning Repository, 2004. https://doi.org/10.24432/C58K54
  61. [62] Luis M. Ibarra Candanedo, Veronique Feldheim, and Dominique Deramaix. Data driven prediction models of energy use of appliances in a low-energy house. Energy and Buildings, 140:81–97, 2017.
  62. [63] Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(1):1–30, 2006.