Soft Bayesian Context Tree Models for Real-Valued Time Series

Shota Saito; Toshiyasu Matsushima; Yuta Nakahara

arxiv: 2601.11079 · v2 · pith:Y7CO4P5Enew · submitted 2026-01-16 · 💻 cs.LG

Pith reviewed 2026-05-22 11:15 UTC · model grok-4.3

classification 💻 cs.LG

keywords soft bayesian context tree · real-valued time series · variational inference · context tree models · probabilistic splits · time series modeling · bayesian methods

The pith

The soft Bayesian context tree model replaces hard deterministic splits with probabilistic ones and learns them via variational inference for real-valued time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the Soft-BCT as a new model for real-valued time series. It replaces the hard deterministic splits used in earlier Bayesian context tree models with soft probabilistic splits of the context space. The authors develop a variational inference algorithm to learn both the soft assignments and the model parameters. Experiments indicate that the Soft-BCT achieves better results than the previous hard-split version on some datasets. A sympathetic reader would care because the change allows the model to express uncertainty about which past observations form the relevant context rather than committing to a single branch at each step.

Core claim

The Soft-BCT considers soft (probabilistic) splits of the context space, instead of hard (deterministic) splits of the context space as in the previous BCT for real-valued time series. A learning algorithm of the Soft-BCT is proposed based on the variational inference. The results of experiments demonstrate the superiority of the Soft-BCT compared to the previous BCT for some datasets.

What carries the argument

Soft probabilistic splits of the context space, learned by variational inference to assign probabilities over possible contexts instead of selecting one deterministically.
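To make the contrast concrete, here is a minimal sketch rather than the paper's actual parameterization. It assumes a binary split on a single past value and a logistic gate; the names `hard_split`, `soft_split`, `threshold`, and `sharpness` are hypothetical and chosen only for illustration. A hard split puts all mass on one child, while a soft split assigns each child a probability.

```python
import numpy as np

def hard_split(x_prev, threshold=0.0):
    """Deterministic split: all mass on exactly one of two children."""
    right = float(x_prev >= threshold)
    return np.array([1.0 - right, right])

def soft_split(x_prev, threshold=0.0, sharpness=2.0):
    """Probabilistic split via a logistic gate (an assumption, not from the paper)."""
    p_right = 1.0 / (1.0 + np.exp(-sharpness * (x_prev - threshold)))
    return np.array([1.0 - p_right, p_right])

# Compare the two assignments for a few past values.
for x in (-1.0, 0.1, 1.0):
    print(x, hard_split(x), soft_split(x))
```

In this toy gate, increasing `sharpness` pushes the soft assignment toward the hard one for any value away from the threshold, so the hard split can be read as a limiting case.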

If this is right

The model captures context dependencies with greater flexibility than hard-split trees.
Variational inference supplies a practical route to training the probabilistic assignments.
Performance gains appear on at least some real-valued time series benchmarks.
The approach extends context tree methods to cases where context membership is inherently uncertain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The soft-split idea could be combined with neural network components to handle longer or higher-dimensional sequences.
Similar probabilistic softening might improve other tree-structured models used in sequential forecasting.
The method may prove especially useful in domains such as sensor data or finance where the right history length varies smoothly rather than in sharp jumps.

Load-bearing premise

Variational inference can optimize the soft context assignments and parameters without introducing large approximation errors or instability.
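The standard identity behind variational inference shows where this premise bites. What follows is the generic textbook form, not the paper's specific derivation; the notation is assumed for illustration, with $x$ the observed series, $z$ all latent quantities (soft assignments and model parameters), and $q$ the variational distribution.

```latex
\ln p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\ln \frac{p(x, z)}{q(z)}\right]}_{\text{ELBO } \mathcal{L}(q)}
  + \underbrace{\mathrm{KL}\!\left(q(z) \,\middle\|\, p(z \mid x)\right)}_{\ge 0}
```

Maximizing $\mathcal{L}(q)$ over a restricted family of $q$ is equivalent to minimizing the KL term, and whatever KL gap remains at the optimum is the approximation error that the premise assumes to be small.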

What would settle it

Re-running the reported experiments on the same datasets and finding that the Soft-BCT does not outperform the hard BCT would show the claimed gains are not reliable.

Figures

Figures reproduced from arXiv: 2601.11079 by Shota Saito, Toshiyasu Matsushima, Yuta Nakahara.

**Figure 2.** An example of $U_t$, where $u_{t,d}^{\top} = [u_{t,d,1}, u_{t,d,2}, \ldots, u_{t,d,M}]$ is an $M$-dimensional one-hot vector (i.e., one of the elements equals 1 and all remaining elements equal 0), and $u_{t,d,m} = 1$ indicates a path to the $m$-th child node at the branch of depth $d$.
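As a small illustration of the notation in this caption, the sketch below builds the one-hot rows for a given path through an $M$-ary tree. It assumes that $U_t$ stacks the rows $u_{t,d}^{\top}$ over depths, which the caption suggests but does not state outright. The helper name `path_to_one_hot` and the 0-based child indices are also assumptions for illustration (the caption's indices are 1-based).

```python
import numpy as np

def path_to_one_hot(path, num_children):
    """Row d is one-hot over children; a 1 in column path[d] marks the child taken at that depth."""
    U = np.zeros((len(path), num_children), dtype=int)
    U[np.arange(len(path)), path] = 1
    return U

# Path: child 0 at the first branch, child 2 at the second (M = 3 children per node).
U_t = path_to_one_hot([0, 2], num_children=3)
print(U_t)
```

Each row sums to 1, which is exactly the one-hot constraint described in the caption.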

**Figure 3.** The MAP estimated model and parameters for …

Original abstract

This paper proposes the soft Bayesian context tree model (Soft-BCT), which is a novel BCT model for real-valued time series. The Soft-BCT considers soft (probabilistic) splits of the context space, instead of hard (deterministic) splits of the context space as in the previous BCT for real-valued time series. A learning algorithm of the Soft-BCT is proposed based on the variational inference. The results of experiments demonstrate the superiority of the Soft-BCT compared to the previous BCT for some datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

Soft-BCT adds probabilistic splits to real-valued context trees and a variational learner, but the performance edge rests on thin experimental detail.

Colleague, the main point is that this paper takes the existing Bayesian context tree setup for real-valued series and replaces the hard deterministic splits with soft probabilistic ones, then fits everything with variational inference. The abstract frames this as a direct extension that beats the prior hard-split version on some datasets. That shift to soft assignments is the actual new element; it lets the model express uncertainty over which context is active instead of forcing a single path.

The construction itself looks clean and stays inside the BCT framework without obvious circularity or missing identifiability conditions. The variational updates are presented as the practical learning method, which is a reasonable engineering choice for this kind of model.

The soft spots sit mostly in the evidence. The superiority claim is stated but the abstract supplies no numbers, no dataset sizes, no error bars, and no statistical tests, so it is hard to tell whether the gains are real, stable, or just an artifact of the approximation. If the full paper has solid tables and controls, that would fix it; otherwise the central result stays under-supported. Variational inference can also hide bias or instability, and the paper would benefit from explicit checks on that.

This is aimed at people who already work with context trees or probabilistic time-series models and want a modest relaxation of the hard-split assumption. A specialist might try the idea on their own data, but it is not broad enough to interest a general audience.

I would send it to peer review. The modeling step is coherent and the extension is honest, so referees can sort out whether the experiments hold up and whether the variational step is reliable enough for the claimed gains.

Referee Report

0 major / 1 minor

Summary. This paper proposes the soft Bayesian context tree model (Soft-BCT) for real-valued time series. It introduces soft probabilistic splits of the context space in contrast to the hard deterministic splits in previous BCT models. A variational inference-based learning algorithm is developed, and experiments indicate superiority over the prior BCT on certain datasets.

Significance. If the results hold, the Soft-BCT offers an enhanced model class with greater flexibility through probabilistic context assignments, which could better capture uncertainties in time series contexts. The variational inference approach provides a practical learning method that may generalize well to other sequential data tasks.

minor comments (1)

[Abstract] The abstract asserts experimental superiority without referencing specific metrics, datasets, or statistical tests; adding a concise summary of these in the abstract or a dedicated results paragraph would strengthen the presentation of the central claim.
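One way to supply the kind of evidence this comment asks for is a paired comparison with an uncertainty interval. The sketch below is illustrative only: the per-series errors are randomly generated placeholders, not results from the paper, and `paired_bootstrap_ci` is a hypothetical helper, not something the authors describe.

```python
import numpy as np

def paired_bootstrap_ci(err_a, err_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference err_a - err_b."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(err_a) - np.asarray(err_b)
    # Resample paired differences with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lo, hi

# Placeholder errors for two models on 30 test series (synthetic, NOT from the paper).
rng = np.random.default_rng(1)
err_hard = rng.gamma(shape=2.0, scale=1.0, size=30)
err_soft = err_hard - rng.normal(loc=0.1, scale=0.3, size=30)

mean_diff, lo, hi = paired_bootstrap_ci(err_hard, err_soft)
print(f"mean(hard - soft) = {mean_diff:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero would support a superiority claim on those series; one that straddles zero would not.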

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work on Soft-BCT. The recommendation for minor revision is noted. As the report lists no major comments, we have no specific points to address.

Circularity Check

0 steps flagged

No significant circularity; model extension and empirical results are self-contained

The paper defines the Soft-BCT by extending prior BCT models through the explicit introduction of soft probabilistic splits on the context space and a variational inference procedure for optimization. The claimed superiority is supported by experimental comparisons on datasets rather than any derivation that reduces a prediction or uniqueness result to a fitted parameter or self-citation by construction. No load-bearing equations or premises in the provided abstract and described construction collapse into tautological inputs; the variational step functions as a standard learning algorithm without forcing the performance outcome. This qualifies as a normal non-circular finding for an empirical model proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; no explicit parameter counts or new postulated objects are described.

