Probabilistic Regressor Chains with Monte Carlo Methods

Jesse Read; Luca Martino

arXiv: 1907.08087 · v1 · pith:JOWAT72Cnew · submitted 2019-07-18 · cs.LG · stat.ML


Pith reviewed 2026-05-24 19:40 UTC · model grok-4.3

classification: cs.LG · stat.ML

keywords: regressor chains · multi-output regression · sequential Monte Carlo · probabilistic models · multi-label classification · continuous outputs · dependency modeling


The pith

A sequential Monte Carlo scheme in probabilistic regressor chains overcomes greedy inference limits for multi-output regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that regressor chains for predicting multiple continuous outputs have been held back by greedy inference, which has produced relatively poor accuracy compared with training a separate model per output and has limited where the method can be applied. It develops a probabilistic version that uses sequential Monte Carlo sampling to propagate predictions, together with their uncertainty, through the chain. A sympathetic reader would care because multi-output regression with dependent continuous targets arises in many practical settings, and a workable chaining approach could capture dependencies among outputs without the drawbacks of earlier attempts. The paper also uses this development to clarify the broader picture of chaining methods for both regression and classification.

Core claim

The authors identify limitations of prior regressor chains: reliance on greedy inference, weak performance relative to independent models, and restricted applicability. They then introduce a sequential Monte Carlo scheme inside a probabilistic regressor chain that samples successive predictions while accounting for uncertainty, and show that the scheme is effective, flexible, and useful across several types of data. Along the way, they place the method in the general context of multi-output learning with continuous targets and shed additional light on classifier chains.

What carries the argument

Sequential Monte Carlo scheme inside a probabilistic regressor chain, which replaces greedy point predictions with sampling to propagate and average over output dependencies.
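A minimal sketch of the mechanism, under stated assumptions. Each link of the chain is a hypothetical Gaussian conditional p_j(y_j | x, y_1..y_{j-1}) with a hand-written mean function; the `CHAIN` list below is illustrative and does not reproduce the paper's models. Particles are drawn from a proposal that is twice as wide as each conditional, an assumption made so the importance weights p_j/q_j are non-trivial. Particles are resampled whenever the effective sample size falls below half the particle count. This is a generic sequential importance resampling scheme, and the paper's own proposal, resampling rule, and estimators may differ.

```python
import math
import random


def normal_logpdf(z, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at z."""
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def normalize(log_w):
    """Turn log-weights into normalized weights (shifted by the max for stability)."""
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical chain of L = 3 Gaussian links p_j(y_j | x, y_{1:j-1}).
# Each link maps (x, earlier outputs) to (mean, std); not the paper's models.
CHAIN = [
    lambda x, prev: (x, 1.0),
    lambda x, prev: (0.8 * prev[0], 0.5),
    lambda x, prev: (prev[0] - prev[1], 0.3),
]


def smc_chain(x, chain, n_particles=500, proposal_scale=2.0, seed=0):
    """Sequential importance resampling along the chain, one output at a time."""
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    log_w = [0.0] * n_particles  # accumulated log(p/q) since the last resampling
    log_p = [0.0] * n_particles  # log p(y_{1:j} | x) of each particle
    for link in chain:
        for m in range(n_particles):
            mu, sd = link(x, particles[m])
            q_sd = proposal_scale * sd  # assumed proposal: same mean, wider Gaussian
            y = rng.gauss(mu, q_sd)
            lp = normal_logpdf(y, mu, sd)
            log_p[m] += lp
            log_w[m] += lp - normal_logpdf(y, mu, q_sd)
            particles[m].append(y)
        w = normalize(log_w)
        ess = 1.0 / sum(wi * wi for wi in w)  # effective sample size
        if ess < n_particles / 2:  # resample when the weights degenerate
            idx = rng.choices(range(n_particles), weights=w, k=n_particles)
            particles = [list(particles[i]) for i in idx]
            log_p = [log_p[i] for i in idx]
            log_w = [0.0] * n_particles
    return particles, normalize(log_w), log_p


def point_estimates(particles, weights, log_p):
    """Weighted mean (squared-error estimate) and highest-density particle (approximate mode)."""
    n_out = len(particles[0])
    mean = [sum(w * p[j] for p, w in zip(particles, weights)) for j in range(n_out)]
    best = max(range(len(particles)), key=lambda m: log_p[m])
    return mean, particles[best]
```

For example, calling `smc_chain(1.0, CHAIN)` and passing the result to `point_estimates` should give a weighted mean close to (1.0, 0.8, 0.2), the exact conditional mean of this linear-Gaussian toy chain. The weighted mean suits squared-error loss, while the highest-density particle approximates the joint mode; which estimate to report depends on the loss.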

If this is right

Regressor chains become competitive with independent models on continuous multi-output tasks.
The approach extends naturally to different base learners and loss functions.
Chaining methods gain improved handling of uncertainty and explainability.
Insights from the regression case clarify how classifier chains manage label dependencies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The sampling approach may reduce error accumulation in long chains where early mistakes would otherwise compound.
Similar Monte Carlo techniques could be tested in other structured prediction settings that currently rely on greedy decoding.
The framework might support online or streaming updates to the chain when new data arrives sequentially.

Load-bearing premise

Replacing greedy inference with sequential Monte Carlo sampling in a regressor chain will overcome prior limitations without creating new problems of scalability or accuracy.

What would settle it

A head-to-head comparison on standard multi-output regression benchmarks in which the Monte Carlo regressor chain yields higher error than both independent models and the earlier greedy chains.

Figures

Figures reproduced from arXiv: 1907.08087 by Jesse Read, Luca Martino.

**Figure 2.** Probabilistic classifier chains where $L = 3$, $y_j \in \{0, 1\}$: as a probabilistic graphical model (left), and with two explored paths in $\mathcal{Y}^L$ in the probability tree. Note that the best path (in red, right) is not found by greedy inference. There are $2^L$ possible paths ($\mathcal{Y}^L = \{0, 1\}^3$). The label on each edge indicates $P_j(y_j = 1)$ (shown for explored paths). Many search methods have been applied for this purpose …
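The greedy-versus-search point in Figure 2 can be made concrete with a small sketch for $L = 3$ binary labels. The conditional probabilities in `p_one` are hypothetical numbers chosen so that greedy inference misses the jointly most probable path. Greedy inference here means taking the more probable value of each label in turn. Exhaustive enumeration of all $2^L$ paths does find the best path, but it is only feasible for small $L$, which is why the caption mentions other search methods.

```python
from itertools import product


def p_one(prefix):
    """Hypothetical P(y_j = 1 | x, y_1..y_{j-1}) for one fixed input x."""
    if not prefix:
        return 0.6  # P(y_1 = 1)
    # Later labels tend to copy y_1: likely 1 after y_1 = 1, very likely 0 after y_1 = 0
    return 0.6 if prefix[0] == 1 else 0.1


def path_prob(path):
    """Joint probability of a full label path via the chain rule."""
    prob = 1.0
    for j, y in enumerate(path):
        p = p_one(path[:j])
        prob *= p if y == 1 else 1.0 - p
    return prob


def greedy(L):
    """Greedy inference: fix each label to its more probable value, left to right."""
    path = ()
    for _ in range(L):
        path += (1 if p_one(path) >= 0.5 else 0,)
    return path


def exhaustive(L):
    """Exact search over all 2^L paths (only practical for small L)."""
    return max(product((0, 1), repeat=L), key=path_prob)
```

With these numbers, `greedy(3)` returns `(1, 1, 1)` with joint probability 0.216, while `exhaustive(3)` returns `(0, 0, 0)` with joint probability 0.324: the first greedy step commits to the locally more probable branch and never revisits it.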

**Figure 3.** Even when $x_2$ arrives only at timestep 2, information can still be carried forward from $x_1$ (rather than via $y_1$), thus making the label cascade superfluous with respect to the prediction $\hat{y}_2$ as long as $f(x)$ is well modeled, even in this case.

Another fundamental issue is the selection of loss metric. An obvious and popular choice for regression is based on the mean squared error (MSE) loss criterion (as indeed considere…
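The role of the MSE loss can be illustrated with a toy two-output chain. Under squared error, the optimal prediction of $y_2$ is the conditional mean $\mathbb{E}[y_2 \mid x]$. A greedy chain instead plugs the point prediction $\hat{y}_1$ into the next model, which is biased whenever that link is nonlinear. The model below is hypothetical. It sets $y_1 = 2x + \varepsilon_1$ with $\varepsilon_1 \sim \mathcal{N}(0, 1)$, and $y_2 = y_1^2$ plus zero-mean noise, so that $\mathbb{E}[y_2 \mid x] = (2x)^2 + 1$, whereas the greedy plug-in gives $(2x)^2$. Averaging the second link over sampled values of $y_1$, as a Monte Carlo chain does, recovers the conditional mean. A model fitted directly to $y_2$ given $x$ could target the same quantity, which is in line with the caption's remark that the cascade can be superfluous.

```python
import random

A = 2.0       # hypothetical slope: y1 = A*x + eps1
SIGMA1 = 1.0  # hypothetical standard deviation of eps1


def link(y1):
    # Hypothetical nonlinear second link: E[y2 | y1] = y1**2.
    # The zero-mean noise on y2 does not change the conditional mean, so it is omitted.
    return y1 ** 2


def greedy_chain_y2(x):
    # Greedy: plug the point prediction y1_hat = A*x into the next link
    return link(A * x)


def mc_chain_y2(x, n=20000, seed=1):
    # Monte Carlo: average the next link over samples of y1 | x
    rng = random.Random(seed)
    return sum(link(rng.gauss(A * x, SIGMA1)) for _ in range(n)) / n


def exact_y2(x):
    # E[y2 | x] = E[y1^2 | x] = (A*x)^2 + SIGMA1^2
    return (A * x) ** 2 + SIGMA1 ** 2
```

At $x = 1$, `greedy_chain_y2` returns 4.0 and `exact_y2` returns 5.0. `mc_chain_y2` should land close to 5, with a gap to the exact value that shrinks as `n` grows.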

**Figure 4.** A hypothetical search tree through particle space along the paths for …

**Figure 5.** The first three figures above related to the synthetic data shown in …

**Figure 6.** A bimodal joint distribution over two labels
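When the joint distribution is multimodal, as in Figure 6, the choice of point estimate matters. The mean of the samples can fall between the modes, in a region of near-zero density, whereas the highest-density sample lies near one of the modes. The sketch below assumes a hypothetical equal-weight mixture of two isotropic Gaussians over $(y_1, y_2)$; it is not the data behind Figure 6.

```python
import math
import random

MODES = [(-2.0, -2.0), (2.0, 2.0)]  # hypothetical mixture means
SD = 0.5                            # hypothetical per-coordinate standard deviation


def density(y1, y2):
    """Equal-weight mixture of two isotropic 2-D Gaussians."""
    norm = 1.0 / (2.0 * math.pi * SD ** 2)
    return sum(norm * math.exp(-((y1 - a) ** 2 + (y2 - b) ** 2) / (2 * SD ** 2))
               for a, b in MODES) / len(MODES)


def sample(n=2000, seed=0):
    """Draw n joint samples: pick a component, then sample around its mean."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        a, b = rng.choice(MODES)
        out.append((rng.gauss(a, SD), rng.gauss(b, SD)))
    return out


samples = sample()
# Squared-error point estimate: the sample mean (lands between the modes)
mean_est = (sum(s[0] for s in samples) / len(samples),
            sum(s[1] for s in samples) / len(samples))
# Mode-seeking point estimate: the sample with the highest joint density
mode_est = max(samples, key=lambda s: density(*s))
```

Here `mean_est` sits near (0, 0), where the density is tiny, while `mode_est` lies close to one of the two modes.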

Original abstract

A large number and diversity of techniques have been offered in the literature in recent years for solving multi-label classification tasks, including classifier chains where predictions are cascaded to other models as additional features. The idea of extending this chaining methodology to multi-output regression has already been suggested and trialed: regressor chains. However, this has so far been limited to greedy inference, has provided relatively poor results compared to individual models, and has been of limited applicability. In this paper we identify and discuss the main limitations, including an analysis of different base models, loss functions, explainability, and other desiderata of real-world applications. To overcome the identified limitations we study and develop methods for regressor chains. In particular we present a sequential Monte Carlo scheme in the framework of a probabilistic regressor chain, and we show it can be effective, flexible and useful in several types of data. We place regressor chains in context in general terms of multi-output learning with continuous outputs, and in doing this shed additional light on classifier chains.
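The chaining idea described in the abstract can be sketched with ordinary least squares as the base model: earlier outputs are appended to the input of later models. This is a plain greedy regressor chain of the kind the paper sets out to improve. It is written in pure Python so it has no dependencies, and `fit_linear`, `make_data`, and the other names are illustrative rather than taken from the paper or any library. During training, each model sees the true earlier outputs; at prediction time it sees the chain's own point predictions.

```python
import random


def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    d = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[k] for r in rows) for k in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(d):
        p = max(range(c, d), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(d):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][d] / A[i][i] for i in range(d)]


def predict_linear(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def fit_chain(X, Y):
    """One model per output; model j is trained on x plus the true y_1..y_{j-1}."""
    models = []
    for j in range(len(Y[0])):
        feats = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        models.append(fit_linear(feats, [y[j] for y in Y]))
    return models


def predict_chain_greedy(models, x):
    """Greedy inference: cascade each point prediction as an extra feature."""
    preds = []
    for w in models:
        preds.append(predict_linear(w, list(x) + preds))
    return preds


def make_data(n=200, seed=0):
    """Hypothetical synthetic data: y1 = 2x + noise, y2 = y1 - x + noise."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y1 = 2.0 * x + rng.gauss(0.0, 0.1)
        y2 = y1 - x + rng.gauss(0.0, 0.1)
        X.append([x])
        Y.append([y1, y2])
    return X, Y
```

On this linear toy data, `predict_chain_greedy(fit_chain(*make_data()), [1.0])` should return values near [2.0, 1.0]. Because every link is linear, plugging in point predictions loses nothing here; the weaknesses of greedy inference appear once the links are nonlinear or the outputs are multimodal.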

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper adds sequential Monte Carlo to regressor chains to avoid greedy errors, but the abstract leaves the actual performance gains unshown.


This paper takes the idea of regressor chains and replaces the usual greedy inference with a sequential Monte Carlo scheme inside a probabilistic model. That is the main new piece. It does a solid job spelling out the shortcomings of earlier regressor chain work, including how they compare to independent models and issues around base learners, losses, and explainability. Framing it all within multi-output regression with continuous outputs also helps put classifier chains in perspective. The Monte Carlo approach makes sense as a way to sample through the chain and avoid error propagation from greedy steps. The soft spot is the empirical side. The claim that the method is effective, flexible, and useful rests entirely on experiments that the abstract only mentions without details. If those experiments show clear gains without excessive compute or variance, then it lands. Otherwise the contribution is mostly the idea rather than a demonstrated fix. This is for readers who work on multi-output problems and want chaining methods that handle uncertainty better. It is worth sending to peer review so the results and any scalability questions can be checked properly.

Referee Report

0 major / 2 minor

Summary. The paper identifies key limitations of existing greedy regressor chains for multi-output regression (poor performance relative to independent models, limited applicability, issues with base learners, loss functions, and explainability). It proposes a sequential Monte Carlo sampling scheme inside a probabilistic regressor chain framework as a solution and claims this approach is effective, flexible, and useful across several data types. The work also situates regressor chains more broadly within multi-output learning with continuous targets and draws connections back to classifier chains.

Significance. If the experimental results are robust, the contribution would be a practical inference method for chained multi-output regression that mitigates error accumulation without introducing prohibitive new scalability costs. The explicit positioning of the technique within the wider multi-output regression literature is a secondary but useful service to the field.

minor comments (2)

The abstract states that the SMC scheme 'can be effective, flexible and useful', but the manuscript should ensure that the experimental section quantifies these properties against both independent regressors and prior greedy chain baselines, with appropriate error bars or statistical tests.
Notation for the probabilistic model and the SMC proposal distribution should be introduced once and used consistently; any re-use of symbols across sections risks confusion for readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation for minor revision. The assessment correctly captures the motivation for moving beyond greedy inference in regressor chains and the role of sequential Monte Carlo sampling within a probabilistic framework. We are pleased that the broader contextualization of regressor chains within multi-output regression is viewed as a useful contribution.

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents an empirical method: a sequential Monte Carlo scheme inside probabilistic regressor chains to address limitations of prior greedy regressor chains. The abstract frames the contribution as identifying limitations of existing approaches and demonstrating effectiveness via the new scheme across data types. No derivation chain, first-principles result, or prediction is shown that reduces by construction to fitted inputs or self-referential definitions. The central claim rests on asserted experimental outcomes rather than any load-bearing theoretical step that collapses to the paper's own inputs. Self-citations, if present, are not required to justify uniqueness or force the result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be extracted from the provided text.

pith-pipeline@v0.9.0 · 5697 in / 924 out tokens · 14359 ms · 2026-05-24T19:40:58.446235+00:00 · methodology


Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

[1] David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
[2] H. Bijl, T. B. Schon, J. W. van Wingerden, and M. Verhaegen. System identification through online sparse Gaussian Process regression with input noise. In arXiv:1601.08068, pages 1–25, 2016.
[3] Hanen Borchani, Gherardo Varando, Concha Bielza, and Pedro Larrañaga. A survey on multi-output regression. Wiley Int. Rev. Data Min. and Knowl. Disc., 5(5):216–233, September 2015.
[4] Monica F. Bugallo, Luca Martino, and Jukka Corander. Adaptive importance sampling in signal processing. Digital Signal Processing, 47(Supplement C):36–49, 2015.
[5] Moustapha Cisse, Maruan Al-Shedivat, and Samy Bengio. Adios: Architectures deep in output space. In Proceedings of The 33rd International Conference on Machine Learning, volume 48, pages 2770–2779, New York, New York, USA, 20–22 Jun 2016. PMLR.
[6] Patrick Dallaire, Camille Besse, and Brahim Chaib-draa. An approximate inference with Gaussian Process to latent functions from uncertain data. Neurocomputing, 74(11):1945–1955, 2011.
[7] Patrick Dallaire, Camille Besse, and Brahim Chaib-draa. Deep Gaussian Processes. Proceedings of the Sixteenth International Workshop on Artificial Intelligence and Statistics (AISTATS), pages 207–215, 2013.
[8] M. P. Deisenroth, M. F. Huber, and U. D. Hanebeck. Analytic moment-based Gaussian process filtering. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 225–232, 2009.
[9] P. Dellaportas and D. A. Stephens. Bayesian analysis of errors-in-variables regression models. Biometrics, 51(3):1085–1095, 2009.
[10] Krzysztof Dembczyński, Weiwei Cheng, and Eyke Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML '10: 27th International Conference on Machine Learning, pages 279–286, Haifa, Israel, June 2010. Omnipress.
[11] Krzysztof Dembczyński, Willem Waegeman, Weiwei Cheng, and Eyke Hüllermeier. On label dependence and loss minimization in multi-label classification. Mach. Learn., 88(1-2):5–45, July 2012.
[12] Krzysztof Dembczyński, Willem Waegeman, and Eyke Hüllermeier. An analysis of chaining in multi-label classification. In ECAI: European Conference of Artificial Intelligence, volume 242, pages 294–299. IOS Press, 2012.
[13] P. M. Djuric, J. H. Kotecha, Jianqui Zhang, Yufei Huang, T. Ghirmai, M. F. Bugallo, and J. Miguez. Particle filtering. IEEE Signal Processing Magazine, 20(5):19–38, Sept 2003.
[14] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo. Heretical multiple importance sampling. IEEE Signal Processing Letters, 23(10):1474–1478, 2016.
[15] Yuhong Guo and Suicheng Gu. Multi-label classification using conditional dependency networks. In IJCAI '11: 24th International Conference on Artificial Intelligence, pages 1300–1305. IJCAI/AAAI, 2011.
[16] Esmaeil Hadavandi, Jamal Shahrabi, and Shahaboddin Shamshirband. A novel boosted-neural network ensemble for modeling multi-target regression problems. Engineering Applications of Artificial Intelligence, 45:204–219, 2015.
[17] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[19] J. E. Johnson, V. Laparra, and G. Camps-Valls. A derivative-based variance estimate for Gaussian Process regression. Submitted, pages 1–20, 2018.
[20] Xie Jun, Yu Lu, Zhu Lei, and Duan Guolun. Conditional entropy based classifier chains for multi-label classification. Neurocomputing, 335:185–194, 2019.
[21] Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, and Radu Horaud. A comprehensive analysis of deep regression. CoRR, abs/1803.08450, 2018.
[22] Miguel Lázaro-Gredilla. Bayesian Warped Gaussian Processes. In Advances in Neural Information Processing Systems 25, pages 1619–1627. 2012.
[23] L. Martino, V. Elvira, and F. Louzada. Effective sample size for importance sampling based on discrepancy measures. Signal Processing, 131:386–401, 2017.
[24] Luca Martino, Victor Elvira, and Gustau Camps-Valls. Group importance sampling for particle filtering and MCMC. Digital Signal Processing, 82:133–151, 2018.
[25] Luca Martino, Jesse Read, Victor Elvira, and Francisco Louzada. Cooperative parallel particle filters for online model selection and applications to urban mobility. Digital Signal Processing, 60(January):172–185, 2017.
[26] Deiner Mena, Elena Montañés, José Ramón Quevedo, and Juan José del Coz. Using A* for inference in probabilistic classifier chains. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 3707–3713, 2015.
[27] Jacob Montiel, Jesse Read, Albert Bifet, and Talel Abdessalem. Scikit-MultiFlow: A multi-output streaming framework. Journal of Machine Learning Research, 18, 2018.
[28] Jinseok Nam, Eneldo Loza Mencía, Hyunwoo J Kim, and Johannes Fürnkranz. Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Advances in Neural Information Processing Systems 30, pages 5413–5423, 2017.
[29] J. Quiñonero-Candela, A. Girard, and C. Rasmussen. Prediction at an uncertain input for Gaussian Processes and Relevance Vector Machines application to multiple-step ahead time-series forecasting. Technical Report, no. 1, pages 1–14, 2003.
[30] Jesse Read, Concha Bielza, and Pedro Larrañaga. Multi-dimensional classification with super-classes. Transactions on Knowledge and Data Engineering, 26(7):1720–1733, 2014.
[31] Jesse Read and Jaakko Hollmén. Multi-label classification using labels as hidden nodes. Technical Report 1503.09022v3, ArXiv.org, 2017.
[32] Jesse Read, Luca Martino, and Jaakko Hollmén. Multi-label methods for prediction with sequential data. Pattern Recognition, 63(Supplement C):45–55, 2017.
[33] Jesse Read, Luca Martino, and Jaakko Hollmén. Multi-label methods for prediction with sequential data. Pattern Recognition, 63(March):45–55, 2017.
[34] Jesse Read, Luca Martino, and David Luengo. Efficient Monte Carlo methods for multi-dimensional learning with classifier chains. Pattern Recognition, 47(3):1535–1546, 2014.
[35] Jesse Read, Luca Martino, Pablo M. Olmos, and David Luengo. Scalable multi-output label prediction: From classifier chains to classifier trellises. Pattern Recognition, 48(6):2096–2109, 2015.
[36] Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Classifier chains for multi-label classification. Machine Learning, 85(3):333–359, 2011.
[37] E. Snelson, Z. Ghahramani, and C. Rasmussen. Warped Gaussian Processes. In Advances in Neural Information Processing Systems 16, pages 1–8. 2003.
[38] Eleftherios Spyromitros-Xioufis, Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. Multi-target regression via input space expansion: treating targets as inputs. Machine Learning, pages 1–44, 2016.
[39] Paweł Teisseyre. Ccnet: Joint multi-label classification and feature selection using classifier chains and elastic net regularization. Neurocomputing, 235:98–111, 2017.
[40] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Random k-labelsets for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 23(7):1079–1089, 2011.
[41] Willem Waegeman, Krzysztof Dembczynski, and Eyke Huellermeier. Multi-target prediction: A unifying view on problems and methods. arXiv, September 2018.
[42] Timothy Yee, Viliam Lisy, and Michael Bowling. Monte Carlo tree search in continuous action spaces with execution uncertainty. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, pages 690–696. AAAI Press, 2016.

A. Parallel Metropolis-Hastings (MH) chains

For completeness, we elaborate the algorithm based...

Draw $z \sim q(y \mid y^{(m)}_{j,k-1})$.

Setting for simplicity $p_j(z) = p_j(z \mid x, \tilde{y}^{(m)}_{1:j-1})$, accept the movement $y^{(m)}_{j,k} = z$ with probability

$$\alpha = \min\left[1, \frac{p_j(z)\, q(y^{(m)}_{j,k-1} \mid z)}{p_j(y^{(m)}_{j,k-1})\, q(z \mid y^{(m)}_{j,k-1})}\right] \qquad (26)$$

Otherwise, with probability $1 - \alpha$, set $y^{(m)}_{j,k} = y^{(m)}_{j,k-1}$. Therefore, the final set of samples for an iteration is $\{\tilde{y}^{(1)}_j, \ldots, \tilde{y}^{(M)}_j\} = \{y^{(1)}_{j,K}, \ldots, y^{(M)}_{j,K}\}$.
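The Metropolis-Hastings step in the appendix fragment above can be sketched for a single chain link $j$: draw $z$ from $q$, accept it with probability $\alpha$ from Eq. (26), and otherwise keep the previous state. The Gaussian target standing in for $p_j(\cdot \mid x, \tilde{y}^{(m)}_{1:j-1})$ and the Gaussian random-walk proposal are assumptions for illustration only. With a symmetric proposal the two $q$ terms cancel, but they are kept explicit here to mirror Eq. (26).

```python
import math
import random


def log_target(z, mu=1.5, sd=0.7):
    """Hypothetical log p_j(z | x, y_{1:j-1}) up to a constant: a Gaussian N(mu, sd^2)."""
    return -0.5 * ((z - mu) / sd) ** 2


def log_q(a, b, step):
    """Log of the assumed random-walk proposal q(a | b) = N(a; b, step^2), up to a constant."""
    return -0.5 * ((a - b) / step) ** 2


def parallel_mh(n_chains=200, n_iters=300, step=1.0, seed=0):
    """Run M independent MH chains for K iterations and return their final states."""
    rng = random.Random(seed)
    states = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]  # initial y_{j,0}^{(m)}
    for _ in range(n_iters):
        for m, y_prev in enumerate(states):
            z = rng.gauss(y_prev, step)  # draw z ~ q(y | y_{j,k-1}^{(m)})
            log_alpha = (log_target(z) + log_q(y_prev, z, step)
                         - log_target(y_prev) - log_q(z, y_prev, step))
            if rng.random() < math.exp(min(0.0, log_alpha)):  # accept with prob. alpha
                states[m] = z  # y_{j,k}^{(m)} = z
            # otherwise y_{j,k}^{(m)} = y_{j,k-1}^{(m)}: the state is left unchanged
    return states  # {y_{j,K}^{(1)}, ..., y_{j,K}^{(M)}}
```

With these settings, the final states should be approximately distributed as the assumed target, so their sample mean and standard deviation should sit near 1.5 and 0.7.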
