Uncertainty Estimates for Ordinal Embeddings

Michael Lohaus; Philipp Hennig; Ulrike von Luxburg

arxiv: 1906.11655 · v1 · pith:OTIHFVGZnew · submitted 2019-06-27 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-25 14:43 UTC · model grok-4.3

classification 💻 cs.LG · stat.ML

keywords ordinal embeddings · triplet comparisons · uncertainty estimation · bootstrap · Bayesian methods · noisy data · embedding algorithms

The pith

Bootstrap and Bayesian procedures supply well-calibrated uncertainty estimates for embeddings learned from noisy triplet comparisons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops methods to attach uncertainty estimates to the positions of objects embedded in Euclidean space from ordinal triplet data. It applies a bootstrap resampling procedure and a Bayesian sampling approach to standard embedding algorithms when the number of noisy comparisons is small. Simulations on synthetic data show that the resulting uncertainty intervals achieve the nominal coverage rates. These estimates can then be used to choose embedding dimension or other hyperparameters and to report variability in downstream scientific uses of the embedding.

Core claim

When objects are placed in Euclidean space so that as many noisy triplet comparisons as possible are satisfied, bootstrap resampling of the triplets and Bayesian posterior sampling over embedding coordinates both produce uncertainty estimates whose calibration can be verified on synthetic data with known ground-truth positions.
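The bootstrap half of this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fit_embedding` is a hypothetical toy stand-in for a standard algorithm such as STE (plain gradient descent on a logistic triplet loss), the noise model is an assumed random flip of each answer, and all function names and parameter values are invented for the example. Because an ordinal embedding is only determined up to translation, rotation, and scaling, each bootstrap replicate is Procrustes-aligned to a reference fit before the spread is computed.

```python
import numpy as np

def noisy_triplets(X, m, flip_prob, rng):
    """Sample m triplets (i, j, k) meaning "i is closer to j than to k".

    Assumed noise model: each true answer is flipped with probability flip_prob.
    """
    t = np.array([rng.choice(len(X), size=3, replace=False) for _ in range(m)])
    i, j, k = t.T
    dij = np.linalg.norm(X[i] - X[j], axis=1)
    dik = np.linalg.norm(X[i] - X[k], axis=1)
    swap = (dij > dik) ^ (rng.random(m) < flip_prob)
    t[swap] = t[swap][:, [0, 2, 1]]  # exchange j and k
    return t

def fit_embedding(triplets, n, d, rng, steps=200, lr=0.5):
    """Toy embedding (hypothetical stand-in for STE): logistic triplet loss."""
    X = 0.1 * rng.standard_normal((n, d))
    i, j, k = triplets.T
    for _ in range(steps):
        dij, dik = X[i] - X[j], X[i] - X[k]
        # margin > 0 means the current X satisfies the triplet
        margin = np.clip((dik**2).sum(1) - (dij**2).sum(1), -50, 50)
        w = 1.0 / (1.0 + np.exp(margin))  # loss = log(1 + exp(-margin))
        grad = np.zeros_like(X)
        np.add.at(grad, i, -2 * w[:, None] * (dik - dij))
        np.add.at(grad, j, -2 * w[:, None] * dij)
        np.add.at(grad, k, 2 * w[:, None] * dik)
        X -= lr * grad / len(triplets)
    return X

def align(X, ref):
    """Procrustes: map X onto ref by translation, rotation/reflection, scaling."""
    Xc, Rc = X - X.mean(axis=0), ref - ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Rc)
    scale = s.sum() / (Xc**2).sum()
    return scale * Xc @ (U @ Vt) + ref.mean(axis=0)

def bootstrap_embeddings(triplets, n, d, n_boot=10, seed=0):
    rng = np.random.default_rng(seed)
    ref = fit_embedding(triplets, n, d, rng)
    samples = []
    for _ in range(n_boot):
        # resample the triplets with replacement, refit, align to the reference
        idx = rng.integers(0, len(triplets), size=len(triplets))
        samples.append(align(fit_embedding(triplets[idx], n, d, rng), ref))
    return ref, np.stack(samples)  # samples has shape (n_boot, n, d)

rng = np.random.default_rng(1)
X_true = rng.standard_normal((15, 2))
T = noisy_triplets(X_true, 1000, 0.1, rng)
ref, samples = bootstrap_embeddings(T, n=15, d=2, n_boot=10)
spread = samples.std(axis=0).mean(axis=1)  # one uncertainty score per object
print(np.round(spread, 3))
```

Objects that take part in few or contradictory triplets would be expected to show a larger `spread`; the number of replicates and the optimizer settings here are chosen only to keep the example fast.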

What carries the argument

Bootstrap resampling and Bayesian posterior sampling applied to the output of standard ordinal embedding algorithms on triplet data.
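For the Bayesian side, one way to picture posterior sampling over embedding coordinates is the sketch below. It is an assumption-laden illustration: it uses the STE likelihood (the probability of a triplet is the logistic function of the difference in squared distances), an isotropic Gaussian prior with a hypothetical `prior_std`, and a plain random-walk Metropolis sampler swapped in for simplicity. This page does not state which sampler or prior the paper uses, so none of these choices should be read as the authors' method.

```python
import numpy as np

def log_posterior(X, triplets, prior_std=1.0):
    """Unnormalised log posterior of coordinates X given triplets (i, j, k)."""
    i, j, k = triplets.T
    dij = ((X[i] - X[j]) ** 2).sum(axis=1)
    dik = ((X[i] - X[k]) ** 2).sum(axis=1)
    # STE likelihood: p = exp(-dij) / (exp(-dij) + exp(-dik))
    loglik = -np.logaddexp(0.0, dij - dik).sum()
    logprior = -0.5 * (X**2).sum() / prior_std**2
    return loglik + logprior

def metropolis(triplets, n, d, n_samples=2000, step=0.05, seed=0):
    """Random-walk Metropolis over all n*d coordinates at once."""
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((n, d))
    lp = log_posterior(X, triplets)
    chain = []
    for _ in range(n_samples):
        prop = X + step * rng.standard_normal((n, d))
        lp_prop = log_posterior(prop, triplets)
        if np.log(rng.random()) < lp_prop - lp:  # accept or reject
            X, lp = prop, lp_prop
        chain.append(X.copy())
    return np.stack(chain)  # shape (n_samples, n, d)

# Small synthetic example with noise-free triplets (assumed setup).
rng = np.random.default_rng(0)
X_true = rng.standard_normal((8, 2))
cand = np.array([rng.choice(8, size=3, replace=False) for _ in range(300)])
i, j, k = cand.T
closer_j = (np.linalg.norm(X_true[i] - X_true[j], axis=1)
            < np.linalg.norm(X_true[i] - X_true[k], axis=1))
triplets = np.where(closer_j[:, None], cand, cand[:, [0, 2, 1]])

chain = metropolis(triplets, n=8, d=2)
posterior_std = chain[1000:].std(axis=0)  # discard the first half as burn-in
print(np.round(posterior_std, 3))
```

The Gaussian prior removes the translation and scale ambiguity, but the posterior is still invariant under rotations and reflections, so in practice the samples would need to be aligned (for example by Procrustes) before per-coordinate spreads are interpreted.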

If this is right

Embedding dimension or regularization strength can be chosen by minimizing a measure of estimated uncertainty rather than cross-validation alone.
Downstream scientific conclusions that rely on distances or clusters in the embedding can be accompanied by explicit uncertainty statements.
The same resampling and sampling machinery applies to any embedding algorithm whose loss is a sum over triplet violations.
When new triplets arrive, the uncertainty estimates can be updated without recomputing the entire embedding from scratch.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same calibration checks could be performed on real data by holding out a subset of triplets and testing whether the held-out comparisons are satisfied inside the uncertainty regions.
If the noise process that generates the triplets deviates strongly from the model implicit in the bootstrap or prior, the reported intervals will lose calibration.
The approach could be combined with active learning to request the triplets that most reduce the estimated uncertainty volume.
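The held-out check and the active-learning idea above both reduce to asking how often a set of embedding samples agrees on a triplet. The sketch below assumes a hypothetical array `samples` of shape (B, n, d) holding bootstrap or posterior embeddings; the function names and the "closest to 50% agreement" criterion are illustrative choices, not the paper's selection rule.

```python
import numpy as np

def triplet_agreement(samples, triplets):
    """Fraction of samples in which i is closer to j than to k, per triplet."""
    i, j, k = np.asarray(triplets).T
    dij = np.linalg.norm(samples[:, i] - samples[:, j], axis=-1)  # (B, m)
    dik = np.linalg.norm(samples[:, i] - samples[:, k], axis=-1)
    return (dij < dik).mean(axis=0)

def most_uncertain(samples, candidates, top):
    """Indices of the `top` candidates whose agreement is closest to 0.5."""
    p = triplet_agreement(samples, candidates)
    return np.argsort(np.abs(p - 0.5), kind="stable")[:top]

# Four samples of four points on a line; only point 3 moves between samples.
samples = np.zeros((4, 4, 1))
samples[:, 1, 0] = 1.0
samples[:, 2, 0] = 3.0
samples[:, 3, 0] = [0.9, 1.1, 0.9, 1.1]

candidates = [(0, 1, 2), (0, 1, 3)]
p = triplet_agreement(samples, candidates)
pick = most_uncertain(samples, candidates, top=1)
print(p, pick)
```

For a held-out triplet, an agreement value near 1 says the samples consistently predict the observed answer; for active selection, the candidates with agreement near 0.5 are the ones the current samples cannot decide.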

Load-bearing premise

The bootstrap and Bayesian procedures correctly capture the variability induced by noisy triplet data for the standard embedding algorithms used.

What would settle it

On synthetic triplet data generated from known object positions plus controlled noise, the fraction of true positions falling inside the reported uncertainty intervals deviates systematically from the claimed coverage probability.
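A coverage check of this kind can be sketched in a few lines. The example assumes that embedding samples have already been aligned to the ground-truth coordinate frame and that uncertainty is summarised by per-coordinate central percentile intervals; both are assumptions made for illustration, since this page does not specify the paper's interval construction. The synthetic data are drawn so that the true positions and the samples are exchangeable, which is the calibrated case.

```python
import numpy as np

def empirical_coverage(samples, truth, level=0.9):
    """Fraction of true coordinates inside central percentile intervals.

    samples: (B, n, d) aligned embedding samples (bootstrap or posterior)
    truth:   (n, d) ground-truth positions in the same coordinate frame
    """
    lo = np.percentile(samples, 100 * (1 - level) / 2, axis=0)
    hi = np.percentile(samples, 100 * (1 + level) / 2, axis=0)
    return ((truth >= lo) & (truth <= hi)).mean()

rng = np.random.default_rng(0)
center = rng.standard_normal((500, 2))
truth = center + 0.3 * rng.standard_normal((500, 2))

# Calibrated case: samples scatter around the center like the truth does.
samples = center + 0.3 * rng.standard_normal((200, 500, 2))
cov = empirical_coverage(samples, truth, level=0.9)

# Overconfident case: samples are too tightly concentrated.
narrow = center + 0.1 * rng.standard_normal((200, 500, 2))
cov_narrow = empirical_coverage(narrow, truth, level=0.9)

print(f"nominal 0.90, calibrated {cov:.3f}, overconfident {cov_narrow:.3f}")
```

A systematic gap between the nominal level and the empirical fraction, as in the overconfident case, is the kind of deviation that would count against the calibration claim.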

Figures

Figures reproduced from arXiv: 1906.11655 by Michael Lohaus, Philipp Hennig, Ulrike von Luxburg.

**Figure 2.** a) We embed noisy triplets that were created from a projection of MNIST on …

**Figure 3.** Triplet prediction with STE Bootstrap performed on …

**Figure 4.** (a) We compare the mean embeddings resulting …

**Figure 5.** We compare random selection of triplets with our active approaches on the satellite data set using the STE …

**Figure 6.** (a) The upper figure illustrates a Gaussian Pro…

**Figure 7.** a) We embed noisy triplets that were created from a projection of MNIST on …

**Figure 8.** This two-dimensional mixture of three Gaus…

**Figure 13.** Recall that we use n = 200 points and the noise level is σ = 0.1. We start with a random seed of 2,000 triplets and repeatedly add 1,000 triplets to the training set by determining the 1,000 most uncertain triplet comparisons. In the case of the information gain criterion, we add those 1,000 triplets that have the highest information gain. We measure the embeddings after each step by performing the t…

**Figure 9.** We selected 50 points from the mixture of Gaussians. The top row shows the mean and standard deviation of the embedding error measured by the Procrustes distance by the standard embedding algorithms. The second row shows the mean and standard deviation of the average uncertainty. a) We selected one percent of all triplets. Increasing the noise induces a higher embedding error. Simultaneously, the uncertain…

**Figure 10.** Triplet prediction with Bayesian STE performed on …

**Figure 11.** Breast Cancer. The original dimension is …

**Figure 12.** MNIST. The original dimension is 784, the embedding dimension is d = 5. Panels: i) Triplet Prediction (axis: triplet prediction error); ii) Classification (axis: classification error). Curves: STE, STE - random, STE - IG, STE Bootstrap, Bayesian STE; tSTE, tSTE - random, tSTE - IG, tSTE Bootstrap, Bayesian tSTE. …

**Figure 13.** Landsat Satellite. The original dimension is …

Original abstract

To investigate objects without a describable notion of distance, one can gather ordinal information by asking triplet comparisons of the form "Is object $x$ closer to $y$ or is $x$ closer to $z$?" In order to learn from such data, the objects are typically embedded in a Euclidean space while satisfying as many triplet comparisons as possible. In this paper, we introduce empirical uncertainty estimates for standard embedding algorithms when few noisy triplets are available, using a bootstrap and a Bayesian approach. In particular, simulations show that these estimates are well calibrated and can serve to select embedding parameters or to quantify uncertainty in scientific applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Bootstrap and Bayesian uncertainty estimates for ordinal embeddings look reasonably calibrated in the simulations shown.

The core contribution is a practical way to attach uncertainty to embeddings learned from noisy triplet comparisons, using bootstrap resampling and a Bayesian formulation. Simulations under known ground-truth embeddings show that the resulting intervals achieve reasonable coverage, which directly addresses the need to report reliability when these embeddings are used for parameter selection or scientific inference. That simulation design is a strength: it tests whether the uncertainty procedures capture the variability induced by the noisy comparisons rather than just fitting the observed triplets. The work is narrow but fills a clear gap for anyone already using standard ordinal embedding algorithms on comparative data. The main limitation is the exclusive reliance on synthetic data; no real-data examples appear to demonstrate that the calibration carries over when the generative assumptions do not hold exactly. The methods themselves are standard tools applied to this setting, so the novelty is in the combination and the calibration check rather than a new algorithm. This is the kind of incremental but usable result that belongs in a specialized venue. It is worth sending to referees because the validation is honest and the practical need is real, even if revisions will likely be needed to add real-data checks and clarify implementation details.

Referee Report

2 major / 2 minor

Summary. The paper introduces bootstrap and Bayesian empirical uncertainty estimates for standard ordinal embedding algorithms learned from noisy triplet comparisons. Simulations are used to show that the resulting uncertainty intervals are well calibrated, and the estimates are positioned as tools for selecting embedding parameters or quantifying uncertainty in scientific applications.

Significance. If the simulation-based calibration holds under the paper's data-generating process, the work would supply a practical method for assessing reliability in low-data ordinal embedding settings, which appear in applications such as perceptual modeling and scientific data analysis where direct distances are unavailable.

major comments (2)

[Simulation section (likely §4)] The central calibration claim rests on the simulation protocol; the data-generating process, noise model, and exact embedding algorithms used in the experiments must be specified with sufficient detail (including any hyper-parameters) to allow independent reproduction and sensitivity checks.
[Abstract and experimental evaluation] No real-data experiments are described; if the methods are intended for scientific applications, at least one case study with held-out validation or domain-specific ground truth would strengthen the claim that the estimates capture variability induced by noisy triplets.

minor comments (2)

[Methods] Notation for the embedding dimension, number of triplets, and noise level should be introduced consistently in the methods section before being used in the simulation results.
[Abstract] The abstract states that the estimates 'can serve to select embedding parameters'; the precise selection criterion (e.g., a coverage-based objective) should be stated explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation and the detailed comments, which help improve the clarity and reproducibility of the work. We respond to each major comment below.

Referee: [Simulation section (likely §4)] The central calibration claim rests on the simulation protocol; the data-generating process, noise model, and exact embedding algorithms used in the experiments must be specified with sufficient detail (including any hyper-parameters) to allow independent reproduction and sensitivity checks.

Authors: We agree that complete specification is essential for reproducibility. The manuscript provides the core parameters of the simulation protocol, but we will expand Section 4 in the revised version to include exhaustive details on the data-generating process, the precise noise model, the embedding algorithms (including any specific solvers or libraries), and all hyper-parameters. This will enable independent reproduction and facilitate sensitivity analyses as suggested.

Revision: yes

Referee: [Abstract and experimental evaluation] No real-data experiments are described; if the methods are intended for scientific applications, at least one case study with held-out validation or domain-specific ground truth would strengthen the claim that the estimates capture variability induced by noisy triplets.

Authors: The manuscript's core contribution is the introduction of bootstrap and Bayesian uncertainty estimates together with their calibration properties established via controlled simulations. We position the methods as potentially useful for scientific applications but do not present real-data validation, as obtaining suitable ground truth for ordinal embeddings is challenging and outside the scope of this work. We will revise the abstract, introduction, and discussion to more explicitly state that validation is simulation-based and that real-data case studies with held-out validation constitute valuable future work.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces bootstrap and Bayesian uncertainty estimates for ordinal embeddings from triplet comparisons and validates calibration via simulations on synthetic data generated from known ground-truth embeddings. No load-bearing derivation, equation, or prediction reduces to a fitted input or self-citation by construction; the simulation-based check is an external benchmark that directly tests coverage of variability induced by noisy triplets. The approach is self-contained against external benchmarks with no self-definitional, fitted-input, or uniqueness-imported steps evident from the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no equations, parameters, or modeling assumptions; ledger is therefore empty.

