Local-Order Auxiliary Losses Can Improve Autoencoder Reconstruction

Ganesh Gopalakrishnan; Harvey Dam; Martin Burtscher; Tripti Agarwal

arxiv: 2504.04202 · v4 · submitted 2025-04-05 · 💻 cs.LG


Pith reviewed 2026-05-22 20:30 UTC · model grok-4.3

classification 💻 cs.LG

keywords autoencoders · auxiliary losses · finite differences · reconstruction error · local order · mean squared error · structural supervision · tensor reconstruction


The pith

Moderate mixtures of mean-squared error and finite-difference sign error reduce validation MSE by factors of 2.3 to 7.0 relative to pure MSE training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether adding a local-order auxiliary loss to the standard mean-squared error objective can improve pointwise accuracy in autoencoder reconstructions instead of trading off against it. It introduces finite-difference sign error, which penalizes mismatches in the signs of differences between neighboring points. Experiments across four tensor tasks show that blending this loss with MSE at moderate ratios produces lower validation MSE than MSE alone. The benefit is largest when the data exhibits coherent spatial structure that makes local order informative. Pure use of the auxiliary loss performs worse than the mixtures.

Core claim

Finite-difference sign error is an auxiliary objective, made differentiable through smooth sign surrogates, that penalizes disagreements between the signs of neighboring finite differences in the target and reconstruction. When combined with mean-squared error at moderate mixing coefficients, it yields autoencoders whose validation mean-squared error is 2.3 to 7.0 times lower than that of models trained on mean-squared error alone. In comparisons with other auxiliary objectives, finite-difference sign error ranks among the strongest structural losses tested, though the gains appear mainly for coherent spatial fields where local order carries information about the underlying signal.

What carries the argument

Finite-difference sign error (FDSE), an auxiliary loss that compares the signs of finite differences between adjacent elements in the target and reconstruction tensors.
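
A minimal one-dimensional sketch may help make the idea concrete. It assumes a tanh(s·x) surrogate for the sign function (the family shown in Figure 1) and a squared-gap penalty averaged over neighboring pairs. The function names, the default sharpness, and the squared-gap form are illustrative assumptions, not the paper's exact definition.

```python
import math


def finite_differences(values):
    """Forward differences values[i+1] - values[i] of a 1-D sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def fdse_1d(target, recon, sharpness=10.0):
    """Mean squared gap between smoothed signs of neighboring differences.

    tanh(sharpness * d) is a smooth stand-in for sign(d). The squared-gap
    form and the default sharpness are assumptions of this sketch.
    """
    target_diffs = finite_differences(target)
    recon_diffs = finite_differences(recon)
    gaps = [
        (math.tanh(sharpness * a) - math.tanh(sharpness * b)) ** 2
        for a, b in zip(target_diffs, recon_diffs)
    ]
    return sum(gaps) / len(gaps)


target = [0.0, 1.0, 0.5, 2.0]
same_order = [0.1, 0.9, 0.4, 1.8]  # up, down, up: local order preserved
flipped = [0.0, 1.0, 1.5, 2.0]     # up, up, up: the middle step is reversed

print(fdse_1d(target, same_order))  # close to zero
print(fdse_1d(target, flipped))     # much larger
```

The penalty depends only on whether each neighboring step goes up or down, so a reconstruction can be numerically off yet score near zero as long as its local order matches the target.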

If this is right

Moderate FDSE-MSE mixtures outperform pure MSE on validation error for the tested spatial tensor tasks.
FDSE ranks among the strongest structural auxiliary objectives in direct comparisons.
Gains are largest when the underlying data consists of coherent spatial fields.
Pure FDSE training yields worse results than the mixtures.
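
The mixtures referred to above can be sketched as a weighted blend of the two losses. The convex form (1 − α)·MSE + α·FDSE and the name `alpha` are assumptions made for illustration; the paper's exact weighting scheme may differ.

```python
import math


def mse(target, recon):
    """Plain mean-squared error between two equal-length sequences."""
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)


def fdse_1d(target, recon, sharpness=10.0):
    """Assumed FDSE form: squared gap between tanh-smoothed difference signs."""
    pairs = zip(zip(target, target[1:]), zip(recon, recon[1:]))
    gaps = [
        (math.tanh(sharpness * (t1 - t0)) - math.tanh(sharpness * (r1 - r0))) ** 2
        for (t0, t1), (r0, r1) in pairs
    ]
    return sum(gaps) / len(gaps)


def mixed_loss(target, recon, alpha):
    """Hypothetical convex blend: alpha=0 is pure MSE, alpha=1 is pure FDSE."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * mse(target, recon) + alpha * fdse_1d(target, recon)


target = [0.0, 1.0, 0.5, 2.0]
recon = [0.0, 1.0, 1.5, 2.0]
for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, mixed_loss(target, recon, alpha))
```

The comparison the paper reports is on validation MSE after training with each coefficient, not on the value of the blended loss itself, so a sweep would train one model per `alpha` and evaluate each with `mse` alone.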

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-order signal could be tested on other reconstruction architectures or on data without obvious spatial coherence.
Varying the order of the finite differences or applying the loss at multiple scales might further change the observed error reductions.
The additional gradient information may be steering optimization toward latent representations that better match the data distribution.

Load-bearing premise

The observed reductions in validation mean-squared error are caused by the local-order signal from the auxiliary loss rather than by task choice, the particular smooth sign surrogate, or optimization dynamics that favor the tested coefficients.

What would settle it

Re-running the coefficient sweeps on the same tasks but with a random auxiliary loss of matching form and magnitude, then finding no systematic validation MSE improvement, would indicate the local-order signal is not responsible.
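
One hypothetical way to build such a control is to keep the sign-surrogate penalty unchanged but apply it to randomly chosen non-adjacent index pairs, so the loss has the same form without encoding local order. The sketch below assumes a tanh surrogate and a squared-gap penalty; `shuffled_pairs` and the other names are invented for illustration and do not come from the paper.

```python
import math
import random


def sign_gap(target, recon, pairs, sharpness=10.0):
    """Mean squared gap between smoothed signs of differences over index pairs."""
    gaps = [
        (math.tanh(sharpness * (target[j] - target[i]))
         - math.tanh(sharpness * (recon[j] - recon[i]))) ** 2
        for i, j in pairs
    ]
    return sum(gaps) / len(gaps)


def local_pairs(n):
    """Adjacent index pairs: the local-order case."""
    return [(i, i + 1) for i in range(n - 1)]


def shuffled_pairs(n, seed=0):
    """Same number of pairs as local_pairs(n), but with non-adjacent partners."""
    if n < 3:
        raise ValueError("need at least 3 points to form a non-adjacent pair")
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n - 1:
        i, j = rng.randrange(n), rng.randrange(n)
        if abs(i - j) > 1:  # reject identical and adjacent indices
            pairs.append((i, j))
    return pairs


target = [0.0, 1.0, 0.5, 2.0, 1.5, 3.0, 2.5, 4.0]
recon = sorted(target)  # right overall trend, wrong local zigzag
print("local   :", sign_gap(target, recon, local_pairs(len(target))))
print("shuffled:", sign_gap(target, recon, shuffled_pairs(len(target))))
```

If mixtures built on the shuffled pairs improved validation MSE as much as the local version, that would point toward the surrogate's gradient properties or generic regularization rather than local order.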

Figures

Figures reproduced from arXiv: 2504.04202 by Ganesh Gopalakrishnan, Harvey Dam, Martin Burtscher, Tripti Agarwal.

**Figure 1.** tanh(sx) using different values of s and other sign-like functions. One way to capture the differences in critical points between two arrays of numbers ("tensors") is to take their finite differences in all directions. If tensor X ∈ R^n were one-dimensional, this would amount to computing X_{2:n} − X_{1:n−1}, and doing the same to the other tensor, say Y. The signs of these finite differences can be expressed …

**Figure 2.** Time and peak memory usage from 1-dimensional and 2-dimensional array inputs using …

**Figure 3.** Example originals and reconstructions of the generated sinusoidal wave dataset. The …

**Figure 4.** Original normalized stock prices vs reconstructions after training a simple autoencoder …

**Figure 5.** Topological dissimilarity metrics after training an MLP on NVDA stock prices. Squares …

**Figure 6.** Topological metrics on a convolutional VAE trained on COCO 2017. Note the split axis on …

**Figure 7.** Topological measures on the 3D autoencoder trained on shallow water simulation states. …

**Figure 8.** Time and peak memory usage from 3-dimensional array inputs using one AMD Ryzen …

**Figure 9.** Visual fidelity metrics on a convolutional VAE trained on COCO 2017. Higher is better.

**Figure 10.** An example image and its reconstructions from the validation dataset Eastman Kodak …

**Figure 11.** Original shallow water simulation snapshots and reconstructions. Column labels show …

Original abstract

Mean-squared error is the default objective for training autoencoders, yet compressed reconstructions often depend not only on pointwise accuracy but also on preserving local spatial order. We study whether structural auxiliary losses can improve, rather than trade off against, MSE in finite-capacity autoencoders. We introduce finite-difference sign error (FDSE), a local-order auxiliary objective that penalizes disagreements between the signs of neighboring finite differences in the target and reconstruction. FDSE is simple, architecture-agnostic, and differentiable through smooth sign surrogates. Across four tensor reconstruction tasks, we find that moderate mixtures of MSE and FDSE can substantially reduce validation MSE relative to pure MSE training. In coefficient sweeps, FDSE mixtures reduce validation MSE by 2.3×–7.0× over pure MSE on these tasks, while comparisons with other auxiliary objectives show FDSE to be among the strongest structural objectives tested. The effect is not universal: pure FDSE performs poorly, and gains are largest for coherent spatial fields where local order carries information about the underlying signal. These results suggest that, in compressed-latent reconstruction, appropriately weighted local-structure supervision can guide optimization toward solutions with better pointwise accuracy, rather than merely improving perceptual or structural metrics at MSE's expense.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FDSE mixtures cut validation MSE by 2.3× to 7.0× on the four tested tasks, but the paper does not isolate the local-order mechanism from the effects of the sign surrogate or of coefficient tuning.


The core observation is that on four tensor reconstruction tasks, moderate mixing of MSE with this finite-difference sign error term produces substantially lower validation MSE than pure MSE, with the largest gains on coherent spatial fields. The paper introduces FDSE as a simple, architecture-agnostic auxiliary that penalizes sign disagreements between neighboring finite differences and shows it outperforms several other structural auxiliaries in their sweeps. That specific formulation and the empirical demonstration that the mixture can improve rather than trade off against pointwise error is the concrete addition here. They also note the effect is not universal and that pure FDSE performs poorly, which keeps the claim proportionate. The work is straightforward to understand and the comparisons to other auxiliaries are useful for context.

The main limitations are the lack of error bars, statistical tests, or baseline implementation details, plus the fact that the mixing coefficient is chosen on the validation set. More critically, there are no controls that would separate the intended local-order signal from the gradient properties of the smooth sign surrogate or from optimization dynamics that happen to favor the tested weights. Without those, the attribution to local structure remains plausible but unconfirmed.

This is the kind of incremental technique paper that could be worth trying in a lab working on autoencoders for spatial data. A reader already experimenting with auxiliary losses on structured tensors would get the most out of it. The empirical claim is specific enough and the method simple enough that it deserves a serious referee who can ask for the missing controls and stats.

Referee Report

3 major / 2 minor

Summary. The paper claims that a finite-difference sign error (FDSE) auxiliary loss, when moderately mixed with MSE, can substantially improve validation MSE (by factors of 2.3×–7.0×) over pure MSE training in finite-capacity autoencoders for four tensor reconstruction tasks. FDSE penalizes sign disagreements in neighboring finite differences between target and reconstruction, using smooth sign surrogates for differentiability. The method is architecture-agnostic. The effect is strongest on coherent spatial fields and is not universal (pure FDSE performs poorly), and FDSE ranks among the strongest of the structural auxiliaries tested.

Significance. If the central empirical result holds and the mechanism is isolated, the finding would be significant for autoencoder training on spatial data: it indicates that appropriately weighted local-structure supervision can improve pointwise accuracy rather than trading off against it. The work provides concrete coefficient-sweep evidence across multiple tasks and comparisons to other auxiliaries, which is a strength for an empirical study.

major comments (3)

[Experiments / Abstract] Experiments (coefficient sweeps and task results): the reported 2.3×–7.0× validation MSE reductions lack error bars, multiple random seeds, or statistical tests, and the abstract notes that gains are not universal. This weakens confidence that the improvements are reliably attributable to FDSE rather than task-specific optimization dynamics.
[Method / Experiments] Method and Experiments: no control auxiliaries (e.g., sign surrogate applied to shuffled/non-local differences, or a matched-magnitude non-structural regularizer) are described to isolate whether MSE gains arise specifically from the local-order sign-disagreement term versus the smooth sign surrogate's gradient properties or general regularization effects. This is load-bearing for the claim that local-order supervision guides optimization toward better pointwise solutions.
[Experiments] Experiments: the mixing coefficient is selected via validation sweeps and performance is also measured on held-out validation sets; while not circular by construction, the absence of a separate test set or cross-validation protocol for the final reported numbers limits the strength of the generalization claim.
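
The multi-seed reporting requested in the first comment could be summarized along the following lines. This is a sketch using only the Python standard library; the per-seed numbers are invented placeholders, not results from the paper, and Welch's t statistic is one reasonable choice of test rather than a method the paper uses.

```python
import math
import statistics


def mean_and_sem(values):
    """Sample mean and standard error of the mean (needs at least 2 values)."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)


# Placeholder per-seed validation MSEs, invented for illustration only.
pure_mse_runs = [0.040, 0.042, 0.038, 0.041, 0.039]
mixture_runs = [0.012, 0.011, 0.013, 0.012, 0.012]

for name, runs in (("pure MSE", pure_mse_runs), ("mixture", mixture_runs)):
    mean, sem = mean_and_sem(runs)
    print(f"{name}: {mean:.4f} +/- {sem:.4f} (n={len(runs)})")
print("Welch t:", welch_t(pure_mse_runs, mixture_runs))
```

Reporting the mean with its standard error per coefficient makes it visible whether a claimed ratio between two configurations exceeds seed-to-seed variation.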

minor comments (2)

[Method] Notation: clarify whether the smooth sign surrogate is fixed across all experiments or tuned, and provide its explicit functional form and derivative in the main text or appendix.
[Experiments] Table/figure presentation: ensure all coefficient-sweep plots include the pure-MSE baseline as a horizontal reference line for direct visual comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help us improve the clarity and rigor of our empirical claims. We address each major comment below, proposing specific revisions to the manuscript.


Referee: [Experiments / Abstract] Experiments (coefficient sweeps and task results): the reported 2.3×–7.0× validation MSE reductions lack error bars, multiple random seeds, or statistical tests, and the abstract notes that gains are not universal. This weakens confidence that the improvements are reliably attributable to FDSE rather than task-specific optimization dynamics.

Authors: We agree with this assessment and will strengthen the experimental reporting. In the revised version, we will rerun the coefficient sweeps and main experiments using at least 5 random seeds per configuration, reporting mean validation MSE along with standard error bars. We will also include pairwise statistical comparisons (e.g., t-tests) between FDSE mixtures and pure MSE where the differences are large. The abstract already qualifies that gains are not universal, which we will retain.
Revision: yes
Referee: [Method / Experiments] Method and Experiments: no control auxiliaries (e.g., sign surrogate applied to shuffled/non-local differences, or a matched-magnitude non-structural regularizer) are described to isolate whether MSE gains arise specifically from the local-order sign-disagreement term versus the smooth sign surrogate's gradient properties or general regularization effects. This is load-bearing for the claim that local-order supervision guides optimization toward better pointwise solutions.

Authors: This is a valid concern for isolating the mechanism. We will add two control experiments: (1) applying the smooth sign surrogate to shuffled (non-local) finite differences, and (2) a non-structural regularizer with matched magnitude but no local-order penalty. These controls will be presented alongside the main results to test whether the local-order term is responsible for the observed MSE improvements.
Revision: yes
Referee: [Experiments] Experiments: the mixing coefficient is selected via validation sweeps and performance is also measured on held-out validation sets; while not circular by construction, the absence of a separate test set or cross-validation protocol for the final reported numbers limits the strength of the generalization claim.

Authors: We recognize that a dedicated test set would bolster generalization claims. For the revision, we will split the data into train/validation/test sets for each task, using the test set exclusively for final reported metrics after selecting coefficients on validation. Alternatively, we can report results averaged over multiple train/validation splits if a fixed test set is not feasible for all tasks.
Revision: partial
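
The protocol in the last response, selecting the coefficient on validation data and reporting on a separate test set, can be sketched as follows. The split fractions, function names, and result values are hypothetical placeholders chosen for illustration.

```python
import random


def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then carve out test, validation, train."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def select_then_report(results):
    """Pick the coefficient with the lowest validation MSE; report its test MSE.

    `results` maps coefficient -> {"val": validation_mse, "test": test_mse}.
    """
    best = min(results, key=lambda coeff: results[coeff]["val"])
    return best, results[best]["test"]


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))

# Placeholder sweep results, invented for illustration only.
sweep = {
    0.0: {"val": 0.040, "test": 0.041},
    0.3: {"val": 0.012, "test": 0.014},
    1.0: {"val": 0.090, "test": 0.095},
}
print(select_then_report(sweep))
```

Because the test score plays no part in choosing the coefficient, the reported number is not inflated by the selection step, which is the concern raised in the third major comment.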

Circularity Check

0 steps flagged

No circularity; empirical hyperparameter study


The paper is a purely empirical study introducing FDSE as an auxiliary loss and reporting validation MSE improvements from coefficient sweeps mixing it with MSE. No derivation chain, equations, or self-citations are present that reduce any claimed result to its inputs by construction. The mixing coefficient is a standard hyperparameter selected on validation data, with reported MSE values measured on held-out validation sets; this does not force the outcome by definition, as the comparison baseline (pure MSE) is included in the same sweep and the gains are observed experimental results rather than tautological. The study is self-contained against external benchmarks via direct performance measurements.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work is empirical; the only explicit free parameter is the mixing weight between MSE and FDSE. The domain assumption that finite-difference signs encode useful local order is invoked to motivate the loss but is not derived.

free parameters (1)

mixing coefficient
Weight applied to the FDSE term; chosen via sweeps to obtain the reported 2.3×–7.0× MSE reductions.

axioms (1)

Domain assumption: the sign of finite differences between neighbors captures local order that is informative about the underlying signal.
Used to justify why penalizing sign disagreements should improve reconstruction quality.



Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1]

Peter Lindstrom

doi: 10.1109/IPDPS.2016.11. Peter Lindstrom. Fixed-rate compressed floating-point arrays. IEEE Transactions on Visualization and Computer Graphics, 20(12):2674–2683, 2014. doi: 10.1109/TVCG.2014.2346458. Sriram Lakshminarasimhan, Neil Shah, Stephane Ethier, Seung-Hoe Ku, C. S. Chang, Scott Klasky, Rob Latham, Rob Ross, and Nagiza F. Samatova. Isabela for ...

work page doi:10.1109/ipdps.2016.11 2016
[2]

Michael Moor, Max Horn, Bastian Rieck, and Karsten Borgwardt

doi: 10.1109/TVCG.2023.3326920. Michael Moor, Max Horn, Bastian Rieck, and Karsten Borgwardt. Topological autoencoders, 2021. URL https://arxiv.org/abs/1906.00722. Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, and Serguei Barannikov. Learning topology-preserving data representations, 2023. URL https://arxiv. org/a...

work page doi:10.1109/tvcg.2023.3326920 2023
[3]

Flatten the tensor into a list of scalar values and their corresponding grid positions

work page
[4]

Sort the values in ascending order: f1 ≤ f2 ≤ · · · ≤ fmwhere m is the total number of grid points

work page
[5]

Initialize each grid point as its own component

Use a union-find data structure to keep track of connected components. Initialize each grid point as its own component

work page
[6]

The definition of neighbors for each grid point is its orthogonally and diagonally adjacent elements

For each point xi in order of increasing f(xi): (a) If xi is a local minimum (i.e., all its neighbors in the grid have higher values), a new connected component is born. The definition of neighbors for each grid point is its orthogonally and diagonally adjacent elements. For a d-dimensional grid, each point has 3d − 1 neighbors. • In 1D, each element has ...

work page
[7]

Flatten and sort the values: [1, 1, 2, 2, 3, 3, 4, 5, 6], keeping track of their locations

work page
[8]

(a) At t = 1: Two components are born (local minima at positions (1, 1) and (2, 2))

Process the points. (a) At t = 1: Two components are born (local minima at positions (1, 1) and (2, 2)). (b) At t = 2: The components at (1, 3) and (3, 1) merge with existing components. (c) At t = 3: The components at (1, 2) and (3, 3) merge with existing components. (d) At t = 4: The component at (2, 1) merges with the component at (1, 1). (e) At t = 5:...

work page
[9]

Let D′ 1 and D′ 2 be the augmented diagrams

Add projections of all points in D1 and D2 onto the diagonal b = d to ensure the two diagrams have the same number of points. Let D′ 1 and D′ 2 be the augmented diagrams

work page
[10]

Compute the pairwise Euclidean distances between all points in D′ 1 and D′

work page
[11]

This results in a cost matrix C, where Cij = ∥xi − yj∥ for xi ∈ D′ 1 and yj ∈ D′ 2

work page
[12]

Find the optimal matching γ that minimizes the total costP (x,y)∈γ ∥x − y∥

work page
[13]

Sum the costs of the optimal matching and take the p-th root. Time and memory complexity If n is the number of pairs, the time complexity of computing the Wasserstein distance is dominated by solving the assignment problem: The cost matrix construction can be parallelized to O(n) with infinite threads. However, the Hungarian algorithm is inherently sequen...

work page arXiv 1999
[14]

Guidelines: • The answer NA means that the abstract and introduction do not include the claims made in the paper

Claims Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? Answer: [Yes] Justification: The abstract summarizes the paper. Guidelines: • The answer NA means that the abstract and introduction do not include the claims made in the paper. • The abstract and/or introduction should clearly...

work page
[15]

Limitations

Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Limitations are discussed in the conclusion. Guidelines: • The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. • The authors are...

work page
[16]

Guidelines: • The answer NA means that the paper does not include theoretical results

Theory assumptions and proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] 19 Justification: Theoretical claims are supported by experiments. Guidelines: • The answer NA means that the paper does not include theoretical results. • All the theorems, formulas, and...

work page
[17]

Guidelines: • The answer NA means that the paper does not include experiments

Experimental result reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main ex- perimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: The main features are desc...

work page
[18]

Guidelines: • The answer NA means that paper does not include experiments requiring code

Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instruc- tions to faithfully reproduce the main experimental results, as described in supplemental material? 20 Answer: [Yes] Justification: Code is in supplementary material and will later be in a public repository. Guidelines: • The answer NA ...

work page
[19]

Code is provided

Experimental setting/details Question: Does the paper specify all the training and test details (e.g., data splits, hyper- parameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [No] Justification: All details will not fit. Code is provided. Guidelines: • The answer NA means that the paper does not include ...

work page
[20]

Guidelines: • The answer NA means that the paper does not include experiments

Experiment statistical significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: Error bars are shown where appropriate. Guidelines: • The answer NA means that the paper does not include experiments. • The autho...

work page
[21]

Guidelines: • The answer NA means that the paper does not include experiments

Experiments compute resources Question: For each experiment, does the paper provide sufficient information on the com- puter resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: Experiments dependent on hardware include hardware descriptions. Guidelines: • The answer NA means that...

work page
[22]

Guidelines: • The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics

Code of ethics Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines? Answer: [Yes] Justification: The work and anticipated effects conform to the Code of Ethics. Guidelines: • The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics...

work page
[23]

Guidelines: • The answer NA means that there is no societal impact of the work performed

Broader impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [NA] Justification: We do not anticipate any notable direct societal impact. Guidelines: • The answer NA means that there is no societal impact of the work performed. • If the authors answer NA or No, they ...

work page
[24]

Guidelines: • The answer NA means that the paper poses no such risks

Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: We see no such threats. Guidelines: • The answer NA means that the paper poses no such risks. •...

work page
[25]

Guidelines: • The answer NA means that the paper does not use existing assets

Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [Yes] Justification: Original authors of assets are credited. Guidelines: • The answer NA means that the paper does n...

work page
[26]

Guidelines: • The answer NA means that the paper does not release new assets

New assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? 23 Answer:[Yes] Justification: New assets are described in the paper and included in supplementary material. Guidelines: • The answer NA means that the paper does not release new assets. • Researchers should communicate the d...

work page
[27]

Guidelines: • The answer NA means that the paper does not involve crowdsourcing nor research with human subjects

Crowdsourcing and research with human subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: The paper does not use crowdsourcing or experiment wi...

work page
[28]

Guidelines: • The answer NA means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

work page
[29]

Answer: [NA]

Declaration of LLM usage Question: Does the paper describe the usage of LLMs if it is an important, original, or non-standard component of the core methods in this research? Note that if the LLM is used only for writing, editing, or formatting purposes and does not impact the core methodology, scientific rigorousness, or originality of the research, decla...

work page 2025

[1] [1]

Peter Lindstrom

doi: 10.1109/IPDPS.2016.11. Peter Lindstrom. Fixed-rate compressed floating-point arrays. IEEE Transactions on Visualization and Computer Graphics, 20(12):2674–2683, 2014. doi: 10.1109/TVCG.2014.2346458. Sriram Lakshminarasimhan, Neil Shah, Stephane Ethier, Seung-Hoe Ku, C. S. Chang, Scott Klasky, Rob Latham, Rob Ross, and Nagiza F. Samatova. Isabela for ...

work page doi:10.1109/ipdps.2016.11 2016

[2] [2]

Michael Moor, Max Horn, Bastian Rieck, and Karsten Borgwardt

doi: 10.1109/TVCG.2023.3326920. Michael Moor, Max Horn, Bastian Rieck, and Karsten Borgwardt. Topological autoencoders, 2021. URL https://arxiv.org/abs/1906.00722. Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, and Serguei Barannikov. Learning topology-preserving data representations, 2023. URL https://arxiv. org/a...

work page doi:10.1109/tvcg.2023.3326920 2023

[3] [3]

Flatten the tensor into a list of scalar values and their corresponding grid positions

work page

[4] [4]

Sort the values in ascending order: f1 ≤ f2 ≤ · · · ≤ fmwhere m is the total number of grid points

work page

[5] [5]

Initialize each grid point as its own component

Use a union-find data structure to keep track of connected components. Initialize each grid point as its own component

work page

[6] [6]

The definition of neighbors for each grid point is its orthogonally and diagonally adjacent elements

For each point xi in order of increasing f(xi): (a) If xi is a local minimum (i.e., all its neighbors in the grid have higher values), a new connected component is born. The definition of neighbors for each grid point is its orthogonally and diagonally adjacent elements. For a d-dimensional grid, each point has 3d − 1 neighbors. • In 1D, each element has ...

work page

[7] [7]

Flatten and sort the values: [1, 1, 2, 2, 3, 3, 4, 5, 6], keeping track of their locations

work page

[8] [8]

(a) At t = 1: Two components are born (local minima at positions (1, 1) and (2, 2))

Process the points. (a) At t = 1: Two components are born (local minima at positions (1, 1) and (2, 2)). (b) At t = 2: The components at (1, 3) and (3, 1) merge with existing components. (c) At t = 3: The components at (1, 2) and (3, 3) merge with existing components. (d) At t = 4: The component at (2, 1) merges with the component at (1, 1). (e) At t = 5:...

work page

[9] [9]

Let D′ 1 and D′ 2 be the augmented diagrams

Add projections of all points in D1 and D2 onto the diagonal b = d to ensure the two diagrams have the same number of points. Let D′ 1 and D′ 2 be the augmented diagrams

work page

[10] [10]

Compute the pairwise Euclidean distances between all points in D′ 1 and D′

work page

[11] [11]

This results in a cost matrix C, where Cij = ∥xi − yj∥ for xi ∈ D′ 1 and yj ∈ D′ 2

work page

[12] [12]

Find the optimal matching γ that minimizes the total costP (x,y)∈γ ∥x − y∥

work page

[13] [13]

Sum the costs of the optimal matching and take the p-th root. Time and memory complexity If n is the number of pairs, the time complexity of computing the Wasserstein distance is dominated by solving the assignment problem: The cost matrix construction can be parallelized to O(n) with infinite threads. However, the Hungarian algorithm is inherently sequen...

work page arXiv 1999

[14] [14]

Guidelines: • The answer NA means that the abstract and introduction do not include the claims made in the paper

Claims Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? Answer: [Yes] Justification: The abstract summarizes the paper. Guidelines: • The answer NA means that the abstract and introduction do not include the claims made in the paper. • The abstract and/or introduction should clearly...

work page

[15] [15]

Limitations

Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Limitations are discussed in the conclusion. Guidelines: • The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. • The authors are...

work page

[16] [16]

Guidelines: • The answer NA means that the paper does not include theoretical results

Theory assumptions and proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] 19 Justification: Theoretical claims are supported by experiments. Guidelines: • The answer NA means that the paper does not include theoretical results. • All the theorems, formulas, and...


Experimental result reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: The main features are desc...


Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [Yes]
Justification: Code is in supplementary material and will later be in a public repository.
Guidelines:
• The answer NA ...


Experimental setting/details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [No]
Justification: All details will not fit. Code is provided.
Guidelines:
• The answer NA means that the paper does not include ...


Experiment statistical significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: Error bars are shown where appropriate.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The autho...


Experiments compute resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [Yes]
Justification: Experiments dependent on hardware include hardware descriptions.
Guidelines:
• The answer NA means that...


Code of ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: The work and anticipated effects conform to the Code of Ethics.
Guidelines:
• The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics...


Broader impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [NA]
Justification: We do not anticipate any notable direct societal impact.
Guidelines:
• The answer NA means that there is no societal impact of the work performed.
• If the authors answer NA or No, they ...


Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: We see no such threats.
Guidelines:
• The answer NA means that the paper poses no such risks.
•...


Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: Original authors of assets are credited.
Guidelines:
• The answer NA means that the paper does n...


New assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [Yes]
Justification: New assets are described in the paper and included in supplementary material.
Guidelines:
• The answer NA means that the paper does not release new assets.
• Researchers should communicate the d...


Crowdsourcing and research with human subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: The paper does not use crowdsourcing or experiment wi...


Institutional review board (IRB) approvals or equivalent for research with human subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...


Declaration of LLM usage
Question: Does the paper describe the usage of LLMs if it is an important, original, or non-standard component of the core methods in this research? Note that if the LLM is used only for writing, editing, or formatting purposes and does not impact the core methodology, scientific rigorousness, or originality of the research, decla...
Answer: [NA]
