pith.

arXiv: 1907.10178 · v1 · pith:SDIQUJL5new · submitted 2019-07-23 · 💻 cs.LG · cs.CV · stat.ML

Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction

Pith reviewed 2026-05-24 17:12 UTC · model grok-4.3

classification: 💻 cs.LG · cs.CV · stat.ML
keywords: MoN loss, variety loss, trajectory prediction, probabilistic forecasting, generative models, density estimation, autonomous driving, minimum over N

The pith

Training with the MoN loss yields a model density approximately proportional to the square root of the true probability density, rather than the density itself.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the variety loss, or Minimum over N (MoN) loss, commonly used to train generative models that predict the trajectories of traffic agents. It supplies a mathematical proof that minimizing the distance from the ground truth to the nearest of N model outputs does not recover the target probability density. In the large-N limit the trained distribution instead converges to a density proportional to the square root of the ground-truth density. Experiments on both synthetic and real-world trajectory data confirm the mismatch, and the authors propose several ways to compensate for the resulting dilation, which improve the log-likelihood of ground-truth samples.

Core claim

The MoN loss does not lead to the ground-truth probability density function, but approximately to its square root instead. The derivation models the selection of the single closest sample among N draws from the generative distribution and shows that, in the continuous (large-N) limit, the resulting loss functional is minimized by a density proportional to the square root of the target density.
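A heuristic sketch of the large-N argument in one dimension, assuming absolute-error distance, a smooth model density P that is positive wherever the target P_T is, and N large enough that the nearest sample lies close to the ground-truth point x^*. It is not the paper's binned proof from the supplementary material, only a compact route to the same limit; D_N (the distance to the nearest sample) and the multiplier λ are notation introduced here.

```latex
\begin{align*}
D_N &= \min_{i} |x^* - x_i|, \qquad x_1, \dots, x_N \sim P \ \text{i.i.d.} \\
\Pr(D_N > r \mid x^*) &= \Big(1 - \int_{x^*-r}^{x^*+r} P(x)\,dx\Big)^{N} \approx \exp\!\big(-2N\,P(x^*)\,r\big) \\
\mathbb{E}[D_N \mid x^*] &= \int_0^\infty \Pr(D_N > r \mid x^*)\,dr \approx \frac{1}{2N\,P(x^*)} \\
L_N(P_T, P) &= \int P_T(x^*)\,\mathbb{E}[D_N \mid x^*]\,dx^* \approx \frac{1}{2N}\int \frac{P_T(x^*)}{P(x^*)}\,dx^* \\
0 &= \frac{\delta}{\delta P(x)}\Big[\int \frac{P_T}{P}\,dx^* + \lambda\Big(\int P\,dx^* - 1\Big)\Big] = -\frac{P_T(x)}{P(x)^2} + \lambda
\;\Longrightarrow\; P(x) = \frac{\sqrt{P_T(x)}}{\int \sqrt{P_T(u)}\,du}
\end{align*}
```

Because the functional ∫ P_T/P is convex in P, this stationary point is the constrained minimum. The first approximation degrades where N·P(x^*) is not large, that is, in the tails, which is why finite-N minimizers only approach the square root.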

What carries the argument

The MoN loss, which keeps only the smallest distance among N independently drawn predictions, and whose minimizer in the large-N limit is shown to be proportional to the square root of the target density.
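A minimal NumPy sketch of the MoN (variety) loss for a single prediction problem. The array shapes and the use of mean Euclidean displacement as the distance are assumptions for illustration; published implementations differ in the distance they use and in how they batch. The function name `mon_loss` is hypothetical.

```python
import numpy as np

def mon_loss(predictions: np.ndarray, ground_truth: np.ndarray) -> float:
    """Minimum-over-N (variety) loss for one ground-truth trajectory.

    predictions:  shape (N, T, 2), N sampled futures of T 2-D positions.
    ground_truth: shape (T, 2).
    Distance (assumption): mean Euclidean displacement over the T steps.
    """
    # Euclidean distance at every time step for every prediction: (N, T)
    step_dist = np.linalg.norm(predictions - ground_truth[None], axis=-1)
    # Average displacement of each prediction: (N,)
    per_prediction = step_dist.mean(axis=1)
    # Only the closest prediction contributes to the loss
    return float(per_prediction.min())

# Usage: a straight ground-truth path and two laterally offset predictions
gt = np.stack([np.arange(5.0), np.zeros(5)], axis=1)   # (5, 2)
preds = np.stack([gt + [0.0, 1.0], gt + [0.0, 3.0]])   # (2, 5, 2)
print(mon_loss(preds, gt))  # 1.0
```

Written in an autodiff framework, the same minimum sends gradient only through the prediction that attains it, so each ground-truth sample pulls on a single output; this winner-takes-all update is the mechanism usually credited with spreading the N outputs across modes.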

If this is right

  • Models trained with the MoN objective produce probability densities that are dilated relative to the true distribution.
  • The log-likelihood of ground-truth samples under the learned model is lower than it would be under a correctly calibrated density.
  • Compensating for the square-root effect raises the log-likelihood of observed trajectories.
  • The correction can be applied post-training or incorporated into the loss to restore proper density estimation.
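A small numerical illustration of the first three points, assuming a one-dimensional standard normal target and a model that has reached the large-N optimum exactly. Squaring and renormalizing is a hypothetical compensation chosen only for illustration; the paper's own procedure, which involves a fitted exponent k̄ and a KDE reconstruction, may differ. The helper names are hypothetical.

```python
import numpy as np

# Dense grid on which densities are represented (assumption: [-12, 12] holds all mass)
x = np.linspace(-12.0, 12.0, 48_001)
dx = x[1] - x[0]

def normalize(f):
    """Rescale a non-negative function on the grid so it integrates to 1."""
    return f / (f.sum() * dx)

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

p_true = gaussian(x, 1.0)            # ground-truth density P_T = N(0, 1)
q_mon = normalize(np.sqrt(p_true))   # large-N MoN optimum, proportional to sqrt(P_T)
q_fixed = normalize(q_mon ** 2)      # hypothetical compensation: square and renormalize

def variance(f):
    return (x**2 * f).sum() * dx

def expected_log_lik(f):
    """E_{y ~ P_T}[log f(y)], approximated by a Riemann sum on the grid."""
    return (p_true * np.log(f)).sum() * dx

print(variance(q_mon))                                        # ≈ 2.0: dilated density
print(expected_log_lik(p_true) - expected_log_lik(q_mon))     # ≈ 0.0966 nats lost
print(expected_log_lik(p_true) - expected_log_lik(q_fixed))   # ≈ 0 after compensation
```

For a Gaussian target the normalized square root of N(0, σ²) is N(0, 2σ²), so the log-likelihood gap equals KL(N(0, 1) ‖ N(0, 2)) = ln√2 − 1/4 ≈ 0.0966 nats.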

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bias may appear in any generative setting that uses a minimum-over-N objective, including image or motion synthesis.
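  • Uncertainty estimates produced by such models will systematically overstate tail probabilities, since a density proportional to the square root of the target is more spread out than the target itself.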
  • Exact finite-N corrections could be derived by replacing the limiting square-root functional with the precise order-statistic expectation.
  • Evaluating calibration on held-out trajectory data after correction would test whether the adjustment improves downstream planning safety.
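To make the order-statistic expectation concrete, the sketch below computes, for a standard normal model density (an assumption for illustration), the exact expected distance from a ground-truth point y to the nearest of N samples and compares it with the large-N approximation 1/(2N q(y)) behind the square-root result. Function names and integration settings are hypothetical choices.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def std_normal_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x) / math.sqrt(2.0)))

def expected_min_distance(y, n, r_max=8.0, num=200_001):
    """Exact E[min_i |y - X_i|] for X_1..X_n i.i.d. N(0, 1), using
    E[D] = integral_0^inf P(D > r) dr and P(D > r) = (1 - (F(y+r) - F(y-r)))**n."""
    r = np.linspace(0.0, r_max, num)
    survival = (1.0 - (std_normal_cdf(y + r) - std_normal_cdf(y - r))) ** n
    h = r[1] - r[0]
    # Trapezoidal rule on the uniform grid
    return h * (survival.sum() - 0.5 * (survival[0] + survival[-1]))

def asymptotic_min_distance(y, n):
    """Large-N approximation 1 / (2 n q(y)), q = standard normal density."""
    q = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
    return 1.0 / (2.0 * n * q)

for y in (0.0, 1.5, 3.0):
    print(y, expected_min_distance(y, 256), asymptotic_min_distance(y, 256))
```

With N = 256 the two agree closely at y = 0, while at y = 3 the asymptotic formula overestimates the exact value; these tail regions, where N·q(y) is not large, are where an exact finite-N correction would differ from the square-root limit.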

Load-bearing premise

The proof assumes that, in the large-N or continuous limit, selecting the single closest sample among many draws yields a loss whose minimizer is proportional to the square root of the target density; at any finite N this holds only approximately.

What would settle it

Fit a model to samples from a known analytic density using the MoN loss, then compare the empirical histogram of model outputs against both the original density and its normalized square root; if the claim holds, the square-root version should match the histogram more closely.
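A lightweight numerical stand-in for this test, in the spirit of Figure 4 (MoN values for f_1^k with N = 256). It assumes f_1 = N(0, 1) as the ground truth and uses the renormalized family f_1^k, which for a Gaussian is N(0, 1/k), as candidate model densities. Instead of training a network, it estimates the MoN loss for each k by Monte Carlo and reports the minimizing k; under the square-root claim it should lie near k = 0.5, with some drift at finite N. Names, grid, and sample sizes are hypothetical.

```python
import numpy as np

def mon_loss_1d(y, predictions):
    """Mean over ground-truth points of the distance to the closest of N predictions.
    y: shape (M,); predictions: shape (M, N)."""
    return np.abs(predictions - y[:, None]).min(axis=1).mean()

def mon_loss_curve(ks, n=256, m=10_000, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(m)        # ground-truth samples from f_1 = N(0, 1)
    z = rng.standard_normal((m, n))   # shared noise: common random numbers across k
    # f_1**k renormalized is N(0, 1/k), so z / sqrt(k) samples from it
    return np.array([mon_loss_1d(y, z / np.sqrt(k)) for k in ks])

ks = np.array([0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0, 1.25])
losses = mon_loss_curve(ks)
k_best = ks[np.argmin(losses)]
print(k_best, losses.round(5))
```

Reusing the same noise for every k keeps the comparison between candidates from being dominated by Monte Carlo noise; increasing n should move the minimizing k toward 0.5.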

Figures

Figures reproduced from arXiv: 1907.10178 by Luca Anthony Thiede, Pratik Prabhanjan Brahma.

Figure 1: Trajectory prediction is a multimodal problem. In …
Figure 2: An illustration of the MoN loss in one dimension.
Figure 3: (a) Histogram of samples from the Normal …
Figure 4: MoN values for different k, using samples from f_1 as the ground-truth distribution and f_1^k as the test distribution, with N = 256. [Surrounding text: "… Algorithm 1 with the found k̄. Note that this compensation is very lightweight, once k̄_opt is found, for tasks where the KDE reconstruction has to be done anyway."]
Figure 5: The input distribution and corresponding targets …
Figure 6: The k that minimizes the MoN loss, plotted as a function of N for PDFs of dimensionality 1 and 10. The 10-dimensional case converges much faster, which implies that MoN prefers a widespread PDF in higher dimensions even for small N.
Figure 7: (a) An overview of the highway section the … Panels: (a) Overview of the NGSIM dataset [3]; (b) Close-up of the highway and tracking of the vehicle [2].
Figure 9: Illustration of the Zara dataset with ground truth …
Figure 10: The average log-likelihood as a function of the …
Figure 11: Marginalized probabilities of Social GAN on the Zara dataset for …
Figure 12: Marginalized probabilities of Social GAN on the Zara dataset for …
Figure 13: Marginalized probabilities of our own model on the NGSIM dataset for …
Figure 14: Marginalized probabilities of our own model on the NGSIM dataset for …
Figure 15: Marginalized probabilities of our own model on the NGSIM dataset for …
Original abstract

Trajectory or behavior prediction of traffic agents is an important component of autonomous driving and robot planning in general. It can be framed as a probabilistic future sequence generation problem and recent literature has studied the applicability of generative models in this context. The variety or Minimum over N (MoN) loss, which tries to minimize the error between the ground truth and the closest of N output predictions, has been used in these recent learning models to improve the diversity of predictions. In this work, we present a proof to show that the MoN loss does not lead to the ground truth probability density function, but approximately to its square root instead. We validate this finding with extensive experiments on both simulated toy as well as real world datasets. We also propose multiple solutions to compensate for the dilation to show improvement of log likelihood of the ground truth samples in the corrected probability density function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the Minimum-over-N (MoN) variety loss commonly used to train generative models for probabilistic trajectory prediction. It claims to prove that, in the large-N or continuous limit, minimizing the MoN objective yields a model density that approximates the square root of the target ground-truth density rather than the density itself. The paper validates the claim on toy and real-world datasets and proposes corrective modifications to the loss that are reported to improve the log-likelihood of ground-truth samples under the learned density.

Significance. If the limiting analysis is correct, the result would clarify why MoN-trained models often produce over-dispersed or mode-covering distributions and would supply a concrete mechanism for debiasing such losses. The combination of a theoretical claim with both simulated and real-data experiments is a strength; explicit machine-checked derivations or reproducible code would further increase the value of the contribution.

major comments (2)
  1. [Proof section] The central assertion that the expected distance to the single closest sample among N draws converges to a functional minimized precisely when the model density is proportional to the square root of the target density is stated without the explicit limiting expression. No order-statistic or CDF derivation of the minimum-distance functional is supplied, so it is impossible to verify that all other terms become constant or vanish in the large-N/continuum limit.
  2. [Experiments section] Quantitative results for the reported log-likelihood improvements on toy and real datasets are referenced in the experiments section and its tables and figures, but are not presented with sufficient numerical detail (e.g., exact values, standard deviations, or direct comparison against the theoretical square-root prediction). This leaves the empirical support for the magnitude of the claimed bias unassessable.
minor comments (2)
  1. Notation for the MoN objective and the target density should be introduced once with a single consistent symbol set rather than redefined across sections.
  2. The abstract states that the MoN loss leads 'approximately' to the square root; the precise sense of the approximation (e.g., pointwise, in KL divergence, or in total variation) should be stated explicitly in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the major comments point-by-point below and will revise the manuscript to incorporate the requested clarifications and details.

  1. Referee: [Proof section] The central assertion that the expected distance to the single closest sample among N draws converges to a functional minimized precisely when the model density is proportional to the square root of the target density is stated without the explicit limiting expression. No order-statistic or CDF derivation of the minimum-distance functional is supplied, so it is impossible to verify that all other terms become constant or vanish in the large-N/continuum limit.

    Authors: We agree that the current presentation of the limiting argument would be strengthened by including the intermediate derivation steps. In the revision we will add the explicit order-statistic and CDF derivation of the minimum-distance functional together with the large-N/continuum limiting expression, showing that all extraneous terms become constant and that the functional is minimized precisely when the model density is proportional to the square root of the target density. revision: yes

  2. Referee: [Experiments section] Quantitative results for the reported log-likelihood improvements on toy and real datasets are referenced in the experiments section and its tables and figures, but are not presented with sufficient numerical detail (e.g., exact values, standard deviations, or direct comparison against the theoretical square-root prediction). This leaves the empirical support for the magnitude of the claimed bias unassessable.

    Authors: We acknowledge that the experimental section would benefit from greater numerical transparency. In the revised manuscript we will report exact log-likelihood values, standard deviations across repeated runs, and direct numerical comparisons against the theoretical square-root prediction for both the toy and real-world datasets, thereby making the magnitude of the bias and the effect of the corrective modifications fully assessable. revision: yes

Circularity Check

0 steps flagged

The derivation of the MoN loss effect is a self-contained analysis starting from the loss definition.


The paper begins from the standard definition of the Minimum over N (MoN) loss and models the effect of selecting the closest sample among N draws from the generative distribution. It then analyzes the large-N or continuous limit to relate the minimized objective to the square root of the target density. No parameters are fitted to subsets of data and then renamed as predictions, no self-citations are load-bearing for the central claim, and no equations reduce to their own inputs by construction. The derivation is an independent functional analysis of the loss and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on a derivation from the definition of the MoN loss together with standard properties of probability densities under optimization; no additional free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The MoN loss selects the minimum distance among N samples drawn from the model's predictive distribution, and this selection governs the stationary point of the learned density.
    This is the operational definition of the variety loss used throughout the generative-modeling literature for sequences.


Reference graph

Works this paper leans on


  1. [1]

    "Crowds-by-example" data set (zara1 dataset). https://graphics.cs.ucy.ac.cy/research/downloads/crowd-data. Accessed: 2018-11-02

  2. [2]

    I-80 NGSIM validation. http://www2.ece.ohio-state.edu/~coifman/documents/I80-NGSIM/. Accessed: 2018-11-02

  3. [3]

    Next generation simulation dataset. https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.html. Accessed: 2018-11-02

  4. [4]

    A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese. Social LSTM: Human trajectory prediction in crowded spaces. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

  5. [5]

    A. Bhattacharyya, B. Schiele, and M. Fritz. Accurate and diverse sampling of sequences based on a best of many sample objective. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8485–8493, 2018

  6. [6]

    C. M. Bishop. Mixture density networks. Technical report, 1994

  7. [7]

    S.-T. Chiu. Bandwidth selection for kernel density estimation. Ann. Statist., 19(4):1883–1905, Dec 1991

  8. [8]

    W. Choi and S. Savarese. A unified framework for multi-target tracking and collective activity recognition. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, Computer Vision – ECCV 2012, pages 215–230, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg

  9. [9]

    H. Cui, V. Radosavljevic, F.-C. Chou, T.-H. Lin, T. Nguyen, T.-K. Huang, J. Schneider, and N. Djuric. Multimodal trajectory predictions for autonomous driving using deep convolutional networks. arXiv preprint arXiv:1809.10732, 2018

  10. [10]

    N. Deo and M. M. Trivedi. Convolutional social pooling for vehicle trajectory prediction. CoRR, abs/1805.06771, 2018

  11. [11]

    P. K. et al. Human Trajectory Prediction using Adversarial Loss

  12. [12]

    H. Fan, H. Su, and L. Guibas. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. arXiv e-prints, page arXiv:1612.00603, Dec 2016

  13. [13]

    P. Felsen, P. Lucey, and S. Ganguly. Where will they go? Predicting fine-grained adversarial multi-agent motion using conditional variational autoencoders. In The European Conference on Computer Vision (ECCV), September 2018

  14. [14]

    D. Ferguson. Efficiently using cost maps for planning complex maneuvers. 2008

  15. [15]

    M. Gadelha, A. Rai, S. Maji, and R. Wang. Inferring 3D Shapes from Image Collections using Adversarial Networks. arXiv e-prints, page arXiv:1906.04910, Jun 2019

  16. [16]

    I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014

  17. [17]

    J. Guan, Y. Yuan, K. M. Kitani, and N. Rhinehart. Generative Hybrid Representations for Activity Forecasting with No-Regret Learning. arXiv e-prints, page arXiv:1904.06250, Apr 2019

  18. [18]

    A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. arXiv e-prints, page arXiv:1803.10892, Mar. 2018

  19. [19]

    D. Helbing and P. Molnár. Social force model for pedestrian dynamics. 51:4282–4286, May 1995

  20. [20]

    S. Hladky and V. Bulitko. An evaluation of models for predicting opponent positions in first-person shooter video games. In 2008 IEEE Symposium On Computational Intelligence and Games, pages 39–46, Dec 2008

  21. [21]

    B. (https://math.stackexchange.com/users/297308/bgm). Expected minimum absolute difference to a given point correctly computed? Mathematics Stack Exchange. URL:https://math.stackexchange.com/q/3000933 (version: 2018-11-16)

  22. [22]

    J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist., 23(3):462–466, Sep 1952

  23. [23]

    B. Kim, C. M. Kang, S. Lee, H. Chae, J. Kim, C. C. Chung, and J. W. Choi. Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network. CoRR, abs/1704.07049, 2017

  24. [24]

    D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. arXiv e-prints, page arXiv:1312.6114, Dec. 2013

  25. [25]

    C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv e-prints, page arXiv:1609.04802, Sep 2016

  26. [26]

    N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. S. Torr, and M. Chandraker. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents. arXiv e-prints, page arXiv:1704.04394, Apr 2017

  27. [27]

    A. Lerner, Y . Chrysanthou, and D. Lischinski. Crowds by example. Computer Graphics Forum, 26(3):655–664, 2007

  28. [28]

    C. Li and M. Wand. Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks. arXiv e-prints, page arXiv:1604.04382, Apr 2016

  29. [29]

    J. Liang, L. Jiang, J. C. Niebles, A. Hauptmann, and L. Fei-Fei. Peeking into the Future: Predicting Future Person Activities and Locations in Videos. arXiv e-prints, page arXiv:1902.03748, Feb 2019

  30. [30]

    C. Liu, J. Yang, D. Ceylan, E. Yumer, and Y. Furukawa. PlaneNet: Piece-wise planar reconstruction from a single RGB image. CoRR, abs/1804.06278, 2018

  31. [31]

    W. Luo, B. Yang, and R. Urtasun. Fast and furious: Real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

  32. [32]

    R. Mehran, A. Oyama, and M. Shah. Abnormal crowd behavior detection using social force model. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 935–942, June 2009

  33. [33]

    M. O'Kelly, A. Sinha, H. Namkoong, R. Tedrake, and J. C. Duchi. Scalable end-to-end autonomous vehicle testing via rare-event simulation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 9827–9838. Curran Associates, Inc., 2018

  34. [34]

    S. H. Park, B. Kim, C. Mook Kang, C. Choo Chung, and J. W. Choi. Sequence-to-Sequence Prediction of Vehicle Trajectory via LSTM Encoder-Decoder Architecture. arXiv e-prints, page arXiv:1802.06338, Feb 2018

  35. [35]

    S. Pellegrini, A. Ess, K. Schindler, and L. van Gool. You’ll never walk alone: Modeling social behavior for multi-target tracking. In 2009 IEEE 12th International Conference on Computer Vision, pages 261–268, Sep. 2009

  36. [36]

    S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative Adversarial Text to Image Synthesis. arXiv e-prints, page arXiv:1605.05396, May 2016

  37. [37]

    N. Rhinehart, K. Kitani, and P. Vernaza. R2P2: A reparameterized pushforward policy for diverse, precise generative path forecasting. In European Conference on Computer Vision. Springer, 2018

  38. [38]

    N. Rhinehart, R. McAllister, K. Kitani, and S. Levine. PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings. arXiv e-prints, page arXiv:1905.01296, May 2019

  39. [39]

    B. Ristic, B. La Scala, M. Morelande, and N. Gordon. Statistical analysis of motion patterns in AIS data: Anomaly detection and motion prediction. In 2008 11th International Conference on Information Fusion, pages 1–7, June 2008

  40. [40]

    A. Sadeghian, V. Kosaraju, A. Sadeghian, N. Hirose, S. H. Rezatofighi, and S. Savarese. SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints. arXiv e-prints, page arXiv:1806.01482, June 2018

  41. [41]

    C. Schöller, V. Aravantinos, F. Lay, and A. Knoll. The Simpler the Better: Constant Velocity for Pedestrian Motion Prediction. arXiv e-prints, page arXiv:1903.07933, Mar 2019

  42. [42]

    D. W. Scott. Multivariate density estimation and visualization. 2012

  43. [43]

    M. K. C. Tay and C. Laugier. Modelling Smooth Paths Using Gaussian Processes, pages 381–390. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008

  44. [44]

    A. Treuille, S. Cooper, and Z. Popović. Continuum crowds. ACM Trans. Graph., 25(3):1160–1168, July 2006

  45. [45]

    J. Walker, C. Doersch, A. Gupta, and M. Hebert. An Uncertain Future: Forecasting from Static Images using Variational Autoencoders. arXiv e-prints, page arXiv:1606.07873, Jun 2016

  46. [46]

    T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

  47. [47]

    Supplementary Material, Proof of Theorem 1. Proof. For the sake of simplicity we only consider the one-dimensional case. First we bin the support of $P_T$ into $M$ equally sized bins $b_1, b_2, \dots, b_M$ of width $2\epsilon$. Then we can write the MoN loss as

    $$L_N(P_T, P) \approx \sum_{i=1}^{M} P_T(b_i) \int_{b_i} E^{\mathrm{MoN}}_{P,b_i}(x^*)\,dx^* \qquad (18)$$

    with $E^{\mathrm{MoN}}_{P,b_i}(x^*) = \int_{b_i} \min\big(|x^* - x_1|, |x^* - x_2|, \dots, |x^* - x_N|\big)\, P \dots$