pith. the verified trust layer for science.

arxiv: 2601.09448 · v2 · submitted 2026-01-14 · 💻 cs.SD · cs.AI

One Prompt, Many Sounds: Modeling Listener Variability in LLM-Based Equalization

Pith reviewed 2026-05-16 14:23 UTC · model grok-4.3

keywords: LLM · audio equalization · natural language · listener variability · in-context learning · parameter-efficient fine-tuning · distributional alignment

The pith

Large language models can map natural language prompts to audio equalization settings that match population preferences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LLMs can convert everyday text descriptions of listening contexts into equalization parameters. It collects data from a controlled listening experiment and applies in-context learning plus parameter-efficient fine-tuning to produce settings that align with what groups of listeners prefer. Evaluation uses distributional metrics to show statistically significant gains over random sampling and fixed preset baselines. The approach treats equalization as a conversational task rather than a manual adjustment process. This opens the possibility of audio systems that adapt automatically to mood, location, or social setting without expert intervention.

Core claim

The central claim is that LLMs can function as artificial equalizers by mapping natural language text prompts to equalization settings that reliably align with population-preferred values. This is achieved by applying in-context learning and parameter-efficient fine-tuning to data from a controlled listening experiment, which yields statistically significant improvements in distributional alignment over random sampling and static preset baselines.

What carries the argument

An LLM that uses in-context learning and parameter-efficient fine-tuning on listening-experiment data to translate natural language prompts into equalization parameters.
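
The in-context route can be pictured as retrieve-then-prompt: the Figure 4 caption notes that sentence similarity is computed as a dot product between embeddings. The sketch below is only an illustration of that idea, not the authors' pipeline. The `embed` function is a toy bag-of-words stand-in for a real sentence encoder, the example contexts and two-number EQ values are invented, and the prompt template is an assumption.

```python
import math
from collections import Counter

# Hypothetical (context, EQ setting) pairs; the values are made up for illustration.
EXAMPLES = [
    ("relaxing evening at home alone", (-2.0, 1.5)),
    ("dinner party with friends", (1.0, -0.5)),
    ("loud workout at the gym", (4.0, 3.0)),
]

def embed(text):
    """Toy stand-in for a sentence encoder: L2-normalised bag of words."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())

def build_prompt(query, examples, k=2):
    """Pick the k most similar examples and format them as few-shot context."""
    q = embed(query)
    ranked = sorted(examples, key=lambda ex: dot(q, embed(ex[0])), reverse=True)
    shots = ranked[:k]
    parts = [f"Context: {text}\nEQ: {x:.1f}, {y:.1f}" for text, (x, y) in shots]
    parts.append(f"Context: {query}\nEQ:")  # the LLM would complete this line
    return "\n\n".join(parts), shots

prompt, shots = build_prompt("quiet dinner party with friends", EXAMPLES)
print(prompt)
```

A real system would swap `embed` for a trained sentence encoder and send the resulting prompt to the LLM; only the ranking-by-dot-product step is taken from the source.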

If this is right

  • Audio systems could accept spoken or written descriptions to adjust sound automatically.
  • Consumer devices could replace manual EQ sliders with conversational control.
  • Static factory presets could be replaced by prompt-driven adaptation that reflects group taste distributions.
  • Evaluation methods based on distributional metrics could be applied to other audio control tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prompting approach might extend to related audio controls such as dynamic range compression or spatial effects.
  • Individual users could supply a few personal examples to shift the model from population alignment toward personal preference.
  • Real-time integration with voice assistants would allow EQ changes mid-listening session based on simple descriptions.

Load-bearing premise

The data from the controlled listening experiment sufficiently captures listener variability and generalizes to new natural language prompts and contexts.

What would settle it

Collect listener preferences for a set of prompts never shown during training or fine-tuning, then check whether the model's predicted equalization settings show the same statistically significant distributional alignment improvement over baselines that was reported on the original data.
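
A minimal sketch of such a check follows, under loud assumptions. The paper reports a Kantorovich distance (see Figure 9); to stay dependency-free, this example substitutes a sliced approximation that averages exact one-dimensional distances over random projections, which is a related but different quantity. All samples are synthetic, and the two-dimensional setting bounded to [-6, 6] is borrowed from the figure captions.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Average the exact 1-D Wasserstein-1 distance over random directions.

    Assumes both point sets have the same number of samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = rng.uniform(0.0, math.pi)
        ux, uy = math.cos(theta), math.sin(theta)
        px = sorted(x * ux + y * uy for x, y in xs)
        py = sorted(x * ux + y * uy for x, y in ys)
        # For equal-size 1-D samples, W1 is the mean gap between sorted values.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections

def clip(v, lo=-6.0, hi=6.0):
    return max(lo, min(hi, v))

def sample_cluster(rng, center, spread, n):
    cx, cy = center
    return [(clip(rng.gauss(cx, spread)), clip(rng.gauss(cy, spread)))
            for _ in range(n)]

rng = random.Random(1)
listeners = sample_cluster(rng, (3.0, -2.0), 1.0, 200)  # synthetic listener responses
aligned = sample_cluster(rng, (3.0, -2.0), 1.0, 200)    # synthetic model predictions
baseline = [(rng.uniform(-6, 6), rng.uniform(-6, 6)) for _ in range(200)]

d_model = sliced_wasserstein(aligned, listeners)
d_random = sliced_wasserstein(baseline, listeners)
print(f"model: {d_model:.3f}  random baseline: {d_random:.3f}")
```

On held-out prompts, the claim would survive if the model's distance stays below the baselines' by a margin that a suitable statistical test confirms; this sketch shows only the distance computation, not the test.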

Figures

Figures reproduced from arXiv: 2601.09448 by Ioannis Stylianou, Jon Francombe, Pablo Martinez-Nuevo, Sven Ewan Shepstone, Zheng-Hua Tan.

Figure 1: Illustration of the interface used for the data collection experiment.
Figure 3: The filters that comprise the equalization controller. A horizontal …
Figure 4: Outline of RAG and RAG-QA recommendation approaches. The sentence similarity is computed as the dot product between the embedding of the …
Figure 5: Parameter Efficient Fine-Tuning with regression head. The figure on the left illustrates the details of Prefix Tuning and LoRA methods, while the …
Figure 6: Visualization of listener agreement in the Beosonic space. (a) For …
Figure 7: Example of standard KDE vs reflective KDE. The blue/white points are drawn from a uniform distribution in the [-6,6]x[-6,6] square. With the …
Figure 8: Example of standard KDE vs reflective KDE. The blue/white points are drawn from a bimodal standard Gaussian distribution with modes [-3,3] and …
Figure 9: Visual validation of the Kantorovich distance as a performance metric. The plots contrast the Ground Truth user responses (Orange) against two synthetic model predictions: a “Population-Aligned” distribution (Blue) and a “Misaligned” distribution (Red). (a) Simple Prompt: The Red samples are spatially shifted from the consensus. The Kantorovich distance accurately reflects this error with a high value comp…
Figure 12: Comparison between proposed ICL and PEFT methods using …
Figure 13: Comparison between proposed ICL and PEFT methods using …
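
The captions for Figures 7 and 8 contrast standard and reflective kernel density estimation on the bounded [-6,6]x[-6,6] square. The sketch below illustrates the general reflection idea, assuming an isotropic Gaussian kernel, a hand-picked bandwidth, and a single mirror image per boundary; the paper's exact estimator and bandwidth rule are not given in this page, so none of these choices should be read as the authors' settings.

```python
import math
import random

LO, HI = -6.0, 6.0  # bounds of the square, taken from the Figure 7 caption

def gauss2d(dx, dy, h):
    """Isotropic 2-D Gaussian kernel with bandwidth h."""
    return math.exp(-(dx * dx + dy * dy) / (2 * h * h)) / (2 * math.pi * h * h)

def standard_kde(q, points, h):
    """Plain KDE: kernel mass near an edge leaks outside the square."""
    qx, qy = q
    return sum(gauss2d(qx - x, qy - y, h) for x, y in points) / len(points)

def reflections(v):
    """A coordinate plus its mirror images across the two boundaries."""
    return (v, 2 * LO - v, 2 * HI - v)

def reflective_kde(q, points, h):
    """KDE with mirrored copies of each point folded back into the square."""
    qx, qy = q
    total = 0.0
    for x, y in points:
        for rx in reflections(x):
            for ry in reflections(y):
                total += gauss2d(qx - rx, qy - ry, h)
    return total / len(points)

rng = random.Random(0)
points = [(rng.uniform(LO, HI), rng.uniform(LO, HI)) for _ in range(500)]
h = 1.0  # bandwidth chosen by hand for illustration
corner, center = (HI, HI), (0.0, 0.0)
print("standard  :", standard_kde(corner, points, h), standard_kde(center, points, h))
print("reflective:", reflective_kde(corner, points, h), reflective_kde(center, points, h))
```

At the corner the standard estimate is roughly a quarter of the reflective one, because about three quarters of each nearby kernel falls outside the square; in the interior the two estimates agree closely.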
Original abstract

Conventional audio equalization is a static process that requires manual and cumbersome adjustments to adapt to changing listening contexts (e.g., mood, location, or social setting). In this paper, we introduce a Large Language Model (LLM)-based alternative that maps natural language text prompts to equalization settings. This enables a conversational approach to sound system control. By utilizing data collected from a controlled listening experiment, our models exploit in-context learning and parameter-efficient fine-tuning techniques to reliably align with population-preferred equalization settings. Our evaluation methods, which leverage distributional metrics that capture users' varied preferences, show statistically significant improvements in distributional alignment over random sampling and static preset baselines. These results indicate that LLMs could function as "artificial equalizers," contributing to the development of more accessible, context-aware, and expert-level audio tuning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an LLM-based system that maps natural-language prompts describing listening contexts (mood, location, etc.) to audio equalization settings. It collects listener preference data via a controlled experiment, then applies in-context learning and parameter-efficient fine-tuning to produce models that align with population-level preferred EQ distributions; distributional metrics are used to demonstrate statistically significant gains over random sampling and static-preset baselines.

Significance. If the reported alignment improvements prove robust, the work would offer a practical route to conversational, context-aware audio control that captures inter-listener variability without manual parameter adjustment. The distributional evaluation approach is a constructive way to handle preference diversity, but the absence of core experimental metadata prevents any assessment of whether the gains reflect genuine generalization rather than in-sample fitting.

major comments (2)
  1. [Abstract] Abstract: the assertion of 'statistically significant improvements in distributional alignment' supplies no participant count, exact statistical test, data-collection protocol, or metric definitions, rendering the central empirical claim unverifiable from the provided text.
  2. [Evaluation] Evaluation section: no held-out prompt set, prompt-diversity statistics, or cross-context generalization results are reported, leaving the claim that in-context learning and PEFT 'reliably align' with population preferences on arbitrary unseen prompts unsupported.
minor comments (1)
  1. [Methods] Notation for the distributional metrics (e.g., how the preferred EQ distribution is constructed from listener responses) should be defined explicitly before the results are presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important issues of verifiability and generalization. We address each major comment below and will revise the manuscript to strengthen these aspects.

point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion of 'statistically significant improvements in distributional alignment' supplies no participant count, exact statistical test, data-collection protocol, or metric definitions, rendering the central empirical claim unverifiable from the provided text.

    Authors: We agree that the abstract is currently too concise and omits key experimental metadata. In the revised manuscript we will expand the abstract to report the participant count from the listening experiment, the exact statistical test and significance level used, a brief description of the data-collection protocol, and explicit definitions of the distributional metrics. This change will make the central claim directly verifiable from the abstract. revision: yes

  2. Referee: [Evaluation] Evaluation section: no held-out prompt set, prompt-diversity statistics, or cross-context generalization results are reported, leaving the claim that in-context learning and PEFT 'reliably align' with population preferences on arbitrary unseen prompts unsupported.

    Authors: We acknowledge that the evaluation section does not currently include an explicit held-out prompt split, prompt-diversity statistics, or dedicated cross-context generalization results. While the in-context learning setup is designed to support new prompts, we will revise the evaluation section to add these elements: a held-out prompt set, quantitative prompt-diversity measures, and distributional alignment results on the held-out prompts. This will provide direct evidence for generalization beyond the prompts seen during data collection or fine-tuning. revision: yes
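
A held-out prompt set of the kind described in this response has to be split by prompt rather than by individual rating, so that no prompt contributes responses to both sides. The following is a minimal sketch of such a grouped split; the record layout (`prompt`, `listener`, `eq`), the prompt texts, and the 20% hold-out fraction are all hypothetical.

```python
import random

def split_by_prompt(records, holdout_fraction=0.2, seed=0):
    """Hold out whole prompts so test prompts are never seen in training."""
    prompts = sorted({r["prompt"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(prompts)
    n_holdout = max(1, round(len(prompts) * holdout_fraction))
    held_out = set(prompts[:n_holdout])
    train = [r for r in records if r["prompt"] not in held_out]
    held_out_records = [r for r in records if r["prompt"] in held_out]
    return train, held_out_records

# Synthetic records: five prompts, three listeners each, placeholder EQ values.
PROMPTS = ["morning coffee", "late night study", "house party",
           "rainy commute", "sunday cleaning"]
records = [{"prompt": p, "listener": i, "eq": (0.0, 0.0)}
           for p in PROMPTS for i in range(3)]

train, held_out_records = split_by_prompt(records)
print(len(train), "training records;", len(held_out_records), "held-out records")
```

Splitting at the rating level instead would let the model see other listeners' answers to a test prompt, which would overstate generalization to unseen prompts.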

Circularity Check

0 steps flagged

No circularity: derivation grounded in new listener data

full rationale

The paper collects fresh data from a controlled listening experiment and applies in-context learning plus PEFT to map prompts to equalization settings that match the observed population preferences. No step reduces by definition to its own inputs, renames a fitted parameter as a prediction, or relies on a load-bearing self-citation chain; the distributional alignment metrics are computed against the collected responses rather than being tautological. The central claim therefore remains independent of the patterns that would trigger a circularity flag.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim depends on the representativeness of the listening experiment data and the transferability of LLM in-context learning to this audio task; no free parameters or invented entities are described.

axioms (1)
  • Domain assumption: LLMs can reliably perform in-context learning to map text prompts to continuous parameter settings.
    Invoked to justify using prompts and fine-tuning without additional derivation.

pith-pipeline@v0.9.0 · 5450 in / 1111 out tokens · 78949 ms · 2026-05-16T14:23:01.899041+00:00 · methodology


