Calibration offset estimation in mobile hearing tests via categorical loudness scaling
Pith reviewed 2026-05-18 21:56 UTC · model grok-4.3
The pith
Categorical loudness scaling estimates calibration offsets in mobile hearing tests with correlations up to 0.81.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that models based on categorical loudness scaling (CLS) can compensate for missing calibration by predicting device offsets from loudness scaling parameters. The Bayesian regression model reaches correlations of up to 0.81 between estimated and true offsets, and both models reduce calibration uncertainty by factors between 0.41 and 0.79 relative to threshold-based methods. This works because CLS supplies measures, such as dynamic range, that remain robust to arbitrary level shifts, so models trained on the Oldenburg Hearing Health Repository (OHHR) dataset can correct offsets for individual listeners.
What carries the argument
Bayesian regression and nearest neighbor models trained on level-independent CLS parameters such as dynamic range drawn from the OHHR dataset.
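Why a parameter such as dynamic range counts as level-independent can be shown with a few lines of arithmetic. The sketch below is illustrative only: it assumes a calibration error acts as one uniform shift in dB on every presented level, and the listener values and the helper names `dynamic_range` and `apply_offset` are hypothetical, not taken from the paper.

```python
def dynamic_range(threshold_db, uncomfortable_db):
    """Width of the usable level range in dB (a difference of two levels)."""
    return uncomfortable_db - threshold_db


def apply_offset(level_db, offset_db):
    """ASSUMPTION: a calibration error shifts every level by the same amount."""
    return level_db + offset_db


# Hypothetical listener; the numbers are chosen only for illustration.
true_threshold_db = 35.0
true_uncomfortable_db = 95.0
offset_db = 12.5  # the device offset, unknown in practice

observed_threshold_db = apply_offset(true_threshold_db, offset_db)
observed_uncomfortable_db = apply_offset(true_uncomfortable_db, offset_db)

dr_true = dynamic_range(true_threshold_db, true_uncomfortable_db)
dr_observed = dynamic_range(observed_threshold_db, observed_uncomfortable_db)

# The offset cancels in the difference, so both dynamic ranges agree,
# while the observed threshold is biased by exactly the offset.
print(dr_true, dr_observed)
print(observed_threshold_db - true_threshold_db)
```

Under this assumption any quantity built from level differences is untouched by the offset, while single levels such as the threshold carry it in full. That contrast is what a predictor can exploit.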
Load-bearing premise
The assumption that simulated Gaussian offsets and CLS parameters from the OHHR dataset will generalize to real uncontrolled mobile environments with arbitrary device offsets.
What would settle it
Collect CLS responses and actual measured calibration offsets from users on their own smartphones in everyday settings, then test whether the model predictions match the measured offsets within the reported uncertainty range.
Original abstract
Objective: To enable reliable smartphone-based hearing assessments by developing methods to estimate device calibration offsets using categorical loudness scaling (CLS). Design: Calibration offsets were simulated from a Gaussian distribution. Two prediction models - a Bayesian regression model and a nearest neighbor model - were trained on CLS-derived parameters and data from the Oldenburg Hearing Health Repository (OHHR). CLS was chosen because it provides level-independent measures (e.g., dynamic range) that remain robust despite calibration errors. Study Sample: The dataset comprised CLS results from N = 847 participants with a mean age of 70.0 years (SD = 8.7), including 556 male and 291 female listeners with diverse hearing profiles. Results: The Bayesian regression model achieved correlations of up to 0.81 between estimated and true calibration offsets, enabling accurate individual-level correction. Compared to threshold-based approaches, calibration uncertainty was reduced by factors between 0.41 and 0.79, demonstrating greater robustness in uncontrolled environments. Conclusions: CLS-based models can effectively compensate for missing calibration in mobile hearing assessments. This approach provides a practical alternative to threshold-based methods, supporting the use of smartphone-based tests outside laboratory settings and expanding access to reliable hearing healthcare in everyday and resource-limited contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops and evaluates two models (Bayesian regression and nearest-neighbor) to estimate unknown calibration offsets in smartphone-based hearing tests. Offsets are simulated from a Gaussian distribution and added to categorical loudness scaling (CLS) data from the Oldenburg Hearing Health Repository (N=847 participants). Level-independent CLS parameters such as dynamic range are used as predictors; the models achieve correlations up to 0.81 with the simulated ground-truth offsets and reduce uncertainty by factors of 0.41–0.79 relative to threshold-based methods.
Significance. If the reported performance generalizes beyond the simulated setting, the work would provide a practical route to calibration-free mobile hearing assessments, potentially increasing accessibility in non-laboratory environments. The use of CLS parameters that are designed to be robust to level shifts is a conceptually attractive choice, and the quantitative comparison against threshold methods is a clear strength.
major comments (2)
- [Results] Results and Methods sections: All performance figures (r ≤ 0.81, uncertainty reductions 0.41–0.79) are obtained exclusively on data with artificially added Gaussian offsets. No experiments using measured calibration errors from actual smartphones or uncontrolled listening conditions are presented, leaving the central claim of applicability to real mobile environments unsupported by direct evidence.
- [Methods] Methods: The manuscript provides no information on cross-validation strategy, train/test splits, or regularization choices for the Bayesian regression and nearest-neighbor models. Without these details it is impossible to assess whether the reported correlations reflect genuine predictive power or overfitting to the particular simulation.
minor comments (2)
- [Abstract] Abstract: The phrase 'reducing uncertainty by factors between 0.41 and 0.79' is ambiguous; clarify whether these are multiplicative factors on standard deviation or on variance.
- The manuscript would benefit from an explicit statement of the assumed distribution parameters for the simulated offsets and from a sensitivity analysis showing how results change when the Gaussian assumption is relaxed.
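The ambiguity raised in the first minor comment matters numerically. As a worked illustration, with symbols that are assumptions introduced here rather than notation from the paper, let \sigma_thr be the calibration uncertainty of the threshold-based method and \sigma_CLS that of the CLS-based method:

```latex
% Reading 1: the reported factor f multiplies the standard deviation.
\sigma_{\mathrm{CLS}} = f\,\sigma_{\mathrm{thr}}
\;\Longrightarrow\;
\sigma_{\mathrm{CLS}}^{2} = f^{2}\,\sigma_{\mathrm{thr}}^{2},
\qquad
f = 0.41 \Rightarrow f^{2} \approx 0.17,
\quad
f = 0.79 \Rightarrow f^{2} \approx 0.62 .

% Reading 2: the reported factor g multiplies the variance.
\sigma_{\mathrm{CLS}}^{2} = g\,\sigma_{\mathrm{thr}}^{2}
\;\Longrightarrow\;
\sigma_{\mathrm{CLS}} = \sqrt{g}\,\sigma_{\mathrm{thr}},
\qquad
g = 0.41 \Rightarrow \sqrt{g} \approx 0.64,
\quad
g = 0.79 \Rightarrow \sqrt{g} \approx 0.89 .
```

The two readings imply different residual uncertainty in dB, which is why the abstract should state which one is meant.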
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: [Results] Results and Methods sections: All performance figures (r ≤ 0.81, uncertainty reductions 0.41–0.79) are obtained exclusively on data with artificially added Gaussian offsets. No experiments using measured calibration errors from actual smartphones or uncontrolled listening conditions are presented, leaving the central claim of applicability to real mobile environments unsupported by direct evidence.
  Authors: We agree that the reported results are derived from simulated Gaussian calibration offsets added to the CLS data. This controlled simulation was selected to establish known ground truth and to isolate the contribution of level-independent CLS features such as dynamic range. Because these features are constructed to be invariant to uniform level shifts, the simulation provides a direct test of the method's core mechanism. We acknowledge, however, that direct evidence from measured smartphone offsets or fully uncontrolled conditions is not presented. The revised manuscript will add a dedicated limitations paragraph that explicitly discusses the simulation assumptions, their relation to real-device errors, and the need for subsequent field validation studies.
  Revision: partial
- Referee: [Methods] Methods: The manuscript provides no information on cross-validation strategy, train/test splits, or regularization choices for the Bayesian regression and nearest-neighbor models. Without these details it is impossible to assess whether the reported correlations reflect genuine predictive power or overfitting to the particular simulation.
  Authors: We appreciate the referee highlighting this gap. The revised Methods section will specify that participant-level data were randomly partitioned into an 80% training and 20% test set, with 5-fold cross-validation performed within the training set to select hyperparameters. For the Bayesian regression model, weakly informative normal priors were placed on the coefficients and a half-Cauchy prior on the residual scale; no further explicit regularization was applied. For the nearest-neighbor model, Euclidean distance was used with k = 5, where k was chosen by inner cross-validation. These details will be added so that readers can evaluate the risk of overfitting.
  Revision: yes
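The protocol described in this response (an 80/20 participant-level split, then 5-fold cross-validation inside the training set to choose k for a Euclidean nearest-neighbor model) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the candidate values of k are arbitrary, and the helper names `knn_predict`, `mse`, and `select_k` are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training rows nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_X, train_y, eval_X, eval_y, k):
    """Mean squared error of k-NN predictions on an evaluation set."""
    sq = [(knn_predict(train_X, train_y, x, k) - t) ** 2
          for x, t in zip(eval_X, eval_y)]
    return sum(sq) / len(sq)


def select_k(X, y, candidates, n_folds=5):
    """Return the candidate k with the lowest mean validation MSE over folds."""
    folds = [list(range(f, len(X), n_folds)) for f in range(n_folds)]
    scores = {}
    for k in candidates:
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            fit = [i for i in range(len(X)) if i not in held_out]
            total += mse([X[i] for i in fit], [y[i] for i in fit],
                         [X[i] for i in fold], [y[i] for i in fold], k)
        scores[k] = total / n_folds
    return min(scores, key=scores.get), scores


# ASSUMPTION: synthetic stand-in data, one row per participant.
rng = random.Random(1)
n = 200
X = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(n)]
y = [3.0 * a - 2.0 * b + rng.gauss(0.0, 0.3) for a, b in X]

# Outer split: shuffle participants once, then 80% train / 20% test.
order = list(range(n))
rng.shuffle(order)
n_train = n * 4 // 5
train_idx, test_idx = order[:n_train], order[n_train:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Inner 5-fold cross-validation uses the training set only.
candidates = [1, 3, 5, 9, 15]
best_k, cv_scores = select_k(X_train, y_train, candidates)

# The held-out test set is touched once, after k has been fixed.
test_mse = mse(X_train, y_train, X_test, y_test, best_k)
print(best_k, test_mse)
```

The point of the nesting is that the test participants never influence the choice of k, so the final error is not flattered by hyperparameter tuning.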
Circularity Check
No circularity: simulation-based prediction of external offsets from independent CLS parameters
full rationale
The paper simulates Gaussian calibration offsets, adds them to the existing OHHR CLS dataset (N=847), extracts level-independent parameters such as dynamic range, and trains separate Bayesian regression and nearest-neighbor models whose target is the simulated offset value. Reported correlations (up to 0.81) and uncertainty-reduction factors are computed between these model predictions and the independently generated simulated offsets; no equation equates the output to a fitted input by construction, no self-citation supplies a uniqueness theorem, and the central evaluation remains a standard supervised simulation study whose inputs and targets are distinct.
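The shape of such a simulation study, with independently drawn offsets as the target, can be sketched with synthetic data. Everything below is an assumption made for illustration: the link between dynamic range and threshold, the noise levels, and the offset standard deviation are invented, and a one-predictor least-squares line stands in for the paper's Bayesian regression and nearest-neighbor models. The sketch also estimates the offset as a residual (observed minus predicted threshold) instead of regressing on the offset directly. Only N = 847 comes from the abstract.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


rng = random.Random(0)
N = 847  # participant count from the abstract; all data below are synthetic

# ASSUMPTION: narrower dynamic range goes with higher thresholds, plus noise.
dynamic_range_db = [rng.uniform(30.0, 90.0) for _ in range(N)]
true_threshold_db = [100.0 - dr + rng.gauss(0.0, 5.0) for dr in dynamic_range_db]

# Simulated calibration offsets, drawn independently of the listener data.
OFFSET_SD_DB = 10.0  # ASSUMPTION: the paper's Gaussian parameters are not given
true_offset_db = [rng.gauss(0.0, OFFSET_SD_DB) for _ in range(N)]
observed_threshold_db = [t + o for t, o in zip(true_threshold_db, true_offset_db)]

# Fit on calibrated data from the first half, evaluate on the second half.
half = N // 2
a, b = fit_line(dynamic_range_db[:half], true_threshold_db[:half])

# Offset estimate: observed threshold minus the threshold predicted
# from the level-independent dynamic range.
estimated_offset_db = [
    obs - (a + b * dr)
    for obs, dr in zip(observed_threshold_db[half:], dynamic_range_db[half:])
]
r = pearson_r(estimated_offset_db, true_offset_db[half:])
print(round(r, 2))
```

Because the target offsets are generated separately from the predictors, the correlation measures prediction rather than a quantity fixed by construction. The value printed here reflects only the invented noise levels and says nothing about the paper's reported 0.81.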
Axiom & Free-Parameter Ledger
free parameters (1)
- Gaussian parameters for simulated offsets
axioms (1)
- Domain assumption: CLS provides level-independent measures (e.g., dynamic range) that remain robust despite calibration errors.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Calibration offsets were simulated from a Gaussian distribution. Two prediction models - a Bayesian regression model and a nearest neighbor model - were trained on CLS-derived parameters... level-independent measures (e.g., dynamic range)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: The dataset comprised CLS results from N = 847 participants... ACALOS procedure
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Standard audiogram classification from loudness scaling data using unsupervised, supervised, and explainable machine learning techniques
  Machine learning models can predict standard Bisgaard audiogram types from calibration-independent ACALOS loudness data with reasonable accuracy despite substantial class overlap.