Density-valued VAR Models with Latent Factors

Michel F. C. Haddad; Yasumasa Matsuda

arxiv: 2604.25087 · v1 · submitted 2026-04-28 · 📊 stat.ME

Density-valued VAR Models with Latent Factors

Yasumasa Matsuda , Michel F. C. Haddad This is my paper

Pith reviewed 2026-05-07 15:55 UTC · model grok-4.3

classification 📊 stat.ME

keywords density-valued VARlatent factorsdirected networksSARS-CoV-2Ct valuesregional time seriesB-spline mixturesidiosyncratic dynamics

0 comments

The pith

A latent-factor density VAR decomposes common trends from directed regional dynamics in viral load distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a model for time series of density functions that transforms B-spline approximated densities into Euclidean space and fits them with a vector autoregression including latent factors. This setup separates nationwide common movements in SARS-CoV-2 Ct value distributions from region-specific directed predictive relations, which are then extracted using statistical tests on the idiosyncratic residuals. A sympathetic reader would care because it provides a way to trace how viral patterns propagate between regions without conflating them with broader waves that affect all areas simultaneously. Simulations demonstrate that the factors effectively filter out false connections caused by shared movements, while the Brazilian data application shows emerging directed links from northern to southeastern areas once early months are set aside.

Core claim

By representing regional density functions of Ct values through B-spline mixtures and applying a generalized logit transform, the transformed series are modeled as a VAR process with latent factors; this allows the idiosyncratic component to reveal directed edges via one-sided tests under false discovery rate control, yielding a network of predictive relations from the northern region toward southeastern metropolitan areas in the adjusted sample.

What carries the argument

The latent-factor-augmented density-valued VAR, which decomposes the dynamics of transformed density weights into common latent factors capturing strong shared movements and an idiosyncratic VAR component that isolates directed predictive relations.

Load-bearing premise

That dropping the first six months of data and using a weak prior on the densities isolates genuine directed predictive relations instead of introducing selection effects or prior-driven artifacts.

What would settle it

Observing the same north-to-southeast directed network when the full sample is used without exclusion, or when the density prior is strengthened, would falsify the interpretation that the network reflects true idiosyncratic dynamics.

Figures

Figures reproduced from arXiv: 2604.25087 by Michel F. C. Haddad, Yasumasa Matsuda.

**Figure 1.** Figure 1: Simulation results as functions of the estimated number of factors view at source ↗

**Figure 2.** Figure 2: Weekly summaries of the raw Ct observations. The top panel reports the weekly mean view at source ↗

**Figure 3.** Figure 3: Map of the 20 regions in Brazil. Red dots indicate region centroids, and the numeric labels view at source ↗

**Figure 4.** Figure 4: Number of detected directed edges as a function of the estimated number of factors view at source ↗

**Figure 5.** Figure 5: Directed network for the post-2020/9/28 subsample with view at source ↗

read the original abstract

We propose a density-valued vector autoregressive model with latent factors for multivariate time series of density functions. Motivated by weekly regional distributions of SARS-CoV-2 cycle threshold (Ct) values in Brazil, we study their distributional dynamics across regions. The Ct value is the number of amplification cycles required for the viral signal to cross a detection threshold (lower Ct values correspond to higher viral load). We estimate each regional density by a B-spline mixture, mapping the mixture weights to a Euclidean space by a generalized logit transform equipped with an isometric inner product, and model the transformed series by a cross-regional VAR with latent factors. This decomposition allows for the separation between strong common movements and directed idiosyncratic dynamics. Directed edges are identified from the idiosyncratic VAR component using one-sided tests with Benjamini--Yekutieli false discovery rate control. Simulations show that increasing the number of estimated factors does not mechanically eliminate genuine idiosyncratic dependence; rather, it mainly removes spuriously detected edges driven by common factor movements. In the real-world data application, the full sample yields only a weak directed network, whereas a substantial network emerges once the first six months are excluded and the density prior is kept weak. The estimated links suggest directed predictive relations from the northern region toward southeastern metropolitan areas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The density-valued VAR with latent factors cleanly separates common and idiosyncratic dynamics in simulations, but the Brazil directed network only shows up after dropping the first six months and using a weak prior.

read the letter

The paper's main advance is the full pipeline: B-spline density estimates mapped isometrically to Euclidean space via generalized logit, then modeled with a latent-factor VAR that isolates directed idiosyncratic edges after removing common movements. Simulations are the clearest strength here. They show that adding factors mostly removes spurious edges driven by shared factors rather than erasing real idiosyncratic dependence, and the FDR-controlled edge detection on the idiosyncratic part follows standard practice. That combination is not just another functional VAR; the factor step is used specifically to clean the network inference.

Referee Report

2 major / 3 minor

Summary. The paper proposes a density-valued VAR model with latent factors for multivariate time series of density functions. Densities are estimated via B-spline mixtures, mapped to Euclidean space through a generalized logit transform with isometric inner product, and modeled via a cross-regional VAR that decomposes common factor movements from idiosyncratic dynamics. Directed edges are recovered from the idiosyncratic component via one-sided tests with Benjamini-Yekutieli FDR control. Simulations demonstrate that additional factors primarily eliminate spurious common-factor edges without removing genuine idiosyncratic dependence. In the SARS-CoV-2 Ct-value application to Brazilian regions, the full sample yields only a weak network, while a substantial directed network (northern to southeastern regions) appears after excluding the first six months and using a weak density prior.

Significance. If the central decomposition and edge-identification procedure are robust, the framework offers a principled way to separate global trends from region-specific predictive relations in distributional time series, with clear relevance to epidemiological monitoring and other density-valued data. The simulation evidence that factor augmentation removes spurious rather than genuine edges is a concrete methodological strength that supports the separation claim.

major comments (2)

[real-world data application] Real-world data application (abstract and §5): the reported directed network from northern to southeastern regions is obtained only after excluding the first six months of data and retaining a weak density prior; the manuscript provides no pre-specified justification for these choices, and the simulations in §4 do not examine sensitivity to sample truncation or prior strength. Because the full-sample result is described as weak, these two post-estimation decisions are load-bearing for the empirical claim.
[§4] §4 (simulations): while the reported experiments correctly show that increasing the number of latent factors removes spurious common-factor edges without eliminating genuine idiosyncratic dependence, they do not include robustness checks for the exact sample-truncation and prior-strength choices that drive the empirical network. This leaves open whether the separation result generalizes to the data-dependent decisions used in the application.

minor comments (3)

[model description] The description of the generalized logit transform and isometric inner product would benefit from an explicit statement of the dimension of the transformed space and how the B-spline knot placement is chosen.
[estimation] It is unclear how the number of latent factors is selected in the real-data analysis; a data-driven criterion or cross-validation procedure should be stated.
[abstract and application] The abstract and application section should report the total number of regions, the exact time span, and the number of weekly observations to allow readers to assess the scale of the network.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive suggestions. We address the major comments point by point below, proposing revisions to enhance the robustness and transparency of our analysis.

read point-by-point responses

Referee: [real-world data application] Real-world data application (abstract and §5): the reported directed network from northern to southeastern regions is obtained only after excluding the first six months of data and retaining a weak density prior; the manuscript provides no pre-specified justification for these choices, and the simulations in §4 do not examine sensitivity to sample truncation or prior strength. Because the full-sample result is described as weak, these two post-estimation decisions are load-bearing for the empirical claim.

Authors: We agree that the empirical network is sensitive to these choices and that pre-specification would have been preferable. The exclusion of the initial six months was motivated by the documented ramp-up in SARS-CoV-2 testing capacity in Brazil during early 2020, which likely introduced noise and bias into the Ct-value distributions. The weak density prior was selected to minimize shrinkage and allow the data to inform the density shapes more directly. In the revised manuscript, we will explicitly state these motivations in §5 with supporting references to the epidemiological literature on early-pandemic data quality. Additionally, we will conduct and report sensitivity analyses by varying the truncation point (e.g., excluding 3, 6, or 9 months) and prior strength, presenting the resulting networks and discussing stability. We will also update the simulation section to include scenarios with truncated samples and varying priors to assess the robustness of the factor decomposition. revision: yes
Referee: [§4] §4 (simulations): while the reported experiments correctly show that increasing the number of latent factors removes spurious common-factor edges without eliminating genuine idiosyncratic dependence, they do not include robustness checks for the exact sample-truncation and prior-strength choices that drive the empirical network. This leaves open whether the separation result generalizes to the data-dependent decisions used in the application.

Authors: We acknowledge this limitation in the current simulation design. While the existing simulations demonstrate the core property of the factor-augmented model, they do not directly replicate the data-dependent decisions from the application. In the revision, we will extend §4 with additional simulation experiments that incorporate sample truncation and different prior strengths in the density estimation step. These will evaluate whether the separation of common and idiosyncratic components remains reliable under conditions mimicking the real-data choices, thereby addressing the generalizability concern. revision: yes

Circularity Check

0 steps flagged

No circularity detected in model derivation or empirical claims

full rationale

The paper defines a density-valued VAR with latent factors via explicit steps: B-spline mixture estimation of regional densities, generalized logit transform to Euclidean space, cross-regional VAR decomposition into common factors plus idiosyncratic component, and one-sided tests on the idiosyncratic VAR for directed edges. These steps are constructed forward from the data and model assumptions without any quoted equation reducing the reported directed network (or its north-to-southeast pattern) back to a fitted parameter or prior choice by definition. Simulations are used only to verify that factor count removes spurious common-factor edges rather than genuine idiosyncratic ones, which is an independent check. The reported sensitivity to excluding the first six months and prior strength is presented as an empirical observation, not a load-bearing derivation step or self-citation chain. No self-citations, uniqueness theorems, or ansatzes are invoked to force the central result.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of the density-to-Euclidean mapping and the latent factor separation; since only the abstract is available, the ledger reflects components explicitly mentioned.

free parameters (3)

number of latent factors
Selected to remove common movements; simulations test effects of increasing the number.
B-spline mixture parameters
Fitted to estimate each regional density function.
density prior strength
Kept weak in the application to obtain the reported network.

axioms (2)

domain assumption The generalized logit transform equipped with an isometric inner product maps density mixture weights to a Euclidean space suitable for standard VAR modeling.
Invoked to enable cross-regional VAR on the transformed series.
domain assumption The time series of transformed densities can be decomposed into common latent factor movements plus idiosyncratic directed dynamics.
Central to separating shared trends from region-specific predictive relations.

pith-pipeline@v0.9.0 · 5519 in / 1677 out tokens · 72900 ms · 2026-05-07T15:55:41.015988+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

G., Yerly, S., Bellon, M., Wagner, N., Rohr, M., Huttner, B., Blanchard- Rohner, G., Loevy, N., Kaiser, L., and Gervaix, A

Baggio, S., L’Huillier, A. G., Yerly, S., Bellon, M., Wagner, N., Rohr, M., Huttner, B., Blanchard- Rohner, G., Loevy, N., Kaiser, L., and Gervaix, A. (2021). Severe acute respiratory syndrome coronavirus 2 (sars-cov-2) viral load in the upper respiratory tract of children and adults with early acute coronavirus disease 2019 (covid-19).Clinical Infectious...

work page 2021
[2]

Bai, J. (2009). Panel data models with interactive fixed effects.Econometrica, 77(4):1229–1279. Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221. Banho, C. A., Sacchetto, L., Campos, G. R. F., et al. (2022). Impact of sars-cov-2 gamma lin- eage introduction and covid-19 vaccination on t...

work page 2009
[3]

Stock, J. H. and Watson, M. W. (2002). Macroeconomic forecasting using diffusion indexes.Journal of Business & Economic Statistics, 20(2):147–162. van den Boogaart, K. G., Egozcue, J. J., and Pawlowsky-Glahn, V. (2010). Bayes linear spaces. SORT–Statistics and Operations Research Transactions, 34(2):201–222. van der Vaart, A. W. (1998).Asymptotic Statisti...

work page 2002

[1] [1]

G., Yerly, S., Bellon, M., Wagner, N., Rohr, M., Huttner, B., Blanchard- Rohner, G., Loevy, N., Kaiser, L., and Gervaix, A

Baggio, S., L’Huillier, A. G., Yerly, S., Bellon, M., Wagner, N., Rohr, M., Huttner, B., Blanchard- Rohner, G., Loevy, N., Kaiser, L., and Gervaix, A. (2021). Severe acute respiratory syndrome coronavirus 2 (sars-cov-2) viral load in the upper respiratory tract of children and adults with early acute coronavirus disease 2019 (covid-19).Clinical Infectious...

work page 2021

[2] [2]

Bai, J. (2009). Panel data models with interactive fixed effects.Econometrica, 77(4):1229–1279. Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221. Banho, C. A., Sacchetto, L., Campos, G. R. F., et al. (2022). Impact of sars-cov-2 gamma lin- eage introduction and covid-19 vaccination on t...

work page 2009

[3] [3]

Stock, J. H. and Watson, M. W. (2002). Macroeconomic forecasting using diffusion indexes.Journal of Business & Economic Statistics, 20(2):147–162. van den Boogaart, K. G., Egozcue, J. J., and Pawlowsky-Glahn, V. (2010). Bayes linear spaces. SORT–Statistics and Operations Research Transactions, 34(2):201–222. van der Vaart, A. W. (1998).Asymptotic Statisti...

work page 2002