Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference

Alex Deng; Jacob Zhu; Linsha Chen; Yufei Wu; Zhiying Gu

arXiv: 2606.30992 · v1 · pith:PC5XYWXOnew · submitted 2026-06-30 · 📊 stat.ME · cs.LG · stat.AP

Pith reviewed 2026-07-01 01:40 UTC · model grok-4.3

classification 📊 stat.ME · cs.LG · stat.AP

keywords multicollinearity · hierarchical clustering · causal inference · marketing mix models · observational data · Bayesian models · geographic data

The pith

Hierarchical clustering of geographic units on marketing spend correlations, after normalization and demeaning, allows separate identification of channel effects in a Bayesian marketing mix model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses multicollinearity in observational causal inference by using hierarchical clustering to aggregate data from geographic units. It normalizes and demeans the geo-level data to put geos on a common scale and remove common trends, then clusters units whose marketing spend is moderately to strongly correlated. The aggregated data feed a Bayesian Marketing Mix Model, which the authors say isolates individual channel impacts that would otherwise be confounded. Sympathetic readers care because common remedies such as shrinkage estimators and principal component regression do not return the original causal relationships, whereas this approach keeps the original channel variables. The authors report descriptive evidence and regression analysis indicating that collinearity falls and identification improves.

Core claim

By hierarchically clustering geographic units based on marketing spend correlation after normalization and demeaning, and fitting a Bayesian Marketing Mix Model at the cluster level, the approach mitigates collinearity and facilitates the separate identification of the impact of different marketing channels.

What carries the argument

Hierarchical clustering on pairwise distances of normalized and demeaned marketing spend data between geographic units.
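The mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the array layout, normalization by each geo's own mean, demeaning across geos per week, the `1 - correlation` distance, average linkage, and the 0.5 cutoff are all choices made here because the abstract does not specify them. The name `cluster_geos` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_geos(spend, cutoff=0.5, method="average"):
    """Cluster geos on spend correlation.

    spend: array of shape (n_geos, n_weeks, n_channels).
    Returns one integer cluster label per geo.
    """
    spend = np.asarray(spend, dtype=float)
    # Step 1a (assumed scaling): divide each geo/channel series by its own mean.
    scale = spend.mean(axis=1, keepdims=True)
    normalized = spend / np.where(scale == 0.0, 1.0, scale)
    # Step 1b (assumed demeaning): subtract the cross-geo mean at each week
    # and channel, which strips the trend shared by all geos.
    residual = normalized - normalized.mean(axis=0, keepdims=True)
    # Step 2: correlate geos on their flattened residual series and convert
    # correlation to a distance (assumed form: 1 - correlation).
    flat = residual.reshape(residual.shape[0], -1)
    dist = np.clip(1.0 - np.corrcoef(flat), 0.0, 2.0)
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method=method)
    return fcluster(tree, t=cutoff, criterion="distance")


# Simulated example: geos 0-2 follow one spend pattern, geos 3-5 another.
rng = np.random.default_rng(0)
n_weeks, n_channels = 52, 2
pattern_a = rng.normal(size=(n_weeks, n_channels))
pattern_b = rng.normal(size=(n_weeks, n_channels))
levels = np.array([5.0, 10.0, 20.0, 8.0, 12.0, 30.0])
spend = np.stack([
    level * (1.0 + 0.2 * (pattern_a if g < 3 else pattern_b)
             + 0.02 * rng.normal(size=(n_weeks, n_channels)))
    for g, level in enumerate(levels)
])
labels = cluster_geos(spend)
print(labels)
```

Because each series is divided by its own mean, geos with very different spend levels can still land in the same cluster when their relative patterns move together.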

If this is right

The clustering effectively mitigates collinearity in the aggregated data.
It enables separate identification of impacts from different marketing channels.
The two-step process of normalization, demeaning, then clustering on correlation is key to the method.
This solution is generally applicable to causal problems featuring multicollinearity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying this clustering in other fields with correlated predictors, such as economic policy analysis, could yield similar benefits.
Comparing results from clustered models to those from regularized regressions on the same data would test robustness.
Exploring different linkage methods in the hierarchical clustering could optimize the balance between collinearity reduction and data granularity.

Load-bearing premise

That clustering geographic units solely based on observed spend correlations will preserve the underlying causal relationships without introducing aggregation bias that alters individual channel effect identification.

What would settle it

Observing that the variance inflation factors remain high or that channel coefficient estimates change substantially when using different clustering cutoffs would indicate the method does not reliably mitigate the problem.
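A check of this kind could look like the sketch below, which computes variance inflation factors from the standard identity that VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse correlation matrix of the regressors. The function name and the simulated regressors are hypothetical; applying it to the paper's data before and after clustering is left to the reader.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X, shape (n_obs, n_regressors)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # The diagonal of the inverse correlation matrix gives 1 / (1 - R_j^2).
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
# A second channel built to correlate at roughly 0.95 with the first.
x2_collinear = 0.95 * x1 + np.sqrt(1.0 - 0.95 ** 2) * rng.normal(size=n)
# A second channel drawn independently of the first.
x2_independent = rng.normal(size=n)

vif_collinear = variance_inflation_factors(np.column_stack([x1, x2_collinear]))
vif_independent = variance_inflation_factors(np.column_stack([x1, x2_independent]))
print(vif_collinear, vif_independent)
```

Rerunning the same function on cluster-level regressors at several clustering cutoffs would show whether the reported reduction is stable.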

Figures

Figures reproduced from arXiv: 2606.30992 by Alex Deng, Jacob Zhu, Linsha Chen, Yufei Wu, Zhiying Gu.

**Figure 4.** Heat Maps of Channel Residual Impressions Across DMAs Over Time

**Figure 5.** Clustering Reduces Cross-Channel Correlation

**Figure 6.** Variation in Residual Channel Impressions Across Clusters Over…

**Figure 7.** Posterior Distributions of Parameters

… our example, marketing data properties motivated us to cluster geographic units based on correlation in marketing activities. In other settings, one can decide which dimensions and criteria to use for clustering based on relevant data properties. The dimension to cluster data does not always have to be geographic. Furthermore, in some settings, natural clusters might e…

Original abstract

Multicollinearity is a long lasting challenge in observational causal inference, especially in regressions -- highly correlated independent variables make it hard to isolate their individual impacts on outcomes of interest. While common solutions such as shrinkage estimators and principal component regressions are helpful in prediction problems, a crucial limitation hinders their applicability to causal inference problems -- they cannot provide the original causal relationships. To fill the gap, we present an innovative and intuitive solution, by employing hierarchical clustering to aggregate data in a way that effectively alleviates collinearity. This method is generally applicable to causal problems featuring multicollinearity. We use a marketing application to demonstrate how and why it works. Expenditures on different advertising channels often exhibit correlations, making it exceedingly difficult to separately measure their impact. Many previous studies proposed to leverage granular cross-sectional data for better identification but, to our knowledge, none explicitly addressed multicollinearity, which undermines causal identification even with granular data. We propose to hierarchically cluster geographic units based on marketing spend correlation to reduce collinearity, and to implement a Bayesian Marketing Mix Model with cluster-level data. Such clustering happens in two steps -- we first normalize and demean geo-level data to establish a common scale and to eliminate the common trends; we then calculate pairwise distance to summarize marketing spend correlation between geos and cluster the ones with moderate to strong correlation. Both descriptive evidence and regression analysis affirm that such hierarchical clustering effectively mitigates collinearity and facilitates the separate identification of the impact of different marketing channels.
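Once cluster labels exist, the aggregation step that feeds the cluster-level model could be sketched as follows. The abstract does not say whether geos are summed or averaged within a cluster, or which series enter the correlation comparison, so summation and the pooled unit-week correlation used here are assumptions, and the function names, labels, and simulated spend are hypothetical.

```python
import numpy as np


def aggregate_to_clusters(values, labels):
    """Sum geo-level series of shape (n_geos, n_weeks, n_channels) by cluster."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    totals = np.stack([values[labels == c].sum(axis=0) for c in cluster_ids])
    return cluster_ids, totals


def cross_channel_corr(values):
    """Correlation between channels, pooling every unit-week observation."""
    stacked = values.reshape(-1, values.shape[-1])
    return np.corrcoef(stacked, rowvar=False)


rng = np.random.default_rng(2)
geo_spend = rng.gamma(shape=2.0, scale=1.0, size=(6, 20, 3))
labels = np.array([1, 1, 2, 2, 2, 3])  # hypothetical cluster assignment

cluster_ids, cluster_spend = aggregate_to_clusters(geo_spend, labels)
corr_geo = cross_channel_corr(geo_spend)
corr_cluster = cross_channel_corr(cluster_spend)
print(cluster_spend.shape, corr_geo.round(2), corr_cluster.round(2), sep="\n")
```

Comparing `corr_geo` with `corr_cluster` on real data is the kind of descriptive evidence the abstract refers to; the random data here only demonstrates the mechanics and says nothing about the size of any reduction.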

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Clustering geos on spend correlations after normalization and demeaning is a practical preprocessing idea for reducing multicollinearity in geo MMM, but the abstract gives no metrics or bias checks so the claim is hard to judge.

The paper's main proposal is a two-step hierarchical clustering of geographic units: normalize and demean the spend series to remove scale and common trends, then cluster on pairwise correlations to group units with moderate to strong spend similarity, and finally fit the Bayesian MMM at the cluster level. This is positioned as a way to ease multicollinearity while keeping the original causal relationships intact.

The approach is new in the sense that the authors explicitly target multicollinearity in this granular geo setting, where earlier work focused on granularity but did not describe this clustering step. The demeaning step is a reasonable way to isolate relative spend patterns, and the overall recipe is simple enough that practitioners could try it.

What the paper does well is lay out a clear, intuitive data-processing sequence that directly addresses the correlation problem without switching to shrinkage or dimension reduction that might obscure channel-specific effects.

The soft spot is that the abstract asserts both descriptive evidence and regression results confirm the mitigation, yet supplies none of the numbers, baselines, or validation details. The stress-test concern about ecological bias is on point: if response elasticities vary with the spend patterns used for clustering, aggregation can produce composite effects rather than the disaggregated ones. Without simulations that recover known heterogeneous parameters or conditions under which bias is zero, the central claim rests on an unshown demonstration.
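The composite-effect concern can be made concrete with a toy simulation. It is not taken from the paper: the two geos, their effects of 1.0 and 3.0, the noiseless outcomes, and the independent spend series are all hypothetical. It shows that a regression on the summed data recovers a variance-weighted mix of the geo-level effects rather than either effect or their simple average.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weeks = 5000

beta = np.array([1.0, 3.0])      # hypothetical true effect in each geo
spend_sd = np.array([1.0, 2.0])  # the second geo's spend varies more

x = rng.normal(size=(n_weeks, 2)) * spend_sd  # independent geo-level spend
y = x * beta                                  # noiseless geo-level outcomes

# Aggregate the two geos into one cluster and regress outcome on spend.
x_agg = x.sum(axis=1)
y_agg = y.sum(axis=1)
slope = np.cov(x_agg, y_agg)[0, 1] / np.var(x_agg, ddof=1)

# With independent spend, the population slope is the variance-weighted mean.
weights = spend_sd ** 2 / (spend_sd ** 2).sum()
expected = float(weights @ beta)  # (1*1 + 3*4) / 5 = 2.6
print(round(slope, 3), expected, beta.mean())
```

The aggregated slope sits near 2.6 rather than the simple average of 2.0, so the cluster-level estimate depends on how spend variance is distributed across the geos being merged.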

This is for analysts working on marketing mix models with correlated channels and geo data who need a concrete preprocessing step. A reader already familiar with MMM might pick up the clustering trick and test it themselves.

I would send it to peer review if the full paper adds quantitative comparisons and bias checks; the idea is straightforward enough to be worth referee time once the empirical support is filled in.

Referee Report

2 major / 1 minor

Summary. The paper claims that hierarchical clustering of geographic units, performed after normalizing and demeaning geo-level marketing spend data and using pairwise correlations as distance, reduces multicollinearity sufficiently to allow separate identification of individual advertising channel effects when a Bayesian Marketing Mix Model is estimated at the resulting cluster level. It asserts that both descriptive evidence and regression analysis confirm the mitigation and improved identification, positioning the method as a general solution for multicollinearity in observational causal inference that preserves interpretability of the original variables.

Significance. If the central claim holds after proper validation, the approach would supply a practical, interpretable preprocessing step for causal regression problems featuring correlated regressors, particularly in marketing-mix and similar applied settings where shrinkage or principal-component methods are undesirable because they obscure the original causal parameters.

major comments (2)

[Abstract] The assertion that 'both descriptive evidence and regression analysis affirm' mitigation of collinearity supplies no quantitative metrics, error bars, baseline comparisons, or cluster-validation statistics, leaving the central empirical claim without measurable support.

[Abstract] The method implicitly assumes that clustering on observed spend correlations (after demeaning) and subsequent aggregation will recover channel-specific causal effects without ecological bias; no simulation recovering known heterogeneous parameters or formal condition guaranteeing zero aggregation bias is provided, which is load-bearing for the identification claim.

minor comments (1)

[Abstract] The phrase 'moderate to strong correlation' is used to define the clustering threshold but is never quantified, leaving the procedure incompletely specified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of our empirical claims and identification strategy. We respond to each major comment below.

Referee: [Abstract] The assertion that 'both descriptive evidence and regression analysis affirm' mitigation of collinearity supplies no quantitative metrics, error bars, baseline comparisons, or cluster-validation statistics, leaving the central empirical claim without measurable support.

Authors: The abstract serves as a concise summary; the full manuscript presents the supporting evidence through pre- and post-clustering correlation matrices, variance inflation factor comparisons, and regression diagnostics. To address the concern directly in the abstract, we will revise it to report specific quantitative metrics such as the reduction in average pairwise correlations and VIF values relative to the unclustered baseline, along with cluster validation statistics.

Revision: yes

Referee: [Abstract] The method implicitly assumes that clustering on observed spend correlations (after demeaning) and subsequent aggregation will recover channel-specific causal effects without ecological bias; no simulation recovering known heterogeneous parameters or formal condition guaranteeing zero aggregation bias is provided, which is load-bearing for the identification claim.

Authors: Clustering is performed on demeaned and normalized spend data to group geos with similar relative patterns, thereby reducing multicollinearity while retaining the original channel variables at the cluster level. The manuscript motivates this via the marketing application and discusses the underlying assumptions. We agree that ecological bias under heterogeneity is a relevant concern not addressed via simulation in the current version. In revision we will expand the discussion of identification assumptions and potential aggregation bias, though a dedicated simulation study is not feasible within the scope of this revision.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; method is a preprocessing recipe verified empirically

The paper proposes hierarchical clustering of geos on normalized/demeaned spend correlations as a preprocessing step before fitting a Bayesian MMM at the cluster level. It reports that this reduces collinearity via descriptive evidence and regression analysis, but presents no algebraic derivation, fitted parameter renamed as prediction, or self-citation chain that reduces the central claim to its inputs by construction. The clustering criterion and aggregation are external to the identification claim and are checked against data rather than assumed tautologically. This is a standard methodological proposal with independent empirical support.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method introduces an implicit free parameter in the choice of correlation threshold for clustering and relies on the domain assumption that spend correlation is the primary driver of multicollinearity that can be removed by aggregation without side effects.

free parameters (1)

correlation threshold for 'moderate to strong'
The paper states clusters are formed for 'moderate to strong correlation' without specifying the numeric cutoff or sensitivity analysis.
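One way to expose this free parameter is to sweep the correlation threshold and watch the cluster count change. The sketch below is an assumption-laden illustration, not the paper's procedure: it takes distance as `1 - correlation` and uses complete linkage, so that a distance cutoff of `1 - r` means every pair of geos inside a cluster correlates at `r` or above. The correlation matrix and function name are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def clusters_by_threshold(corr, thresholds):
    """Map each correlation threshold r to the number of clusters it yields."""
    dist = 1.0 - np.asarray(corr, dtype=float)
    np.fill_diagonal(dist, 0.0)
    # Complete linkage: the merge height is the largest pairwise distance.
    tree = linkage(squareform(dist, checks=False), method="complete")
    return {
        r: int(fcluster(tree, t=1.0 - r, criterion="distance").max())
        for r in thresholds
    }


# Hypothetical geo-by-geo correlation of residual spend for four geos.
corr = np.array([
    [1.0, 0.8, 0.4, 0.1],
    [0.8, 1.0, 0.4, 0.1],
    [0.4, 0.4, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])
counts = clusters_by_threshold(corr, thresholds=(0.7, 0.3, 0.05))
print(counts)
```

In this example a strict threshold of 0.7 merges only the first two geos, while a loose threshold of 0.05 merges all four, which is the sensitivity a reader would want reported alongside the channel estimates.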

axioms (1)

domain assumption: Normalization and demeaning remove common trends without distorting relative channel effects
Invoked in the first clustering step described in the abstract.

Reference graph

Works this paper leans on

18 extracted references · 3 canonical work pages · 1 internal anchor

[1] Ron Berman. 2018. Beyond the last touch: Attribution in online advertising. Marketing Science 37, 5 (2018), 771–792.

[2] Thomas Blake, Chris Nosko, and Steven Tadelis. 2015. Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica 83, 1 (2015), 155–174.

[3] David Chan and Mike Perry. 2017. Challenges and opportunities in media mix modeling. (2017).

[4] Hao Chen, Minguang Zhang, Lanshan Han, and Alvin Lim. 2021. Hierarchical marketing mix models with sign constraints. Journal of Applied Statistics 48, 13-15 (2021), 2944–2960.

[5] Jamal I Daoud. 2017. Multicollinearity and regression analysis. In Journal of Physics: Conference Series, Vol. 949. IOP Publishing, 012009.

[6] Ruihuan Du, Yu Zhong, Harikesh Nair, Bo Cui, and Ruyang Shou. 2019. Causally driven incremental multi touch attribution using a recurrent neural network. arXiv preprint arXiv:1902.00215 (2019).

[7] Wolfgang Härdle, Hua Liang, and Jiti Gao. 2000. Partially linear models. Springer Science & Business Media.

[8] Arthur E Hoerl and Robert W Kennard. 1970. Ridge regression: applications to nonorthogonal problems. Technometrics 12, 1 (1970), 69–82.

[9] Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press. https://doi.org/10.1017/CBO9781139025751

[10] Yuxue Jin, Yueqing Wang, Yunting Sun, David Chan, and Jim Koehler. 2017. Bayesian methods for media mix modeling with carryover and shape effects. (2017).

[11] Fionn Murtagh and Pedro Contreras. 2012. Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2, 1 (2012), 86–97.

[12] Sridhar Narayanan and Kirthi Kalyanam. 2014. Position effects in search advertising: A regression discontinuity approach. Technical Report. Working paper.

[13] Edwin Ng, Zhishi Wang, and Athena Dai. 2021. Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling. arXiv preprint arXiv:2106.03322 (2021).

[14] Chandan K Reddy and Bhanukiran Vinzamuri. 2018. A survey of partitional and hierarchical clustering algorithms. In Data clustering. Chapman and Hall/CRC, 87–110.

[15] Michael Thomas. 2020. Spillovers from mass advertising: An identification strategy. Marketing Science 39, 4 (2020), 807–826.

[16] Jon Vaver and Stephanie Shin-Hui Zhang. 2017. Introduction to the Aggregate Marketing System Simulator. (2017).

[17] Yueqing Wang, Yuxue Jin, Yunting Sun, David Chan, and Jim Koehler. 2017. A hierarchical Bayesian approach to improve media mix models using category data. (2017).

[18] Michael J Wolfe Sr and John C Crotts. 2011. Marketing mix modeling for the tourism industry: A best practices approach. International Journal of Tourism Sciences 11, 1 (2011), 1–15.
