Robust Personalized Recommendation under Hidden Confounding in MNAR
Pith reviewed 2026-05-21 05:31 UTC · model grok-4.3
The pith
Estimating user-item level sensitivity bounds relaxes the uniform assumption in deconfounding recommender systems with hidden confounders.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a framework called Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID) can recover accurate user-item interaction probabilities by learning individualized sensitivity bounds on the effect of unobserved confounders, thereby relaxing the homogeneity assumption required by global sensitivity analysis; a benchmark-guided variant (BPUID) further stabilizes training by anchoring to pre-trained models, and both versions outperform global methods on real-world data without any randomized controlled trial observations.
What carries the argument
Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), a framework that estimates a distinct sensitivity bound for each user-item pair on the influence of hidden confounders on interaction propensities through adversarial optimization.
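For orientation, one common way to write a per-pair sensitivity bound is an odds-ratio constraint in the style of marginal sensitivity models. The form below is an assumption borrowed from the general sensitivity-analysis literature, not the paper's stated definition. The symbols $\hat p_{u,i}$ (nominal propensity estimated from observed features), $\tilde p_{u,i}$ (true propensity, which also depends on the hidden confounder), and $\Gamma_{u,i} \ge 1$ (per-pair bound) are introduced here for illustration only.

```latex
\frac{1}{\Gamma_{u,i}}
\;\le\;
\frac{(1-\hat p_{u,i})\,\tilde p_{u,i}}{\hat p_{u,i}\,(1-\tilde p_{u,i})}
\;\le\;
\Gamma_{u,i}
\quad\Longrightarrow\quad
1 + \frac{1}{\Gamma_{u,i}}\left(\frac{1}{\hat p_{u,i}} - 1\right)
\;\le\;
\frac{1}{\tilde p_{u,i}}
\;\le\;
1 + \Gamma_{u,i}\left(\frac{1}{\hat p_{u,i}} - 1\right)
```

Under this reading, a global sensitivity bound is the special case $\Gamma_{u,i} = \Gamma$ for every pair, and the personalized setting lets the interval for the inverse propensity weight be narrower or wider from one user-item pair to the next.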
If this is right
- Recommender models can achieve higher predictive accuracy under hidden confounding by using interaction-specific rather than uniform sensitivity bounds.
- The homogeneity assumption of global sensitivity analysis is no longer required for practical deconfounding in missing-not-at-random settings.
- Adversarial optimization combined with optional benchmark guidance balances robustness against hidden confounders with maintained recommendation quality.
- Performance improvements hold across multiple real-world datasets without any need for randomized controlled trial data.
Where Pith is reading between the lines
- The same idea of learning interaction-specific bounds could be tested in other domains where confounding strength varies, such as personalized treatment effect estimation.
- One could examine whether the estimated bounds remain stable when the underlying recommendation model is changed from matrix factorization to modern neural architectures.
- Direct validation against small-scale randomized trials on the same users and items would test whether the data-driven bounds recover the effects observed in the randomized setting.
Load-bearing premise
User-item level sensitivity bounds can be reliably estimated from observational data alone via the proposed adversarial optimization strategy without introducing new biases or requiring external validation.
What would settle it
A controlled simulation in which the true magnitude of hidden confounding varies across user-item pairs according to a known generative process; if the method's estimated bounds fail to contain the true confounding effects or produce worse predictions than global bounds, the central claim is falsified.
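A minimal sketch of such a simulation, assuming an odds-ratio form of confounding: each pair gets a known confounding strength `gamma_true`, the hidden confounder shifts the log-odds of the nominal propensity by at most `log(gamma_true)`, and we check whether an interval on the inverse propensity contains the true value. All names, the three confounding levels, and the uniform sampling ranges are illustrative assumptions, not the paper's setup. The per-pair case uses the oracle `gamma_true`; in an actual test, the method's estimated bounds would take its place.

```python
import math
import random

def true_inverse_propensity(p_nominal, log_odds_shift):
    """Inverse of the true propensity after a hidden confounder shifts the log-odds."""
    logit = math.log(p_nominal / (1.0 - p_nominal)) + log_odds_shift
    p_true = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 / p_true

def weight_interval(p_nominal, gamma):
    """Interval for 1/p_true implied by an odds-ratio bound gamma >= 1."""
    base = 1.0 / p_nominal - 1.0
    return 1.0 + base / gamma, 1.0 + base * gamma

def simulate(n_pairs=2000, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        p_nominal = rng.uniform(0.05, 0.6)
        # Known generative process: confounding strength differs per pair.
        gamma_true = rng.choice([1.2, 2.0, 4.0])
        shift = rng.uniform(-math.log(gamma_true), math.log(gamma_true))
        w_true = true_inverse_propensity(p_nominal, shift)
        pairs.append((p_nominal, gamma_true, w_true))
    return pairs

def coverage_and_width(pairs, gamma_for_pair, tol=1e-9):
    """Fraction of pairs whose interval contains the truth, and mean interval width."""
    covered, width = 0, 0.0
    for p_nominal, gamma_true, w_true in pairs:
        lo, hi = weight_interval(p_nominal, gamma_for_pair(gamma_true))
        covered += (lo - tol <= w_true <= hi + tol)
        width += hi - lo
    return covered / len(pairs), width / len(pairs)

pairs = simulate()
global_cov, global_width = coverage_and_width(pairs, lambda g: 2.0)  # one Gamma for all pairs
oracle_cov, oracle_width = coverage_and_width(pairs, lambda g: g)    # per-pair oracle Gamma
print(f"global Gamma=2:  coverage={global_cov:.3f}, mean width={global_width:.3f}")
print(f"per-pair oracle: coverage={oracle_cov:.3f}, mean width={oracle_width:.3f}")
```

The global bound misses pairs whose true confounding exceeds it and is needlessly wide for weakly confounded pairs. The falsification test described above amounts to asking whether learned per-pair bounds behave more like the oracle row than the global row.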
Original abstract
Recommender systems often rely on observational user-item interaction data, which is prone to selection bias due to users' selective interactions with items. Inverse propensity weighting and doubly robust estimators effectively mitigate selection bias under observed confounding, but are unreliable in the presence of hidden confounders. Existing approaches relying on randomized controlled trials (RCTs) or global sensitivity bounds are constrained in practice: RCTs demand costly experimental data, while global sensitivity bounds presume a uniformly bounded effect of unmeasured confounders on propensities through sensitivity analysis, thereby neglecting heterogeneity across user-item interactions. To overcome this limitation, we propose a novel framework named Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), which estimates user-item level sensitivity bounds, thereby substantially relaxing the homogeneity assumption inherent in global sensitivity bounds. To ensure both robustness and predictive accuracy, we further develop an adversarial optimization strategy and propose a benchmark-guided variant (BPUID) that incorporates pre-trained models as stabilizing references. Extensive experiments on three real-world datasets demonstrate that our approach significantly outperforms global methods under hidden confounding, without requiring RCT data.
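To make the inverse propensity weighting idea in the abstract concrete, here is a toy sketch, assuming the propensities are known exactly. That assumption is precisely what fails under hidden confounding. The data-generating numbers (30% liked items, per-item errors of 0.2 and 0.8, observation probabilities of 0.5 and 0.1) are made up for illustration and do not come from the paper.

```python
import random

def simulate_mnar(n=20000, seed=0):
    """Toy missing-not-at-random data: liked items are far more likely to be observed."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        liked = rng.random() < 0.3
        error = 0.2 if liked else 0.8        # per-pair prediction error of some fixed model
        propensity = 0.5 if liked else 0.1   # probability that this pair is observed
        observed = rng.random() < propensity
        data.append((error, propensity, observed))
    return data

def naive_estimate(data):
    """Average error over observed pairs only; biased under selective observation."""
    errs = [e for e, _, o in data if o]
    return sum(errs) / len(errs)

def ips_estimate(data):
    """Inverse-propensity-scored error: observed errors reweighted by 1/propensity."""
    return sum(e / p for e, p, o in data if o) / len(data)

data = simulate_mnar()
ideal = sum(e for e, _, _ in data) / len(data)  # error over all pairs, unavailable in practice
naive = naive_estimate(data)
ips = ips_estimate(data)
print(f"ideal={ideal:.3f}  naive={naive:.3f}  ips={ips:.3f}")
```

With correct propensities the reweighted estimate tracks the full-data error, while the naive average is pulled toward the over-observed liked items. If the true propensity also depends on an unmeasured variable, the weights `1 / p` are wrong and this correction no longer holds, which is the gap sensitivity bounds are meant to cover.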
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID) framework to address hidden confounding in MNAR recommender systems. It estimates user-item level sensitivity bounds via adversarial optimization, relaxing the homogeneity assumption of global sensitivity bounds, and introduces a benchmark-guided variant (BPUID) that incorporates pre-trained models. The authors report that experiments on three real-world datasets show significant outperformance over global methods without requiring RCT data.
Significance. If the personalized bounds can be shown to be identifiable and non-circular, the framework would meaningfully advance robust recommendation by enabling heterogeneous sensitivity analysis without RCTs or uniform bounds, potentially improving practical deployment in observational settings with hidden confounders.
major comments (2)
- [§3] §3 (Adversarial Optimization for Personalized Bounds): The claim that user-item sensitivity bounds are recoverable from observational MNAR data alone via the min-max game is load-bearing but unsupported. Sensitivity parameters remain fundamentally unidentifiable under hidden confounding; the adversarial objective can be satisfied by arbitrary feasible intervals without anchoring to the true (unknown) confounding strength, directly weakening the assertion that personalized bounds reliably relax global homogeneity.
- [§5] §5 (Experiments): The reported outperformance on three datasets lacks any detail on bound estimation procedure, concrete form of the adversarial strategy, or statistical significance testing. Without these, it is impossible to verify whether the empirical gains substantiate the robustness claims or merely reflect optimization artifacts.
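To illustrate the kind of quantity the first major comment is about, the sketch below computes the range of an inverse-propensity-weighted loss when each weight is only known to lie in an interval set by a per-pair bound. It assumes an odds-ratio style interval on the inverse propensity and a plain unnormalized weighted sum. Both are assumptions made here, not the paper's objective, and the function names and example numbers are hypothetical.

```python
def weight_bounds(p_hat, gamma):
    """Interval for the inverse propensity given nominal p_hat and odds-ratio bound gamma >= 1."""
    base = 1.0 / p_hat - 1.0
    return 1.0 + base / gamma, 1.0 + base * gamma

def ips_loss_range(errors, p_hats, gammas, n_total):
    """Min and max of sum(w * e) / n_total with each w restricted to its own interval."""
    lo = hi = 0.0
    for e, p, g in zip(errors, p_hats, gammas):
        if e < 0:
            raise ValueError("errors must be non-negative")
        a, b = weight_bounds(p, g)
        # With non-negative errors, the maximizing adversary picks the upper
        # endpoint b and the minimizing one picks the lower endpoint a.
        lo += a * e
        hi += b * e
    return lo / n_total, hi / n_total

# Hypothetical observed pairs: per-pair errors and nominal propensities.
errors = [0.10, 0.40, 0.25, 0.05]
p_hats = [0.50, 0.20, 0.25, 0.80]
n_total = 10  # hypothetical number of user-item pairs, observed or not

global_range = ips_loss_range(errors, p_hats, [3.0] * 4, n_total)
personal_range = ips_loss_range(errors, p_hats, [1.5, 3.0, 1.2, 2.0], n_total)
print("global   Gamma=3 :", global_range)
print("per-pair Gammas  :", personal_range)
```

The interval width is driven entirely by the chosen bound values. Smaller per-pair bounds give a tighter worst case, but nothing in the observed data alone says which values are correct, which is the identifiability concern raised in the comment.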
minor comments (2)
- [§3] The manuscript would benefit from an explicit statement of the precise optimization objective (e.g., the loss and constraint forms) in the main text rather than deferring all details to the appendix.
- [§2] Notation for the sensitivity bounds (upper/lower per user-item pair) should be introduced consistently before the first use in the method description.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our work. We provide point-by-point responses to the major comments and outline the revisions we plan to make to improve the clarity and rigor of the manuscript.
Point-by-point responses
- Referee: [§3] §3 (Adversarial Optimization for Personalized Bounds): The claim that user-item sensitivity bounds are recoverable from observational MNAR data alone via the min-max game is load-bearing but unsupported. Sensitivity parameters remain fundamentally unidentifiable under hidden confounding; the adversarial objective can be satisfied by arbitrary feasible intervals without anchoring to the true (unknown) confounding strength, directly weakening the assertion that personalized bounds reliably relax global homogeneity.
  Authors: We concur that sensitivity parameters cannot be uniquely identified from observational MNAR data due to the presence of hidden confounding. Our framework does not purport to recover the ground-truth confounding strengths but rather employs an adversarial min-max optimization to compute personalized sensitivity bounds that are consistent with the observed data while allowing for heterogeneity across user-item pairs. This approach provides a practical relaxation of the global sensitivity bound assumption by deriving data-dependent intervals that ensure robustness. We will revise the manuscript in §3 to explicitly discuss the identifiability challenges and clarify that the bounds serve as conservative, feasible ranges for sensitivity analysis rather than precise estimates of the true effects. Additionally, we will provide more formal justification for the adversarial game's role in bounding the confounding impact.
  Revision: yes
- Referee: [§5] §5 (Experiments): The reported outperformance on three datasets lacks any detail on bound estimation procedure, concrete form of the adversarial strategy, or statistical significance testing. Without these, it is impossible to verify whether the empirical gains substantiate the robustness claims or merely reflect optimization artifacts.
  Authors: We appreciate this observation and agree that additional details are necessary for reproducibility and verification. In the revised version, we will augment §5 with a comprehensive description of the bound estimation procedure, including the specific implementation of the adversarial optimization strategy (e.g., the loss functions and training dynamics). We will also report the results of statistical significance tests to confirm that the performance improvements are statistically meaningful and not due to random optimization variations. These additions will strengthen the empirical validation of our claims.
  Revision: yes
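One simple form the promised significance testing could take is a paired sign-flip randomization test over repeated runs. The sketch below is an assumption about how such a check might look, not the authors' procedure. The function name is hypothetical, and the two score lists are placeholder numbers invented to exercise the code, not results from the paper.

```python
import random

def paired_signflip_pvalue(scores_a, scores_b, n_resamples=5000, seed=0):
    """Two-sided p-value for 'mean paired difference is zero' via random sign flips."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (same length)")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)

# Placeholder metric values for two methods over ten matched random seeds.
method_a = [0.712, 0.705, 0.718, 0.709, 0.714, 0.707, 0.716, 0.711, 0.713, 0.708]
method_b = [0.701, 0.699, 0.706, 0.703, 0.700, 0.702, 0.704, 0.698, 0.705, 0.701]

p_value = paired_signflip_pvalue(method_a, method_b)
print(f"p-value: {p_value:.4f}")
```

Pairing by seed (or by data split) matters here: it removes run-to-run variation shared by both methods, which is the "random optimization variations" the rebuttal mentions.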
Circularity Check
No significant circularity in PUID derivation chain
Full rationale
The paper proposes estimating user-item sensitivity bounds from observational MNAR data via an adversarial optimization strategy within the PUID framework, then applies them for deconfounding. No load-bearing step reduces by construction to a self-definition, a fitted parameter renamed as a prediction, or a self-citation chain. The BPUID variant references pre-trained models as stabilizers, but this is an external reference rather than an internal tautology. The central claim rests on the proposed optimization and empirical outperformance on three datasets, which supplies independent content outside the inputs. No equations or sections exhibit the specific reductions required for circularity flags.
Axiom & Free-Parameter Ledger
free parameters (1)
- personalized sensitivity bounds
axioms (1)
- domain assumption: The effect of hidden confounders on propensities varies across different user-item pairs
invented entities (1)
- PUID: no independent evidence