Explainability, risk modeling, and segmentation based customer churn analytics for personalized retention in e-commerce
Pith reviewed 2026-05-18 07:20 UTC · model grok-4.3
The pith
A three-part framework uses explainable AI, survival modeling, and RFM segments to attribute churn drivers, time interventions, and prioritize e-commerce customers for retention.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that three linked methods together support personalized retention strategies in e-commerce: explainable AI quantifies how individual features contribute to churn probability, survival analysis models the time until churn occurs, and RFM profiling classifies customers by transactional history. Combined, they yield attributions of churn drivers, estimates of intervention timing, and prioritized customer segments.
What carries the argument
The integrated three-component framework that joins feature contribution scores from explainable AI, time-to-churn hazard functions from survival analysis, and behavioral clusters from RFM analysis to guide retention decisions.
If this is right
- Firms can trace specific drivers such as low purchase frequency or declining recency to particular customer actions and adjust product or pricing offers accordingly.
- Survival estimates supply concrete time windows during which a retention message has the highest chance of preventing departure.
- RFM segments let marketers concentrate limited resources on high-value groups that show elevated risk rather than contacting everyone uniformly.
- The combined output shifts retention planning from generic campaigns toward evidence-based, segment-level actions that can be tested for lift.
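The segment-prioritization idea in the list above can be made concrete with a minimal RFM scoring sketch. This is not the authors' implementation: the transaction rows, the snapshot date, the three-bin rank scoring, and the "high value but going quiet" rule are all illustrative assumptions.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, order_date, order_value).
transactions = [
    ("A", date(2025, 1, 5), 120.0),
    ("A", date(2025, 3, 20), 80.0),
    ("A", date(2025, 6, 1), 200.0),
    ("B", date(2024, 11, 2), 40.0),
    ("C", date(2025, 5, 15), 300.0),
    ("C", date(2025, 5, 28), 150.0),
    ("D", date(2024, 9, 10), 25.0),
]
snapshot = date(2025, 6, 30)  # assumed analysis date


def rfm_table(rows, as_of):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for cust, day, value in rows:
        last, freq, total = table.get(cust, (None, 0, 0.0))
        last = day if last is None or day > last else last
        table[cust] = (last, freq + 1, total + value)
    return {c: ((as_of - last).days, f, m) for c, (last, f, m) in table.items()}


def rank_score(values, higher_is_better, bins=3):
    """Map each customer to a 1..bins score by rank (bins = best)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {c: 1 + (i * bins) // n for i, c in enumerate(ordered)}


rfm = rfm_table(transactions, snapshot)
r = rank_score({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
f = rank_score({c: v[1] for c, v in rfm.items()}, higher_is_better=True)
m = rank_score({c: v[2] for c, v in rfm.items()}, higher_is_better=True)
scores = {c: (r[c], f[c], m[c]) for c in rfm}

# Assumed prioritization rule: meaningful spend (M >= 2) but not recently active (R <= 2).
priority = sorted(c for c, (rs, fs, ms) in scores.items() if ms >= 2 and rs <= 2)
print(scores, priority)
```

In practice the bin count and the prioritization rule would be tuned to the business; rank-based binning is used here only because it needs no external library.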
Where Pith is reading between the lines
- The same structure could be tested on subscription or service datasets where repeat-purchase patterns differ from one-off retail purchases.
- Interactions between RFM groups and specific feature attributions might surface segment-specific churn reasons that single-method studies miss.
- Real-time streaming of transaction data could turn the framework into a trigger system that alerts teams when a customer's risk window opens.
Load-bearing premise
Typical e-commerce transaction logs already hold enough detail for these three standard techniques to deliver stable, actionable explanations of churn causes and timing without needing major new data collection or domain tweaks.
What would settle it
Run the framework on a held-out e-commerce dataset, apply the resulting segment-specific interventions, and measure whether churn rates drop more than under a simple baseline model or random targeting.
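One way to score such a test is a two-proportion comparison of churn rates between a targeted group and a baseline group. The sketch below is an assumption about how the comparison could be run, not something the paper reports; the function name and the counts in the usage line are hypothetical.

```python
import math


def churn_lift(churned_treated, n_treated, churned_control, n_control):
    """Compare churn rates between a targeted arm and a baseline arm.

    Returns (absolute churn reduction, z statistic, one-sided p-value).
    """
    p_t = churned_treated / n_treated
    p_c = churned_control / n_control
    diff = p_c - p_t  # positive when the targeted arm churns less
    pooled = (churned_treated + churned_control) / (n_treated + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_control))
    z = diff / se if se > 0 else 0.0
    # One-sided p-value for the alternative "targeted churn rate is lower".
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return diff, z, p_value


# Hypothetical counts: 80 of 1000 targeted customers churn vs 120 of 1000 under the baseline.
diff, z, p_value = churn_lift(80, 1000, 120, 1000)
print(f"reduction={diff:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A normal-approximation test like this only makes sense with reasonably large arms and random assignment; with small or self-selected groups a different design would be needed.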
Original abstract
In online retail, customer acquisition typically incurs higher costs than customer retention, motivating firms to invest in churn analytics. However, many contemporary churn models operate as opaque black boxes, limiting insight into the determinants of attrition, the timing of retention opportunities, and the identification of high-risk customer segments. Accordingly, the emphasis should shift from prediction alone to the design of personalized retention strategies grounded in interpretable evidence. This study advances a three-component framework that integrates explainable AI to quantify feature contributions, survival analysis to model time-to-event churn risk, and RFM profiling to segment customers by transactional behaviour. In combination, these methods enable the attribution of churn drivers, estimation of intervention windows, and prioritization of segments for targeted actions, thereby supporting strategies that reduce attrition and strengthen customer loyalty.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a three-component framework for customer churn analytics in e-commerce that integrates explainable AI to quantify feature contributions, survival analysis to model time-to-event churn risk, and RFM profiling to segment customers by transactional behaviour. In combination, these methods are claimed to enable attribution of churn drivers, estimation of intervention windows, and prioritization of segments for targeted retention actions.
Significance. If the framework were empirically validated on real transaction data with demonstrated consistency between components and actionable outputs, it could advance interpretable churn management beyond black-box predictors. The absence of any fitted models, results, or validation currently prevents assessment of whether the integration yields reliable attributions or intervention estimates.
major comments (3)
- [Proposed Framework] Proposed Framework section: The manuscript sketches the joint use of XAI, survival analysis, and RFM on typical transaction data but supplies no empirical application, dataset details, fitted models, SHAP/survival/RFM output tables, or baseline comparisons, leaving the load-bearing assumption that standard e-commerce logs will yield stable, non-conflicting insights unsupported.
- [Integration subsection] Integration subsection: No demonstration is provided of how censoring in survival times interacts with RFM recency scores or how XAI attributions are aggregated across survival strata, so the claim that the components combine into reliable attributions and intervention windows remains untested.
- [Validation and Results] Validation and Results: The central claim that the framework supports strategies to reduce attrition rests on an unvalidated assumption of component compatibility; without cross-validation or consistency checks, the practical utility for personalized retention cannot be evaluated.
minor comments (2)
- The abstract and framework description would benefit from a schematic diagram showing data flow between the three components.
- Consider adding references to prior work on hybrid XAI-survival models in churn prediction to better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We agree that the current manuscript presents the framework conceptually without sufficient empirical demonstration, and we will revise it substantially by adding a full empirical application on real transaction data, including all requested outputs, integration details, and validation metrics. This will directly address the concerns about unsupported assumptions and untested compatibility.
- Referee: [Proposed Framework] Proposed Framework section: The manuscript sketches the joint use of XAI, survival analysis, and RFM on typical transaction data but supplies no empirical application, dataset details, fitted models, SHAP/survival/RFM output tables, or baseline comparisons, leaving the load-bearing assumption that standard e-commerce logs will yield stable, non-conflicting insights unsupported.
  Authors: We accept this assessment. The original submission emphasized the framework architecture over implementation. In revision we will add a complete 'Empirical Application' section that applies the framework to a public e-commerce transaction dataset (e.g., the UCI Online Retail II dataset or equivalent). This will include: dataset description and preprocessing, fitted models (XGBoost with SHAP for XAI, Cox PH or parametric survival for time-to-churn, and standard RFM scoring), full output tables (SHAP summary plots and values, Kaplan-Meier or Cox survival curves by segment, RFM segment profiles), and baseline comparisons (e.g., against logistic regression and random survival forests). These additions will provide concrete evidence that standard logs produce stable, non-conflicting insights.
  revision: yes
- Referee: [Integration subsection] Integration subsection: No demonstration is provided of how censoring in survival times interacts with RFM recency scores or how XAI attributions are aggregated across survival strata, so the claim that the components combine into reliable attributions and intervention windows remains untested.
  Authors: We agree that explicit integration mechanics must be shown. The revised Integration subsection will contain a worked example using the empirical data. It will demonstrate: (i) handling of right-censored observations when computing RFM recency (e.g., last observed transaction date for censored customers versus actual churn date), (ii) stratification of customers by survival risk quantiles, and (iii) aggregation of SHAP attributions across strata via weighted averaging or stratum-specific summaries. We will also derive and tabulate intervention windows from the survival functions for each RFM segment, thereby testing the claim that the components produce reliable combined outputs.
  revision: yes
- Referee: [Validation and Results] Validation and Results: The central claim that the framework supports strategies to reduce attrition rests on an unvalidated assumption of component compatibility; without cross-validation or consistency checks, the practical utility for personalized retention cannot be evaluated.
  Authors: We acknowledge that the practical utility claim requires empirical support. The revised manuscript will add a 'Validation and Results' section that reports: k-fold cross-validation performance for the survival and XAI components (concordance index, AUC, calibration plots), consistency checks between components (e.g., alignment of SHAP-ranked features with survival model coefficients and with RFM segment churn rates), and simulated retention-lift analysis showing expected reduction in attrition when targeting high-risk segments with personalized interventions. These checks will allow direct evaluation of component compatibility and real-world utility.
  revision: yes
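The censoring and intervention-window points raised in the responses above can be illustrated with a textbook Kaplan-Meier (product-limit) estimator. The rebuttal does not specify an implementation, so this is only a sketch under stated assumptions: the durations and churn flags are made up, a censored customer contributes the time up to their last observation, and reading the "intervention window" as the first time the survival estimate drops to a chosen threshold is an illustrative choice.

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each distinct time where at least one churn occurs.

    durations: time to churn, or time to last observation if censored.
    churned:   1 if churn was observed, 0 if the customer is right-censored.
    """
    data = sorted(zip(durations, churned))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all customers who share the same duration t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            removed += 1
            i += 1
        if events:
            surv *= 1 - events / at_risk
            curve.append((t, surv))
        # Censored customers leave the risk set without changing S(t).
        at_risk -= removed
    return curve


def first_time_below(curve, threshold):
    """First time at which the estimated survival is at or below threshold."""
    for t, s in curve:
        if s <= threshold:
            return t
    return None


# Hypothetical days-to-churn (or days observed) for eight customers in one segment.
durations = [30, 45, 45, 60, 90, 90, 120, 150]
churned = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(durations, churned)
window = first_time_below(curve, 0.5)  # assumed threshold: median survival
print(curve, window)
```

Running the same function separately for each RFM segment would give one curve, and one candidate window, per segment. A production analysis would more likely rely on an established survival library and report confidence bands.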
Circularity Check
No circularity: proposal integrates standard existing methods without derivations or self-referential fitting
The manuscript advances a descriptive three-component framework combining explainable AI for feature contributions, survival analysis for time-to-event modeling, and RFM for segmentation. No equations, parameter fittings, or derivation chains appear in the abstract or described content. Each component is an off-the-shelf technique applied to typical transaction data, with no self-definition, fitted-input-as-prediction, or load-bearing self-citation that reduces the central claim to its own inputs by construction. The framework sketch is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard churn-modeling techniques (XAI, survival analysis, RFM) can be combined to yield superior actionable insights on e-commerce data.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: three-component framework that integrates explainable AI to quantify feature contributions, survival analysis to model time-to-event churn risk, and RFM profiling
- IndisputableMonolith/Foundation/ArrowOfTime.lean, arrow_from_z
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Kaplan–Meier estimator ... survival function S(t)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.