Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment

Jiaqi Xu; Jing Chen; Qiwei Zhong; Xiang Ao; Yang Liu; Yiran Qiao

arxiv: 2606.02946 · v1 · pith:DQ4M4H3Jnew · submitted 2026-06-01 · 💻 cs.LG · cs.CR

Yiran Qiao, Jing Chen, Jiaqi Xu, Yang Liu, Qiwei Zhong, Xiang Ao

Pith reviewed 2026-06-28 15:10 UTC · model grok-4.3

classification 💻 cs.LG cs.CR

keywords live streaming · risk assessment · tactical OOD shift · counterfactual decoupling · latent causal modeling · adversarial narrative · intent stability

The pith

LPCD anchors live streaming risk predictions on stable malicious intent by enforcing latent counterfactual consistency despite changing narrative tactics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles tactical out-of-distribution shifts in live streaming risk assessment, where actors keep fixed malicious objectives but redesign narrative packaging to evade detectors. Existing OOD methods struggle because intent and tactics evolve together and raw counterfactuals are ill-defined. LPCD addresses this from a latent causal view by modeling intent and narrative variation separately at the latent level. It enforces latent counterfactual consistency so that risk scores stay tied to the causally stable intent rather than surface tactics. A parameter-free calibration step at inference further reduces tactic-induced shifts, with experiments on industrial datasets and production traffic showing gains over baselines.

Core claim

LPCD enables counterfactual reasoning under adversarial tactical re-packaging by modeling intent and narrative variation at the latent level, and enforces latent counterfactual consistency to anchor risk prediction on causally stable malicious intent.

What carries the argument

Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework that separates intent from narrative at the latent level and enforces latent counterfactual consistency to stabilize predictions.
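
As a rough illustration of what separating intent from narrative at the latent level could look like, the sketch below splits a session representation into an intent part and a tactic part, then builds latent counterfactuals by swapping tactic parts between sessions. The split point `intent_dim`, the function names, and the swap-by-rotation rule are assumptions made for illustration; the abstract does not specify how LPCD performs the factorization.

```python
from typing import List, Tuple

Vector = List[float]


def split_latent(h: Vector, intent_dim: int) -> Tuple[Vector, Vector]:
    """Split a session representation into (intent, tactic) parts.

    Hypothetical: a fixed slice stands in for a learned disentangling encoder.
    """
    if not 0 < intent_dim < len(h):
        raise ValueError("intent_dim must leave room for both parts")
    return h[:intent_dim], h[intent_dim:]


def latent_counterfactuals(batch: List[Vector], intent_dim: int) -> List[Vector]:
    """Keep each session's intent part and borrow the next session's tactic part."""
    parts = [split_latent(h, intent_dim) for h in batch]
    n = len(parts)
    return [parts[k][0] + parts[(k + 1) % n][1] for k in range(n)]


batch = [[1.0, 2.0, 0.1, 0.2], [3.0, 4.0, 0.3, 0.4], [5.0, 6.0, 0.5, 0.6]]
swapped = latent_counterfactuals(batch, intent_dim=2)
print(swapped[0])  # intent of session 0 combined with tactic of session 1
```

Because the swap happens on latent vectors, no raw-level counterfactual stream ever has to be constructed, which is the property the paper relies on.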

If this is right

Risk scores stay anchored on intent when only narrative packaging changes.
Latent-level modeling bypasses the need for well-defined raw-level counterfactual examples.
Lightweight inference-time calibration mitigates distribution shifts without model retraining.
The approach supports continuous moderation of evolving adversarial risks in production live streaming systems.
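
One way to picture a parameter-free inference-time calibration is to re-standardize latent features with statistics computed from the incoming test batch, in the spirit of adaptive batch normalization. This is a stand-in chosen for illustration, not the calibration LPCD actually uses, which the abstract does not describe; `calibrate_batch` and `eps` are hypothetical names.

```python
import math
from typing import List


def calibrate_batch(features: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Standardize each feature dimension using only the current batch's statistics.

    No parameter is learned or stored, so nothing needs retraining.
    """
    n = len(features)
    dims = len(features[0])
    means = [sum(row[d] for row in features) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n + eps)
        for d in range(dims)
    ]
    return [[(row[d] - means[d]) / stds[d] for d in range(dims)] for row in features]


train_like = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
# Simulated tactic-induced shift: every feature is offset by a constant.
shifted = [[x + 10.0 for x in row] for row in train_like]

a = calibrate_batch(train_like)
b = calibrate_batch(shifted)
max_gap = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_gap < 1e-9)
```

A constant offset is removed by this step, but more structured shifts would not be, so a calibration of this kind can only complement the training-time constraint.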

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent separation might reduce retraining frequency in other intent-stable but presentation-variable domains such as comment moderation or transaction fraud.
If the latent consistency property holds across platforms, LPCD could serve as a reusable module for any detector facing tactical evolution.
Controlled ablation on synthetic intent-tactic pairs would directly measure how much the consistency constraint contributes versus the calibration step.
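
A controlled ablation of this kind needs data in which intent and tactic are known by construction. The generator below is a hypothetical sketch: the label depends only on `intent`, while `tactic` agrees with the label often during training and only at chance level at test time. All names, including the parameter `tactic_agreement`, are assumptions and not part of the paper.

```python
import random
from typing import List, Tuple

Sample = Tuple[int, int, int]  # (intent, tactic, label)


def make_pairs(n: int, tactic_agreement: float, seed: int = 0) -> List[Sample]:
    """Generate samples whose label equals intent; tactic matches the label
    with probability `tactic_agreement` (0.5 means tactic is uninformative)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        intent = rng.randint(0, 1)
        label = intent  # risk is determined by intent alone
        tactic = label if rng.random() < tactic_agreement else 1 - label
        samples.append((intent, tactic, label))
    return samples


def accuracy(samples: List[Sample], feature_index: int) -> float:
    """Accuracy of a predictor that copies one feature (0 = intent, 1 = tactic)."""
    return sum(s[feature_index] == s[2] for s in samples) / len(samples)


train = make_pairs(2000, tactic_agreement=0.95, seed=1)
test = make_pairs(2000, tactic_agreement=0.50, seed=2)

print(accuracy(train, 1), accuracy(test, 1))  # tactic shortcut: high in training, near chance at test
print(accuracy(train, 0), accuracy(test, 0))  # intent feature: perfect on both splits
```

Training a detector with and without the consistency constraint on such data, then with and without calibration, would attribute the gain to each component separately.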

Load-bearing premise

Malicious intent remains stable and separable from narrative tactics at the latent level.

What would settle it

A test set in which the identical malicious intent is delivered through entirely new narrative packaging and LPCD accuracy drops to match or fall below standard OOD baselines.
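
Once predictions on such a re-packaged test set are available, the check could be scored with a simple comparison. The helper below is a hypothetical sketch; the function names, the toy predictions, and the decision rule (LPCD accuracy at or below the baseline counts against the claim) are illustrative assumptions.

```python
from typing import List


def accuracy(preds: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def claim_undermined(lpcd_preds: List[int], baseline_preds: List[int],
                     labels: List[int]) -> bool:
    """True when LPCD does no better than the baseline on new packaging."""
    return accuracy(lpcd_preds, labels) <= accuracy(baseline_preds, labels)


# Toy labels for sessions with known intent but unseen narrative packaging.
labels = [1, 1, 1, 0, 0, 1]
lpcd_preds = [1, 1, 0, 0, 0, 1]      # hypothetical outputs
baseline_preds = [1, 0, 0, 0, 1, 1]  # hypothetical outputs

print(claim_undermined(lpcd_preds, baseline_preds, labels))
```

A real evaluation would also need enough sessions, and some measure of variance, for the comparison to be meaningful.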

Figures

Figures reproduced from arXiv: 2606.02946 by Jiaqi Xu, Jing Chen, Qiwei Zhong, Xiang Ao, Yang Liu, Yiran Qiao.

**Figure 2.** Overview of LPCD. In training flow: (a) Latent Representation Disentanglement factorizes session representations into ...

**Figure 3.** t-SNE visualization of decoupled representations.

**Figure 4.** Hyperparameter sensitivity analysis on the May ...

Original abstract

Live streaming has emerged as a primary medium for social interaction and digital commerce, yet it is increasingly plagued by sophisticated risks. A fundamental challenge in this domain is tactical out-of-distribution (OOD) shift: while malicious actors maintain stable underlying objectives, they continuously redesign narrative packaging to evade detection. Such adversarial shifts expose critical limitations of existing OOD generalization paradigms, whose assumptions are difficult to satisfy in the presence of tightly coupled intent-tactic evolution and ill-defined raw-level counterfactuals. In this paper, we tackle this issue from a latent causal perspective and propose Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework for robust live streaming risk assessment. LPCD enables counterfactual reasoning under adversarial tactical re-packaging by modeling intent and narrative variation at the latent level, and enforces latent counterfactual consistency to anchor risk prediction on causally stable malicious intent. At inference time, LPCD applies a lightweight, parameter-free calibration to further mitigate tactic-induced distribution shifts. Extensive experiments on large-scale industrial datasets and online production traffic demonstrate that LPCD consistently outperforms state-of-the-art baselines, validating its effectiveness in moderating evolving adversarial risks in real-world live streaming. The project page is available at https://qiaoyran.github.io/LiveStreamingRiskAssessment/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LPCD applies latent counterfactual consistency to tactical OOD in live streaming moderation but provides no derivation showing how the consistency isolates stable intent from coupled tactics without raw counterfactuals.

The paper's main contribution is a plug-in framework called LPCD that treats tactical OOD shifts in live streaming risk detection as a latent causal problem. It models intent and narrative variation separately in latent space, adds a consistency constraint to keep predictions tied to stable malicious intent, and uses a parameter-free calibration step at inference. The authors report that this beats baselines on large industrial datasets and live production traffic.

What the work does reasonably well is frame a practical, recurring issue in content moderation: attackers keep the same goal but keep changing the surface form. The emphasis on a lightweight inference-time fix is a sensible engineering choice for deployment.

The soft spot is the missing mechanism. The abstract claims the latent consistency step anchors predictions on causally stable intent, yet it gives no derivation or loss formulation showing how this separation is achieved when only observational data with entangled intent-tactic pairs is available. Without raw-level counterfactuals, any learned consistency could simply reflect correlations the model already sees. Judged against the text provided, the stress-test objection on this point holds.

This paper is aimed at applied researchers and engineers building moderation systems for live platforms. A reader working on causal OOD methods might skim it for the application but would need the full equations and ablations to judge novelty. The experiments on real traffic give it some weight, but the lack of technical detail keeps the central claim provisional.

I would send it to peer review. The problem is concrete and the production results are worth checking, even if the causal grounding needs substantial clarification.

Referee Report

1 major / 1 minor

Summary. The paper proposes Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework for live streaming risk assessment under tactical OOD shifts. It adopts a latent causal perspective to model intent and narrative variation separately at the latent level, enforces latent counterfactual consistency to anchor predictions on causally stable malicious intent, and applies a lightweight parameter-free calibration at inference to mitigate tactic-induced shifts. Experiments on large-scale industrial datasets and online production traffic show consistent outperformance over state-of-the-art baselines.

Significance. If the central claims hold, the work provides a novel latent-level approach to handling adversarial tactical re-packaging in risk detection, with the parameter-free calibration as a practical strength that avoids additional fitting. This could extend to other domains involving evolving adversarial behaviors where raw counterfactuals are ill-defined.

major comments (1)

[Abstract] The claim that LPCD 'enforces latent counterfactual consistency to anchor risk prediction on causally stable malicious intent' is load-bearing but rests on the unvalidated assumption that intent remains separable from tactics at the latent level. No derivation is provided showing how the consistency loss isolates stable intent from observational data with coupled intent-tactic pairs, raising the possibility that learned consistency reflects spurious correlations rather than causal stability.

minor comments (1)

[Abstract] The description of 'extensive experiments' lacks any mention of specific datasets, evaluation metrics, or baseline methods, making it difficult to assess the empirical support for the outperformance claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We provide a point-by-point response below and will incorporate clarifications in the revised manuscript.

Referee: [Abstract] The claim that LPCD 'enforces latent counterfactual consistency to anchor risk prediction on causally stable malicious intent' is load-bearing but rests on the unvalidated assumption that intent remains separable from tactics at the latent level. No derivation is provided showing how the consistency loss isolates stable intent from observational data with coupled intent-tactic pairs, raising the possibility that learned consistency reflects spurious correlations rather than causal stability.

Authors: We appreciate the referee's emphasis on the need for stronger justification of the separability assumption. LPCD uses a disentangled latent encoder that explicitly factors the representation into an intent latent z_i (stable malicious objective) and a tactic latent z_t (narrative packaging). The consistency loss is L_cons = E[||p(y | z_i, z_t) - p(y | z_i, z_t')||], where z_t' is a sampled counterfactual tactic variation; minimizing this forces the predictor to ignore z_t variations. While we do not claim full causal identifiability from observational data alone (which would require stronger assumptions such as independent causal mechanisms), the architecture and loss are derived from the latent causal perspective stated in Section 3, and the parameter-free calibration at inference further decouples tactic shifts. Large-scale experiments on industrial datasets with documented tactical OOD shifts show consistent gains precisely in those regimes, which would be unlikely if the consistency merely captured spurious correlations. We will add a short formal sketch of the consistency objective and its intended effect in the revised Section 3 and update the abstract wording for precision.

revision: yes
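
The consistency objective quoted in the rebuttal can be sketched numerically. The code below is an illustration under stated assumptions: a linear-logistic predictor stands in for p(y | z_i, z_t), the counterfactual tactic z_t' is drawn by pairing each sample with the next sample's tactic latent, and the norm is taken as the absolute difference of scalar risk probabilities. None of these choices is confirmed by the source.

```python
import math
from typing import List, Sequence


def predict(z_i: Sequence[float], z_t: Sequence[float],
            w_i: Sequence[float], w_t: Sequence[float]) -> float:
    """Toy risk probability p(y=1 | z_i, z_t) from a linear-logistic model."""
    logit = sum(a * b for a, b in zip(w_i, z_i)) + sum(a * b for a, b in zip(w_t, z_t))
    return 1.0 / (1.0 + math.exp(-logit))


def consistency_loss(z_i_batch: List[Sequence[float]], z_t_batch: List[Sequence[float]],
                     w_i: Sequence[float], w_t: Sequence[float]) -> float:
    """Mean |p(y | z_i, z_t) - p(y | z_i, z_t')| with z_t' taken from the next sample."""
    n = len(z_i_batch)
    total = 0.0
    for k in range(n):
        z_t_cf = z_t_batch[(k + 1) % n]  # assumed counterfactual tactic
        total += abs(predict(z_i_batch[k], z_t_batch[k], w_i, w_t)
                     - predict(z_i_batch[k], z_t_cf, w_i, w_t))
    return total / n


z_i = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_t = [[0.5], [-0.5], [2.0]]
w_i = [1.5, -1.0]

print(consistency_loss(z_i, z_t, w_i, w_t=[0.0]))        # 0.0: predictor ignores tactics
print(consistency_loss(z_i, z_t, w_i, w_t=[1.0]) > 0.0)  # True: tactics leak into the score
```

The sketch shows only that the loss vanishes when the predictor ignores z_t; it says nothing about whether z_i itself captures intent, which is the referee's open question.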

Circularity Check

0 steps flagged

No circularity exhibited; derivation self-contained on provided text

The abstract describes LPCD as modeling intent and narrative variation at the latent level then enforcing latent counterfactual consistency, with a parameter-free calibration at inference. No equations, derivations, fitted-parameter renamings, or self-citations appear in the supplied text. Without any load-bearing step that reduces a claimed prediction or consistency enforcement to an input by construction, the central claim cannot be shown to collapse into its own assumptions. This is the expected honest non-finding when the manuscript supplies no explicit reduction to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no information available on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5798 in / 979 out tokens · 21301 ms · 2026-06-28T15:10:52.469614+00:00 · methodology

Reference graph

Works this paper leans on

50 extracted references · 6 canonical work pages · 2 internal anchors

[1] Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, and Irina Rish. 2021. Invariance principle meets information bottleneck for out-of-distribution generalization. Advances in Neural Information Processing Systems 34 (2021), 3438–3450.
[2] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019).
[3] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. 2016. Domain separation networks. Advances in neural information processing systems 29 (2016).
[4] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. Simclr: A simple framework for contrastive learning of visual representations. In International Conference on Learning Representations, Vol. 2. PMLR New York, NY, USA.
[5] Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, and Abolfazl Razi. 2024. TimeMIL: advancing multivariate time series classification via a time-aware multiple instance learning. In Proceedings of the 41st International Conference on Machine Learning. 7190–7206.
[6] Dawei Cheng, Yao Zou, Sheng Xiang, and Changjun Jiang. 2025. Graph neural networks for financial fraud detection: a review. Frontiers of Computer Science 19, 9 (2025), 1–15.
[7] Elliot Creager, Jörn-Henrik Jacobsen, and Richard Zemel. 2021. Environment inference for invariant learning. In International Conference on Machine Learning. PMLR, 2189–2200.
KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea. Yiran Qiao et al.
[8] Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. 2020. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM international conference on information & knowledge management. 315–324.
[9] Joseph Early, Gavin KC Cheung, Kurt Cutajar, Hanting Xie, Jas Kandola, and Niall Twomey. 2024. Inherently Interpretable Time Series Classification via Multiple Instance Learning. In ICLR.
[10] Amir Feder, Katherine A Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E Roberts, et al. 2022. Causal inference in natural language processing: Estimation, prediction, interpretation and beyond. Transactions of the Association for Computational Linguistics 10 (2022), 1138–1158.
[11] Jia Guo, Guannan Liu, Yuan Zuo, and Junjie Wu. 2018. Learning sequential behavior representations for fraud detection. In 2018 IEEE international conference on data mining (ICDM). IEEE, 127–136.
[12] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-vae: Learning basic visual concepts with a constrained variational framework. In International conference on learning representations.
[13] Mengda Huang, Yang Liu, Xiang Ao, Kuan Li, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. 2022. Auc-oriented graph neural network for fraud detection. In Proceedings of the ACM web conference 2022. 1311–1321.
[14] Jaeseok Jang and Hyuk-Yoon Kwon. 2025. TAIL-MIL: Time-aware and instance-learnable multiple instance learning for multivariate time series anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 17582–17589.
[15] Hyunjik Kim and Andriy Mnih. 2018. Disentangling by factorising. In International conference on machine learning. PMLR, 2649–2658.
[16] Taero Kim, Subeen Park, Sungjun Lim, Yonghan Jung, Krikamol Muandet, and Kyungwoo Song. 2025. Sufficient invariant learning for distribution shift. In Proceedings of the Computer Vision and Pattern Recognition Conference. 4958–4967.
[17] Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020).
[18] David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. 2021. Out-of-distribution generalization via risk extrapolation (rex). In International conference on machine learning. PMLR, 5815–5826.
[19] Alyssa Lees, Vinh Q Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, Donald Metzler, and Lucy Vasserman. 2022. A new generation of perspective api: Efficient multilingual character-level transformers. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining. 3197–3207.
[20] Yanghao Li, Naiyan Wang, Jianping Shi, Xiaodi Hou, and Jiaying Liu. 2018. Adaptive batch normalization for practical domain adaptation. Pattern Recognition 80 (2018), 109–117.
[21] Zhao Li, Haishuai Wang, Peng Zhang, Pengrui Hui, Jiaming Huang, Jian Liao, Ji Zhang, and Jiajun Bu. 2021. Live-streaming fraud detection: A heterogeneous graph neural network approach. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3670–3678.
[22] Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, and Tie-Yan Liu. 2021. Learning causal semantic representation for out-of-distribution prediction. Advances in Neural Information Processing Systems 34 (2021), 6155–6170.
[23] Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, and B Aditya Prakash. 2024. Time-series forecasting for out-of-distribution generalization using invariant learning. In Proceedings of the 41st International Conference on Machine Learning. 31312–31325.
[24] Jiashuo Liu, Zheyan Shen, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, and Peng Cui. 2021. Towards out-of-distribution generalization: A survey. arXiv preprint arXiv:2108.13624 (2021).
[25] Yuting Liu, Qiang Zhou, Hanzhe Li, Fuzhen Zhuang, and Jingjing Gu. 2025. Long-term urban flow prediction against data distribution shift: A causal perspective. IEEE Transactions on Knowledge and Data Engineering (2025).
[26] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations. https://openreview.net/forum?id=Bkg6RiCqY7
[27] Xingyu Lu, Tianke Zhang, Chang Meng, Xiaobei Wang, Jinpeng Wang, Yi-Fan Zhang, Shisong Tang, Changyi Liu, Haojie Ding, Kaiyu Jiang, et al. 2025. Vlm as policy: Common-law content moderation framework for short video platform. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 4682–4693.
[28] Divyat Mahajan, Shruti Tople, and Amit Sharma. 2021. Domain generalization using causal matching. In International conference on machine learning. PMLR, 7313–7324.
[29] Khalid Oublal, Said Ladjal, David Benhaiem, Emmanuel LE BORGNE, and François Roueff. 2024. Disentangling time series representations via contrastive independence-of-support on l-variational inference. In The Twelfth International Conference on Learning Representations.
[30] Judea Pearl. 2009. Causality. Cambridge university press.
[31] Yiran Qiao, Jing Chen, Xiang Ao, Qiwei Zhong, Yang Liu, and Qing He. 2026. Live or Lie: Action-Aware Capsule Multiple Instance Learning for Risk Assessment in Live Streaming Platforms. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 1182–1193.
[32] Yiran Qiao, Yateng Tang, Xiang Ao, Qi Yuan, Ziming Liu, Chen Shen, and Xuehao Zheng. 2024. Financial Risk Assessment via Long-term Payment Behavior Sequence Folding. In 2024 IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, Los Alamitos, CA, USA, 410–419. doi:10.1109/ICDM59182.2024.00048
[33] Yiran Qiao, Ningtao Wang, Yuncong Gao, Yang Yang, Xing Fu, Weiqiang Wang, and Xiang Ao. 2025. Online Fraud Detection via Test-Time Retrieval-Based Representation Enrichment. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 12470–12478.
[34] Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, and Percy Liang. 2020. Distributionally Robust Neural Networks. In International Conference on Learning Representations. https://openreview.net/forum?id=ryxGuJrFvS
[35] Axel Sauer and Andreas Geiger. 2021. Counterfactual Generative Networks. In International Conference on Learning Representations.
[36] Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition. 815–823.
[37] Fengzhao Shi, Yanan Cao, Yanmin Shang, Yuchen Zhou, Chuan Zhou, and Jia Wu. 2022. H2-fdetector: A gnn-based fraud detector with homophilic and heterophilic connections. In Proceedings of the ACM web conference 2022. 1486–1494.
[39] Baochen Sun and Kate Saenko. 2016. Deep coral: Correlation alignment for deep domain adaptation. In European conference on computer vision. Springer, 443–450.
[40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).
[41] Victor Veitch, Alexander D’Amour, Steve Yadlowsky, and Jacob Eisenstein. 2021. Counterfactual invariance to spurious correlations in text classification. Advances in neural information processing systems 34 (2021), 16196–16208.
[42] Zixuan Wang, Yu Sun, Hongwei Wang, Baoyu Jing, Xiang Shen, Xin Luna Dong, Zhuolin Hao, Hongyu Xiong, and Yang Song. 2025. Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. 1104–1112.
[43] Ziming Wang, Qianru Wu, Baolin Zheng, Junjie Wang, Kaiyu Huang, and Yanjie Shi. 2023. Sequence as genes: an user behavior modeling framework for fraud transaction detection in e-commerce. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5194–5203.
[44] Xin Wu, Fei Teng, Xingwang Li, Ji Zhang, Qiang Duan, and Tianrui Li. 2026. Out-of-distribution generalization in time series: A survey. Information Fusion (2026), 104336.
[45] Fei Xiao, Shaofeng Cai, Gang Chen, HV Jagadish, Beng Chin Ooi, and Meihui Zhang. 2024. VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 6025–6036.
[46] Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, and Liu Ren. 2020. Improve unsupervised domain adaptation with mixup training. arXiv preprint arXiv:2001.00677 (2020).
[47] Savvas Zannettou, Mai ElSherief, Elizabeth Belding, Shirin Nilizadeh, and Gianluca Stringhini. 2020. Measuring and characterizing hate speech on news websites. In Proceedings of the 12th ACM conference on web science. 125–134.
[48] Cheng Zhang, Kun Zhang, and Yingzhen Li. 2020. A causal view on robustness of neural networks. Advances in Neural Information Processing Systems 33 (2020), 289–301.
[49] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 11106–11115.
[50] Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. 2022. Domain generalization: A survey. IEEE transactions on pattern analysis and machine intelligence 45, 4 (2022), 4396–4415.

A Baseline Details

First, we adopt two categories of backbone models as candidates to validate the effectiveness of LPCD. (i) Sequence Models explicitly model the actio...

[1] [1]

Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, and Irina Rish. 2021. Invariance principle meets information bottleneck for out-of-distribution generalization.Advances in Neural Information Processing Systems34 (2021), 3438–3450

2021

[2] [2]

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization.arXiv preprint arXiv:1907.02893(2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019

[3] [3]

Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. 2016. Domain separation networks.Advances in neural information processing systems29 (2016)

2016

[4] [4]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. Simclr: A simple framework for contrastive learning of visual representations. InInternational Conference on Learning Representations, Vol. 2. PMLR New York, NY, USA

2020

[5] [5]

Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, and Abolfazl Razi. 2024. TimeMIL: advancing multivariate time series classification via a time-aware multiple instance learning. InProceedings of the 41st International Conference on Machine Learning. 7190–7206

2024

[6] [6]

Dawei Cheng, Yao Zou, Sheng Xiang, and Changjun Jiang. 2025. Graph neural networks for financial fraud detection: a review.Frontiers of Computer Science19, 9 (2025), 1–15

2025

[7] [7]

Elliot Creager, Jörn-Henrik Jacobsen, and Richard Zemel. 2021. Environment inference for invariant learning. InInternational Conference on Machine Learning. PMLR, 2189–2200. KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea Yiran Qiao et al

2021

[8] [8]

Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. 2020. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. InProceedings of the 29th ACM international conference on information & knowledge management. 315–324

2020

[9] [9]

Joseph Early, Gavin KC Cheung, Kurt Cutajar, Hanting Xie, Jas Kandola, and Niall Twomey. 2024. Inherently Interpretable Time Series Classification via Multiple Instance Learning. InICLR

2024

[10] [10]

Amir Feder, Katherine A Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E Roberts, et al. 2022. Causal inference in natural language processing: Estima- tion, prediction, interpretation and beyond.Transactions of the Association for Computational Linguistics10 (2022), 1138–1158

2022

[11] [11]

Jia Guo, Guannan Liu, Yuan Zuo, and Junjie Wu. 2018. Learning sequential behavior representations for fraud detection. In2018 IEEE international conference on data mining (ICDM). IEEE, 127–136

2018

[12] [12]

Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-vae: Learning basic visual concepts with a constrained variational framework. In International conference on learning representations

2017

[13] [13]

Mengda Huang, Yang Liu, Xiang Ao, Kuan Li, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. 2022. Auc-oriented graph neural network for fraud detection. InProceedings of the ACM web conference 2022. 1311–1321

2022

[14] [14]

Jaeseok Jang and Hyuk-Yoon Kwon. 2025. TAIL-MIL: Time-aware and instance- learnable multiple instance learning for multivariate time series anomaly de- tection. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 17582–17589

2025

[15] [15]

Hyunjik Kim and Andriy Mnih. 2018. Disentangling by factorising. InInterna- tional conference on machine learning. PMLR, 2649–2658

2018

[16] [16]

Taero Kim, Subeen Park, Sungjun Lim, Yonghan Jung, Krikamol Muandet, and Kyungwoo Song. 2025. Sufficient invariant learning for distribution shift. In Proceedings of the Computer Vision and Pattern Recognition Conference. 4958–4967

2025

[17] [17]

Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The efficient transformer.arXiv preprint arXiv:2001.04451(2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020

[18] [18]

David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. 2021. Out-of- distribution generalization via risk extrapolation (rex). InInternational conference on machine learning. PMLR, 5815–5826

2021

[19] [19]

Alyssa Lees, Vinh Q Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, Donald Metzler, and Lucy Vasserman. 2022. A new generation of perspective api: Efficient multilingual character-level transformers. InProceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining. 3197–3207

2022

[20] [20]

Yanghao Li, Naiyan Wang, Jianping Shi, Xiaodi Hou, and Jiaying Liu. 2018. Adap- tive batch normalization for practical domain adaptation.Pattern Recognition80 (2018), 109–117

2018

[21] [21]

Zhao Li, Haishuai Wang, Peng Zhang, Pengrui Hui, Jiaming Huang, Jian Liao, Ji Zhang, and Jiajun Bu. 2021. Live-streaming fraud detection: A heterogeneous graph neural network approach. InProceedings of the 27th ACM SIGKDD Confer- ence on Knowledge Discovery & Data Mining. 3670–3678

2021

[22] [22]

Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, and Tie-Yan Liu. 2021. Learning causal semantic representation for out-of- distribution prediction.Advances in Neural Information Processing Systems34 (2021), 6155–6170

2021

[23] [23]

Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, and B Aditya Prakash. 2024. Time-series forecasting for out-of-distribution generalization using invariant learning. InProceedings of the 41st International Conference on Machine Learning. 31312–31325

2024

[24] [24]

Jiashuo Liu, Zheyan Shen, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, and Peng Cui. 2021. Towards out-of-distribution generalization: A survey.arXiv preprint arXiv:2108.13624(2021)

work page arXiv 2021

[25] [25]

Yuting Liu, Qiang Zhou, Hanzhe Li, Fuzhen Zhuang, and Jingjing Gu. 2025. Long- term urban flow prediction against data distribution shift: A causal perspective. IEEE Transactions on Knowledge and Data Engineering(2025)

2025

[26] [26]

Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. InInternational Conference on Learning Representations. https://openreview.net/ forum?id=Bkg6RiCqY7

2019

[27] [27]

Xingyu Lu, Tianke Zhang, Chang Meng, Xiaobei Wang, Jinpeng Wang, Yi-Fan Zhang, Shisong Tang, Changyi Liu, Haojie Ding, Kaiyu Jiang, et al. 2025. Vlm as policy: Common-law content moderation framework for short video platform. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 4682–4693

2025

[28]

Divyat Mahajan, Shruti Tople, and Amit Sharma. 2021. Domain generalization using causal matching. In International conference on machine learning. PMLR, 7313–7324

[29]

Khalid Oublal, Said Ladjal, David Benhaiem, Emmanuel LE BORGNE, and François Roueff. 2024. Disentangling time series representations via contrastive independence-of-support on l-variational inference. In The Twelfth International Conference on Learning Representations

[30]

Judea Pearl. 2009. Causality. Cambridge University Press

[31]

Yiran Qiao, Jing Chen, Xiang Ao, Qiwei Zhong, Yang Liu, and Qing He. 2026. Live or Lie: Action-Aware Capsule Multiple Instance Learning for Risk Assessment in Live Streaming Platforms. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 1182–1193

[32]

Yiran Qiao, Yateng Tang, Xiang Ao, Qi Yuan, Ziming Liu, Chen Shen, and Xuehao Zheng. 2024. Financial Risk Assessment via Long-term Payment Behavior Sequence Folding. In 2024 IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, Los Alamitos, CA, USA, 410–419. doi:10.1109/ICDM59182.2024.00048

[33]

Yiran Qiao, Ningtao Wang, Yuncong Gao, Yang Yang, Xing Fu, Weiqiang Wang, and Xiang Ao. 2025. Online Fraud Detection via Test-Time Retrieval-Based Representation Enrichment. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 12470–12478

[34]

Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, and Percy Liang. 2020. Distributionally Robust Neural Networks. In International Conference on Learning Representations. https://openreview.net/forum?id=ryxGuJrFvS

[35]

Axel Sauer and Andreas Geiger. 2021. Counterfactual Generative Networks. In International Conference on Learning Representations

[36]

Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition. 815–823

[37]

Fengzhao Shi, Yanan Cao, Yanmin Shang, Yuchen Zhou, Chuan Zhou, and Jia Wu. 2022. H2-fdetector: A gnn-based fraud detector with homophilic and heterophilic connections. In Proceedings of the ACM web conference 2022. 1486–1494

[39]

Baochen Sun and Kate Saenko. 2016. Deep coral: Correlation alignment for deep domain adaptation. In European conference on computer vision. Springer, 443–450

[40]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)

[41]

Victor Veitch, Alexander D’Amour, Steve Yadlowsky, and Jacob Eisenstein. 2021. Counterfactual invariance to spurious correlations in text classification. Advances in neural information processing systems 34 (2021), 16196–16208

[42]

Zixuan Wang, Yu Sun, Hongwei Wang, Baoyu Jing, Xiang Shen, Xin Luna Dong, Zhuolin Hao, Hongyu Xiong, and Yang Song. 2025. Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. 1104–1112

[43]

Ziming Wang, Qianru Wu, Baolin Zheng, Junjie Wang, Kaiyu Huang, and Yanjie Shi. 2023. Sequence as genes: an user behavior modeling framework for fraud transaction detection in e-commerce. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5194–5203

[44]

Xin Wu, Fei Teng, Xingwang Li, Ji Zhang, Qiang Duan, and Tianrui Li. 2026. Out-of-distribution generalization in time series: A survey. Information Fusion (2026), 104336

[45]

Fei Xiao, Shaofeng Cai, Gang Chen, HV Jagadish, Beng Chin Ooi, and Meihui Zhang. 2024. VecAug: Unveiling Camouflaged Frauds with Cohort Augmentation for Enhanced Detection. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 6025–6036

[46]

Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, and Liu Ren. 2020. Improve unsupervised domain adaptation with mixup training. arXiv preprint arXiv:2001.00677 (2020)

[47]

Savvas Zannettou, Mai ElSherief, Elizabeth Belding, Shirin Nilizadeh, and Gianluca Stringhini. 2020. Measuring and characterizing hate speech on news websites. In Proceedings of the 12th ACM conference on web science. 125–134

[48]

Cheng Zhang, Kun Zhang, and Yingzhen Li. 2020. A causal view on robustness of neural networks. Advances in Neural Information Processing Systems 33 (2020), 289–301

[49]

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 11106–11115

[50]

Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. 2022. Domain generalization: A survey. IEEE transactions on pattern analysis and machine intelligence 45, 4 (2022), 4396–4415

A Baseline Details

First, we adopt two categories of backbone models as candidates to validate the effectiveness of LPCD. (i) Sequence Models explicitly model the actio...