Static Attribution of Android Residential Proxy Malware Using Graph Kernels
Pith reviewed 2026-05-07 08:29 UTC · model grok-4.3
The pith
Graph kernels on control-flow graphs attribute Android residential proxy apps to their networks with 0.985 macro F1.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that Weisfeiler-Lehman graph kernels applied to control-flow and function-call graphs, when fused with binary capability vectors, permit classifiers to attribute Android residential proxy applications to one of four commercial proxy networks at a macro F1 score of 0.985. This performance is measured on an expanded dataset of 3,365 apps with 5-fold cross-validation grouped by DEX files to prevent leakage from shared code. Classifier decisions are mapped to automatically generated Yara rules that reach up to 88.45 percent per-family accuracy after filtering non-discriminative signatures. The work additionally finds that a majority of the apps are still hosted on APKPure and a
What carries the argument
Weisfeiler-Lehman graph kernel features from control-flow graphs and function-call graphs fused with binary capability vectors, supplied as input to supervised classifiers for proxy-family attribution.
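As a rough illustration of the machinery named above, the following sketch computes Weisfeiler-Lehman subtree features for one small graph and fuses them with a binary capability vector. It is not the authors' implementation. The node labels, the two-iteration default, the hashing of labels into a fixed-width vector, and fusion by concatenation are all assumptions made for this example; the paper summary does not state which fusion method or feature dimensionality was used.

```python
import hashlib
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one graph.

    adjacency: dict mapping each node to a list of successor nodes.
    labels: dict mapping each node to an initial string label
            (hypothetical, e.g. a coarse basic-block type).
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # Sorting the neighbour labels makes the relabelling
            # independent of node naming and edge order.
            signature = current[node] + "|" + ",".join(
                sorted(current[n] for n in neighbours)
            )
            updated[node] = hashlib.sha1(signature.encode()).hexdigest()[:12]
        current = updated
        features.update(current.values())
    return features


def wl_kernel(features_a, features_b):
    """Dot product of two label-count maps (the WL subtree kernel value)."""
    return sum(count * features_b[label] for label, count in features_a.items())


def fuse(wl_counts, capability_bits, dim=64):
    """Hash WL counts into `dim` buckets, then append the capability bits.

    Concatenation is an assumption; a kernel sum would be another option.
    """
    vector = [0] * dim
    for label, count in wl_counts.items():
        bucket = int(hashlib.sha1(label.encode()).hexdigest(), 16) % dim
        vector[bucket] += count
    return vector + list(capability_bits)
```

Two graphs that differ only in node naming produce identical feature counts, which is the property that lets the kernel compare apps whose identifiers have been renamed but whose structure is unchanged.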
If this is right
- Unknown proxy apps can be attributed to their network using only static analysis without execution.
- Classifier decisions translate into human-readable Yara rules for explainable detection.
- More than half the analyzed apps remain publicly available through app stores.
- A small number of developers maintain ongoing commercial relationships with proxy providers.
- The method scales to large corpora while blocking leakage from shared libraries.
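The Yara-rule implication in the list above depends on filtering non-discriminative signatures. The sketch below shows one simple way such a filter could work: drop any candidate string seen in more than one family, then render the remainder as YARA rule text. The filtering criterion, the family names, the example strings, and the `proxy_` rule prefix are hypothetical; the paper's actual procedure is not described in this summary.

```python
from collections import Counter


def discriminative_strings(strings_by_family):
    """Keep only strings that occur in exactly one family's candidate set."""
    seen = Counter(
        s for strings in strings_by_family.values() for s in set(strings)
    )
    return {
        family: sorted(s for s in set(strings) if seen[s] == 1)
        for family, strings in strings_by_family.items()
    }


def render_rule(family, strings):
    """Render a YARA rule that fires if any listed string is present.

    `family` must be a valid YARA identifier and `strings` must be
    non-empty, since YARA rejects an empty strings section.
    """
    lines = [f"rule proxy_{family}", "{", "    strings:"]
    for i, s in enumerate(strings):
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}"')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


# Hypothetical candidates: "okhttp" is shared by both families and is dropped.
candidates = {
    "net_a": ["okhttp", "a.sdk.Tunnel"],
    "net_b": ["okhttp", "b.sdk.Relay"],
}
filtered = discriminative_strings(candidates)
rule_text = render_rule("net_a", filtered["net_a"])
```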
Where Pith is reading between the lines
- The same graph-kernel pipeline could be tested on other Android PUP categories that embed shared SDKs.
- App stores might run similar static checks at submission time to reduce proxyware distribution.
- Repeated scans of developer accounts could track how proxy networks recruit and retain developers over time.
- If future obfuscation erodes graph distinctiveness, hybrid static-dynamic features would be a direct extension.
Load-bearing premise
Control-flow graphs, function-call graphs, and behavioral signatures extracted statically remain sufficiently distinct across the four proxy networks despite code reuse, SDK embedding, and obfuscation.
What would settle it
A substantial drop in macro F1 on a test set of apps from a previously unseen proxy network or on versions modified by new obfuscation techniques would show that the static features are no longer discriminative.
Original abstract
Android residential proxy applications represent a growing class of potentially unwanted programs (PUPs) that covertly route third-party traffic through end-user devices, enabling ad fraud, credential abuse, and evasion of geolocation controls by sophisticated threat actors. Attributing an unknown APK to a specific proxy network remains challenging due to code reuse, SDK embedding, and obfuscation across proxy families. We present a static-analysis pipeline for automated proxyware family attribution, extracting graph-structured representations (control-flow and function-call graphs) and behavioral signatures from a labeled corpus of 3,365 Android proxy apps spanning four commercial proxy networks. We evaluate Weisfeiler-Lehman graph kernel features alone and fused with binary capability vectors across multiple classifiers. Using 5-fold DEX-grouped cross-validation to prevent data leakage, SGD achieves a macro F1 of 0.985 on the expanded dataset. To support explainability, we map classifier decisions to automatically generated Yara rules, achieving per-family accuracies up to 88.45% after filtering non-discriminative signatures. Finally, we discuss these results in the context of the broader ecosystem. We find that from the expanded dataset, the majority of applications (51.4%) still available through APKPure still contain embedded proxy SDK code. Further analysis of developer accounts reveals that 23 developers are responsible for other applications also containing such functionality, suggesting continuous and ongoing commercial relationships between proxy providers and developers.
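The DEX-grouped cross-validation mentioned in the abstract can be pictured with a short sketch. It assumes each app carries a group key derived from its DEX content (for instance a hash; the exact grouping key is not given in the abstract) and assigns whole groups to folds, so that no group appears on both sides of a split. The greedy balancing and the function names are choices made for this example.

```python
from collections import defaultdict


def grouped_folds(group_ids, n_folds=5):
    """Assign sample indices to folds so that no group spans two folds."""
    members = defaultdict(list)
    for index, group in enumerate(group_ids):
        members[group].append(index)
    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest
    # fold, to keep fold sizes roughly balanced.
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])
    return folds


def grouped_splits(group_ids, n_folds=5):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = grouped_folds(group_ids, n_folds)
    for held_out in range(n_folds):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test
```

With group keys such as `["d1", "d1", "d2", "d3", ...]`, every app sharing `d1` lands in the same fold, which is what blocks leakage from identical DEX files. As the referee report notes, this does not by itself separate apps that merely share an embedded SDK.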
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to develop a static analysis pipeline for attributing Android residential proxy malware to specific families using Weisfeiler-Lehman graph kernels on control-flow and function-call graphs, combined with binary capability vectors. Evaluated on 3,365 apps from four proxy networks with 5-fold DEX-grouped cross-validation, an SGD classifier achieves a macro F1 score of 0.985. The method also produces Yara rules for explainability with up to 88.45% per-family accuracy and includes an ecosystem analysis showing persistent SDK embedding in 51.4% of apps and involvement of 23 developers.
Significance. If the high attribution performance is attributable to family-specific structures rather than shared SDK subgraphs, this work is significant for the field of Android malware analysis. It demonstrates the utility of graph kernels for handling code reuse and obfuscation in PUP attribution, offers an explainable approach via Yara rule generation, and provides insights into the commercial ecosystem of proxy providers. The use of DEX-grouped CV is a positive step toward rigorous evaluation. Strengths include the large labeled corpus and the fusion of structural and behavioral features.
major comments (2)
- [§5.1] §5.1 (Cross-validation procedure): The 5-fold DEX-grouped cross-validation prevents leakage from identical DEX files but does not account for shared subgraphs arising from the proxy SDKs embedded in 51.4% of the applications (as stated in the ecosystem analysis). As a result, the Weisfeiler-Lehman kernel similarities may be driven by these common components rather than family-specific code, potentially inflating the reported macro F1 of 0.985. An ablation study removing SDK-related subgraphs or reporting performance stratified by SDK presence is needed to confirm that the result reflects genuine discriminative power.
- [§3.2] §3.2 (Feature extraction): Concrete details on the construction of control-flow graphs and function-call graphs (e.g., the static analysis framework employed, handling of native libraries or obfuscated methods, and resulting graph statistics such as average node/edge counts) are missing. This information is load-bearing for assessing the dimensionality of the WL kernel feature space and the reproducibility of the 0.985 F1 claim.
minor comments (3)
- [Abstract] Abstract: Include a brief statement on the graph construction process, the dimensionality of the WL kernel features, and the exact fusion method (concatenation, kernel sum, etc.) between graph kernels and binary capability vectors.
- [Results] Results section: Report per-family F1 scores (or a confusion matrix) in addition to the macro F1 to allow assessment of whether performance is balanced across the four proxy networks or dominated by easier classes.
- [§6] §6 (Ecosystem analysis): Provide more detail on how the 23 developer accounts were identified and linked to proxy functionality, including any heuristics or manual verification steps used.
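The request for per-family F1 scores alongside the macro figure can be made concrete with a small sketch. Macro F1 is the unweighted mean of the per-class F1 scores, so one weak family can hide behind a high average when the other classes are easy. The family labels below are placeholders, not the four networks studied in the paper.

```python
def per_class_f1(y_true, y_pred):
    """Return a dict mapping each class to its F1 score."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[c] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Placeholder labels for four hypothetical families.
y_true = ["A", "A", "B", "B", "C", "D"]
y_pred = ["A", "B", "B", "B", "C", "D"]
family_scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

In this toy case family A scores 2/3 and family B scores 4/5 while C and D score 1.0, giving a macro F1 of about 0.867. The per-family breakdown is what exposes the weaker classes.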
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review, as well as the positive assessment of the work's significance for Android malware analysis. We address each major comment point by point below. Where additional analysis or details are warranted, we will incorporate revisions to strengthen the manuscript's rigor and clarity.
Point-by-point responses
Referee: [§5.1] §5.1 (Cross-validation procedure): The 5-fold DEX-grouped cross-validation prevents leakage from identical DEX files but does not account for shared subgraphs arising from the proxy SDKs embedded in 51.4% of the applications (as stated in the ecosystem analysis). As a result, the Weisfeiler-Lehman kernel similarities may be driven by these common components rather than family-specific code, potentially inflating the reported macro F1 of 0.985. An ablation study removing SDK-related subgraphs or reporting performance stratified by SDK presence is needed to confirm that the result reflects genuine discriminative power.
Authors: We appreciate the referee's careful attention to potential sources of leakage beyond identical DEX files. The DEX-grouped cross-validation was specifically chosen to avoid train-test overlap from duplicate or near-identical APKs. However, we acknowledge that the 51.4% SDK embedding rate identified in our ecosystem analysis could introduce shared subgraphs that the Weisfeiler-Lehman kernel might exploit. To directly address this concern, we will perform an ablation study in the revised manuscript: we will identify SDK-related functions and subgraphs using the signatures from our ecosystem analysis, remove the corresponding nodes and edges from the control-flow and function-call graphs, recompute the kernel features, and re-evaluate classifier performance. We will also report macro F1 scores stratified by SDK presence versus absence. This will provide empirical evidence on whether attribution performance is driven primarily by family-specific structures. We believe these additions will substantiate the discriminative power of the approach. revision: yes
Referee: [§3.2] §3.2 (Feature extraction): Concrete details on the construction of control-flow graphs and function-call graphs (e.g., the static analysis framework employed, handling of native libraries or obfuscated methods, and resulting graph statistics such as average node/edge counts) are missing. This information is load-bearing for assessing the dimensionality of the WL kernel feature space and the reproducibility of the 0.985 F1 claim.
Authors: We thank the referee for identifying this omission, which is important for reproducibility. In the revised manuscript we will expand §3.2 with the requested concrete details. The control-flow graphs and function-call graphs are constructed via static analysis of the Dalvik bytecode in each APK's DEX file(s). Our pipeline parses method bodies to build per-method control-flow graphs and resolves call targets (where statically possible) to construct the function-call graph. Obfuscated methods are handled by extracting the available bytecode structure and control-flow edges; we note that techniques such as reflection or dynamic class loading may result in incomplete graphs, which we treat as a limitation of static analysis. Native libraries are excluded from the graph representations, as our focus remains on the Dalvik layer. We will also include summary statistics for the graphs in the dataset, specifically average node and edge counts for both control-flow graphs and function-call graphs. These additions will allow readers to evaluate the resulting feature-space dimensionality and support independent reproduction of the reported results. revision: yes
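The SDK ablation promised in the first response (removing SDK-related nodes and edges before recomputing kernel features) might look like the following sketch on a function-call graph. Identifying SDK code by package-name prefix is an assumption made here for simplicity, since the rebuttal only says signatures from the ecosystem analysis would be used. The package and method names are invented.

```python
def strip_sdk_nodes(call_graph, sdk_prefixes):
    """Return a copy of the call graph with SDK nodes and their edges removed.

    call_graph: dict mapping a caller's fully qualified method name to the
                set of callee names.
    sdk_prefixes: iterable of package prefixes treated as SDK code
                  (hypothetical identification rule).
    """
    prefixes = tuple(sdk_prefixes)

    def is_sdk(name):
        return name.startswith(prefixes)

    return {
        caller: {callee for callee in callees if not is_sdk(callee)}
        for caller, callees in call_graph.items()
        if not is_sdk(caller)
    }


# Invented example: one app method calls into an embedded SDK.
graph = {
    "com.app.Main.onCreate": {
        "com.example.proxysdk.Client.start",
        "com.app.Util.log",
    },
    "com.example.proxysdk.Client.start": {"com.example.proxysdk.Net.open"},
    "com.app.Util.log": set(),
}
stripped = strip_sdk_nodes(graph, ["com.example.proxysdk."])
```

Re-running feature extraction and classification on the stripped graphs, and comparing the resulting macro F1 with the original, would indicate how much of the reported performance depends on SDK subgraphs. Prefix matching would miss SDK code whose package names have been obfuscated, so a real ablation would likely need a more robust identification step.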
Circularity Check
No circularity in empirical ML evaluation pipeline
The paper's central claim is an empirical macro F1 of 0.985 obtained by training an SGD classifier on Weisfeiler-Lehman graph kernel features (computed from extracted control-flow and function-call graphs) fused with binary capability vectors, evaluated via 5-fold DEX-grouped cross-validation on a labeled corpus of 3,365 APKs. This performance metric is computed on held-out folds and does not reduce, by the paper's own description or equations, to any fitted parameter, self-referential definition, or input quantity. Graph feature extraction is a deterministic preprocessing step independent of the classifier output; the DEX-grouped CV is an explicit anti-leakage design choice rather than a tautology. No load-bearing self-citations, uniqueness theorems, or ansatzes smuggled via prior author work appear in the derivation. The result remains falsifiable by external replication on the same or similar datasets.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The labeled corpus of 3,365 apps correctly assigns each sample to one of the four commercial proxy networks.
- domain assumption: Control-flow graphs and function-call graphs extracted from DEX files capture family-specific structural signatures even after SDK embedding and obfuscation.