TL-RL-FusionNet: An Adaptive and Efficient Reinforcement Learning-Driven Transfer Learning Framework for Detecting Evolving Ransomware Threats
Pith reviewed 2026-05-10 00:47 UTC · model grok-4.3
The pith
An RL agent adaptively reweights training samples to focus on complex polymorphic ransomware for higher detection accuracy and efficiency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TL-RL-FusionNet integrates frozen EfficientNetB0 and InceptionV3 backbones with a residual MLP classifier whose training is supervised by a Q-learning agent that adaptively reweights samples according to observable ransomware behavior variations, achieving 99.1% accuracy, 98.6% precision, 99.6% recall and 99.74% AUC while cutting training time by 55% and RAM usage by 59%.
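As a quick arithmetic check on these figures, the sketch below searches for integer confusion-matrix counts on 500 ransomware and 500 benign samples that reproduce the reported accuracy, precision, and recall after rounding to one decimal place. It assumes, hypothetically, that the metrics were computed over all 1,000 samples (for example, pooled across folds); the material summarized here does not say how the evaluation set was formed.

```python
# Hypothetical consistency check: which integer confusion matrices on a
# 500/500 dataset reproduce the reported metrics (rounded to 0.1%)?
POS, NEG = 500, 500  # ransomware, benign (from the abstract)
REPORTED = {"accuracy": 99.1, "precision": 98.6, "recall": 99.6}

def metrics(tp, fp):
    """Return accuracy, precision and recall (in percent) for given TP and FP."""
    fn, tn = POS - tp, NEG - fp
    return {
        "accuracy": 100 * (tp + tn) / (POS + NEG),
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / POS,
    }

matches = []
for tp in range(1, POS + 1):          # tp >= 1 avoids a zero denominator
    for fp in range(0, NEG + 1):
        m = metrics(tp, fp)
        if all(round(m[k], 1) == REPORTED[k] for k in REPORTED):
            matches.append((tp, POS - tp, fp, NEG - fp))

for tp, fn, fp, tn in matches:
    print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

Under that assumption the only match is TP=498, FN=2, FP=7, TN=493, which is nine misclassified samples in total. With error counts this small, a handful of samples can move the headline numbers by several tenths of a percent.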
What carries the argument
The Q-learning RL agent that reweights samples during training in response to variations in observable ransomware behaviors such as file system activity, registry changes and API calls.
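To make the reweighting idea concrete, here is a minimal tabular Q-learning sketch. It is not the authors' implementation: the state definition (a difficulty bucket derived from per-sample loss), the action set (three candidate weights), the reward rule, and every hyperparameter below are assumptions chosen for illustration, since the summarized material does not specify them.

```python
import random

# Hypothetical discretization: per-sample loss -> difficulty bucket (state).
def difficulty_state(loss):
    if loss < 0.1:
        return 0  # trivial
    if loss < 0.7:
        return 1  # moderate
    return 2      # hard

WEIGHTS = [0.5, 1.0, 2.0]               # actions: candidate sample weights (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # assumed Q-learning hyperparameters

def toy_reward(state, action):
    # Assumed reward: +1 when the weight level matches the difficulty bucket
    # (down-weight trivial, up-weight hard), -1 otherwise.
    return 1.0 if action == state else -1.0

def train_agent(steps=5000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(WEIGHTS) for _ in range(3)]
    state = difficulty_state(rng.random())
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < EPSILON:
            action = rng.randrange(len(WEIGHTS))
        else:
            action = max(range(len(WEIGHTS)), key=lambda a: q[state][a])
        reward = toy_reward(state, action)
        next_state = difficulty_state(rng.random())  # bucket of the next sample
        # Standard Q-learning temporal-difference update.
        td_target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (td_target - q[state][action])
        state = next_state
    return q

def sample_weight(q, loss):
    """Greedy weight for a sample with the given loss."""
    s = difficulty_state(loss)
    return WEIGHTS[max(range(len(WEIGHTS)), key=lambda a: q[s][a])]

q_table = train_agent()
print([sample_weight(q_table, loss) for loss in (0.02, 0.4, 1.3)])
```

In a real training loop the returned weight would multiply each sample's loss term, and the reward would more plausibly come from a validation signal than from a fixed rule. That choice is exactly what the referee report asks the authors to document.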
If this is right
- The model outperforms non-RL baselines by up to 2.5% in accuracy and 3.1% in recall on the tested dataset.
- Training time drops 55% and RAM usage drops 59% relative to the compared baselines.
- Behavioral features from sandbox JSON reports are turned into RGB images and processed by the frozen dual TL backbones before final classification by the residual MLP.
- The adaptive reweighting improves resilience against evolving threats while preserving high classification metrics.
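The feature-to-image step in the list above could look roughly like the following sketch. The field names, the 8x8 image size, the per-sample min-max scaling, and the row-major channel packing are all hypothetical; the summarized material does not describe the actual channel mapping or normalization.

```python
import json

SIDE = 8  # hypothetical image side; the paper's resolution is not given here

def report_to_rgb(report_json, side=SIDE):
    """Pack numeric behavioural features into a side x side x 3 byte image."""
    report = json.loads(report_json)
    # Flatten feature values in sorted-key order so the layout is stable.
    values = [float(report[k]) for k in sorted(report)]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant feature vector
    pixels = [round(255 * (v - lo) / span) for v in values]
    # Zero-pad (or truncate) to fill the image, then reshape row-major.
    needed = side * side * 3
    pixels = (pixels + [0] * needed)[:needed]
    return [
        [pixels[(r * side + c) * 3:(r * side + c) * 3 + 3] for c in range(side)]
        for r in range(side)
    ]

# Hypothetical per-sample feature counts (field names are illustrative only).
example = json.dumps({
    "api_calls": 1200, "files_written": 300,
    "registry_keys_set": 45, "network_connections": 0,
})
image = report_to_rgb(example)
print(len(image), len(image[0]), image[0][0])
```

Per-sample scaling is the simplest option but discards absolute magnitudes; scaling with statistics computed on the training split only would be a more defensible choice and would avoid leaking test-set information.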
Where Pith is reading between the lines
- The same RL-guided reweighting could be applied to other categories of polymorphic malware that alter their execution patterns over time.
- Pairing the framework with streaming sandbox data might allow incremental updates without full retraining cycles.
- Testing on imbalanced real-world traffic would clarify how well the current balanced-dataset gains translate to production environments.
Load-bearing premise
The assumption that the RL agent's adaptive reweighting based on observable behaviors in a small balanced sandbox dataset will generalize to real-world polymorphic ransomware without overfitting or requiring extensive new data collection.
What would settle it
A performance evaluation on a larger dataset that includes fresh, previously unseen ransomware variants collected from real infections would show whether the reported gains persist outside the original 1,000-sample sandbox distribution.
Original abstract
Modern ransomware exhibits polymorphic and evasive behaviors by frequently modifying execution patterns to evade detection. This dynamic nature disrupts feature spaces and limits the effectiveness of static or predefined models. To address this challenge, we propose TL-RL-FusionNet, a reinforcement learning (RL)-guided hybrid framework that integrates frozen dual transfer learning (TL) backbones as feature extractors with a lightweight residual multilayer perceptron (MLP) classifier. The RL agent supervises training by adaptively reweighting samples in response to variations in observable ransomware behavior. Through reward and penalty signals, the agent prioritizes complex cases such as stealthy or polymorphic ransomware employing obfuscation, while down-weighting trivial samples including benign applications with simple file I/O operations or easily classified ransomware. This adaptive mechanism enables the model to dynamically refine its strategy, improving resilience against evolving threats while maintaining strong classification performance. The framework utilizes dynamic behavioral features such as file system activity, registry changes, network traffic, API calls, and anti-analysis checks, extracted from sandbox-generated JSON reports. These features are transformed into RGB images and processed using frozen EfficientNetB0 and InceptionV3 models to capture rich feature representations efficiently. Final classification is performed by a lightweight residual MLP guided by an RL (Q-learning) agent. Experiments on a balanced dataset of 1,000 samples (500 ransomware, 500 benign) show that TL-RL-FusionNet achieves 99.1% accuracy, 98.6% precision, 99.6% recall, and 99.74% AUC, outperforming non-RL baselines by up to 2.5% in accuracy and 3.1% in recall. Efficiency analysis shows 55% lower training time and 59% reduced RAM usage, demonstrating suitability for real-world deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TL-RL-FusionNet, a hybrid ransomware detection framework that freezes EfficientNetB0 and InceptionV3 backbones to extract features from RGB images derived from sandbox JSON behavioral reports (file system, registry, network, API calls), feeds them to a lightweight residual MLP classifier, and uses a Q-learning RL agent to adaptively reweight training samples by assigning rewards/penalties based on observable difficulty (prioritizing polymorphic or stealthy cases). On a balanced 1000-sample sandbox dataset (500 ransomware, 500 benign), it reports 99.1% accuracy, 98.6% precision, 99.6% recall, and 99.74% AUC, claiming up to 2.5% accuracy and 3.1% recall gains over non-RL baselines plus 55% lower training time and 59% reduced RAM usage.
Significance. If the performance and generalization claims are substantiated with independent validation, the adaptive RL reweighting mechanism could provide a practical way to handle polymorphic ransomware without retraining entire models, while the frozen backbones and efficiency gains support deployment in resource-constrained settings. The approach combines established TL with RL sample prioritization in a manner that may reduce sensitivity to distribution shifts in behavioral features.
major comments (3)
- [Abstract / Experiments] The headline metrics (99.1% accuracy, 99.74% AUC) and claimed 2.5% improvement rest on a single 1000-sample balanced sandbox dataset with no reported train/test split, k-fold cross-validation, temporal hold-out, or external ransomware-family test set. Because the Q-learning agent reweights samples during training using the same observable behaviors, any leakage or narrow distribution in these 1000 reports directly undermines the generalization claim to evolving real-world threats.
- [Abstract] No details are provided on the definition or tuning of the RL reward/penalty signals (e.g., exact thresholds for 'complex' vs. 'trivial' samples based on file I/O, anti-analysis checks, or obfuscation indicators), the Q-learning hyperparameters, or variance across random seeds. This makes it impossible to determine whether the reported gains are attributable to the adaptive mechanism or to dataset-specific artifacts.
- [Abstract] The efficiency claims (55% lower training time, 59% reduced RAM) are stated without specifying the hardware baseline, batch sizes, or comparison models' configurations, and without reporting whether the RL overhead was included in the timing measurements. These numbers are load-bearing for the 'suitable for real-world deployment' conclusion.
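The first major comment turns on whether sample reweighting ever touches held-out data. A minimal sketch of a leakage-free protocol, assuming stratified 5-fold cross-validation (the fold count and the helper name `stratified_kfold` are illustrative, and the fit and evaluation steps are left as placeholders), might look like this:

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) with class balance preserved per fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets an equal share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

labels = [1] * 500 + [0] * 500  # 500 ransomware, 500 benign, as in the abstract

for train_idx, test_idx in stratified_kfold(labels):
    # Sample weights exist only for training indices; the held-out fold is
    # never seen by the (hypothetical) reweighting agent.
    weights = {i: 1.0 for i in train_idx}
    assert not set(weights) & set(test_idx)
    # Placeholders: fit the classifier with `weights`, then score on test_idx.
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Random stratified folds only address leakage within the dataset. The temporal hold-out and family-level hold-out that the comment also mentions would need splits keyed on collection date or ransomware family instead.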
minor comments (1)
- [Abstract] The transformation of JSON behavioral reports into RGB images is mentioned but lacks a precise description of channel mapping or normalization, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment point-by-point below.
- Referee: Abstract and Experiments: The headline metrics (99.1% accuracy, 99.74% AUC) and claimed 2.5% improvement rest on a single 1000-sample balanced sandbox dataset with no reported train/test split, k-fold cross-validation, temporal hold-out, or external ransomware-family test set. Because the Q-learning agent reweights samples during training using the same observable behaviors, any leakage or narrow distribution in these 1000 reports directly undermines the generalization claim to evolving real-world threats.
  Authors: We recognize that the manuscript does not sufficiently detail the experimental setup regarding data partitioning. We will add a clear description of the train/test split employed and ensure that the Q-learning reweighting is applied only within the training portion to prevent leakage. Additionally, we will incorporate k-fold cross-validation results and expand the discussion on the challenges of generalizing from this sandbox dataset to broader real-world scenarios involving evolving ransomware threats. We note that while an external family-specific test set is not available in our current study, the adaptive RL mechanism is designed to improve robustness to variations within the observed behavioral distributions.
  Revision: yes
- Referee: Abstract: No details are provided on the definition or tuning of the RL reward/penalty signals (e.g., exact thresholds for 'complex' vs. 'trivial' samples based on file I/O, anti-analysis checks, or obfuscation indicators), the Q-learning hyperparameters, or variance across random seeds. This makes it impossible to determine whether the reported gains are attributable to the adaptive mechanism or to dataset-specific artifacts.
  Authors: We agree that these implementation details are essential for evaluating the RL component. In the revised manuscript, we will provide a comprehensive description of the reward and penalty signals, including how sample complexity is determined from behavioral indicators, the specific Q-learning hyperparameters used, and performance metrics with variance across multiple random seeds. This addition will help clarify the source of the observed improvements.
  Revision: yes
- Referee: Abstract: The efficiency claims (55% lower training time, 59% reduced RAM) are stated without specifying the hardware baseline, batch sizes, or comparison models' configurations, and without reporting whether the RL overhead was included in the timing measurements. These numbers are load-bearing for the 'suitable for real-world deployment' conclusion.
  Authors: We will revise the efficiency claims section to include the hardware specifications, batch sizes, and configurations of the comparison models. We will also explicitly state that the RL overhead is accounted for in the timing and memory measurements and provide absolute values to support the percentage reductions.
  Revision: yes
- Lack of an external ransomware-family test set for further validation of generalization claims.
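The second and third responses promise variance across seeds and efficiency figures that include the RL overhead. One way such measurements could be collected is sketched below, using only the Python standard library. The `dummy_train` workload is a stand-in, not the paper's model, and `tracemalloc` tracks Python-level allocations only, so native framework buffers and GPU memory would need a different tool.

```python
import statistics
import time
import tracemalloc

def measure(train_fn, seeds):
    """Run train_fn(seed) per seed; return mean and stdev of time, memory, score."""
    times, peaks, scores = [], [], []
    for seed in seeds:
        tracemalloc.start()
        t0 = time.perf_counter()
        scores.append(train_fn(seed))   # should include any RL-agent updates
        times.append(time.perf_counter() - t0)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak / 2**20)      # bytes -> MiB
    return {
        "time_s": (statistics.mean(times), statistics.stdev(times)),
        "peak_mib": (statistics.mean(peaks), statistics.stdev(peaks)),
        "score": (statistics.mean(scores), statistics.stdev(scores)),
    }

def dummy_train(seed):
    # Stand-in workload (hypothetical): build a list and return its mean.
    data = [((i * 31 + seed) % 97) / 97 for i in range(50_000)]
    return sum(data) / len(data)

result = measure(dummy_train, seeds=[0, 1, 2])
print(result)
```

Reporting the absolute means alongside the standard deviations, for both the proposed model and each baseline on the same hardware, would let readers recompute the 55% and 59% reductions themselves.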
Circularity Check
No significant circularity in claimed results or framework
The paper presents an empirical ML framework combining transfer learning backbones with an RL reweighting agent for ransomware classification, reporting experimental metrics on a fixed 1000-sample sandbox dataset. No derivation chain, mathematical prediction, or first-principles result is claimed that reduces by construction to its own inputs; the RL reweighting is a training procedure whose outputs are evaluated as standard performance numbers rather than tautologically forced. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing justifications in the abstract or description. The reported accuracy improvements are presented as empirical observations, not as self-referential predictions, leaving the central claim self-contained against external benchmarks even if the small dataset raises separate generalization concerns.