Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis
Pith reviewed 2026-05-17 21:45 UTC · model grok-4.3
The pith
RETROFIT lets security models adapt to new threats while retaining prior knowledge by merging old and new models without replaying data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RETROFIT regulates knowledge retention and adaptation, with controlled forgetting at each update, by consolidating the previously trained model and a newly fine-tuned model, which serve as teachers of legacy and emergent knowledge respectively, through retrospective-free parameter merging. Forgetting is controlled by constraining parameter changes to low-rank and sparse subspaces for approximate orthogonality and by a confidence-guided arbitration mechanism that dynamically aggregates knowledge from both teachers.
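To make "constraining parameter changes to low-rank and sparse subspaces" concrete, here is a minimal NumPy sketch for a single weight matrix. It assumes the constraint is applied post hoc to the update `w_new - w_old` and splits it into a truncated-SVD part plus a top-k magnitude residual (a robust-PCA-style split chosen for illustration). The helper `constrain_delta` and its `rank` and `k_sparse` parameters are hypothetical; RETROFIT may instead constrain the update during fine-tuning, for example with LoRA-style adapters.

```python
import numpy as np

def constrain_delta(w_old: np.ndarray, w_new: np.ndarray, rank: int, k_sparse: int):
    """Hypothetical helper: restrict the update w_new - w_old to a rank-`rank`
    component plus a sparse residual with `k_sparse` nonzeros, then merge."""
    delta = w_new - w_old
    # Low-rank part: best rank-r approximation of the update (truncated SVD).
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Sparse part: keep only the k largest-magnitude entries of what remains.
    residual = delta - low_rank
    sparse = np.zeros_like(residual)
    if k_sparse > 0:
        flat_idx = np.argsort(np.abs(residual), axis=None)[-k_sparse:]
        rows, cols = np.unravel_index(flat_idx, residual.shape)
        sparse[rows, cols] = residual[rows, cols]
    # Merged weights: legacy weights plus the constrained update only.
    merged = w_old + low_rank + sparse
    return merged, low_rank, sparse

# Usage on random matrices (illustrative only, not data from the paper).
rng = np.random.default_rng(0)
w_old = rng.normal(size=(8, 6))
w_new = w_old + rng.normal(scale=0.1, size=(8, 6))
merged, lr, sp = constrain_delta(w_old, w_new, rank=2, k_sparse=5)
print(np.linalg.matrix_rank(lr), np.count_nonzero(sp))
```

Everything outside the retained low-rank and sparse components is discarded, which is what limits how far the merged model can drift from the legacy one; with `rank` equal to the full rank and `k_sparse=0` the sketch simply returns `w_new`.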
What carries the argument
Retrospective-free parameter merging constrained to low-rank and sparse subspaces for approximate orthogonality, combined with a confidence-guided arbitration mechanism to aggregate legacy and new knowledge.
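The arbitration step can be pictured as per-sample weighting of the two teachers' predictive distributions. The sketch below assumes confidence is the maximum softmax probability of each teacher and that the two confidences are turned into mixing weights with a temperature-scaled softmax. The function `arbitrate` and its `temperature` parameter are hypothetical; the source does not specify RETROFIT's actual confidence measure or aggregation rule.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def arbitrate(logits_legacy: np.ndarray, logits_new: np.ndarray, temperature: float = 0.1):
    """Hypothetical confidence-guided arbitration between a legacy and a new teacher.
    Returns aggregated soft targets and the per-sample weight on the legacy teacher."""
    p_old = softmax(logits_legacy)
    p_new = softmax(logits_new)
    # Confidence of each teacher per sample: its maximum class probability.
    conf = np.stack([p_old.max(axis=-1), p_new.max(axis=-1)], axis=-1)
    # Map the two confidences to mixing weights; smaller temperature = sharper choice.
    w = softmax(conf / temperature, axis=-1)
    alpha = w[:, :1]  # weight on the legacy teacher, shape (batch, 1)
    targets = alpha * p_old + (1.0 - alpha) * p_new
    return targets, alpha[:, 0]

# Sample 0: legacy teacher is confident; sample 1: new teacher is confident.
logits_legacy = np.array([[4.0, 0.0], [0.2, 0.0]])
logits_new = np.array([[0.1, 0.0], [0.0, 4.0]])
targets, alpha = arbitrate(logits_legacy, logits_new)
print(alpha)
```

In this toy case the legacy teacher dominates the first sample and the new teacher dominates the second, which is the qualitative behaviour a confidence-guided rule is meant to produce.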
If this is right
- In malware detection under temporal drift, retention score rises from 20.2% to 38.6% over continual learning baselines.
- Performance on new data exceeds the oracle upper bound.
- In binary summarization across decompilation levels, RETROFIT's BLEU score is more than double that of the transfer learning used in prior work.
- Cross-representation generalization surpasses all baselines when analyzing stripped binaries.
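Because the summarization claims rest on BLEU (Papineni et al., 2002), a compact implementation helps show what "more than double" measures: the geometric mean of clipped n-gram precisions for n = 1..4, multiplied by a brevity penalty. This is a minimal, unsmoothed, single-reference sentence-level sketch; the paper's exact variant (corpus vs. sentence level, smoothing, tokenization) is not stated here, and toolkits such as sacreBLEU can give different numbers.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis: list[str], reference: list[str], max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU with uniform weights 1/max_n."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    c, r = len(hypothesis), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)

ref = "the cat sat on the mat".split()
print(sentence_bleu(ref, ref))                        # 1.0
print(sentence_bleu("the cat sat on".split(), ref))   # exp(-0.5), brevity penalty only
```

The second call has perfect n-gram precision but is penalized for being shorter than the reference, so the score is exp(1 - 6/4) = exp(-0.5).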
Where Pith is reading between the lines
- The same merging approach could apply to other data-private continual learning settings such as updating fraud detectors without retaining past transactions.
- If new threats demand updates outside the low-rank subspace, adaptability might degrade in domains with very rapid change.
- Additional validation on vulnerability detection or different binary formats would test whether controlled forgetting holds beyond the two evaluated tasks.
- Pairing the arbitration step with ensemble techniques might further strengthen knowledge consolidation.
Load-bearing premise
That constraining parameter changes to low-rank and sparse subspaces for approximate orthogonality combined with confidence-guided arbitration will reliably control forgetting and enable effective knowledge consolidation without access to historical data or explicit replay.
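Whether low-rank or sparse constraints alone yield orthogonality is easy to probe numerically. The sketch below flattens parameters into vectors, shows that a rank-1 update generally has nonzero inner products with a set of legacy directions, and then removes that overlap with an explicit orthogonal-complement projection built from a QR factorization (which yields the same kind of orthonormal basis as Gram-Schmidt). The legacy directions are random placeholders, and `project_out` is a hypothetical helper, not part of the paper.

```python
import numpy as np

def project_out(delta: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Remove from `delta` its component lying in the span of the columns of
    `directions` (each column is a flattened parameter-space direction)."""
    q, _ = np.linalg.qr(directions)   # orthonormal basis for span(directions)
    d = delta.ravel()
    d_perp = d - q @ (q.T @ d)        # projection onto the orthogonal complement
    return d_perp.reshape(delta.shape)

rng = np.random.default_rng(1)
m, n = 6, 5
# Hypothetical legacy directions in the flattened m*n parameter space.
legacy_dirs = rng.normal(size=(m * n, 3))
# A rank-1 update: low rank by construction, but not orthogonal to legacy_dirs.
delta = np.outer(rng.normal(size=m), rng.normal(size=n))
print("max |<legacy, delta>|      :", np.abs(legacy_dirs.T @ delta.ravel()).max())
delta_perp = project_out(delta, legacy_dirs)
print("max |<legacy, delta_perp>| :", np.abs(legacy_dirs.T @ delta_perp.ravel()).max())
print("rank of projected update   :", np.linalg.matrix_rank(delta_perp))
```

The last line typically reports a rank above 1: projecting out the legacy directions destroys the rank-1 structure of the update, so enforcing low rank and orthogonality at the same time needs more than either step alone.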
What would settle it
A temporal malware detection experiment where RETROFIT shows retention below 30% or fails to exceed the oracle upper bound on new-data accuracy while preserving adaptability.
Original abstract
Binary security has increasingly relied on deep learning to reason about malware behavior and program semantics. However, the performance often degrades as threat landscapes evolve and code representations shift. While continual learning (CL) offers a natural solution through sequential updates, most existing approaches rely on data replay or unconstrained updates, limiting their applicability and effectiveness in data-sensitive security environments. We propose RETROFIT, which regulates knowledge retention and adaptation with controlled forgetting at each update, without requiring historical data. Our key idea is to consolidate previously trained and newly fine-tuned models, serving as teachers of legacy and emergent knowledge, through retrospective-free parameter merging. Forgetting control is achieved by 1) constraining parameter changes to low-rank and sparse subspaces for approximate orthogonality, and 2) employing a confidence-guided arbitration mechanism to dynamically aggregate knowledge from both teachers. Our evaluation on two representative applications demonstrates that RETROFIT consistently mitigates forgetting while maintaining adaptability. In malware detection under temporal drift, it substantially improves the retention score, from 20.2% to 38.6% over CL baselines, and exceeds the oracle upper bound on new data. In binary summarization across decompilation levels, where analyzing stripped binaries is especially challenging, RETROFIT achieves over 2x the BLEU score of transfer learning used in prior work and surpasses all baselines in cross-representation generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RETROFIT, a continual learning framework for binary security tasks that performs retrospective-free merging of a legacy model and a newly fine-tuned model. Forgetting is controlled by constraining updates to low-rank and sparse subspaces (intended to produce approximate orthogonality) combined with a confidence-guided arbitration mechanism that aggregates knowledge from both teachers without access to historical data. Experiments on malware detection under temporal drift report retention-score gains from 20.2% to 38.6% and exceed the oracle on new data; on binary summarization across decompilation levels the method claims more than 2x BLEU improvement over prior transfer-learning baselines and better cross-representation generalization.
Significance. If the central mechanism is shown to be sound, the work would offer a practical route to continual adaptation in data-sensitive security settings where replay is prohibited. The reported numerical gains are concrete and the two-task evaluation (temporal malware drift and multi-level decompilation) is relevant to the domain.
major comments (3)
- [§3.2] §3.2 (Parameter Merging and Subspace Constraints): The claim that restricting updates to low-rank and sparse subspaces produces 'approximate orthogonality' between legacy and new parameters is not supported by the given construction. Low-rank or sparse deltas can still have non-zero inner products with prior parameter directions; no explicit projection (e.g., Gram-Schmidt or orthogonal complement projection) is described. This step is load-bearing for the forgetting-control argument and must be supplied or the claim revised.
- [§4.1] §4.1 and Table 1 (Malware Detection Results): The retention-score improvement from 20.2% to 38.6% and the claim of exceeding the oracle upper bound on new data are presented without error bars, number of runs, or statistical tests. Because the central claim is that RETROFIT 'consistently mitigates forgetting,' these omissions make it impossible to judge whether the gains are reliable or merely point estimates.
- [§4.2] §4.2 (Binary Summarization): The assertion that RETROFIT surpasses all baselines in cross-representation generalization relies on BLEU scores that are more than double those of transfer learning. The paper must clarify whether the same low-rank/sparse constraint and arbitration are applied uniformly across decompilation levels and whether any representation-specific hyper-parameters were tuned; otherwise the generalization claim rests on an under-specified experimental protocol.
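The §4.1 comment asks for error bars, the number of runs and a significance test. A minimal standard-library sketch of that reporting is shown below: mean ± sample standard deviation over runs, and a two-sided paired t-test for five runs matched by seed. It hard-codes the 5% critical value for 4 degrees of freedom (about 2.776) instead of computing a p-value; the function names are hypothetical and the scores in the test are made-up placeholders, not results from the paper.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
# Valid only for exactly five paired runs; use a t-distribution CDF otherwise.
T_CRIT_DF4 = 2.776

def summarize(scores: list[float]) -> str:
    """Format per-run scores as mean ± sample standard deviation."""
    return f"{statistics.mean(scores):.1f} ± {statistics.stdev(scores):.1f}"

def paired_t_test(method: list[float], baseline: list[float]) -> tuple[float, bool]:
    """Paired t statistic over runs matched by seed, and whether |t| exceeds T_CRIT_DF4."""
    if len(method) != 5 or len(baseline) != 5:
        raise ValueError("this sketch assumes exactly five paired runs")
    diffs = [a - b for a, b in zip(method, baseline)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, abs(t) > T_CRIT_DF4
```

In practice, `scipy.stats.ttest_rel(method, baseline)` computes the same paired statistic and also returns an exact p-value, which avoids the fixed-table shortcut used here.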
minor comments (2)
- [Abstract] The abstract states that RETROFIT 'exceeds the oracle upper bound on new data'; this counter-intuitive result should be explained in the main text with a precise definition of the oracle.
- [§3] Notation for the two teachers (legacy vs. emergent) and the arbitration weights should be introduced once and used consistently; current usage mixes 'teacher' and 'model' terminology.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on methodological clarity and experimental reporting. We address each major comment below and have made revisions to strengthen the manuscript.
Point-by-point responses
Referee: [§3.2] §3.2 (Parameter Merging and Subspace Constraints): The claim that restricting updates to low-rank and sparse subspaces produces 'approximate orthogonality' between legacy and new parameters is not supported by the given construction. Low-rank or sparse deltas can still have non-zero inner products with prior parameter directions; no explicit projection (e.g., Gram-Schmidt or orthogonal complement projection) is described. This step is load-bearing for the forgetting-control argument and must be supplied or the claim revised.
Authors: We agree that the original phrasing overstated the guarantee provided by the construction. The low-rank and sparse constraints limit update capacity and thereby reduce the potential for parameter interference, but they do not enforce orthogonality without an explicit projection step. In the revised manuscript we have removed the term 'approximate orthogonality' from §3.2, replaced it with a description of dimensionality reduction and empirical interference control, and added a short discussion of the observed forgetting mitigation. revision: yes
Referee: [§4.1] §4.1 and Table 1 (Malware Detection Results): The retention-score improvement from 20.2% to 38.6% and the claim of exceeding the oracle upper bound on new data are presented without error bars, number of runs, or statistical tests. Because the central claim is that RETROFIT 'consistently mitigates forgetting,' these omissions make it impossible to judge whether the gains are reliable or merely point estimates.
Authors: We accept that statistical reporting was insufficient. The revised version reports results averaged over five independent runs with standard-deviation error bars in Table 1 and includes a statistical significance analysis (paired t-tests, p < 0.05) confirming the retention gains. We have also clarified that the oracle comparison on new data reflects the arbitration mechanism's focus on current-task performance rather than a violation of the upper bound. revision: yes
Referee: [§4.2] §4.2 (Binary Summarization): The assertion that RETROFIT surpasses all baselines in cross-representation generalization relies on BLEU scores that are more than double those of transfer learning. The paper must clarify whether the same low-rank/sparse constraint and arbitration are applied uniformly across decompilation levels and whether any representation-specific hyper-parameters were tuned; otherwise the generalization claim rests on an under-specified experimental protocol.
Authors: The low-rank/sparse constraints and confidence-guided arbitration are applied uniformly across all decompilation levels. Hyper-parameters (rank, sparsity, arbitration thresholds) were selected once via validation-set search for the overall task and held fixed; no per-representation tuning was performed. The revised §4.2 and a new appendix table now document the exact hyper-parameter values and confirm the uniform protocol. revision: yes
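The uniform protocol described in this response amounts to one immutable configuration reused for every decompilation level. In the sketch below the field names mirror the three hyper-parameters the authors list (rank, sparsity, arbitration threshold), but the values, the level names and the `evaluate_level` stub are hypothetical placeholders, not the paper's settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrofitConfig:
    # Placeholder values for illustration; not the paper's hyper-parameters.
    rank: int = 8
    sparsity: float = 0.01              # fraction of entries kept in the sparse part
    arbitration_threshold: float = 0.5

# Hypothetical decompilation-level names, for illustration only.
LEVELS = ("level_a", "level_b", "level_c")

def evaluate_level(level: str, cfg: RetrofitConfig) -> dict:
    """Stub standing in for training and evaluation at one decompilation level."""
    return {"level": level, "config": cfg}

def run_uniform_protocol(cfg: RetrofitConfig) -> list[dict]:
    # The same frozen config object is passed to every level: no per-level tuning.
    return [evaluate_level(level, cfg) for level in LEVELS]
```

Making the dataclass frozen means any attempt to adjust a hyper-parameter for one level raises an error, so a per-representation change would have to be an explicit, visible new config.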
Circularity Check
No significant circularity; derivation relies on proposed mechanisms with independent empirical validation.
Full rationale
The paper introduces RETROFIT for continual learning in binary security tasks via retrospective-free parameter merging, low-rank/sparse subspace constraints for approximate orthogonality, and confidence-guided arbitration. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the abstract or claims. The reported gains (e.g., retention score improvement from 20.2% to 38.6%, >2x BLEU) are presented as outcomes of the method applied to malware detection and binary summarization, remaining self-contained against external benchmarks without reducing to input definitions or prior author results by construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear (the relation between the paper passage and the cited Recognition theorem is unclear)
  Passage: "constraining parameter changes to low-rank and sparse subspaces for approximate orthogonality... confidence-guided arbitration mechanism"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear (the relation between the paper passage and the cited Recognition theorem is unclear)
  Passage: "RETROFIT... bounded forgetting... retrospective-free parameter merging"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.