
arxiv: 2605.19261 · v1 · pith:XUGZCGNInew · submitted 2026-05-19 · 💻 cs.SE · cs.AI · cs.HC · cs.PL

When Web Apps Heal Themselves: A MAPE-K Based Approach to Fault Tolerance and Adaptive Recovery

Pith reviewed 2026-05-20 05:04 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.HC · cs.PL
keywords self-healing systems, MAPE-K, fault tolerance, web applications, adaptive recovery, fault injection, runtime failures, AutoFix

The pith

A MAPE-K framework with AutoFix detects web app faults at a mean F1-score of 90.7% and cuts average time-to-recovery by 56.2%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a self-healing framework for web applications built around the MAPE-K loop: monitor, analyze, plan, and execute phases that operate over a shared knowledge base. It adds an AutoFix module that applies recovery steps and uses feedback to refine them over time. Controlled tests across twenty injected fault scenarios, including crashes, memory leaks, and database disconnections, produced strong detection and recovery numbers while keeping overall performance steady. A reader would care because web apps run in unpredictable environments where automatic fixes could cut downtime and manual work.

Core claim

The authors claim that their modular self-healing framework, built on the MAPE-K model and incorporating an AutoFix-inspired adaptive mechanism, delivers effective fault tolerance for web applications. Evaluation through design and development research with fault injection in twenty scenarios yielded a mean fault detection F1-score of 90.7 percent, a 93.2 percent recovery success rate, and a 56.2 percent reduction in time-to-recovery down to an average of 3.92 seconds, alongside stable throughput and gains from feedback iterations.
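
As a quick arithmetic check on the recovery figures, the reported reduction and the post-recovery average together imply a baseline time-to-recovery. The sketch below assumes the 56.2% reduction is measured against a single baseline mean and that 3.92 seconds is the mean after the reduction; the function name is hypothetical and the baseline value is derived here, not reported in the text above.

```python
def implied_baseline_ttr(ttr_after_s: float, reduction_pct: float) -> float:
    """Back out the baseline TTR from the reduced TTR and the percentage reduction.

    Assumes reduction_pct = 100 * (baseline - after) / baseline.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return ttr_after_s / (1 - reduction_pct / 100)


baseline_s = implied_baseline_ttr(3.92, 56.2)
saved_s = baseline_s - 3.92
print(f"implied baseline: {baseline_s:.2f} s, saved per fault: {saved_s:.2f} s")
```

Under that assumption the implied baseline is roughly 8.95 seconds, so the claimed saving is about 5 seconds per recovered fault.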

What carries the argument

The MAPE-K loop of monitor-analyze-plan-execute over a shared knowledge base, paired with the AutoFix module that selects and refines recovery actions through iterative feedback.
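
The loop described above can be sketched in a few lines. This is a minimal illustration of the general MAPE-K pattern, not the paper's implementation: the fault labels, the action names, the 512 MB threshold, and every function name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Knowledge:
    """Shared knowledge base (the K in MAPE-K). All values are hypothetical."""
    memory_limit_mb: float = 512.0
    actions: Dict[str, str] = field(default_factory=lambda: {
        "service_crash": "restart_service",
        "memory_leak": "recycle_worker",
        "db_disconnect": "reconnect_db",
    })
    history: List[Tuple[str, str, bool]] = field(default_factory=list)


def monitor(probe: Callable[[], dict]) -> dict:
    """Monitor: collect one metrics snapshot from the application."""
    return probe()


def analyze(metrics: dict, kb: Knowledge) -> Optional[str]:
    """Analyze: map the snapshot to a fault label, or None if healthy."""
    if not metrics["service_up"]:
        return "service_crash"
    if not metrics["db_connected"]:
        return "db_disconnect"
    if metrics["memory_mb"] > kb.memory_limit_mb:
        return "memory_leak"
    return None


def plan(fault: Optional[str], kb: Knowledge) -> Optional[str]:
    """Plan: look up a predefined recovery action for the fault."""
    return kb.actions.get(fault) if fault is not None else None


def execute(fault: str, action: str,
            executors: Dict[str, Callable[[], bool]], kb: Knowledge) -> bool:
    """Execute: run the action and record the outcome in the knowledge base."""
    ok = executors[action]()
    kb.history.append((fault, action, ok))
    return ok


def mape_k_cycle(probe: Callable[[], dict],
                 executors: Dict[str, Callable[[], bool]],
                 kb: Knowledge) -> Optional[str]:
    """Run one monitor-analyze-plan-execute pass; return the action taken."""
    fault = analyze(monitor(probe), kb)
    action = plan(fault, kb)
    if fault is None or action is None:
        return None
    execute(fault, action, executors, kb)
    return action


kb = Knowledge()
executors = {name: (lambda: True) for name in kb.actions.values()}  # stub actions
leaky_probe = lambda: {"service_up": True, "db_connected": True, "memory_mb": 900.0}
chosen = mape_k_cycle(leaky_probe, executors, kb)
print(chosen, kb.history)
```

Keeping the outcome history inside the knowledge object is what gives a feedback mechanism something to learn from on later cycles.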

If this is right

  • System throughput remains between 88 and 95 percent even while faults are active.
  • Average response time rises by only 3.1 percent under fault conditions.
  • Iterative feedback raises recovery efficiency by 18.6 percent across repeated cycles.
  • The framework supplies a concrete starting point for building more autonomous self-healing web applications.
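
One way iterative feedback could refine the choice among predefined recovery actions is to rank them by observed outcomes. The sketch below is an assumption about how such a mechanism might work, not the paper's AutoFix logic: the class name, the scoring rule (success rate first, then mean recovery time), and the example fault and actions are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FeedbackSelector:
    """Pick the recovery action with the best observed track record."""

    def __init__(self, candidates: Dict[str, List[str]]):
        self.candidates = candidates  # fault label -> predefined actions
        self.stats = defaultdict(lambda: {"tries": 0, "successes": 0, "total_s": 0.0})

    def record(self, fault: str, action: str, success: bool, seconds: float) -> None:
        """Feed back the outcome of one recovery attempt."""
        s = self.stats[(fault, action)]
        s["tries"] += 1
        s["successes"] += int(success)
        s["total_s"] += seconds

    def _score(self, fault: str, action: str) -> Tuple[float, float]:
        s = self.stats[(fault, action)]
        if s["tries"] == 0:
            return (1.0, 0.0)  # optimistic default so untried actions get a turn
        return (s["successes"] / s["tries"], -s["total_s"] / s["tries"])

    def choose(self, fault: str) -> str:
        """Highest success rate wins; ties go to the lower mean recovery time."""
        return max(self.candidates[fault], key=lambda a: self._score(fault, a))


selector = FeedbackSelector({"memory_leak": ["recycle_worker", "restart_service"]})
first = selector.choose("memory_leak")
selector.record("memory_leak", first, success=False, seconds=6.0)
second = selector.choose("memory_leak")
selector.record("memory_leak", second, success=True, seconds=4.0)
third = selector.choose("memory_leak")
print(first, second, third)
```

A rule like this only reorders predefined strategies, which matches the limitation the abstract states: the system refines its choices but does not invent new fixes.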

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Production deployments might surface fault types absent from the twenty controlled scenarios, requiring updates to the recovery library.
  • Embedding learning algorithms in the knowledge base could let the system invent new fixes instead of depending only on predefined ones.
  • The same monitor-analyze-plan-execute structure could be adapted to improve resilience in related systems such as microservice clusters or cloud services.

Load-bearing premise

The twenty runtime failure scenarios created through controlled fault injection accurately represent the range and frequency of faults encountered in real-world production web application environments.

What would settle it

Deploying the framework on a live production web application and measuring its actual fault detection F1-score and recovery times against the controlled-experiment results over several weeks of normal operation.

Figures

Figures reproduced from arXiv: 2605.19261 by Rov Japheth Oracion, Sales Aribe Jr.

Figure 2: Average recovery time across baseline methods and environments
Figure 3: Learning curve of feedback-enhanced AutoFix over iterations

Original abstract

Ensuring the reliability and resilience of modern web applications remains a critical challenge due to increasing system complexity and dynamic runtime environments. This study proposes a modular self-healing framework based on the monitor-analyze-plan-execute over a shared knowledge base (MAPE-K) model, integrated with an AutoFix-inspired mechanism for adaptive fault recovery. Using a design and development research (DDR) approach, the system was implemented and evaluated through controlled fault injection experiments across twenty runtime failure scenarios, including service crashes, memory leaks, and database disconnections. Experimental results demonstrate that the proposed framework achieved a mean fault detection F1-score of 90.7% and a recovery success rate of 93.2%. The AutoFix module reduced the average time-to-recovery (TTR) by 56.2%, achieving an average recovery time of 3.92 seconds. System throughput was maintained between 88% and 95% during fault conditions, with only a 3.1% increase in response time. Additionally, iterative feedback mechanisms improved recovery efficiency by 18.6% over multiple cycles. These findings indicate that the proposed framework provides a practical and extensible approach to enhancing fault tolerance in web applications through feedback-driven adaptation. While the current implementation relies on predefined recovery strategies, the integration of learning-oriented feedback establishes a foundation for future development of more autonomous self-healing systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a modular self-healing framework for web applications based on the MAPE-K model integrated with an AutoFix-inspired mechanism for adaptive fault recovery. Implemented via a design and development research approach, the system is evaluated through controlled fault injection experiments across twenty runtime failure scenarios (including service crashes, memory leaks, and database disconnections). It reports a mean fault detection F1-score of 90.7%, a recovery success rate of 93.2%, a 56.2% reduction in average time-to-recovery (TTR) to 3.92 seconds, throughput maintained between 88% and 95% with a 3.1% response time increase, and an 18.6% improvement in recovery efficiency from iterative feedback mechanisms.

Significance. If the results hold under more rigorous validation, the work offers a practical, extensible approach to fault tolerance in dynamic web applications by combining the MAPE-K loop with feedback-driven adaptation. The concrete metrics on detection accuracy, recovery speed, and system performance under faults provide a foundation for future autonomous self-healing systems, though the current reliance on predefined strategies limits full autonomy claims.

major comments (2)
  1. Evaluation section: The twenty runtime failure scenarios created through controlled fault injection are presented without a quantitative mapping to observed fault frequencies from production logs, without comparison to public web-app failure datasets, and without sensitivity analysis showing how the F1-score of 90.7% or recovery success rate of 93.2% change when fault probabilities are altered. This makes the central performance claims difficult to extrapolate beyond the chosen test harness.
  2. Results and Abstract: The manuscript states specific performance numbers (mean F1-score of 90.7%, recovery success of 93.2%, TTR reduction of 56.2%) but supplies no information on baselines, statistical tests, error bars, or the precise calculation methods for F1-score and recovery success. This is load-bearing for assessing whether the data support the stated claims.
minor comments (2)
  1. Abstract: The claim that 'iterative feedback mechanisms improved recovery efficiency by 18.6% over multiple cycles' lacks detail on the number of cycles, the exact efficiency metric used, or how the improvement was measured.
  2. Notation and presentation: Ensure consistent use of terms such as 'AutoFix module' versus 'AutoFix-inspired mechanism' across sections to avoid ambiguity in describing the adaptive recovery component.
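
The sensitivity analysis requested in the first major comment could take the form of re-weighting per-scenario scores by assumed fault frequencies. The sketch below shows the idea with three scenarios; the per-scenario F1 values and both weightings are made-up illustrative numbers, not results from the paper, and the function name is hypothetical.

```python
from typing import Dict


def weighted_metric(per_scenario: Dict[str, float], weights: Dict[str, float]) -> float:
    """Average a per-scenario metric under a given fault-frequency mix."""
    if set(per_scenario) != set(weights):
        raise ValueError("scenarios and weights must cover the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(per_scenario[name] * w for name, w in weights.items()) / total


# Illustrative values only (not taken from the paper).
f1_by_scenario = {"service_crash": 0.96, "memory_leak": 0.82, "db_disconnect": 0.91}
uniform_mix = {"service_crash": 1.0, "memory_leak": 1.0, "db_disconnect": 1.0}
leak_heavy_mix = {"service_crash": 0.2, "memory_leak": 0.6, "db_disconnect": 0.2}

uniform_f1 = weighted_metric(f1_by_scenario, uniform_mix)
leak_heavy_f1 = weighted_metric(f1_by_scenario, leak_heavy_mix)
print(f"uniform: {uniform_f1:.3f}, leak-heavy: {leak_heavy_f1:.3f}")
```

If the headline mean shifts noticeably between plausible mixes, the equal-weight figure says little about a deployment whose fault distribution is skewed.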

Simulated Author's Rebuttal

2 responses · 1 unresolved

We are grateful to the referee for the valuable feedback on our manuscript. The comments highlight important aspects for improving the rigor of our evaluation and results reporting. We have prepared point-by-point responses and indicate where revisions will be incorporated.

Point-by-point responses
  1. Referee: Evaluation section: The twenty runtime failure scenarios created through controlled fault injection are presented without a quantitative mapping to observed fault frequencies from production logs, without comparison to public web-app failure datasets, and without sensitivity analysis showing how the F1-score of 90.7% or recovery success rate of 93.2% change when fault probabilities are altered. This makes the central performance claims difficult to extrapolate beyond the chosen test harness.

    Authors: We recognize the importance of grounding the evaluation in real-world data. The twenty scenarios were carefully chosen to cover a representative range of common web application faults, including crashes, leaks, and disconnections, based on established taxonomies in the software engineering literature. However, obtaining quantitative mappings from production logs would require access to proprietary data from specific deployments, which was beyond the scope of this controlled study. We will expand the Evaluation section to provide a detailed rationale for scenario selection, citing relevant prior work on web app failures. We commit to performing a sensitivity analysis by adjusting fault injection probabilities and reporting the impact on key metrics. A comparison to public datasets will be discussed as a future direction, noting that suitable standardized datasets for runtime web app faults are limited. revision: partial

  2. Referee: Results and Abstract: The manuscript states specific performance numbers (mean F1-score of 90.7%, recovery success of 93.2%, TTR reduction of 56.2%) but supplies no information on baselines, statistical tests, error bars, or the precise calculation methods for F1-score and recovery success. This is load-bearing for assessing whether the data support the stated claims.

    Authors: We agree that additional details are necessary to substantiate the reported figures. The mean F1-score of 90.7% was derived from aggregating detection performance across all scenarios, using standard definitions of precision and recall where a detection is considered correct if the fault type and location are accurately identified within a time window. The recovery success rate of 93.2% reflects the fraction of cases where the planned recovery actions fully restored the application state. To address this, we will revise the manuscript to include baseline comparisons (e.g., to threshold-based monitoring without MAPE-K), specify the exact formulas and data used for calculations, report standard deviations or confidence intervals from repeated experimental runs, and include appropriate statistical tests to validate the significance of the 56.2% TTR reduction. These changes will be made in the revised version. revision: yes
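
The calculation methods discussed in the second response can be made concrete with the standard definitions. The sketch below computes precision, recall, and F1 from detection counts, then shows two common ways of aggregating across scenarios, since "mean F1-score" could mean either; which one the paper uses is not stated. The counts and outcomes are illustrative, and all function names are hypothetical.

```python
from typing import List, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(counts: List[Tuple[int, int, int]]) -> float:
    """Mean of per-scenario F1 scores (each scenario weighted equally)."""
    return sum(precision_recall_f1(*c)[2] for c in counts) / len(counts)


def micro_f1(counts: List[Tuple[int, int, int]]) -> float:
    """F1 over pooled counts (each detection weighted equally)."""
    tp, fp, fn = (sum(col) for col in zip(*counts))
    return precision_recall_f1(tp, fp, fn)[2]


def recovery_success_rate(outcomes: List[bool]) -> float:
    """Fraction of recovery attempts that fully restored the application."""
    return sum(outcomes) / len(outcomes)


# Illustrative (tp, fp, fn) counts for two scenarios; not data from the paper.
scenario_counts = [(9, 1, 1), (8, 2, 0)]
macro = macro_f1(scenario_counts)
micro = micro_f1(scenario_counts)
success = recovery_success_rate([True, True, True, False])
print(f"macro F1: {macro:.4f}, micro F1: {micro:.4f}, success: {success:.2f}")
```

The two averages differ even on this small example, which is why the aggregation rule needs to be stated alongside the 90.7% figure.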

standing simulated objections not resolved
  • The absence of a quantitative mapping to production logs and direct comparisons to public datasets, as the study was based on controlled experiments without access to such real-world data sources.

Circularity Check

0 steps flagged

No circularity: results are direct experimental measurements

The manuscript presents an implemented MAPE-K self-healing framework evaluated via controlled fault-injection experiments on twenty predefined scenarios. No equations, derivations, fitted parameters, or mathematical predictions appear in the provided text or abstract. Performance metrics (F1-score, recovery success, TTR reduction) are reported as observed outcomes of the test harness rather than quantities derived from or equivalent to the input assumptions by construction. No self-citation chains, ansatzes, or renamings of known results are invoked as load-bearing steps. The evaluation therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on the standard MAPE-K model from prior literature without introducing new postulates.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

  1. Z. Yazdanparast, “Using rule engine in self-healing systems and MAPE model,” arXiv preprint arXiv:2402.11581, 2024
  2. A. Alhosban, Z. Malik, K. Hashmi, B. Medjahed, and H. Al-Ababneh, “A two phases self-healing framework for service-oriented systems,” ACM Transactions on the Web, vol. 15, no. 2, pp. 1–25, Apr. 2021, doi: 10.1145/3450443
  3. S. R. Rouholamini, M. Mirabi, R. Farazkish, and A. Sahafi, “Proactive self-healing techniques for cloud computing: a systematic review,” Concurrency and Computation: Practice and Experience, vol. 36, no. 24, Aug. 2024, doi: 10.1002/cpe.8246
  4. V. Ajith, T. Cyriac, C. Chavda, A. T. Kiyani, V. Chennareddy, and K. Ali, “Analyzing docker vulnerabilities through static and dynamic methods and enhancing IoT security with AWS IoT core, CloudWatch, and GuardDuty,” IoT, vol. 5, no. 3, pp. 592–607, Sep. 2024, doi: 10.3390/iot5030026
  5. F. Eyvazov, T. E. Ali, F. I. Ali, and A. D. Zoltan, “Beyond containers: orchestrating microservices with minikube, kubernetes, docker, and compose for seamless deployment and scalability,” in 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Mar. 2024, pp. 1–6, doi: 10.1109...
  6. S. G. Aribe Jr, J. M. H. Turtosa, J. M. B. Yamba, and A. B. Jamisola, “Ma-Ease: an android-based technology for corn production and management,” Pertanika Journal of Science and Technology, vol. 27, no. 1, 2019
  7. S. G. Aribe Jr, J. M. Q. Vedra, J. M. Ladion, and A. S. Tablazon, “NotiPower: a mobile-based power advisory for bukidnon second electric cooperative, inc. consumers,” International Journal of Multidisciplinary Research and Publications, vol. 2, no. 1, pp. 35–42
  8. K. J. R. Caseres, R. P. Cruz, L. A. T. Gonzales, P. G. Mary L. Tapayan, and S. Aribe Jr., “Developing digital research portal for bukidnon state university’s scholarly work,” SSRN Electronic Journal, 2025, doi: 10.2139/ssrn.5389800
  9. S. G. Aribe Jr, C. C. Yabes, M. V. G. Jamago, K. I. L. Rayos, H. Toledo Rebosura, and J. J. B. Gonzales, “An android-based ubiquitous notification application for Bukidnon State University,” Pertanika Journal of Science and Technology, vol. 27, no. 2, 2019
  10. A. Sibgatullina, R. Ivanova, and E. Yushchik, “Moodle learning system as an effective tool for implementing the innovation policy of the university,” International Journal of Web-Based Learning and Teaching Technologies, vol. 17, no. 1, pp. 1–12, Mar. 2022, doi: 10.4018/ijwltt.298683
  11. H. Cao, Y. Meng, J. Shi, L. Li, T. Liao, and C. Zhao, “A survey on automatic bug fixing,” in 2020 6th International Symposium on System and Software Reliability (ISSSR), Oct. 2020, pp. 122–131, doi: 10.1109/isssr51244.2020.00029
  12. T. Mamatha, B. R. S. Reddy, and C. S. Bindu, “A literature review on automated code repair,” in Proceedings of the 2nd International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications, Springer Nature Singapore, 2022, pp. 249–260
  13. H. Liang and X. Yin, “Self-healing control: review, framework, and prospect,” IEEE Access, vol. 11, pp. 79495–79512, 2023, doi: 10.1109/access.2023.3298554
  14. S. K. Jangam, “Self-healing autonomous software code development,” International Journal of Emerging Trends in Computer Science and Information Technology, vol. 3, no. 4, pp. 42–52, 2022
  15. J. Alonso et al., “Optimization and prediction techniques for self-healing and self-learning applications in a trustworthy cloud continuum,” Information, vol. 12, no. 8, p. 308, Jul. 2021, doi: 10.3390/info12080308
  16. J. Lee, K. Oh, Y. Yoon, T. Song, T. Lee, and K. Yi, “Adaptive fault detection and emergency control of autonomous vehicles for fail-safe systems using a sliding mode approach,” IEEE Access, vol. 10, pp. 27863–27880, 2022, doi: 10.1109/access.2022.3155738
  17. Y. Verginadis, “A review of monitoring probes for cloud computing continuum,” in Advanced Information Networking and Applications, Springer International Publishing, 2023, pp. 631–643
  18. N. M. Aris, “Design and development research (DDR) approach in designing design thinking chemistry module to empower students’ innovation competencies,” Journal of Advanced Research in Applied Sciences and Engineering Technology, vol. 44, no. 1, pp. 55–68, Apr. 2024, doi: 10.37934/araset.44.1.5568
  19. S. Saarathy, S. Bathrachalam, and B. Rajendran, “Self-healing test automation framework using AI and ML,” International Journal of Strategic Management, vol. 3, no. 3, pp. 45–77, Aug. 2024, doi: 10.47604/ijsm.2843
  20. Y. Matsuo and D. Ikegami, “Performance analysis of anomaly detection methods for application system on kubernetes with auto-scaling and self-healing,” in 2021 17th International Conference on Network and Service Management (CNSM), Oct. 2021, pp. 464–472, doi: 10.23919/cnsm52442.2021.9615544
  21. S. Dubey, “Test automation revisited: comparative analysis of tools and frameworks for scalable software testing,” International Journal for Research in Applied Science and Engineering Technology, vol. 13, no. 9, pp. 207–216, Sep. 2025, doi: 10.22214/ijraset.2025.73663
  22. A. E. M. Da Silva, A. M. S. Andrade, and S. S. Andrade, “Self-adaptive systems planning with model checking using MAPE-K,” in Anais do XXI Workshop de Testes e Tolerância a Falhas (WTF 2020), Dec. 2020, pp. 69–82, doi: 10.5753/wtf.2020.12488
  23. Y. Zheng et al., “Differential optimization testing of gremlin-based graph database systems,” in 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), May 2024, pp. 25–36, doi: 10.1109/icst60714.2024.00012
  24. N. A. Mhatre and M. S. Kulkarni, “The role of chaos engineering in devops for software robustness,” in Applied Intelligence and Computing, Soft Computing Research Society, 2024, pp. 9–17
  25. J. Patel and H. Shah, “Software engineering revolutionized by machine learning-powered self-healing systems,” International Research Journal Of Engineering & Applied Sciences, vol. 9, no. 1, pp. 43–49, 2021, doi: 10.55083/irjeas.2021.v09i01008
  26. O. Gheibi, D. Weyns, and F. Quin, “Applying machine learning in self-adaptive systems: a systematic literature review,” ACM Transactions on Autonomous and Adaptive Systems, vol. 15, no. 3, pp. 1–37, Sep. 2020, doi: 10.1145/3469440
  27. Z. Yazdanparast, “A survey on self-healing software system,” arXiv preprint arXiv:2403.00455, 2024
  28. I. Vasireddy, G. Ramya, and P. Kandi, “Kubernetes and docker load balancing: state-of-the-art techniques and challenges,” International Journal of Innovative Research in Engineering and Management, vol. 10, no. 6, pp. 49–54, Dec. 2023, doi: 10.55524/ijirem.2023.10.6.7

ISSN: 2252-8776 · Int J Inf & Commun Technol, Vol. 15, No. 2, June 2026: 729-740