Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services
Pith reviewed 2026-05-10 05:56 UTC · model grok-4.3
The pith
Active inference guides routing for edge AI services by minimizing expected free energy computed from real-time metrics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AIF-Router performs Bayesian state inference and expected free energy minimization to guide routing decisions based on observability-driven real-time metrics, resulting in stable online learning behavior for adaptive AI service orchestration in unreliable edge environments.
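The review does not give AIF-Router's generative model, so "Bayesian state inference" is illustrated here with a minimal sketch. It assumes one discrete hidden state per edge node (healthy, congested, failed) and treats a discretized latency metric as the observation. The state labels, the TRANSITION and LIKELIHOOD matrices, and the function names are hypothetical placeholders, not values from the paper.

```python
# Hypothetical single-node model: hidden states and discretized metric readings.
# None of these labels or probabilities come from the paper; they are placeholders.
STATES = ("healthy", "congested", "failed")
OBS = ("low_latency", "high_latency", "no_metrics")

# TRANSITION[i][j] = P(s_t = j | s_{t-1} = i)
TRANSITION = [
    [0.90, 0.08, 0.02],
    [0.20, 0.75, 0.05],
    [0.10, 0.10, 0.80],
]

# LIKELIHOOD[j][k] = P(o_t = k | s_t = j)
LIKELIHOOD = [
    [0.85, 0.13, 0.02],
    [0.15, 0.80, 0.05],
    [0.02, 0.03, 0.95],
]


def predict(belief):
    """Prior for this step: push last step's posterior through the transition model."""
    n = len(belief)
    return [sum(belief[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]


def update(prior, obs_index):
    """Bayes' rule: the posterior is proportional to likelihood times prior."""
    unnorm = [LIKELIHOOD[j][obs_index] * prior[j] for j in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def filter_beliefs(observations, belief=(1 / 3, 1 / 3, 1 / 3)):
    """Run the predict/update loop over a stream of discretized metric readings."""
    belief = list(belief)
    history = []
    for o in observations:
        belief = update(predict(belief), OBS.index(o))
        history.append(belief)
    return history


if __name__ == "__main__":
    stream = ["low_latency", "high_latency", "high_latency", "no_metrics", "no_metrics"]
    for o, b in zip(stream, filter_beliefs(stream)):
        print(o, [round(p, 3) for p in b])
```

On the stream in the `__main__` block, two consecutive `no_metrics` readings move most of the posterior mass onto "failed". That is the kind of belief shift a router would act on when a node stops reporting.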
What carries the argument
The expected free energy minimization process within the active inference framework, which selects routing actions by evaluating the trade-off between information gain and preferred outcomes based on inferred states.
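The trade-off between information gain and preferred outcomes can be illustrated with the standard one-step discrete expected free energy, G = risk + ambiguity. This quantity equals minus the pragmatic value minus the epistemic value. The following is a sketch, not the paper's objective. It assumes that each candidate node already has a belief over the same three hypothetical states, that all nodes share a hypothetical outcome model P_OUTCOME_GIVEN_STATE, and that log-preferences LOG_PREF favour fast responses. Node names, numbers, and the precision parameter are illustrative.

```python
import math

# Hypothetical outcome model shared by all nodes: P(outcome | node state).
OUTCOMES = ("fast", "slow", "error")
P_OUTCOME_GIVEN_STATE = {
    "healthy": (0.80, 0.18, 0.02),
    "congested": (0.15, 0.75, 0.10),
    "failed": (0.01, 0.04, 0.95),
}
# Hypothetical log-preferences ln P(o | C): the router "prefers" fast responses.
LOG_PREF = (math.log(0.90), math.log(0.09), math.log(0.01))


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def predicted_outcomes(belief):
    """Q(o | a) = sum_s P(o | s) Q(s | a), with Q(s | a) the belief about the chosen node."""
    return [sum(q * P_OUTCOME_GIVEN_STATE[s][k] for s, q in belief.items())
            for k in range(len(OUTCOMES))]


def ambiguity(belief):
    """Expected entropy of the outcome model under the state belief."""
    return sum(q * entropy(P_OUTCOME_GIVEN_STATE[s]) for s, q in belief.items())


def expected_free_energy(belief):
    """One-step G = risk (KL from preferred outcomes) + ambiguity."""
    q_o = predicted_outcomes(belief)
    risk = sum(q * (math.log(q) - lp) for q, lp in zip(q_o, LOG_PREF) if q > 0)
    return risk + ambiguity(belief)


def epistemic_and_pragmatic(belief):
    """Equivalent split: G = -(epistemic value) - (pragmatic value)."""
    q_o = predicted_outcomes(belief)
    epistemic = entropy(q_o) - ambiguity(belief)  # expected information gain
    pragmatic = sum(q * lp for q, lp in zip(q_o, LOG_PREF))
    return epistemic, pragmatic


def choose_route(node_beliefs, precision=4.0):
    """Softmax over -precision * G; returns the lowest-G node and the routing probabilities."""
    g = {n: expected_free_energy(b) for n, b in node_beliefs.items()}
    g_min = min(g.values())
    w = {n: math.exp(-precision * (v - g_min)) for n, v in g.items()}
    z = sum(w.values())
    return min(g, key=g.get), {n: x / z for n, x in w.items()}


if __name__ == "__main__":
    beliefs = {
        "edge-a": {"healthy": 0.90, "congested": 0.08, "failed": 0.02},
        "edge-b": {"healthy": 0.20, "congested": 0.70, "failed": 0.10},
        "cloud": {"healthy": 0.60, "congested": 0.39, "failed": 0.01},
    }
    best, probs = choose_route(beliefs)
    print(best, {n: round(p, 3) for n, p in probs.items()})
```

With the beliefs in the `__main__` block, "edge-a" has the lowest G and is selected. The softmax precision controls how deterministic routing is: a lower value keeps some traffic flowing to nodes the router is less certain about.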
Load-bearing premise
That real-time Bayesian state inference and expected free energy minimization can be performed reliably on unstable edge nodes using only observability-driven metrics without offline training or additional safeguards.
What would settle it
Running AIF-Router on physical edge hardware with induced instability, such as random node failures or metric noise, and measuring whether the learning curve remains stable and outperforms non-adaptive routing in terms of service quality metrics.
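A harness for this test could follow the sketch below. It injects independent per-node crashes and Gaussian noise on the observed latency metric, then compares a non-adaptive round-robin baseline with an adaptive router. The adaptive router is a deliberately simple stand-in (epsilon-greedy over an exponentially weighted moving average), not AIF-Router. An active-inference router exposing the same hypothetical pick/observe interface could be dropped in instead. All latencies, probabilities, and names are assumptions.

```python
import random
import statistics

# Hypothetical testbed parameters; none are measurements from the paper.
BASE_LATENCY_MS = {"edge-a": 20.0, "edge-b": 50.0, "cloud": 200.0}
TIMEOUT_MS = 1000.0              # latency charged when a request lands on a crashed node
P_FAIL, P_RECOVER = 0.01, 0.10   # per-step crash and recovery probabilities per node
NOISE_SD_MS = 5.0                # Gaussian noise on the latency metric the router sees


class RoundRobin:
    """Non-adaptive baseline: cycles through nodes regardless of feedback."""

    def __init__(self, nodes):
        self.nodes, self.i = sorted(nodes), 0

    def pick(self, rng):
        node = self.nodes[self.i % len(self.nodes)]
        self.i += 1
        return node

    def observe(self, node, latency_ms):
        pass


class EwmaGreedy:
    """Stand-in adaptive router (not active inference): epsilon-greedy on an EWMA of latency."""

    def __init__(self, nodes, alpha=0.2, epsilon=0.1):
        self.est = {n: 0.0 for n in sorted(nodes)}
        self.alpha, self.epsilon = alpha, epsilon

    def pick(self, rng):
        if rng.random() < self.epsilon:
            return rng.choice(list(self.est))
        return min(self.est, key=self.est.get)

    def observe(self, node, latency_ms):
        self.est[node] += self.alpha * (latency_ms - self.est[node])


def run(router, steps=5000, window=500, seed=0):
    """Return (mean true latency, std of windowed means) under crashes and metric noise."""
    env_rng = random.Random(seed)         # same crash/noise sequence for every router
    router_rng = random.Random(seed + 1)  # router's own exploration randomness
    up = {n: True for n in BASE_LATENCY_MS}
    latencies = []
    for _ in range(steps):
        for n in up:  # independent crash/recovery process per node
            up[n] = (env_rng.random() >= P_FAIL) if up[n] else (env_rng.random() < P_RECOVER)
        node = router.pick(router_rng)
        true_ms = BASE_LATENCY_MS[node] if up[node] else TIMEOUT_MS
        router.observe(node, max(0.0, true_ms + env_rng.gauss(0.0, NOISE_SD_MS)))
        latencies.append(true_ms)
    curve = [statistics.mean(latencies[i:i + window]) for i in range(0, steps, window)]
    return statistics.mean(latencies), statistics.pstdev(curve)


if __name__ == "__main__":
    for router in (RoundRobin(BASE_LATENCY_MS), EwmaGreedy(BASE_LATENCY_MS)):
        mean_ms, jitter_ms = run(router)
        print(type(router).__name__, f"mean {mean_ms:.1f} ms, curve std {jitter_ms:.1f} ms")
```

The crash and noise stream runs on its own seeded generator, so both routers face the same sequence of failures. Differences in mean latency and in the spread of the windowed learning curve therefore come from the routing policy.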
Original abstract
Edge computing enables AI inference closer to data sources, reducing latency and bandwidth costs. However, orchestrating AI services across the cloud-edge continuum remains challenging due to dynamic workloads and infrastructure variability. We present AIF-Router, an Active Inference-based routing framework that autonomously learns to balance latency, throughput, and resource utilization across multi-tier AI services without offline training. AIF-Router performs Bayesian state inference and expected free energy minimization to guide routing decisions based on observability-driven real-time metrics. Despite device instability on edge nodes, AIF-Router exhibits stable online learning behavior and demonstrates the feasibility of applying Active Inference for adaptive AI service orchestration in unreliable edge environments. Our findings highlight both the promise and practical challenges of deploying self-adaptive decision-making frameworks for real-world edge AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AIF-Router, an Active Inference-based routing framework for orchestrating heterogeneous AI services across the cloud-edge continuum. It claims to enable autonomous adaptation by performing real-time Bayesian state inference and expected free energy minimization on observability-driven metrics, balancing latency, throughput, and resource utilization without offline training. The central result is that the framework exhibits stable online learning behavior and demonstrates the feasibility of active inference for adaptive AI service orchestration despite device instability on edge nodes.
Significance. If substantiated, the work would provide a principled, self-adaptive alternative to heuristic or supervised routing methods in dynamic edge environments, potentially influencing designs for autonomous orchestration in unreliable distributed AI systems. It directly addresses practical challenges of workload variability and infrastructure instability while highlighting deployment challenges for active inference frameworks.
major comments (3)
- Abstract: The claim that AIF-Router 'exhibits stable online learning behavior' and 'demonstrates the feasibility' is unsupported by any quantitative results, error bars, baseline comparisons, performance traces, or implementation details, making verification of the central claim impossible.
- The manuscript provides no complexity bounds, state-space dimensionality, variational inference approximation details, message-passing schedule, or per-decision latency measurements for the real-time Bayesian state inference and expected free energy minimization steps. These omissions leave the computational tractability and stability under device instability and variable workloads unverified.
- No description of experiments, workloads, failure models, or robustness metrics is supplied to test behavior under partial observability from node failures, which is required to substantiate the 'stable online learning' result given the stress-test concern about inference reliability on unstable nodes.
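The second major comment asks how the per-decision inference cost scales. The following minimal sketch assumes that node states are independent a priori and that each node's metric depends only on its own state. Under those assumptions the posterior factorizes exactly into one factor per node (the mean-field form). A decision then costs O(N * K^2) for N nodes and K states, instead of operating on a joint space of size K^N. Coupled models would need an actual approximation such as mean-field variational inference or loopy belief propagation, which this sketch does not implement. The matrices and function names are hypothetical.

```python
import time

K = 3  # hypothetical per-node states: 0 = healthy, 1 = congested, 2 = failed
TRANSITION = [[0.90, 0.08, 0.02], [0.20, 0.75, 0.05], [0.10, 0.10, 0.80]]  # P(s_t=j | s_{t-1}=i)
LIKELIHOOD = [[0.85, 0.13, 0.02], [0.15, 0.80, 0.05], [0.02, 0.03, 0.95]]  # P(o=k | s=j)


def update_factor(q, obs):
    """Exact filter step for one node's factor: O(K^2) work."""
    prior = [sum(q[i] * TRANSITION[i][j] for i in range(K)) for j in range(K)]
    post = [LIKELIHOOD[j][obs] * prior[j] for j in range(K)]
    z = sum(post)
    return [p / z for p in post]


def decide(factors, observations):
    """One routing decision: update all N factors (O(N * K^2)), then pick max P(healthy)."""
    new = [update_factor(q, o) for q, o in zip(factors, observations)]
    best = max(range(len(new)), key=lambda n: new[n][0])
    return new, best


def joint_state_count(num_nodes):
    """Size of the unfactorized joint state space, for comparison: K ** N."""
    return K ** num_nodes


def seconds_per_decision(num_nodes, repeats=100):
    """Wall-clock cost of decide(), averaged over repeats (hardware dependent)."""
    factors = [[1.0 / K] * K for _ in range(num_nodes)]
    obs = [n % K for n in range(num_nodes)]
    start = time.perf_counter()
    for _ in range(repeats):
        factors, _ = decide(factors, obs)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "nodes:", f"{seconds_per_decision(n) * 1e6:.1f} us per decision")
```

For comparison, `joint_state_count(20)` is 3,486,784,401, whereas the factorized update for 20 nodes touches 20 factors of 3 entries each. Timings from `seconds_per_decision` depend on the hardware and on interpreter overhead, so they only indicate the linear trend.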
minor comments (1)
- The abstract would benefit from explicit definition of the observability-driven metrics and the precise form of the expected free energy used for policy selection.
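For reference, the precise form requested in the minor comment is usually stated as follows in the discrete-state active inference literature (e.g. Friston et al. [8]; Parr et al. [17]). The review does not say whether AIF-Router uses exactly this decomposition, so this is the textbook form rather than the paper's definition. Here π is a policy (for example, a routing choice), C encodes preferred outcomes, γ is a precision parameter, and σ is the softmax.

```latex
\begin{aligned}
Q(o_\tau \mid \pi) &= \sum_{s_\tau} P(o_\tau \mid s_\tau)\, Q(s_\tau \mid \pi) \\
G(\pi,\tau) &= \underbrace{D_{\mathrm{KL}}\big[Q(o_\tau \mid \pi)\,\|\,P(o_\tau \mid C)\big]}_{\text{risk}}
  + \underbrace{\mathbb{E}_{Q(s_\tau \mid \pi)}\big[\mathrm{H}[P(o_\tau \mid s_\tau)]\big]}_{\text{ambiguity}} \\
&= -\underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\big[\ln P(o_\tau \mid C)\big]}_{\text{pragmatic value}}
  - \underbrace{\Big(\mathrm{H}[Q(o_\tau \mid \pi)] - \mathbb{E}_{Q(s_\tau \mid \pi)}\big[\mathrm{H}[P(o_\tau \mid s_\tau)]\big]\Big)}_{\text{epistemic value (information gain)}} \\
G(\pi) &= \sum_{\tau} G(\pi,\tau), \qquad Q(\pi) = \sigma\big(-\gamma\, G(\pi)\big)
\end{aligned}
```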
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. The comments highlight important gaps in empirical support, technical specifications, and experimental details that we agree must be addressed to substantiate the central claims. We will perform a major revision incorporating quantitative results, implementation specifics, and experimental descriptions. Point-by-point responses follow.
- Referee: Abstract: The claim that AIF-Router 'exhibits stable online learning behavior' and 'demonstrates the feasibility' is unsupported by any quantitative results, error bars, baseline comparisons, performance traces, or implementation details, making verification of the central claim impossible.
Authors: We agree that the abstract's claims require explicit empirical backing. In the revised manuscript, we will modify the abstract to reference specific quantitative outcomes from our evaluations, such as convergence rates of expected free energy with standard deviations, latency/throughput improvements over baselines (e.g., round-robin and load-balancing heuristics), and traces of online adaptation under varying workloads. Implementation details, including the observability metric collection pipeline, will be summarized with pointers to the full experimental section. revision: yes
- Referee: The manuscript provides no complexity bounds, state-space dimensionality, variational inference approximation details, message-passing schedule, or per-decision latency measurements for the real-time Bayesian state inference and expected free energy minimization steps. These omissions leave the computational tractability and stability under device instability and variable workloads unverified.
Authors: We acknowledge these omissions limit assessment of real-time feasibility. The revision will add a dedicated subsection on the active inference implementation: state-space dimensionality (defined over latency, throughput, CPU/memory utilization, and node availability variables), variational approximations (mean-field variational inference with factorized posteriors), message-passing schedule (loopy belief propagation with fixed iteration limits), asymptotic complexity (linear in the number of edge nodes per decision), and empirical per-decision latencies measured on representative hardware. These additions will directly address tractability under instability. revision: yes
- Referee: No description of experiments, workloads, failure models, or robustness metrics is supplied to test behavior under partial observability from node failures, which is required to substantiate the 'stable online learning' result given the stress-test concern about inference reliability on unstable nodes.
Authors: We concur that the lack of experimental methodology prevents verification of stability claims. We will expand the manuscript with a full experimental evaluation section detailing: workloads (heterogeneous AI services with Poisson arrivals and varying model sizes), failure models (random node crashes inducing partial observability with configurable failure rates), robustness metrics (free energy variance, routing decision stability, SLA violation rates, and online learning convergence under stress), and results from simulated and emulated edge environments. This will provide the required evidence for stable behavior despite device instability. revision: yes
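The workload and failure model promised in the third response could be generated along the lines of the following sketch. It combines Poisson request arrivals, a heterogeneous service mix, and a per-node crash/repair process in which crashed nodes stop reporting metrics. Each request therefore records which nodes were observable when it arrived. The service names, rates, downtime, and SLA threshold are hypothetical; the simulated rebuttal does not specify them.

```python
import random

# Hypothetical service mix: (service name, mean service time in ms). Illustrative only.
SERVICES = [("tiny-classifier", 8.0), ("detector", 35.0), ("llm-small", 180.0)]


def poisson_arrivals(rate_per_s, duration_s, rng):
    """Arrival timestamps of a homogeneous Poisson process (exponential gaps)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


def crash_intervals(duration_s, crash_rate_per_s, mean_downtime_s, rng):
    """Outage intervals [start, end) for one node under a crash/repair process."""
    intervals, t = [], 0.0
    while True:
        t += rng.expovariate(crash_rate_per_s)  # time until the next crash
        if t >= duration_s:
            return intervals
        end = min(duration_s, t + rng.expovariate(1.0 / mean_downtime_s))
        intervals.append((t, end))
        t = end


def is_up(intervals, t):
    return not any(start <= t < end for start, end in intervals)


def build_trace(seed=0, duration_s=60.0, rate_per_s=20.0, num_nodes=4,
                crash_rate_per_s=0.02, mean_downtime_s=5.0):
    """Requests annotated with the nodes whose metrics are visible at arrival time."""
    rng = random.Random(seed)
    outages = [crash_intervals(duration_s, crash_rate_per_s, mean_downtime_s, rng)
               for _ in range(num_nodes)]
    trace = []
    for t in poisson_arrivals(rate_per_s, duration_s, rng):
        name, mean_ms = rng.choice(SERVICES)
        trace.append({
            "t": t,
            "service": name,
            "service_ms": rng.expovariate(1.0 / mean_ms),
            # Crashed nodes stop reporting, so the router only observes these nodes.
            "visible_nodes": [n for n in range(num_nodes) if is_up(outages[n], t)],
        })
    return trace


def sla_violation_rate(latencies_ms, sla_ms):
    """Fraction of requests whose end-to-end latency exceeds the SLA."""
    return sum(1 for x in latencies_ms if x > sla_ms) / len(latencies_ms)


if __name__ == "__main__":
    trace = build_trace()
    partial = sum(1 for r in trace if len(r["visible_nodes"]) < 4) / len(trace)
    print(len(trace), "requests;", f"{partial:.1%} arrived under partial observability")
```

Routing outcomes from such a trace can then be scored with `sla_violation_rate`. A free-energy trace recorded by the router can be summarized with `statistics.pvariance` to obtain the free-energy variance metric mentioned in the response.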
Circularity Check
No significant circularity detected
The paper applies established active inference (Bayesian inference + expected free energy minimization) drawn from prior literature to an edge routing task and reports empirical stability results. No equations, self-citations, or parameter-fitting steps are shown that reduce the claimed 'stable online learning' outcome to a tautology or to the same fitted data used for validation. The derivation chain remains independent of its inputs and relies on external active-inference foundations plus new experimental observations.
Reference graph
Works this paper leans on
[1] Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2–3), 235–256 (2002)
[2] Burns, B., Grant, B., Oppenheimer, D., Brewer, E., Wilkes, J.: Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade. ACM Queue 14(1), 70–93 (2016)
[3] Chen, Y., Alspaugh, S., Katz, R.: Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads. Proceedings of the VLDB Endowment 5(12), 1802–1813 (2012)
[4] Crankshaw, D., Sela, G.I., Mo, X., Zumar, C., Stoica, I., Gonzalez, J., Tumanov, A.: InferLine: Latency-aware provisioning and scaling for prediction serving pipelines. In: Proceedings of the ACM Symposium on Cloud Computing (SoCC) (2020)
[5] Crankshaw, D., Wang, X., Zhou, G., Franklin, M.J., Gonzalez, J.E., Stoica, I.: Clipper: A low-latency online prediction serving system. In: USENIX NSDI (2017)
[6] Deng, S., Zhao, H., Fang, W., Yin, J., Dustdar, S., Zomaya, A.Y.: Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet of Things Journal 7(8), 7457–7469 (2020)
[7] Frazier, P.I.: A tutorial on Bayesian optimization (2018)
[8] Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G.: Active inference: A process theory. Neural Computation 29(1), 1–49 (2017)
[9] Gujarati, A., Karimi, R., Alzayat, S., Hao, W., Kaufmann, A., Vigfusson, Y., Mace, J.: Serving DNNs like Clockwork: Performance predictability from the bottom up. In: USENIX OSDI (2020)
[10] Hellerstein, J.L., Diao, Y., Parekh, S., Tilbury, D.M.: Feedback Control of Computing Systems. Wiley-IEEE Press (2004)
[11] Kang, Y., Hauswald, J., Gao, C., Rovinski, A., Mudge, T., Mars, J., Tang, L.: Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. In: Proceedings of ASPLOS. pp. 615–629 (2017)
[12] Lanillos, P., Pages, J., Cheng, G.: Robot self/other distinction: Active inference meets neural networks learning in a mirror. In: Proceedings of the European Conference on Artificial Intelligence (ECAI). pp. 2410–2418 (2020)
[13] Mao, H., Alizadeh, M., Menache, I., Kandula, S.: Resource management with deep reinforcement learning. In: Proceedings of the ACM Workshop on Hot Topics in Networks (HotNets). pp. 50–56 (2016)
[14] Mao, H., Schwarzkopf, M., Venkatakrishnan, S.B., Meng, Z., Alizadeh, M.: Learning scheduling algorithms for data processing clusters. In: Proceedings of the ACM SIGCOMM Conference. pp. 270–288 (2019)
[15] McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). pp. 1273–1282 (2017)
[16] Pahl, C., Jamshidi, P., Zimmermann, O.: Microservices: A systematic mapping study. Journal of Systems and Software 137, 491–507 (2018)
[17] Parr, T., Pezzulo, G., Friston, K.J.: Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press (2022)
[18] Romero, F., Li, Q., Yadwadkar, N.J., Kozyrakis, C.: INFaaS: Automated model-less inference serving. In: USENIX ATC. pp. 397–411 (2021)
[19] Russo, D.J., Van Roy, B., Kazerouni, A., Osband, I., Wen, Z.: A tutorial on Thompson sampling. Foundations and Trends in Machine Learning 11(1), 1–96 (2018)
[20] Sedlak, B., Furutanpey, A., Wang, Z., Pujol, V.C., Dustdar, S.: Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods (2025)
[21] Sedlak, B., Pujol, V.C., Morichetta, A., Donta, P.K., Dustdar, S.: Adaptive Stream Processing on Edge Devices through Active Inference (Sep 2024)
[22] Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Proceedings of the Advances in Neural Information Processing Systems (NIPS). pp. 2951–2959 (2012)
[23] Teerapittayanon, S., McDanel, B., Kung, H.T.: Distributed deep neural networks over the cloud, the edge and end devices. In: Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS). pp. 328–339 (2017)
[24] Tesauro, G., Jong, N.K., Das, R., Bennani, M.N.: A hybrid reinforcement learning approach to autonomic resource allocation. In: Proceedings of the IEEE International Conference on Autonomic Computing (ICAC). pp. 65–73 (2006)
[25] Xiao, W., Bhardwaj, R., Ramjee, R., Sivathanu, M., Kwatra, N., Han, Z., Patel, P., Peng, X., Zhao, H., Zhang, Q., Yang, F., Zhou, L.: Gandiva: Introspective cluster scheduling for deep learning. In: USENIX OSDI. pp. 595–610 (2018)
[26] Zhao, N., Liang, J., Dovrolis, C., Liu, M.: Self-adaptive microservice chains with deep reinforcement learning. In: IEEE/ACM International Symposium on Quality of Service (IWQoS). pp. 1–10 (2019)
[27] Zhao, T., Zhou, S., Guo, X., Niu, Z.: Tasks scheduling and resource allocation in heterogeneous cloud for delay-bounded mobile edge computing. In: Proceedings of the IEEE International Conference on Communications (ICC). pp. 1–7 (2017)
[28] Zhou, Z., Chen, X., Li, E., Zeng, L., Luo, K., Zhang, J.: Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE 107(8), 1738–1762 (2019)