Meta-Offline and Distributional Multi-Agent RL for Risk-Aware Decision-Making

Eslam Eldeeb; Hirley Alves

arxiv: 2501.16098 · v2 · submitted 2025-01-27 · 💻 cs.MA

Pith reviewed 2026-05-23 05:05 UTC · model grok-4.3

classification 💻 cs.MA

keywords meta-offline MARL · distributional reinforcement learning · risk-aware decision-making · UAV networks · conservative Q-learning · quantile regression DQN · model-agnostic meta-learning

The pith

M-CQR integrates conservative Q-learning, quantile regression and meta-learning to reach faster convergence in risk-sensitive multi-agent UAV tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a meta-offline distributional multi-agent reinforcement learning algorithm called M-CQR that fuses conservative Q-learning for safe offline training, quantile regression DQN for risk-sensitive value estimates, and model-agnostic meta-learning for quick adaptation to new conditions. It applies this framework to UAV-assisted IoT networks that face changing topologies and uncertain channels. Two versions are presented: an independent-learner variant (M-I-CQR) and a centralized-training, decentralized-execution variant (M-CTDE-CQR). The CTDE variant is reported to converge up to 50 percent faster than standard multi-agent RL baselines while improving scalability and robustness for risk-aware choices.
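To make the meta-learning ingredient concrete, here is a minimal sketch of first-order MAML on a toy problem. It is not the authors' implementation: each "task" is a scalar quadratic loss standing in for the offline loss on one network configuration, and the names `task_grad`, `fomaml`, and `adapt` are hypothetical.

```python
import numpy as np

def task_grad(theta, center):
    # Gradient of the toy task loss 0.5 * (theta - center)**2.
    # Assumption: this stands in for the gradient of an offline RL loss on one task.
    return theta - center

def fomaml(centers, inner_lr=0.1, outer_lr=0.1, steps=500, theta0=0.0):
    """First-order MAML: learn an initialization that adapts well in one inner step."""
    theta = float(theta0)
    for _ in range(steps):
        outer_grads = []
        for c in centers:
            # Inner loop: one gradient step on the task starting from the shared init.
            adapted = theta - inner_lr * task_grad(theta, c)
            # First-order approximation: use the gradient at the adapted parameters
            # and ignore second-derivative terms.
            outer_grads.append(task_grad(adapted, c))
        theta -= outer_lr * float(np.mean(outer_grads))
    return theta

def adapt(theta, center, inner_lr=0.1, steps=1):
    """Adapt the meta-learned initialization to a new task with a few gradient steps."""
    for _ in range(steps):
        theta = theta - inner_lr * task_grad(theta, center)
    return theta

meta_init = fomaml([-1.0, 0.0, 4.0])
print(meta_init, adapt(meta_init, center=2.0))
```

For these quadratic tasks the learned initialization settles at the mean of the task optima, so a single inner step already moves it toward any new task. Whether the same holds once the task loss includes conservative and distributional terms is exactly what the paper's experiments would need to show.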

Core claim

The paper claims that the meta-conservative quantile regression (M-CQR) algorithm, specifically its meta-CTDE-CQR variant, achieves up to 50 percent faster convergence and outperforms baseline MARL methods by combining conservative Q-learning for safe offline learning, quantile regression DQN for risk-sensitive values, and MAML for rapid adaptation in a UAV communication scenario.

What carries the argument

The M-CQR algorithm that merges conservative Q-learning, quantile regression DQN, and model-agnostic meta-learning into one meta-offline distributional multi-agent RL framework.
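As a rough illustration of how the conservative and distributional pieces can sit in one loss, the sketch below combines a QR-DQN style quantile Huber loss with a CQL style log-sum-exp penalty for a single transition. This is a sketch under stated assumptions, not the paper's loss: taking the Q-value as the mean of the quantiles, applying the penalty to that mean, and the names `quantile_huber_loss`, `cql_penalty`, and `cqr_loss` are all choices made here.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss between N predicted quantiles and M target samples."""
    n = pred_quantiles.shape[0]
    taus = (np.arange(n) + 0.5) / n                          # quantile midpoints
    u = target_samples[None, :] - pred_quantiles[:, None]    # pairwise errors, shape (N, M)
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(taus[:, None] - (u < 0).astype(float))   # asymmetric quantile weight
    return float((weight * huber).mean(axis=1).sum())        # mean over targets, sum over quantiles

def cql_penalty(quantiles_all_actions, data_action):
    """log-sum-exp of Q over actions minus Q of the dataset action (always >= 0)."""
    q = quantiles_all_actions.mean(axis=1)                   # Q(s, a) as mean of quantiles
    m = q.max()
    lse = m + np.log(np.exp(q - m).sum())                    # numerically stable log-sum-exp
    return float(lse - q[data_action])

def cqr_loss(quantiles_all_actions, data_action, target_samples, alpha=1.0, kappa=1.0):
    """Distributional TD term plus alpha times the conservative penalty."""
    td = quantile_huber_loss(quantiles_all_actions[data_action], target_samples, kappa)
    return td + alpha * cql_penalty(quantiles_all_actions, data_action)

# Toy usage: 3 actions, 4 quantiles each, dataset action 0.
quantiles = np.array([[0.0, 1.0, 2.0, 3.0],
                      [1.0, 1.5, 2.0, 2.5],
                      [-1.0, 0.0, 1.0, 2.0]])
targets = np.array([0.5, 1.5, 2.5, 3.5])
print(cqr_loss(quantiles, data_action=0, target_samples=targets, alpha=0.5))
```

The coefficient `alpha` controls how strongly out-of-dataset actions are pushed down. Setting it to zero recovers the plain distributional loss.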

If this is right

The method improves scalability for larger numbers of agents in dynamic environments.
It increases robustness to uncertain communication channels.
It enables quicker adaptation when network topologies change.
It supports safer risk-sensitive decisions in mission-critical multi-agent applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same combination pattern could be tested in other uncertain multi-agent domains such as vehicle fleets or sensor networks.
Performance would likely depend on having a sufficiently diverse offline dataset collected under realistic risk conditions.
Hardware experiments with actual UAVs would be required to check whether simulation gains translate to physical settings.

Load-bearing premise

The three components of conservative Q-learning, quantile regression, and meta-learning can be combined without one canceling the benefits of the others in the UAV setting.

What would settle it

A direct comparison in the described UAV IoT scenario in which M-CTDE-CQR converges no faster than the baselines, or performs no better than them, would disprove the central performance claim.
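One way such a comparison could be scored is sketched below. The sketch assumes "faster convergence" means fewer training steps until the seed-averaged curve first reaches a fixed performance threshold; the paper may define it differently. The curves are synthetic placeholders, not results from the paper, and `steps_to_threshold` and `relative_speedup` are hypothetical helper names.

```python
import numpy as np

def steps_to_threshold(curve, threshold):
    """Index of the first point at which the curve reaches the threshold, else None."""
    hits = np.flatnonzero(np.asarray(curve) >= threshold)
    return int(hits[0]) if hits.size else None

def relative_speedup(method_curves, baseline_curves, threshold):
    """Fractional reduction in steps-to-threshold, comparing seed-averaged curves.

    Both inputs have shape (seeds, steps). Returns None if either mean curve
    never reaches the threshold.
    """
    t_m = steps_to_threshold(np.mean(method_curves, axis=0), threshold)
    t_b = steps_to_threshold(np.mean(baseline_curves, axis=0), threshold)
    if t_m is None or t_b is None or t_b == 0:
        return None
    return 1.0 - t_m / t_b

# Synthetic learning curves (higher is better), 5 identical "seeds" each.
steps = np.arange(200)
method = np.tile(1.0 - np.exp(-steps / 20.0), (5, 1))
baseline = np.tile(1.0 - np.exp(-steps / 40.0), (5, 1))
print(relative_speedup(method, baseline, threshold=0.9))
```

For a metric where lower is better, the comparison inside `steps_to_threshold` would need to be flipped. Reporting the spread across seeds alongside the mean would show whether a measured gap is stable.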

Figures

Figures reproduced from arXiv: 2501.16098 by Eslam Eldeeb, Hirley Alves.

**Figure 2.** Convergence performance of the proposed algorithm compared to the [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]

**Figure 3.** The effect of model parameters: (a) dataset size effect, (b) training [PITH_FULL_IMAGE:figures/full_fig_p005_3.png]

Original abstract

Mission-critical applications, such as UAV-assisted IoT networks, require risk-aware decision-making under dynamic topologies and uncertain channels. We propose meta-conservative quantile regression (M-CQR), a meta-offline distributional MARL algorithm that integrates conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML) for rapid adaptation. Two variants are developed: meta-independent CQR (M-I-CQR) and meta-CTDE-CQR. In a UAV-based communication scenario, M-CTDE-CQR achieves up to 50% faster convergence and outperforms baseline MARL methods, offering improved scalability, robustness, and adaptability for risk-sensitive decision-making. Code is available at https://github.com/Eslam211/MA_Meta_ODRL
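The abstract does not say which risk measure is applied to the quantile estimates. A common choice with quantile-based critics is conditional value at risk (CVaR), and the sketch below assumes it. With equally weighted quantiles, the lower-tail CVaR is approximated by averaging the lowest fraction of them. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """Approximate lower-tail CVaR at level alpha from equally weighted quantiles.

    quantiles: array of shape (..., N). alpha in (0, 1]; alpha = 1 gives the mean.
    """
    q = np.sort(np.asarray(quantiles, dtype=float), axis=-1)
    k = max(1, int(np.ceil(alpha * q.shape[-1])))   # number of tail quantiles to average
    return q[..., :k].mean(axis=-1)

def risk_aware_action(quantiles_per_action, alpha):
    """Pick the action whose return distribution has the best lower tail."""
    return int(np.argmax(cvar_from_quantiles(quantiles_per_action, alpha)))

# Toy return distributions: action 0 has the higher mean but a bad worst case.
quantiles_per_action = np.array([[-10.0, 4.0, 6.0, 8.0],
                                 [0.0, 1.0, 2.0, 3.0]])
print(risk_aware_action(quantiles_per_action, alpha=1.0))    # risk-neutral choice
print(risk_aware_action(quantiles_per_action, alpha=0.25))   # risk-averse choice
```

In this toy case the risk-neutral rule prefers the action with the higher mean, while the risk-averse rule prefers the action with the milder worst case. That is the behaviour a risk-sensitive value estimate is meant to enable.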

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper names a new combination of CQL, QR-DQN, and MAML for offline distributional MARL in UAV settings, but the 50% convergence claim rests on experiments whose protocol and ablations are not visible.

The new piece is the M-CQR construction and its two variants, M-I-CQR and M-CTDE-CQR. It takes three existing pieces (conservative Q-learning for offline safety, quantile regression for distributional risk values, and MAML for fast adaptation) and packages them for multi-agent risk-aware decisions under changing UAV topologies. That specific named integration is not in the earlier literature the abstract cites, so the algorithmic step counts as a legitimate extension inside the current program rather than a first-principles advance. The GitHub link is also useful; anyone can pull the code and see the joint loss or the CTDE implementation directly. That is the main concrete value on offer. The application to UAV-assisted IoT networks with uncertain channels gives the work a narrow but real target, which is better than pure toy environments.

The central empirical statement is that M-CTDE-CQR reaches up to 50% faster convergence and beats standard MARL baselines on scalability and robustness. Nothing in the abstract or the stress-test note shows the experimental protocol, baseline definitions, statistical tests, or component ablations that would let a reader judge whether that number holds. The integration premise itself is untested in the supplied text: CQL's penalty could blunt the quantile estimates that MAML needs for quick updates, or the distributional outputs could make meta-gradients unstable across agents. Without those checks, the performance claim cannot be evaluated.

The paper is therefore an honest but limited-scope extension. Readers already working on offline or distributional MARL for communication networks could extract the formulation and the code for their own experiments. It is not yet at the stage where I would cite it, because the key result is still unverified.

A serious editor could send the full manuscript to referees if the experiments section supplies the missing protocol, ablations, and statistical detail; otherwise the central claim stays unevaluable. I would put it on a reading-group list only to walk through the joint loss and see whether the three components actually compose without interference.

Referee Report

2 major / 2 minor

Summary. The paper proposes meta-conservative quantile regression (M-CQR), a meta-offline distributional multi-agent RL algorithm that combines conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML) for rapid adaptation. Two variants are introduced (M-I-CQR and M-CTDE-CQR) and evaluated in a UAV-assisted IoT communication scenario with dynamic topologies and uncertain channels, where M-CTDE-CQR is reported to achieve up to 50% faster convergence and outperform baseline MARL methods.

Significance. If the empirical claims hold under proper controls and the component integration is shown to be non-destructive, the work could advance risk-aware MARL by demonstrating a practical combination of offline conservatism, distributional risk modeling, and meta-adaptation for mission-critical dynamic environments. Code availability is noted as a reproducibility strength.

major comments (2)

[Abstract] The central claim of up to 50% faster convergence and outperformance is stated without any description of the experimental protocol, baseline definitions, statistical measures, number of runs, or ablation results, rendering the empirical contribution unevaluable from the provided text.
[Proposed Method] Integration of CQL, QR-DQN, and MAML: no joint loss function, hyperparameter schedule, or analysis of potential interference (e.g., CQL conservatism suppressing MAML adaptation gradients or quantile outputs destabilizing meta-updates) is supplied. This omission is load-bearing for the claimed net performance gains in uncertain UAV channels.
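For orientation, one plausible form such a joint objective could take is written out below. It is a sketch assembled from the standard QR-DQN, CQL, and MAML formulations, not the authors' loss. All notation is assumed here: $\theta_i(s,a)$ is the $i$-th of $N$ quantile estimates, $\theta^-$ is a target network, $Q_\theta(s,a)=\frac{1}{N}\sum_i\theta_i(s,a)$, $\rho^{\kappa}_{\tau}$ is the quantile Huber loss, $a^{\star}=\arg\max_{a'}Q_{\theta^-}(s',a')$, and $\mathcal{D}^{\mathrm{tr}}_k$, $\mathcal{D}^{\mathrm{val}}_k$ are the support and query splits of the offline dataset for task $k$.

```latex
% Hypothetical notation; a sketch of one possible joint objective, not the paper's.
\begin{align}
\mathcal{L}_{\mathrm{QR}}(\theta;\mathcal{D}) &=
  \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\Big[\tfrac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}
  \rho^{\kappa}_{\hat{\tau}_i}\big(r+\gamma\,\theta^{-}_{j}(s',a^{\star})-\theta_{i}(s,a)\big)\Big],\\
\mathcal{L}_{\mathrm{CQR}}(\theta;\mathcal{D}) &=
  \mathcal{L}_{\mathrm{QR}}(\theta;\mathcal{D})
  +\alpha\,\mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[\log\sum_{a'}\exp Q_{\theta}(s,a')-Q_{\theta}(s,a)\Big],\\
\theta'_{k} &= \theta-\beta\,\nabla_{\theta}\,
  \mathcal{L}_{\mathrm{CQR}}\big(\theta;\mathcal{D}^{\mathrm{tr}}_{k}\big),\\
\min_{\theta}\;&\sum_{k=1}^{K}\mathcal{L}_{\mathrm{CQR}}\big(\theta'_{k};\mathcal{D}^{\mathrm{val}}_{k}\big).
\end{align}
```

Written this way, the referee's interference concern becomes concrete: the outer gradient passes through both the quantile term and the $\alpha$-weighted conservative term, so the choice of $\alpha$ and of the inner step size $\beta$ jointly determines how much adaptation signal survives.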

minor comments (2)

[Abstract] The abstract introduces the second variant as 'meta-CTDE-CQR' and then refers to it as 'M-CTDE-CQR'; consistent naming would improve clarity.
[Abstract] The GitHub link is provided but no details on which variant or hyperparameters are released; this is a minor reproducibility note.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We address the major comments below and commit to making the necessary revisions to strengthen the manuscript.

Referee: [Abstract] The central claim of up to 50% faster convergence and outperformance is stated without any description of the experimental protocol, baseline definitions, statistical measures, number of runs, or ablation results, rendering the empirical contribution unevaluable from the provided text.

Authors: We agree that the abstract would benefit from additional context on the experimental evaluation. In the revised manuscript, we will modify the abstract to include a brief mention of the UAV simulation setup, the baselines compared (including standard MARL methods), the number of independent runs (e.g., 5 seeds), and the use of mean and standard deviation for reporting performance. Full ablation studies and statistical details will be retained and expanded in Section 5.

revision: yes

Referee: [Proposed Method] Integration of CQL, QR-DQN, and MAML: no joint loss function, hyperparameter schedule, or analysis of potential interference (e.g., CQL conservatism suppressing MAML adaptation gradients or quantile outputs destabilizing meta-updates) is supplied. This omission is load-bearing for the claimed net performance gains in uncertain UAV channels.

Authors: We recognize the importance of detailing the integration. The original manuscript presented the components separately but omitted the combined objective. We will introduce the joint loss function explicitly in the revised Section 4, along with the hyperparameter annealing schedule for the conservatism coefficient and quantile levels. Additionally, we will add a subsection analyzing potential gradient interference, supported by gradient norm plots from our experiments showing that the components do not destructively interfere in the UAV channel setting.

revision: yes
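Gradient norms alone do not show whether two loss terms pull in opposite directions. A lightweight complementary check, offered here as an assumption and not as part of the simulated rebuttal or the paper, is the cosine similarity between the gradients of the individual terms. The sketch below uses a tabular toy with three actions in one state and finite-difference gradients; `td_loss` and `cql_term` are hypothetical stand-ins.

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at a 1-D theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def cosine(u, v):
    """Cosine similarity; negative values indicate conflicting update directions."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def td_loss(q_values, action, target):
    # Toy squared TD error on the dataset action.
    return 0.5 * (q_values[action] - target) ** 2

def cql_term(q_values, action):
    # Toy conservative term: log-sum-exp over actions minus Q of the dataset action.
    return np.log(np.exp(q_values).sum()) - q_values[action]

q = np.array([1.0, 0.0, 0.0])   # Q-values for 3 actions in a single state
g_cql = numerical_grad(lambda t: cql_term(t, 0), q)
for target in (2.0, 0.0):
    g_td = numerical_grad(lambda t: td_loss(t, 0, target), q)
    print(target, cosine(g_td, g_cql))
```

In this toy the two terms agree when the TD target lies above the current estimate and conflict when it lies below. Tracking the same quantity during meta-training would indicate how often the conservative term opposes the adaptation signal.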

Circularity Check

0 steps flagged

No circularity: empirical claims rest on simulations, not derivations reducing to inputs

The paper proposes the M-CQR algorithm by integrating three existing components (CQL, QR-DQN, MAML) and reports empirical performance gains (e.g., 50% faster convergence) from UAV simulations. No equations, predictions, or first-principles results are presented that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The central claims are simulation outcomes rather than theoretical derivations, making the derivation chain self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "integrates conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML)"

IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "M-CTDE-CQR achieves up to 50% faster convergence"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
