Multiple Landmark Detection using Multi-Agent Reinforcement Learning

Amir Alansary; Athanasios Vlontzos; Bernhard Kainz; Daniel Rueckert; Konstantinos Kamnitsas

arxiv: 1907.00318 · v2 · pith:SCPTAUXQnew · submitted 2019-06-30 · 💻 cs.CV

Athanasios Vlontzos, Amir Alansary, Konstantinos Kamnitsas, Daniel Rueckert, Bernhard Kainz

Pith reviewed 2026-05-25 13:04 UTC · model grok-4.3

keywords: anatomical landmark detection · multi-agent reinforcement learning · deep Q-network · medical image analysis · collaborative detection · landmark interdependence

The pith

Multi-agent reinforcement learning detects multiple anatomical landmarks simultaneously by sharing knowledge among agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes training K agents together in one Deep Q-Network environment so they locate K different landmarks on the same medical image. The agents collaborate by exchanging accumulated knowledge during training rather than learning in isolation. This setup rests on the idea that landmark positions are interdependent within anatomy, allowing one discovery to inform others. The result is reported as half the detection error of training K agents separately, plus lower overall compute and training time.

Core claim

A single multi-agent DQN environment lets K agents act and learn simultaneously to find K anatomical landmarks; they share experience so that locating one landmark helps deduce the positions of the others, yielding 50 percent lower detection error and reduced training cost compared with the baseline of separate agents.

What carries the argument

Multi-agent Deep Q-Network with implicit inter-communication, where K agents operate on the same image and pool their accumulated knowledge for collective improvement.
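For orientation, the standard one-step DQN objective (Mnih et al., 2015) can be written once per agent, as sketched below. The agent index k and the split of the parameters into a shared part \theta_s and an agent-specific part \theta_k are notational assumptions made here for illustration only; the abstract does not say which parts of the network, if any, are shared.

```latex
% Per-agent DQN loss for agent k = 1, ..., K, using the usual one-step target
% with a periodically updated target network (parameters marked with ^-).
% \theta_s (shared) and \theta_k (agent-specific) are hypothetical symbols.
L_k(\theta_s, \theta_k) =
  \mathbb{E}_{(s, a, r, s')}\Big[
    \big( r + \gamma \max_{a'} Q_k(s', a'; \theta_s^{-}, \theta_k^{-})
          - Q_k(s, a; \theta_s, \theta_k) \big)^2
  \Big]

% One possible joint objective simply sums the per-agent losses:
L(\theta_s, \theta_1, \dots, \theta_K) = \sum_{k=1}^{K} L_k(\theta_s, \theta_k)
```

Under this reading, gradients from every agent's loss would reach \theta_s, which is one way "pooled knowledge" could arise without any explicit message passing.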

If this is right

Detection error drops by 50 percent relative to training K agents independently.
Training requires fewer computational resources than the naive separate-agent baseline.
Training time is shorter than training K agents separately.
Accuracy exceeds that of prior state-of-the-art single-agent or non-collaborative architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same shared-knowledge pattern could be tested on other multi-object localization problems where spatial relations are stable.
Clinical annotation pipelines might reduce manual review time if the agents can flag inconsistent landmark sets automatically.
If the interdependence assumption holds only for certain anatomies, the method could be extended by learning which subsets of landmarks benefit from joint training.

Load-bearing premise

The positions of anatomical landmarks are interdependent and non-random within human anatomy.

What would settle it

A controlled test on a set of landmarks whose positions are known to be statistically independent, showing no accuracy gain for the multi-agent version over separately trained agents.
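One way such a controlled comparison might be scored is a paired permutation test on per-image detection errors. The sketch below is a minimal illustration, not a procedure taken from the paper: the function name `paired_sign_flip_p_value` and the millimetre values are made up, and the exact enumeration is only practical for small test sets because it visits 2^n sign patterns.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_p_value(errors_joint, errors_separate):
    """Exact two-sided sign-flip permutation test on paired per-image errors."""
    if len(errors_joint) != len(errors_separate):
        raise ValueError("both error lists must cover the same test images")
    # Positive difference means the jointly trained agents had the lower error.
    diffs = [s - j for j, s in zip(errors_joint, errors_separate)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis the sign of each paired difference is arbitrary.
    for signs in product((1.0, -1.0), repeat=len(diffs)):
        flipped = abs(mean(sign * d for sign, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-image errors in millimetres (made-up numbers, not from the paper).
joint = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
separate = [2.0, 2.2, 1.9, 2.1, 2.0, 1.8]
p_value = paired_sign_flip_p_value(joint, separate)
```

On landmarks that really are statistically independent, the settling outcome described above would correspond to a large p-value and a mean difference near zero.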

Figures

Figures reproduced from arXiv: 1907.00318 by Amir Alansary, Athanasios Vlontzos, Bernhard Kainz, Daniel Rueckert, Konstantinos Kamnitsas.

**Figure 2.** Proposed Collaborative DQN for the case of two agents; The …

Original abstract

The detection of anatomical landmarks is a vital step for medical image analysis and applications for diagnosis, interpretation and guidance. Manual annotation of landmarks is a tedious process that requires domain-specific expertise and introduces inter-observer variability. This paper proposes a new detection approach for multiple landmarks based on multi-agent reinforcement learning. Our hypothesis is that the position of all anatomical landmarks is interdependent and non-random within the human anatomy, thus finding one landmark can help to deduce the location of others. Using a Deep Q-Network (DQN) architecture we construct an environment and agent with implicit inter-communication such that we can accommodate K agents acting and learning simultaneously, while they attempt to detect K different landmarks. During training the agents collaborate by sharing their accumulated knowledge for a collective gain. We compare our approach with state-of-the-art architectures and achieve significantly better accuracy by reducing the detection error by 50%, while requiring fewer computational resources and time to train compared to the naive approach of training K agents separately.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Multi-agent DQN for joint landmark detection is a sensible framing, but the abstract leaves the actual collaboration mechanism too vague to credit the 50% gain to interdependence.

The paper sets up K DQN agents that detect different landmarks at the same time and share accumulated knowledge during training. The hypothesis that anatomical landmarks are interdependent is plausible, and treating detection as a joint rather than independent task is a straightforward way to test it. The comparison to training K separate agents is the right baseline and the reported drop in both error and training cost is the kind of practical result that matters for medical imaging pipelines. That part of the work is useful on its own terms.

The soft spot is that the abstract never spells out what the implicit inter-communication actually consists of. There is no description of whether an agent's state includes other agents' positions or Q-values, how the replay buffer is shared, or what the joint reward looks like. Without those details it is impossible to tell whether any measured improvement comes from exploiting anatomical dependence or simply from parameter sharing and joint optimization. The 50% error reduction is also stated without dataset names, error bars, or statistical tests in the abstract, so the size of the effect cannot be judged from what is given here.

This is the sort of paper that belongs in a reading group focused on RL applications in medical imaging. A reader already working on landmark detection or multi-task RL would get value from the baseline comparison and the resource numbers even if the collaboration mechanism turns out to be simple. It deserves a serious referee because the experimental claim is concrete and falsifiable once the architecture and data are supplied; the authors can be asked to clarify the communication details and add the missing statistical information.

Referee Report

2 major / 0 minor

Summary. The paper proposes a multi-agent reinforcement learning approach based on DQN for detecting multiple anatomical landmarks simultaneously in medical images. It hypothesizes that landmark positions are interdependent within human anatomy and implements an environment with implicit inter-communication so that K agents can act and learn together while sharing accumulated knowledge. The central empirical claim is a 50% reduction in detection error versus state-of-the-art architectures together with lower computational cost than training K independent agents.

Significance. If the performance and efficiency claims hold after proper verification, the work would demonstrate a practical benefit of multi-agent RL for exploiting anatomical dependencies, offering a more scalable alternative to separate per-landmark detectors in medical image analysis.

major comments (2)

[Abstract] Abstract: the claim of a 50% reduction in detection error versus SOTA is presented without any dataset details, baseline descriptions, value of K, statistical tests, or error bars, rendering the central quantitative result unverifiable.
[Abstract] Abstract: the multi-agent mechanism is described only as 'implicit inter-communication' and 'sharing accumulated knowledge,' with no definition of per-agent state, observation, action space, reward, or how other agents' positions or Q-values enter the learning process; this leaves the interdependence hypothesis untested and prevents causal attribution of any performance gain to the multi-agent design rather than parameter sharing alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that greater specificity is needed to make the central claims verifiable and to clarify the multi-agent formulation. We will revise the abstract accordingly while preserving the manuscript's technical content.

Referee: [Abstract] Abstract: the claim of a 50% reduction in detection error versus SOTA is presented without any dataset details, baseline descriptions, value of K, statistical tests, or error bars, rendering the central quantitative result unverifiable.

Authors: We agree that the abstract should be self-contained on this point. The revised abstract will specify the datasets, the SOTA baselines, the value of K used, and will note that results are reported with error bars and statistical testing as detailed in the experimental section. Revision: yes.

Referee: [Abstract] Abstract: the multi-agent mechanism is described only as 'implicit inter-communication' and 'sharing accumulated knowledge,' with no definition of per-agent state, observation, action space, reward, or how other agents' positions or Q-values enter the learning process; this leaves the interdependence hypothesis untested and prevents causal attribution of any performance gain to the multi-agent design rather than parameter sharing alone.

Authors: The manuscript defines the per-agent state as local image patches, the action space as discrete movements, the reward as a function of distance to the target landmark, and sharing via a common replay buffer and network weights. The experiments compare against separately trained agents to isolate the benefit of joint learning. The revised abstract will briefly state these elements and reference the comparison that supports attribution to the multi-agent interaction. Revision: yes.
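The elements named in the simulated rebuttal (discrete movements, a distance-based reward, a common replay buffer) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the six-move action set, the reward defined as the reduction in Euclidean distance, the class `SharedReplayBuffer`, and the landmark coordinates are all hypothetical, and the state is reduced to a voxel position instead of an image patch to keep the example short.

```python
import math
import random
from collections import deque

# Six axis-aligned unit moves in a 3D volume (an assumed discrete action set).
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def step(position, action_index, target):
    """Move one voxel and reward the reduction in distance to the target."""
    move = ACTIONS[action_index]
    new_position = tuple(p + m for p, m in zip(position, move))
    reward = math.dist(position, target) - math.dist(new_position, target)
    return new_position, reward


class SharedReplayBuffer:
    """One bounded buffer that all K agents write to and sample from."""

    def __init__(self, capacity):
        self._transitions = deque(maxlen=capacity)

    def add(self, agent_id, position, action_index, reward, new_position):
        self._transitions.append(
            (agent_id, position, action_index, reward, new_position)
        )

    def sample(self, batch_size, rng):
        pool = list(self._transitions)
        return rng.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self._transitions)


# Two agents, each seeking its own (hypothetical) landmark coordinate.
targets = [(3, 0, 0), (0, 4, 0)]
positions = [(0, 0, 0), (0, 0, 0)]
buffer = SharedReplayBuffer(capacity=1000)
rng = random.Random(0)
for _ in range(5):
    for agent_id, target in enumerate(targets):
        action_index = rng.randrange(len(ACTIONS))  # random policy stands in for a Q-network
        new_position, reward = step(positions[agent_id], action_index, target)
        buffer.add(agent_id, positions[agent_id], action_index, reward, new_position)
        positions[agent_id] = new_position
```

A sketch like this also shows why the referee's attribution question matters: a shared buffer and shared weights couple the agents even when no agent ever observes another agent's position.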

Circularity Check

0 steps flagged

Empirical RL comparison with no derivation chain or self-referential predictions

The paper reports an empirical comparison of multi-agent DQN against SOTA and separate-agents baselines for landmark detection. The abstract and structure contain no equations, fitted parameters presented as predictions, uniqueness theorems, or ansatzes. The hypothesis of landmark interdependence is tested via the experimental setup and measured outcomes (50% error reduction), not derived by construction from the inputs. No load-bearing self-citations or circular reductions are present; the result is a standard empirical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on one explicit domain assumption about anatomical interdependence; no free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: The position of all anatomical landmarks is interdependent and non-random within the human anatomy.
This premise is required for the claimed benefit of inter-agent collaboration.

pith-pipeline@v0.9.0 · 5709 in / 1087 out tokens · 49136 ms · 2026-05-25T13:04:13.955369+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 1 internal anchor

[1] Alansary, A., Le Folgoc, L., Vaillant, G., Oktay, O., Li, Y., Bai, W., Passerat-Palmbach, J., Guerrero, R., Kamnitsas, K., Hou, B., McDonagh, S., Glocker, B., Kainz, B., Rueckert, D.: Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents. In: MICCAI 18. pp. 277–285 (2018)
[2] Alansary, A., Oktay, O., Li, Y., Folgoc, L.L., Hou, B., Vaillant, G., Kamnitsas, K., Vlontzos, A., Glocker, B., Kainz, B., Rueckert, D.: Evaluating reinforcement learning agents for anatomical landmark detection. Medical Image Analysis 53, 156–164 (2019)
[3] Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a "siamese" time delay neural network. pp. 737–744 (1993)
[4] Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. In: NIPS 29. pp. 2137–2145 (2016)
[5] Foerster, J., Chen, R.Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., Mordatch, I.: Learning with opponent-learning awareness. In: Proc. 17th Intl. Conf. on Autonomous Agents and MultiAgent Systems. pp. 122–130. AAMAS ’18 (2018)
[6] Gauriau, R., Cuingnet, R., Lesage, D., Bloch, I.: Multi-organ localization with cascaded global-to-local regression and shape prior. Medical Image Analysis 23(1), 70–83 (2015)
[7] Ghesu, F., Georgescu, B., Zheng, Y., Grbic, S., Maier, A., Hornegger, J., Comaniciu, D.: Multi-scale deep reinforcement learning for real-time 3d-landmark detection in ct scans. IEEE PAMI 41(1), 176–189 (Jan 2019)
[8] Ghesu, F.C., Georgescu, B., Mansi, T., Neumann, D., Hornegger, J., Comaniciu, D.: An artificial agent for anatomical landmark detection in medical images. In: MICCAI 2016. pp. 229–237. Springer, Cham (2016)
[9] Girard, J., Emami, R.: Concurrent markov decision processes for robot team learning. EAAI (2015)
[10] Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Autonomous Agents and Multiagent Systems. pp. 66–83. Springer (2017)
[11] Jaakkola, T., Singh, S.P., Jordan, M.I.: Reinforcement learning algorithm for partially observable markov decision problems. NIPS (1995)
[12] Jack Jr, C.R., et al.: The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging 27(4), 685–691 (2008)
[13] Li, Y., Alansary, A., Cerrolaza, J., Khanal, B., Sinclair, M., Matthew, J., Gupta, C., Knight, C., Kainz, B., Rueckert, D.: Fast multiple landmark localisation using a patch-based iterative network. In: Proceedings 21st International Conference, Granada, Spain, September 16–20, 2018, Part I. pp. 563–571 (Sep 2018)
[14] de Marvao, A., Dawes, T.J., Shi, W., Minas, C., Keenan, N.G., Diamond, T., Durighel, G., Montana, G., Rueckert, D., Cook, S.A., et al.: Population-based studies of myocardial hypertrophy: high resolution cardiovascular magnetic resonance atlases improve statistical power. Journal of cardiovascular magnetic resonance 16(1), 16 (2014)
[15] Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529 (Feb 2015)
[16] Oktay, O., Bai, W., Guerrero, R., Rajchl, M., de Marvao, A., O'Regan, D.P., Cook, S.A., Heinrich, M.P., Glocker, B., Rueckert, D.: Stratified decision forests for accurate anatomical landmark localization in cardiac images. IEEE Transactions on Medical Imaging 36(1), 332–342 (Jan 2017)
[17] Rahmatullah, B., Papageorghiou, A.T., Noble, J.A.: Image analysis using machine learning: Anatomical landmarks detection in fetal ultrasound images. In: 2012 IEEE 36th Annual Computer Software and Applications Conference. pp. 354–355 (July 2012)
[18] Rashid, T., Samvelyan, M., de Witt, C.S., Farquhar, G., Foerster, J.N., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR abs/1803.11485 (2018)
[19] Zheng, Y., Liu, D., Georgescu, B., Nguyen, H., Comaniciu, D.: 3d deep learning for efficient and robust landmark detection in volumetric data. In: MICCAI 2015. pp. 565–572. Springer (2015)
