Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction

Abhishek Dharmaratnakar; Anushree Sinha; Debanshu Das; Srivaths Ranganathan

arxiv: 2604.21154 · v1 · submitted 2026-04-22 · 💻 cs.AI


Pith reviewed 2026-05-09 23:38 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-agent systems · generative AI · physiotherapy · pose estimation · tele-rehabilitation · personalized medicine · video generation · real-time feedback


The pith

A multi-agent AI system generates personalized physiotherapy videos from medical notes and delivers real-time pose corrections at home.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

At-home physiotherapy suffers from low compliance because patients miss the personalized supervision and immediate feedback that in-person sessions provide. This paper presents a framework of four AI agents that together parse doctor notes into movement rules, generate custom exercise videos using generative models, estimate poses from video in real time, and issue corrective guidance. By combining these capabilities, the system aims to replicate key elements of supervised rehab in the patient's own environment. If successful, it would allow safe, tailored exercises without relying on fixed video libraries or generic avatars.

Core claim

The paper claims that a multi-agent architecture can integrate clinical constraint extraction, generative video synthesis, vision-based pose tracking, and diagnostic feedback to enable personalized, dynamic physiotherapy training and correction, thereby addressing the limitations of static digital health tools.

What carries the argument

A four-agent multi-agent system where the Clinical Extraction Agent converts unstructured notes into kinematic constraints, the Video Synthesis Agent creates patient-specific videos, the Vision Processing Agent handles real-time pose estimation, and the Diagnostic Feedback Agent provides corrections.
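The hand-offs between the four agents can be sketched as a plain function pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `KinematicConstraint` type, the function names, the keyword lookup standing in for LLM parsing, and the 0 to 90 degree knee range are all hypothetical, and the video and vision steps are stubs that call no real model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class KinematicConstraint:
    """One movement rule, e.g. 'keep the knee angle between 0 and 90 degrees'."""
    joint: str
    min_deg: float
    max_deg: float


def clinical_extraction_agent(note: str) -> List[KinematicConstraint]:
    # Stand-in for the LLM parsing step: a keyword lookup, NOT real note parsing.
    rules: List[KinematicConstraint] = []
    if "knee" in note.lower():
        rules.append(KinematicConstraint("knee", 0.0, 90.0))  # hypothetical limit
    return rules


def video_synthesis_agent(constraints: List[KinematicConstraint]) -> str:
    # Stand-in for video generation: returns a text prompt, calls no video model.
    limits = "; ".join(
        f"{c.joint} {c.min_deg:.0f}-{c.max_deg:.0f} deg" for c in constraints
    )
    return f"exercise demo respecting: {limits}"


def vision_processing_agent(frame: Dict[str, float]) -> Dict[str, float]:
    # Stand-in for pose estimation: the 'frame' already holds joint angles (deg).
    return dict(frame)


def diagnostic_feedback_agent(
    angles: Dict[str, float], constraints: List[KinematicConstraint]
) -> List[str]:
    # Compare each measured angle with its constraint and phrase a correction.
    messages: List[str] = []
    for c in constraints:
        angle = angles.get(c.joint)
        if angle is None:
            continue  # joint not observed in this frame
        if angle > c.max_deg:
            messages.append(f"Reduce {c.joint} angle: {angle:.0f} > {c.max_deg:.0f} deg")
        elif angle < c.min_deg:
            messages.append(f"Increase {c.joint} angle: {angle:.0f} < {c.min_deg:.0f} deg")
    return messages


def run_session(
    note: str, frames: List[Dict[str, float]]
) -> Tuple[str, List[List[str]]]:
    # Constraints are extracted once, then reused for the video and every frame.
    constraints = clinical_extraction_agent(note)
    prompt = video_synthesis_agent(constraints)
    feedback = [
        diagnostic_feedback_agent(vision_processing_agent(f), constraints)
        for f in frames
    ]
    return prompt, feedback
```

The point of the sketch is the data flow: the extracted constraints are the single shared artifact that both the synthesis step and the feedback step consume, which is where a closed loop would have to enforce safety.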

Load-bearing premise

That generative models will produce exercises respecting all patient-specific injury limits, and that pose estimation will be accurate enough in everyday home conditions to prevent harmful advice.

What would settle it

A trial in which users following the AI-generated videos and corrections show lower compliance or sustain more injuries than users receiving standard care or no intervention.

Figures

Figures reproduced from arXiv: 2604.21154 by Abhishek Dharmaratnakar, Anushree Sinha, Debanshu Das, Srivaths Ranganathan.

**Figure 2.** Example of the generated patient interface: The Video Synthesis …

Original abstract

At-home physiotherapy compliance remains critically low due to a lack of personalized supervision and dynamic feedback. Existing digital health solutions rely on static, pre-recorded video libraries or generic 3D avatars that fail to account for a patient's specific injury limitations or home environment. In this paper, we propose a novel Multi-Agent System (MAS) architecture that leverages Generative AI and computer vision to close the tele-rehabilitation loop. Our framework consists of four specialized micro-agents: a Clinical Extraction Agent that parses unstructured medical notes into kinematic constraints; a Video Synthesis Agent that utilizes foundational video generation models to create personalized, patient-specific exercise videos; a Vision Processing Agent for real-time pose estimation; and a Diagnostic Feedback Agent that issues corrective instructions. We present the system architecture, detail the prototype pipeline using Large Language Models and MediaPipe, and outline our clinical evaluation plan. This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a multi-agent system (MAS) architecture for at-home physiotherapy that combines generative AI for personalized exercise video synthesis with real-time computer vision for pose correction. It details four specialized agents—a Clinical Extraction Agent (LLM-based parsing of medical notes into kinematic constraints), a Video Synthesis Agent (foundational video models for patient-specific videos), a Vision Processing Agent (MediaPipe for pose estimation), and a Diagnostic Feedback Agent (corrective instructions)—along with a prototype pipeline and a high-level clinical evaluation plan. The paper asserts that this framework demonstrates the feasibility of agentic generative media for scaling safe, personalized tele-rehabilitation.

Significance. If the proposed agents can be shown to reliably generate kinematically valid and injury-safe exercises while providing accurate home-environment feedback, the work could meaningfully advance digital health by addressing low physiotherapy compliance through adaptive, constraint-aware supervision. The integration of LLMs, generative video, and vision agents represents a coherent system-level idea with potential applicability beyond physiotherapy, but the manuscript contains no empirical results to establish this potential.

major comments (2)

[Abstract] The assertion that 'This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively' is unsupported. The manuscript supplies only an architectural description, a prototype outline using LLMs and MediaPipe, and an evaluation plan, with no quantitative results, error rates, kinematic validation, or safety analysis of generated videos or autonomous corrections.
[System Architecture / Prototype Pipeline] Video Synthesis Agent and Vision Processing Agent descriptions: The central feasibility claim depends on untested assumptions that generative video models will produce kinematically correct, injury-safe exercises from parsed constraints and that MediaPipe-based pose estimation will be sufficiently accurate in uncontrolled home settings for safe autonomous feedback; no preliminary tests, failure-mode analysis, or constraint-enforcement mechanisms are provided to address these risks.

minor comments (2)

The interaction and communication protocol among the four agents (e.g., how the Diagnostic Feedback Agent receives real-time input from the Vision Processing Agent or resolves conflicts with the Clinical Extraction Agent) is described at a high level only, reducing clarity on the closed-loop operation.
The clinical evaluation plan is outlined without specifying primary outcome metrics (e.g., kinematic error thresholds, patient adherence rates, or adverse event monitoring), making it difficult to assess how feasibility would be rigorously tested.
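Two of the outcome metrics named in the last comment are simple enough to write down. The sketch below is an assumption about how they might be defined, not something taken from the manuscript: the function names are hypothetical, kinematic error is taken to be the mean absolute difference between estimated and reference joint angles, and adherence is the fraction of prescribed sessions completed.

```python
from typing import Sequence


def mean_absolute_angle_error(
    estimated_deg: Sequence[float], reference_deg: Sequence[float]
) -> float:
    """Mean absolute difference (degrees) between paired joint-angle samples."""
    if len(estimated_deg) != len(reference_deg) or len(estimated_deg) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(e - r) for e, r in zip(estimated_deg, reference_deg))
    return total / len(estimated_deg)


def adherence_rate(completed_sessions: int, prescribed_sessions: int) -> float:
    """Fraction of prescribed sessions the patient actually completed."""
    if prescribed_sessions <= 0:
        raise ValueError("prescribed_sessions must be positive")
    return completed_sessions / prescribed_sessions
```

With made-up illustrative inputs, `mean_absolute_angle_error([88, 92, 95], [90, 90, 90])` gives 3.0 degrees and `adherence_rate(9, 12)` gives 0.75. An evaluation plan would still need to state which threshold on each metric counts as acceptable.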

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments that correctly identify overstatements in the current manuscript. We agree the work is an architectural proposal with a prototype outline and evaluation plan rather than an empirical demonstration, and we will revise the text to reflect this accurately without claiming unsupported results.


Referee: [Abstract] The assertion that 'This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively' is unsupported. The manuscript supplies only an architectural description, a prototype outline using LLMs and MediaPipe, and an evaluation plan, with no quantitative results, error rates, kinematic validation, or safety analysis of generated videos or autonomous corrections.

Authors: We agree that the abstract claim is unsupported by the manuscript content. The paper describes a proposed MAS architecture, details a prototype pipeline, and outlines a future clinical evaluation but presents no experimental data. In revision we will replace the final sentence of the abstract with: 'This work proposes a novel Multi-Agent System architecture that integrates generative media with agentic decision-making and outlines a clinical evaluation plan to assess its potential for scaling personalized tele-rehabilitation.' This change will be made throughout the introduction and conclusion as well. revision: yes
Referee: [System Architecture / Prototype Pipeline] Video Synthesis Agent and Vision Processing Agent descriptions: The central feasibility claim depends on untested assumptions that generative video models will produce kinematically correct, injury-safe exercises from parsed constraints and that MediaPipe-based pose estimation will be sufficiently accurate in uncontrolled home settings for safe autonomous feedback; no preliminary tests, failure-mode analysis, or constraint-enforcement mechanisms are provided to address these risks.

Authors: The referee accurately notes the absence of preliminary tests, failure-mode analysis, or explicit constraint-enforcement mechanisms. Because the manuscript is a framework proposal rather than an implementation study, we did not conduct such experiments. In the revised version we will add a dedicated 'Risks and Mitigations' subsection under the Video Synthesis Agent describing kinematic validation steps (e.g., post-generation pose consistency checks against the extracted constraints) and a parallel subsection under the Vision Processing Agent discussing MediaPipe limitations in variable home lighting and camera angles together with proposed mitigations such as multi-view fusion and confidence thresholding before issuing feedback. The clinical evaluation plan will be expanded to include explicit safety and kinematic accuracy metrics. We cannot add new empirical results at this stage. revision: partial
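The confidence thresholding and constraint check mentioned in this response could look roughly like the sketch below. It is an illustration under assumptions, not the authors' design: the `Landmark` type is hypothetical (its `visibility` field is modeled loosely on the per-landmark score that pose estimators such as MediaPipe report), the 0.7 threshold is arbitrary, and the check uses the 2D included angle at the knee, where 180 degrees is a straight leg. Multi-view fusion is not covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    """Hypothetical 2D pose landmark with a confidence score in [0, 1]."""
    x: float
    y: float
    visibility: float


def joint_angle_deg(a: Landmark, b: Landmark, c: Landmark) -> float:
    """Included angle at b between segments b->a and b->c, in degrees."""
    v1 = (a.x - b.x, a.y - b.y)
    v2 = (c.x - b.x, c.y - b.y)
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident landmarks: angle is undefined")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def gated_knee_feedback(
    hip: Landmark,
    knee: Landmark,
    ankle: Landmark,
    min_angle_deg: float,
    min_visibility: float = 0.7,  # arbitrary illustrative threshold
) -> str:
    # Abstain instead of advising when any landmark is poorly tracked.
    if min(hip.visibility, knee.visibility, ankle.visibility) < min_visibility:
        return "abstain"
    angle = joint_angle_deg(hip, knee, ankle)
    # A smaller included angle means deeper flexion than the constraint allows.
    if angle < min_angle_deg:
        return f"Knee bent too far: {angle:.0f} deg (limit {min_angle_deg:.0f})"
    return "ok"
```

Returning an explicit "abstain" state, rather than silently skipping the frame, lets a downstream component count how often the system declined to advise, which is itself a useful safety signal in poor lighting or at awkward camera angles.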

Circularity Check

0 steps flagged

No circularity: high-level system proposal without derivations or fitted predictions


The paper is a conceptual architecture proposal for a multi-agent physiotherapy system. It contains no equations, quantitative predictions, parameter fitting, or derivation chains that could reduce to inputs by construction. The feasibility claim rests on descriptive outline of components (LLM parsing, MediaPipe, generative video models) and a planned evaluation, with no self-referential logic or load-bearing self-citations that substitute for independent evidence. This is the normal non-circular outcome for a system-design manuscript.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the ledger captures implied assumptions about AI component capabilities since no full methods or results are available. No free parameters are mentioned. The four agents are introduced as new software components without external validation.

axioms (2)

domain assumption: Generative video models can accurately synthesize kinematically correct and safe patient-specific exercise videos from extracted constraints.
Invoked in the description of the Video Synthesis Agent.
domain assumption: Real-time pose estimation is reliable enough in home environments to enable safe corrective feedback from the Diagnostic Feedback Agent.
Assumed for the Vision Processing Agent and overall system safety.


