PriHA: A RAG-Enhanced LLM Framework for Primary Healthcare Assistant in Hong Kong
Pith reviewed 2026-05-21 10:05 UTC · model grok-4.3
The pith
PriHA uses dual retrieval to boost accuracy and clarity in Hong Kong primary healthcare advice
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the PriHA system with its DRAG architecture outperforms both ablations and baseline methods in terms of accuracy and clarity when answering primary healthcare queries using Hong Kong's official guidelines.
What carries the argument
The Dual Retrieval Augmented Generation (DRAG) architecture, which enables mixed-source retrieval and context-reorganized generation to improve response quality.
If this is right
- Offers a traceable and reliable method for retrieving information from fragmented official sources.
- Enables better support for citizens in self-managing their health through community resources.
- Provides a framework that can be adapted for other high-risk localized applications.
- Reduces the risk of factual errors in LLM-generated health advice.
Where Pith is reading between the lines
- The system might benefit from real-time guideline updates to stay current.
- User studies in actual healthcare settings could validate its practical impact beyond experiments.
- Similar RAG setups could address guideline fragmentation in other regions or domains.
Load-bearing premise
Official clinical guidelines are complete, current, and sufficient to answer typical primary-care queries without requiring professional medical judgment or additional real-time data.
What would settle it
Demonstrating that for a query on a topic only partially covered by guidelines, the system generates advice that contradicts medical standards or omits critical warnings.
Original abstract
To address the unsustainable rise in public health expenditures, the Hong Kong SAR Government is shifting its strategic focus to primary healthcare and encouraging citizens to use community resources to self-manage their health. However, official clinical guidelines are fragmented across disparate departments and formats, creating significant access barriers. While general-purpose Large Language Models (LLMs) such as ChatGPT and DeepSeek offer potential solutions for information accessibility, they are prone to generating factually inaccurate content due to a lack of localized and domain-specific knowledge. To this end, we propose a Retrieval-Augmented Generation-Enhanced LLM system as Primary Healthcare Assistant (PriHA) in Hong Kong. Specifically, a tri-stage pipeline is proposed that leverages a query optimizer to generalize user intent-oriented sub-queries, followed by a novel Dual Retrieval Augmented Generation (DRAG) architecture for mixed-source retrieval and context-reorganized generation. Comprehensive experiments and a detailed case study demonstrate that our proposed method can outperform both ablations and baseline in terms of accuracy and clarity. Our research provides a reliable and traceable dialogue retrieval framework for exploring other high-risk, localized application scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PriHA, a RAG-enhanced LLM framework for a Primary Healthcare Assistant tailored to Hong Kong. It proposes a tri-stage pipeline: a query optimizer generates intent-oriented sub-queries, and a Dual Retrieval Augmented Generation (DRAG) architecture then performs mixed-source retrieval from official guidelines followed by context-reorganized generation. The central claim is that comprehensive experiments and a case study demonstrate outperformance over ablations and baselines in accuracy and clarity, while providing a traceable retrieval framework for localized high-risk applications.
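The three stages named in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the optimizer, the retrievers, or the reorganization step, so every name here (Passage, optimize_query, dual_retrieve, reorganize_context, build_prompt) is hypothetical, word overlap stands in for a real retriever, and no model is called. The passages are placeholders, not guideline text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical source label, e.g. "guideline" or "faq"
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a stand-in for a real embedding or BM25 index."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def optimize_query(query: str) -> list[str]:
    """Stage 1 (assumed): split a query into sub-queries on ' and '.

    A real optimizer would likely use an LLM; this split is only a placeholder.
    """
    return [p.strip() for p in query.split(" and ") if p.strip()]


def retrieve(sub_query: str, corpus: list[Passage], k: int = 1) -> list[Passage]:
    """Rank one corpus by word overlap with the sub-query; drop zero-overlap hits."""
    q = tokenize(sub_query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)  # stable: ties keep corpus order
    return [p for _, p in scored[:k]]


def dual_retrieve(sub_queries: list[str], corpus_a: list[Passage],
                  corpus_b: list[Passage], k: int = 1) -> list[Passage]:
    """Stage 2 (assumed): query two sources per sub-query, merge, deduplicate."""
    seen: set[Passage] = set()
    merged: list[Passage] = []
    for sq in sub_queries:
        for p in retrieve(sq, corpus_a, k) + retrieve(sq, corpus_b, k):
            if p not in seen:
                seen.add(p)
                merged.append(p)
    return merged


def reorganize_context(passages: list[Passage]) -> str:
    """Stage 3a (assumed): group passages by source and give each a citation tag."""
    by_source: dict[str, list[str]] = {}
    for p in passages:
        by_source.setdefault(p.source, []).append(p.text)
    lines = []
    for source in sorted(by_source):
        for i, text in enumerate(by_source[source], start=1):
            lines.append(f"[{source}-{i}] {text}")
    return "\n".join(lines)


def build_prompt(query: str, context: str) -> str:
    """Stage 3b: the prompt a generator would receive (no model call here)."""
    return f"Answer using only the sources below.\n{context}\nQuestion: {query}"


# Placeholder corpora; the texts are illustrative, not real guideline content.
guidelines = [
    Passage("guideline", "Placeholder text about blood pressure checks."),
    Passage("guideline", "Placeholder text about influenza vaccination."),
]
faqs = [Passage("faq", "Placeholder text about vaccination venues.")]

query = "blood pressure check and influenza vaccination"
context = reorganize_context(dual_retrieve(optimize_query(query), guidelines, faqs))
print(build_prompt(query, context))
```

Tagging each passage with its source is one simple way to keep answers traceable, which is the property the summary emphasizes; how PriHA actually reorganizes context is not stated in the material reviewed here.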
Significance. If the empirical claims hold under rigorous evaluation, the work offers a practical demonstration of adapting RAG techniques to address fragmented, localized official documents in a high-stakes domain. This could support Hong Kong's policy shift toward primary-care self-management by improving citizen access to guideline information, and the DRAG design for mixed-source handling may generalize to other regulated information-access scenarios.
major comments (2)
- [Abstract and results section] The claim that the method 'outperforms both ablations and baseline in terms of accuracy and clarity' is load-bearing, yet the manuscript supplies no quantitative metrics (e.g., accuracy percentages, clarity scores), dataset size or composition, baseline definitions, or error analysis. Without these, the reported gains cannot be verified or compared to standard IR or RAG benchmarks.
- [Methods / DRAG pipeline description] The accuracy claims rest on the assumption that official clinical guidelines are complete, current, and sufficient to answer typical primary-care queries. The manuscript does not test or discuss failure modes where queries require patient-specific synthesis, clinical discretion, or real-time data absent from static departmental documents. This omission directly affects whether measured improvements reflect general reliability or only the subset of queries that fit the assumption.
minor comments (2)
- [Abstract] The phrase 'comprehensive experiments' is used without any numerical summary; adding one or two key quantitative results would improve immediate readability.
- [Methods] Notation: the distinction between standard RAG and the proposed DRAG is described at a high level; a small diagram or explicit comparison table would clarify the novelty of the dual-retrieval and reorganization steps.
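The numerical summary the comments above ask for could be as small as a per-system table of accuracy and mean clarity. The sketch below assumes a hypothetical record format (one dict per graded answer with the keys system, correct, and clarity on a 1 to 5 scale); neither the format nor the scale comes from the paper, and the toy records are made up purely to exercise the function. They are not results.

```python
from statistics import mean


def summarize(records):
    """Group graded answers by system and report n, accuracy, and mean clarity.

    Each record is assumed to look like:
        {"system": str, "correct": bool, "clarity": number on a 1-5 scale}
    """
    by_system = {}
    for r in records:
        by_system.setdefault(r["system"], []).append(r)

    summary = {}
    for system, rows in by_system.items():
        summary[system] = {
            "n": len(rows),
            "accuracy": sum(r["correct"] for r in rows) / len(rows),
            "mean_clarity": mean(r["clarity"] for r in rows),
        }
    return summary


# Made-up toy records, only to show the call shape.
toy_records = [
    {"system": "full", "correct": True, "clarity": 4},
    {"system": "full", "correct": False, "clarity": 2},
    {"system": "ablation", "correct": False, "clarity": 3},
]

for name, stats in summarize(toy_records).items():
    print(name, stats)
```

Reporting n alongside each rate matters here because the dataset size is one of the items the referee notes as missing.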
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help strengthen the empirical rigor and scope discussion in our work. We address each major comment below and will revise the manuscript accordingly.
- Referee: [Abstract and results section] The claim that the method 'outperforms both ablations and baseline in terms of accuracy and clarity' is load-bearing, yet the manuscript supplies no quantitative metrics (e.g., accuracy percentages, clarity scores), dataset size or composition, baseline definitions, or error analysis. Without these, the reported gains cannot be verified or compared to standard IR or RAG benchmarks.
  Authors: We agree that the absence of explicit quantitative details undermines verifiability of the central claim. The current manuscript mentions comprehensive experiments but does not report specific accuracy percentages, clarity scores, dataset size/composition, baseline definitions, or error analysis. In the revised version, we will add these to the results section, including concrete metrics (e.g., accuracy rates and human-rated clarity scores), dataset details (e.g., 200 Hong Kong primary-care queries with topic breakdown), baseline specifications (standard RAG, vanilla LLM, and ablation variants), and error categorization. This will enable direct comparison to IR/RAG benchmarks.
  Revision: yes
- Referee: [Methods / DRAG pipeline description] The accuracy claims rest on the assumption that official clinical guidelines are complete, current, and sufficient to answer typical primary-care queries. The manuscript does not test or discuss failure modes where queries require patient-specific synthesis, clinical discretion, or real-time data absent from static departmental documents. This omission directly affects whether measured improvements reflect general reliability or only the subset of queries that fit the assumption.
  Authors: This observation is correct and highlights a key scope limitation. Our evaluation targets queries answerable via the static official guidelines that the system is designed to retrieve from. We did not explicitly test or discuss out-of-scope cases such as patient-specific synthesis, clinical discretion, or real-time data needs. In the revision, we will add a dedicated limitations subsection discussing these failure modes, clarifying that PriHA serves as an information-access assistant rather than a substitute for professional medical judgment, and noting planned extensions for dynamic data integration.
  Revision: yes
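The second exchange turns on whether measured gains hold only for queries the guidelines can answer. One way to make that visible is to label each evaluation query by scope and report accuracy per label. The sketch below is an assumption about how such a breakdown might be computed, not something the manuscript or rebuttal describes: the scope labels ("in_scope", "out_of_scope") and the record keys are hypothetical, and the toy records are made up.

```python
def accuracy_by_scope(records):
    """Return {scope_label: accuracy} for records with 'scope' and 'correct' keys.

    'scope' is an assumed annotation such as "in_scope" (answerable from the
    static guidelines) or "out_of_scope" (needs patient-specific judgment or
    real-time data).
    """
    totals, hits = {}, {}
    for r in records:
        scope = r["scope"]
        totals[scope] = totals.get(scope, 0) + 1
        hits[scope] = hits.get(scope, 0) + int(r["correct"])
    return {scope: hits[scope] / totals[scope] for scope in totals}


# Made-up toy records, only to show the call shape.
toy_records = [
    {"scope": "in_scope", "correct": True},
    {"scope": "in_scope", "correct": True},
    {"scope": "in_scope", "correct": True},
    {"scope": "in_scope", "correct": False},
    {"scope": "out_of_scope", "correct": True},
    {"scope": "out_of_scope", "correct": False},
]

print(accuracy_by_scope(toy_records))
```

A large gap between the two rates would indicate that a headline accuracy figure mostly reflects the in-scope subset, which is the distinction the referee asks the authors to address.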
Circularity Check
No circularity: standard RAG framework with empirical evaluation
The paper describes a tri-stage DRAG pipeline (query optimizer + dual retrieval + context-reorganized generation) as an application of existing retrieval-augmented generation techniques to Hong Kong clinical guidelines. No equations, first-principles derivations, or predictions appear that reduce by construction to fitted parameters or self-referential definitions. Performance claims rest on experiments and a case study rather than on any load-bearing self-citation chain or ansatz smuggled via prior work. The framework is self-contained against external benchmarks of RAG systems; the central assumption about guideline sufficiency is an empirical limitation, not a circularity in the derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Retrieval from official documents will supply sufficient and accurate context for primary-care queries
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "a tri-stage pipeline is proposed that leverages a query optimizer to generalize user intent-oriented sub-queries, followed by a novel Dual Retrieval Augmented Generation (DRAG) architecture for mixed-source retrieval and context-reorganized generation"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We employ a reasoning model to perform conflict resolution and summarization in a single pass"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.