GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation

Krati Saxena; Tomohiro Shibata

arXiv: 2605.20188 · v1 · pith:FKUVADNQ · submitted 2026-03-21 · 💻 cs.LG · cs.AI



Pith reviewed 2026-05-21 10:52 UTC · model grok-4.3


keywords: medication recommendation, differential attention, pharmacological graph, electronic health records, drug-drug interactions, noise filtering, clinical recommendation systems


The pith

A framework using dual-scale differential attention and pharmacological graph constraints improves medication recommendations from noisy patient records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GraphDiffMed to recommend medication combinations from electronic health records that are long, noisy, and clinically varied. It applies differential attention separately within individual visits and across sequences of visits to remove spurious signals, then folds in pharmacological knowledge such as drug interaction rules during model training. A reader would care because existing approaches tend to handle either the time dimension or the safety knowledge well but not both at once, leaving recommendations vulnerable to noise or unsafe combinations. Experiments on MIMIC-III data indicate that the combined design improves recommendation quality and ranking over strong baselines while striking a better safety-performance balance.

Core claim

GraphDiffMed is a knowledge-constrained medication recommendation framework built on dual-scale Differential Attention v2. Differential attention is applied at both intra-visit and inter-visit levels to filter spurious signals within encounters and across longitudinal history, while pharmacological constraints are incorporated during learning. Experiments on MIMIC-III and ablation studies show that this design consistently improves recommendation quality and ranking over strong baselines while achieving a more favorable safety-performance balance. The strongest-performing configuration uses only demographic auxiliary features under the reported experimental setting.

What carries the argument

Dual-scale Differential Attention v2 applied at intra-visit and inter-visit levels to suppress noise, combined with pharmacological graph priors that are enforced during training.
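The core operation can be illustrated with a small NumPy sketch. It follows the original Differential Transformer formulation [17], in which two softmax attention maps are subtracted, (softmax(Q1 K1^T / sqrt(d_h)) − λ · softmax(Q2 K2^T / sqrt(d_h))) V, and applies it first over the codes inside each visit and then over the sequence of visits. This is not the authors' implementation: the v2 refinements [16] are not reproduced, and the fixed scalar λ, the shared projections, the mean-pooling between scales, and the missing causal mask over visits are all simplifying assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(x, wq1, wk1, wq2, wk2, wv, lam):
    """Single-head differential attention over a sequence x of shape (n, d).

    Two attention maps come from separate query/key projections; the second,
    scaled by lam, is subtracted from the first so that attention mass both
    maps place on the same positions is damped.
    """
    d_h = wq1.shape[1]
    a1 = softmax((x @ wq1) @ (x @ wk1).T / np.sqrt(d_h))
    a2 = softmax((x @ wq2) @ (x @ wk2).T / np.sqrt(d_h))
    weights = a1 - lam * a2  # each row sums to 1 - lam
    return weights @ (x @ wv), weights


rng = np.random.default_rng(0)
d, d_h = 16, 8
# Hypothetical parameters; a real model would learn separate weights per scale.
proj = [rng.normal(scale=0.1, size=(d, d_h)) for _ in range(4)]
wv = rng.normal(scale=0.1, size=(d, d))

# Hypothetical patient: 3 visits holding 5, 3 and 7 code embeddings.
visits = [rng.normal(size=(n, d)) for n in (5, 3, 7)]

# Intra-visit scale: attend over codes inside each visit, then mean-pool.
visit_vecs = np.stack(
    [differential_attention(v, *proj, wv, lam=0.5)[0].mean(axis=0) for v in visits]
)

# Inter-visit scale: attend over the sequence of pooled visit vectors.
patient_repr, inter_weights = differential_attention(visit_vecs, *proj, wv, lam=0.5)
print(patient_repr.shape)  # (3, 16)
```

Because both maps are row-normalized, each row of the differential map sums to 1 − λ, and positions that both maps weight similarly are damped rather than amplified; that common-mode cancellation is the intuition behind the noise-suppression claim.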

If this is right

Experiments on MIMIC-III show consistent gains in recommendation quality and ranking over strong baselines.
The design achieves a more favorable safety-performance balance than prior methods.
The best reported results occur when the model uses only demographic auxiliary features.
The overall approach produces more reliable and clinically meaningful medication recommendations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The noise-filtering mechanism could be tested on EHR data from hospitals outside the MIMIC-III cohort to check whether gains persist under different recording practices.
Pharmacological graph priors might transfer to other clinical tasks that predict safe drug combinations from sequential records.
The dual attention scales could be examined on patient histories longer than those in the current experiments to see where the noise-suppression benefit plateaus.

Load-bearing premise

Dual-scale differential attention can reliably separate real clinical signals from noise in varied patient histories while pharmacological constraints are added during learning without distorting the model's recommendation distribution.

What would settle it

An independent test on MIMIC-III or another EHR dataset in which the full model fails to outperform ablated variants that omit either the dual-scale attention or the pharmacological constraints on accuracy, ranking, or safety metrics.

Figures

Figures reproduced from arXiv: 2605.20188 by Krati Saxena, Tomohiro Shibata.

Figure 1: Overview of GraphDiffMed. We use the CIDGMed pipeline [6] for multimodal representation learning that jointly models diagnoses, procedures, and medications for visit-level recommendation (the second box from the top in …

Original abstract

Recommending safe and effective medication combinations from electronic health records (EHRs) is a core clinical AI problem, yet it remains difficult because patient trajectories are long, noisy, and clinically heterogeneous. Existing methods typically excel at either temporal modeling across visits or pharmacological knowledge integration (e.g., drug-drug interactions, DDIs), but rarely achieve both while robustly suppressing noise. We present GraphDiffMed, a knowledge-constrained medication recommendation framework built on dual-scale Differential Attention v2. Differential attention is applied at both intra-visit and inter-visit levels to filter spurious signals within encounters and across longitudinal history, while pharmacological constraints are incorporated during learning. Experiments on MIMIC-III and ablation studies show that this design consistently improves recommendation quality and ranking over strong baselines while achieving a more favorable safety performance balance. We further find that the strongest-performing configuration uses only demographic auxiliary features under our experimental setting. Overall, GraphDiffMed demonstrates that combining noise-aware attention with pharmacological constraints yields more reliable and clinically meaningful medication recommendation. We open-source our code at https://github.com/saxenakrati09/GraphDiffMed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes GraphDiffMed, a medication recommendation framework for EHR data that applies dual-scale Differential Attention v2 at intra-visit and inter-visit levels to suppress spurious signals in long, noisy patient trajectories while incorporating pharmacological graph priors (e.g., DDIs) as constraints during learning. Experiments on MIMIC-III report consistent gains in recommendation quality and ranking over baselines, a favorable safety balance, and the observation that the best configuration relies only on demographic auxiliary features; code is open-sourced.

Significance. If the noise-filtering behavior and constraint integration are validated, the approach could make clinical AI systems more robust on heterogeneous longitudinal records while respecting pharmacological knowledge, potentially aiding safer polypharmacy decisions. The open-source release supports reproducibility, but the reported results lack quantitative metrics, error bars, and diagnostics that isolate the noise-filtering effect, which limits any immediate assessment of impact.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the central claim that dual-scale Differential Attention v2 'reliably filters spurious signals' is not supported by targeted evidence such as attention weight histograms on noisy vs. clean visits, synthetic noise injection tests, or comparisons of attended vs. ignored codes; ablation results show ranking gains but do not isolate noise suppression from general capacity increases.
[Abstract] Abstract: improvements on MIMIC-III and favorable safety balance are stated without any quantitative metrics, error bars, statistical significance tests, or detailed ablation tables, preventing assessment of effect sizes and reliability of the reported gains.

minor comments (1)

[Abstract] The finding that the strongest configuration uses only demographic auxiliary features is noteworthy but would benefit from further discussion of why richer visit-level features underperform under this architecture.
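The synthetic noise-injection test the referee asks for could be set up roughly as follows. This is a hypothetical sketch, not part of the paper: `inject_noise_codes` appends k random codes that are absent from a visit, and `attention_mass_on_noise` measures how much of a model's attention (taken in absolute value, since differential weights can be negative) falls on those injected positions. A model that genuinely filters noise should put less mass on injected codes than the uniform baseline of k/(n + k).

```python
import numpy as np


def inject_noise_codes(visit_codes, vocab_size, k, rng):
    """Append k random codes absent from the visit; return codes and a noise mask."""
    present = set(visit_codes)
    candidates = np.array([c for c in range(vocab_size) if c not in present])
    noise = rng.choice(candidates, size=k, replace=False)
    codes = list(visit_codes) + noise.tolist()
    mask = np.array([False] * len(visit_codes) + [True] * k)
    return codes, mask


def attention_mass_on_noise(weights, mask):
    """Mean fraction of each query's absolute attention that lands on injected positions."""
    w = np.abs(weights)
    return float((w[:, mask].sum(axis=1) / w.sum(axis=1)).mean())


rng = np.random.default_rng(0)
codes, mask = inject_noise_codes([3, 17, 42], vocab_size=100, k=2, rng=rng)

# Reference point: uniform attention over 5 positions puts 2/5 of its mass on noise.
uniform = np.full((len(codes), len(codes)), 1.0 / len(codes))
print(attention_mass_on_noise(uniform, mask))
```

In practice the same two functions would be applied to the intra-visit attention maps of the trained model and of an ablated variant without differential attention, which would separate noise suppression from a general capacity increase.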

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of evidence and quantitative results.

Point-by-point responses

Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim that dual-scale Differential Attention v2 'reliably filters spurious signals' is not supported by targeted evidence such as attention weight histograms on noisy vs. clean visits, synthetic noise injection tests, or comparisons of attended vs. ignored codes; ablation results show ranking gains but do not isolate noise suppression from general capacity increases.

Authors: We agree that the current ablation results demonstrate overall performance gains but do not fully isolate the noise-suppression mechanism from capacity effects. The dual-scale design is motivated by the need to suppress spurious signals in long trajectories, and the consistent improvements across metrics support this, yet we acknowledge the value of more direct diagnostics. In the revision we will add attention weight histograms and comparisons of attended versus ignored codes on representative visits to provide targeted evidence for the filtering behavior. revision: yes
Referee: [Abstract] Abstract: improvements on MIMIC-III and favorable safety balance are stated without any quantitative metrics, error bars, statistical significance tests, or detailed ablation tables, preventing assessment of effect sizes and reliability of the reported gains.

Authors: We accept that the abstract would benefit from explicit quantitative support. The Experiments section already contains the relevant metrics, error bars, and ablation tables with statistical comparisons, but these are not summarized in the abstract. We will revise the abstract to report key effect sizes (e.g., absolute and relative gains in Jaccard, F1, and PRAUC) and note the statistical significance of the improvements while maintaining the required length. revision: yes
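The metrics named in the response can be written compactly. The sketch below uses definitions that are common in the medication-recommendation literature (per-visit Jaccard, F1, and PRAUC computed as average precision), plus the DDI rate often reported as the safety metric; the paper's own evaluation code is not reproduced here, so treat these definitions as assumptions. `ddi_adj` is a hypothetical symmetric 0/1 interaction matrix.

```python
import numpy as np


def jaccard(pred, true):
    # |pred ∩ true| / |pred ∪ true| for one visit's medication sets.
    pred, true = set(pred), set(true)
    union = pred | true
    return len(pred & true) / len(union) if union else 1.0


def f1(pred, true):
    # Harmonic mean of set precision and recall.
    pred, true = set(pred), set(true)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def pr_auc(scores, true):
    # Average precision of the ranking induced by per-drug scores.
    true = set(true)
    if not true:
        return 0.0
    order = np.argsort(-scores)              # drug indices, highest score first
    hits = np.isin(order, list(true))        # True where the ranked drug is correct
    precision_at_k = np.cumsum(hits) / np.arange(1, len(order) + 1)
    return float(precision_at_k[hits].sum() / len(true))


def ddi_rate(pred, ddi_adj):
    # Fraction of unordered recommended drug pairs that interact.
    meds = sorted(set(pred))
    pairs = [(a, b) for i, a in enumerate(meds) for b in meds[i + 1:]]
    if not pairs:
        return 0.0
    return sum(int(ddi_adj[a, b]) for a, b in pairs) / len(pairs)


# Hypothetical single visit over 5 drugs; drugs 0 and 2 interact.
ddi_adj = np.zeros((5, 5), dtype=int)
ddi_adj[0, 2] = ddi_adj[2, 0] = 1
scores = np.array([0.9, 0.8, 0.7, 0.1, 0.2])
pred = np.flatnonzero(scores >= 0.5).tolist()  # threshold the scores: [0, 1, 2]
true = [1, 2, 3]
print(jaccard(pred, true), f1(pred, true), pr_auc(scores, true), ddi_rate(pred, ddi_adj))
```

Averaging the per-visit values over a test set gives dataset-level numbers; some implementations instead pool the interacting-pair counts across all visits before dividing, so DDI rates are only comparable when computed the same way.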

Circularity Check

0 steps flagged

No significant circularity; derivation is empirically grounded


The paper presents GraphDiffMed as a framework that applies dual-scale Differential Attention v2 at intra- and inter-visit levels to suppress noise in EHR trajectories while incorporating pharmacological graph priors during learning. Claims of improved recommendation quality, ranking, and safety balance are supported by experiments and ablations on MIMIC-III against baselines, with no equations or steps shown that reduce by construction to fitted parameters renamed as predictions, self-definitions, or load-bearing self-citations. The architecture is introduced as a design choice whose value is assessed through external benchmarks rather than internal tautology, making the central derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard ML assumptions about attention mechanisms and the availability of pharmacological knowledge graphs; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption Differential attention mechanisms can effectively suppress spurious signals in longitudinal EHR data at both intra- and inter-visit scales.
Core design choice stated in the abstract as the basis for noise filtering.
domain assumption Pharmacological constraints such as DDIs can be incorporated during model learning without introducing new biases.
Assumed when stating that constraints are added during learning.

pith-pipeline@v0.9.0 · 5729 in / 1293 out tokens · 59112 ms · 2026-05-21T10:52:38.386295+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

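Graph-biased differential attention … $S = \frac{QK^{\top}}{\sqrt{d_h}} + \lambda_{\mathrm{graph}} B_{\mathrm{graph}}$ … $C_{\mathrm{diff}} = C_1 - \lambda \odot C_2$ … $B^{\mathrm{inter}}_{\mathrm{graph}}[i,j] = \mathbb{I}_{\mathrm{med}}(i,j) \cdot \operatorname{mean} A_{\mathrm{DDI}}[m_q, m_k]$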
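The quoted passage combines three ingredients: a pre-softmax graph bias added to the scaled dot-product scores, a differential combination of two attention maps, and an inter-visit bias built from the DDI adjacency matrix. A minimal NumPy reading of it follows. The excerpt does not define the indicator I_med(i, j), so this sketch assumes it is 1 when both visits list at least one medication; it also assumes the same bias enters both attention maps, and it uses a negative `lam_graph` purely for illustration, so that visit pairs with many interacting drugs are down-weighted. All function names are hypothetical, not taken from the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def inter_visit_ddi_bias(visit_meds, ddi_adj):
    """B[i, j] = I_med(i, j) * mean of A_DDI[m_q, m_k] over m_q in visit i, m_k in visit j.

    Assumption: I_med(i, j) = 1 iff both visits list at least one medication.
    `visit_meds` is a list of Python lists of drug indices.
    """
    n = len(visit_meds)
    bias = np.zeros((n, n))
    for i, meds_q in enumerate(visit_meds):
        for j, meds_k in enumerate(visit_meds):
            if meds_q and meds_k:
                bias[i, j] = ddi_adj[np.ix_(meds_q, meds_k)].mean()
    return bias


def graph_biased_diff_attention(q1, k1, q2, k2, bias, lam_graph, lam):
    """C_diff = C1 - lam * C2, each C = softmax(Q K^T / sqrt(d_h) + lam_graph * B)."""
    d_h = q1.shape[-1]
    c1 = softmax(q1 @ k1.T / np.sqrt(d_h) + lam_graph * bias)
    c2 = softmax(q2 @ k2.T / np.sqrt(d_h) + lam_graph * bias)
    return c1 - lam * c2  # lam may be a scalar or an array broadcastable to (n, n)


# Hypothetical 3-visit history over 3 drugs; drugs 0 and 2 interact.
visit_meds = [[0, 1], [2], []]
ddi_adj = np.zeros((3, 3))
ddi_adj[0, 2] = ddi_adj[2, 0] = 1.0
bias = inter_visit_ddi_bias(visit_meds, ddi_adj)

rng = np.random.default_rng(0)
q1, k1, q2, k2 = (rng.normal(size=(3, 4)) for _ in range(4))
c_diff = graph_biased_diff_attention(q1, k1, q2, k2, bias, lam_graph=-1.0, lam=0.3)
print(bias)
```

Here visits 0 and 1 receive a bias of 0.5 toward each other (one of the two drug pairs across them interacts), while the medication-free third visit gets no graph bias at all.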
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

annealed DDI weighting $\beta(t)$ … $L = L_{\mathrm{BCE}} + \beta(t)\, L_{\mathrm{DDI}} + \alpha\, L_{\mathrm{reg}}$
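The loss in the quoted passage can be sketched as follows. The excerpt gives only its form, so the details here are assumptions: L_DDI is taken to be the expected number of interacting pairs under independent per-drug probabilities (a surrogate similar to those used by earlier DDI-aware recommenders), the annealing of β(t) is modelled as a hypothetical linear warm-up from 0, and L_reg is passed in as a precomputed number.

```python
import numpy as np


def bce(probs, target, eps=1e-7):
    # Multi-label binary cross-entropy averaged over medications.
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def ddi_loss(probs, ddi_adj):
    # Expected number of interacting pairs if each drug is drawn independently
    # with probability p_i: sum over i < j of A[i, j] * p_i * p_j.
    return float(np.triu(np.outer(probs, probs) * ddi_adj, k=1).sum())


def beta_schedule(step, total_steps, beta_max=1.0):
    # Hypothetical linear warm-up of the DDI weight from 0 to beta_max.
    return beta_max * min(1.0, step / total_steps)


def total_loss(probs, target, ddi_adj, step, total_steps, reg, alpha=1e-4, beta_max=1.0):
    # L = L_BCE + beta(t) * L_DDI + alpha * L_reg; `reg` is a precomputed penalty value.
    beta = beta_schedule(step, total_steps, beta_max)
    return bce(probs, target) + beta * ddi_loss(probs, ddi_adj) + alpha * reg


# Hypothetical visit over 3 drugs; drugs 0 and 1 interact.
probs = np.array([0.9, 0.8, 0.1])
target = np.array([1.0, 0.0, 0.0])
ddi_adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
for step in (0, 50, 100):
    print(step, total_loss(probs, target, ddi_adj, step, total_steps=100, reg=0.0))
```

Warming β(t) up from zero lets the model fit the prescription targets first and tightens the safety penalty later; a decaying or DDI-rate-controlled schedule would be an equally plausible reading of "annealed".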

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Bhoi, S., Lee, M.L., Hsu, W., Tan, N.C.: REFINE: a fine-grained medication recommendation system using deep learning and personalized drug interaction modeling. Advances in Neural Information Processing Systems 36 (2024)
[2] Choi, E., Bahadori, M.T., Sun, J., Kulas, J., Schuetz, A., Stewart, W.: RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. Advances in Neural Information Processing Systems 29 (2016)
[3] Huo, J., Hong, Z., Chen, M., Duan, Y.: Mifnet: multimodal interactive fusion network for medication recommendation. The Journal of Supercomputing pp. 1–33 (2024)
[4] Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3(1), 1–9 (2016)
[5] Knox, C., Wilson, M., Klinger, C.M., Franklin, M., Oler, E., Wilson, A., Pon, A., Cox, J., Chin, N.E., Strawbridge, S.A., et al.: DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Research 52(D1), D1265–D1275 (2024)
[6] Liang, S., Li, X., Mu, S., Li, C., Lei, Y., Hou, Y., Ma, T.: CIDGMed: Causal inference-driven medication recommendation with enhanced dual-granularity learning. Knowledge-Based Systems 309, 112685 (2025)
[7] Liu, J., Wan, Z., Hu, X., Zhu, Q.: Safe drug recommendation through forward data imputation and recurrent residual neural network. Applied Soft Computing 161, 111723 (2024)
[8] Liu, Q., Wu, X., Zhao, X., Zhu, Y., Zhang, Z., Tian, F., Zheng, Y.: Large language model distilling medication recommendation model. arXiv preprint arXiv:2402.02803 (2024)
[9] Liu, S., Wang, X., Du, J., Hou, Y., Zhao, X., Xu, H., Wang, H., Xiang, Y., Tang, B.: SHAPE: A sample-adaptive hierarchical prediction network for medication recommendation. IEEE Journal of Biomedical and Health Informatics (2023)
[10] Saxena, K., Shibata, T.: Dada-med: Data-augmented dual attention model for enhanced medication recommendations. In: IFIP International Conference on Artificial Intelligence Applications and Innovations. pp. 83–97. Springer (2025)
[11] Shang, J., Xiao, C., Ma, T., Li, H., Sun, J.: GAMENet: Graph augmented memory networks for recommending medication combination. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1126–1133 (2019)
[12] Tatonetti, N.P., Ye, P.P., Daneshjou, R., Altman, R.B.: Data-driven prediction of drug effects and interactions. Science Translational Medicine 4(125), 125ra31 (2012)
[13] Vaswani, A., et al.: Attention is all you need. Advances in Neural Information Processing Systems (2017)
[14] Wu, J., Dong, Y., Gao, Z., Gong, T., Li, C.: Dual attention and patient similarity network for drug recommendation. Bioinformatics 39(1), btad003 (2023)
[15] Wu, J., Yu, X., He, K., Gao, Z., Gong, T.: Promise: A pre-trained knowledge-infused multimodal representation learning framework for medication recommendation. Information Processing & Management 61(4), 103758 (2024)
[16] Ye, T., Dong, L., Sun, Y., Wei, F.: Differential Transformer v2 (January 2026), https://aka.ms/diff-transformer-v2
[17] Ye, T., Dong, L., Xia, Y., Sun, Y., Zhu, Y., Huang, G., Wei, F.: Differential transformer. arXiv preprint arXiv:2410.05258 (2024)
[18] Yue, W., Wang, M., Zhang, L., Zhang, L., Huang, J., Wan, J., Xiong, N., Vasilakos, A.V.: A-gstcn: An augmented graph structural–temporal convolution network for medication recommendation based on electronic health records. Bioengineering 10(11), 1241 (2023)
[19] Zhang, Y., Chen, R., Tang, J., Stewart, W.F., Sun, J.: LEAP: learning to prescribe effective and safe treatment combinations for multimorbidity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1315–1324 (2017)
