TAVR-VLM: Risk-Conditioned Causal Grounding for Hallucination-Resistant Report Generation

Anh Nguyen; Changkai Ji; Imran Razzak; Jinfeng Wang; Jionglong Su; Sifan Song; Xiwei Liu; Zhixiang Lu

arxiv: 2606.26874 · v1 · pith:AL5RTBWTnew · submitted 2026-06-25 · 💻 cs.AI

Pith reviewed 2026-06-26 05:04 UTC · model grok-4.3

classification 💻 cs.AI

keywords TAVR · hallucination reduction · causal grounding · multimodal report generation · medical AI · risk conditioning · surgical planning

The pith

Risk-conditioned causal grounding enables hallucination-resistant TAVR report generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TAVR-VLM as a framework to adapt multimodal large language models for generating reports in transcatheter aortic valve replacement planning. It proposes Risk-Conditioned Causal Grounding Attention to establish a structural pathway linking risk assessment to anatomical regions and then to generated words. This mechanism purifies visual features through a causal risk bottleneck and restricts token generation to a risk-defined support mask during autoregressive output. A sympathetic reader would care because such grounding could make AI-generated medical reports more reliable and interpretable in high-stakes surgical contexts.

Core claim

TAVR-VLM instantiates a model-internal Risk to Region to Word structural grounding pathway using Risk-Conditioned Causal Grounding Attention. The mechanism compresses multimodal inputs into a causal risk bottleneck that purifies dense visual features into a global risk mask. During generation, a support-projected causal consistency objective constrains token-level grounding within the risk-defined support mask, leading to reduced diagnostic hallucinations on the M3TAVR cohort.
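
The first two stages of this pathway can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it takes the bottleneck formula from the Figure 1 caption (Z_risk = P_risk^T W) and assumes that W is a matrix of K risk prototypes, that the mask comes from single-head scaled dot-product attention, and that a threshold turns the attention weights into a binary support. The function names, the threshold rule, and the toy numbers are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def risk_bottleneck(p_risk, W):
    """Z_risk = P_risk^T W: probability-weighted mix of the K rows of W.

    Assumption: W has shape K x d, one row per risk class.
    """
    dim = len(W[0])
    return [sum(p_risk[k] * W[k][d] for k in range(len(W))) for d in range(dim)]


def risk_mask(z_risk, visual_tokens, threshold=None):
    """Attend from Z_risk (single query) to visual tokens (keys).

    Returns the attention weights and a binary support mask.
    Assumption: tokens with above-uniform attention count as support.
    """
    scale = math.sqrt(len(z_risk))
    scores = [sum(q * k for q, k in zip(z_risk, tok)) / scale
              for tok in visual_tokens]
    attn = softmax(scores)
    if threshold is None:
        threshold = 1.0 / len(visual_tokens)
    support = [a > threshold for a in attn]
    return attn, support


# Toy example: two risk classes, three visual tokens of dimension 2.
z = risk_bottleneck([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])
attn, support = risk_mask(z, [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
print(support)  # [True, False, False]
```

The point of the sketch is the information flow: the only query that reaches the visual tokens is built from the risk distribution, so the resulting mask cannot depend on anything the risk head did not encode.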

What carries the argument

Risk-Conditioned Causal Grounding Attention (R-CGA), which compresses multimodal inputs into a causal risk bottleneck and constrains token generation within the risk-defined support mask.

If this is right

Anatomically grounded reports are produced by linking risk to specific regions and words.
Token generation is restricted to anatomically supported content.
Multimodal reasoning for TAVR planning gains improved evidence basis.
Interpretability of the AI outputs increases for clinical use.
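
One way to picture the restriction on token generation is as a penalty on attention that leaks outside the support mask. The abstract names a "support-projected causal consistency objective" but does not give its form, so the loss below is an assumed stand-in: for each generated token it measures the share of visual attention that lands outside the mask, averages over tokens, and scales by a coefficient. The function names and the `lam` parameter are hypothetical.

```python
def outside_support_mass(token_attn, support):
    """Fraction of one token's visual attention that falls outside the mask."""
    total = sum(token_attn)
    if total <= 0:
        return 0.0
    outside = sum(a for a, inside in zip(token_attn, support) if not inside)
    return outside / total


def causal_consistency_penalty(attn_per_token, support, lam=1.0):
    """lam * mean outside-support attention mass over generated tokens.

    Assumed form only; the paper's actual objective is not shown
    in the visible text.
    """
    if not attn_per_token:
        return 0.0
    masses = [outside_support_mass(a, support) for a in attn_per_token]
    return lam * sum(masses) / len(masses)


# Toy example: four visual tokens, the first two are in the support.
support = [True, True, False, False]
grounded = [0.5, 0.5, 0.0, 0.0]      # all attention inside the mask
diffuse = [0.25, 0.25, 0.25, 0.25]   # half the attention outside
print(causal_consistency_penalty([grounded, diffuse], support))  # 0.25
```

A soft penalty like this pushes grounding toward the mask during training without forbidding any token at decode time, which is one reading of "constrains" in the abstract. A hard decode-time filter would be a different design.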

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This causal structure could generalize to other high-stakes medical report tasks like radiology.
The risk mask could serve as a visual explanation tool for clinicians reviewing the reports.
Applying the method to video or 3D imaging data might extend the grounding to dynamic procedures.
Comparing performance on datasets from different hospitals would test robustness to distribution shifts.

Load-bearing premise

That constraining generation within the risk-defined support mask will eliminate hallucinations without creating new dataset-specific errors or missing important details.

What would settle it

Checking whether generated reports correctly reference anatomical regions visible in the input images on a new set of TAVR cases that previously triggered hallucinations in other models.
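
A check of this kind could be scripted roughly as follows. The sketch assumes that each case comes with a set of anatomical regions annotated as visible in the input, and that a report counts as hallucinated when it names a region outside that set. The paper's own definition of hallucination rate is not given in the visible text, and the vocabulary and cases below are toy placeholders.

```python
import re


def mentioned_regions(report, vocabulary):
    """Vocabulary terms that appear as whole phrases in the report."""
    text = report.lower()
    found = set()
    for term in vocabulary:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            found.add(term.lower())
    return found


def unsupported_mentions(report, visible_regions, vocabulary):
    """Regions named in the report but not annotated as visible."""
    visible = {r.lower() for r in visible_regions}
    return mentioned_regions(report, vocabulary) - visible


def hallucination_rate(cases, vocabulary):
    """Share of reports with at least one unsupported region mention."""
    if not cases:
        return 0.0
    flagged = sum(
        1 for report, visible in cases
        if unsupported_mentions(report, visible, vocabulary)
    )
    return flagged / len(cases)


# Toy vocabulary and cases (hypothetical, for illustration only).
vocab = ["aortic annulus", "left coronary ostium", "mitral valve"]
cases = [
    ("Severe calcification of the aortic annulus.", {"aortic annulus"}),
    ("The aortic annulus and mitral valve show calcification.", {"aortic annulus"}),
]
print(hallucination_rate(cases, vocab))  # 0.5
```

Plain phrase matching ignores negation, so a sentence such as "no involvement of the mitral valve" would also be flagged. A real evaluation would need clinical entity extraction and expert review.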

Figures

Figures reproduced from arXiv: 2606.26874 by Anh Nguyen, Changkai Ji, Imran Razzak, Jinfeng Wang, Jionglong Su, Sifan Song, Xiwei Liu, Zhixiang Lu.

**Figure 1.** Overview of the TAVR-VLM architecture. R-CGA implements a model-internal “Risk → Region → Word” structural grounding pathway. Stage 1 encodes 3D CT, echocardiography clips, and clinical parameters into a joint embedding, predicts a risk distribution $P_{\text{risk}}$, and constructs a causal risk bottleneck $Z_{\text{risk}} = P_{\text{risk}}^{\top} W$. Stage 2 uses $Z_{\text{risk}}$ as the query and CT visual tokens as key/value features to obtain a global…

**Figure 2.** Comprehensive Model Analysis. (a) Sensitivity of the causal consistency coefficient $\lambda_{\text{causal}}$ on spatial grounding accuracy and hallucination rate. (b) Qualitative comparison between R-CGA and a baseline VLM without risk-conditioned grounding. Superiority in Clinical Prediction and Generation. While current open-source models and cutting-edge closed-source VLMs exhibit formidable general vision-language capa…

Original abstract

Transcatheter Aortic Valve Replacement (TAVR) planning requires meticulous multimodal reasoning. However, adapting Multimodal Large Language Models (MLLMs) to this high-stakes domain is severely impeded by diagnostic hallucinations, where generated text lacks anatomical grounding. To address this, TAVR-VLM is introduced: a novel framework featuring Risk-Conditioned Causal Grounding Attention (R-CGA) that instantiates a model-internal ``Risk $\rightarrow$ Region $\rightarrow$ Word'' structural grounding pathway. R-CGA compresses multimodal inputs into a causal risk bottleneck, purifying dense visual features into a global risk mask. During autoregressive generation, a support-projected causal consistency objective constrains token-level grounding within the risk-defined support mask. Evaluated on $\text{M}^3\text{TAVR}$, a comprehensive 1,482-patient cohort, TAVR-VLM establishes a new state-of-the-art. It achieves an AUROC of 0.896, boosts CIDEr to 0.936, and drastically reduces the hallucination rate to 8.1\%, thereby improving interpretability for evidence-based surgical AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TAVR-VLM adds a risk-bottleneck attention trick for grounding TAVR reports, but the abstract supplies no ablations or baselines to show that the mechanism actually drives the reported gains.

The paper introduces TAVR-VLM and its R-CGA module, which routes multimodal features through a risk bottleneck to produce a global mask and then restricts token generation to that mask during decoding. This Risk → Region → Word pathway is the concrete new piece; it is not just another attention variant but a deliberate attempt to enforce anatomical support in autoregressive output for TAVR planning.

It does address a genuine deployment obstacle. Hallucinations in medical report generation are a known blocker, and the 1,482-patient M³TAVR cohort is large enough to be taken seriously for this narrow task. The headline numbers (AUROC 0.896, CIDEr 0.936, hallucination rate 8.1%) are presented as SOTA, which at least gives readers a clear target to beat.

The soft spots are the missing internals. No ablation removes the causal consistency objective or the support mask, so we cannot tell whether those components are load-bearing or incidental. No baseline tables or statistical tests appear in the abstract, and there are no mask visualizations or error-case breakdowns. Without those, the performance claims remain unanchored. The assumption that the bottleneck will not create its own artifacts is stated but not tested in the visible text.

This work is aimed at groups already building or evaluating MLLMs for structured medical reporting. Someone looking for a specific causal constraint idea could extract the R-CGA description and try it elsewhere. It is worth sending for peer review because the clinical stakes are high and the proposed mechanism is precise enough for referees to examine the code and ablations once the full manuscript is in front of them.

Referee Report

2 major / 0 minor

Summary. The paper introduces TAVR-VLM, a multimodal LLM framework for TAVR planning report generation that uses a novel Risk-Conditioned Causal Grounding Attention (R-CGA) mechanism to instantiate a 'Risk → Region → Word' structural pathway. R-CGA compresses multimodal inputs into a causal risk bottleneck to produce a global risk mask and applies a support-projected causal consistency objective to constrain autoregressive token generation within the mask. On the M³TAVR cohort of 1,482 patients, the model is reported to achieve AUROC 0.896, CIDEr 0.936, and an 8.1% hallucination rate, establishing a new state-of-the-art for hallucination-resistant, anatomically grounded reports.

Significance. If substantiated with internal evidence, the work could meaningfully advance reliable multimodal reasoning in high-stakes surgical AI by addressing diagnostic hallucinations through an explicit causal grounding pathway. The risk-bottleneck formulation offers a potentially generalizable direction for improving interpretability in medical MLLMs.

major comments (2)

[Abstract] Abstract: The central SOTA claim rests on the reported metrics (AUROC 0.896, CIDEr 0.936, hallucination rate 8.1%) yet no baseline values, ablation results, or statistical tests are referenced, preventing assessment of whether R-CGA is responsible for the gains.
[R-CGA mechanism] R-CGA description: No ablations, risk-mask visualizations, or failure-case analyses are supplied to show that the causal risk bottleneck and support mask produce anatomically grounded outputs without new artifacts or dataset-specific biases, which is load-bearing for the hallucination-resistance claim.
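
As an illustration of the kind of statistical test the first comment asks for, a paired bootstrap over per-case hallucination flags gives a confidence interval for the difference between two systems evaluated on the same cases. This is a generic sketch, not the authors' evaluation protocol. The function name, the resample count, and the toy flags are assumptions, and no values from the paper are used.

```python
import random


def paired_bootstrap_diff(flags_a, flags_b, n_resamples=2000, seed=0):
    """Bootstrap 95% CI for mean(flags_a) - mean(flags_b) on paired cases.

    flags_a[i] and flags_b[i] are 0/1 hallucination flags for the same
    case under system A and system B.
    """
    if len(flags_a) != len(flags_b) or not flags_a:
        raise ValueError("need two non-empty, equal-length flag lists")
    rng = random.Random(seed)
    n = len(flags_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample case indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(flags_a[i] - flags_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    observed = sum(flags_a) / n - sum(flags_b) / n
    return observed, (lower, upper)


# Toy flags (hypothetical): baseline hallucinates on 4 of 8 cases,
# the candidate system on 1 of 8.
baseline = [1, 1, 0, 1, 0, 0, 1, 0]
candidate = [0, 1, 0, 0, 0, 0, 0, 0]
observed, (lower, upper) = paired_bootstrap_diff(baseline, candidate)
print(observed, lower, upper)
```

If the interval excludes zero, the difference is unlikely to be a resampling artifact. Running the same comparison for each ablated variant would speak to the second comment as well.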

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects for strengthening the presentation of our results. We respond to each major comment below.

Referee: [Abstract] Abstract: The central SOTA claim rests on the reported metrics (AUROC 0.896, CIDEr 0.936, hallucination rate 8.1%) yet no baseline values, ablation results, or statistical tests are referenced, preventing assessment of whether R-CGA is responsible for the gains.

Authors: The abstract is intentionally concise to summarize the core contribution and primary outcomes. The full manuscript contains detailed baseline comparisons, ablation studies, and statistical significance tests in the Experiments section that attribute performance gains to R-CGA. To improve accessibility of this evidence, we will revise the abstract to include a brief reference to the key improvements over baselines and the supporting analyses. Revision: yes.

Referee: [R-CGA mechanism] R-CGA description: No ablations, risk-mask visualizations, or failure-case analyses are supplied to show that the causal risk bottleneck and support mask produce anatomically grounded outputs without new artifacts or dataset-specific biases, which is load-bearing for the hallucination-resistance claim.

Authors: We agree that direct empirical support for the causal risk bottleneck is important for the hallucination-resistance claim. The manuscript provides a detailed mechanistic description in Section 3. To strengthen validation, we will add component ablations, risk-mask visualizations, and failure-case analyses in the revised manuscript and supplementary material to demonstrate anatomical grounding and address potential artifacts or biases. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity detected

The provided abstract and context describe a new architectural mechanism (R-CGA) and report standard evaluation metrics (AUROC, CIDEr, hallucination rate) on an external cohort (M³TAVR). No equations, fitting procedures, or derivation steps are shown that reduce predictions or results to inputs by construction. No self-citations, ansatzes, or uniqueness claims appear in the text. The central claim rests on empirical performance of the described pathway rather than any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review provides no visibility into fitted parameters, background axioms, or additional entities beyond the named framework components.

axioms (1)

Domain assumption: Multimodal large language models can be adapted to specialized medical domains through targeted grounding mechanisms.
Implicit premise required for the adaptation claim in the abstract.

invented entities (1)

Risk-Conditioned Causal Grounding Attention (R-CGA): no independent evidence
Purpose: Instantiates a model-internal Risk → Region → Word structural grounding pathway and compresses multimodal inputs into a causal risk bottleneck.
Newly introduced component in the described framework.

