Cross-Modal Clinical Knowledge Integration for Mammography Report Generation

Fuxiang Huang; Hao Chen; Jiayi Zhu; Qingcong Kong; Qiong Luo; Xi Wang; Yuan Guo; Yu Xie; Zhenhui Li; Zhixuan Chen

arxiv: 2605.31093 · v1 · pith:BDZ2H4BVnew · submitted 2026-05-29 · 💻 cs.CV

Pith reviewed 2026-06-28 23:08 UTC · model grok-4.3

classification 💻 cs.CV

keywords mammography report generation · BI-RADS · clinical knowledge integration · multi-view mammograms · two-stage training · terminology-aware fine-tuning · report parsing tool

The pith

MammoRG generates mammography reports by simulating BI-RADS clinical reasoning in two training stages: classification-supervised knowledge integration over a patient's four-view mammograms, followed by terminology-aware fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that direct image-to-text methods miss the structured reasoning radiologists follow under BI-RADS guidelines when writing mammography reports. MammoRG addresses this with a two-stage framework: the first stage uses classification supervision to integrate prior knowledge from a patient's four-view mammograms, and the second stage applies terminology-aware fine-tuning to treat clinical terms as atomic units. This produces reports that score higher on clinical metrics, especially diagnosis-related BI-RADS F1, across internal and external datasets. The authors also introduce MammoRGTool to extract structured information from free-text reports for evaluation. If correct, the approach shows that explicit clinical workflow modeling improves consistency in automated reporting for breast cancer screening.

Core claim

MammoRG adopts a two-stage training framework. In the first stage, the model learns to integrate clinically relevant prior knowledge from a patient's four-view mammograms through classification-based supervision. In the second stage, a terminology-aware supervised fine-tuning strategy is introduced to model mammography-specific clinical terms as atomic semantic units, enabling the generation of high-quality reports with improved clinical consistency.
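One way to picture "clinical terms as atomic semantic units" is a vocabulary extension in which each multi-word term becomes a single token. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: the word-level tokenizer, the function names `extend_vocab_with_terms` and `tokenize`, the Gaussian noise, and `noise_scale` are all hypothetical. The mean-plus-perturbation initialization follows the sentence quoted in the Figure 3 excerpt on this page.

```python
import random


def extend_vocab_with_terms(vocab, embeddings, terms, noise_scale=0.01, seed=0):
    """Add each clinical term as a single token and initialize its embedding.

    vocab: dict mapping token string -> row index in `embeddings`
    embeddings: list of equal-length float lists (one row per token)
    terms: iterable of lower-case multi-word terms to treat as atomic tokens
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    n_existing = len(embeddings)
    # Mean of the existing embedding rows, computed once per dimension.
    mean = [sum(row[d] for row in embeddings) / n_existing for d in range(dim)]
    for term in terms:
        if term in vocab:
            continue  # already atomic, nothing to add
        vocab[term] = len(embeddings)
        # Mean embedding plus a small perturbation (Gaussian is an assumption).
        embeddings.append([m + rng.gauss(0.0, noise_scale) for m in mean])
    return vocab, embeddings


def tokenize(text, vocab):
    """Greedy longest-match tokenizer over whitespace-split, lower-cased words."""
    words = text.lower().split()
    max_len = max(len(token.split()) for token in vocab)
    tokens, i = [], 0
    while i < len(words):
        for span in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + span])
            # Fall back to the single word when no longer term matches.
            if candidate in vocab or span == 1:
                tokens.append(candidate)
                i += span
                break
    return tokens


vocab = {"heterogeneously": 0, "dense": 1, "breast": 2, "tissue": 3}
embeddings = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
vocab, embeddings = extend_vocab_with_terms(vocab, embeddings, ["heterogeneously dense"])
print(tokenize("Heterogeneously dense breast tissue", vocab))
# prints ['heterogeneously dense', 'breast', 'tissue']
```

A real model would use a subword tokenizer and resize its embedding matrix instead; the toy version only shows why a term that was split into several pieces can end up as one unit with a neutral starting embedding.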

What carries the argument

The two-stage training framework that follows the BI-RADS guideline by combining classification-based multi-view knowledge integration with terminology-aware supervised fine-tuning.

If this is right

MammoRG achieves higher BI-RADS F1 than the second-best model on every evaluated dataset, by 2.73%, 2.04%, 1.90%, and 3.27% on the internal, external 1, external 2, and VinDr-Mammo datasets, respectively.
Generated reports exhibit improved clinical consistency through explicit modeling of mammography-specific terms.
MammoRGTool enables automated extraction of structured clinical information from free-text reports for quantitative evaluation.
The framework reduces reliance on direct visual-to-text mapping by incorporating prior clinical knowledge from multiple views.
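To make the BI-RADS F1 figure concrete, here is a minimal sketch of an F1 score computed over BI-RADS categories that have already been extracted from reference and generated reports. The macro averaging, the string labels, and the toy data are assumptions for illustration; this page does not state which averaging the paper uses.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one category, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over categories present in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Toy data: BI-RADS categories parsed from reference vs. generated reports.
reference = ["1", "2", "2", "4"]
generated = ["1", "2", "3", "4"]
print(round(macro_f1(reference, generated), 4))  # prints 0.8889
```

Because rare categories count as much as common ones under macro averaging, a small number of flipped high-suspicion cases can move the score by a margin comparable to the reported gains.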

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The classification supervision step may reduce factual errors in reports by grounding generation in explicit diagnostic categories before text production.
Similar two-stage pipelines could be tested on other structured reporting tasks such as chest X-ray or pathology report generation.
Performance on external datasets suggests the method may transfer across different imaging equipment and patient populations, though this would need separate confirmation.

Load-bearing premise

The two-stage training framework with classification-based supervision followed by terminology-aware supervised fine-tuning actually captures and simulates the structured clinical reasoning process followed by radiologists under the BI-RADS guideline.

What would settle it

A controlled test in which the BI-RADS F1 gains disappear when the second-stage terminology modeling is removed or when the model is evaluated on reports that do not follow BI-RADS structure.

Figures

Figures reproduced from arXiv: 2605.31093 by Fuxiang Huang, Hao Chen, Jiayi Zhu, Qingcong Kong, Qiong Luo, Xi Wang, Yuan Guo, Yu Xie, Zhenhui Li, Zhixuan Chen.

**Figure 1.** Examples demonstrating limitations of general models in mammography reporting and the comparison between traditional uni-modal methods and our cross-modal method. However, despite the rapid advancement of report generation technology, existing general-purpose radiology report generation models cannot be directly adapted to mammography. Most existing methods are primarily designed for generic single-image v…

**Figure 2.** Overview of MammoRG. This figure illustrates the two-stage training process and the inference workflow, where $L$, $M$, and $S$ in stage 1 represent Located_at, Modified_by, and Suggestive_of, respectively. … and fed into a text decoder ($D_{\text{txt}}$) to generate the final mammography report: $\hat{Y} = D_{\text{txt}}(\mathbf{v}, \mathbf{k})$ (2), where $\mathbf{k}$ denotes the integrated clinical knowledge features. Training Strategy. The training procedure consists…

**Figure 3.** Overview of MammoRGTool. This figure illustrates the two phases of MammoRGTool development and compares the performance of Qwen3-32B, MammoRGTool, and MammoRGTool with post-processing on 50 manually annotated samples. … we initialize each new token embedding using the mean of existing embeddings with small random perturbations, providing a stable starting point that is consistent with the original embedding …

**Figure 4.** Qualitative examples of the generated report of the baseline and the proposed method. Blue font indicates consistent content with the ground-truth, while red font indicates incorrect content. … whereas view-level features contribute more to fine-grained relational modeling. c) The combination of patient-level and view-level features demonstrates high efficiency with minimal token cost. Using only 5 tokens, t…

**Figure 5.** Error analysis of MammoRG predictions. (a) Confusion matrix of BI-RADS classification, showing the distribution of predicted categories against ground-truth labels. (b) Error rates for each BI-RADS category, defined as the proportion of incorrect predictions within each class. (c) Confusion matrix of breast composition classification. (d) Error rates for each composition category. (e-f) Finding-level analy…
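The Figure 5 caption defines the per-category error rate as the proportion of incorrect predictions within each class. A minimal sketch of that definition, together with the confusion counts it is derived from, might look as follows; the function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (ground-truth, predicted) pairs; missing pairs read as zero."""
    return Counter(zip(y_true, y_pred))


def per_class_error_rate(y_true, y_pred):
    """Share of incorrect predictions among the samples of each true class."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: wrong[label] / totals[label] for label in sorted(totals)}


# Toy BI-RADS labels: ground truth vs. category parsed from generated reports.
truth = ["2", "2", "3", "4", "4", "4"]
preds = ["2", "3", "3", "4", "4", "3"]

matrix = confusion_counts(truth, preds)
print(matrix[("4", "3")])  # prints 1
print(per_class_error_rate(truth, preds))
```

Reading the two together shows not only which categories are hard but also where their errors go, for example whether category 4 cases are being downgraded to category 3.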

Original abstract

Breast cancer is a major global health concern, and mammography screening plays a central role in early detection. The large volume of screening examinations creates a substantial workload for radiologists, making accurate and consistent report generation a critical clinical challenge. Existing automated mammography report generation methods primarily focus on direct visual-to-text mapping, while overlooking the structured clinical reasoning process followed by radiologists in real-world practice. To address this limitation, we propose MammoRG, a mammography report generation framework that explicitly simulates the clinical reporting workflow by following the BI-RADS guideline and incorporating prior clinical knowledge to produce diagnostic reports. Specifically, MammoRG adopts a two-stage training framework. In the first stage, the model learns to integrate clinically relevant prior knowledge from a patient's four-view mammograms through classification-based supervision. In the second stage, a terminology-aware supervised fine-tuning strategy is introduced to model mammography-specific clinical terms as atomic semantic units, enabling the generation of high-quality reports with improved clinical consistency. To facilitate clinical efficacy evaluation of generated reports, we further develop MammoRGTool, a dedicated mammography report parsing tool that extracts structured clinical information from free-text reports. Extensive experiments demonstrate that MammoRG consistently outperforms existing methods across multiple clinical efficacy metrics, particularly in diagnosis-related BI-RADS F1, where it surpasses the second-best model by 2.73%, 2.04%, 1.90%, and 3.27% on the internal, external 1, external 2, and VinDr-Mammo datasets, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MammoRG adds a two-stage BI-RADS integration step that produces measurable F1 gains on multiple datasets, but those gains rest on an unvalidated custom parser whose errors could explain the margins.

The one thing to take away is that this work moves mammography report generation past pure visual-to-text models by splitting training into a classification stage that pulls in four-view clinical priors and a second stage that treats BI-RADS terms as atomic units during fine-tuning. The reported 2-3% lifts in diagnosis-related F1 on internal, external, and VinDr-Mammo sets are the concrete result.

What the paper actually contributes is the explicit two-stage structure tied to the BI-RADS workflow and the MammoRGTool for turning free-text outputs into structured scores. The multi-dataset setup, including external validation, is better than the single-center norm in this literature. The terminology-aware fine-tuning is a straightforward way to reduce hallucinated or inconsistent clinical phrases.

The soft spot is the parser. The abstract claims it extracts structured information but supplies no precision or recall numbers against radiologist annotations. If tool mistakes correlate with the model’s outputs, the small reported edges could be parsing artifacts rather than genuine clinical improvement. The paper would be tighter if it included even a modest expert agreement check on the tool.
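The expert agreement check suggested here could be as simple as field-level precision and recall of the parser's output against radiologist annotations. The sketch below assumes each report is represented as a dictionary of field names to values; the field names (`birads`, `composition`) and the toy data are hypothetical and do not describe MammoRGTool's actual output schema.

```python
def field_level_scores(extracted, annotated):
    """Precision, recall, and F1 of parsed fields against expert annotations.

    extracted, annotated: lists of dicts (field -> value), one dict per report,
    aligned by position.
    """
    tp = fp = fn = 0
    for parsed, gold in zip(extracted, annotated):
        for field, value in parsed.items():
            if gold.get(field) == value:
                tp += 1  # parser output agrees with the annotation
            else:
                fp += 1  # parser produced a wrong or unannotated value
        for field, value in gold.items():
            if parsed.get(field) != value:
                fn += 1  # annotated value missed or contradicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


parsed_reports = [{"birads": "4", "composition": "c"}, {"birads": "2"}]
expert_reports = [{"birads": "4", "composition": "b"}, {"birads": "2", "composition": "a"}]
print(field_level_scores(parsed_reports, expert_reports))
```

In this toy case the parser gets both BI-RADS categories right but mislabels one composition and misses the other, so precision is 2/3 and recall is 1/2. Reporting such numbers separately for reference reports and model-generated reports would show whether parser errors correlate with model outputs.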

The central assumption—that the staged training actually mimics radiologist reasoning under BI-RADS—remains plausible but is supported mainly by the metric numbers rather than direct process evidence. No obvious circularity or self-referential definitions appear in the claims.

This is for groups already working on medical report generation who want to test guideline-aware training rather than generic VLMs. A reader focused on radiology AI or clinical NLP would find the framework and the dataset results useful. The work is coherent enough on its own terms to merit referee time, even if the evaluation tool needs extra scrutiny.

Referee Report

3 major / 1 minor

Summary. The paper proposes MammoRG, a two-stage mammography report generation framework that simulates BI-RADS-guided clinical reasoning: stage 1 uses classification-based supervision to integrate prior knowledge from four-view mammograms, and stage 2 applies terminology-aware supervised fine-tuning to treat clinical terms as atomic units. It introduces MammoRGTool to parse free-text reports for structured clinical metrics and claims consistent outperformance over baselines, with BI-RADS F1 gains of 2.73%, 2.04%, 1.90%, and 3.27% on the internal, external 1, external 2, and VinDr-Mammo datasets.

Significance. If the empirical gains hold under validated evaluation, the work could advance clinically aligned report generation by explicitly modeling structured diagnostic reasoning rather than direct visual-to-text mapping, with the dedicated parsing tool offering a step toward more meaningful efficacy assessment in medical imaging.

major comments (3)

[MammoRGTool description] MammoRGTool section: The tool is presented as extracting structured clinical information for BI-RADS F1 computation, yet no validation metrics (precision/recall/F1 against radiologist-annotated free-text reports on a held-out set) are reported. This is load-bearing for the central claim, as unvalidated parsing errors could artifactually inflate the reported 2-3% margins if they correlate with model outputs.
[Experiments] Experiments section: No statistical significance tests, confidence intervals, or p-values are provided for the BI-RADS F1 and other metric differences, and baseline implementation details (e.g., exact architectures, training hyperparameters) are insufficiently specified to rule out confounding factors in the cross-dataset comparisons.
[§3] §3 (two-stage framework): The claim that classification-based supervision followed by terminology-aware fine-tuning simulates radiologists' BI-RADS reasoning process lacks supporting ablations (e.g., vs. standard end-to-end fine-tuning) or qualitative analysis showing alignment with guideline-structured outputs, weakening attribution of gains to the proposed clinical integration.
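For the second major comment, one common way to attach a confidence interval to a metric difference between two models is a paired bootstrap over test cases. The sketch below is a generic illustration under assumptions: `accuracy` is a stand-in for the BI-RADS F1 metric, and the resample count, `alpha`, and toy labels are arbitrary choices, none of which come from the paper.

```python
import random


def accuracy(y_true, y_pred):
    """Stand-in metric; any function of (y_true, y_pred) can be passed instead."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def paired_bootstrap_ci(y_true, pred_a, pred_b, metric,
                        n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for metric(A) - metric(B) under case resampling."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_resamples):
        # Resample case indices with replacement; both models see the same cases.
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(metric(truth, a) - metric(truth, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


labels = ["1", "2", "2", "3", "4", "4", "2", "1"]
model_a = ["1", "2", "2", "3", "4", "3", "2", "1"]
model_b = ["1", "2", "3", "3", "4", "3", "1", "1"]
print(paired_bootstrap_ci(labels, model_a, model_b, accuracy))
```

If the resulting interval includes zero, the observed gap is not distinguishable from resampling noise at the chosen level, which is the concern raised about margins of two to three percent.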

minor comments (1)

[Abstract] Abstract: Dataset sizes, BI-RADS category distributions, and exact number of views per case could be stated explicitly to aid quick assessment of experimental scope.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and evidence.

Referee: [MammoRGTool description] MammoRGTool section: The tool is presented as extracting structured clinical information for BI-RADS F1 computation, yet no validation metrics (precision/recall/F1 against radiologist-annotated free-text reports on a held-out set) are reported. This is load-bearing for the central claim, as unvalidated parsing errors could artifactually inflate the reported 2-3% margins if they correlate with model outputs.

Authors: We agree that validation metrics for MammoRGTool are necessary to substantiate the BI-RADS F1 results. The current manuscript does not report precision, recall, or F1 against radiologist annotations on a held-out set. In the revised version, we will add these metrics from a dedicated validation study. revision: yes
Referee: [Experiments] Experiments section: No statistical significance tests, confidence intervals, or p-values are provided for the BI-RADS F1 and other metric differences, and baseline implementation details (e.g., exact architectures, training hyperparameters) are insufficiently specified to rule out confounding factors in the cross-dataset comparisons.

Authors: We acknowledge that statistical tests and fuller baseline details are missing. We will incorporate p-values, confidence intervals for all reported differences, and expanded specifications of baseline models and hyperparameters in the revised experiments section. revision: yes
Referee: [§3] §3 (two-stage framework): The claim that classification-based supervision followed by terminology-aware fine-tuning simulates radiologists' BI-RADS reasoning process lacks supporting ablations (e.g., vs. standard end-to-end fine-tuning) or qualitative analysis showing alignment with guideline-structured outputs, weakening attribution of gains to the proposed clinical integration.

Authors: While the two-stage design follows BI-RADS clinical workflow, we agree that ablations and qualitative evidence would better support attribution of gains. The revised manuscript will include comparisons against end-to-end fine-tuning baselines and qualitative report examples aligned with guideline structure. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on held-out dataset comparisons

The paper's derivation consists of a two-stage training procedure (classification supervision then terminology-aware fine-tuning) followed by empirical evaluation on internal/external/VinDr-Mammo datasets using BI-RADS F1 and other metrics extracted via the authors' MammoRGTool. No equation, prediction, or central result is shown to reduce by construction to a fitted parameter or self-referential definition; the reported gains are presented as direct comparisons against baselines on held-out data. Any self-citations (if present) are not load-bearing for the performance claims, and the tool is introduced solely for evaluation rather than as part of a self-defining loop. This is the standard non-circular empirical pattern.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the effectiveness of the two-stage training process and the assumption that BI-RADS provides a usable structured clinical workflow; model parameters are fitted to data during training.

free parameters (1)

neural network parameters
Weights of the vision-language model are fitted during the classification stage and the terminology-aware fine-tuning stage on mammography datasets.

axioms (1)

domain assumption · The BI-RADS guideline defines a structured clinical reasoning process that can be simulated by classification supervision and terminology-aware fine-tuning
Invoked in the description of the two-stage framework that explicitly follows the BI-RADS guideline.

invented entities (1)

MammoRGTool · no independent evidence
purpose: Extracts structured clinical information such as BI-RADS categories from free-text generated reports for evaluation
New parsing tool developed to enable clinical efficacy metrics; no independent evidence provided outside this work.
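To illustrate what "structured clinical information" extracted from free text can look like, the sketch below pulls a BI-RADS category out of a report with a regular expression. This is a deliberately simple, hypothetical stand-in: the Figure 3 caption compares MammoRGTool with Qwen3-32B, which suggests the real tool is model-based, and the pattern and function name here are assumptions, not part of the paper.

```python
import re

# Matches phrasings such as "BI-RADS 3", "BIRADS: 2", or "BI-RADS category 4a".
BIRADS_PATTERN = re.compile(
    r"BI-?RADS\s*(?:category|cat\.?)?\s*:?\s*([0-6][abc]?)\b",
    re.IGNORECASE,
)


def extract_birads(report):
    """Return the first BI-RADS category found in the report, or None."""
    match = BIRADS_PATTERN.search(report)
    return match.group(1).lower() if match else None


print(extract_birads("Impression: BI-RADS category 4a, biopsy recommended."))  # prints 4a
print(extract_birads("BIRADS: 2. Benign findings."))  # prints 2
print(extract_birads("No final assessment was recorded."))  # prints None
```

A rule like this breaks on negations, multiple assessments per report, and free-form phrasing, which is one reason the referee asks for the parser to be validated against radiologist annotations.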
