Recognition: unknown
SemiFA: An Agentic Multi-Modal Framework for Autonomous Semiconductor Failure Analysis Report Generation
Pith reviewed 2026-05-10 16:27 UTC · model grok-4.3
The pith
A multi-agent framework generates structured semiconductor failure analysis reports from images in under one minute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SemiFA is an agentic multi-modal framework that autonomously generates structured FA reports from semiconductor inspection images in under one minute. It decomposes the task into a four-agent pipeline: a DefectDescriber that classifies and narrates defect morphology, a RootCauseAnalyzer that fuses equipment telemetry with historically similar defects retrieved from a vector database, a SeverityClassifier that assigns severity and estimates yield impact, and a RecipeAdvisor that proposes corrective process adjustments, followed by PDF assembly.
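For readers unfamiliar with LangGraph-style orchestration, the sketch below shows how a four-agent chain of this shape is typically wired. It is a minimal illustration only: the node names follow the paper, but the state fields and node bodies are placeholders invented here, not the authors' implementation.

```python
# Hypothetical sketch of a four-agent FA pipeline wired with LangGraph.
# Node bodies are placeholders; only the node names and ordering follow the paper.
from typing import TypedDict
from langgraph.graph import StateGraph, END


class FAState(TypedDict, total=False):
    image_path: str            # inspection image under analysis
    telemetry: dict            # SECS/GEM-style equipment readings
    defect_description: str    # output of the DefectDescriber
    root_cause: str            # output of the RootCauseAnalyzer
    severity: str              # output of the SeverityClassifier
    recipe_advice: str         # output of the RecipeAdvisor
    report_path: str           # final PDF location


def defect_describer(state: FAState) -> FAState:
    # Would call the vision models (e.g. DINOv2 features plus a VLM narration).
    return {"defect_description": "placeholder morphology narrative"}


def root_cause_analyzer(state: FAState) -> FAState:
    # Would fuse telemetry with similar historical defects retrieved from a vector DB.
    return {"root_cause": "placeholder root-cause hypothesis"}


def severity_classifier(state: FAState) -> FAState:
    return {"severity": "placeholder severity and yield-impact estimate"}


def recipe_advisor(state: FAState) -> FAState:
    return {"recipe_advice": "placeholder corrective process adjustments"}


def report_builder(state: FAState) -> FAState:
    # Would assemble the structured PDF report (e.g. with ReportLab).
    return {"report_path": "report.pdf"}


graph = StateGraph(FAState)
for name, fn in [
    ("defect_describer", defect_describer),
    ("root_cause_analyzer", root_cause_analyzer),
    ("severity_classifier", severity_classifier),
    ("recipe_advisor", recipe_advisor),
    ("report_builder", report_builder),
]:
    graph.add_node(name, fn)

graph.set_entry_point("defect_describer")
graph.add_edge("defect_describer", "root_cause_analyzer")
graph.add_edge("root_cause_analyzer", "severity_classifier")
graph.add_edge("severity_classifier", "recipe_advisor")
graph.add_edge("recipe_advisor", "report_builder")
graph.add_edge("report_builder", END)

app = graph.compile()
result = app.invoke({"image_path": "wafer_001.png", "telemetry": {}})
```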
What carries the argument
The four-agent pipeline that sequences image-based defect description, telemetry-and-retrieval root cause analysis, severity classification, and corrective advice.
If this is right
- Complete structured reports become available in under one minute rather than several hours per case.
- Root cause reasoning improves when image analysis is combined with equipment telemetry and historical defect retrieval.
- Severity and yield impact estimates are produced automatically as part of each report.
- Corrective process adjustments are generated alongside the diagnosis.
- A new annotated dataset of 930 defect images supports further development across nine defect classes.
Where Pith is reading between the lines
- Engineers could process substantially higher volumes of inspection cases per day if the system scales reliably.
- The same agentic structure might apply to other manufacturing inspection tasks that require both image interpretation and equipment context.
- Real-time integration with production lines could allow immediate process corrections before defects accumulate.
- Larger-scale validation on live factory data would be required to confirm consistency beyond the introduced dataset.
Load-bearing premise
The assumption that automated retrieval of similar past defects plus equipment telemetry fusion can produce reliable root causes without human validation or additional domain-specific tuning.
What would settle it
A side-by-side review in which human experts independently analyze the same set of defect images and telemetry, then compare their root-cause conclusions against the system's outputs for agreement rate.
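A minimal sketch of what such a side-by-side review reduces to once each expert and system root cause is mapped to a categorical label; the label names and data below are invented for illustration, and scikit-learn's cohen_kappa_score is used only as one possible chance-corrected agreement measure.

```python
# Hypothetical agreement check between expert and system root-cause labels.
# The categories and data are illustrative, not from the paper.
from sklearn.metrics import cohen_kappa_score

expert_labels = ["cmp_slurry", "litho_focus", "etch_plasma", "cmp_slurry", "handling"]
system_labels = ["cmp_slurry", "litho_focus", "cmp_slurry", "cmp_slurry", "handling"]

# Raw agreement rate: fraction of cases where the two conclusions match.
agreement_rate = sum(e == s for e, s in zip(expert_labels, system_labels)) / len(expert_labels)

# Chance-corrected agreement.
kappa = cohen_kappa_score(expert_labels, system_labels)

print(f"raw agreement = {agreement_rate:.2f}, Cohen's kappa = {kappa:.2f}")
```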
Original abstract
Semiconductor failure analysis (FA) requires engineers to examine inspection images, correlate equipment telemetry, consult historical defect records, and write structured reports, a process that can consume several hours of expert time per case. We present SemiFA, an agentic multi-modal framework that autonomously generates structured FA reports from semiconductor inspection images in under one minute. SemiFA decomposes FA into a four-agent LangGraph pipeline: a DefectDescriber that classifies and narrates defect morphology using DINOv2 and LLaVA-1.6, a RootCauseAnalyzer that fuses SECS/GEM equipment telemetry with historically similar defects retrieved from a Qdrant vector database, a SeverityClassifier that assigns severity and estimates yield impact, and a RecipeAdvisor that proposes corrective process adjustments. A fifth node assembles a PDF report. We introduce SemiFA-930, a dataset of 930 annotated semiconductor defect images paired with structured FA narratives across nine defect classes, drawn from procedural synthesis, WM-811K, and MixedWM38. Our DINOv2-based classifier achieves 92.1% accuracy on 140 validation images (macro F1 = 0.917), and the full pipeline produces complete FA reports in 48 seconds on an NVIDIA A100-SXM4-40 GB GPU. A GPT-4o judge ablation across four modality conditions demonstrates that multi-modal fusion improves root cause reasoning by +0.86 composite points (1-5 scale) over an image-only baseline, with equipment telemetry as the more load-bearing modality. To our knowledge, SemiFA is the first system to integrate SECS/GEM equipment telemetry into a vision-language model pipeline for autonomous FA report generation.
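The abstract reports 92.1% accuracy and macro F1 = 0.917 for the DINOv2-based classifier, but this page does not reproduce the classifier head. The sketch below assumes a frozen DINOv2 ViT-B/14 backbone with a simple linear probe, one common way such numbers are obtained; the preprocessing and probe choice are assumptions, not the authors' exact setup.

```python
# Sketch of a frozen-DINOv2 linear probe and its validation metrics.
# The probe head and preprocessing are assumptions; only the backbone and the
# accuracy / macro-F1 metrics come from the paper's abstract.
import torch
from PIL import Image
from torchvision import transforms
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Public DINOv2 ViT-B/14 weights from torch.hub (downloads on first use).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is a multiple of the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


@torch.no_grad()
def embed(paths: list[str]) -> torch.Tensor:
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(batch)  # one CLS embedding per image


def evaluate_probe(train_paths, train_labels, val_paths, val_labels) -> dict:
    # Linear probe on frozen features; the head choice is an assumption.
    clf = LogisticRegression(max_iter=1000).fit(embed(train_paths).numpy(), train_labels)
    pred = clf.predict(embed(val_paths).numpy())
    return {
        "accuracy": accuracy_score(val_labels, pred),
        "macro_f1": f1_score(val_labels, pred, average="macro"),
    }
```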
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SemiFA, a LangGraph-based agentic framework with four specialized agents (DefectDescriber using DINOv2 and LLaVA-1.6, RootCauseAnalyzer fusing SECS/GEM telemetry and Qdrant vector-DB retrieval, SeverityClassifier, RecipeAdvisor) plus a PDF assembler that generates structured semiconductor failure analysis reports from inspection images in ~48 seconds. It contributes the SemiFA-930 dataset of 930 annotated defect images across nine classes and reports 92.1% DINOv2 classification accuracy (macro F1 0.917) on 140 validation images plus a GPT-4o judge ablation showing +0.86 composite score improvement for multi-modal (image + telemetry) root-cause reasoning over image-only.
Significance. If the root-cause outputs prove technically accurate under human expert review, SemiFA could meaningfully accelerate semiconductor failure analysis by reducing multi-hour expert workflows to under a minute while integrating telemetry and historical retrieval. The introduction of a domain-specific dataset and explicit fusion of SECS/GEM signals with vision-language models represent concrete engineering contributions that could be extended to other inspection-heavy manufacturing domains.
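For reference, root-cause quality in the summarized ablation is scored by a GPT-4o judge on a 1-5 composite scale across four modality conditions. A hedged sketch of how such a judge loop is commonly written with the OpenAI Python client follows; the rubric wording and report handling here are invented, and only the use of a GPT-4o judge comes from the paper.

```python
# Hypothetical LLM-judge loop for scoring root-cause sections on a 1-5 scale.
# The rubric text is invented; the judge model name follows the paper.
from statistics import mean
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "Rate the following root-cause analysis for a semiconductor defect on a 1-5 "
    "scale for plausibility, specificity, and consistency with the evidence. "
    "Reply with a single integer."
)


def judge_score(root_cause_text: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": root_cause_text},
        ],
    )
    return int(resp.choices[0].message.content.strip())


def compare_conditions(reports_by_condition: dict[str, list[str]]) -> dict[str, float]:
    # e.g. {"image_only": [...], "image_plus_telemetry": [...]} -> mean judge score
    return {
        condition: mean(judge_score(report) for report in reports)
        for condition, reports in reports_by_condition.items()
    }
```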
major comments (3)
- [Results / Evaluation] Abstract and implied results section: Root-cause reasoning quality and overall report reliability rest exclusively on a GPT-4o judge ablation (+0.86 composite points on a 1-5 scale for multi-modal vs. image-only). No human FA engineer ratings, inter-rater agreement statistics, or objective ground-truth metrics (e.g., retrieval precision@K or root-cause correctness on held-out expert-labeled cases) are reported, leaving the central claim of “reliable” autonomous reports without direct validation.
- [Dataset] Dataset section: SemiFA-930 is described as “annotated” and drawn from procedural synthesis, WM-811K, and MixedWM38, yet the manuscript supplies no details on annotation provenance (expert engineers vs. synthetic), inter-annotator agreement, exact train/validation/test splits, or how the 140-image validation set for the 92.1% DINOv2 accuracy was constructed.
- [Methods / RootCauseAnalyzer] Methods, RootCauseAnalyzer description: The fusion of SECS/GEM telemetry with Qdrant vector-DB retrieval is presented as the key mechanism for accurate root-cause identification, but the paper omits concrete implementation details such as the embedding model, similarity metric, number of retrieved neighbors, prompt templates for the analyzer agent, and any ablation measuring retrieval quality independently of the GPT-4o judge (a hypothetical sketch of such a retrieval step follows this list).
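To make the missing choices concrete, here is a hypothetical version of the retrieval step this comment asks about, with the unspecified parameters stated explicitly as assumptions (embedding source, similarity metric, K = 5, and the collection name are illustrative); it follows the public qdrant-client API but is not the authors' code.

```python
# Hypothetical retrieval step for a RootCauseAnalyzer-style agent. The collection
# name, K, and payload fields are assumptions the paper does not state.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
K = 5  # number of historical defects to retrieve (illustrative)


def retrieve_similar_defects(defect_embedding: list[float]) -> list[dict]:
    hits = client.search(
        collection_name="historical_defects",  # hypothetical collection
        query_vector=defect_embedding,
        limit=K,
        with_payload=True,
    )
    # Each payload would carry the past defect's description, root cause, and fix.
    return [{"score": h.score, **(h.payload or {})} for h in hits]


def build_analyzer_context(defect_embedding: list[float], telemetry: dict) -> str:
    # Fuse equipment telemetry with retrieved neighbors into one analyzer prompt.
    neighbors = retrieve_similar_defects(defect_embedding)
    lines = [f"Equipment telemetry: {telemetry}"]
    lines += [
        f"Similar past defect (score {n['score']:.2f}): {n.get('root_cause', 'n/a')}"
        for n in neighbors
    ]
    return "\n".join(lines)
```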
minor comments (2)
- [Abstract] Abstract: The statement “to our knowledge, the first system to integrate SECS/GEM telemetry into a vision-language model pipeline” should be supported by a brief related-work paragraph that explicitly contrasts prior FA automation efforts.
- [Methods] Methods: The LangGraph pipeline diagram and agent prompts are referenced but not reproduced; including them (or a link to supplementary material) would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below and describe the revisions we will incorporate to improve the manuscript.
read point-by-point responses
Referee: [Results / Evaluation] Abstract and implied results section: Root-cause reasoning quality and overall report reliability rest exclusively on a GPT-4o judge ablation (+0.86 composite points on a 1-5 scale for multi-modal vs. image-only). No human FA engineer ratings, inter-rater agreement statistics, or objective ground-truth metrics (e.g., retrieval precision@K or root-cause correctness on held-out expert-labeled cases) are reported, leaving the central claim of “reliable” autonomous reports without direct validation.
Authors: We agree that the current evaluation of root-cause reasoning relies on an automated GPT-4o judge and lacks direct human expert validation or objective ground-truth metrics. This is a valid limitation of the presented results. In the revised manuscript we will add a human evaluation study in which two semiconductor FA engineers rate a subset of 50 generated reports for technical accuracy and completeness on the same 1-5 scale, along with inter-rater agreement statistics. We will also report retrieval precision@K for the vector database component on held-out expert-labeled cases and include these results in the evaluation section. revision: yes
Referee: [Dataset] Dataset section: SemiFA-930 is described as “annotated” and drawn from procedural synthesis, WM-811K, and MixedWM38, yet the manuscript supplies no details on annotation provenance (expert engineers vs. synthetic), inter-annotator agreement, exact train/validation/test splits, or how the 140-image validation set for the 92.1% DINOv2 accuracy was constructed.
Authors: We will revise the Dataset section to supply the requested details. The annotations were produced by three experienced semiconductor FA engineers using a standardized labeling protocol. We will report inter-annotator agreement, the precise train/validation/test splits (including how the 140-image validation set was sampled while preserving class balance), and additional information on the procedural synthesis process used to augment the real defect images from WM-811K and MixedWM38. revision: yes
Referee: [Methods / RootCauseAnalyzer] Methods, RootCauseAnalyzer description: The fusion of SECS/GEM telemetry with Qdrant vector-DB retrieval is presented as the key mechanism for accurate root-cause identification, but the paper omits concrete implementation details such as the embedding model, similarity metric, number of retrieved neighbors, prompt templates for the analyzer agent, and any ablation measuring retrieval quality independently of the GPT-4o judge.
Authors: We will expand the RootCauseAnalyzer subsection to include all omitted implementation details: the embedding model, similarity metric, number of retrieved neighbors, and the prompt templates (moved to an appendix for readability). We will also add a dedicated ablation that isolates retrieval quality (precision@K on held-out cases) from the downstream GPT-4o judge to demonstrate the contribution of the vector database component. revision: yes
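A minimal sketch of the promised precision@K computation, under the assumption that a retrieved historical defect counts as relevant when it shares the expert-labeled root-cause category; the case data below are illustrative, not from the paper.

```python
# Minimal precision@K for the retrieval component. Relevance is assumed to mean
# that the retrieved historical defect shares the expert-labeled root-cause category.
def precision_at_k(retrieved_categories: list[str], true_category: str, k: int) -> float:
    top_k = retrieved_categories[:k]
    return sum(c == true_category for c in top_k) / k


# Illustrative held-out cases: expert-labeled root cause plus the categories of
# the top retrieved neighbors, in rank order.
cases = [
    {"true": "cmp_slurry", "retrieved": ["cmp_slurry", "cmp_slurry", "litho_focus", "cmp_slurry", "handling"]},
    {"true": "etch_plasma", "retrieved": ["etch_plasma", "handling", "etch_plasma", "etch_plasma", "cmp_slurry"]},
]

for k in (1, 3, 5):
    mean_p = sum(precision_at_k(c["retrieved"], c["true"], k) for c in cases) / len(cases)
    print(f"precision@{k} = {mean_p:.2f}")
```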
Circularity Check
No significant circularity; empirical claims rest on reported accuracies and external-judge ablation without reduction to fitted inputs or self-referential definitions.
full rationale
The paper describes an agentic LangGraph pipeline (DefectDescriber, RootCauseAnalyzer, etc.) that produces FA reports, supported by DINOv2 classifier accuracy (92.1% on 140 validation images), macro F1, runtime benchmarks, and a GPT-4o judge ablation showing modality improvements. No equations, parameter-fitting steps, or derivations are present that could reduce outputs to inputs by construction. The SemiFA-930 dataset and vector-DB retrieval are described as external resources; the judge ablation uses an independent model rather than self-referential scoring. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing. This is a standard empirical systems paper whose central claims are falsifiable via the reported metrics and do not collapse into definitional tautologies.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Pre-trained vision models can classify and narrate semiconductor defect morphology with sufficient accuracy for downstream reasoning.
- Domain assumption: Retrieval from a vector database of historical defects, combined with equipment telemetry, yields reliable root-cause hypotheses.
Reference graph
Works this paper leans on
- [1] H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2024, pp. 34892–34916.
- [2] H. Liu, C. Li, Y. Li, B. Li, Y. Zhang, S. Shen, and Y. J. Lee, “LLaVA-NeXT: Improved reasoning, OCR, and world knowledge,” Jan. 2024. [Online]. Available: https://llava-vl.github.io/blog/2024-01-30-llava-next/
- [3] OpenAI, “GPT-4V(ision) system card,” OpenAI, Tech. Rep., Sep. 2023. [Online]. Available: https://openai.com/research/gpt-4v-system-card
- [4] W. Dai, J. Li, D. Li, A. M. H. Tiong, J. Zhao, W. Wang, B. Li, P. Fung, and S. Hoi, “InstructBLIP: Towards general-purpose vision-language models with instruction tuning,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2024.
- [5] LangChain, Inc., “LangGraph: Multi-actor applications with LLMs.”
- [6] [Online]. Available: https://github.com/langchain-ai/langgraph
- [7] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, A. Awadallah, R. W. White, D. Burger, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” arXiv preprint arXiv:2308.08155, 2023.
- [8] J. Moura, “CrewAI: Framework for orchestrating role-playing, autonomous AI agents,” 2024. [Online]. Available: https://github.com/crewaiinc/crewAI
- [9] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “DINOv2: Learning robust visual features without supervision,” 2024.
- [10] M.-J. Wu, J.-S. R. Jang, and J.-L. Chen, “Wafer map failure pattern recognition and similarity ranking for large-scale data sets,” IEEE Trans. Semiconductor Manufacturing, vol. 28, no. 1, pp. 1–12, Feb. 2015.
- [11] S. M. Saqlain, M. Jargalsaikhan, and J. Y. Lee, “A voting ensemble classifier for wafer map defect patterns identification in semiconductor manufacturing,” IEEE Trans. Semiconductor Manufacturing, vol. 32, no. 4, pp. 423–432, 2019.
- [12] J. Shim, S. Kang, and S. Cho, “Active learning of convolutional neural network for cost-effective wafer map pattern classification,” IEEE Trans. Semiconductor Manufacturing, vol. 33, no. 2, pp. 258–266, 2020.
- [13] R. Wang, “MixedWM38: A wafer map dataset with mixed-type defect patterns,” 2020. [Online]. Available: https://github.com/Junliangwangdhu/WaferMap
- [14] T. Nakazawa and D. V. Kulkarni, “Wafer map defect pattern classification and image retrieval using convolutional neural network,” IEEE Trans. Semiconductor Manufacturing, vol. 31, no. 2, pp. 309–314, 2018.
- [15] J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer, “WinCLIP: Zero-/few-shot anomaly classification and segmentation,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 19606–19616.
- [16] J. Li, D. Li, S. Savarese, and S. Hoi, “BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models,” in Proc. International Conference on Machine Learning (ICML), vol. 202, 2023, pp. 19730–19742.
- [17] Z. Gu, B. Zhu, G. Zhu, Y. Chen, M. Tang, and J. Wang, “AnomalyGPT: Detecting industrial anomalies using large vision-language models,” in Proc. AAAI Conference on Artificial Intelligence, vol. 38, 2024, pp. 1932–1940.
- [18] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “MVTec AD – A comprehensive real-world dataset for unsupervised anomaly detection,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9592–9600.
- [19] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in Proc. International Conference on Machine Learning (ICML), vol. 139, 2021, pp. 8748–8763.
- [20] LangChain, Inc., “LangChain: Building applications with LLMs through composability,” 2024. [Online]. Available: https://github.com/langchain-ai/langchain
- [21] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023.
- [22] R. Firoozi, J. Tucker, S. Tian, A. Majumdar, J. Sun, W. Liu, Y. Zhu, S. Song, A. Kapila, K. Hausman, and M. Pavone, “Foundation models in robotics: Applications, challenges, and the future,” arXiv preprint arXiv:2312.07843, 2023.
- [23] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, S. Zhao, R. Tian, R. Xie, J. Zhou, M. Gerstein, D. Li, Z. Liu, and M. Sun, “ToolLLM: Facilitating large language models to master 16000+ real-world APIs,” arXiv preprint arXiv:2307.16789, 2023.
- [24] SEMI, “SEMI E5 – SEMI Equipment Communications Standard 2 Message Content (SECS-II),” SEMI Standards, 2023.
- [25] SEMI, “SEMI E37 – High-Speed SECS Message Services (HSMS) Generic Services,” SEMI Standards, 2023.
- [26] KLA Corporation, “KLA-Tencor Launches KLARITY® Defect Analysis System,” KLA Investor Relations, 2023. [Online]. Available: https://ir.kla.com/news-events/press-releases/detail/250/kla-tencor-launches-klarity-led-defect-analysis-system
- [27] Applied Materials, “AIx: Actionable Insight Accelerator,” 2023. [Online]. Available: https://www.appliedmaterials.com/us/en/semiconductor/solutions-and-software/ai-x.html
- [28] Infinita Lab, “Semiconductor Failure Analysis: Techniques and Standards,” 2023. [Online]. Available: https://infinitalab.com/blog/semiconductor-failure-analysis-techniques-standards/
- [29] AZoM, “Inspection and Failure Analysis as Strategic Investments in Semiconductor Fabs,” 2023. [Online]. Available: https://www.azom.com/article.aspx?ArticleID=25029
- [30] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient finetuning of quantized LLMs,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2024.
- [31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
- [32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. International Conference on Learning Representations (ICLR), 2015.
- [33] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging LLM-as-a-judge with MT-bench and chatbot arena,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2024.
- [34] Qdrant, “Qdrant: Vector similarity search engine,” 2024. [Online]. Available: https://qdrant.tech
- [35] ReportLab, Inc., “ReportLab: PDF generation in Python,” 2024. [Online]. Available: https://www.reportlab.com
- [36] S. Ramírez, “FastAPI: Modern, fast (high-performance) web framework for building APIs,” 2024. [Online]. Available: https://fastapi.tiangolo.com
- [37] OpenAI, “GPT-4o system card,” OpenAI, Tech. Rep., May 2024. [Online]. Available: https://openai.com/index/gpt-4o-system-card
- [38] C. Li, C. Wong, S. Zhang, N. Usuyama, H. Liu, J. Yang, T. Naumann, H. Poon, and J. Gao, “LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2024.
- [39] K. Kuckreja, M. S. Danish, M. Naseer, A. Khan, F. S. Khan, and M. H. Daniyar, “GeoChat: Grounded large vision-language model for remote sensing,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 27831–27840.