MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval
Pith reviewed 2026-05-16 09:25 UTC · model grok-4.3
The pith
A multimodal retrieval framework improves accuracy on engineering document questions by 41 percent relative to standard RAG.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MCERF demonstrates that coupling ColPali multimodal retrieval with four hand-crafted reasoning modes and two routing strategies produces substantially more accurate answers to questions drawn from engineering documentation than baseline retrieval-augmented generation, delivering a 41.1% relative accuracy improvement on the DesignQA benchmark while using only partial document access.
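The headline figure is a relative gain, so its absolute size depends entirely on the baseline. That is why the baseline configuration matters. The minimal sketch below shows the arithmetic. The baseline accuracies in it are hypothetical placeholders, not values reported in the paper.

```python
def relative_gain(baseline_acc: float, new_acc: float) -> float:
    """Relative improvement of new_acc over baseline_acc, returned as a fraction."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc


def absolute_gain_points(baseline_acc: float, new_acc: float) -> float:
    """Absolute improvement in percentage points."""
    return (new_acc - baseline_acc) * 100


# Hypothetical baselines (NOT values from the paper): the same +41.1% relative
# gain corresponds to very different absolute gains.
for baseline in (0.30, 0.50):
    system = baseline * 1.411
    print(f"baseline {baseline:.0%} -> system {system:.1%}: "
          f"relative {relative_gain(baseline, system):+.1%}, "
          f"absolute {absolute_gain_points(baseline, system):+.1f} pts")
```

With a 30% baseline, a +41.1% relative gain is roughly 12 percentage points. With a 50% baseline, it is roughly 21.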
What carries the argument
ColPali-based multimodal retriever combined with modular reasoning pipelines consisting of Hybrid Lookup, Vision-to-Text fusion, High-Reasoning LLM, and Self-Consistency modes, plus single-case and multi-agent routing.
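ColPali [4] scores a query against a page image using ColBERT-style late interaction [31]. Each query-token embedding is matched to its most similar page-patch embedding, and the per-token maxima are summed. The sketch below shows only that scoring rule, using NumPy. The random arrays are stand-ins for model outputs, and the sketch does not assume any ColPali checkpoint or library API.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (ColBERT-style MaxSim) relevance score.

    query_emb: (n_query_tokens, d) L2-normalized query token embeddings.
    page_emb:  (n_patches, d) L2-normalized page patch embeddings.
    For each query token, keep its best-matching patch, then sum over tokens.
    """
    sims = query_emb @ page_emb.T  # (n_query_tokens, n_patches) cosine similarities
    return float(sims.max(axis=1).sum())


def rank_pages(query_emb: np.ndarray, pages: list, top_k: int = 3) -> list:
    """Return indices of the top_k pages by MaxSim score, best first."""
    scores = [maxsim_score(query_emb, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)[:top_k]


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Random stand-ins for model outputs (assumption: a real system would get these
# from a ColPali model, which this sketch does not load).
rng = np.random.default_rng(0)
pages = [normalize(rng.normal(size=(64, 16))) for _ in range(10)]
query = normalize(rng.normal(size=(8, 16)))
print(rank_pages(query, pages))
```

Because each page keeps many patch vectors instead of one pooled vector, a query token that refers to a small table cell or figure label can still find a strong local match.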
If this is right
- Question answering systems for engineering standards can achieve higher accuracy without ingesting entire rulebooks.
- Vision-language retrieval enables direct use of figures and tables in reasoning chains.
- Modular design supports future replacement of the underlying retriever or LLM.
- Adaptive routing improves performance across different query complexities.
Where Pith is reading between the lines
- Similar pipelines could be adapted for legal or medical documents that mix text with diagrams.
- Further gains might come from training the routing agent on more diverse engineering corpora.
- The framework offers a template for building domain-specific multimodal QA systems beyond the tested benchmark.
Load-bearing premise
That the ColPali retrieval and hand-designed reasoning modes will generalize beyond the DesignQA benchmark without benchmark-specific tuning.
What would settle it
A test on a fresh set of engineering rulebooks and questions where accuracy fails to exceed baseline RAG performance would falsify the general improvement claim.
Original abstract
Engineering rulebooks and technical standards contain multimodal information, such as dense text, tables, and illustrations, that is challenging for retrieval-augmented generation (RAG) systems. Building upon the DesignQA framework [1], which relied on full-text ingestion and text-based retrieval, this work establishes a Multimodal ColPali Enhanced Retrieval and Reasoning Framework (MCERF), a system that couples a multimodal retriever with large language model reasoning for accurate and efficient question answering from engineering documents. The system employs ColPali, which retrieves both textual and visual information, together with multiple retrieval and reasoning strategies: (i) Hybrid Lookup mode for explicit rule mentions, (ii) Vision-to-Text fusion for figure- and table-guided queries, (iii) High-Reasoning LLM mode for complex multimodal questions, and (iv) a Self-Consistency decision to stabilize responses. The modular framework design provides a reusable template for future multimodal systems regardless of the underlying model architecture. Furthermore, this work establishes and compares two routing approaches, a single-case routing approach and a multi-agent system, both of which dynamically allocate queries to optimal pipelines. Evaluation on the DesignQA benchmark shows that the system improves average accuracy across all tasks, with a relative gain of +41.1% over the best baseline RAG results. This is a significant improvement on multimodal and reasoning-intensive tasks, achieved without complete rulebook ingestion. These results show how vision-language retrieval, modular reasoning, and adaptive routing enable scalable document comprehension in engineering use cases.
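The Self-Consistency decision in item (iv) is commonly implemented as a majority vote over several sampled answers. Below is a minimal sketch of that idea. It assumes short free-text answers and uses a hypothetical `sample` callable to stand in for repeated LLM calls. The paper's exact voting and normalization rules may differ.

```python
from collections import Counter
from typing import Callable, List, Tuple


def normalize_answer(ans: str) -> str:
    """Crude normalization so trivially different strings vote together."""
    return " ".join(ans.strip().lower().rstrip(".").split())


def self_consistency(sample: Callable[[], str], n: int = 5) -> Tuple[str, float]:
    """Draw n answers and return (majority answer, agreement ratio).

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    answers: List[str] = [normalize_answer(sample()) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


# Hypothetical stand-in for five LLM calls at non-zero temperature.
canned = iter(["Yes.", "yes", "No", "Yes", "no."])
print(self_consistency(lambda: next(canned), n=5))  # ('yes', 0.6)
```

The agreement ratio is a cheap confidence signal. A low value could, for example, flag a query for a heavier reasoning mode. That use is an assumption of this sketch, not something the source states.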
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MCERF, a multimodal retrieval-augmented generation framework for engineering documentation. It pairs the ColPali retriever with four hand-designed reasoning modes (Hybrid Lookup, Vision-to-Text fusion, High-Reasoning LLM, Self-Consistency) and two dynamic routing schemes (single-case and multi-agent). It reports a 41.1% relative accuracy gain over baseline RAG on the DesignQA benchmark while avoiding full rulebook ingestion.
Significance. If the accuracy lift proves robust under fixed, non-oracle routing and is supported by ablations and statistical validation, the modular design could offer a practical template for handling multimodal technical documents (text, tables, figures) where pure text RAG falls short.
major comments (3)
- [Evaluation on the DesignQA benchmark] Evaluation section: The abstract and results claim a +41.1% relative gain from 'baseline RAG best results' but supply no explicit baseline configuration, error bars, number of runs, statistical tests, or ablation isolating each mode and router; without these the central empirical claim cannot be verified as robust.
- [Routing approaches] Routing approaches: The description of single-case and multi-agent routing does not state whether mode assignment (to Hybrid Lookup, Vision-to-Text, etc.) is performed from query features alone or involves post-hoc selection after inspecting ground truth or test-set performance; oracle routing would make the reported gain an upper bound rather than evidence of a deployable fixed system.
- [Introduction and related work] Comparison to prior work: While the manuscript builds on the DesignQA framework [1], it does not report a head-to-head accuracy and efficiency comparison against the original full-text ingestion baseline on the same tasks, leaving unclear how much of the gain is attributable to ColPali plus routing versus simply avoiding complete ingestion.
minor comments (3)
- [Abstract] Abstract: the phrasing 'without complete rulebook ingestion' should be quantified (e.g., fraction of pages or tokens actually retrieved) to make the efficiency claim concrete.
- Notation: ensure 'ColPali' is introduced with a brief parenthetical description on first use rather than assuming reader familiarity.
- Figures: captions for any routing diagrams or accuracy tables should explicitly list the exact metric (e.g., exact-match accuracy) and the number of queries per task.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the empirical claims and clarifications.
Point-by-point responses
Referee: [Evaluation on the DesignQA benchmark] Evaluation section: The abstract and results claim a +41.1% relative gain from 'baseline RAG best results' but supply no explicit baseline configuration, error bars, number of runs, statistical tests, or ablation isolating each mode and router; without these the central empirical claim cannot be verified as robust.
Authors: We agree that additional details are required to verify robustness. In the revised manuscript we will explicitly document the baseline RAG configuration (retriever, LLM, and prompting), report mean accuracy and standard deviation over five independent runs with error bars, include paired statistical significance tests, and provide ablations that isolate the contribution of each reasoning mode and routing scheme. These additions will directly support the reported +41.1% relative gain. revision: yes
Referee: [Routing approaches] Routing approaches: The description of single-case and multi-agent routing does not state whether mode assignment (to Hybrid Lookup, Vision-to-Text, etc.) is performed from query features alone or involves post-hoc selection after inspecting ground truth or test-set performance; oracle routing would make the reported gain an upper bound rather than evidence of a deployable fixed system.
Authors: Mode assignment in both routing schemes is performed exclusively from query features and content, without access to ground-truth answers or test-set performance. The single-case router employs a lightweight query classifier, while the multi-agent router uses agent deliberation on the query alone. We will add explicit statements and pseudocode in the revised manuscript to confirm the absence of oracle information and to demonstrate that the system is a fixed, deployable pipeline. revision: yes
Referee: [Introduction and related work] Comparison to prior work: While the manuscript builds on the DesignQA framework [1], it does not report a head-to-head accuracy and efficiency comparison against the original full-text ingestion baseline on the same tasks, leaving unclear how much of the gain is attributable to ColPali plus routing versus simply avoiding complete ingestion.
Authors: We will add a direct head-to-head comparison against the original DesignQA full-text ingestion baseline on the identical DesignQA tasks. The revised evaluation section will report both accuracy and efficiency metrics (retrieval latency, token consumption, and memory usage) to quantify the incremental benefit of the ColPali retriever and routing over full ingestion. revision: yes
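The first response promises two things: mean accuracy with standard deviation over five runs, and paired significance tests. It does not name the test. One standard choice for paired per-query correctness is the exact McNemar test. The sketch below uses only the Python standard library and hypothetical data to show both computations.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_std(run_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy across independent runs."""
    return statistics.mean(run_accuracies), statistics.stdev(run_accuracies)


def mcnemar_exact(base_correct: Sequence[bool], sys_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-query correctness.

    Only discordant queries matter: b = baseline right / system wrong,
    c = baseline wrong / system right. Under H0, b ~ Binomial(b + c, 0.5).
    """
    if len(base_correct) != len(sys_correct):
        raise ValueError("both systems must be scored on the same queries")
    b = sum(1 for x, y in zip(base_correct, sys_correct) if x and not y)
    c = sum(1 for x, y in zip(base_correct, sys_correct) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data (NOT from the paper).
runs = [0.62, 0.65, 0.60, 0.64, 0.63]
print("mean=%.3f std=%.3f" % mean_std(runs))
base = [True] * 50 + [False] * 50
system = [True] * 45 + [False] * 5 + [True] * 25 + [False] * 25
print("McNemar p =", mcnemar_exact(base, system))
```

Pairing by query matters here. Two systems scored on the same DesignQA questions share per-question difficulty, and an unpaired test on aggregate accuracies would ignore that.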
Circularity Check
No circularity: empirical benchmark gains rest on measured performance, not definitional reduction or self-citation chains
Full rationale
The paper describes a modular system (ColPali retrieval plus four hand-designed modes and two routing schemes) and reports its measured accuracy on the external DesignQA benchmark, claiming a +41.1% relative gain over baseline RAG. No equations, fitted parameters, or predictions appear; the central result is an empirical comparison rather than a quantity derived by construction from the authors' inputs. The citation to DesignQA [1] supplies the benchmark dataset and prior baseline, not a load-bearing uniqueness theorem or ansatz that the present method reduces to. Hand-designed modes and routing are presented as engineering choices whose effectiveness is evaluated externally on held-out queries, with no indication that the reported lift is obtained by post-hoc oracle selection or by renaming a fitted quantity. The derivation chain is therefore self-contained as a system description plus benchmark measurement.
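The authors' response describes the single-case router as a lightweight query classifier that sees only the query. A hypothetical rule-based version makes the non-oracle property concrete. The function's only input is the query, so it cannot consult answers or test-set scores. The rule-ID pattern, visual keywords, and routing order below are illustrative assumptions, not the paper's classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical rule-ID pattern such as "T.7.1" or "V.1.2.3" (assumption: the
# paper's actual features and thresholds are not given in this review).
RULE_ID = re.compile(r"\b[A-Z]{1,2}(?:\.\d+){1,4}\b")
VISUAL_CUES = ("figure", "table", "diagram", "image", "drawing", "dimension")


@dataclass(frozen=True)
class Query:
    text: str
    has_image: bool = False  # e.g. a CAD screenshot attached to the question


def route(q: Query) -> str:
    """Pick a pipeline from query features only (no answers, no test scores)."""
    lowered = q.text.lower()
    if RULE_ID.search(q.text):
        return "hybrid_lookup"   # explicit rule mention -> exact lookup first
    if q.has_image or any(cue in lowered for cue in VISUAL_CUES):
        return "vision_to_text"  # figure- or table-guided question
    return "high_reasoning"      # everything else: multi-step reasoning


for text in ("What does rule T.7.1 require for the main hoop?",
             "Based on the table, what is the minimum wall thickness?"):
    print(route(Query(text)), "<-", text)
```

The substring matching is deliberately crude: the cue "table" also fires on "stable". A trained classifier, or the multi-agent deliberation, could replace `route` without changing its query-only signature.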
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the ColPali multimodal retriever can jointly index and retrieve text, tables, and figures from engineering documents.
- Domain assumption: the DesignQA benchmark is representative of real engineering documentation tasks.
Reference graph
Works this paper leans on
- [1] Doris, A. C., Grandi, D., Tomich, R., Alam, M. F., Ataei, M., Cheong, H., and Ahmed, F., 2025, “DesignQA: A multimodal benchmark for evaluating large language models’ understanding of engineering documentation,” Journal of Computing and Information Science in Engineering, 25(2), p. 021009.
- [2] Rombach, R. and Esser, P., 2023, “Generative Models for Multimodal Document Understanding,” Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
- [3] Zhang, W., Li, X., and Wang, H., 2022, “Layout-Aware Pre-training for Visually Rich Document Understanding,” Advances in Neural Information Processing Systems (NeurIPS).
- [4] Faysse, M., Sibille, H., Wu, T., Omrani, B., Viaud, G., Hudelot, C., and Colombo, P., 2024, “ColPali: Efficient document retrieval with vision language models,” arXiv preprint arXiv:2407.01449.
- [5] Yin, W., Fu, J., and Liu, Z., 2023, “A Comprehensive Review of Vision-Language Models,” arXiv preprint arXiv:2301.05052.
- [6] Naghavi Khanghah, K., Chen, Z., Romeo, L., Yang, Q., Malhotra, R., Imani, F., and Xu, H., 2025, “Multimodal RAG-Driven Anomaly Detection and Classification in Laser Powder Bed Fusion Using Large Language Models,” International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 89220, American Society of...
- [7] Shen, Y., Chen, K., and Jiang, J., 2023, “Agent-based Systems for Complex Task Automation and Reasoning,” International Conference on Learning Representations (ICLR).
- [8] Gao, T., Yao, W., and Chen, D., 2024, “On the Limits of Retrieval-Augmented Generation for Fact-intensive Tasks,” Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
- [9] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Rocktäschel, T., Grefenstette, E., Kular, H. S., et al., 2020, “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” Advances in Neural Information Processing Systems (NeurIPS), Vol. 33, pp. 9459–9474.
- [10] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., and Zhou, D., 2022, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Advances in Neural Information Processing Systems (NeurIPS).
- [11]
- [12]
- [13] Google AI, 2024, “Gemini API: Models - Gemini 1.0 Pro Vision,” https://ai.google.dev/gemini-api/docs/models/gemini
- [14] Anthropic, 2024, “Claude 3 Model Card,” https://www.anthropic.com/claude-3-model-card
- [15] Liu, H., Li, C., Li, Y., and Lee, Y. J., 2023, “LLaVA-v1.5-13B,” Hugging Face, https://huggingface.co/liuhaotian/llava-v1.5-13b
- [16] Mahdi Abootorabi, M., Zobeiri, A., Dehghani, M., Mohammadkhani, M., Mohammadi, B., Ghahroodi, O., Soleymani Baghshah, M., and Asgari, E., 2025, “Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation,” arXiv e-prints, pp. arXiv–2502.
- [17] Xu, L., Mohaddes, D., and Wang, Y., 2024, “LLM agent for fire dynamics simulations,” arXiv preprint arXiv:2412.17146.
- [18] Shuster, K., Poff, S., Chen, M., Kiela, D., and Weston, J., 2021, “Retrieval augmentation reduces hallucination in conversation,” arXiv preprint arXiv:2104.07567.
- [19] Khanghah, K. N., Chen, Z., Romeo, L., Yang, Q., Malhotra, R., Imani, F., and Xu, H., 2026, “Zero-Shot Anomaly Detection in Laser Powder Bed Fusion Using Multimodal Retrieval-Augmented Generation and Large Language Models,” Journal of Mechanical Design, 148(7), p. 072001.
- [20] Naghavi Khanghah, K., Patel, A., Malhotra, R., and Xu, H., 2025, “Large language models for extrapolative modeling of manufacturing processes,” Journal of Intelligent Manufacturing, pp. 1–29.
- [21] Joshi, P., Gupta, A., Kumar, P., and Sisodia, M., 2024, “Robust multi model RAG pipeline for documents containing text, table & images,” 2024 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), IEEE, pp. 993–999.
- [22] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al., 2021, “Learning transferable visual models from natural language supervision,” International Conference on Machine Learning, PMLR, pp. 8748–8763.
- [23] Chen, H.-Y., Lai, Z., Zhang, H., Wang, X., Eichner, M., You, K., Cao, M., Zhang, B., Yang, Y., and Gan, Z., 2024, “Contrastive localized language-image pre-training,” arXiv preprint arXiv:2410.02746.
- [24] Lee, J., Kim, J., Shon, H., Kim, B., Kim, S. H., Lee, H., and Kim, J., 2022, “UniCLIP: Unified framework for contrastive language-image pre-training,” Advances in Neural Information Processing Systems, 35, pp. 1008–1019.
- [25] Li, J., Li, D., Xiong, C., and Hoi, S., 2022, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” International Conference on Machine Learning, PMLR, pp. 12888–12900.
- [26] Li, J., Li, D., Savarese, S., and Hoi, S., 2023, “BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models,” International Conference on Machine Learning, PMLR, pp. 19730–19742.
- [27] Zhou, T., Mei, S., Li, X., Liu, Z., Xiong, C., Liu, Z., Gu, Y., and Yu, G., 2024, “MARVEL: unlocking the multi-modal capability of dense retrieval via visual module plugin,” Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 14608–14624.
- [28] Wei, C., Chen, Y., Chen, H., Hu, H., Zhang, G., Fu, J., Ritter, A., and Chen, W., 2024, “UniIR: Training and benchmarking universal multimodal information retrievers,” European Conference on Computer Vision, Springer, pp. 387–404.
- [29] Robertson, S. and Zaragoza, H., 2009, The probabilistic relevance framework: BM25 and beyond, Vol. 4, Now Publishers Inc.
- [30] Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z., 2024, “M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation,” Findings of the Association for Computational Linguistics ACL 2024, pp. 2318–2335.
- [31] Khattab, O. and Zaharia, M., 2020, “ColBERT: Efficient and effective passage search via contextualized late interaction over BERT,” Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39–48.
- [32] Shohan, F. T., Nayeem, M. T., Islam, S., Akash, A. U., and Joty, S., 2024, “XL-HeadTags: Leveraging multimodal retrieval augmentation for the multilingual generation of news headlines and tags,” arXiv preprint arXiv:2406.03776.
- [33] Yan, Y. and Xie, W., 2024, “EchoSight: Advancing visual-language models with Wiki knowledge,” arXiv preprint arXiv:2407.12735.
- [34] Wang, P., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Chen, K., Liu, X., Wang, J., Ge, W., et al., 2024, “Qwen2-VL: Enhancing vision-language model’s perception of the world at any resolution,” arXiv preprint arXiv:2409.12191.
- [35] Cho, J., Mahata, D., Irsoy, O., He, Y., and Bansal, M., 2024, “M3DocRAG: Multi-modal retrieval is what you need for multi-page multi-document understanding,” arXiv preprint arXiv:2411.04952.
- [36] Kossiakoff, A., Sweet, W. N., Seymour, S. J., and Biemer, S. M., 2011, Systems engineering principles and practice, Vol. 83, John Wiley & Sons.
- [37] Zhang, S., Li, X., Yuan, C., Feng, W., and Jiang, Q., 2026, “DesAgent: A Multi-Agent Mechanical Design Method Based on Collaborative Large and Small Models,” Journal of Mechanical Design, 148(5), p. 051706.
- [38] Massoudi, S. and Fuge, M., 2026, “Agentic Large Language Models for Conceptual Systems Engineering and Design,” Journal of Mechanical Design, 148(5), p. 051405.
- [39] Lin, W., Chen, J., Mei, J., Coca, A., and Byrne, B., 2023, “Fine-grained late-interaction multi-modal retrieval for retrieval augmented visual question answering,” Advances in Neural Information Processing Systems, 36, pp. 22820–22840.
- [40] Tschannen, M., Gritsenko, A., Wang, X., Naeem, M. F., Alabdulmohsin, I., Parthasarathy, N., Evans, T., Beyer, L., Xia, Y., Mustafa, B., et al., 2025, “SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features,” arXiv preprint arXiv:2502.14786.
- [41] Team, G., Mesnard, T., Hardin, C., Dadashi, R., Bhupatiraju, S., Pathak, S., Sifre, L., Rivière, M., Kale, M. S., Love, J., et al., 2024, “Gemma: Open models based on Gemini research and technology,” arXiv preprint arXiv:2403.08295.
- [42]
- [43] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al., 2022, “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, 35, pp. 24824–24837.
- [44] Vatsal, S. and Dubey, H., 2024, “A survey of prompt engineering methods in large language models for different NLP tasks,” arXiv preprint arXiv:2407.12994.
- [45] Chihaia, T. and Ciobanu, R.-I., 2025, “Keyword vs Semantic Search for Retrieval-Augmented Generation: A Survey,” 2025 25th International Conference on Control Systems and Computer Science (CSCS), IEEE, pp. 169–174.
- [46] Ouyang, S., Zhang, J. M., Harman, M., and Wang, M., 2025, “An empirical study of the non-determinism of ChatGPT in code generation,” ACM Transactions on Software Engineering and Methodology, 34(2), pp. 1–28.
- [47] Dey, P., Merugu, S., and Kaveri, S., 2025, “Uncertainty-aware fusion: An ensemble framework for mitigating hallucinations in large language models,” Companion Proceedings of the ACM on Web Conference 2025, pp. 947–951.
- [48] Yang, H., Li, M., Zhou, H., Xiao, Y., Fang, Q., and Zhang, R., 2023, “One LLM is not enough: Harnessing the power of ensemble learning for medical question answering,” medRxiv.
- [49] Cai, Z., Wang, Y., Sun, Q., Wang, R., Gu, C., Yin, W., Lin, Z., Yang, Z., Wei, C., Shi, X., et al., 2025, “Has GPT-5 Achieved Spatial Intelligence? An Empirical Study,” arXiv preprint arXiv:2508.13142.
- [50] Wang, J., Jiang, H., Liu, Y., Ma, C., Zhang, X., Pan, Y., Liu, M., Gu, P., Xia, S., Li, W., et al., 2024, “A comprehensive review of multimodal large language models: Performance and challenges across different tasks,” arXiv preprint arXiv:2408.01319.
- [51] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., et al., 2023, “Segment anything,” Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026.
- [52] Yao, J., Chen, P., Li, Z., Cai, Y., Wu, Y., You, W., and Sun, L., 2025, “StepIdeator: Utilizing Mixed Representations to Support Step-By-Step Design With Generative Artificial Intelligence,” Journal of Mechanical Design, 147(7), p. 071703.
- [53] Jourdan, L., Hernandez, N., Boudin, F., and Dufour, R., 2025, “Identifying Reliable Evaluation Metrics for Scientific Text Revision,” Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 6731–6756.
- [54] Naghavi Khanghah, K., Wang, Z., and Xu, H., 2025, “Reconstruction and generation of porous metamaterial units via variational graph autoencoder and large language model,” Journal of Computing and Information Science in Engineering, 25(2), p. 021003.
- [55] Roumeliotis, K. I., Tselikas, N. D., and Nasiopoulos, D. K., 2024, “LLMs in e-commerce: A comparative analysis of GPT and LLaMA models in product review evaluation,” Natural Language Processing Journal, 6, p. 100056.
- [56] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al., 2020, “Language models are few-shot learners,” Advances in Neural Information Processing Systems, 33, pp. 1877–1901.