AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation
Pith reviewed 2026-05-15 10:55 UTC · model grok-4.3
The pith
Adaptive query expansion generates model and data cards that outperform existing automated methods, exceed human-authored data cards, and approach human-level quality for model cards.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AdaQE-CG combines two modules: Intra-Paper Extraction via Context-Aware Query Expansion, which iteratively refines queries to recover richer information directly from papers, and Inter-Card Completion using the MetaGAI Pool, which transfers semantically relevant content from similar cards in a curated collection. Together they produce documentation that outperforms prior automated approaches, exceeds human-authored data cards, and approaches human-level quality for model cards.
What carries the argument
The Intra-Paper Extraction via Context-Aware Query Expansion (IPE-QE) module, which adapts queries based on paper context, paired with the Inter-Card Completion using the MetaGAI Pool (ICC-MP) module, which fills gaps by transferring content from similar cards.
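A minimal sketch of the IPE-QE idea described above: fold each answer back into the query and re-extract until nothing richer is recovered. The `ask_llm` stub, the field name, and the exact-match convergence rule are illustrative assumptions, not the paper's implementation.

```python
def ask_llm(query, paper_text):
    """Stand-in for an LLM extraction call: return the sentence of the
    paper that shares the most words with the query, or None."""
    best, best_overlap = None, 0
    q_words = set(query.lower().split())
    for sent in paper_text.split("."):
        overlap = len(q_words & set(sent.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sent.strip(), overlap
    return best

def ipe_qe(field, paper_text, max_rounds=3):
    """Context-aware query expansion: refine the query with each answer."""
    query = f"What is the {field} of this model?"
    answer = None
    for _ in range(max_rounds):
        new_answer = ask_llm(query, paper_text)
        if new_answer == answer:  # converged: no richer answer recovered
            break
        answer = new_answer
        # expansion step: fold the current answer back into the query
        query = f"{query} Context: {answer}"
    return answer

paper = ("We train a 7B transformer decoder. The training data is 2T tokens "
         "of web text. Evaluation uses MMLU")
print(ipe_qe("training data", paper))  # → The training data is 2T tokens of web text
```

A real system would replace `ask_llm` with an actual model call and use a semantic rather than exact-match stopping test.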
If this is right
- Dynamic query refinement handles varied paper structures better than fixed templates, reducing missing information in generated cards.
- Cross-card knowledge transfer from a pool of similar cards fills gaps in web-scale repositories with incomplete metadata.
- The MetaGAI-Bench benchmark enables reproducible evaluation of documentation quality across multiple dimensions.
- Higher-quality automated cards support more consistent and transparent documentation for generative AI systems.
- The approach scales to web-scale generation by combining intra-paper adaptation with inter-card completion.
Where Pith is reading between the lines
- The same iterative expansion and transfer pattern could apply to automated generation of other technical summaries in scientific literature.
- Integrating user corrections into the query refinement loop might further reduce omissions in future versions.
- Repositories with growing numbers of cards could see compounding quality gains as the pool for transfer improves over time.
- The method points toward hybrid systems where extraction from source documents and reuse from existing records work together for documentation tasks.
Load-bearing premise
Large language models can expand queries iteratively to extract accurate and complete information from papers without introducing hallucinations or omissions that reduce card quality.
What would settle it
A direct comparison on papers with independently verified ground-truth card content, testing whether the generated cards contain more factual errors or missing details than human-authored versions or prior methods.
Original abstract
Transparent and standardized documentation is essential for building trustworthy generative AI (GAI) systems. However, existing automated methods for generating model and data cards still face three major challenges: (i) static templates, as most systems rely on fixed query templates that cannot adapt to diverse paper structures or evolving documentation requirements; (ii) information scarcity, since web-scale repositories such as Hugging Face often contain incomplete or inconsistent metadata, leading to missing or noisy information; and (iii) lack of benchmarks, as the absence of standardized datasets and evaluation protocols hinders fair and reproducible assessment of documentation quality. To address these limitations, we propose AdaQE-CG, an Adaptive Query Expansion for Card Generation framework that combines dynamic information extraction with cross-card knowledge transfer. Its Intra-Paper Extraction via Context-Aware Query Expansion (IPE-QE) module iteratively refines extraction queries to recover richer and more complete information from scientific papers and repositories, while its Inter-Card Completion using the MetaGAI Pool (ICC-MP) module fills missing fields by transferring semantically relevant content from similar cards in a curated dataset. In addition, we introduce MetaGAI-Bench, the first large-scale, expert-annotated benchmark for evaluating GAI documentation. Comprehensive experiments across five quality dimensions show that AdaQE-CG substantially outperforms existing approaches, exceeds human-authored data cards, and approaches human-level quality for model cards. Code, prompts, and data are publicly available at: https://github.com/haoxuan-unt2024/AdaQE-CG.
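The ICC-MP completion step described in the abstract can be sketched as nearest-card transfer: fill a draft card's empty fields from the most similar card in a pool. The bag-of-words cosine similarity and the toy pool below are illustrative assumptions; the paper matches against its curated MetaGAI Pool with semantic representations.

```python
from collections import Counter
import math

def cosine(a, b):
    """Bag-of-words cosine similarity between two text descriptions."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def complete_card(card, pool):
    """Fill each empty field of `card` from the most similar pooled card."""
    desc = card.get("description", "")
    donor = max(pool, key=lambda c: cosine(desc, c.get("description", "")))
    return {k: v or donor.get(k, "") for k, v in card.items()}

pool = [
    {"description": "vision transformer for image classification",
     "license": "Apache-2.0", "intended_use": "image classification"},
    {"description": "decoder-only language model for chat",
     "license": "MIT", "intended_use": "conversational assistance"},
]
draft = {"description": "large language model for dialogue",
         "license": "", "intended_use": ""}
print(complete_card(draft, pool))
```

The draft's empty `license` and `intended_use` fields are filled from the chat-model card, the closer of the two by description similarity.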
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AdaQE-CG, an adaptive query expansion framework for generating model and data cards. It consists of IPE-QE for iterative context-aware extraction from papers and repositories, and ICC-MP for filling missing fields via transfer from a curated MetaGAI Pool. The work introduces MetaGAI-Bench as the first large-scale expert-annotated benchmark and reports experiments across five quality dimensions claiming substantial outperformance over existing methods, exceeding human-authored data cards, and approaching human-level model card quality. Code, prompts, and data are released publicly.
Significance. If the results hold, the work addresses a timely and important problem in trustworthy AI by improving automated documentation at web scale. The introduction of MetaGAI-Bench provides a much-needed standardized evaluation resource, and the public release of code and data is a clear strength that supports reproducibility. The adaptive, non-static approach to query expansion and cross-card transfer offers a practical advance over template-based methods.
Major comments (2)
- [Abstract and Experiments] Abstract and Experiments section: The claim of outperformance across five quality dimensions is central to the contribution, yet the manuscript provides no specification of the exact metrics, baselines, statistical tests, sample sizes, or error analysis used to support the headline results (outperforming baselines, exceeding human data cards, approaching human model-card quality). Without these details the experimental claims cannot be verified or reproduced.
- [Method (IPE-QE)] IPE-QE module (method description): The iterative context-aware query expansion is presented as recovering richer information, but the paper reports no quantitative audit of extraction fidelity such as precision/recall or hallucination rates against human gold labels on key fields (architecture, training data, metrics). This omission is load-bearing because downstream ICC-MP transfer cannot correct upstream factual errors, directly undermining the quality-score superiority claims on MetaGAI-Bench.
Minor comments (2)
- [Introduction and Method] Notation: The acronyms IPE-QE and ICC-MP are introduced without an explicit table or consistent first-use expansion in all sections, which reduces readability.
- [Related Work] References: Several recent works on data-card generation and LLM-based information extraction are cited only in passing; a more systematic comparison table would clarify novelty.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive assessment of the work's significance. We address each major comment below and will revise the manuscript to provide the requested details and analyses.
Point-by-point responses
Referee: [Abstract and Experiments] Abstract and Experiments section: The claim of outperformance across five quality dimensions is central to the contribution, yet the manuscript provides no specification of the exact metrics, baselines, statistical tests, sample sizes, or error analysis used to support the headline results (outperforming baselines, exceeding human data cards, approaching human model-card quality). Without these details the experimental claims cannot be verified or reproduced.
Authors: We agree that the experimental claims require explicit specification for verifiability. The manuscript currently describes the five quality dimensions at a high level without detailing the exact metrics, baseline implementations, statistical tests, sample sizes, or error analysis. In the revised manuscript, we will expand the Experiments section to include precise metric definitions, full baseline details, statistical significance results, exact evaluation sample sizes, and an error analysis to support all headline claims. revision: yes
Referee: [Method (IPE-QE)] IPE-QE module (method description): The iterative context-aware query expansion is presented as recovering richer information, but the paper reports no quantitative audit of extraction fidelity such as precision/recall or hallucination rates against human gold labels on key fields (architecture, training data, metrics). This omission is load-bearing because downstream ICC-MP transfer cannot correct upstream factual errors, directly undermining the quality-score superiority claims on MetaGAI-Bench.
Authors: We acknowledge this is a substantive point. While end-to-end results on MetaGAI-Bench provide indirect evidence, the manuscript lacks a direct quantitative audit of IPE-QE extraction fidelity. We will revise the Method section (and add an appendix if needed) to report precision, recall, and hallucination rates for key fields such as architecture, training data, and metrics, evaluated against human gold labels on a sampled subset of MetaGAI-Bench. This will demonstrate upstream reliability and strengthen the overall claims. revision: yes
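The field-level fidelity audit the authors commit to can be sketched as set comparison of extracted facts against gold labels. The fact sets and the set-overlap scoring below are illustrative assumptions, not the paper's evaluation protocol.

```python
def audit(extracted, gold):
    """Per-field precision, recall, and hallucination rate.

    extracted, gold: dicts mapping field name -> set of atomic facts.
    """
    report = {}
    for field in gold:
        e, g = extracted.get(field, set()), gold[field]
        tp = len(e & g)
        precision = tp / len(e) if e else 0.0
        recall = tp / len(g) if g else 0.0
        # share of extracted facts unsupported by the gold label
        hallucination = 1.0 - precision if e else 0.0
        report[field] = {"precision": precision, "recall": recall,
                         "hallucination_rate": hallucination}
    return report

gold = {"architecture": {"transformer", "7B"},
        "training_data": {"2T tokens", "web text"}}
extracted = {"architecture": {"transformer", "7B", "mixture-of-experts"},
             "training_data": {"2T tokens"}}
for field, scores in audit(extracted, gold).items():
    print(field, scores)
```

Here the spurious "mixture-of-experts" fact lowers architecture precision to 2/3, and the missing "web text" fact lowers training-data recall to 0.5, exactly the failure modes the referee asks the authors to quantify.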
Circularity Check
No circularity: framework and benchmark are independently specified and evaluated
Full rationale
The paper presents AdaQE-CG as a two-module system (IPE-QE for iterative context-aware query expansion from papers and ICC-MP for transferring content from a curated MetaGAI pool) plus a new expert-annotated benchmark MetaGAI-Bench. Performance claims rest on comparative experiments across quality dimensions rather than any derivation that reduces outputs to fitted parameters or self-referential definitions. No equations appear, no predictions are obtained by fitting inputs within the same work, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. The approach is therefore self-contained against external baselines and the introduced benchmark.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: LLMs can iteratively refine extraction queries to recover richer information without significant hallucination or omission.
- Domain assumption: Semantically similar cards in the MetaGAI Pool contain transferable content that correctly fills missing fields.