{"paper":{"title":"Context Matters: Auditing Gender Bias in T2I Generation through Risk-Tiered Use-Case Profiles","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Text-to-image models require gender bias audits that align with the risk level of their specific use cases.","cross_cats":["cs.AI"],"primary_cat":"cs.CY","authors_text":"Jose Luna, Noa Garcia, Xiaofei Xie, Yankun Wu","submitted_at":"2026-05-13T07:25:04Z","abstract_excerpt":"Text-to-image (T2I) generative models are increasingly used to produce content for education, media, and public-facing communication, and are starting to be integrated into higher-impact pipelines. Since generated images tend to reinforce stereotypes, producing representational erasure via \"default\" depictions and shaping perceptions of who belongs in certain roles, a growing body of work has proposed metrics to quantify gender bias in T2I outputs. Yet existing evaluations remain fragmented. Metrics are often reported without a shared view of what they measure, what assumptions they entail, or"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We propose a risk-aligned auditing framework for gender bias in T2I models composed of three constituents that connects risk categories, evaluation metrics, and harms.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That existing gender-bias metrics can be cleanly consolidated into the three proposed measurement categories and mapped to context-dependent harms without significant loss of validity or coverage across deployment scenarios.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A new framework called THUMB cards organizes gender bias metrics for T2I models by risk-tiered use cases, measurement categories, and harm typologies aligned with the EU AI 
Act.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Text-to-image models require gender bias audits that align with the risk level of their specific use cases.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"67d1a5000aab131a11a0fdcd3bd0e89fe435ffea7918323fd5bf1c889b471916"},"source":{"id":"2605.13113","kind":"arxiv","version":1},"verdict":{"id":"dca94cf7-9093-44fe-be80-7834d2f42bca","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T18:31:59.472267Z","strongest_claim":"We propose a risk-aligned auditing framework for gender bias in T2I models composed of three constituents that connects risk categories, evaluation metrics, and harms.","one_line_summary":"A new framework called THUMB cards organizes gender bias metrics for T2I models by risk-tiered use cases, measurement categories, and harm typologies aligned with the EU AI Act.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That existing gender-bias metrics can be cleanly consolidated into the three proposed measurement categories and mapped to context-dependent harms without significant loss of validity or coverage across deployment scenarios.","pith_extraction_headline":"Text-to-image models require gender bias audits that align with the risk level of their specific use cases."},"references":{"count":114,"sample":[{"doi":"","year":2024,"title":"https://standards.ieee.org/ieee/7003/11357/","work_id":"21819bf8-e2f9-4326-ab4b-20e304bd7f40","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"InstructBlip-2","work_id":"7e7007c6-1924-40f5-b6f0-bf7c2f2b6ac1","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"2025.General Purpose AI (GPAI): High-Level Summary of the AI 
Act","work_id":"2296c527-5b81-4b42-9d33-b24774a7c18e","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"Anusha Asim. 2026. Through the AI looking glass: measuring gendered objectification in user-generated AI images. AI and Ethics 6, 1 (2026), 19","work_id":"4dd7ffae-1350-42f8-8371-53d1b2c8751c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond","work_id":"cbc2bb21-b6bb-46c0-80bf-107e195ffe10","ref_index":5,"cited_arxiv_id":"2308.12966","is_internal_anchor":true}],"resolved_work":114,"snapshot_sha256":"ffbe7738386acae03bc4594f5c1845e165c7be58158810374eaeaab5c17a6699","internal_anchors":4},"formal_canon":{"evidence_count":2,"snapshot_sha256":"28546ed1cf3fd9d163623fca4d5ea095c69a34ce8b68e11655d0c2126d6acdab"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}