{"paper":{"title":"Generative Augmented Inference","license":"http://creativecommons.org/licenses/by/4.0/","headline":"GAI uses an orthogonal moment construction to incorporate LLM-generated outputs for consistent estimation and valid inference on human-labeled outcomes with a nonparametric relationship.","cross_cats":["cs.AI","stat.ME","stat.ML"],"primary_cat":"cs.LG","authors_text":"Cheng Lu, Dennis J. Zhang, Heng Zhang, Mengxin Wang","submitted_at":"2026-04-16T03:10:37Z","abstract_excerpt":"Large language models enable inexpensive AI-generated annotations, but using them reliably for causal inference remains challenging. Naively pooling AI and human data induces bias, while existing methods such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) treat AI outputs as proxies of true labels -- an assumption often violated for generative model outputs in practice. We propose Generative Augmented Inference (GAI), a framework that treats AI outputs as general, potentially high-dimensional informative features for learning human labels rather than as surrogates. GAI flexi"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"GAI uses an orthogonal moment construction that enables consistent estimation and valid inference with flexible, nonparametric relationship between LLM-generated outputs and human labels. We establish asymptotic normality and show a 'safe default' property: relative to human-data-only estimators, GAI weakly improves estimation efficiency under arbitrary auxiliary signals and yields strict gains whenever the auxiliary information is predictive.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The auxiliary AI signals are generated independently of the human labeling process in a way that permits the orthogonal moment conditions to identify the target parameters without additional parametric restrictions on the relationship between AI outputs and human labels.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"GAI uses orthogonal moment conditions to integrate arbitrary AI-generated auxiliary data into human-label models, delivering consistent estimates, asymptotic normality, and a safe-default efficiency improvement over human-data-only methods.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"GAI uses an orthogonal moment construction to incorporate LLM-generated outputs for consistent estimation and valid inference on human-labeled outcomes with a nonparametric relationship.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e19521fdd97f35606ce6b3c2cf5c2352f23f640480a6608e131b9ab7a56dc9ab"},"source":{"id":"2604.14575","kind":"arxiv","version":2},"verdict":{"id":"4ec378b2-5688-4956-9d3e-05f70ed75dde","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T11:59:03.757757Z","strongest_claim":"GAI uses an orthogonal moment construction that enables consistent estimation and valid inference with flexible, nonparametric relationship between LLM-generated outputs and human labels. We establish asymptotic normality and show a 'safe default' property: relative to human-data-only estimators, GAI weakly improves estimation efficiency under arbitrary auxiliary signals and yields strict gains whenever the auxiliary information is predictive.","one_line_summary":"GAI uses orthogonal moment conditions to integrate arbitrary AI-generated auxiliary data into human-label models, delivering consistent estimates, asymptotic normality, and a safe-default efficiency improvement over human-data-only methods.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The auxiliary AI signals are generated independently of the human labeling process in a way that permits the orthogonal moment conditions to identify the target parameters without additional parametric restrictions on the relationship between AI outputs and human labels.","pith_extraction_headline":"GAI uses an orthogonal moment construction to incorporate LLM-generated outputs for consistent estimation and valid inference on human-labeled outcomes with a nonparametric relationship."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.14575/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}