{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:KOVPAO2N5ERMBHZP4YL5I3L4UI","short_pith_number":"pith:KOVPAO2N","schema_version":"1.0","canonical_sha256":"53aaf03b4de922c09f2fe617d46d7ca21a3cfd1b2b80f3c833d8fbf0a1e3d70d","source":{"kind":"arxiv","id":"2605.14301","version":1},"attestation_state":"computed","paper":{"title":"Language-Induced Priors for Domain Adaptation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Language-induced priors from textual descriptions let domain adaptation match oracle performance when target data is scarce.","cross_cats":["stat.ML"],"primary_cat":"cs.LG","authors_text":"Jiayu Zhou, Qiyuan Chen, Raed Al Kontar","submitted_at":"2026-05-14T03:07:48Z","abstract_excerpt":"Domain adaptation faces a fundamental paradox in the cold-start regime. When target data is scarce, statistical methods fail to distinguish relevant source domains from irrelevant ones, which often leads to negative transfer. In this paper, we address this challenge by leveraging expert textual descriptions of the target domain, a resource that is often available but overlooked. We propose a probabilistic framework that translates these semantic descriptions into a choice model, namely a Language-Induced Prior (LIP), that learns the preferences from a pretrained Large Language Model (LLM). The"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.14301","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T03:07:48Z","cross_cats_sorted":["stat.ML"],"title_canon_sha256":"b087d1cb81493fc05613a8cd0ecb5922ad5556792eeb179f17772047cb07f282","abstract_canon_sha256":"1d0a48852d500965c63bf9a453e0da1ce8fa32d991ae8bb9c9bc1b91f990597e"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:10.094277Z","signature_b64":"OAodBPQbod3Jea9965kcBZYTyUDXFaOxSXn2jNl/uPeUO0Bla2SHa6VW3yBmTPVbTvdG0JMrdx3VimBcylqpAQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"53aaf03b4de922c09f2fe617d46d7ca21a3cfd1b2b80f3c833d8fbf0a1e3d70d","last_reissued_at":"2026-05-17T23:39:10.093765Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:10.093765Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Language-Induced Priors for Domain Adaptation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Language-induced priors from textual descriptions let domain adaptation match oracle performance when target data is scarce.","cross_cats":["stat.ML"],"primary_cat":"cs.LG","authors_text":"Jiayu Zhou, Qiyuan Chen, Raed Al Kontar","submitted_at":"2026-05-14T03:07:48Z","abstract_excerpt":"Domain adaptation faces a fundamental paradox in the cold-start regime. When target data is scarce, statistical methods fail to distinguish relevant source domains from irrelevant ones, which often leads to negative transfer. In this paper, we address this challenge by leveraging expert textual descriptions of the target domain, a resource that is often available but overlooked. We propose a probabilistic framework that translates these semantic descriptions into a choice model, namely a Language-Induced Prior (LIP), that learns the preferences from a pretrained Large Language Model (LLM). The"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We prove that the estimator roughly matches an oracle cold-start MSE under a correct prior, while remaining asymptotically consistent regardless of the quality of the LIP.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The LLM-derived choice model accurately captures source relevance from textual descriptions; if this mapping is systematically biased, the early-stage guidance in the EM algorithm can degrade performance before data overrides it.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Language-Induced Priors from LLMs guide source selection in cold-start domain adaptation through an EM algorithm, matching oracle MSE under a correct prior and remaining asymptotically consistent.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Language-induced priors from textual descriptions let domain adaptation match oracle performance when target data is scarce.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"043c1e4028103032072e58b02a236f7a4be497cd9cd3fc7844dbca1833df9867"},"source":{"id":"2605.14301","kind":"arxiv","version":1},"verdict":{"id":"02565690-c93b-4b20-8697-ca46aaab8484","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:04:11.039074Z","strongest_claim":"We prove that the estimator roughly matches an oracle cold-start MSE under a correct prior, while remaining asymptotically consistent regardless of the quality of the LIP.","one_line_summary":"Language-Induced Priors from LLMs guide source selection in cold-start domain adaptation through an EM algorithm, matching oracle MSE under a correct prior and remaining asymptotically consistent.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The LLM-derived choice model accurately captures source relevance from textual descriptions; if this mapping is systematically biased, the early-stage guidance in the EM algorithm can degrade performance before data overrides it.","pith_extraction_headline":"Language-induced priors from textual descriptions let domain adaptation match oracle performance when target data is scarce."},"references":{"count":38,"sample":[{"doi":"","year":2022,"title":"IEEE/CAA Journal of Automatica Sinica , volume=","work_id":"bf8ab5a6-eb3d-4990-918d-0b8fd66460e3","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1998,"title":"Asymptotic Statistics , author=. 1998 , publisher=","work_id":"04d112d2-91c2-435d-b9e8-d7b5d526fe98","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2013,"title":"2013 International Conference on Machine Learning and Cybernetics , volume=","work_id":"845e3471-4275-4566-92bc-cd0fdfc1fc1b","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in neural information processing systems , volume=","work_id":"b144ac0e-8b4f-43dd-8d60-7cc4685e38d2","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in neural information processing systems , volume=","work_id":"83eb271d-29e5-42a9-a1aa-11ef7f5b0f62","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":38,"snapshot_sha256":"9e3ba7038ec2872881b4f4e340146b32d4fc80b9162ac40c17ea12ee57d10f05","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"a3ab27dccdb3f43f91240278f5cde60e14faa7d581c6dd461735b538457bb12f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14301","created_at":"2026-05-17T23:39:10.093851+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14301v1","created_at":"2026-05-17T23:39:10.093851+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14301","created_at":"2026-05-17T23:39:10.093851+00:00"},{"alias_kind":"pith_short_12","alias_value":"KOVPAO2N5ERM","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"KOVPAO2N5ERMBHZP","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"KOVPAO2N","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI","json":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI.json","graph_json":"https://pith.science/api/pith-number/KOVPAO2N5ERMBHZP4YL5I3L4UI/graph.json","events_json":"https://pith.science/api/pith-number/KOVPAO2N5ERMBHZP4YL5I3L4UI/events.json","paper":"https://pith.science/paper/KOVPAO2N"},"agent_actions":{"view_html":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI","download_json":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI.json","view_paper":"https://pith.science/paper/KOVPAO2N","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14301&json=true","fetch_graph":"https://pith.science/api/pith-number/KOVPAO2N5ERMBHZP4YL5I3L4UI/graph.json","fetch_events":"https://pith.science/api/pith-number/KOVPAO2N5ERMBHZP4YL5I3L4UI/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI/action/timestamp_anchor","attest_storage":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI/action/storage_attestation","attest_author":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI/action/author_attestation","sign_citation":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI/action/citation_signature","submit_replication":"https://pith.science/pith/KOVPAO2N5ERMBHZP4YL5I3L4UI/action/replication_record"}},"created_at":"2026-05-17T23:39:10.093851+00:00","updated_at":"2026-05-17T23:39:10.093851+00:00"}