{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2019:3EU3MUW3Y6TNM3IDTP2I4EKV46","short_pith_number":"pith:3EU3MUW3","schema_version":"1.0","canonical_sha256":"d929b652dbc7a6d66d039bf48e1155e7bd4d0d8336f66fa3c9da0729f98ec8d3","source":{"kind":"arxiv","id":"1910.04867","version":2},"attestation_state":"computed","paper":{"title":"A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"The Visual Task Adaptation Benchmark defines good representations as those that adapt to diverse unseen tasks with few examples.","cross_cats":["cs.LG","stat.ML"],"primary_cat":"cs.CV","authors_text":"Alexander Kolesnikov, Alexey Dosovitskiy, Andre Susano Pinto, Carlos Riquelme, Joan Puigcerver, Josip Djolonga, Lucas Beyer, Marcin Michalski, Mario Lucic, Maxim Neumann, Michael Tschannen, Neil Houlsby, Olivier Bachem, Olivier Bousquet, Pierre Ruyssen, Sylvain Gelly, Xiaohua Zhai","submitted_at":"2019-10-01T17:06:29Z","abstract_excerpt":"Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"1910.04867","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2019-10-01T17:06:29Z","cross_cats_sorted":["cs.LG","stat.ML"],"title_canon_sha256":"01dfec4774dbcf3571afc230ad20ba5f5e08b81e5bbc3a0773410759e70002cf","abstract_canon_sha256":"2ee84f2872483f7fc0373f486fda0ec03e71cdcf6d2e260eb245a11256983169"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:12.891649Z","signature_b64":"CPCVib8cQ2QFlrWG7Q1gK2Kw8R35FAjmTOHbaMlxhHlpGRAvtgtixKANTHdfBrfOyU3f4g7sV3RqKLZLDBhBDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"d929b652dbc7a6d66d039bf48e1155e7bd4d0d8336f66fa3c9da0729f98ec8d3","last_reissued_at":"2026-05-17T23:38:12.891040Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:12.891040Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"The Visual Task Adaptation Benchmark defines good representations as those that adapt to diverse unseen tasks with few examples.","cross_cats":["cs.LG","stat.ML"],"primary_cat":"cs.CV","authors_text":"Alexander Kolesnikov, Alexey Dosovitskiy, Andre Susano Pinto, Carlos Riquelme, Joan Puigcerver, Josip Djolonga, Lucas Beyer, Marcin Michalski, Mario Lucic, Maxim Neumann, Michael Tschannen, Neil Houlsby, Olivier Bachem, Olivier Bousquet, Pierre Ruyssen, Sylvain Gelly, Xiaohua Zhai","submitted_at":"2019-10-01T17:06:29Z","abstract_excerpt":"Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That performance on the 19 selected tasks under few-shot linear or fine-tuning adaptation is a reliable proxy for representation quality on arbitrary future vision problems.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"VTAB is a 19-task benchmark that measures representation quality by few-shot adaptation performance across diverse vision domains, with a controlled large-scale comparison of popular pretraining methods.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"The Visual Task Adaptation Benchmark defines good representations as those that adapt to diverse unseen tasks with few examples.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"230bc010fe30a13dc4621d61162e32d6c2bdf136237e09836eeb837e0ce31cac"},"source":{"id":"1910.04867","kind":"arxiv","version":2},"verdict":{"id":"81db5567-0e80-4630-944a-029956d166f7","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T22:07:35.805305Z","strongest_claim":"We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples.","one_line_summary":"VTAB is a 19-task benchmark that measures representation quality by few-shot adaptation performance across diverse vision domains, with a controlled large-scale comparison of popular pretraining methods.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That performance on the 19 selected tasks under few-shot linear or fine-tuning adaptation is a reliable proxy for representation quality on arbitrary future vision problems.","pith_extraction_headline":"The Visual Task Adaptation Benchmark defines good representations as those that adapt to diverse unseen tasks with few examples."},"references":{"count":25,"sample":[{"doi":"","year":null,"title":"DeepMind Lab","work_id":"8a8d827f-5377-4733-bfe8-bc66c011d458","ref_index":1,"cited_arxiv_id":"1612.03801","is_internal_anchor":true},{"doi":"","year":1907,"title":"Large scale adversarial representation learning","work_id":"0d7176d3-a311-4e05-9a09-5db679be48fb","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1905,"title":"Scaling and Benchmarking Self-Supervised Visual Representation Learning","work_id":"7991ef7e-1245-4a3f-acf4-f7509b670d8b","ref_index":3,"cited_arxiv_id":"1905.01235","is_internal_anchor":true},{"doi":"","year":null,"title":"Rethinking ImageNet pre-training.arXiv preprint arXiv:1811.08883","work_id":"9b366d91-12ae-42d4-9009-cdccc3efe6d2","ref_index":4,"cited_arxiv_id":"1811.08883","is_internal_anchor":true},{"doi":"","year":1905,"title":"J., Razavi, A., Doersch, C., Eslami, S., and Oord, A","work_id":"5c545404-7449-457a-92aa-390c9d328c06","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":25,"snapshot_sha256":"d371c0c782cdec833171ce168889ab4d6ef29c262d83533c94cc8eba57cc5cfa","internal_anchors":7},"formal_canon":{"evidence_count":2,"snapshot_sha256":"a09b8e2054ca76035a8e0dd563bf115307d7cc44447e786a7fadf3e388c599d0"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1910.04867","created_at":"2026-05-17T23:38:12.891123+00:00"},{"alias_kind":"arxiv_version","alias_value":"1910.04867v2","created_at":"2026-05-17T23:38:12.891123+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1910.04867","created_at":"2026-05-17T23:38:12.891123+00:00"},{"alias_kind":"pith_short_12","alias_value":"3EU3MUW3Y6TN","created_at":"2026-05-18T12:33:07.085635+00:00"},{"alias_kind":"pith_short_16","alias_value":"3EU3MUW3Y6TNM3ID","created_at":"2026-05-18T12:33:07.085635+00:00"},{"alias_kind":"pith_short_8","alias_value":"3EU3MUW3","created_at":"2026-05-18T12:33:07.085635+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2605.23719","citing_title":"Weierstrass Positional Encoding for Vision Transformers","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2010.11929","citing_title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2410.04941","citing_title":"TOAST: Transformer Optimization using Adaptive and Simple Transformations","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2411.02813","citing_title":"Sparse Orthogonal Parameters Tuning for Continual Learning","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2508.04227","citing_title":"Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting","ref_index":133,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15916","citing_title":"LoCO: Low-rank Compositional Rotation Fine-tuning","ref_index":53,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17571","citing_title":"Stable Routing for Mixture-of-Experts in Class-Incremental Learning","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2510.27359","citing_title":"GD-FPS: Growth-Driven Feedforward Parameter Selection for Efficient Fine-Tuning","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2511.12090","citing_title":"Teaching Prompts to Coordinate: Hierarchical Layer-Grouped Prompt Tuning for Continual Learning","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2209.06794","citing_title":"PaLI: A Jointly-Scaled Multilingual Language-Image Model","ref_index":180,"is_internal_anchor":true},{"citing_arxiv_id":"2303.16199","citing_title":"LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention","ref_index":86,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12678","citing_title":"No One Knows the State of the Art in Geospatial Foundation Models","ref_index":77,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02765","citing_title":"Towards Realistic Class-Incremental Learning with Free-Flow Increments","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2210.08402","citing_title":"LAION-5B: An open large-scale dataset for training next generation image-text models","ref_index":94,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11872","citing_title":"LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2404.08471","citing_title":"Revisiting Feature Prediction for Learning Visual Representations from Video","ref_index":152,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08589","citing_title":"S2FT: Parameter-Efficient Fine-Tuning in Sparse Spectrum Domain","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13313","citing_title":"Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11091","citing_title":"LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11112","citing_title":"Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09088","citing_title":"Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation","ref_index":107,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04058","citing_title":"MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning","ref_index":68,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06440","citing_title":"Visual prompting reimagined: The power of the Activation Prompts","ref_index":70,"is_internal_anchor":true},{"citing_arxiv_id":"2410.02713","citing_title":"LLaVA-Video: Video Instruction Tuning With Synthetic Data","ref_index":98,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18284","citing_title":"Spike-NVPT: Learning Robust Visual Prompts via Bio-Inspired Temporal Filtering and Discretization","ref_index":34,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46","json":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46.json","graph_json":"https://pith.science/api/pith-number/3EU3MUW3Y6TNM3IDTP2I4EKV46/graph.json","events_json":"https://pith.science/api/pith-number/3EU3MUW3Y6TNM3IDTP2I4EKV46/events.json","paper":"https://pith.science/paper/3EU3MUW3"},"agent_actions":{"view_html":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46","download_json":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46.json","view_paper":"https://pith.science/paper/3EU3MUW3","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1910.04867&json=true","fetch_graph":"https://pith.science/api/pith-number/3EU3MUW3Y6TNM3IDTP2I4EKV46/graph.json","fetch_events":"https://pith.science/api/pith-number/3EU3MUW3Y6TNM3IDTP2I4EKV46/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46/action/timestamp_anchor","attest_storage":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46/action/storage_attestation","attest_author":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46/action/author_attestation","sign_citation":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46/action/citation_signature","submit_replication":"https://pith.science/pith/3EU3MUW3Y6TNM3IDTP2I4EKV46/action/replication_record"}},"created_at":"2026-05-17T23:38:12.891123+00:00","updated_at":"2026-05-17T23:38:12.891123+00:00"}