{"paper":{"title":"LIFT: Last-Mile Fine-Tuning for Table Explicitation","license":"http://creativecommons.org/licenses/by-sa/4.0/","headline":"Last-mile fine-tuning pairs a pre-trained LLM for initial table extraction with a fine-tuned SLM that repairs errors, matching end-to-end SLM fine-tuning on TEDS while using as few as 1,000 examples.","cross_cats":["cs.CL"],"primary_cat":"cs.LG","authors_text":"Ashish Tiwari, Divij Khaitan","submitted_at":"2026-05-13T12:19:01Z","abstract_excerpt":"We propose last-mile fine-tuning, or Lift, a pipeline in which a pre-trained large language model extracts an initial table from unstructured clipboard text, and a fine-tuned small language model (1B-24B parameters SLM) repairs errors in the extracted table. On a benchmark of 2,596 tables from three datasets, Lift matches or exceeds end-to-end SLM fine-tuning on tree-edit-distance-based similarity (TEDS) metric while requiring as little as 1,000 training examples - where it outperforms end-to-end fine-tuning by up to 0.144 TEDS points. 
We term this approach last-mile fine-tuning and show it al"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"On a benchmark of 2,596 tables from three datasets, Lift matches or exceeds end-to-end SLM fine-tuning on tree-edit-distance-based similarity (TEDS) metric while requiring as little as 1,000 training examples - where it outperforms end-to-end fine-tuning by up to 0.144 TEDS points.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That errors produced by the pre-trained LLM's initial extraction are consistently repairable by the fine-tuned SLM in a manner that generalizes across the three datasets and varying input formats without the repair step introducing new systematic errors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LIFT pairs a pre-trained LLM for initial table extraction with a fine-tuned SLM for error repair, matching end-to-end SLM fine-tuning on TEDS while needing only 1,000 examples and gaining robustness.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Last-mile fine-tuning pairs a pre-trained LLM for initial table extraction with a fine-tuned SLM that repairs errors, matching end-to-end SLM fine-tuning on TEDS while using as few as 1,000 examples.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"f23b8d6c11ea8320e3f00e6ad9adf4cf338ce201536746634dc1daf084aaf419"},"source":{"id":"2605.13424","kind":"arxiv","version":1},"verdict":{"id":"05e591d6-1012-49ba-9a09-04023f18202e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:39:01.435928Z","strongest_claim":"On a benchmark of 2,596 tables from three datasets, Lift matches or exceeds end-to-end SLM fine-tuning on tree-edit-distance-based similarity (TEDS) metric while requiring as little as 1,000 training examples - where it outperforms end-to-end fine-tuning by up to 0.144 TEDS points.","one_line_summary":"LIFT pairs a pre-trained LLM for initial table extraction with a fine-tuned SLM for error repair, matching end-to-end SLM fine-tuning on TEDS while needing only 1,000 examples and gaining robustness.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That errors produced by the pre-trained LLM's initial extraction are consistently repairable by the fine-tuned SLM in a manner that generalizes across the three datasets and varying input formats without the repair step introducing new systematic errors.","pith_extraction_headline":"Last-mile fine-tuning pairs a pre-trained LLM for initial table extraction with a fine-tuned SLM that repairs errors, matching end-to-end SLM fine-tuning on TEDS while using as few as 1,000 examples."},"references":{"count":36,"sample":[{"doi":"","year":2023,"title":"GriTS: Grid Table Similarity Metric for Table Structure Recognition","work_id":"70ea4a27-3901-45cc-a0cc-cfd136b6faf2","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Image-Based Table Recognition: Data, Model, and Evaluation","work_id":"207f87ae-a1dc-49cd-939e-ecfba4975b98","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Optimized Table Tokenization for Table Structure Recognition","work_id":"31165aef-3b7a-4428-9a6c-c4c838594d56","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Complicated Table Structure Recognition","work_id":"46718712-ff34-4d79-97fc-08dec8d62a72","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2022.acl-long.180","year":2022,"title":"Text-to-Table: A New Way of Information Extraction","work_id":"ea91300b-fe92-402b-9967-9e0082b11ffe","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":36,"snapshot_sha256":"6f1c2e0cfb49117e8c69f253b1906dc1f9398074fd49191f8358f059967f6007","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}