{"paper":{"title":"Sampling Algorithms and Coresets for Lp Regression","license":"","headline":"","cross_cats":[],"primary_cat":"cs.DS","authors_text":"Anirban Dasgupta, Boulos Harb, Michael W. Mahoney, Petros Drineas, Ravi Kumar","submitted_at":"2007-07-11T22:04:18Z","abstract_excerpt":"The Lp regression problem takes as input a matrix $A \\in \\Real^{n \\times d}$, a vector $b \\in \\Real^n$, and a number $p \\in [1,\\infty)$, and it returns as output a number ${\\cal Z}$ and a vector $x_{opt} \\in \\Real^d$ such that ${\\cal Z} = \\min_{x \\in \\Real^d} ||Ax -b||_p = ||Ax_{opt}-b||_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n \\gg d$) version of this classical problem, for all $p \\in [1, \\infty)$. The first stage of our algorithm non-uniformly samples $\\hat{r}_1 = O(36^p d^{\\max\\{p/2+1, p"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"0707.1714","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/0707.1714/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}