pith:3CF6TPWU
Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning
An optimizer-aware Filter-then-Weight method improves convergence in online LLM fine-tuning by matching updates to the current optimizer state.
arxiv:2604.00001 v2 · 2026-03-08 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3CF6TPWUXOC2BJKUVWBF2JKX26}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experiments show that our method consistently improves convergence and downstream performance over existing online data selection baselines under the same data budget.
This rests on the assumption that the optimizer-aware update-matching formulation correctly captures sample utility, and that the two-stage filter-plus-weight procedure can be computed efficiently without introducing new biases for long-context LLM data.
Filter-then-Weight is a two-stage optimizer-aware method that filters geometrically useful data candidates and optimizes their coefficients to shape target updates in online LLM fine-tuning.
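The two-stage idea in the claims above can be illustrated with a toy sketch. This is not the paper's algorithm: the record does not give its scoring rule or objective, so everything here is an assumption made for illustration. The sketch assumes each candidate is represented by a per-sample update vector that has already been passed through the optimizer's preconditioning, that "geometrically useful" means positive cosine alignment with a target update, and that coefficients are fitted by non-negative least squares. The names `filter_candidates` and `fit_weights` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def filter_candidates(updates, target, k):
    """Stage 1 (assumed rule): keep the indices of at most k candidates
    whose update has positive cosine similarity with the target update."""
    scored = [(cosine(u, target), i) for i, u in enumerate(updates)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def fit_weights(updates, target, steps=500, lr=0.05):
    """Stage 2 (assumed objective): non-negative weights w minimizing
    ||sum_i w_i * updates[i] - target||^2 by projected gradient descent.
    lr must be small relative to the update norms for this to converge."""
    w = [0.0] * len(updates)
    for _ in range(steps):
        combo = [sum(w[i] * u[j] for i, u in enumerate(updates))
                 for j in range(len(target))]
        resid = [c - t for c, t in zip(combo, target)]
        grad = [2 * dot(u, resid) for u in updates]
        # Project back onto the non-negative orthant after each step.
        w = [max(0.0, wi - lr * g) for wi, g in zip(w, grad)]
    return w


# Toy data: three candidate updates in 2-D and a target update.
updates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target = [2.0, 1.0]
kept = filter_candidates(updates, target, k=2)
weights = fit_weights([updates[i] for i in kept], target)
```

In this toy case the third candidate points away from the target and is dropped in the filter stage, and the weight stage then scales the two survivors so their combination approximates the target update.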
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:10:03.368690Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
d88be9bed4bb85a0a554ad825d2557d7a8cd03f7ac0ff01b41054b5386f7566c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3CF6TPWUXOC2BJKUVWBF2JKX26 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d88be9bed4bb85a0a554ad825d2557d7a8cd03f7ac0ff01b41054b5386f7566c
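The hashing step of the one-liner above can be unpacked into a small function for readability. This is a minimal sketch that mirrors the same `json.dumps` arguments (sorted keys, compact separators, non-ASCII characters kept as-is) and hashes the UTF-8 bytes. The function name `canonical_sha256` is hypothetical, and the sketch assumes the input is the already-parsed `canonical_record` object.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize a parsed JSON value deterministically and return its SHA-256 hex digest."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Key order in the input does not change the digest.
a = canonical_sha256({"schema_version": "1.0", "source": {"kind": "arxiv", "version": 2}})
b = canonical_sha256({"source": {"version": 2, "kind": "arxiv"}, "schema_version": "1.0"})
same_digest = a == b
```

To check a real record, pass the parsed `canonical_record` object to the function and compare the result against the published canonical hash.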
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f90620ab9f73f6ef880cec6e7db56e159c3020357f0c959f11f48893daa59db5",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-08T21:46:16Z",
    "title_canon_sha256": "1b5148e548e6b4015b9c36280d09946883ff2a59ef212dc0770c29192d5c4b8c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.00001",
    "kind": "arxiv",
    "version": 2
  }
}