Pith Number

pith:6M3XI6D7

pith:2026:6M3XI6D7SPT3HU3RD7HVRDDWYT

not attested · not anchored · not stored · refs resolved

LoRIF: Low-Rank Influence Functions for Scalable Training Data Attribution

Hieu Le, Jingyi Xu, Mathieu Salzmann, Shuangqi Li

LoRIF stores low-rank factors of projected gradients and approximates the Hessian inverse in a reduced subspace to scale influence functions for training data attribution.

arxiv:2601.21929 v2 · 2026-01-29 · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{6M3XI6D7SPT3HU3RD7HVRDDWYT}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
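One way a merge can be made deterministic is sketched here. This page does not describe the bundle's event schema or its actual merge rule, so everything in the sketch is an assumption: the event fields `at`, `field` and `value`, the function names, and the last-writer-wins policy are all hypothetical, and signature checking is left out. The point is only that deduplicating by content hash and sorting on content-derived keys gives every mirror the same result, whatever order the events arrive in.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash, so the same event gets the same id on every mirror."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(record: dict, events: list) -> dict:
    """Fold events into a state using an order that depends only on content."""
    unique = {event_id(e): e for e in events}  # duplicates collapse to one
    # ISO-8601 UTC strings in one fixed format sort chronologically as text.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {"record": record, "status": {}}
    for _, event in ordered:
        state["status"][event["field"]] = event["value"]  # later event wins
    return state


# Hypothetical events; the field names are invented for this sketch.
events = [
    {"at": "2026-05-18T00:00:00Z", "field": "archive", "value": "stored"},
    {"at": "2026-05-17T23:40:00Z", "field": "archive", "value": "pending"},
]
record = {"schema_version": "1.0"}
assert merge(record, events) == merge(record, list(reversed(events)) + events)
print(merge(record, events)["status"])  # {'archive': 'stored'}
```

Tie-breaking on the content hash matters in a design like this: two events with the same timestamp would otherwise be applied in arrival order, which differs between mirrors.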

Claims

C1 · strongest claim

On models from 0.1B to 70B parameters trained on datasets with millions of examples, LoRIF achieves up to 20× storage reduction and query-time speedup compared to LoGRA, while matching or exceeding its attribution quality.

C2 · weakest assumption

The low-rank structure present in projected gradients is preserved well enough after rank-c truncation and r-dimensional Hessian approximation that attribution scores remain faithful to the full influence function for the target models and datasets.

C3 · one-line summary

LoRIF reduces storage and query latency for gradient-based training data attribution from O(D) to O(c sqrt(D)) per sample and Hessian memory from O(D^2) to O(Dr) while preserving attribution quality on models up to 70B parameters.
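To make the asymptotic claim concrete, here is a back-of-the-envelope count. It is not the paper's implementation. It assumes a D-dimensional projected gradient is viewed as a sqrt(D) × sqrt(D) matrix and stored as two rank-c factors, and that the dense D × D curvature matrix is replaced by a D × r basis. The function names and the sample values of D, c and r are hypothetical and are not taken from the paper.

```python
import math


def per_sample_floats(D: int, c: int) -> tuple:
    """Floats stored per training sample: dense vs. rank-c factors.

    Assumption: the D-dim projected gradient is viewed as a sqrt(D) x sqrt(D)
    matrix and kept as two factors of shape (sqrt(D), c).
    """
    side = math.isqrt(D)
    if side * side != D:
        raise ValueError("this sketch expects D to be a perfect square")
    return D, 2 * c * side


def hessian_floats(D: int, r: int) -> tuple:
    """Floats for the curvature term: dense D x D vs. a D x r basis."""
    return D * D, D * r


# Hypothetical sizes chosen only to keep the arithmetic readable.
D, c, r = 4096, 1, 64
dense, factored = per_sample_floats(D, c)
full_h, low_rank_h = hessian_floats(D, r)
print(dense, factored, dense // factored)        # 4096 128 32
print(full_h, low_rank_h, full_h // low_rank_h)  # 16777216 262144 64
```

The ratios printed here follow from the toy sizes alone. They are unrelated to the 20× figure in the strongest claim, which depends on the D, c and r actually used in the paper.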

Formal links

1 machine-checked theorem link

Receipt and verification

First computed	2026-05-17T23:39:16.521480Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

f33774787f93e7b3d3711fcf588c76c4ef8fb134b6edf26ac60b5a799067cf5f

Aliases

arxiv: 2601.21929 · arxiv_version: 2601.21929v2 · doi: 10.48550/arxiv.2601.21929 · pith_short_12: 6M3XI6D7SPT3 · pith_short_16: 6M3XI6D7SPT3HU3R · pith_short_8: 6M3XI6D7
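The identifiers on this page appear to be related in a simple way. Base32-encoding the canonical hash and keeping the first 26 characters reproduces this record's Pith Number, and the short aliases are its 8-, 12- and 16-character prefixes. This is an observation from the values of this one record, not a documented derivation rule, and the helper name `pith_id_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "f33774787f93e7b3d3711fcf588c76c4ef8fb134b6edf26ac60b5a799067cf5f"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters.

    Assumption: inferred from this record only; other records may differ.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}
print(pith_id)                   # 6M3XI6D7SPT3HU3RD7HVRDDWYT
print(aliases["pith_short_12"])  # 6M3XI6D7SPT3
```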

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/6M3XI6D7SPT3HU3RD7HVRDDWYT \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f33774787f93e7b3d3711fcf588c76c4ef8fb134b6edf26ac60b5a799067cf5f

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "5cb14c34bedd3281c425eef76371d291959ef5adbcdf692d26ccdeb62c3286a9",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-29T16:18:34Z",
    "title_canon_sha256": "78dd33a28004107dd7c42ab3b1c93fa37ede3da53d4fce531824ed5f4229e028"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.21929",
    "kind": "arxiv",
    "version": 2
  }
}
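The verification step can also be run offline against the record printed above, without calling the resolver. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and compares the digest with the canonical hash instead of assuming a match. Whether the two agree depends on the printed record being byte-for-byte what the resolver serves, which this page does not guarantee. The name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

EXPECTED = "f33774787f93e7b3d3711fcf588c76c4ef8fb134b6edf26ac60b5a799067cf5f"

RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "5cb14c34bedd3281c425eef76371d291959ef5adbcdf692d26ccdeb62c3286a9",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-29T16:18:34Z",
    "title_canon_sha256": "78dd33a28004107dd7c42ab3b1c93fa37ede3da53d4fce531824ed5f4229e028"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.21929",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Hash the record as compact JSON with sorted keys, as the one-liner does."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(json.loads(RECORD_JSON))
print(digest == EXPECTED)  # True only if the printed record is the canonical one
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to serialize the record.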