Pith Number

pith:F6OZ7UIL

pith:2026:F6OZ7UILPGHPC6HGBEMQP4MSQR
not attested · not anchored · not stored · refs resolved

Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies

Ann Nowé, Mehrdad Asadi, Senne Deproost

Linear support vector machine splits distill black-box reinforcement learning policies into fewer interpretable subpolicies with higher returns.

arxiv:2605.04254 v2 · 2026-05-05 · cs.LG · cs.HC

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{F6OZ7UILPGHPC6HGBEMQP4MSQR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our method improves mean return by +7.4% over previous critic-driven state partitioning attempts such as Voronoi State Partitioning (VSP) and by +2.8% over the original TD3 policy, while reducing the number of required subpolicies relative to VSP by 82.1%.

C2 · weakest assumption

That linear SVM splits on a distillation dataset of state-action pairs will reliably produce a compact hierarchical set of human-interpretable subpolicies that accurately mimic the original black-box policy behavior.

C3 · one line summary

SVSP partitions distillation datasets with linear SVMs to create compact interpretable subpolicies, reporting +7.4% better mean return than VSP and +2.8% over TD3 while using 82.1% fewer subpolicies.

References

8 extracted · 8 resolved · 0 Pith anchors

[1] Ribeiro, M., Singh, S. & Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. Proceedings Of The 22nd ACM SIGKDD International Conference On Knowledge Discovery And Dat (2016)
[2] Deproost, S., Steckelmacher, D. & Nowé, A. Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning. arXiv preprint arXiv:2511.13322 (2025)
[3] Kohler, H., Delfosse, Q., Akrour, R., Kersting, K. & Preux, P. Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning. (2024-10-28)
[4] Coppens, Y., Efthymiadis, K., Lenaerts, T., Nowé, A., Miller, T., Weber, R. & Magazzeni, D. Distilling deep reinforcement learning policies in soft decision trees. Proceedings Of The IJCAI 2019 Work (2019)
[5] Blanco, V., Japón, A. & Puerto, J. Multiclass optimal classification trees with SVM-splits. Machine Learning. 112, 4905-4928 (2023)
Receipt and verification
First computed 2026-05-20T00:00:40.656177Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

2f9d9fd10b798ef178e6091907f1928445fecad6b184143ea267bb824fe26847

Aliases

arxiv: 2605.04254 · arxiv_version: 2605.04254v2 · doi: 10.48550/arxiv.2605.04254 · pith_short_12: F6OZ7UILPGHP · pith_short_16: F6OZ7UILPGHPC6HG · pith_short_8: F6OZ7UIL
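The short aliases above are plain prefixes of the full Pith Number. For this record, the number also matches the first 26 characters of the Base32 encoding of the canonical SHA-256 hash. The page does not document that derivation, so the sketch below treats it as an observation about this one record, not a specification. The function name `pith_from_hash` and the 26-character length are assumptions made for illustration.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "2f9d9fd10b798ef178e6091907f1928445fecad6b184143ea267bb824fe26847"
PITH_NUMBER = "F6OZ7UILPGHPC6HGBEMQP4MSQR"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """ASSUMPTION: Base32-encode the raw digest and keep the first `length` characters.

    This mapping is inferred from the values on this page, not from Pith documentation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)  # True for this record

# The pith_short_* aliases are prefixes of the same string.
for n in (8, 12, 16):
    print(n, derived[:n])
# 8 F6OZ7UIL
# 12 F6OZ7UILPGHP
# 16 F6OZ7UILPGHPC6HG
```

If the observation holds more generally, a short alias can be checked against a canonical hash without contacting the service. That would still need confirming against the Pith schema.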
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/F6OZ7UILPGHPC6HGBEMQP4MSQR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2f9d9fd10b798ef178e6091907f1928445fecad6b184143ea267bb824fe26847
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c820fbc7ba95f475bed269d17cafa20ca108aebaba1cf2172a4f9f77307f28aa",
    "cross_cats_sorted": [
      "cs.HC"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-05T19:40:05Z",
    "title_canon_sha256": "90b3f301a6e07a8ad0a702478aa2b34dd15ddaecbcc30e97482a58d252860d2e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.04254",
    "kind": "arxiv",
    "version": 2
  }
}
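The `curl` one-liner above canonicalizes the record by re-serializing it with sorted keys, compact separators, and non-escaped UTF-8, then hashes the bytes with SHA-256. As a rough offline equivalent, the sketch below applies the same `json.dumps` settings to a copy of the record shown here. The function name `canonical_sha256` is hypothetical. Whether the digest equals the published hash depends on the served record being identical to this copy, so the script reports the comparison and does not assume it.

```python
import hashlib
import json

# Copy of the canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c820fbc7ba95f475bed269d17cafa20ca108aebaba1cf2172a4f9f77307f28aa",
        "cross_cats_sorted": ["cs.HC"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-05T19:40:05Z",
        "title_canon_sha256": "90b3f301a6e07a8ad0a702478aa2b34dd15ddaecbcc30e97482a58d252860d2e",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.04254", "kind": "arxiv", "version": 2},
}

# Hash published under "Canonical hash" on this page.
PUBLISHED_HASH = "2f9d9fd10b798ef178e6091907f1928445fecad6b184143ea267bb824fe26847"


def canonical_sha256(record: dict) -> str:
    """Serialize with the same settings as the page's one-liner, then SHA-256 the bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash:", digest == PUBLISHED_HASH)
```

Because `sort_keys=True` fixes the key order, the digest does not depend on the order in which the record's keys happen to be stored or transmitted.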