pith. machine review for the scientific record.
Pith Number

pith:TTCAXWKN

pith:2019:TTCAXWKNNHEEERLWX32TQW4MZN
not attested · not anchored · not stored · refs resolved

Benchmarking Batch Deep Reinforcement Learning Algorithms

Edoardo Conti, Joelle Pineau, Mohammad Ghavamzadeh, Scott Fujimoto

Many batch deep RL algorithms underperform online DQN and the behavioral policy itself when trained on fixed Atari data from one policy.

arxiv:1910.01708 v1 · 2019-10-03 · cs.LG · cs.AI · stat.ML

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
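To illustrate what "recompute the same current state" can mean in practice, here is a minimal sketch of an order-independent merge. It is not Pith's actual algorithm, which this page does not describe: the event fields (`kind`, `payload`), the `event_key` and `merge` helpers, and the shape of the resulting state are all hypothetical, and signature verification is left out.

```python
import hashlib
import json


def event_key(event: dict) -> str:
    """Content-derived key, so ordering never depends on arrival order."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(canonical_record: dict, events: list) -> dict:
    """Fold events into a state that is identical for any input order."""
    state = {"record": canonical_record, "events": []}
    seen = set()
    for event in sorted(events, key=event_key):
        key = event_key(event)
        if key in seen:  # exact duplicates from another mirror are dropped
            continue
        seen.add(key)
        state["events"].append(event)
    return state


# Hypothetical events; the real bundle format is an assumption here.
record = {"source": {"id": "1910.01708", "kind": "arxiv", "version": 1}}
events = [
    {"kind": "citation", "payload": "a"},
    {"kind": "replication", "payload": "b"},
]
state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events[:1])
print(state_a == state_b)
```

Sorting by a hash of the event content and dropping duplicates is one simple way to make two mirrors agree regardless of the order in which they received events.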

Claims

C1 · strongest claim

under these conditions, many of these algorithms underperform DQN trained online with the same amount of data, as well as the partially-trained behavioral policy. ... we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting, and show it outperforms all existing algorithms at this task.

C2 · weakest assumption

That data generated by a single partially-trained behavioral policy under unified settings produces a representative and fair testbed for comparing batch RL algorithms.

C3 · one line summary

Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.
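The first claim mentions adapting Batch-Constrained Q-learning (BCQ) to discrete actions. The record does not spell out the rule, so the following is a hedged sketch of the thresholded action selection usually described for discrete BCQ: keep only actions whose estimated behavioral probability, relative to the most likely action, exceeds a threshold, then take the highest Q-value among them. The function name, the plain-list inputs, and the default `tau` are illustrative assumptions.

```python
def bcq_select_action(q_values, behavior_probs, tau=0.3):
    """Pick the best-valued action among those the behavioral policy supports.

    q_values:       estimated Q(s, a) for each discrete action a
    behavior_probs: estimated behavioral policy probabilities G(a | s)
    tau:            relative-probability threshold (illustrative default)
    """
    max_prob = max(behavior_probs)
    # Keep actions whose probability ratio to the most likely action exceeds tau.
    allowed = [a for a, p in enumerate(behavior_probs) if p / max_prob > tau]
    # Greedy choice restricted to the allowed set.
    return max(allowed, key=lambda a: q_values[a])


q = [1.0, 5.0, 2.0]
probs = [0.6, 0.05, 0.35]
constrained = bcq_select_action(q, probs, tau=0.3)
unconstrained = bcq_select_action(q, probs, tau=0.0)
print(constrained, unconstrained)
```

In this toy input, action 1 has the highest Q-value but is rarely taken by the behavioral policy, so the constrained choice falls back to action 2; with `tau=0.0` the rule reduces to ordinary greedy selection.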

References

18 extracted · 18 resolved · 7 Pith anchors

[1] Striving for simplicity in off-policy deep reinforcement learning · arXiv:1907.04543
[2] Exploration by random network distillation · arXiv:1810.12894
[3] Dopamine: A research framework for deep reinforcement learning · arXiv:1812.06110
[4] Bellemare, and Rémi Munos · arXiv:1710.10044
[5] Off-policy deep reinforcement learning without exploration

Cited by

17 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.296922Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

9cc40bd94d69c8424576bef5385b8ccb5ae43f3e8fcd2a64912a0a6985a1e466

Aliases

arxiv: 1910.01708 · arxiv_version: 1910.01708v1 · doi: 10.48550/arxiv.1910.01708 · pith_short_12: TTCAXWKNNHEE · pith_short_16: TTCAXWKNNHEEERLW · pith_short_8: TTCAXWKN
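The identifiers on this page are consistent with a simple derivation: the Pith number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, and the short aliases are prefixes of it. This rule is inferred from the values shown here, not stated by the record; the function name and the default length of 26 are assumptions.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "9cc40bd94d69c8424576bef5385b8ccb5ae43f3e8fcd2a64912a0a6985a1e466"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases are taken as prefixes of the full identifier (assumed rule).
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)
print(aliases)
```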
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TTCAXWKNNHEEERLWX32TQW4MZN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9cc40bd94d69c8424576bef5385b8ccb5ae43f3e8fcd2a64912a0a6985a1e466
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "52bc3739e1831baf67e8a14a5a40e7555d91c254c91c18d67b8f2c5ea670ac68",
    "cross_cats_sorted": [
      "cs.AI",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2019-10-03T20:15:55Z",
    "title_canon_sha256": "b288aedd284dd5187b25b9614e70486df3cd2fa1a6c57e096f443186d963459a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1910.01708",
    "kind": "arxiv",
    "version": 1
  }
}
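The canonicalization step of the verification command above can also be run offline. The sketch below applies the same `json.dumps` settings as the one-liner (sorted keys, compact separators, no ASCII escaping) to the record as transcribed from this page; `canonical_hash` is a hypothetical helper name, and whether the result matches the listed canonical hash depends on the transcription being byte-exact.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over the compact, key-sorted JSON serialization of the record."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "52bc3739e1831baf67e8a14a5a40e7555d91c254c91c18d67b8f2c5ea670ac68",
        "cross_cats_sorted": ["cs.AI", "stat.ML"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2019-10-03T20:15:55Z",
        "title_canon_sha256": "b288aedd284dd5187b25b9614e70486df3cd2fa1a6c57e096f443186d963459a",
    },
    "schema_version": "1.0",
    "source": {"id": "1910.01708", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)  # compare against the canonical hash listed on this record
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields happen to be written.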