Challenges of real-world reinforcement learning

2021 · DOI 10.1007/s10994-021-05961-4

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

Introduces an execution semantics layer for event-driven industrial dispatching that constructs valid decision snapshots, standardizes action admissibility, and attributes multi-level execution divergences to reduce sim-to-real mismatch in RL policies.

GIFT: Global stabilisation via Intrinsic Fine Tuning

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

GIFT fine-tunes deep RL policies with a stability-focused reward to improve global stability while preserving task performance.

Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning

cs.LG · 2026-06-03 · unverdicted · novelty 4.0

A hybrid DRL system for multi-pair crypto trading with deterministic risk shielding outperforms a heuristic baseline at 10% significance on Binance futures data.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics

cs.AI · 2026-05-27 · unverdicted · none · ref 7

Introduces an execution semantics layer for event-driven industrial dispatching that constructs valid decision snapshots, standardizes action admissibility, and attributes multi-level execution divergences to reduce sim-to-real mismatch in RL policies.

GIFT: Global stabilisation via Intrinsic Fine Tuning

cs.LG · 2026-04-25 · unverdicted · none · ref 5

GIFT fine-tunes deep RL policies with a stability-focused reward to improve global stability while preserving task performance.

Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning

cs.LG · 2026-06-03 · unverdicted · none · ref 10

A hybrid DRL system for multi-pair crypto trading with deterministic risk shielding outperforms a heuristic baseline at 10% significance on Binance futures data.
