arXiv: 1906.10918 · v1 · pith:C6LBBW4Mnew · submitted 2019-06-26 · cs.LG · cs.AI · cs.NE

Towards Empathic Deep Q-Learning

Pith reviewed 2026-05-25 15:31 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.NE
keywords empathic deep q-learning · deep q-networks · reinforcement learning · ai safety · machine ethics · gridworld environments · collateral harms

The pith

Empathic DQN decreases collateral harms to other agents by combining its value estimate with an empathy term from swapped-position states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Empathic DQN as an extension to standard Deep Q-Networks aimed at reducing unintended negative effects on other agents in shared environments. The method adds an empathy component by having the learning agent evaluate constructed states in which it and the other agents have exchanged positions, then combines that estimate with its own value. The approach assumes some reward signals like harm penalties apply across agents and draws from ethical ideas such as the golden rule. A reader would care because scalable reinforcement learning needs mechanisms to limit myopic damage without requiring fully specified rewards for every interaction. Early gridworld tests show the extension can lower collateral harms while keeping the core learning process intact.

Core claim

Empathic DQN combines the typical self-centered value with an estimated value for other agents. It obtains that estimate by imagining itself in the other's situation, evaluating constructed states in which the two agents are swapped. The goal is to mitigate negative side effects of myopic goal-directed behavior in settings where some rewards generalize across agents.

What carries the argument

Empathy term obtained by evaluating constructed states in which the learning agent and other agents have swapped positions, added to the agent's own value estimate.
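
A minimal sketch of how such an empathy term could be formed, assuming a two-agent gridworld whose state is just a pair of positions. The names `State`, `swap_agents`, `empathic_value`, and `toy_q`, as well as the mixing weight `selfishness`, are illustrative assumptions. The text above says only that the two values are combined, not how.

```python
from typing import Callable, NamedTuple, Tuple

Position = Tuple[int, int]


class State(NamedTuple):
    """Toy two-agent gridworld state (hypothetical representation)."""
    learner: Position
    other: Position


def swap_agents(state: State) -> State:
    """Construct the state in which learner and other exchange positions."""
    return State(learner=state.other, other=state.learner)


def empathic_value(
    q: Callable[[State], float],
    next_state: State,
    selfishness: float = 0.5,
) -> float:
    """Blend the learner's own value with the value of the swapped state.

    The convex combination and the `selfishness` weight are assumptions made
    for this sketch, not a rule taken from the paper.
    """
    if not 0.0 <= selfishness <= 1.0:
        raise ValueError("selfishness must lie in [0, 1]")
    self_value = q(next_state)
    empathy_value = q(swap_agents(next_state))
    return selfishness * self_value + (1.0 - selfishness) * empathy_value


def toy_q(state: State) -> float:
    """Stand-in for a learned value estimate: the hazard cell is bad for the learner."""
    hazard = (0, 0)
    return -1.0 if state.learner == hazard else 0.0
```

With `State(learner=(1, 1), other=(0, 0))`, the toy estimate sees no cost for the learner itself, but the swapped state places the learner in the hazard cell, so the blended value drops below the purely self-centered one. Setting `selfishness=1.0` recovers the self-centered estimate.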

If this is right

  • Collateral harms to other agents decrease in the two tested gridworld environments.
  • The method supplies a prior for agents that abide by norms without explicit per-interaction reward terms.
  • Extending Empathic DQN to complex environments remains non-trivial but follows the same combination of self-value and swapped-state empathy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The swapped-state construction could be tested for robustness when the number of coexisting agents grows beyond the simple cases shown.
  • Hybrid versions might combine this empathy term with other safety techniques that operate on different assumptions about reward sharing.
  • Environments where dynamics break under position swaps would expose whether the method requires additional state-construction safeguards.
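
The first and third points can be made concrete with a small sketch, assuming one learner and several other agents on a grid. `swap_with`, `mean_empathy`, and the `learner_forbidden` set are hypothetical. Averaging over agents and skipping invalid swaps are illustrative choices, not something the paper specifies.

```python
from typing import Callable, Optional, Set, Tuple

Position = Tuple[int, int]
# Assumed layout for this sketch: (learner position, positions of the other agents).
State = Tuple[Position, Tuple[Position, ...]]


def swap_with(state: State, index: int) -> State:
    """Exchange the learner's position with that of other agent `index`."""
    learner, others = state
    new_others = list(others)
    new_others[index] = learner
    return others[index], tuple(new_others)


def mean_empathy(
    q: Callable[[State], float],
    state: State,
    learner_forbidden: Set[Position],
) -> Optional[float]:
    """Average the learner's own value estimate over every valid swap.

    A swap is skipped when it would place the learner on a cell it cannot
    occupy (`learner_forbidden`), since a value estimate there would be
    meaningless. Returns None when no swap is valid.
    """
    values = []
    _, others = state
    for index in range(len(others)):
        candidate = swap_with(state, index)
        if candidate[0] in learner_forbidden:
            continue
        values.append(q(candidate))
    return sum(values) / len(values) if values else None
```

Returning `None` rather than zero keeps "no valid swap" distinguishable from "swaps evaluated as neutral", which matters if the caller falls back to the self-centered value.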

Load-bearing premise

Reward signals such as negative rewards from physical harm generalize across agents, and the learning agent can accurately construct and evaluate swapped states that preserve the relevant dynamics.

What would settle it

Run the method in an environment where other agents receive distinct rewards that do not match the learner's harm penalties, and check whether collateral harms fail to decrease or increase.

Figures

Figures reproduced from arXiv: 1906.10918 by Bart Bussmann, Jacqueline Heinerman, Joel Lehman.

Figure 1: The coexistence environment. The environment consists [...]
Figure 2: Average steps survived by the robot in the coexistence en[...]
Figure 4: The sharing environment. The environment consists of the [...]
Figure 5: Average number of batteries collected (per episode) in the [...]
Figure 6: Equality scores (per episode) in the sharing environment, [...]

Original abstract

As reinforcement learning (RL) scales to solve increasingly complex tasks, interest continues to grow in the fields of AI safety and machine ethics. As a contribution to these fields, this paper introduces an extension to Deep Q-Networks (DQNs), called Empathic DQN, that is loosely inspired both by empathy and the golden rule ("Do unto others as you would have them do unto you"). Empathic DQN aims to help mitigate negative side effects to other agents resulting from myopic goal-directed behavior. We assume a setting where a learning agent coexists with other independent agents (who receive unknown rewards), where some types of reward (e.g. negative rewards from physical harm) may generalize across agents. Empathic DQN combines the typical (self-centered) value with the estimated value of other agents, by imagining (by its own standards) the value of it being in the other's situation (by considering constructed states where both agents are swapped). Proof-of-concept results in two gridworld environments highlight the approach's potential to decrease collateral harms. While extending Empathic DQN to complex environments is non-trivial, we believe that this first step highlights the potential of bridge-work between machine ethics and RL to contribute useful priors for norm-abiding RL agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Empathic DQN as an extension to standard Deep Q-Networks. It augments the agent's self-value estimate with an empathy term obtained by evaluating the same Q-network on constructed states in which the learning agent and other agents have swapped positions. The goal is to reduce collateral harms to other agents whose rewards are unknown, under the assumption that certain reward signals (e.g., physical harm) generalize across agents. Proof-of-concept demonstrations are provided in two gridworld environments.

Significance. If the core mechanism proves robust, the work supplies a concrete, self-contained way to inject a golden-rule-style prior into RL without requiring external reward models or additional parameters. The explicit construction of the empathy term from the agent's own network and the same reward function is a clear design choice that avoids hidden circularity. The bridge between machine ethics and RL is a positive contribution even at the proof-of-concept stage.

major comments (2)
  1. [Abstract] Abstract (paragraph beginning 'We assume a setting...'): The central claim that Empathic DQN decreases collateral harms rests on the untested assumptions that (a) negative rewards from physical harm generalize across agents and (b) swapped-state construction preserves transition dynamics, action effects, and observability. No counter-examples, sensitivity analysis, or asymmetric environments are supplied, so the gridworld results cannot distinguish the contribution of the empathy term from environmental symmetry.
  2. [Abstract] Abstract: The statement that the approach 'highlights the potential to decrease collateral harms' is supported solely by qualitative demonstrations. No quantitative metrics, error bars, ablation studies (e.g., Empathic DQN vs. standard DQN), or statistical comparisons are reported, leaving the load-bearing empirical claim without measurable evidence.
minor comments (1)
  1. [Abstract] The abstract introduces the empathy term via natural-language description but does not supply an explicit equation or pseudocode for how the swapped-state value is combined with the self-value; adding this would improve reproducibility.
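
One hypothetical form such an equation could take, written only to illustrate what the minor comment asks for. All symbols are assumptions: $Q_\theta$ is the agent's own Q-network, $s'$ is the successor state reached by taking action $a$ in state $s$, $\sigma(s')$ is the constructed state with the learner and the other agent swapped, and $\beta \in [0, 1]$ is an illustrative weight. The paper's actual rule may differ.

```latex
Q^{\mathrm{emp}}(s, a) \;=\; \beta \, Q_\theta(s, a) \;+\; (1 - \beta)\, V_\theta\bigl(\sigma(s')\bigr),
\qquad
V_\theta(x) \;=\; \max_{a'} Q_\theta(x, a')
```

Under this reading, $\beta = 1$ recovers a standard self-centered DQN estimate, and smaller $\beta$ gives more weight to how the agent would value the other's situation.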

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive assessment of the core idea. We address each major comment below.

point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph beginning 'We assume a setting...'): The central claim that Empathic DQN decreases collateral harms rests on the untested assumptions that (a) negative rewards from physical harm generalize across agents and (b) swapped-state construction preserves transition dynamics, action effects, and observability. No counter-examples, sensitivity analysis, or asymmetric environments are supplied, so the gridworld results cannot distinguish the contribution of the empathy term from environmental symmetry.

    Authors: The manuscript is framed as a proof-of-concept and states the assumptions explicitly in the abstract. The symmetric gridworlds were chosen deliberately to isolate the empathy mechanism. We agree the current experiments do not fully separate the empathy term from symmetry effects and will revise the abstract to qualify the claims more carefully while adding a limitations discussion on the assumptions and the need for asymmetric test cases. revision: partial

  2. Referee: [Abstract] Abstract: The statement that the approach 'highlights the potential to decrease collateral harms' is supported solely by qualitative demonstrations. No quantitative metrics, error bars, ablation studies (e.g., Empathic DQN vs. standard DQN), or statistical comparisons are reported, leaving the load-bearing empirical claim without measurable evidence.

    Authors: The demonstrations are qualitative because the work is positioned as an initial proof-of-concept. We accept that quantitative support would strengthen the empirical claim and will add ablation comparisons (Empathic DQN vs. standard DQN), quantitative collateral-harm metrics, and error bars across runs in the revised manuscript. revision: yes
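
The comparison promised in this response could be summarized along the following lines. This is a sketch only: `summarize` and `compare` are hypothetical helpers, and the choice of the standard error of the mean as the error bar is an assumption, not something the rebuttal specifies.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def summarize(harms_per_run: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) over independent runs."""
    if len(harms_per_run) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(harms_per_run)
    sem = statistics.stdev(harms_per_run) / math.sqrt(len(harms_per_run))
    return mean, sem


def compare(results: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Summarize each condition (for example 'dqn' and 'empathic_dqn') separately."""
    return {name: summarize(runs) for name, runs in results.items()}
```

Each value passed to `compare` would be one collateral-harm measurement per training run, so the error bars reflect variation across runs rather than across episodes within a single run.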

Circularity Check

0 steps flagged

No circularity; Empathic DQN is an explicit design choice with stated assumptions

full rationale

The paper proposes Empathic DQN as a method that augments standard DQN value estimates with an empathy term computed by evaluating the agent's own Q-network on explicitly constructed swapped-position states. This construction is presented as a deliberate architectural extension inspired by the golden rule, not as a derivation or prediction that reduces to its own inputs. The abstract explicitly states the assumptions (reward generalization across agents and accurate swapped-state construction) rather than deriving them. No equations, fitted parameters, or self-citations are shown to create load-bearing circularity; the proof-of-concept results in gridworlds follow directly from the defined procedure without renaming or smuggling prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The approach rests on two domain assumptions and one invented modeling choice with no independent evidence supplied.

axioms (2)
  • domain assumption: Some reward signals (e.g., negative rewards from physical harm) generalize across agents.
    Stated explicitly in the abstract as the setting assumption required for the empathy term to be meaningful.
  • domain assumption: The learning agent can construct swapped states that preserve the relevant transition dynamics for other agents.
    Implicit in the description of how the empathy term is computed via state swapping.
invented entities (1)
  • Empathy term computed via agent swapping (no independent evidence)
    purpose: To estimate the value another agent would receive according to the learning agent's own reward function.
    New modeling device introduced in the paper; no external falsifiable handle is provided beyond the gridworld demonstrations.


