Joint Scheduling of Deferrable and Nondeferrable Demand with Colocated Stochastic Supply

Lang Tong; Minjae Jeon; Qing Zhao

arxiv: 2507.09794 · v2 · submitted 2025-07-13 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-19 04:45 UTC · model grok-4.3

keywords: demand scheduling, deferrable loads, stochastic supply, principle of procrastination, Markov decision process, reinforcement learning, smart grid, piecewise linear pricing

The pith

Under deterministic piecewise-linear retail prices, optimal deferrable demand scheduling reduces to three procrastination parameters per demand class.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies joint scheduling of randomly arriving deferrable loads that can be delayed up to deadlines, always-present nondeferrable loads whose service level depends on price, and colocated stochastic zero-cost supply that can meet local demand or be exported. Because arrivals and supply are random, the problem is a Markov decision process with continuous state and action spaces. Under the assumption that retail prices are deterministic, time-varying, and piecewise linear, the authors establish that the optimal policy obeys a Principle of Procrastination. This structural result collapses the policy search to a low-dimensional Euclidean space parameterized by three procrastination thresholds for each deferrable demand class. They further propose a reinforcement-learning procedure to learn these thresholds from data when the underlying distributions are unknown.

Core claim

Under deterministic, time-varying, and piecewise-linear retail pricing, the optimal demand scheduling policy follows the Principle of Procrastination, which reduces the infinite-dimensional policy space to a finite-dimensional Euclidean space defined by three procrastination parameters for each deferrable demand.

What carries the argument

The Principle of Procrastination: the structural property that the optimal policy defers service according to three simple threshold parameters per deferrable demand class, turning the continuous-state MDP into a finite-dimensional optimization problem.
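
As a rough illustration of what such a threshold rule can look like, the sketch below implements the single-load charging rate of the form min{v̄, [y − θ]⁺} quoted in the paper's appendix excerpts, switching between an export-side threshold and an import-side threshold according to the realized local supply. It is a simplification under stated assumptions, not the paper's full policy: it assumes one deferrable load and no nondeferrable loads, so the middle parameter of the paper's triple is not modeled, and it assumes the export-side threshold is no larger than the import-side one. The function and argument names are hypothetical, and the thresholds are passed in as plain numbers rather than computed.

```python
def procrastination_rate(y, g, theta_minus, theta_plus, v_max):
    """Charging rate for one deferrable load under a two-threshold rule.

    y: remaining demand, g: realized local supply, v_max: rate limit.
    theta_minus / theta_plus: thresholds used when exporting / importing
    (hypothetical inputs; this sketch does not compute them).
    """
    if theta_minus > theta_plus:
        # Assumption of this sketch, not a statement taken from the paper.
        raise ValueError("sketch assumes theta_minus <= theta_plus")
    # Rate if power must be imported: serve only demand above theta_plus.
    v_import = min(v_max, max(y - theta_plus, 0.0))
    # Rate if surplus would be exported: serve demand above theta_minus.
    v_export = min(v_max, max(y - theta_minus, 0.0))
    if g < v_import:   # net-consuming zone
        return v_import
    if g > v_export:   # net-producing zone
        return v_export
    return g           # net-zero zone: consume exactly the local supply
```

With y = 10, thresholds 2 and 6, and v_max = 5, supplies of 1, 4.5, and 7 give rates of 4, 4.5, and 5: the load buys just enough to stay on schedule, tracks local supply in between, and saturates at the rate limit when supply is abundant.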

If this is right

The policy search space shrinks from infinite-dimensional functions to a finite number of scalar parameters, one set of three per deferrable demand class.
A Procrastination Threshold Reinforcement Learning algorithm can learn the parameters from samples when arrival and supply distributions are unknown.
Numerical tests on real-world data show the learned thresholds closely approximate the optimal policy and outperform standard benchmark schedulers.
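
To make the idea of learning a handful of threshold parameters from samples concrete, here is a deliberately simple stand-in: a grid search scored by Monte Carlo simulation. It is not the paper's Procrastination Threshold Reinforcement Learning algorithm (which the Figure 6 caption describes as based on SAC). Everything in it is an assumption made for illustration: the toy single-load model, the uniform supply distribution, the import and export prices, the linear penalty on unmet demand, and the parameterization of each threshold as a fraction `alpha` of the maximum energy that can still be served later.

```python
import random

def payment(z, pi_plus, pi_minus):
    """Piecewise-linear bill: import (z >= 0) at pi_plus, export at pi_minus."""
    return pi_plus * z if z >= 0 else pi_minus * z

def episode(alpha_minus, alpha_plus, rng, T=6, y0=12.0, v_max=3.0,
            g_max=4.0, pi_plus=0.30, pi_minus=0.10, q=0.50):
    """Simulate one toy horizon and return the cumulative reward."""
    y, total = y0, 0.0
    for t in range(T + 1):
        g = rng.uniform(0.0, g_max)           # sampled local supply (toy)
        slack = (T - t) * v_max               # energy servable after stage t
        v_lo = min(v_max, max(y - alpha_plus * slack, 0.0))
        v_hi = min(v_max, max(y - alpha_minus * slack, 0.0))
        v = min(max(g, v_lo), v_hi)           # clip supply into [v_lo, v_hi]
        total -= payment(v - g, pi_plus, pi_minus)
        y -= v
    return total - q * y                      # linear penalty on unmet demand

def average_reward(alpha_minus, alpha_plus, runs=200, seed=0):
    """Monte Carlo average; the fixed seed gives common random numbers."""
    rng = random.Random(seed)
    return sum(episode(alpha_minus, alpha_plus, rng) for _ in range(runs)) / runs

def grid_search(step=0.25):
    """Return the (alpha_minus, alpha_plus) pair with the best average reward."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    candidates = [(a, b) for a in grid for b in grid if a <= b]
    return max(candidates, key=lambda ab: average_reward(*ab))
```

Calling `grid_search()` returns the best pair on the grid for this toy model. Because every candidate is scored on the same sampled supply paths, differences in average reward reflect the thresholds rather than sampling noise, which is the main reason for fixing the seed.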

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The three-parameter reduction may extend to deadline-constrained resource allocation problems outside electricity, such as compute job scheduling or vehicle routing, whenever costs are piecewise linear.
Implementation in a real smart-grid controller would require only storing and updating three numbers per demand class, making online recomputation feasible at scale.
Varying the number of linear segments in the price function would test how the required number of procrastination parameters grows.

Load-bearing premise

The retail electricity price is deterministic, known in advance, and piecewise linear.

What would settle it

Solve the MDP explicitly for a small instance with non-piecewise-linear prices and check whether the resulting policy can still be represented exactly by three procrastination parameters per demand class.

Figures

Figures reproduced from arXiv: 2507.09794 by Lang Tong, Minjae Jeon, Qing Zhao.

**Figure 1.** A household with deferrable EV charging demand and behind-the-meter DG. The arrow indicates the direction of power flow when the associated variable is positive. The problem is to schedule the optimal quantity served $(d_t, v_t)$ given the realized random DG $g_t$. For future reference, designated symbols are listed in Table I. Table I (Notations for Major Variables): $a_t$, action; $d_t, \bar d$, Consumptio…

**Figure 2.** Procrastination scheduling with deferrable demand under time-invariant NEM. Left: DG level $g_t \le \bar v$. Right: DG level $g_t > \bar v$. … afar (segment ②), it is optimal to procrastinate to purchase power, serving the demand using all local generation. When the remaining demand is high and deadline is near, the incompletion penalty is unavoidable unless it is reduced with purchased power as shown in segment ③…

**Figure 4.** Net-consumption zones, the optimal total demand $d_t^* := \mathbf{1}^\top d_t$, and the (optimal) net consumption $z_t^* := d_t^* - g_t$. D. Structure of the Optimal Scheduling Policy: Our main result, given in Theorem 2 below and proved in Appendix A, is that the optimal scheduling policy is defined by three (procrastination) parameters $\theta_t := (\theta_t^-, \theta_t^0, \theta_t^+)$, all functions of the state $x_t = (y_t, g_t)$. These parameters d…

**Figure 5.** Threshold and priority structures of the optimal policy.

**Figure 6.** Threshold learning algorithm based on SAC method.

**Figure 8.** Average cumulative reward of 2,000 Monte Carlo runs.

**Figure 9.** Average cumulative reward of 2,000 Monte Carlo runs.

**Figure 10.** Average cumulative reward of 400 Monte Carlo runs.

**Figure 11.** Energy mix ratio of the completed deferrable loads.

Original abstract

We investigate the problem of serving deferrable and nondeferrable electric demands with colocated stochastic supply and grid-imported electricity. Deferrable demands arrive randomly and can be delayed within their service deadlines. Nondeferrable demands are always present and must be served immediately, but the quantity served depends on the cost of electricity. Colocated supply is stochastic with zero marginal cost. It can be used to meet demand or exported to the grid to maximize profit. The stochasticity of demands and local supply makes optimal scheduling a Markov decision process with continuous (uncountable) state and action spaces. Under deterministic, time-varying, and piecewise-linear retail pricing of electricity, we show that the optimal demand scheduling follows the *Principle of Procrastination*, which reduces the infinite-dimensional policy space to a finite-dimensional Euclidean space defined by three procrastination parameters for each deferrable demand. For settings in which the underlying probability distributions are unknown, we propose a *Procrastination Threshold Reinforcement Learning* algorithm. Numerical experiments based on real-world test data confirm that the proposed threshold learning algorithm closely approximates the optimal policy and outperforms standard benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives a clean structural reduction of the optimal policy to three procrastination parameters per deferrable class under piecewise-linear pricing, plus a matching RL algorithm that works on real data.

The main takeaway is that under deterministic time-varying piecewise-linear retail prices the optimal scheduling policy for this mix of deferrable loads, nondeferrable loads, and colocated stochastic supply collapses to three scalar parameters per deferrable demand class. That is the Principle of Procrastination result. It turns an infinite-dimensional policy into something low-dimensional and therefore easier to implement or learn. They also give a Procrastination Threshold RL algorithm that learns exactly those three parameters when the underlying distributions are unknown, and the real-data experiments show it tracks the optimal policy closely while beating standard benchmarks. That combination of structural insight and a practical learning method is the useful part of the work.

The MDP setup itself is standard for this setting, but the reduction is the step that matters for control applications in microgrids or buildings. The experiments use real test data rather than purely synthetic cases, which adds some credibility to the performance claims.

On the soft spots, everything rests on the prices being known in advance and piecewise linear. If prices have different shapes or become stochastic, the three-parameter form is unlikely to survive. The stress-test concern about supply dependence is worth checking in the full derivation: because the current supply realization is observed and affects marginal cost, one might expect the thresholds to shift with supply. The paper asserts that three fixed parameters still characterize the optimum, so the proof must show that the structure is invariant to the supply state. If that invariance is shown explicitly, the claim holds; if it is only implicit, it would be a natural point for referees to probe.

This is a paper for researchers working on stochastic optimization or real-time demand response in power systems. A reader who cares about structural MDP results or practical learning for energy scheduling will get something concrete from it. It is worth sending to peer review because the central reduction is specific, the algorithm is tailored to the structure, and the experiments are on real data, even if the pricing assumption limits the scope.

Referee Report

1 major / 2 minor

Summary. The manuscript studies joint scheduling of deferrable and nondeferrable electric demands served by colocated stochastic supply and grid imports under deterministic time-varying piecewise-linear retail pricing. It derives that the optimal policy obeys the Principle of Procrastination, which collapses the infinite-dimensional policy space of the underlying continuous-state MDP to a finite-dimensional parameterization consisting of three procrastination parameters per deferrable demand class. For unknown distributions the authors introduce a Procrastination Threshold Reinforcement Learning algorithm and report numerical experiments on real-world data showing that the learned policy closely approximates the optimum and outperforms standard benchmarks.

Significance. If the structural result is correct, the reduction from an uncountable policy space to three scalar parameters per demand class is a meaningful contribution to stochastic optimal control for energy systems; it directly enables both exact dynamic programming on the reduced space and the design of the proposed RL method. The explicit use of the piecewise-linear price assumption to obtain the procrastination property, together with the real-data validation, strengthens the practical relevance for smart-grid scheduling.

major comments (1)

[Optimal policy derivation] The derivation of the Principle of Procrastination (abstract and the section presenting the optimal policy) must explicitly establish that the three procrastination parameters remain invariant to the realized stochastic supply state. Because the supply is observed before the scheduling decision and has zero marginal cost, the effective marginal cost of serving deferrable load at any instant is the minimum of the retail price and the opportunity cost of forgoing export; if this dependence is not shown to preserve the threshold structure, the claimed reduction to a supply-independent three-parameter policy may not hold.

minor comments (2)

[Abstract] The abstract states that the policy is reduced to 'three procrastination parameters' but does not name or define them; adding a one-sentence definition or a forward reference to the equation that introduces them would improve clarity for readers.
[Numerical experiments] Reporting the number of independent runs and standard-error bars on the performance metrics would allow readers to assess the statistical reliability of the claim that the learned policy 'closely approximates' the optimum.
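
The kind of reporting requested in the last comment can be sketched in a few lines: given the cumulative reward of each independent Monte Carlo run, compute the sample mean and its standard error. This is a generic illustration, not code from the paper; the function name and the sample values in the note that follows are hypothetical.

```python
import math
import statistics

def mean_and_standard_error(rewards):
    """Return (mean, standard error of the mean) over independent runs."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(rewards)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(rewards) / math.sqrt(n)
    return mean, sem
```

For example, four runs with rewards 1, 2, 3, and 4 have mean 2.5 and a standard error of about 0.65; an error bar of roughly two standard errors around each curve would let readers judge whether gaps between policies exceed sampling noise.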

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript. The single major comment raises an important point about explicitly establishing the invariance of the procrastination parameters to the observed supply state, and we address it directly below.

Referee: [Optimal policy derivation] The derivation of the Principle of Procrastination (abstract and the section presenting the optimal policy) must explicitly establish that the three procrastination parameters remain invariant to the realized stochastic supply state. Because the supply is observed before the scheduling decision and has zero marginal cost, the effective marginal cost of serving deferrable load at any instant is the minimum of the retail price and the opportunity cost of forgoing export; if this dependence is not shown to preserve the threshold structure, the claimed reduction to a supply-independent three-parameter policy may not hold.

Authors: We appreciate the referee highlighting the need for greater explicitness on this point. In the derivation of the Principle of Procrastination, the optimal policy for each deferrable demand class is characterized by three thresholds that dictate whether to serve the demand immediately or procrastinate, based on the remaining time to deadline and the time-varying piecewise-linear price segments. Although supply is observed and has zero marginal cost, the effective marginal cost for the procrastination decision equals the deterministic retail price when grid import is required or the (deterministic) forgone export revenue when local supply is used. Because both the retail price schedule and the export opportunity are deterministic and independent of the realized supply quantity, the comparison between current effective cost and expected future costs remains unchanged by the specific supply realization. The thresholds are therefore computed solely from the price function and deadline structure, rendering them invariant to the supply state. We acknowledge that the current manuscript states this invariance implicitly through the overall policy reduction but does not isolate it in a dedicated remark or lemma. In the revised version we will insert an explicit paragraph (or short lemma) immediately after the statement of the Principle of Procrastination that formally shows the supply-state independence of the three parameters per demand class, thereby confirming that the finite-dimensional parameterization is preserved.

revision: yes

Circularity Check

0 steps flagged

Derivation is self-contained; no circularity in structural result or parameter learning.

full rationale

The paper formulates the problem as an MDP with continuous state-action spaces and derives the Principle of Procrastination as a structural property under the explicit assumptions of deterministic, time-varying, piecewise-linear retail pricing. This reduces the policy to three procrastination parameters per deferrable demand class via mathematical analysis of the optimality conditions rather than by redefining inputs or fitting to the target outcome. The subsequent Procrastination Threshold RL algorithm learns those parameters from data when distributions are unknown, which is a standard estimation step and does not presuppose the result. No load-bearing step reduces by construction to a self-citation, fitted input renamed as prediction, or ansatz smuggled via prior work; the central claim remains independent of the fitted values and is falsifiable against the pricing assumptions.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The paper rests on standard MDP formulation for stochastic scheduling, the assumption of piecewise-linear deterministic prices, and the existence of an optimal policy in the continuous-state MDP; no new physical entities are postulated.

free parameters (1)

three procrastination parameters per deferrable demand class
These parameters define the reduced policy space; they are learned or optimized rather than derived from first principles.

axioms (2)

domain assumption: Retail electricity price is deterministic, time-varying, and piecewise linear.
Invoked to establish the Principle of Procrastination (abstract).
standard math: The joint process of random demand arrivals, stochastic supply, and nondeferrable demand is Markovian.
Standard modeling choice for MDP formulation of scheduling.

Appendix A: Proof of Theorems

We use the following notation:

• $\bar V_t(y_t) := \mathbb{E}_g[V_t(y_t, g)]$
• $\bar V_t(y_t, g_t) := \mathbb{E}_{g'}[V_t(y_t, g') \mid g_t]$
• $\partial_y^+ \bar V_t(y_t)$, $\partial_y^- \bar V_t(y_t)$: ...

Net consuming, $g_t < v_t + \mathbf{1}^\top d_t$: By the first-order condition, the optimal schedule of the nondeferrable loads is $d_{it}^* = d_i^+ := \min\{\bar d_i, \partial U_{it}^{-1}(\pi_t^+)\}$ for all $i = 1, \dots, K$. There are three possible cases for the left and right derivatives $\partial_y^- \bar V_{t+1}(y_t; g_t)$ and $\partial_y^+ \bar V_{t+1}(y_t; g_t)$.

Case 1: When $-\pi_t^+ - \partial_y^+ \bar V_{t+1}(0; g_t) \ge 0$, by the monotonicity of the right derivative, $-\pi_t^+ - \partial_y^+ \bar V_{t+1}(y_t - \min\{\bar v, y_t\}; g_t) \ge 0$ for all $y_t \le (T-t)\bar v$. Hence, by the first-order condition in $v$, $v_t^* = \min\{\bar v, y_t\}$, which is equivalent to $\theta_t^+(g_t) = 0$.

Case 2: When $-\pi_t^+ - \partial_y^- \bar V_{t+1}((T-t)\bar v; g_t) \le 0$, by the monotonicity of the left derivative, $-\pi_t^+ - \partial_y^- \bar V_{t+1}(y_t; g_t) \le 0$ for all $y_t \le (T-t)\bar v$. Hence, by the first-order condition in $v$, $v_t^* = 0$, which is equivalent to $\theta_t^+(g_t) = (T-t)\bar v$.

Case 3: If there exists $\chi^+ \in [0, (T-t)\bar v]$ that satisfies $-\pi_t^+ - \partial_y^- \bar V_{t+1}(\chi^+; g_t) \le 0$ and $-\pi_t^+ - \partial_y^+ \bar V_{t+1}(\chi^+; g_t) \ge 0$, then, by the first-order condition, an optimal decision is $v_t^* = \min\{\bar v, [y_t - \chi^+]^+\}$.

To summarize, for $g_t < d_t^+ + \min\{\bar v, [y_t - \theta_t^+(g_t)]^+\}$,

$$(d_t^*, v_t^*) = \left(d_t^+, \min\{\bar v, [y_t - \theta_t^+(g_t)]^+\}\right).$$

Net producing, $g_t > v_t + \mathbf{1}^\top d_t$: By the first-order condition, the optimal nondeferrable load schedule is $d_{it}^* = d_i^- := \min\{\bar d_i, \partial U_{it}^{-1}(\pi_t^-)\}$ for all $i = 1, \dots, K$. As in the previous case, there are three possible cases for $\partial_y^- \bar V_{t+1}(y_t; g_t)$ and $\partial_y^+ \bar V_{t+1}(y_t; g_t)$.

Case 1: When $-\pi_t^- - \partial_y^+ \bar V_{t+1}(0; g_t) \ge 0$, by the monotonicity of the right derivative, $-\pi_t^- - \partial_y^+ \bar V_{t+1}(y_t - \min\{\bar v, y_t\}; g_t) \ge 0$ for all $y_t \le (T-t)\bar v$. Hence, by the first-order condition in $v$, $v_t^* = \min\{\bar v, y_t\}$, which is equivalent to $\theta_t^-(g_t) = 0$.

Case 2: When $-\pi_t^- - \partial_y^- \bar V_{t+1}((T-t)\bar v; g_t) \le 0$, by the monotonicity of the left derivative, $-\pi_t^- - \partial_y^- \bar V_{t+1}(y_t; g_t) \le 0$ for all $y_t \le (T-t)\bar v$. Hence, by the first-order condition in $v$, $v_t^* = 0$, where $\theta_t^-(g_t) = (T-t)\bar v$.

Case 3: If there exists $\chi^- \in [0, (T-t)\bar v]$ that satisfies $-\pi_t^- - \partial_y^- \bar V_{t+1}(\chi^-; g_t) \le 0$ and $-\pi_t^- - \partial_y^+ \bar V_{t+1}(\chi^-; g_t) \ge 0$, then, by the first-order condition, an optimal decision is $v_t^* = \min\{\bar v, [y_t - \chi^-(g_t)]^+\}$.

To summarize, for $g_t > d_t^- + \min\{\bar v, [y_t - \theta_t^-(g_t)]^+\}$,

$$(d_t^*, v_t^*) = \left(d_t^-, \min\{\bar v, [y_t - \theta_t^-(g_t)]^+\}\right).$$

Net-zero, $g_t = v_t + \mathbf{1}^\top d_t$: We solve (9) with the constraint $g_t = v_t + \mathbf{1}^\top d_t$. The problem becomes

$$\max_{(d,v) \in \mathcal{A},\ v + \mathbf{1}^\top d = g_t} \; U_t(d) + \bar V_{t+1}(y_t - v; g_t). \tag{22}$$

Since the optimization problem above satisfies Slater's condition, the KKT conditions are necessary and sufficient. The Lagrangian of (22) is $\mathcal{L}_0 = U_t(d) + \nu(g_t - v - \mathbf{1}^\top d) + \bar V_{t+1}(y_t - \dots$

$v \le g_T$: The objective function is $-\pi^-(v - g_T) - q(y_T - v)$. By the first-order condition with respect to $v$ and assumption A3, $-\pi^- + q'(y_T - v) > 0$. Hence, $v_T^* \ge g_T$.

$v > g_T$: The objective function is $-\pi^+(v - g_T) - q(y_T - v)$. By the first-order condition and assumption A3, $-\pi^+ + q'(y_T - v) > 0$, which implies $v_T^* = \min\{y_T, \bar v\}$. For $\theta_T = (T - T)\bar v = 0$, the procrastination charging rate (11) becomes

$$v_T^* = \begin{cases} y_T, & 0 < y_T \le \min\{y_T, g_T\} \\ \min\{y_T, \bar v\}, & \min\{y_T, g_T\} \le y_T \le \bar v \end{cases} \;=\; \min\{y_T, \bar v\},$$

and Proposition 1 holds...

$y_t \le (T-t)\bar v + \min\{\bar v, g_t\}$: For a given sequence of DG realizations $(g_t, \dots, g_T)$, consider a sequence of charging actions $(\tilde v_t, \dots, \tilde v_T)$ with $\tilde v_t = \min\{y_t, g_t, \bar v\} - \delta > 0$ for $\delta > 0$. Consider another sequence of charging actions $(v_t^*, \dots, v_T^*)$ such that $v_t^* = \tilde v_t + \delta$ and $\sum_{\tau=t}^{T} v_\tau^* = \sum_{\tau=t}^{T} \tilde v_\tau - \delta$. Let the cumulative reward under ...

$y_t > (T-t)\bar v + \min\{\bar v, g_t\}$: Suppose a sequence $(\tilde v_t, \dots, \tilde v_T)$ with $\tilde v_t = y_t - (T-t)\bar v + \delta < \bar v$ for $\delta > 0$. Another sequence $(v_t^*, \dots, v_T^*)$ with $v_t^* = \tilde v_t - \delta = y_t - (T-t)\bar v$ has a cumulative reward $R_t^*$ that satisfies

$$R_t^* = \sum_{\tau=t}^{T} -P^{\pi}(v_\tau^* - g_\tau) = -P^{\pi}(\tilde v_t - g_t) + \pi^+\delta + \sum_{\tau=t+1}^{T} -P^{\pi}(v_\tau^* - g_\tau) \ge -P^{\pi}(\tilde v_t - g_t) + \pi^+\delta + \sum_{\tau=t+1}^{T} -P^{\pi}(\tilde v \dots$$


APPENDIX A
PROOF OF THEOREMS

We use the following notations:
• $\bar V_t(y_t) := \mathbb{E}_g[V_t(y_t, g)]$
• $\bar V_t(y_t, g_t) := \mathbb{E}_{g'}[V_t(y_t, g') \mid g_t]$
• $\partial_y^+ \bar V_t(y_t)$, $\partial_y^- \bar V_t(y_t)$:...

$g_t < v_t + \mathbf{1}^\top d_t$ (Net consuming): By the first-order condition, the optimal nondeferrable load schedule is $d_{it}^* = d_i^+ := \min\{\bar d_i, \partial U_{it}^{-1}(\pi_t^+)\}$, $\forall i = 1, \dots, K$. There are three possible cases for the left and right derivatives of $\bar V_{t+1}(y_t; g_t)$, namely $\partial_y^- \bar V_{t+1}(y_t; g_t)$ and $\partial_y^+ \bar V_{t+1}(y_t; g_t)$.
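As a minimal sketch of the nondeferrable schedule $\min\{\bar d_i, \partial U_{it}^{-1}(\pi)\}$, assume (hypothetically; the proof does not fix a utility form) a concave quadratic utility $U(d) = a d - \tfrac{b}{2} d^2$, whose marginal utility $a - b d$ has inverse $(a - \pi)/b$. The function name and all parameter values below are illustrative only.

```python
def nondeferrable_consumption(price, a, b, d_max):
    """Return min{d_max, (U')^{-1}(price)} for the ASSUMED utility U(d) = a*d - 0.5*b*d**2."""
    if b <= 0:
        raise ValueError("b must be positive for a strictly concave utility")
    # Inverse marginal utility: solve a - b*d = price for d.
    d_unconstrained = (a - price) / b
    # Clip to [0, d_max]; the lower clip at 0 is an added safeguard, not part of the formula.
    return max(0.0, min(d_max, d_unconstrained))


# Illustrative prices: pi_plus is the import (retail) price, pi_minus the export price.
d_plus = nondeferrable_consumption(price=0.30, a=1.0, b=0.5, d_max=2.0)
d_minus = nondeferrable_consumption(price=0.10, a=1.0, b=0.5, d_max=2.0)
```

Under this assumed utility, the lower export price yields a larger consumption level, so `d_minus >= d_plus`.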

Case 1: When $-\pi_t^+ - \partial_y^+ \bar V_{t+1}(0; g_t) \ge 0$, by the monotonicity of the right derivative, $-\pi_t^+ - \partial_y^+ \bar V_{t+1}(y_t - \min\{\bar v, y_t\}; g_t) \ge 0$, $\forall y_t \le (T-t)\bar v$. Hence, by the first-order condition with respect to $v$, $v_t^* = \min\{\bar v, y_t\}$, which is equivalent to $\theta_t^+(g_t) = 0$.

Case 2: When $-\pi_t^+ - \partial_y^- \bar V_{t+1}((T-t)\bar v; g_t) \le 0$, by the monotonicity of the left derivative, $-\pi_t^+ - \partial_y^- \bar V_{t+1}(y_t; g_t) \le 0$, $\forall y_t \le (T-t)\bar v$. Hence, by the first-order condition with respect to $v$, $v_t^* = 0$, which is equivalent to $\theta_t^+(g_t) = (T-t)\bar v$.

Case 3: If there exists $\chi^+ \in [0, (T-t)\bar v]$ that satisfies $-\pi_t^+ - \partial_y^- \bar V_{t+1}(\chi^+; g_t) \le 0$ and $-\pi_t^+ - \partial_y^+ \bar V_{t+1}(\chi^+; g_t) \ge 0$, then by the first-order condition, an optimal decision is $v_t^* = \min\{\bar v, [y_t - \chi^+]^+\}$, which corresponds to $\theta_t^+(g_t) = \chi^+$.

To summarize, for $g_t < d_t^+ + \min\{\bar v, [y_t - \theta_t^+(g_t)]^+\}$, $(d_t^*, v_t^*) = \big(d_t^+, \min\{\bar v, [y_t - \theta_t^+(g_t)]^+\}\big)$.
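The threshold form $\min\{\bar v, [y_t - \theta]^+\}$ can be illustrated with a short sketch. The function name and the numbers are hypothetical; `theta` stands for whichever threshold ($\theta_t^+$ or $\theta_t^-$) applies in the current zone.

```python
def procrastination_rate(y, theta, v_max):
    """Serve only the demand in excess of the threshold: min{v_max, [y - theta]^+}."""
    excess = max(0.0, y - theta)   # [y - theta]^+
    return min(v_max, excess)      # cap at the maximum service rate


# theta = 0 recovers Case 1 (serve as much as possible now).
serve_all = procrastination_rate(y=5.0, theta=0.0, v_max=1.5)
# A threshold at or above y recovers Case 2 (defer everything).
serve_none = procrastination_rate(y=2.0, theta=3.0, v_max=1.5)
# An interior threshold (Case 3) serves only the excess above theta.
serve_part = procrastination_rate(y=3.5, theta=3.0, v_max=1.5)
```

The three calls correspond to the three cases: full-rate service, complete deferral, and partial service of the excess over the threshold.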

$g_t > v_t + \mathbf{1}^\top d_t$ (Net producing): By the first-order condition, the optimal nondeferrable load schedule is $d_{it}^* = d_i^- := \min\{\bar d_i, \partial U_{it}^{-1}(\pi_t^-)\}$, $\forall i = 1, \dots, K$. As in the previous case, there are three possible cases for $\partial_y^- \bar V_{t+1}(y_t; g_t)$ and $\partial_y^+ \bar V_{t+1}(y_t; g_t)$.

Case 1: When $-\pi_t^- - \partial_y^+ \bar V_{t+1}(0; g_t) \ge 0$, by the monotonicity of the right derivative, $-\pi_t^- - \partial_y^+ \bar V_{t+1}(y_t - \min\{\bar v, y_t\}; g_t) \ge 0$, $\forall y_t \le (T-t)\bar v$. Hence, by the first-order condition with respect to $v$, $v_t^* = \min\{\bar v, y_t\}$, which is equivalent to $\theta_t^-(g_t) = 0$.

Case 2: When $-\pi_t^- - \partial_y^- \bar V_{t+1}((T-t)\bar v; g_t) \le 0$, by the monotonicity of the left derivative, $-\pi_t^- - \partial_y^- \bar V_{t+1}(y_t; g_t) \le 0$, $\forall y_t \le (T-t)\bar v$. Hence, by the first-order condition with respect to $v$, $v_t^* = 0$, which is equivalent to $\theta_t^-(g_t) = (T-t)\bar v$.

Case 3: If there exists $\chi^- \in [0, (T-t)\bar v]$ that satisfies $-\pi_t^- - \partial_y^- \bar V_{t+1}(\chi^-; g_t) \le 0$ and $-\pi_t^- - \partial_y^+ \bar V_{t+1}(\chi^-; g_t) \ge 0$, then by the first-order condition, an optimal decision is $v_t^* = \min\{\bar v, [y_t - \chi^-(g_t)]^+\}$, which corresponds to $\theta_t^-(g_t) = \chi^-(g_t)$.

To summarize, for $g_t > d_t^- + \min\{\bar v, [y_t - \theta_t^-(g_t)]^+\}$, $(d_t^*, v_t^*) = \big(d_t^-, \min\{\bar v, [y_t - \theta_t^-(g_t)]^+\}\big)$.
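Cases 1 to 3 amount to locating where $h(y) := -\pi - \partial_y \bar V_{t+1}(y; g_t)$ changes sign on $[0, (T-t)\bar v]$. The sketch below is one way to compute such a threshold numerically, under the simplifying assumptions (not made in the proof) that $\bar V_{t+1}$ is differentiable, so the left and right derivatives coincide, and that $h$ is nondecreasing. The quadratic value function in the example is purely hypothetical.

```python
def find_threshold(h, y_max, tol=1e-9):
    """Locate the threshold for a nondecreasing function h on [0, y_max].

    Case 1: h(0) >= 0      -> threshold 0 (serve at full rate).
    Case 2: h(y_max) <= 0  -> threshold y_max (defer everything).
    Case 3: otherwise      -> sign change of h, found by bisection.
    """
    if h(0.0) >= 0.0:
        return 0.0
    if h(y_max) <= 0.0:
        return y_max
    lo, hi = 0.0, y_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid   # sign change lies to the right of mid
        else:
            hi = mid   # sign change lies at or to the left of mid
    return 0.5 * (lo + hi)


# Hypothetical example: V(y) = -0.05 * y**2, so dV/dy = -0.1 * y and
# h(y) = -price + 0.1 * y, which crosses zero at y = price / 0.1.
price = 0.2
theta = find_threshold(lambda y: -price + 0.1 * y, y_max=5.0)
```

With these illustrative numbers the interior threshold is at `price / 0.1 = 2.0`; when the value function has kinks, the two one-sided conditions of Case 3 would need to be checked separately.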

$g_t = v_t + \mathbf{1}^\top d_t$ (Net-zero): We solve (9) with the constraint $g_t = v_t + \mathbf{1}^\top d_t$. The problem becomes

$$\max_{(d,v) \in \mathcal{A},\; v + \mathbf{1}^\top d = g_t} \; U_t(d) + \bar V_{t+1}(y_t - v; g_t). \tag{22}$$

Since the optimization problem above satisfies Slater's condition, the KKT conditions are necessary and sufficient. The Lagrangian of (22) is $\mathcal{L}_0 = U_t(d) + \nu(g_t - v - \mathbf{1}^\top d) + \bar V_{t+1}(y_t - \dots$

$v \le g_T$: The objective function is $-\pi^-(v - g_T) - q(y_T - v)$. By the first-order condition with respect to $v$ and assumption A3, $-\pi^- + q'(y_T - v) > 0$. Hence, $v_T^* \ge g_T$.

$v > g_T$: The objective function is $-\pi^+(v - g_T) - q(y_T - v)$. By the first-order condition and assumption A3, $-\pi^+ + q'(y_T - v) > 0$, which implies $v_T^* = \min\{y_T, \bar v\}$. For $\theta_T = (T - T)\bar v = 0$, the procrastination charging rate (11) becomes

$$v_T^* = \begin{cases} y_T, & 0 < y_T \le \min\{y_T, g_T\} \\ \min\{y_T, \bar v\}, & \min\{y_T, g_T\} \le y_T \le \bar v \end{cases} \;=\; \min\{y_T, \bar v\},$$

and the Proposition 1 holds...
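The terminal-stage conclusion $v_T^* = \min\{y_T, \bar v\}$ can be checked numerically with a brute-force sketch. It assumes a linear penalty $q(x) = c\,x$ with $c > \pi^+ \ge \pi^-$, which is one simple way to satisfy the inequalities attributed to A3 above; the function names, the grid search, and all numbers are illustrative.

```python
def terminal_objective(v, y, g, pi_plus, pi_minus, penalty):
    """-P(v - g) - q(y - v) with an ASSUMED linear penalty q(x) = penalty * x."""
    net = v - g
    # Imports are billed at pi_plus, exports are credited at pi_minus.
    payment = pi_plus * net if net > 0 else pi_minus * net
    return -payment - penalty * (y - v)


def best_terminal_rate(y, g, v_max, pi_plus, pi_minus, penalty, n=2001):
    """Grid search over the feasible rates [0, min(y, v_max)]."""
    upper = min(y, v_max)
    grid = [upper * k / (n - 1) for k in range(n)]
    return max(grid, key=lambda v: terminal_objective(v, y, g, pi_plus, pi_minus, penalty))


# Net-consuming example: demand exceeds both the rate limit and local supply.
v_consuming = best_terminal_rate(y=3.0, g=1.0, v_max=2.0, pi_plus=0.3, pi_minus=0.1, penalty=0.5)
# Net-producing example: local supply exceeds the remaining demand.
v_producing = best_terminal_rate(y=1.5, g=3.0, v_max=2.0, pi_plus=0.3, pi_minus=0.1, penalty=0.5)
```

Because the assumed penalty slope exceeds both prices, the objective is increasing in `v` on either side of `g`, so the search returns the largest feasible rate in both examples.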

$y_t \le (T-t)\bar v + \min\{\bar v, g_t\}$: For a given sequence of realizations of DG, $(g_t, \dots, g_T)$, consider a sequence of charging actions $(\tilde v_t, \dots, \tilde v_T)$ with $\tilde v_t = \min\{y_t, g_t, \bar v\} - \delta > 0$ for $\delta > 0$. Consider another sequence of charging actions $(v_t^*, \dots, v_T^*)$ such that $v_t^* = \tilde v_t + \delta$ and $\sum_{\tau=t}^{T} v_t^* = \sum_{\tau=t}^{T} \tilde v_t - \delta$. Let the cumulative reward under ...

$y_t > (T-t)\bar v + \min\{\bar v, g_t\}$: Suppose a sequence $(\tilde v_t, \dots, \tilde v_T)$ with $\tilde v_t = y_t - (T-t)\bar v + \delta < \bar v$ for $\delta > 0$. Another sequence $(v_t^*, \dots, v_T^*)$ with $v_t^* = \tilde v_t - \delta = y_t - (T-t)\bar v$ has the cumulative reward $R_t^*$ that satisfies: $R_t^* = \sum_{\tau=t}^{T} -P^\pi(v_t^* - g_t) = -P^\pi(\tilde v_t - g_t) + \pi^+\delta + \sum_{\tau=t+1}^{T} -P^\pi(v_t^* - g_t) \ge -P^\pi(\tilde v_t - g_t) + \pi^+\delta + \sum_{\tau=t+1}^{T} -P^\pi(\tilde v \dots$