
arxiv: 2507.09794 · v2 · submitted 2025-07-13 · 📡 eess.SY · cs.SY

Joint Scheduling of Deferrable and Nondeferrable Demand with Colocated Stochastic Supply

Pith reviewed 2026-05-19 04:45 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords demand scheduling · deferrable loads · stochastic supply · principle of procrastination · Markov decision process · reinforcement learning · smart grid · piecewise linear pricing

The pith

Under deterministic piecewise-linear retail prices, optimal deferrable demand scheduling reduces to three procrastination parameters per demand class.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies joint scheduling of randomly arriving deferrable loads that can be delayed up to deadlines, always-present nondeferrable loads whose service level depends on price, and colocated stochastic zero-cost supply that can meet local demand or be exported. Because arrivals and supply are random, the problem is a Markov decision process with continuous state and action spaces. Under the assumption that retail prices are deterministic, time-varying, and piecewise linear, the authors establish that the optimal policy obeys a Principle of Procrastination. This structural result collapses the policy search to a low-dimensional Euclidean space parameterized by three procrastination thresholds for each deferrable demand class. They further propose a reinforcement-learning procedure to learn these thresholds from data when the underlying distributions are unknown.

Core claim

Under deterministic, time-varying, and piecewise-linear retail pricing, the optimal demand scheduling policy follows the Principle of Procrastination, which reduces the infinite-dimensional policy space to a finite-dimensional Euclidean space defined by three procrastination parameters for each deferrable demand.

What carries the argument

The Principle of Procrastination: the structural property that the optimal policy defers service according to three simple threshold parameters per deferrable demand class, turning the continuous-state MDP into a finite-dimensional optimization problem.
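As a rough illustration of the threshold idea, the sketch below implements a hypothetical single-load version of such a rule. It assumes no nondeferrable demand, a per-period service cap `v_max`, and two thresholds `theta_minus <= theta_plus` for the net-producing and net-consuming zones. With no nondeferrable load, the net-zero case simply sets service equal to the realized supply, so the paper's third parameter does not appear. The function name and signature are illustrative, not the authors' code.

```python
def procrastination_rate(y: float, g: float, v_max: float,
                         theta_minus: float, theta_plus: float) -> float:
    """Hypothetical procrastination rule for one deferrable load.

    y: remaining deferrable demand; g: realized zero-cost local supply;
    v_max: per-period service cap; theta_minus <= theta_plus: thresholds.
    """
    if theta_minus > theta_plus:
        raise ValueError("expected theta_minus <= theta_plus")
    # Serve the excess over theta_plus even if that requires buying grid power.
    must_serve = max(y - theta_plus, 0.0)
    # Serve the excess over theta_minus only as far as free local supply allows.
    serve_if_free = min(g, max(y - theta_minus, 0.0))
    # Net consuming when must_serve >= g, net producing when the excess over
    # theta_minus fits within g, otherwise net zero (service equals supply).
    return min(v_max, y, max(must_serve, serve_if_free))
```

With `y=10, g=3, v_max=4, theta_minus=2, theta_plus=8`, the rule serves 3 units, exactly the local supply: remaining demand is low relative to `theta_plus`, so no grid power is bought, but free supply is not left idle.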

If this is right

  • The policy search space shrinks from infinite-dimensional functions to a finite number of scalar parameters, one set of three per deferrable demand class.
  • A Procrastination Threshold Reinforcement Learning algorithm can learn the parameters from samples when arrival and supply distributions are unknown.
  • Numerical tests on real-world data show the learned thresholds closely approximate the optimal policy and outperform standard benchmark schedulers.
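The bullets above describe learning the thresholds from samples. The paper's Procrastination Threshold RL algorithm is built on soft actor-critic (Figure 6). The sketch below swaps in a much simpler technique, exhaustive grid search over constant threshold pairs scored on sampled supply trajectories, only to show why a low-dimensional parameter space makes sample-based policy search tractable. The single-load model, the two-part tariff (`pi_plus`, `pi_minus`), the deadline penalty `q`, capping each threshold at the demand still servable before the deadline, and all function names are assumptions.

```python
import random

def episode_reward(g_path, y0, v_max, th_minus, th_plus, pi_plus, pi_minus, q):
    """Reward of a constant-threshold procrastination rule on one supply path (hypothetical model)."""
    T, y, total = len(g_path), y0, 0.0
    for t, g in enumerate(g_path):
        cap = (T - t - 1) * v_max          # demand that can still be served after period t
        a, b = min(th_minus, cap), min(th_plus, cap)
        v = min(v_max, y, max(max(y - b, 0.0), min(g, max(y - a, 0.0))))
        z = v - g                          # net consumption in period t
        total -= pi_plus * z if z >= 0 else pi_minus * z  # pay for imports, earn export credit
        y -= v
    return total - q * y                   # penalty on demand left at the deadline

def search_thresholds(scenarios, y0, v_max, grid, pi_plus, pi_minus, q):
    """Grid search over th_minus <= th_plus, scored by the sample-average reward."""
    best = None
    for a in grid:
        for b in grid:
            if a > b:
                continue
            r = sum(episode_reward(s, y0, v_max, a, b, pi_plus, pi_minus, q)
                    for s in scenarios) / len(scenarios)
            if best is None or r > best[0]:
                best = (r, a, b)
    return best

# Synthetic supply paths stand in for data; a real study would use measured traces.
rng = random.Random(0)
scenarios = [[rng.uniform(0.0, 3.0) for _ in range(6)] for _ in range(200)]
print(search_thresholds(scenarios, y0=10.0, v_max=3.0, grid=[0, 2, 4, 6, 8],
                        pi_plus=0.3, pi_minus=0.1, q=1.0))
```

Every candidate pair is scored on the same sampled paths, so the comparison between pairs is not distorted by resampling noise.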

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The three-parameter reduction may extend to deadline-constrained resource allocation problems outside electricity, such as compute job scheduling or vehicle routing, whenever costs are piecewise linear.
  • Implementation in a real smart-grid controller would require only storing and updating three numbers per demand class, making online recomputation feasible at scale.
  • Varying the number of linear segments in the price function would test how the required number of procrastination parameters grows.

Load-bearing premise

The retail electricity price is deterministic, known in advance, and piecewise linear.
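To make the premise concrete, here is a minimal sketch of one tariff that satisfies it, assuming a two-part net-energy-metering (NEM) structure with a retail (import) price `pi_plus` and an export credit `pi_minus` in each period; both names are illustrative. The bill is linear on each side of zero net consumption, and writing it as the larger of two lines shows it is convex whenever `pi_minus <= pi_plus`.

```python
def nem_bill(z: float, pi_plus: float, pi_minus: float) -> float:
    """Payment for net consumption z under a two-part NEM tariff (hypothetical).

    z > 0 is billed at the retail price pi_plus; z < 0 is credited at pi_minus.
    """
    return pi_plus * z if z >= 0 else pi_minus * z

def total_bill(net, pi_plus, pi_minus) -> float:
    """Sum of per-period bills with deterministic, time-varying prices."""
    if not len(net) == len(pi_plus) == len(pi_minus):
        raise ValueError("one price pair is needed per period")
    return sum(nem_bill(z, p, m) for z, p, m in zip(net, pi_plus, pi_minus))

# Convexity check: with pi_minus <= pi_plus the bill equals the larger of two lines.
for z in (-4.0, -1.0, 0.0, 2.0):
    assert nem_bill(z, 0.25, 0.125) == max(0.25 * z, 0.125 * z)
```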

What would settle it

Solve the MDP explicitly for a small instance with non-piecewise-linear prices and check whether the resulting policy can still be represented exactly by three procrastination parameters per demand class.
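A minimal sketch of that experiment, under simplifying assumptions that are ours rather than the paper's: one deferrable load, integer remaining demand `y` in `0..Y`, supply `g` drawn i.i.d. from a small discrete set, a penalty `q` per unit left at the deadline, and no nondeferrable demand. Without nondeferrable demand only the two zone thresholds are searched; the net-zero case is fixed at serving exactly `g`. `solve_small_mdp` runs backward induction for any `price` function, and `fits_threshold_form` asks whether, for every (period, supply) pair, some `theta_minus <= theta_plus` attains the optimal value at every `y`. All names are hypothetical.

```python
def nem_price(pi_plus, pi_minus):
    """Two-part piecewise-linear tariff: import at pi_plus, export credit at pi_minus."""
    return lambda z: pi_plus * z if z >= 0 else pi_minus * z

def solve_small_mdp(T, Y, v_max, g_vals, g_probs, price, q):
    """Backward induction; returns Q[t][g][y][v] for the hypothetical single-load model."""
    W = [-q * y for y in range(Y + 1)]      # value once the deadline has passed
    Q = [None] * T
    for t in reversed(range(T)):
        Q[t] = {}
        V_bar = [0.0] * (Y + 1)             # expected value before g_t is revealed
        for g, p in zip(g_vals, g_probs):
            Q[t][g] = [[-price(v - g) + W[y - v] for v in range(min(v_max, y) + 1)]
                       for y in range(Y + 1)]
            for y in range(Y + 1):
                V_bar[y] += p * max(Q[t][g][y])
        W = V_bar
    return Q

def threshold_action(y, g, v_max, th_minus, th_plus):
    # Buy power only for demand above th_plus; use free supply for demand above th_minus.
    return min(v_max, y, max(max(y - th_plus, 0), min(g, max(y - th_minus, 0))))

def fits_threshold_form(Q, v_max, eps=1e-9):
    """True if every (t, g) admits th_minus <= th_plus that is optimal for all y."""
    for Q_t in Q:
        for g, Q_tg in Q_t.items():
            Y = len(Q_tg) - 1
            if not any(all(Q_tg[y][threshold_action(y, g, v_max, a, b)] >= max(Q_tg[y]) - eps
                           for y in range(Y + 1))
                       for a in range(Y + 1) for b in range(a, Y + 1)):
                return False
    return True

Q = solve_small_mdp(T=3, Y=6, v_max=2, g_vals=[0, 1, 2], g_probs=[0.5, 0.25, 0.25],
                    price=nem_price(3.0, 1.0), q=5.0)
print(fits_threshold_form(Q, v_max=2))
```

With the piecewise-linear tariff above the check should pass. To run the proposed test, pass a non-piecewise-linear `price` (for example a quadratic in `z`); we make no claim here about what that returns.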

Figures

Figures reproduced from arXiv: 2507.09794 by Lang Tong, Minjae Jeon, Qing Zhao.

Figure 1
Figure 1. A household with deferrable EV charging demand and behind-the-meter DG. The arrow indicates the direction of power flow when the associated variable is positive. The problem is to schedule the optimal quantities served $(d_t, v_t)$ given the realized random DG $g_t$. For future reference, designated symbols are listed in Table I. TABLE I (Notations for major variables; Symbol: Description): $a_t$: Action; $d_t, \bar{d}$: Consumptio…
Figure 2
Figure 2. Procrastination scheduling with deferrable demand under time-invariant NEM. Left: DG level $g_t \le \bar{v}$. Right: DG level $g_t > \bar{v}$. … afar (segment ②), it is optimal to procrastinate purchasing power and serve the demand using all local generation. When the remaining demand is high and the deadline is near, the incompletion penalty is unavoidable unless it is reduced with purchased power, as shown in segment ③…
Figure 5
Figure 5. Threshold and priority structures of the optimal policy.
Figure 4
Figure 4. Net-consumption zones, the optimal total demand $d^*_t := \mathbf{1}^\top d_t$, and the (optimal) net consumption $z^*_t := d^*_t - g_t$. D. Structure of the Optimal Scheduling Policy: Our main result, given in Theorem 2 below and proved in Appendix A, is that the optimal scheduling policy is defined by three (procrastination) parameters $\theta_t := (\theta^-_t, \theta^0_t, \theta^+_t)$, all functions of the state $x_t = (y_t, g_t)$. These parameters d…
Figure 6
Figure 6. Threshold learning algorithm based on SAC method.
Figure 8
Figure 8. Average cumulative reward of 2,000 Monte Carlo runs
Figure 9
Figure 9. Average cumulative reward of 2,000 Monte Carlo runs
Figure 10
Figure 10. Average cumulative reward of 400 Monte Carlo runs
Figure 11
Figure 11. Energy mix ratio of the completed deferrable loads
Original abstract

We investigate the problem of serving deferrable and nondeferrable electric demands with colocated stochastic supply and grid-imported electricity. Deferrable demands arrive randomly and can be delayed within their service deadlines. Nondeferrable demands are always present and must be served immediately, but the quantity served depends on the cost of electricity. Colocated supply is stochastic with zero marginal cost. It can be used to meet demand or exported to the grid to maximize profit. The stochasticity of demands and local supply makes optimal scheduling a Markov decision process with continuous (uncountable) state and action spaces. Under deterministic, time-varying, and piecewise-linear retail pricing of electricity, we show that the optimal demand scheduling follows the Principle of Procrastination, which reduces the infinite-dimensional policy space to a finite-dimensional Euclidean space defined by three procrastination parameters for each deferrable demand. For settings in which the underlying probability distributions are unknown, we propose a Procrastination Threshold Reinforcement Learning algorithm. Numerical experiments based on real-world test data confirm that the proposed threshold learning algorithm closely approximates the optimal policy and outperforms standard benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript studies joint scheduling of deferrable and nondeferrable electric demands served by colocated stochastic supply and grid imports under deterministic time-varying piecewise-linear retail pricing. It derives that the optimal policy obeys the Principle of Procrastination, which collapses the infinite-dimensional policy space of the underlying continuous-state MDP to a finite-dimensional parameterization consisting of three procrastination parameters per deferrable demand class. For unknown distributions the authors introduce a Procrastination Threshold Reinforcement Learning algorithm and report numerical experiments on real-world data showing that the learned policy closely approximates the optimum and outperforms standard benchmarks.

Significance. If the structural result is correct, the reduction from an uncountable policy space to three scalar parameters per demand class is a meaningful contribution to stochastic optimal control for energy systems; it directly enables both exact dynamic programming on the reduced space and the design of the proposed RL method. The explicit use of the piecewise-linear price assumption to obtain the procrastination property, together with the real-data validation, strengthens the practical relevance for smart-grid scheduling.

major comments (1)
  1. [Optimal policy derivation] The derivation of the Principle of Procrastination (abstract and the section presenting the optimal policy) must explicitly establish that the three procrastination parameters remain invariant to the realized stochastic supply state. Because the supply is observed before the scheduling decision and has zero marginal cost, the effective marginal cost of serving deferrable load at any instant is the minimum of the retail price and the opportunity cost of forgoing export; if this dependence is not shown to preserve the threshold structure, the claimed reduction to a supply-independent three-parameter policy may not hold.
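The referee's point can be made concrete with a small sketch, assuming a two-part NEM tariff (import price `pi_plus`, export credit `pi_minus <= pi_plus`) and hypothetical names. The marginal cost of serving one more unit of deferrable load depends on whether the realized zero-cost supply `g` already covers the current consumption `c`: it is the forgone export credit when there is a surplus, the retail price when there is a deficit, and anything in between at the net-zero kink.

```python
def marginal_costs(c: float, g: float, pi_plus: float, pi_minus: float):
    """Left/right marginal cost of consumption c given realized supply g (hypothetical NEM model).

    Returns (cost saved by serving one unit less, cost of serving one unit more).
    """
    z = c - g                                   # net consumption before the marginal change
    more = pi_plus if z >= 0 else pi_minus      # extra unit is imported, or forgoes an export
    less = pi_minus if z <= 0 else pi_plus      # dropped unit is exported, or avoids an import
    return less, more
```

The output changes with `g` even at fixed prices, which is the dependence the comment asks the authors to account for.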
minor comments (2)
  1. [Abstract] The abstract states that the policy is reduced to 'three procrastination parameters' but does not name or define them; adding a one-sentence definition or a forward reference to the equation that introduces them would improve clarity for readers.
  2. [Numerical experiments] Reporting the number of independent runs and standard-error bars on the performance metrics would allow readers to assess the statistical reliability of the claim that the learned policy 'closely approximates' the optimum.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript. The single major comment raises an important point about explicitly establishing the invariance of the procrastination parameters to the observed supply state, and we address it directly below.

Point-by-point responses
  1. Referee: [Optimal policy derivation] The derivation of the Principle of Procrastination (abstract and the section presenting the optimal policy) must explicitly establish that the three procrastination parameters remain invariant to the realized stochastic supply state. Because the supply is observed before the scheduling decision and has zero marginal cost, the effective marginal cost of serving deferrable load at any instant is the minimum of the retail price and the opportunity cost of forgoing export; if this dependence is not shown to preserve the threshold structure, the claimed reduction to a supply-independent three-parameter policy may not hold.

    Authors: We appreciate the referee highlighting the need for greater explicitness on this point. In the derivation of the Principle of Procrastination, the optimal policy for each deferrable demand class is characterized by three thresholds that dictate whether to serve the demand immediately or procrastinate, based on the remaining time to deadline and the time-varying piecewise-linear price segments. Although supply is observed and has zero marginal cost, the effective marginal cost for the procrastination decision equals the deterministic retail price when grid import is required or the (deterministic) forgone export revenue when local supply is used. Because both the retail price schedule and the export opportunity are deterministic and independent of the realized supply quantity, the comparison between current effective cost and expected future costs remains unchanged by the specific supply realization. The thresholds are therefore computed solely from the price function and deadline structure, rendering them invariant to the supply state. We acknowledge that the current manuscript states this invariance implicitly through the overall policy reduction but does not isolate it in a dedicated remark or lemma. In the revised version we will insert an explicit paragraph (or short lemma) immediately after the statement of the Principle of Procrastination that formally shows the supply-state independence of the three parameters per demand class, thereby confirming that the finite-dimensional parameterization is preserved.
    Revision: yes.

Circularity Check

0 steps flagged

Derivation is self-contained; no circularity in structural result or parameter learning.

Full rationale

The paper formulates the problem as an MDP with continuous state-action spaces and derives the Principle of Procrastination as a structural property under the explicit assumptions of deterministic, time-varying, piecewise-linear retail pricing. This reduces the policy to three procrastination parameters per deferrable demand class via mathematical analysis of the optimality conditions rather than by redefining inputs or fitting to the target outcome. The subsequent Procrastination Threshold RL algorithm learns those parameters from data when distributions are unknown, which is a standard estimation step and does not presuppose the result. No load-bearing step reduces by construction to a self-citation, fitted input renamed as prediction, or ansatz smuggled via prior work; the central claim remains independent of the fitted values and is falsifiable against the pricing assumptions.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The paper rests on standard MDP formulation for stochastic scheduling, the assumption of piecewise-linear deterministic prices, and the existence of an optimal policy in the continuous-state MDP; no new physical entities are postulated.

free parameters (1)
  • three procrastination parameters per deferrable demand class
    These parameters define the reduced policy space; they are learned or optimized rather than derived from first principles.
axioms (2)
  • domain assumption: Retail electricity price is deterministic, time-varying, and piecewise linear
    Invoked to establish the Principle of Procrastination (abstract).
  • standard math: The joint process of random demand arrivals, stochastic supply, and nondeferrable demand is Markovian
    Standard modeling choice for MDP formulation of scheduling.

pith-pipeline@v0.9.0 · 5731 in / 1551 out tokens · 30994 ms · 2026-05-19T04:45:51.882883+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 3 internal anchors

  1. [1]

    Scheduling power consumption with price uncertainty,

    T. T. Kim and H. V . Poor, “Scheduling power consumption with price uncertainty,”IEEE Trans. Smart Grid, vol. 2, no. 3, pp. 519–527, 2011

  2. [2]

    Optimal deadline scheduling for electric vehicle charging with energy storage and random supply,

    J. Jin, Y . Xu, and Z. Yang, “Optimal deadline scheduling for electric vehicle charging with energy storage and random supply,” Automatica, vol. 119, p. 109096, 2020

  3. [3]

    Optimal-cost scheduling of electrical vehicle charging under uncertainty,

    Y . Zhou, D. K. Yau, P. You, and P. Cheng, “Optimal-cost scheduling of electrical vehicle charging under uncertainty,” IEEE Trans. on Smart Grid, vol. 9, no. 5, pp. 4547–4554, 2017

  4. [4]

    Dynamic scheduling for charging electric vehicles: A priority rule,

    Y . Xu, F. Pan, and L. Tong, “Dynamic scheduling for charging electric vehicles: A priority rule,” IEEE Trans. Autom. Control, vol. 61, no. 12, pp. 4094–4099, 2016

  5. [5]

    A structural property of charging scheduling policy for shared electric vehicles with wind power generation,

    Q.-S. Jia and J. Wu, “A structural property of charging scheduling policy for shared electric vehicles with wind power generation,” IEEE Trans. Control Sys. Technol., vol. 29, no. 6, pp. 2393–2405, 2021

  6. [6]

    Joint scheduling of deferrable demand and storage with random supply and processing rate limits,

    J. Jin, L. Hao, Y . Xu, J. Wu, and Q.-S. Jia, “Joint scheduling of deferrable demand and storage with random supply and processing rate limits,” IEEE Trans. Autom. Control , vol. 66, no. 11, pp. 5506– 5513, 2020

  7. [7]

    Deadline scheduling as restless bandits,

    Z. Yu, Y . Xu, and L. Tong, “Deadline scheduling as restless bandits,” IEEE Trans. Autom. Control , vol. 63, no. 8, pp. 2343–2358, 2018

  8. [8]

    Model-free real-time au- tonomous energy management for a residential multi-carrier energy system: A deep reinforcement learning approach,

    Y . Ye, D. Qiu, J. Ward, and M. Abram, “Model-free real-time au- tonomous energy management for a residential multi-carrier energy system: A deep reinforcement learning approach,” in Proc. 29th Int. Conf. Artif. Intell. , 2021, pp. 339–346

  9. [9]

    Optimizing home energy management and electric vehicle charging with reinforcement learning,

    D. Wu, G. Rabusseau, V . Franc ¸ois-lavet, D. Precup, and B. Boulet, “Optimizing home energy management and electric vehicle charging with reinforcement learning,”Proc. 16th Adaptive Learn. Agents, 2018

  10. [10]

    Model-free real-time EV charging scheduling based on deep reinforcement learning,

    Z. Wan, H. Li, H. He, and D. Prokhorov, “Model-free real-time EV charging scheduling based on deep reinforcement learning,” IEEE Trans. Smart Grid , vol. 10, no. 5, pp. 5246–5257, 2018

  11. [11]

    On-line building energy optimization using deep reinforcement learning,

    E. Mocanu, D. C. Mocanu, P. H. Nguyen, A. Liotta, M. E. Webber, M. Gibescu, and J. G. Slootweg, “On-line building energy optimization using deep reinforcement learning,” IEEE Trans. Smart Grid , vol. 10, no. 4, pp. 3698–3708, 2018

  12. [12]

    A deep reinforcement learning-based charging scheduling approach with augmented lagrangian for electric vehicles,

    L. Yang, G. Chen, and X. Cao, “A deep reinforcement learning-based charging scheduling approach with augmented lagrangian for electric vehicles,” Applied Energy, vol. 378, p. 124706, 2025

  13. [13]

    Constrained ev charging scheduling based on safe deep reinforcement learning,

    H. Li, Z. Wan, and H. He, “Constrained ev charging scheduling based on safe deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2427–2439, 2019

  14. [14]

    Residential demand response using reinforcement learning,

    D. O’Neill, M. Levorato, A. Goldsmith, and U. Mitra, “Residential demand response using reinforcement learning,” in 2010 First IEEE international conference on smart grid communications . IEEE, 2010, pp. 409–414

  15. [15]

    Online rein- forcement learning of optimal threshold policies for Markov decision processes,

    A. Roy, V . Borkar, A. Karandikar, and P. Chaporkar, “Online rein- forcement learning of optimal threshold policies for Markov decision processes,” IEEE Trans. Autom. Control, vol. 67, no. 7, pp. 3722–3729, 2021

  16. [16]

    Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy struc- ture,

    H. Park, D. G. Choi, and D. Min, “Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy struc- ture,” Int. J. Prod. Econ. , vol. 266, p. 109029, 2023

  17. [17]

    DeepTOP3: Deep threshold-optimal policy for mdps and rmabs,

    K. Nakhleh, I. Hou et al., “DeepTOP3: Deep threshold-optimal policy for mdps and rmabs,” Advances in Neural Inf. Processing Sys., vol. 35, pp. 28 734–28 746, 2022

  18. [18]

    Laxity differentiated pricing and deadline differentiated threshold scheduling for a public electric vehicle charg- ing station,

    L. Hao, J. Jin, and Y . Xu, “Laxity differentiated pricing and deadline differentiated threshold scheduling for a public electric vehicle charg- ing station,” IEEE Trans. Ind. Inform. , vol. 18, no. 9, pp. 6192–6202, 2022

  19. [19]

    A model predictive control approach for low-complexity electric vehicle charging scheduling: Optimality and scalability,

    W. Tang and Y . J. Zhang, “A model predictive control approach for low-complexity electric vehicle charging scheduling: Optimality and scalability,” IEEE transactions on power systems , vol. 32, no. 2, pp. 1050–1063, 2016

  20. [20]

    Distributed noncooperative mpc for energy scheduling of charging and trading electric vehicles in energy communities,

    N. Mignoni, R. Carli, and M. Dotoli, “Distributed noncooperative mpc for energy scheduling of charging and trading electric vehicles in energy communities,” IEEE Trans. on Control Sys. Technol. , vol. 31, no. 5, pp. 2159–2172, 2023

  21. [21]

    Model predictive charging control of in-vehicle batteries for home energy management based on vehicle state prediction,

    A. Ito, A. Kawashima, T. Suzuki, S. Inagaki, T. Yamaguchi, and Z. Zhou, “Model predictive charging control of in-vehicle batteries for home energy management based on vehicle state prediction,” IEEE Trans. Control Sys. Technol., vol. 26, no. 1, pp. 51–64, 2017

  22. [22]

    Two-stage economic operation of microgrid-like electric vehicle parking deck,

    Y . Guo, J. Xiong, S. Xu, and W. Su, “Two-stage economic operation of microgrid-like electric vehicle parking deck,” IEEE Trans. Smart Grid, vol. 7, no. 3, pp. 1703–1712, 2015

  23. [23]

    MPC-based appliance scheduling for residential building energy management controller,

    C. Chen, J. Wang, Y . Heo, and S. Kishore, “MPC-based appliance scheduling for residential building energy management controller,” IEEE Trans. Smart Grid , vol. 4, no. 3, pp. 1401–1410, 2013

  24. [24]

    Modeling and stochastic control for home energy management,

    Z. Yu, L. Jia, M. C. Murphy-Hoye, A. Pratt, and L. Tong, “Modeling and stochastic control for home energy management,” IEEE Trans. Smart Grid, vol. 4, no. 4, pp. 2244–2255, 2013

  25. [25]

    Joint optimization of electric vehicle and home energy scheduling considering user comfort preference,

    D. T. Nguyen and L. B. Le, “Joint optimization of electric vehicle and home energy scheduling considering user comfort preference,” IEEE Trans. Smart Grid , vol. 5, no. 1, pp. 188–199, 2013

  26. [26]

    Power control framework for green data centers,

    T. Yang, Y . Hou, Y . C. Lee, H. Ji, and A. Y . Zomaya, “Power control framework for green data centers,” IEEE Trans. on Cloud Comput. , vol. 10, no. 4, pp. 2876–2886, 2020

  27. [27]

    Toward optimal operation of internet data center microgrid,

    J. Li and W. Qi, “Toward optimal operation of internet data center microgrid,” IEEE Trans. on Smart Grid , vol. 9, no. 2, pp. 971–979, 2016

  28. [28]

    Intelligent energy schedul- ing in renewable integrated microgrid with bidirectional electricity-to- hydrogen conversion,

    M. Chen, Z. Shen, L. Wang, and G. Zhang, “Intelligent energy schedul- ing in renewable integrated microgrid with bidirectional electricity-to- hydrogen conversion,” IEEE Trans on Netw. Sci. and Eng. , vol. 9, no. 4, pp. 2212–2223, 2022

  29. [29]

    Renewable-Colocated Green Hydrogen Production: Optimal Scheduling and Profitability

    S. Li, L. Tong, T. Mount, K. Upadhyay, H. Eisenhardt, and P. Kumar, “Renewable-colocated green hydrogen production: Optimal scheduling and profitability,” arXiv preprint arXiv:2504.18368 , 2025

  30. [30]

    Optimal scheduling of a hydrogen-based microgrid for an industrial park: A reinforcement learning approach,

    W. He, C. Cai, Q.-L. Han, X. Qing, W. Du, and F. Qian, “Optimal scheduling of a hydrogen-based microgrid for an industrial park: A reinforcement learning approach,” IEEE Trans. on Syst., Man, and Cybern.: Syst., 2025

  31. [31]

    On the optimality of procrastination policy for ev charging under net energy metering,

    M. Jeon, L. Tong, and Q. Zhao, “On the optimality of procrastination policy for ev charging under net energy metering,” in Proc. 62nd IEEE Conf. Decision and Control (CDC ‘23). IEEE, 2023, pp. 1563–1568

  32. [32]

    Imputing a convex objective function,

    A. Keshavarz, Y . Wang, and S. Boyd, “Imputing a convex objective function,” in 2011 IEEE Int. Symp. Intell. Control . IEEE, 2011, pp. 613–619

  33. [33]

    On net energy metering x: Optimal prosumer decisions, social welfare, and cross-subsidies,

    A. S. Alahmed and L. Tong, “On net energy metering x: Optimal prosumer decisions, social welfare, and cross-subsidies,” IEEE Trans. Smart Grid, vol. 14, no. 2, pp. 1652–1663, 2022

  34. [34]

    Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,

    T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,” in Int. Conf. Mach. Learn. (ICML) . PMLR, 2018, pp. 1861– 1870

  35. [35]

    ACN-Data: Analysis and Applications of an Open EV Charging Dataset,

    Z. J. Lee, T. Li, and S. H. Low, “ACN-Data: Analysis and Applications of an Open EV Charging Dataset,” in Proc. 10th Int. Conf. Future Energy Sys., ser. e-Energy ’19, Jun. 2019

  36. [36]

    Pecan street dataset,

    “Pecan street dataset,” Available at www.pecanstreet.org/dataport/ (2022/11/01)

  37. [37]

    Continuous control with deep reinforcement learning

    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y . Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforce- ment learning,” arXiv preprint arXiv:1509.02971 , 2015

  38. [38]

    Soft Actor-Critic Algorithms and Applications

    T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V . Kumar, H. Zhu, A. Gupta, P. Abbeel et al. , “Soft actor-critic algorithms and applications,” arXiv preprint arXiv:1812.05905 , 2018. APPENDIX A PROOF OF THEOREMS We use the following notations: • ¯Vt(yt) := Eg[Vt(yt, g)] • ¯Vt(yt, gt) := Eg′[Vt(yt, g′) | gt] • ∂+ y ¯Vt(yt), ∂ − y ¯Vt(yt):...

  39. [39]

    gt < v t + 1T dt(Net consuming): By the first order condition, the optimal schedule of nondeferrable load is d∗ it = d+ i := min ¯di, ∂U −1 it (π+ t ) ∀i = 1, . . . , K. There are three possible cases for the right and left deriva- tive of ¯Vt+1(yt; gt): ∂− y ¯Vt+1(yt; gt) and ∂+ y ¯Vt+1(yt; gt)

  40. [40]

    Hence, by the first order condition of v, v∗ t = min{¯v, yt}

    Case 1: When −π+ t − ∂+ y ¯Vt+1(0; gt) ≥ 0, by the monotonicity of the right derivative, −π+ t −∂+ y ¯Vt+1(yt−min{¯v, yt}; gt) ≥ 0, ∀ yt ≤ (T −t)¯v. Hence, by the first order condition of v, v∗ t = min{¯v, yt}. This is equivalent with the θ+ t (gt) = (T − t)¯v

  41. [41]

    Case 2: When −π+ t − ∂− y ¯Vt+1 (T − t)¯v; gt ≤ 0, by the monotonicity of the left derivative, −π+ t − ∂− y ¯Vt+1(yt; gt) ≤ 0, ∀ yt ≤ (T − t)¯v Hence, by the first order condition of v, v∗ t = 0, which is equivalent with θ+ t (gt) = (T − t)¯v

  42. [42]

    To summarize, for gt < d+ t + min ¯v, [yt − θ+ t (gt)]+ , (d∗ t , v∗ t ) = d+ t , min ¯v, [yt − θ+ t (gt)]+

    Case 3: If there exists χ+ ∈ 0, (T − t)¯v that satisfies −π+ t − ∂− y ¯Vt+1(χ+; gt) ≤ 0, −π+ t − ∂+ y ¯Vt+1(χ+; gt) ≥ 0, by the first order condition, an optimal decision is v∗ t = min ¯v, [yt − χ+]+ . To summarize, for gt < d+ t + min ¯v, [yt − θ+ t (gt)]+ , (d∗ t , v∗ t ) = d+ t , min ¯v, [yt − θ+ t (gt)]+

  43. [43]

    gt > v t + 1T dt (Net producing): By the first order condition, the optimal nondeferrabld load schedule is d∗ it = d− i := min{ ¯di, ∂U −1 it (π− t )} ∀ i = 1, . . . , K. As the previous case, there are three possible cases for ∂− y ¯Vt+1(yt; gt) and ∂+ y ¯Vt+1(yt; gt)

  44. [44]

    Hence, by the first order condition of v, v∗ t = min{¯v, yt}, which is equivalent with θ− t (gt) = 0

    Case 1: When −π− t − ∂+ y ¯Vt+1(0; gt) ≥ 0, by the monotonicity of the right derivative, −π− t −∂+ y ¯Vt+1(yt−min{¯v, yt}; gt) ≥ 0, ∀ yt ≤ (T −t)¯v. Hence, by the first order condition of v, v∗ t = min{¯v, yt}, which is equivalent with θ− t (gt) = 0

  45. [45]

    Hence, by the first order condition of v, v∗ t = 0 where θ− t (gt) = (T − t)¯v

    Case 2: When −π− t − ∂− y ¯Vt+1 (T − t)¯v; gt ≤ 0, by the monotonicity of the left derivative, −π− t − ∂− y ¯Vt+1(yt; gt) ≤ 0, ∀ yt ≤ (T − t)¯v. Hence, by the first order condition of v, v∗ t = 0 where θ− t (gt) = (T − t)¯v

  46. [46]

    By the first order condition, an optimal decision is v∗ t = min ¯v, [yt − χ−(gt)]+

    Case 3: If there exists χ− ∈ [0, (T − t)¯v] that satisfies −π− t − ∂− y ¯Vt+1(χ−; gt) ≤ 0, −π− t − ∂+ y ¯Vt+1(χ−; gt) ≥ 0. By the first order condition, an optimal decision is v∗ t = min ¯v, [yt − χ−(gt)]+ . To summarize for gt > d − t + min ¯v, [yt − θ− t (gt)]+ , (d∗ t , v∗ t ) = d+ t , min ¯v, [yt − θ− t (gt)]+

  47. [47]

    gt = vt + 1T dt (Net-zero): We solve (9) with the constraint gt = vt + 1T dt. The problem becomes max (d,v)∈A,v+1T d=gt Ut(d) + ¯Vt+1(yt − v; gt) (22) Since the optimization problem above satisfies the Slater’s condition, the KKT condition is necessary and sufficient condition. Then, the Lagrangian of the (22) is L0 = Ut(d) + ν(gt − v − 1T d) + ¯Vt+1(yt −...

  48. [48]

    By the first order condition with respect to v, and assumption A3 −π− + q′(yT − v) > 0

    v ≤ gT : The objective function is −π−(v − gT ) − q(yT − v). By the first order condition with respect to v, and assumption A3 −π− + q′(yT − v) > 0. Hence, v∗ T ≥ gT

  49. [49]

    By the first order condition, and assumption A3 −π+ + q′(yT − v) > 0, which implies v∗ T = min{yT , ¯v}

    v > g T : The objective function is −π+(v − gT ) − q(yT − v). By the first order condition, and assumption A3 −π+ + q′(yT − v) > 0, which implies v∗ T = min{yT , ¯v}. For θT = (T − T )¯v = 0, the procrastination charging rate (11) becomes v∗ T = ( yT , 0 < y T ≤ min{yT , gT } min{yT , ¯v}, min{yT , gT } ≤ yT ≤ ¯v = min{yt, ¯v}, and the Proposition 1 holds...

  50. [50]

    , gT ), consider a sequence of charging actions (˜vt,

    yt ≤ (T − t)¯v + min{¯v, gt}: For given a sequence of realizations of DG, (gt, . . . , gT ), consider a sequence of charging actions (˜vt, . . . ,˜vT ) with ˜vt = min {yt, gt, ¯v} − δ > 0 for δ > 0. Consider another sequence of charging actions (v∗ t , . . . , v∗ T ) such that v∗ t = ˜vt + δ andPT τ=t v∗ t =PT τ=t ˜vt − δ. Let the cumulative reward under ...

  51. [51]

    ,˜vT ) with ˜vt = yt − (T − t)¯v + δ < ¯v for δ > 0

    For yt > (T − t)¯v + min{¯v, gt}: Suppose a sequence (˜vt, . . . ,˜vT ) with ˜vt = yt − (T − t)¯v + δ < ¯v for δ > 0. Another sequence (v∗ t , . . . , v∗ T ) with v∗ t = ˜vt −δ = yt −(T − t)¯v has the cumulative reward R∗ t that satisfies : R∗ t = TX τ=t −Pπ(v∗ t − gt) = −Pπ(˜vt − gt) + π+δ + TX τ=t+1 −Pπ(v∗ t − gt) ≥ −Pπ(˜vt − gt) + π+δ + TX τ=t+1 −Pπ(˜v...