Switching-time bioprocess control with pulse-width-modulated optogenetics

Sebastián Espinel-Ríos

arxiv: 2511.22893 · v2 · submitted 2025-11-28 · 📡 eess.SY · cs.AI · cs.SY

Pith reviewed 2026-05-17 05:00 UTC · model grok-4.3

classification: 📡 eess.SY · cs.AI · cs.SY

keywords: optogenetics · pulse-width modulation · reinforcement learning · bioprocess control · switching-time control · duty cycle · dynamic metabolic control

The pith

Duty-cycle parametrization lets reinforcement learning optimize switching times in pulse-width-modulated optogenetic bioprocess control without binary decision variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses cases where optogenetic gene expression responds steeply to light intensity, leaving little room for intermediate control when using amplitude alone. It shows that pulse-width modulation can smooth the average response by switching between fully on and fully off light within each forcing period. Rather than solving the resulting switching-time problem as a mixed-integer program on a fine time grid, the method parametrizes each control action by its duty cycle, a single continuous number that directly encodes the on-time fraction. Reinforcement learning then trains a policy on this continuous proxy, which respects the binary light constraint while keeping the decision space manageable even across many periods.

Core claim

Parametrizing control actions via the duty cycle as a continuous proxy variable encodes the ON-to-OFF switching time within each forcing period, thereby respecting the intrinsic binary nature of the light intensity while avoiding fine-grid binary decision variables.

What carries the argument

Duty cycle as a continuous proxy variable that stands in for the switching instant inside each pulse-width-modulation period.
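As a minimal sketch (not the author's code), the duty cycle $D_k \in [0, 1]$ of forcing period $k$ fixes the switching instant $\tau_k = (k + D_k)T$, and the binary light input follows directly from it: ON on $[kT, \tau_k)$ and OFF on $[\tau_k, (k+1)T)$. The function names `switching_time` and `light_input` are hypothetical, and a single light input is assumed.

```python
import math


def switching_time(k: int, duty_cycle: float, period: float) -> float:
    """Switching instant tau_k = (k + D_k) * T of forcing period k."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return (k + duty_cycle) * period


def light_input(t: float, duty_cycles: list[float], period: float) -> int:
    """Binary light input u(t): ON on [kT, tau_k), OFF on [tau_k, (k + 1)T)."""
    k = math.floor(t / period)  # index of the forcing period containing t
    if not 0 <= k < len(duty_cycles):
        raise ValueError("t lies outside the controlled horizon")
    return 1 if t < switching_time(k, duty_cycles[k], period) else 0


# Three forcing periods of length T = 10 with duty cycles 25 %, 50 %, 100 %.
duties = [0.25, 0.5, 1.0]
print([light_input(t, duties, 10.0) for t in (2.0, 3.0, 14.9, 15.0, 29.9)])
# prints [1, 0, 1, 0, 1]
```

One scalar per period is enough to reconstruct the whole binary waveform, which is why the decision space does not depend on how finely time is resolved inside a period.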

If this is right

The number of decision variables stays constant with respect to grid resolution inside each forcing period.
Control remains feasible for long sequences of forcing periods without combinatorial explosion.
Average gene-expression levels become tunable even when the underlying light-to-expression map is nearly step-like.
The same duty-cycle encoding can be reused across different optogenetic strains or bioprocess objectives.
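The first two points can be made concrete by counting. A minimal sketch, assuming a grid-based formulation with $M$ uniform sub-intervals per forcing period and one binary per sub-interval (an illustrative setup, not the paper's exact program); the counts describe the size of the search space, not solver runtimes.

```python
import math


def mip_binary_variables(num_periods: int, grid_points: int) -> int:
    """Grid-based MIP: one binary u_{k,j} per sub-interval j of each forcing period k."""
    return num_periods * grid_points


def duty_cycle_variables(num_periods: int) -> int:
    """Duty-cycle parametrization: one continuous D_k per period, independent of the grid."""
    return num_periods


def monotonic_patterns(num_periods: int, grid_points: int) -> int:
    """ON-then-OFF patterns on the grid: M + 1 choices (0..M ON sub-intervals) per period."""
    return (grid_points + 1) ** num_periods


def unrestricted_patterns(num_periods: int, grid_points: int) -> int:
    """All binary sequences on the grid when no monotonic constraint is imposed."""
    return 2 ** (num_periods * grid_points)


N = 20  # number of forcing periods (illustrative)
for M in (10, 100, 1000):
    log10_patterns = N * math.log10(M + 1)
    print(f"M={M:4d}: {mip_binary_variables(N, M):5d} binaries "
          f"(~1e{log10_patterns:.0f} ON-then-OFF patterns) vs {duty_cycle_variables(N)} duty cycles")
```

Even with the monotonic constraint, the number of feasible grid patterns grows as $(M+1)^N$, while the duty-cycle count stays at $N$.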

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to other binary-actuated systems where intermediate states are costly or impossible to implement directly.
Successful transfer would reduce reliance on specialized analog light sources in favor of simple on-off LEDs.
The method invites direct comparison with grid-based mixed-integer solvers on identical bioprocess models to quantify computational savings.

Load-bearing premise

A reinforcement learning policy trained on a simulated bioprocess model will transfer to real hardware without major performance loss or safety problems when the dose-response curve is steep.

What would settle it

Deploying the trained policy on physical optogenetic hardware and measuring whether process performance or safety metrics degrade sharply compared with simulation results.

Figures

Figures reproduced from arXiv: 2511.22893 by Sebastián Espinel-Ríos.

**Figure 1.** Comparison of the normalized average Hill activation function $\bar{q}^{*}_{p,k}$ over the forcing period $T_k$ under intensity-driven and PWM-driven actuation. The normalized average activation is defined as $\bar{q}^{*}_{p,k} := \bar{q}_{p,k}/q_{p,\max}$, yielding a range $[0, 1]$.

$$\bar{q}_{p,k} = \frac{1}{T}\int_{kT}^{(k+1)T} q_p(I(t))\,\mathrm{d}t = \frac{1}{T}\left[\int_{kT}^{\tau_k=(k+D_k)T} q_p(I_{\max})\,\mathrm{d}t + \int_{\tau_k=(k+D_k)T}^{(k+1)T} q_p(0)\,\mathrm{d}t\right] = D_k\,q_p(I_{\max}). \qquad (20)$$

Thus, it becomes clear that the avera…
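Equation (20) can be checked numerically. The last equality relies on $q_p(0) = 0$, which holds for a Hill activation without a basal term; the sketch below assumes that form with hypothetical parameters ($K = 0.5$, $n = 8$, $q_{p,\max} = 1$, $I_{\max} = 1$) chosen only to make the dose-response steep. It integrates one PWM period with the midpoint rule and compares the result with $D_k\,q_p(I_{\max})$ and with constant-intensity actuation at $u\,I_{\max}$.

```python
import numpy as np


def hill(intensity, q_max=1.0, half_sat=0.5, n=8.0):
    """Hill activation q_p(I) = q_max * I^n / (K^n + I^n), with q_p(0) = 0 (assumed form)."""
    intensity = np.asarray(intensity, dtype=float)
    return q_max * intensity**n / (half_sat**n + intensity**n)


def pwm_average_activation(duty_cycle, i_max=1.0, period=1.0, samples=100_000, **hill_kwargs):
    """Midpoint-rule estimate of (1/T) * integral of q_p(I(t)) over one forcing period."""
    t = (np.arange(samples) + 0.5) * (period / samples)   # midpoints on [0, T)
    light = np.where(t < duty_cycle * period, i_max, 0.0)  # ON until D*T, then OFF
    return float(hill(light, **hill_kwargs).mean())


# Normalized averages (q_max = 1): PWM is linear in D, constant intensity is nearly a step.
for u in (0.25, 0.5, 0.75):
    pwm = pwm_average_activation(u)
    closed_form = u * float(hill(1.0))  # D_k * q_p(I_max), as in (20)
    intensity = float(hill(u * 1.0))    # constant light at u * I_max
    print(f"u={u:.2f}  PWM={pwm:.4f}  D*q_p(Imax)={closed_form:.4f}  intensity={intensity:.4f}")
```

With $n = 8$ the intensity-driven column jumps from almost 0 to almost 1 between $u = 0.25$ and $u = 0.75$, whereas the PWM column grows linearly with the duty cycle.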

**Figure 2.** Return over training epochs for the RL poli…

**Figure 4.** Optimized light input trajectories obtained with…

Original abstract

Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowing fine-tuning of protein levels to unlock dynamic metabolic control and regulation of cell growth. Optogenetic systems can be actuated by light intensity. However, relying solely on intensity-driven control (i.e., signal amplitude) may fail to properly tune optogenetic bioprocesses when the dose-response relationship (i.e., light intensity versus gene-expression strength) is steep. In these cases, tunability is effectively constrained to either fully active or fully repressed gene expression, with little intermediate regulation. Pulse-width modulation can alleviate this issue by alternating between fully ON and OFF light intensity within forcing periods, thereby smoothing the average response and enhancing process controllability. Optimizing pulse-width-modulated optogenetics entails a switching-time optimal control problem with a binary input over multiple forcing periods. While this can be formulated as a mixed-integer optimization problem on a refined control grid with monotonic input constraints, the number of decision variables can grow rapidly with increasing control-grid resolution within forcing periods and with the total number of forcing periods, complicating the task. Here, we propose an alternative solution based on reinforcement learning. We parametrize control actions via the duty cycle, a continuous proxy variable that encodes the ON-to-OFF switching time within each forcing period, thereby respecting the intrinsic binary nature of the light intensity while avoiding fine-grid binary decision variables.
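To make the contrast concrete, here is one way to write the grid-based formulation, assuming each forcing period is split into $M$ equal sub-intervals with binary inputs $u_{k,j}$ (this notation is illustrative, not the paper's). The monotonic constraint enforces ON-then-OFF, and the duty cycle is then simply the fraction of ON sub-intervals:

```latex
\begin{aligned}
u(t) &= u_{k,j}, \qquad t \in \left[kT + \tfrac{j}{M}T,\; kT + \tfrac{j+1}{M}T\right), \quad j = 0,\dots,M-1,\\
u_{k,j} &\in \{0,1\}, \qquad u_{k,j+1} \le u_{k,j} \quad \text{(ON-then-OFF within period } k\text{)},\\
D_k &= \frac{1}{M}\sum_{j=0}^{M-1} u_{k,j} \in \left\{0, \tfrac{1}{M}, \dots, 1\right\}, \qquad \tau_k = (k + D_k)\,T.
\end{aligned}
```

For $N$ forcing periods this grid version carries $NM$ binary variables and can only realize duty cycles on the lattice $\{0, 1/M, \dots, 1\}$, whereas the duty-cycle parametrization uses $N$ continuous variables $D_k \in [0, 1]$.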

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a clean RL reformulation that uses the duty cycle to turn PWM optogenetic switching into a continuous-action problem, but it locks the input to one ON-then-OFF pattern per period.

The main takeaway is that the author recasts the switching-time problem for pulse-width-modulated optogenetics as a reinforcement learning task in which the action is the duty cycle. This keeps the light strictly binary while avoiding the combinatorial growth that comes from gridding each forcing period finely or stacking many periods in a mixed-integer program. It is a direct response to the practical issue that steep dose-response curves leave little room for intermediate control when you can only vary light intensity.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a reinforcement learning method to solve the switching-time optimal control problem arising in pulse-width-modulated optogenetic control of bioprocesses. Control actions are parametrized by the duty cycle, treated as a continuous proxy that encodes the ON-to-OFF switching instant within each forcing period; this respects the binary character of the light input while sidestepping the combinatorial growth of mixed-integer decision variables on a refined grid.

Significance. If the RL policy can be shown to produce competitive or superior trajectories relative to mixed-integer formulations and to transfer to hardware, the approach would supply a scalable, non-combinatorial route to fine regulation of gene expression in systems whose dose-response curves are too steep for intensity modulation alone. The work draws on standard RL theory and optimal-control formulations rather than introducing new theoretical machinery.

major comments (1)

[Abstract] The duty-cycle parametrization encodes exactly one ON-to-OFF transition per forcing period. The underlying switching-time problem, however, admits arbitrary binary sequences. When the bioprocess dynamics are nonlinear or possess memory on the scale of the forcing period, patterns such as OFF-ON or multiple switches within the same interval can produce a strictly superior average gene-expression trajectory. The manuscript must either demonstrate that the single-transition restriction is performance-neutral for the target models or quantify the sub-optimality gap.

minor comments (1)

[Abstract] The abstract states that the method 'avoids fine-grid binary decision variables' but does not specify the RL algorithm, state representation, or reward function; these details are needed to assess reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the insightful comment on the scope of the duty-cycle parametrization. We address it directly below.

Referee: [Abstract] The duty-cycle parametrization encodes exactly one ON-to-OFF transition per forcing period. The underlying switching-time problem, however, admits arbitrary binary sequences. When the bioprocess dynamics are nonlinear or possess memory on the scale of the forcing period, patterns such as OFF-ON or multiple switches within the same interval can produce a strictly superior average gene-expression trajectory. The manuscript must either demonstrate that the single-transition restriction is performance-neutral for the target models or quantify the sub-optimality gap.

Authors: We acknowledge that the duty-cycle approach restricts control to a single ON-to-OFF transition per forcing period. This restriction is deliberate: it implements standard pulse-width modulation, keeps the action space continuous, and avoids the combinatorial explosion of arbitrary binary sequences or multiple switches. For the bioprocess models in the manuscript, whose time scales are slower than the forcing period, we expect the gap to be small, but we accept the referee's point that this must be verified. In the revised manuscript, we will add a quantitative comparison, solving a mixed-integer program that permits multiple transitions on a coarse grid for representative cases and reporting the resulting sub-optimality gap relative to the duty-cycle policy.

Revision: yes
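The comparison proposed in the response can be prototyped on a toy problem. Everything in the sketch below is hypothetical: a linear expression model $\mathrm{d}p/\mathrm{d}t = k_{\mathrm{on}}u - \gamma p$, a setpoint-tracking cost, and exhaustive enumeration of every binary pattern on a coarse grid standing in for a mixed-integer solver (only viable for a handful of sub-intervals). Because the ON-then-OFF patterns are a subset of all patterns, the reported gap is non-negative by construction.

```python
import itertools
import math


def simulate(pattern, dt=0.25, k_on=1.0, gamma=0.8, p0=0.0):
    """Toy model dp/dt = k_on*u - gamma*p, solved exactly on each sub-interval of length dt."""
    decay = math.exp(-gamma * dt)
    p, trajectory = p0, []
    for u in pattern:
        p = p * decay + (k_on * u / gamma) * (1.0 - decay)
        trajectory.append(p)
    return trajectory


def tracking_cost(pattern, setpoint=0.6, **model_kwargs):
    """Sum of squared setpoint deviations at the end of every sub-interval."""
    return sum((p - setpoint) ** 2 for p in simulate(pattern, **model_kwargs))


def is_on_then_off(period_pattern):
    """True if the pattern is non-increasing: at most one ON-to-OFF switch (PWM)."""
    return all(a >= b for a, b in zip(period_pattern, period_pattern[1:]))


def suboptimality_gap(num_periods=2, grid_points=4, **cost_kwargs):
    """Best PWM-restricted cost minus best cost over all binary grid patterns."""
    best_any = best_pwm = math.inf
    for pattern in itertools.product((0, 1), repeat=num_periods * grid_points):
        cost = tracking_cost(pattern, **cost_kwargs)
        best_any = min(best_any, cost)
        chunks = [pattern[i * grid_points:(i + 1) * grid_points] for i in range(num_periods)]
        if all(is_on_then_off(chunk) for chunk in chunks):
            best_pwm = min(best_pwm, cost)
    return best_pwm - best_any


print(f"sub-optimality gap of the PWM restriction: {suboptimality_gap():.4g}")
```

A zero gap on such a toy model says nothing about the paper's models; the point is only that the check is cheap to run wherever the grid is coarse enough to enumerate or to hand to an MIP solver.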

Circularity Check

0 steps flagged

No significant circularity; modeling choice is independent of inputs

The paper proposes parametrizing control actions via duty cycle as a continuous proxy to encode ON-to-OFF switching time per forcing period. This is presented as an explicit modeling alternative to mixed-integer optimization on refined grids, drawing from standard reinforcement learning and optimal control formulations. No equations or claims reduce outputs to inputs by construction, no fitted parameters are renamed as predictions, and no self-citation chains or uniqueness theorems are invoked as load-bearing. The derivation chain remains self-contained against external benchmarks in RL and control theory.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the bioprocess dynamics can be adequately simulated for RL training and that the duty cycle fully captures the average gene-expression effect without additional unmodeled nonlinearities.

free parameters (1)

RL hyperparameters (learning rate, discount factor, network architecture)
Standard RL training parameters that must be chosen or tuned; not specified in abstract.

axioms (1)

domain assumption: The underlying bioprocess can be modeled as a Markov decision process with observable states and reward signals tied to production objectives.
Invoked when framing the control task as an RL problem.
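Under this assumption, one forcing period maps naturally to one MDP step with the duty cycle as the continuous action. A minimal sketch, assuming a hypothetical single-state model $\mathrm{d}p/\mathrm{d}t = k_{\mathrm{on}}u - \gamma p$ and a quadratic setpoint reward (neither is taken from the paper); `PWMExpressionEnv` and its parameters are illustrative, not a library API. Within each step the light is ON until $\tau_k = (k + D_k)T$ and OFF afterwards, and the state is propagated in closed form for each phase.

```python
import math
from dataclasses import dataclass


@dataclass
class PWMExpressionEnv:
    """Hypothetical MDP: one step = one forcing period, action = duty cycle D_k in [0, 1]."""
    period: float = 1.0     # forcing period T (assumed units)
    k_on: float = 1.0       # expression rate while light is ON, i.e. q_p(I_max) (assumed)
    gamma: float = 0.5      # first-order dilution/degradation rate (assumed)
    setpoint: float = 1.2   # target protein level (assumed)
    horizon: int = 20       # forcing periods per episode
    p: float = 0.0          # state: protein level
    k: int = 0              # state: index of the current forcing period

    def reset(self):
        self.p, self.k = 0.0, 0
        return (self.p, self.k)

    def _evolve(self, p, u, duration):
        # Exact solution of dp/dt = k_on*u - gamma*p over `duration` with constant u.
        decay = math.exp(-self.gamma * duration)
        return p * decay + (self.k_on * u / self.gamma) * (1.0 - decay)

    def step(self, duty_cycle):
        d = min(max(float(duty_cycle), 0.0), 1.0)       # clip the continuous action
        p = self._evolve(self.p, 1, d * self.period)     # ON on [kT, tau_k), tau_k = (k + D_k)T
        p = self._evolve(p, 0, (1.0 - d) * self.period)  # OFF on [tau_k, (k + 1)T)
        self.p, self.k = p, self.k + 1
        reward = -(self.p - self.setpoint) ** 2          # quadratic tracking reward (assumed)
        return (self.p, self.k), reward, self.k >= self.horizon


env = PWMExpressionEnv()
obs, done, episode_return = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(0.6)  # constant duty cycle as a placeholder policy
    episode_return += reward
print(obs, episode_return)
```

A trained policy, for example one obtained with a policy-gradient method, would replace the constant 0.6; it only ever emits one scalar in $[0, 1]$ per period, so the action dimension does not grow with the time resolution used inside the simulator.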

pith-pipeline@v0.9.0 · 5563 in / 1380 out tokens · 42061 ms · 2026-05-17T05:00:13.692091+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We parametrize control actions via the duty cycle, a continuous proxy variable that encodes the ON-to-OFF switching time within each forcing period, thereby respecting the intrinsic binary nature of the light intensity while avoiding fine-grid binary decision variables.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

$u_{i,k}(t) \in \{0,1\}$ ... $\tau_{i,k}(D_{i,k}) = (k + D_{i,k})\,T$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

