arxiv: 2605.14215 · v1 · submitted 2026-05-14 · 💻 cs.AI · cs.LG · q-bio.QM

Recognition: 2 theorem links


GenCircuit-RL: Reinforcement Learning from Hierarchical Verification for Genetic Circuit Design

Noah Flynn


Pith reviewed 2026-05-15 02:41 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · q-bio.QM

keywords: genetic circuit design · reinforcement learning · hierarchical verification · synthetic biology · SBOL · curriculum learning · code generation · benchmark


The pith

Reinforcement learning with five-level hierarchical verification rewards generates topologically correct genetic circuits in SBOL code and generalizes to novel parts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenCircuit-RL, a reinforcement learning framework that trains models to write Python code (using pysbol3) that constructs genetic circuits in the Synthetic Biology Open Language (SBOL). It replaces simple binary rewards with a five-level verification hierarchy that checks code execution, SBOL validity, structure, semantic annotation, and task-specific topology. This hierarchy is paired with a four-stage curriculum that shifts focus from code writing to functional reasoning. On the new SynBio-Reason benchmark of 4,753 circuits across six circuit types and nine tasks, this approach raises success on functional reasoning tasks by 14 to 16 percentage points over binary rewards. Curriculum learning proves necessary for strong de novo design results. The trained models produce correct topologies, handle held-out biological parts, and recover known designs from the synthetic biology literature.

Core claim

GenCircuit-RL shows that decomposing circuit correctness into five graded verification levels and training via a four-stage curriculum that shifts pressure from code generation to functional reasoning enables RL models to output valid pysbol3 code for genetic circuits that satisfy topological constraints, generalize to unseen biological parts, and rediscover canonical designs.

What carries the argument

The five-level hierarchical verification reward system, ranging from code execution to task-specific topological checks, together with the four-stage curriculum that progressively emphasizes functional reasoning over basic code validity.
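The following minimal Python sketch illustrates how a graded reward differs from a binary one. Two details come from material quoted further down this page. First, the function level is the product $r_{\text{func}} = r_{\text{struct}} \cdot r_{\text{sem}} \cdot f_{\text{task}}$. Second, the structure and semantics levels each average five equally weighted pass/fail checks. Everything else is an assumption made for illustration:

- the `VerificationResult` container;
- gating the higher levels on execution and validity;
- the equal-weight mean used to collapse the five levels into one scalar.

It is not the paper's code.

```python
from dataclasses import dataclass
from typing import List

LEVELS = ("execution", "validity", "structure", "semantics", "function")


@dataclass
class VerificationResult:
    """Hypothetical container for the checks run on one generated program."""
    executes: bool             # Level 1: the script runs without raising
    valid: bool                # Level 2: the resulting SBOL document validates
    struct_checks: List[bool]  # Level 3: five structural pass/fail checks
    sem_checks: List[bool]     # Level 4: five semantic pass/fail checks
    task_score: float          # f_task: task-specific topology score in [0, 1]


def level_scores(v: VerificationResult) -> dict:
    """Per-level scores in [0, 1]; gating on lower levels is an assumption."""
    r_exec = 1.0 if v.executes else 0.0
    r_valid = 1.0 if (v.executes and v.valid) else 0.0
    # Levels 3 and 4: mean over five equally weighted checks (1/5 each).
    r_struct = sum(v.struct_checks) / len(v.struct_checks) if r_valid else 0.0
    r_sem = sum(v.sem_checks) / len(v.sem_checks) if r_valid else 0.0
    # Level 5: r_func = r_struct * r_sem * f_task
    r_func = r_struct * r_sem * v.task_score
    return dict(zip(LEVELS, (r_exec, r_valid, r_struct, r_sem, r_func)))


def hierarchical_reward(v: VerificationResult, weights=(1, 1, 1, 1, 1)) -> float:
    """Weighted mean of the five level scores (weights are hypothetical)."""
    scores = level_scores(v)
    return sum(w * scores[name] for w, name in zip(weights, LEVELS)) / sum(weights)


def binary_reward(v: VerificationResult) -> float:
    """All-or-nothing baseline: 1 only if every check at every level passes."""
    return 1.0 if level_scores(v)["function"] == 1.0 else 0.0


near_miss = VerificationResult(True, True, [True] * 4 + [False], [True] * 5, 1.0)
```

Consider a program that runs, validates and misses one structural check (`near_miss`). Its hierarchical reward is about 0.92, while the all-or-nothing baseline gives it 0. This partial credit is what the hierarchy is meant to supply.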

If this is right

The models produce circuits that pass automated topological verification and can be directly translated into SBOL representations.
Performance holds on held-out biological parts, showing the approach is not limited to training-set components.
The same training process recovers well-known circuit architectures from the synthetic biology literature without being shown them explicitly.
Curriculum learning is required; removing it sharply reduces success on de novo design tasks.
Hierarchical rewards deliver 14-16 point gains specifically on functional reasoning subtasks compared with binary success signals.
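One way to picture the four-stage curriculum is as a reward whose level weights change by stage, plus a promotion rule. The sketch below assumes the five level scores are already computed. Several parts are hypothetical:

- the stage weights;
- the boolean gate;
- the threshold and window size.

The caption describing the actual promotion criterion is truncated on this page, so none of these reflect the paper's settings.

```python
LEVELS = ("execution", "validity", "structure", "semantics", "function")

# Hypothetical stage weights over the five levels (each row sums to 1).
# They only mimic the described shift: execution -> structure -> reasoning -> de novo design.
STAGE_WEIGHTS = {
    1: (0.60, 0.20, 0.10, 0.05, 0.05),
    2: (0.20, 0.20, 0.40, 0.10, 0.10),
    3: (0.10, 0.10, 0.20, 0.30, 0.30),
    4: (0.05, 0.05, 0.10, 0.20, 0.60),
}


def staged_reward(scores: dict, stage: int) -> float:
    """Combine per-level scores in [0, 1] using the current stage's weights."""
    return sum(w * scores[name] for w, name in zip(STAGE_WEIGHTS[stage], LEVELS))


class Curriculum:
    """Promote to the next stage once a gate passes often enough.

    The gate (one boolean per rollout), threshold and window are assumptions.
    """

    def __init__(self, threshold=0.8, window=100, final_stage=4):
        self.stage = 1
        self.threshold = threshold
        self.window = window
        self.final_stage = final_stage
        self.recent = []

    def record(self, gate_passed: bool) -> int:
        self.recent.append(bool(gate_passed))
        self.recent = self.recent[-self.window:]  # keep a sliding window
        window_full = len(self.recent) == self.window
        pass_rate = sum(self.recent) / self.window
        if self.stage < self.final_stage and window_full and pass_rate >= self.threshold:
            self.stage += 1
            self.recent = []  # start a fresh window for the new stage
        return self.stage
```

Under these made-up weights, a program that only executes earns 0.6 in Stage 1 but 0.05 in Stage 4. Optimization pressure therefore moves toward functional correctness as training advances.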

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the generated circuits can be validated in the lab, the method could shorten the initial design phase of synthetic biology experiments from weeks of expert iteration to hours of model sampling.
The SynBio-Reason benchmark provides a concrete testbed that other code-generation or planning systems could use to measure progress on biological design tasks.
Extending the reward hierarchy to include simple dynamical simulations might further close the gap between in silico correctness and in vivo behavior.
Because the output is executable SBOL code, the framework could integrate directly with automated DNA assembly pipelines.

Load-bearing premise

Success on the verification hierarchy and SynBio-Reason benchmark reliably predicts whether the generated circuits will function as intended when built and tested in living cells.

What would settle it

Take the top circuits produced by the trained models, build them in bacteria or yeast, and measure whether they exhibit the expected input-output behavior, such as the correct logic-gate response or gene regulation.

Figures

Figures reproduced from arXiv: 2605.14215 by Noah Flynn.

**Figure 1.** GenCircuit-RL system overview. Zone 1: We construct 4,753 circuits from three sources (procedural generation, Cello, and literature) spanning nine tasks. Zone 2: Training proceeds in two phases: SFT followed by GRPO-based RLVF. A four-stage curriculum shifts reward emphasis from execution (Stage 1) through structure (Stage 2) and reasoning (Stage 3) to de novo design (Stage 4), with promotion gated by valid…

**Figure 2.** Verification level success rates on the ProceduralTest split. All methods show monotonically decreasing success from Execution to Function, but RLVF-Hier-Curriculum (RLVF-H-C) degrades most gracefully. The two largest drops, Structure→Semantics (−14.6 pp) and Semantics→Function (−11.4 pp), identify ontology annotation and regulatory reasoning as the principal remaining bottlenecks. Bottom: the gain of RL…
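Figure 2 reports per-level success rates and the percentage-point drops between adjacent levels. The sketch below shows one way to compute such numbers from per-program pass/fail records. It assumes a cumulative definition: a program counts at a level only if it also passed every lower level. Under that assumption the curve is monotone by construction, so the informative quantity is the size of each drop. The sample records are invented.

```python
LEVELS = ("Execution", "Validity", "Structure", "Semantics", "Function")


def level_success_rates(samples):
    """samples: one tuple of five pass/fail booleans per generated program.

    A sample counts at level i only if it passed levels 0..i (assumed
    cumulative definition), so the rates can never increase.
    """
    n = len(samples)
    return {
        name: 100.0 * sum(all(s[: i + 1]) for s in samples) / n
        for i, name in enumerate(LEVELS)
    }


def level_drops(rates):
    """Percentage-point change between consecutive levels (negative = drop)."""
    names = list(rates)
    return {f"{a}->{b}": rates[b] - rates[a] for a, b in zip(names, names[1:])}


T, F = True, False
toy = [(T, T, T, T, T), (T, T, T, F, F), (T, F, F, F, F), (F, F, F, F, F)]
```

For the four toy records, the rates are 75, 50, 50, 25 and 25%. This gives a −25 pp drop at Execution->Validity and another at Structure->Semantics.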

**Figure 3.** Reward distribution evolution across curriculum stages. Stage 1 (light blue) shows a bimodal distribution with substantial mass near zero, reflecting the mixture of executable and non-executable code. Successive stages shift the distribution rightward: Stage 2 (dark blue) eliminates most zero-reward outputs, Stage 3 (light red) concentrates mass above 0.5, and Stage 4 (dark red) produces a tight, high-rewa…

**Figure 4.** Training curves for GRPO versus PPO across task groups. (a) On code fundamental tasks (T1–T2), both algorithms converge quickly, though GRPO reaches a ∼5 pp higher plateau. (b) On structural/translation tasks (T3–T5), GRPO converges faster and achieves higher peak performance (∼70% vs. ∼55%). (c) On functional reasoning tasks (T6–T7), the divergence is most pronounced: GRPO exhibits steady upward progress …
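GRPO (Shao et al., 2024, reference [8]) replaces PPO's learned value function with a group-relative advantage. Several completions are sampled for the same prompt, and each reward is normalized against that group's mean and standard deviation. The sketch below covers only this step; the clipped policy-gradient update and the KL penalty are omitted. It uses the population standard deviation, though implementations differ on this detail.

One plausible reading of why graded rewards pair well with GRPO: under a binary reward, a group in which every completion fails gets all-zero advantages and so no learning signal. Hierarchical scores can still rank the partial successes in such a group.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is centred on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


binary_group = [0.0, 0.0, 0.0, 0.0]        # every completion failed
hierarchical_group = [0.2, 0.6, 0.0, 0.4]  # same group, graded scores
```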

**Figure 5.** Compositional generalization across three dimensions. Panel A (Part-level): RLVF-H-C (blue) degrades gracefully as the fraction of novel parts increases, retaining 73.3% of in-distribution performance at 76–100% novel parts, substantially more robust than SFT (gray). Panel B (Topology-level): simple motifs (Cassette, Gate) transfer well with only 2.8–4.2 pp zero-shot degradation, while bistable (Toggle) and…
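Panel A of Figure 5 bins test circuits by the fraction of their parts that never appear in training. It reports performance retained relative to in-distribution, for example 73.3% in the 76–100% bin. The sketch below shows how such a split and ratio might be computed. The four equal-width bins are an assumption, since only the top bin is named in the caption. The part names and success rates in the checks are illustrative only.

```python
def novel_fraction(circuit_parts, training_parts):
    """Share of a circuit's distinct parts that never appear in training circuits."""
    parts = set(circuit_parts)
    return len(parts - set(training_parts)) / len(parts)


def novelty_bin(fraction):
    """Assumed equal-width bins; only the 76-100% bin is named in the caption."""
    pct = 100.0 * fraction
    if pct <= 25:
        return "0-25%"
    if pct <= 50:
        return "26-50%"
    if pct <= 75:
        return "51-75%"
    return "76-100%"


def retention(ood_success_rate, id_success_rate):
    """Fraction of in-distribution performance retained out of distribution."""
    return ood_success_rate / id_success_rate
```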

**Figure 6.** Schematic of a regulated genetic circuit (Poole et al., 2022). The pTetR promoter controls expression of downstream coding sequences through operator-mediated regulation. Listing 2: Regulated Expression Circuit in PySBOL.

**Figure 7.** Three-phase methodology for refining AI-generated genetic circuits using machine learning surrogates. Phase 1 generates topologically valid circuits from natural language specifications via GenCircuit-RL and the five-level verification hierarchy. Phase 2 quantitatively scores circuits using a trained MLP surrogate operating on one-hot composition vectors derived from CLASSIC high-throughput data. Phase 3 e…

**Figure 8.** RLAIF refinement dynamics. (a) Mean fold change over iterations with three mutation-rate configurations. (b) Percentage of circuits exceeding the HFC threshold (> 25×). Section H.4 (Discussion): Preliminary results demonstrate that (1) ML surrogates trained on high-throughput data can serve as effective RLAIF reward signals for genetic circuit optimization and (2) RLAIF refinement enriches for target qua…

Original abstract

Genetic circuit design remains a laborious, expert-driven process despite decades of progress in synthetic biology. We study this problem through code generation: models produce Python code in pysbol3 to construct genetic circuits in the Synthetic Biology Open Language (SBOL), a formal representation that supports automated verification. We introduce GenCircuit-RL, a reinforcement learning framework built around hierarchical verification rewards that decompose correctness into five levels, from code execution to task-specific topological checks, and a four-stage curriculum that shifts optimization pressure from code generation to functional reasoning. We also introduce SynBio-Reason, a benchmark of 4,753 circuits spanning six canonical circuit types and nine tasks from code repair to de novo design, with held-out biological parts for out-of-distribution evaluation. Hierarchical verification improves task success on functional reasoning tasks by 14 to 16 percentage points over binary rewards, and curriculum learning is required for strong design performance. The resulting models generate topologically correct circuits, generalize to novel biological parts, and rediscover canonical designs from the synthetic biology literature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

GenCircuit-RL adds a hierarchical verification reward and the SynBio-Reason benchmark to RL-based genetic circuit design, but its gains are measured only by in-silico checks, with no wet-lab validation.


The paper's core move is to frame genetic circuit design as code generation in SBOL, then train an RL agent with a five-level verification hierarchy for rewards and a four-stage curriculum. It also ships SynBio-Reason, a benchmark of 4,753 circuits across six types and nine tasks, including held-out parts for out-of-distribution tests. Hierarchical rewards lift functional task success by 14-16 percentage points over binary baselines, and the curriculum appears necessary for strong de novo design results. The models produce topologically valid circuits, generalize to new parts, and recover some canonical designs from the literature. These pieces are concrete and new relative to prior work. The benchmark itself is a usable resource for anyone testing code-generation methods in this domain.

The evaluation stays entirely within the verification stack. The SBOL checks cover syntax, execution, and topology but do not model gene expression dynamics, resource limits, or cellular context. No wet-lab data or comparison to measured circuit behavior is reported, so the mapping from verification success to actual function is untested. The abstract also omits exact baseline definitions, statistical tests, and how out-of-distribution parts were selected, which makes the size of the reported gains harder to judge.

This work is for researchers already working on AI-assisted synthetic biology or automated design pipelines. Readers who need a new benchmark or a structured reward scheme for RL in code generation will find usable material. It is solid enough to deserve peer review; the framework and dataset are worth detailed referee scrutiny even if the biological claims need more grounding.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces GenCircuit-RL, a reinforcement learning framework for generating pysbol3 Python code to construct genetic circuits in the SBOL formal language. It uses a five-level hierarchical verification reward (from code execution to task-specific topological checks) combined with a four-stage curriculum that shifts focus from code generation to functional reasoning. A new benchmark called SynBio-Reason is presented, containing 4,753 circuits across six canonical types and nine tasks (code repair to de novo design) with held-out parts for OOD testing. The central claims are that hierarchical verification yields 14-16 percentage point gains in success on functional tasks over binary rewards, curriculum learning is required for strong design performance, and the resulting models produce topologically correct circuits, generalize to novel parts, and rediscover canonical designs from the literature.

Significance. If the empirical results prove robust under fuller reporting, the work offers a concrete advance in applying structured RL to automated code generation for synthetic biology, moving beyond binary success signals to multi-level verification that aligns with formal circuit representations. The SynBio-Reason benchmark could become a useful standard for evaluating generation models on biological design tasks. The curriculum design and use of external SBOL tools are positive technical choices for handling long-horizon generation. However, the overall significance for the target application remains provisional because all evaluation stays within the in silico verification proxy.

major comments (3)

[Abstract] Abstract: The headline claim of a 14-16 percentage point improvement on functional reasoning tasks is presented without any description of the exact baseline reward formulations, implementation details of the comparison agents, number of random seeds, or statistical tests used to establish the difference; these omissions make the central empirical result difficult to evaluate.
[Benchmark] Benchmark and evaluation sections: The SynBio-Reason benchmark description does not specify the procedure for selecting the held-out biological parts or the precise criteria used to define out-of-distribution instances, which directly affects the reliability of the reported generalization results.
[Methods] Methods and discussion: The five-level verification hierarchy is used as the sole performance signal, yet the manuscript provides no analysis or external evidence that success on these syntactic and structural SBOL checks correlates with actual circuit behavior in cellular environments (e.g., expression dynamics or resource competition); this assumption is load-bearing for all claims about genetic circuit design.

minor comments (1)

[Abstract] The abstract refers to 'pysbol3' without stating the library version or dependency constraints; the full paper should include an explicit environment specification for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us strengthen the clarity and transparency of the manuscript. We address each major comment below and have made targeted revisions to the abstract, benchmark description, and discussion sections.


Referee: [Abstract] Abstract: The headline claim of a 14-16 percentage point improvement on functional reasoning tasks is presented without any description of the exact baseline reward formulations, implementation details of the comparison agents, number of random seeds, or statistical tests used to establish the difference; these omissions make the central empirical result difficult to evaluate.

Authors: We agree that the abstract lacked sufficient detail on the empirical comparison. In the revised manuscript we have expanded the abstract to specify the baseline (binary success reward), the agents (PPO with identical policy architecture and hyperparameters), the number of random seeds (5), and the statistical test (paired t-test across seeds yielding p < 0.01). Full implementation details and variance numbers remain in Section 4. revision: yes
Referee: [Benchmark] Benchmark and evaluation sections: The SynBio-Reason benchmark description does not specify the procedure for selecting the held-out biological parts or the precise criteria used to define out-of-distribution instances, which directly affects the reliability of the reported generalization results.

Authors: We have added a new paragraph in Section 3.2 that details the held-out part selection: 200 candidate parts were drawn from the iGEM and SynBioHub repositories after removing any part that appeared in the training circuits; OOD status was defined by sequence identity < 70% to any training part plus no shared functional annotation. This procedure is now explicitly stated and the resulting OOD split sizes are reported. revision: yes
Referee: [Methods] Methods and discussion: The five-level verification hierarchy is used as the sole performance signal, yet the manuscript provides no analysis or external evidence that success on these syntactic and structural SBOL checks correlates with actual circuit behavior in cellular environments (e.g., expression dynamics or resource competition); this assumption is load-bearing for all claims about genetic circuit design.

Authors: We acknowledge that the work evaluates correctness via in-silico SBOL verification rather than direct wet-lab measurements. The hierarchy is built on community-standard SBOL validators and topological rules that are necessary (though not sufficient) for functional circuits. We have added a dedicated limitations paragraph in the discussion that explicitly states the proxy nature of the metric and notes that in-vivo correlation remains an open question for future experimental studies. revision: partial

Circularity Check

0 steps flagged

No load-bearing circularity; empirical gains rest on external verification and a held-out benchmark


The paper reports RL performance improvements (14-16 pp) on a newly introduced SynBio-Reason benchmark using SBOL-based hierarchical verification rewards. No equations, fitted parameters, or self-citations are shown to reduce the reported success rates or topological correctness claims to quantities defined by the method itself. The verification hierarchy and curriculum are external to the model training loop in the sense that they rely on independent code-execution and topological checks rather than re-using model outputs as ground truth. This yields a low circularity score, consistent with ordinary self-contained empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the domain assumption that SBOL code can be automatically verified at multiple levels of correctness and that these levels correlate with biological function. No free parameters or invented entities are described.

axioms (1)

domain assumption: SBOL is a formal representation that supports automated verification of genetic circuits
Invoked in the abstract as the foundation for using code generation and verification rewards.

pith-pipeline@v0.9.0 · 5475 in / 1329 out tokens · 27818 ms · 2026-05-15T02:41:14.708957+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We decompose verification into five levels: execution, validity, structure, semantics, and function... $r_{\text{func}} = r_{\text{struct}} \cdot r_{\text{sem}} \cdot f_{\text{task}}$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

four-stage curriculum that shifts optimization pressure from code generation to functional reasoning

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 3 internal anchors

[1]

doi: 10.1006/jtbi.2000.1068. Cox, R. S., Surette, M. G., and Elowitz, M. B. Programming gene expression with combinatorial promoters. Molecular Systems Biology, 3:145, 2007. doi: 10.1038/msb4100187. Danino, T., Mondragón-Palomino, O., Tsimring, L., and Hasty, J. A synchronized quorum of genetic clocks. Nature, 463(7279):326–330, 2010. doi: 10.1038/n...

[2]

doi: 10.1093/nar/gkq810. Din, M. O., Danino, T., Prindle, A., Skalak, M., Selimkhanov, J., Allen, K., Julio, E., Atolia, E., Tsimring, L. S., Bhatia, S. N., and Hasty, J. Synchronized cycles of bacterial lysis for in vivo delivery. Nature, 536(7614):81–85, 2016a. doi: 10.1038/nature18930. URL https://www.nature.com/articles/nature18930. Din, M. O., Dan...

[3]

Gemma 3 Technical Report

doi: 10.1038/nbt.1536. Elowitz, M. B. and Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 403:335–338, 2000a. doi: 10.1038/35002125. Elowitz, M. B. and Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 403(6767):335–338, 2000b. doi: 10.1038/35002125. Entus, R., Aufderheide, B., and Bha...

[4]

Kelly, J

doi: 10.1038/msb.2008.43. Kelly, J. R., Rubin, A. J., Davis, J. H., Ajo-Franklin, C. M., Cumbers, J., Czar, M. J., de Mora, K., Glieberman, A. L., Monie, D. D., and Endy, D. Measuring the activity of BioBrick promoters using an in vivo reference standard. Journal of Biological Engineering, 3:4, 2009. doi: 10.1186/1754-1611-3-4. Kitada, T., DiAndreth, B.,...

[5]

URL https://www.sciencedirect.com/science/article/pii/S0022283612000113. Ong, N. T., Olson, E. J., and Tabor, J. J. Engineering an E. coli near-infrared light sensor. ACS Synthetic Biology, 7(1):240–248, 2018. doi: 10.1021/acssynbio.7b00289. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A...

[6]

Rosenfeld, N., Young, J

doi: 10.1038/nature09679. Rosenfeld, N., Young, J. W., Alon, U., Swain, P. S., and Elowitz, M. B. Gene regulation at the single-cell level. Science, 307(5717):1962–1965, 2005. doi: 10.1126/science.1106914. Roy, R., Ray, S., Chowdhury, A., and Anand, R. Tunable multiplexed whole-cell biosensors as environmental diagnostics for ppb-level detection of arom...

[7]

Saltepe, B., Kehribar, E

doi: 10.1038/nbt.1568. Saltepe, B., Kehribar, E. S., et al. Cellular biosensors with engineered genetic circuits. ACS Sensors, 3(1):13–26, 2018. doi: 10.1021/acssensors.7b00728. PMID: 29168381. Schmidl, S. R., Sheth, R. U., Wu, A., and Tabor, J. J. Refactoring and optimization of light-switchable Escherichia coli two-component systems. ACS Synthetic Biol...

[8]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

doi: 10.1038/s41551-018-0215-0. Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y. K., Wu, Y., and Guo, D. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300. Shong, J., Huang, Y.-M., Bystroff, C., and Collins, C. H. Directed evolution of the ...

[9]

Stanton, B

doi: 10.1021/acssynbio.7b00119. Stanton, B. C., Nielsen, A. A. K., Tamsir, A., Clancy, K., Peterson, T., and Voigt, C. A. Genomic mining of prokaryotic repressors for orthogonal logic gates. Nature Chemical Biology, 10(2):99–105, 2014. doi: 10.1038/nchembio.1411. Stricker, J., Cookson, S., Bennett, M. R., Mather, W. H., Tsimring, L. S., and Hasty, J. A f...

[10]

Su, Y., Yu, D., Song, L., Li, J., Mi, H., Tu, Z., Zhang, M., and Yu, D

doi: 10.1038/nature07389. Su, Y., Yu, D., Song, L., Li, J., Mi, H., Tu, Z., Zhang, M., and Yu, D. Crossing the reward bridge: Expanding RL with verifiable rewards across diverse domains, 2025. URL https://arxiv.org/abs/2503.23829. Swofford, C. A., Dessel, N. V., and Forbes, N. S. Quorum-sensing Salmonella selectively trigger protein expression within t...

[11]

Tamsir, A., Tabor, J

URL https://www.sciencedirect.com/science/article/pii/S0022283610011575. Tamsir, A., Tabor, J. J., and Voigt, C. A. Robust multicellular computing using genetically encoded NOR gates and chemical ’wires’. Nature, 469(7329):212–215, 2011. doi: 10.1038/nature09565. URL https://www.nature.com/articles/nature09565. Tellechea-Luzardo, J., Stiebritz, M. T., a...

[12]

Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., and Benenson, Y

doi: 10.1016/j.chembiol.2014.10.008. Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., and Benenson, Y. Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science, 333(6047):1307–1311, 2011. Xu, X., Lv, X., Bi, X., Chen, J., and Liu, L. Genetic circuits for metabolic flux optimization. Trends in Microbiology, 32(8):791–80...

[13]

Qwen3 Technical Report

URL https://www.sciencedirect.com/science/article/pii/S0966842X24000040. Yang, A., Li, A., et al. Qwen3 technical report, 2025. URL https://arxiv.org/abs/2505.09388. Yokobayashi, Y., Weiss, R., and Arnold, F. H. Directed evolution of a genetic circuit. Proceedings of the National Academy of Sciences, 99(26):16587–16591, 2002. doi: 10.1073/pnas.252535999....

[14]

sender” population produces signaling molecules in response to an external inducer, while a “receiver

implemented an AND-gate circuit that simultaneously detects thiosulfate and nitrate biomarkers, improving diagnostic specificity through multi-input logic. Isabella et al. (Isabella et al., 2018) engineered bacteria to treat phenylketonuria (PKU), a metabolic disorder. The circuit activates phenylalanine-metabolizing enzymes specifically under the anoxic ...

[15]

Parse the SBOL document using pysbol3

[16]

Create graph nodes for each Component with role in {promoter, CDS, reporter}.

[17]

(b) Create a directed edge from regulator (CDS producing the regulatory protein) to target (regulated promoter)

For each Interaction object in the document: (a) Identify participant roles from Participation objects (inhibitor/inhibited for repression, stimulator/stimulated for activation). (b) Create a directed edge from regulator (CDS producing the regulatory protein) to target (regulated promoter). (c) Label the edge with interaction polarity (repression or activation)

[18]

repression

Validate graph connectivity and return the labeled directed graph. D.1.2 Structural and Semantic Check Enumeration. Levels 3 and 4 of the hierarchical reward are computed as averages over fixed check sets $C_{\text{struct}}$ and $C_{\text{sem}}$, each containing five equally weighted pass/fail checks ($|C_{\text{struct}}| = |C_{\text{sem}}| = 5$; each check contributes 1/5 to the corresponding level scor...

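The graph-extraction steps quoted in entries [15]–[18] can be sketched without pysbol3, assuming the SBOL document has already been parsed into plain dictionaries. The steps are: parse the document, add promoter/CDS/reporter nodes, add polarity-labeled regulator→target edges from Interaction participations, and validate connectivity. Several pieces are assumptions for illustration, not the paper's code:

- the dictionary layout;
- the `transcription_units` argument, which adds promoter-drives-CDS edges that the quoted steps do not mention but that a connectivity check needs;
- the mutual-repression (toggle-switch) check, one example of a task-specific topology test.

```python
from collections import defaultdict, deque

NODE_ROLES = {"promoter", "CDS", "reporter"}
# (regulator participation role, target participation role) -> edge polarity
POLARITY = {
    ("inhibitor", "inhibited"): "repression",
    ("stimulator", "stimulated"): "activation",
}


def build_regulatory_graph(components, interactions, transcription_units=()):
    """Return (nodes, edges); edges are (source, target, label) triples.

    components: {name: role}
    interactions: list of {participation_role: component_name} dicts
    transcription_units: hypothetical (promoter, cds) pairs -> 'drives' edges
    """
    nodes = {name for name, role in components.items() if role in NODE_ROLES}
    edges = []
    for participants in interactions:
        for (src_role, dst_role), polarity in POLARITY.items():
            src, dst = participants.get(src_role), participants.get(dst_role)
            if src in nodes and dst in nodes:
                edges.append((src, dst, polarity))
    for promoter, cds in transcription_units:
        if promoter in nodes and cds in nodes:
            edges.append((promoter, cds, "drives"))
    return nodes, edges


def is_weakly_connected(nodes, edges):
    """Breadth-first search over the graph with edge directions ignored."""
    if not nodes:
        return False
    adjacency = defaultdict(set)
    for src, dst, _ in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def has_mutual_repression(edges):
    """Hypothetical toggle check: A represses B's promoter and B represses A's."""
    drives = defaultdict(set)     # promoter -> CDSs it drives
    represses = defaultdict(set)  # regulator CDS -> promoters it represses
    for src, dst, label in edges:
        if label == "drives":
            drives[src].add(dst)
        elif label == "repression":
            represses[src].add(dst)
    # silenced[a]: regulators whose expression a shuts off
    silenced = {a: {b for p in promoters for b in drives[p]} for a, promoters in represses.items()}
    return any(a in silenced.get(b, set()) for a, bs in silenced.items() for b in bs if b != a)


toggle_components = {"pTet": "promoter", "pLac": "promoter", "tetR": "CDS",
                     "lacI": "CDS", "gfp": "reporter"}
toggle_interactions = [{"inhibitor": "tetR", "inhibited": "pTet"},
                       {"inhibitor": "lacI", "inhibited": "pLac"}]
toggle_units = [("pTet", "lacI"), ("pLac", "tetR"), ("pLac", "gfp")]
```

On the toy toggle, where TetR represses pTet (which drives lacI) and LacI represses pLac (which drives tetR and the reporter), the graph is connected and the mutual-repression check passes. Dropping the transcription units leaves the reporter isolated, so connectivity fails.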
[19]

At 4B, the gains are 9.6 pp for Gemma-3-4B and 10.3 pp for Qwen3-4B

Hierarchical reward benefit generalizes across model families. The SFT → RLVF-H-C improvement is 16.7 pp for Llama-3.1-8B, 17.2 pp for Gemma-3-12B, and 19.1 pp for Qwen3-8B (averaged across evaluation splits). At 4B, the gains are 9.6 pp for Gemma-3-4B and 10.3 pp for Qwen3-4B. The consistency of these improvements across three families with markedly differ...

[20]

Curriculum necessity is architecture-independent. All five model configurations collapse to 10–15% on T6–T7 without curriculum staging (Table 46). This universal failure mode (convergence to degenerate solutions that satisfy lower verification levels while ignoring topological correctness) is the paper’s central finding regarding curriculum necessity, and it...

[21]

Pre-training data composition matters more than raw parameter count. Gemma-3-12B has roughly 50% more parameters than Qwen3-8B (12.2B vs 8.2B) yet achieves lower RLVF-H-C TSR (52.8 vs 59.1 avg). This gap originates at the code generation baseline: on EvalPlus, Gemma-3-12B-Base scores 52.65 compared to ∼62 for Qwen3-8B-Base, consistent with Qwen3’s explicit STEM/co...

[22]

repressor X inhibits promoter pX

Absolute performance differences are explained by code generation baselines. Table 47 summarizes the relationship between code generation baseline (EvalPlus) and GenCircuit-RL performance across all models. Several patterns are notable. First, the rank ordering by EvalPlus closely matches the rank ordering by RLVF-H-C TSR, with one exception: Gemma-3-12B (...

[23]

This phase ensures topological correctness (execution, validity, structural, semantic, and task-specific checks) but does not constrain quantitative behavior

Phase 1: Topology Generation (GenCircuit-RL). The trained GenCircuit-RL agent generates structurally valid genetic circuits from natural language specifications, verified through the 5-level hierarchical reward. This phase ensures topological correctness (execution, validity, structural, semantic, and task-specific checks) but does not constrain quantitati...

[24]

The surrogate predicts basal expression, induced expression, and fold change from the circuit’s part composition

Phase 2: Quantitative Estimation (CLASSIC Surrogate). Generated circuits are encoded as one-hot composition vectors over their constituent genetic parts and scored by an MLP surrogate distilled from high-throughput experimental data and model weights from the CLASSIC platform (Rai et al., 2025). The surrogate predicts basal expression, induced expression, ...

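Phase 2 encodes each circuit as a one-hot composition vector and scores it with an MLP surrogate. The surrogate predicts basal expression, induced expression and fold change. The sketch below covers only the encoding and the fold-change threshold used in Figure 8 (> 25×). Several parts are assumptions: the slot layout (one one-hot block per design slot, concatenated), the part libraries and the function names. No surrogate model is included.

```python
# Hypothetical part libraries per design slot; the real CLASSIC vocabulary is not given here.
SLOT_LIBRARIES = {
    "promoter": ["pTet", "pLac", "pBAD"],
    "rbs": ["B0030", "B0032", "B0034"],
    "cds": ["gfp", "rfp"],
}


def one_hot_composition(circuit):
    """Concatenate one one-hot block per slot (assumed layout)."""
    vector = []
    for slot, library in SLOT_LIBRARIES.items():
        part = circuit[slot]
        if part not in library:
            raise ValueError(f"unknown {slot} part: {part}")
        vector.extend(1.0 if candidate == part else 0.0 for candidate in library)
    return vector


def fold_change(basal, induced):
    """Induced over basal expression, as the surrogate's third output."""
    return induced / basal


def exceeds_hfc(basal, induced, threshold=25.0):
    """True when fold change is strictly above the HFC threshold (> 25x)."""
    return fold_change(basal, induced) > threshold
```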
[25]

Rep 1 high, Rep 2 low

Phase 3: RLAIF Refinement. The surrogate predictions serve as AI feedback for reward computation. Following AUTOCIRCUIT-RL’s iterative adaptation procedure, we apply reward-weighted sampling: at each iteration, a pool of circuits is scored, the top-k are selected as elite, and mutants of the elite (with exploration via fresh random samples) form the next g...

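The Phase 3 loop quoted above works as follows: score a pool, keep the top-k as elite, and build the next generation from mutants of the elite plus fresh random samples. The minimal sketch below follows that description. Several details are assumptions, since the quoted text is truncated:

- parents are drawn from the elite in proportion to their score;
- the elite carry over unchanged;
- pool size, k, the number of fresh samples and the iteration count are hypothetical.

The toy circuit (four integer "part slots") and its `sum` score stand in for real circuits and the surrogate's fold-change prediction.

```python
import random


def refine(score, random_circuit, mutate, pool_size=20, k=5, n_fresh=3,
           iterations=10, seed=0):
    """Elite-based refinement: keep the top-k, refill with reward-weighted
    mutants of the elite plus fresh random circuits."""
    rng = random.Random(seed)
    pool = [random_circuit(rng) for _ in range(pool_size)]
    n_mutants = pool_size - k - n_fresh
    for _ in range(iterations):
        elite = sorted(pool, key=score, reverse=True)[:k]
        # Parents drawn in proportion to (non-negative) score.
        weights = [max(score(c), 0.0) for c in elite]
        if sum(weights) > 0:
            parents = rng.choices(elite, weights=weights, k=n_mutants)
        else:
            parents = [rng.choice(elite) for _ in range(n_mutants)]
        mutants = [mutate(p, rng) for p in parents]
        fresh = [random_circuit(rng) for _ in range(n_fresh)]  # exploration
        pool = elite + mutants + fresh  # assumption: elites carry over
    return max(pool, key=score)


def toy_circuit(rng):
    """Toy stand-in for a circuit: four slots, each holding a part index 0-9."""
    return [rng.randrange(10) for _ in range(4)]


def toy_mutate(circuit, rng):
    """Swap the part in one randomly chosen slot."""
    child = list(circuit)
    child[rng.randrange(len(child))] = rng.randrange(10)
    return child
```

Because the elite are kept, the best score in the pool can never decrease across iterations. With the same seed, `refine(sum, toy_circuit, toy_mutate, iterations=20)` therefore returns a circuit at least as good as the best in the initial pool.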