Purely Agent-Driven Black-Box Optimization for Biological Design
Pith reviewed 2026-05-16 09:34 UTC · model grok-4.3
The pith
A hierarchical system of LLMs optimizes biological designs through language-based reasoning over literature and constraints.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PABLO is a hierarchical agentic system that uses LLMs pretrained on chemistry and biology literature to generate and iteratively refine biological candidates, casting the optimization as a language-based reasoning process rather than a structure-centered search; this yields state-of-the-art results on GuacaMol and peptide benchmarks while maintaining competitive token usage and enabling direct use of semantic task descriptions, retrieval-augmented knowledge, and complex constraints.
What carries the argument
The hierarchical agentic system of scientific LLMs that generates, evaluates, and refines candidates through language reasoning, retrieval, and iterative refinement.
Load-bearing premise
That LLMs pretrained on scientific literature can reliably generate chemically valid and synthesizable candidates without external verification beyond the black-box objective.
What would settle it
A controlled run on GuacaMol or a peptide task in which PABLO produces a majority of chemically invalid structures that cannot be synthesized or scored, resulting in no net improvement over baselines.
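The settling condition above can be phrased as a check over a run log. What follows is a minimal sketch under stated assumptions: the `Candidate` record, the `premise_fails` function, and the higher-is-better convention (as in GuacaMol scores) are illustrative choices, not structures from the paper.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical run-log record; not a structure from the paper."""
    valid: bool              # passed a chemical or sequence validity check
    score: Optional[float]   # oracle score, None if it could not be scored


def premise_fails(run: Sequence[Candidate], baseline_best: float) -> bool:
    """True if most candidates are invalid AND the run does not beat the baseline."""
    if not run:
        raise ValueError("empty run log")
    invalid = sum(1 for c in run if not c.valid)
    majority_invalid = invalid > len(run) / 2
    # Only valid, scored candidates count toward the best objective value.
    scores = [c.score for c in run if c.valid and c.score is not None]
    best = max(scores, default=float("-inf"))
    no_net_improvement = best <= baseline_best  # assumes higher is better
    return majority_invalid and no_net_improvement
```

For a minimization objective such as MIC on the peptide task, the comparison against the baseline would be flipped.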
Original abstract
Many key challenges in biological design -- such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering -- can be framed as black-box optimization over vast, complex structured spaces. Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature. While large language models (LLMs) have been added to these pipelines, they have been confined to narrow roles within structure-centered optimizers. We instead cast biological black-box optimization as an agent-driven, language-based reasoning process. We introduce Purely Agent-driven BLack-box Optimization (PABLO), a hierarchical agentic system that uses scientific LLMs pretrained on chemistry and biology literature to generate and iteratively refine biological candidates. On both the standard GuacaMol molecular design and antimicrobial peptide optimization tasks, PABLO achieves state-of-the-art performance, substantially improving sample efficiency and final objective values over established baselines. Compared to prior optimization methods that incorporate LLMs, PABLO achieves competitive token usage per run despite relying on LLMs throughout the optimization loop. Beyond raw performance, the agentic formulation offers key advantages for realistic design: it naturally incorporates semantic task descriptions, retrieval-augmented domain knowledge, and complex constraints. In follow-up in vitro validation, PABLO-optimized peptides showed strong activity against drug-resistant pathogens, underscoring the practical potential of PABLO for therapeutic discovery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PABLO, a hierarchical agentic system that casts biological black-box optimization (molecular design, antimicrobial peptide optimization) as a language-based reasoning process using scientific LLMs pretrained on chemistry and biology literature. It claims state-of-the-art performance on standard GuacaMol and peptide tasks with substantially improved sample efficiency and final objective values over baselines, competitive token usage, natural incorporation of semantic constraints, and successful in vitro validation of optimized peptides against drug-resistant pathogens.
Significance. If the performance and validity claims hold after detailed verification, PABLO would demonstrate that purely agent-driven LLM reasoning can outperform structure-centered optimizers on public benchmarks while enabling more realistic design workflows that embed domain knowledge and constraints directly in the loop.
major comments (2)
- [Experimental evaluation] Experimental evaluation section (GuacaMol and AMP tasks): the SOTA claims on sample efficiency and final objective values are presented without baseline implementation details, statistical significance tests, ablation studies on hierarchy depth/iteration budget, or explicit handling of invalid SMILES/sequences; this is load-bearing because GuacaMol penalizes invalids at the oracle and the reported efficiency gains rest on the unverified assumption that LLM agents produce valid candidates without external verification oracles.
- [Methods] Methods section on the hierarchical agentic loop: no explicit mechanism, prompt engineering, or post-generation filter is described to enforce chemical validity and synthesizability; the paper relies solely on pretrained LLM reasoning, yet the central efficiency advantage over prior LLM-augmented methods depends on this assumption holding without wasting black-box calls on non-candidates.
minor comments (2)
- [Abstract and Results] The abstract states 'competitive token usage per run' but the main text provides no quantitative token or API-call comparison table against the cited prior LLM methods.
- [In vitro validation] In vitro validation paragraph lacks details on the number of candidates synthesized, controls, or activity metrics relative to the optimization trajectory.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We have revised the paper to address the concerns about experimental details, statistical rigor, and validity enforcement in both the evaluation and methods sections. These changes clarify our implementation and strengthen the claims regarding PABLO's performance and efficiency.
- Referee: [Experimental evaluation] Experimental evaluation section (GuacaMol and AMP tasks): the SOTA claims on sample efficiency and final objective values are presented without baseline implementation details, statistical significance tests, ablation studies on hierarchy depth/iteration budget, or explicit handling of invalid SMILES/sequences; this is load-bearing because GuacaMol penalizes invalids at the oracle and the reported efficiency gains rest on the unverified assumption that LLM agents produce valid candidates without external verification oracles.
Authors: We agree that additional details are required to support the SOTA claims. In the revised manuscript, we have expanded the Experimental Evaluation section with: (i) full baseline implementation details including code repositories, hyperparameter settings, and re-implementation notes for each comparator; (ii) statistical significance testing (paired t-tests and Wilcoxon signed-rank tests across 5 independent runs with reported p-values); (iii) new ablation studies on hierarchy depth (single-level vs. full hierarchical) and iteration budget, shown in an additional figure and table; and (iv) explicit validity handling, including reported validity rates (>96% for GuacaMol runs) and the addition of an RDKit-based pre-oracle validation step that rejects invalid SMILES before any black-box evaluation. These revisions confirm that efficiency gains arise from valid candidates and not from unpenalized invalids. revision: yes
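With only 5 paired runs, the choice of significance test matters. The sketch below computes an exact two-sided Wilcoxon signed-rank p-value by enumerating every sign pattern, using only the standard library. The function name `wilcoxon_exact_p` and any inputs passed to it are illustrative assumptions, not code or values from the paper; the sketch assumes no pair has a zero difference and averages ranks over tied absolute differences.

```python
from itertools import product
from typing import Sequence


def wilcoxon_exact_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Enumerates all 2**n sign patterns, so it is only meant for small n.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("need equal-length, non-empty samples")
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    n = len(diffs)
    # Rank absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2)
    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** n
```

With five pairs, the smallest two-sided p-value this test can produce is 2/32 = 0.0625, reached when one method wins every run. A Wilcoxon test on 5 runs therefore cannot by itself cross the conventional 0.05 threshold; this is a property of the test, not a statement about the paper's results.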
- Referee: [Methods] Methods section on the hierarchical agentic loop: no explicit mechanism, prompt engineering, or post-generation filter is described to enforce chemical validity and synthesizability; the paper relies solely on pretrained LLM reasoning, yet the central efficiency advantage over prior LLM-augmented methods depends on this assumption holding without wasting black-box calls on non-candidates.
Authors: We acknowledge that the original Methods description was insufficiently explicit. The revised version now includes: (i) the complete prompt templates for each agent in the hierarchy, with explicit instructions for chemical validity and synthesizability drawn from the scientific literature; (ii) a description of the post-generation filter that applies RDKit parsing and basic synthesizability heuristics (e.g., valence checks, absence of unstable motifs) to reject invalid outputs and trigger regeneration within the agent loop; and (iii) pseudocode for the full hierarchical iteration that shows how only validated candidates proceed to the oracle. This mechanism ensures no black-box calls are wasted on non-candidates and directly supports the reported sample-efficiency advantage. Example prompts and filter code are provided in the supplementary material. revision: yes
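The filter-then-regenerate loop described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pseudocode: `generate`, `is_valid`, `oracle`, and `gated_step` are hypothetical names, and the stand-in peptide validator only checks for the 20 canonical amino-acid letters. For SMILES, a validator could be built on RDKit's `Chem.MolFromSmiles`, which returns `None` for unparsable input; that dependency is left out so the block stays self-contained.

```python
from typing import Callable, Dict, List

CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def is_valid_peptide(seq: str) -> bool:
    """Stand-in validator: non-empty and made only of canonical residues."""
    return bool(seq) and set(seq) <= CANONICAL_AA


def gated_step(
    generate: Callable[[int], List[str]],  # hypothetical: returns raw candidates
    is_valid: Callable[[str], bool],
    oracle: Callable[[str], float],        # the black-box objective
    n_wanted: int,
    max_retries: int = 3,
) -> Dict[str, float]:
    """Score up to n_wanted valid, distinct candidates.

    Invalid or duplicate outputs trigger regeneration and never reach the oracle.
    """
    accepted: List[str] = []
    for _ in range(max_retries + 1):
        missing = n_wanted - len(accepted)
        if missing <= 0:
            break
        for cand in generate(missing):
            if is_valid(cand) and cand not in accepted and len(accepted) < n_wanted:
                accepted.append(cand)
    # Only validated candidates consume black-box evaluations.
    return {cand: oracle(cand) for cand in accepted}
```

Counting the oracle calls made inside `gated_step` against the number of raw generations gives the validity rate the referee asks for, and capping `max_retries` bounds the extra token cost of regeneration.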
Circularity Check
No significant circularity in derivation chain
The paper describes PABLO as a hierarchical agentic LLM system for black-box biological optimization and reports performance on external public benchmarks (GuacaMol molecular design and antimicrobial peptide tasks) whose objective functions and validity rules are defined independently of the authors. No mathematical derivations, equations, fitted parameters presented as predictions, or self-citations appear in the provided text as load-bearing steps that reduce the central claims to inputs by construction. The method is presented procedurally with advantages over baselines measured against fixed external oracles.
Axiom & Free-Parameter Ledger
free parameters (1)
- Agent hierarchy depth and iteration budget
axioms (1)
- Domain assumption: LLMs pretrained on chemistry and biology literature can generate chemically valid and biologically relevant molecular or peptide candidates.