Progressive Generalization Augmentation with Deeply Coupled RND-PPO and Domain-Prioritized Noise Injection for Robust Crop Management Reinforcement Learning

Wu Yang

arxiv: 2605.17428 · v1 · pith:FJ4UBNEDnew · submitted 2026-05-17 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 13:45 UTC · model grok-4.3

keywords: reinforcement learning · crop management · progressive generalization · noise injection · PPO · agricultural simulation · robustness · yield optimization

The pith

A three-phase curriculum with coupled rewards and prioritized noise injection builds more robust RL policies for crop irrigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the performance drop in reinforcement learning agents for maize irrigation when they encounter temperature noise and other real-world measurement errors after clean training. It introduces a progressive schedule that starts with clean data, gradually adds perturbations, and ends with full augmentation, while tightly integrating intrinsic exploration rewards with the main task reward and directing noise toward the most sensitive state variables. A sympathetic reader would care because this targets a key barrier to deploying RL in agriculture, where sensors are imperfect and conditions change. Experiments in two simulated locations report yield gains over a prior method along with much higher retention when perturbations are applied at test time.

Core claim

The central claim is that three components together produce policies that improve yield and nitrogen use efficiency while retaining 94.4 percent of performance under combined perturbations, versus 80 percent for baselines. The components are Progressive Generalization Augmentation, a three-phase curriculum (clean for episodes 0-800, progressive for 800-1200, full augmentation for 1200-2000); a deeply coupled RND-PPO architecture with dual-channel GAE normalization, progress-decayed intrinsic coefficients, and semantic discretization; and hierarchical domain-prioritized noise injection.
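To make the three-phase schedule concrete, here is a minimal Python sketch. The phase boundaries (800 and 1200) come from the abstract; the function name `augmentation_strength` and the linear ramp during the progressive phase are assumptions made for illustration, since this page does not state how the paper interpolates between phases.

```python
def augmentation_strength(episode: int,
                          clean_end: int = 800,
                          ramp_end: int = 1200) -> float:
    """Return an augmentation strength alpha in [0, 1] for a training episode.

    Phase boundaries follow the abstract; the linear ramp between them
    is an assumption, not something the paper is known to specify.
    """
    if episode < clean_end:
        return 0.0  # phase 1: clean training, no perturbation
    if episode < ramp_end:
        # phase 2: progressive augmentation (assumed linear in episode)
        return (episode - clean_end) / (ramp_end - clean_end)
    return 1.0  # phase 3: full augmentation


for ep in (0, 799, 800, 1000, 1200, 1999):
    print(ep, augmentation_strength(ep))
```

Under this reading, the strength is zero for the first 800 episodes, reaches one half at episode 1000, and stays at one from episode 1200 onward.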

What carries the argument

A Progressive Generalization Augmentation curriculum that phases in perturbations, a deeply coupled RND-PPO that balances intrinsic and extrinsic signals, and domain-prioritized noise injection that directs noise at agriculturally sensitive variables.
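A small Python sketch may help picture the hierarchical activation. The thresholds (temperature at α > 0.3, rainfall at α > 0.5, soil moisture at α > 0.7) are taken from the paper passage quoted in the Lean-theorem section of this page, and the ±2 °C temperature width comes from the abstract. Everything else is hypothetical: the `inject_noise` helper, the state keys, the rainfall and soil-moisture widths, the uniform distribution, and the choice to scale the width by α.

```python
import random

# Thresholds quoted from the paper passage on this page.
ACTIVATION = {"temperature": 0.3, "rainfall": 0.5, "soil_moisture": 0.7}
# Only the 2.0 degC temperature width comes from the abstract;
# the other two widths are hypothetical placeholders.
MAX_NOISE = {"temperature": 2.0, "rainfall": 1.0, "soil_moisture": 0.01}


def inject_noise(state: dict, alpha: float, rng: random.Random) -> dict:
    """Perturb only the variables whose activation threshold alpha exceeds."""
    noisy = dict(state)  # copy so the caller's state is left untouched
    for name, threshold in ACTIVATION.items():
        if name in noisy and alpha > threshold:
            width = alpha * MAX_NOISE[name]  # assumed: width grows with alpha
            noisy[name] += rng.uniform(-width, width)
    return noisy


rng = random.Random(0)
state = {"temperature": 25.0, "rainfall": 3.0, "soil_moisture": 0.30}
print(inject_noise(state, alpha=0.2, rng=rng))  # below every threshold
print(inject_noise(state, alpha=0.6, rng=rng))  # temperature and rainfall active
```

With α = 0.6 only temperature and rainfall are perturbed; soil moisture, which the paper ranks as least sensitive, stays clean until α passes 0.7.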

If this is right

Policies achieve 8.43 percent higher yield and 16.42 percent better nitrogen use efficiency than BERT-DQN in Florida simulations.
Yield rises 5.61 percent in the Zaragoza location even though the economic score is 3.67 percent lower in that Mediterranean climate.
Performance retention reaches 94.4 percent versus 80 percent for standard approaches when combined perturbations are introduced at evaluation time.
All runs use five random seeds, 2000 episodes, and a 2048-step buffer on A100 GPUs.
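For readers who want to check the arithmetic behind figures like these, the sketch below shows one plausible way to compute them. The definitions are assumptions: this page does not give the paper's exact formula for "performance retention", so `retention` here is simply the perturbed score as a percentage of the clean score, and `relative_gain` is the usual percentage change over a baseline. The input numbers are illustrative, not results from the paper.

```python
def retention(clean_score: float, perturbed_score: float) -> float:
    """Perturbed score as a percentage of the clean score (assumed definition)."""
    return 100.0 * perturbed_score / clean_score


def relative_gain(new_score: float, baseline_score: float) -> float:
    """Percentage improvement of new_score over baseline_score."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Illustrative values only: a policy that loses 11.9% of its clean return
# under noise retains 88.1% of it under this definition.
print(round(retention(100.0, 88.1), 1))
print(round(relative_gain(108.43, 100.0), 2))
```

Under this definition, the 11.9 percent drop reported for clean-trained PPO under temperature noise corresponds to 88.1 percent retention for that single perturbation.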

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same phased curriculum and prioritized noise could be tested in other sensor-heavy RL domains such as robotic greenhouse control where variable sensitivities are known in advance.
Varying the activation hierarchy of the noise injection across additional crops or weather regimes might identify which state variables drive generalization most strongly.
Combining the method with direct sim-to-real transfer techniques could shorten the path from simulation results to field deployment.

Load-bearing premise

The temperature noise and state sensitivities simulated in gym-DSSAT match the differential measurement errors that appear in actual farm deployments, and the three-phase schedule transfers to locations beyond the two tested.

What would settle it

Apply the trained policy to real sensor data from a physical farm plot and measure whether the yield and efficiency gains over baseline RL policies appear at the magnitudes reported in simulation.

Original abstract

Our preliminary experiments on gym-DSSAT maize irrigation tasks revealed that +/-2 degrees C temperature noise causes an 11.9% reduction in economic returns for PPO policies trained under clean conditions - a systematic robustness deficit that existing research has not adequately addressed. This paper tackles three interconnected limitations impeding practical deployment of agricultural RL systems: the trade-off between early-stage learning efficiency and late-stage generalization capability; the naive additive combination of intrinsic and extrinsic rewards in exploration-augmented PPO; and uniform measurement noise injection strategies that disregard empirically validated differential sensitivity across agricultural state variables. We introduce three systematic innovations: Progressive Generalization Augmentation (PGA) implementing a three-phase curriculum (clean training 0-800 episodes, progressive 800-1200, full augmentation 1200-2000); a deeply coupled RND-PPO architecture with dual-channel GAE normalization, progress-decayed intrinsic coefficients, and semantic discretization; and domain-prioritized noise injection with hierarchical activation. Our experimental evaluation demonstrates: 8.43% yield improvement and 16.42% nitrogen use efficiency improvement over SOTA BERT-DQN in Florida; 5.61% yield improvement in Zaragoza (though 3.67% lower economic score due to challenging Mediterranean climate); and 94.4% vs 80.0% performance retention under combined perturbations. All experiments used 5 random seeds on NVIDIA A100 GPUs with 4.2+/-0.3 hours per run (2000 episodes, 2048-step buffer, 64 mini-batch size).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper builds a reasonable pipeline for noise-robust crop RL in simulation and reports specific gains over BERT-DQN, but the retention numbers rest on an unvalidated gym-DSSAT noise model.

This paper's main takeaway is that a three-phase curriculum plus a coupled RND-PPO setup and prioritized noise can lift retention under simulated perturbations from 80% to 94.4% while also showing yield and nitrogen-use gains over BERT-DQN in two locations. The integration looks like a practical engineering response to the robustness gap they measured in clean PPO runs.

What is actually new is the explicit three-phase PGA schedule, the dual-channel GAE normalization inside the RND-PPO coupling, and the hierarchical activation that weights noise by domain-known state sensitivities rather than adding it uniformly. Those choices directly target the three limitations they list, and the experimental section gives concrete numbers plus five random seeds and run-time details, which is better than many RL papers. The work is honest about the preliminary finding that temperature noise alone cuts returns by 11.9%.

The soft spots are proportionate to the claims. No error bars or statistical tests appear in the reported results, so it is hard to judge whether the 8.43% yield lift is reliable. More critically, the headline retention figure depends on the injected noise distributions matching real differential sensor errors in Florida and Zaragoza fields, yet the paper supplies no calibration data or field comparison to support that match. The phase boundaries and progress-decayed coefficients are also free parameters whose selection process is not shown.

This paper is for researchers working on agricultural RL or sim-to-real transfer who need concrete ideas for handling noisy state variables. A reader already building similar systems could extract usable architecture details. I would send it for peer review; the core methods are grounded enough to deserve a full check on reproducibility and external validation.

Referee Report

3 major / 2 minor

Summary. The paper introduces Progressive Generalization Augmentation (PGA) via a three-phase curriculum (clean training 0-800 episodes, progressive 800-1200, full augmentation 1200-2000), a deeply coupled RND-PPO architecture with dual-channel GAE normalization, progress-decayed intrinsic coefficients, and semantic discretization, plus domain-prioritized noise injection with hierarchical activation. It reports 8.43% yield and 16.42% nitrogen use efficiency gains over BERT-DQN in Florida, 5.61% yield gain in Zaragoza, and 94.4% vs. 80.0% performance retention under combined perturbations in gym-DSSAT maize irrigation tasks, using 5 random seeds.

Significance. If the robustness and generalization claims hold beyond the simulation, the work could meaningfully advance RL deployment in agriculture by mitigating sensitivity to measurement noise and improving sample efficiency across training phases. The explicit reporting of runtimes, seeds, and hardware supports reproducibility, and the focus on differential state-variable sensitivities addresses a practical gap in prior RL-for-crop work.

major comments (3)

[Experimental evaluation] The 94.4% vs. 80.0% retention claim under combined perturbations (abstract) is load-bearing for the robustness contribution, yet the manuscript provides no error bars, confidence intervals, or statistical tests despite using 5 random seeds; without these, it is impossible to determine whether the reported gap exceeds run-to-run variance.
[Domain-prioritized noise injection] Domain-prioritized noise injection (abstract and methods): the +/-2°C temperature noise that drops clean PPO returns by 11.9% and the hierarchical activation schedule are central to the generalization claims, but the paper contains no comparison of these injected distributions or state-variable sensitivities against empirical sensor calibration or variance data from Florida or Zaragoza field deployments.
[Progressive Generalization Augmentation] Three-phase PGA curriculum (abstract): the specific episode boundaries (0-800, 800-1200, 1200-2000) and progress-decayed intrinsic coefficients are presented as key innovations, yet no ablation or sensitivity analysis is shown to justify these choices or demonstrate transferability beyond the two tested locations.

minor comments (2)

[Abstract] The abstract states a 3.67% lower economic score in Zaragoza without clarifying whether this is statistically significant or how it affects the overall claim of practical utility.
[Deeply coupled RND-PPO] Notation for the dual-channel GAE normalization and semantic discretization is introduced without an accompanying equation or pseudocode block, making the deeply coupled RND-PPO architecture difficult to reimplement precisely.
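Since the last comment asks for pseudocode, one possible reading of "dual-channel GAE normalization" is sketched below in Python. This is an interpretation, not the author's implementation: it computes generalized advantage estimates (Schulman et al., reference [12]) separately for the extrinsic and intrinsic reward streams, standardizes each stream on its own, and mixes them with a weight `beta`. The function names, the separate value estimates per channel, and the mixing rule are all assumptions.

```python
from statistics import mean, pstdev


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one non-terminating rollout.

    `values` holds one more entry than `rewards` (the bootstrap value).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def normalize(xs, eps=1e-8):
    """Standardize to zero mean and (approximately) unit variance."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]


def dual_channel_advantage(r_ext, v_ext, r_int, v_int, beta):
    """Assumed coupling: normalize each channel separately, then mix."""
    a_ext = normalize(gae(r_ext, v_ext))
    a_int = normalize(gae(r_int, v_int))
    return [e + beta * i for e, i in zip(a_ext, a_int)]


# Hypothetical three-step rollout with zero value estimates.
print(dual_channel_advantage([1.0, 0.0, 2.0], [0.0] * 4,
                             [0.5, 0.1, 0.3], [0.0] * 4, beta=0.25))
```

Normalizing the two channels before mixing keeps a large-scale intrinsic bonus from swamping the task signal, which is one way to avoid the naive additive combination the abstract criticizes.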

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each of the major comments point by point below, providing clarifications and committing to specific revisions where appropriate to enhance the experimental rigor and justification of our methods.

Referee: [Experimental evaluation] The 94.4% vs. 80.0% retention claim under combined perturbations (abstract) is load-bearing for the robustness contribution, yet the manuscript provides no error bars, confidence intervals, or statistical tests despite using 5 random seeds; without these, it is impossible to determine whether the reported gap exceeds run-to-run variance.

Authors: We agree that the inclusion of error bars and statistical analysis is essential for substantiating the robustness claims. Although results are averaged over 5 random seeds, the manuscript does not explicitly report variability measures or significance tests. In the revised version, we will add error bars representing standard deviation across seeds to all relevant figures and tables. Additionally, we will perform and report statistical tests, such as independent t-tests, to determine if the difference in performance retention (94.4% vs. 80.0%) is statistically significant. This will allow readers to better assess the reliability of the reported gap beyond run-to-run variance.

revision: yes

Referee: [Domain-prioritized noise injection] Domain-prioritized noise injection (abstract and methods): the +/-2°C temperature noise that drops clean PPO returns by 11.9% and the hierarchical activation schedule are central to the generalization claims, but the paper contains no comparison of these injected distributions or state-variable sensitivities against empirical sensor calibration or variance data from Florida or Zaragoza field deployments.

Authors: The +/-2°C temperature perturbation was selected following preliminary experiments indicating an 11.9% drop in economic returns for standard PPO policies trained under clean conditions. The domain-prioritized approach and hierarchical activation were motivated by known differential sensitivities of crop state variables (e.g., temperature vs. soil moisture) in agricultural modeling. We acknowledge that the current manuscript does not include a direct mapping or comparison to empirical sensor variance data from the specific Florida and Zaragoza sites. To address this, we will expand the methods section with a discussion relating our noise levels to typical sensor accuracies reported in agricultural literature and clarify the simulation-based nature of the study. We note that obtaining and integrating site-specific field calibration data would require additional resources and is planned for future work.

revision: partial

Referee: [Progressive Generalization Augmentation] Three-phase PGA curriculum (abstract): the specific episode boundaries (0-800, 800-1200, 1200-2000) and progress-decayed intrinsic coefficients are presented as key innovations, yet no ablation or sensitivity analysis is shown to justify these choices or demonstrate transferability beyond the two tested locations.

Authors: The three-phase structure and associated hyperparameters were determined through iterative preliminary experiments aimed at optimizing the trade-off between sample efficiency in early training and generalization in later stages. We concur that providing ablation studies would strengthen the justification and show transferability. In the revised manuscript, we will include an ablation analysis varying the phase boundaries and decay rates, presenting comparative results on yield, efficiency, and robustness metrics for the Florida and Zaragoza environments. This will demonstrate the sensitivity of performance to these choices and support their applicability.

revision: yes
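The first response promises per-seed variability and independent t-tests. As a rough illustration of what that analysis might look like with five seeds per method, the Python sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom using only the standard library. Welch's variant is a choice made here, not something the rebuttal specifies, and the per-seed numbers are made up solely to exercise the function.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-seed retention percentages, NOT values from the paper.
method_a = [93.1, 95.6, 94.0, 92.8, 95.2]
method_b = [78.4, 82.3, 80.9, 77.5, 81.6]

t_stat, dof = welch_t(method_a, method_b)
print(f"A: {mean(method_a):.1f} +/- {stdev(method_a):.1f}")
print(f"B: {mean(method_b):.1f} +/- {stdev(method_b):.1f}")
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value needs the Student t distribution, which the standard library does not provide; a statistics package would be the natural next step.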

Circularity Check

0 steps flagged

No circularity: architectural proposals and experimental comparisons are independent of internal fits or self-referential definitions

The paper proposes three engineering innovations (PGA three-phase curriculum, deeply coupled RND-PPO with dual-channel GAE and progress-decayed coefficients, domain-prioritized noise injection) motivated by observed limitations in preliminary gym-DSSAT runs, then validates them through direct comparisons to external baselines such as BERT-DQN on yield, nitrogen efficiency, and perturbation retention metrics. No equations are presented that define a quantity in terms of itself, no fitted parameters are relabeled as predictions on the same data, and no uniqueness theorems or ansatzes are imported via self-citation. The reported gains (8.43% yield, 94.4% retention) rest on simulator-based evaluation against independent SOTA methods rather than any reduction to the authors' own prior outputs or internal tuning loops, rendering the derivation chain self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard RL assumptions plus domain-specific choices about noise sensitivity and curriculum timing that are not independently validated in the provided abstract.

free parameters (2)

Curriculum phase boundaries
0-800 clean, 800-1200 progressive, 1200-2000 full augmentation episodes chosen to trade early efficiency against late generalization.
Progress-decayed intrinsic coefficients
Coefficients that decay the weight of RND reward over training; values not stated in abstract.
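To illustrate what a progress-decayed intrinsic coefficient could look like, here is a minimal Python sketch. Because the abstract does not state the values or the decay shape, everything in it is an assumption: the function name `intrinsic_coefficient`, the start and end values, and the linear decay in training progress.

```python
def intrinsic_coefficient(progress: float,
                          beta_start: float = 0.5,
                          beta_end: float = 0.01) -> float:
    """Weight on the RND bonus as a function of training progress in [0, 1].

    beta_start, beta_end, and the linear shape are hypothetical placeholders.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp out-of-range inputs
    return beta_end + (beta_start - beta_end) * (1.0 - progress)


total_episodes = 2000  # episode budget reported in the abstract
for episode in (0, 500, 1000, 2000):
    print(episode, round(intrinsic_coefficient(episode / total_episodes), 4))
```

The design intent under this reading is that exploration bonuses dominate early, when the state space is unexplored, and fade as the policy is expected to exploit the task reward.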

axioms (1)

domain assumption: Agricultural state variables exhibit empirically validated differential sensitivity to measurement noise.
Invoked to justify hierarchical activation in domain-prioritized noise injection.

pith-pipeline@v0.9.0 · 5820 in / 1234 out tokens · 47702 ms · 2026-05-20T13:45:54.261466+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: three-phase curriculum (clean training 0-800 episodes, progressive 800-1200, full augmentation 1200-2000); domain-prioritized noise injection with hierarchical activation (temperature at α > 0.3, rainfall at α > 0.5, soil moisture at α > 0.7)

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: empirical sensitivity rankings: temperature noise −11.9%, rainfall −7.1%, soil moisture <1%

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 2 internal anchors

[1] J. Balderas, D. Chen, Y. Huang, L. Wang, and R.-C. Li, “A comparative study of deep reinforcement learning for crop production management,” arXiv:2411.04106, 2024.
[2] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proc. ICML, 2009, pp. 41–48.
[3] Y. Burda, H. Edwards, D. Pathak, A. Storkey, T. Darrell, and A. A. Efros, “Large-scale study of curiosity-driven learning,” in Proc. ICLR, 2019.
[4] D. Chen and Y. Huang, “Integrating reinforcement learning and large language models for crop production process management optimization and control through a new knowledge-based deep learning paradigm,” arXiv:2410.09680, 2024.
[5] R. Gautron, O.-A. Maillard, P. Preux, M. Corbeels, and R. Sabbadin, “Reinforcement learning for crop management support: Review, prospects and challenges,” Comput. Electron. Agric., vol. 200, p. 107182, 2022.
[6] R. Gautron, E. J. Padrón, P. Preux, J. Corbeels, and O.-A. Maillard, “gym-DSSAT: A crop management turned into a gym environment,” in Proc. AAAI Spring Symposium Series, 2022.
[7] J. W. Jones, G. Hoogenboom, C. H. Porter, K. J. Boote, W. D. Batchelor, L. A. Hunt, P. W. Wilkens, U. Singh, A. J. Gijsman, and J. T. Ritchie, “The DSSAT cropping system model,” Eur. J. Agron., vol. 18, nos. 3–4, pp. 235–265, 2003.
[8] J. W. Jones, J. Antle, B. Basso, K. Boote, J. Conant, I. Foster, A. J. Gijsman, C. H. Porter, M. E. P. G–rtner, L. R. Koo, J. L. Monteith, R. C. Ogoshi, A. C. Ruane, J. Sala, T. S. Sinclair, J. White, and G. Hoogenboom, “Brief history of agricultural systems modeling,” Agric. Syst., vol. 155, pp. 240–254, 2017.
[9] E. Kaufmann, A. Loquercio, R. Ranftl, M. Müller, V. Koltun, and D. Scaramuzza, “Deep drone acrobatics,” Robotics: Science and Systems (RSS), 2020.
[10] W. Malik, R. Dechmi, and J. Cavero, “DSSAT modeling to improve irrigation and nitrogen management in Mediterranean conditions,” Agric. Water Manag., vol. 213, pp. 298–311, 2019.
[11] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in Proc. IEEE ICRA, 2018, pp. 1–8.
[12] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” in Proc. ICLR, 2016.
[13] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv:1707.06347, 2017.
[14] F. Tao, B. Liu, Z. Liu, W. Liu, Q. Feng, and J. Zhang, “Optimizing irrigation and nitrogen fertilization for winter wheat production using the DSSAT-CERES-Wheat model,” Agric. Water Manag., vol. 262, p. 107420, 2022.
[15] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in Proc. IEEE/RSJ IROS, 2017, pp. 23–30.
[16] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and efficient foundation language models,” arXiv:2302.13971, 2023.
[17] W. Yang, “New knowledge-based deep learning Paradigm: Integrating reinforcement learning and large language models for crop production process management optimization and control,” arXiv:2408.12056, 2024.
