Felix Book, Arne Traue, Maximilian Schenke, Barnabas Haucke-Korber, and Oliver Wallscheid

Bonnet, Cl · 2023 · arXiv 2306.09884

3 Pith papers cite this work. Polarity classification has not yet been indexed.

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.

Gymnasium: A Standard Interface for Reinforcement Learning Environments

cs.LG · 2024-07-24 · accept · novelty 5.0

Gymnasium establishes a standardized API for RL environments to improve interoperability, reproducibility, and ease of development in reinforcement learning.

citing papers explorer

Showing 3 of 3 citing papers.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · none · ref 54

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

cs.LG · 2026-05-13 · unverdicted · none · ref 3

Gymnasium: A Standard Interface for Reinforcement Learning Environments

cs.LG · 2024-07-24 · accept · none · ref 4
