Recognition: 1 theorem link · Lean theorem
Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution
Pith reviewed 2026-05-15 17:56 UTC · model grok-4.3
The pith
Parametric schemas co-evolve with an actor model to synthesize large-scale instruction-following coding data that lifts open models to match proprietary SOTA performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
IFCodeEvolve constructs a schema library that covers the instruction space via parametric function schemas and dynamic constraint instantiation. An MCTS sampler navigates this space with actor model feedback serving as the dynamic termination signal. A co-evolving paradigm then advances both the actor and the schema library through composition and mutation driven by sampler statistics, yielding data that significantly boosts base model performance on instruction-following coding tasks to the point that a 32B model achieves parity with proprietary state-of-the-art systems.
What carries the argument
The actor-parametric schema co-evolution framework, where instructions are encoded as parametric function schemas allowing dynamic constraint instantiation, navigated by MCTS with actor feedback and iteratively mutated for harder problems.
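To make this machinery concrete, here is a minimal, self-contained sketch of the sampling loop as we read it: nodes are sets of instantiated constraints, expansion adds one more constraint, and a stubbed actor_solves callback stands in for actor-model feedback. All names here (sample_hard_sets, actor_solves, uct) are ours, not the paper's; this is a sketch under those assumptions, not the authors' implementation.

```python
import math
import random

def uct(child, parent_visits, c=1.4):
    """Upper-confidence bound used during selection."""
    if child["visits"] == 0:
        return float("inf")
    return (child["value"] / child["visits"]
            + c * math.sqrt(math.log(parent_visits) / child["visits"]))

def sample_hard_sets(constraint_pool, actor_solves, budget=200, seed=0):
    """Hypothetical MCTS over constraint sets.

    The rollout is a single actor attempt: reward 1 if the actor still
    solves the problem under the constraint set, else 0. Actor failure
    drives a branch's value toward zero, which is how it plays the role
    of the paper's dynamic termination signal for that region.
    """
    rng = random.Random(seed)
    root = {"state": frozenset(), "children": [], "visits": 0, "value": 0.0}
    for _ in range(budget):
        node, path = root, [root]
        while node["children"]:                      # selection
            parent = node
            node = max(parent["children"],
                       key=lambda ch: uct(ch, parent["visits"]))
            path.append(node)
        unused = [x for x in constraint_pool if x not in node["state"]]
        if unused:                                   # expansion
            child = {"state": node["state"] | {rng.choice(unused)},
                     "children": [], "visits": 0, "value": 0.0}
            node["children"].append(child)
            path.append(child)
            node = child
        reward = 1.0 if actor_solves(node["state"]) else 0.0   # rollout
        for n in path:                               # backpropagation
            n["visits"] += 1
            n["value"] += reward
    return root

# Toy usage: a stand-in actor that fails once two constraints are combined.
tree = sample_hard_sets(["no for loops", "use a set", "one while loop"],
                        actor_solves=lambda s: len(s) < 2)
```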
Load-bearing premise
That representing instructions as parametric function schemas with dynamic constraint instantiation produces logically compatible combinations of multiple constraints without introducing inconsistencies or biases.
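The premise is easier to assess with a concrete data structure in hand. A minimal sketch, assuming a Python-style design; ParametricSchema, template, params, and instantiate are hypothetical names of our own, not the paper's API.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ParametricSchema:
    """A hypothetical instruction template with typed parameter slots.

    Instantiating the slots turns the template into one concrete
    natural-language constraint, e.g.
    'Do not use the built-in max function in your code'.
    """
    template: str                                # text with {slot} holes
    params: dict = field(default_factory=dict)   # slot name -> candidate values

    def instantiate(self, rng: random.Random) -> str:
        # Dynamic constraint instantiation: sample one value per slot.
        values = {name: rng.choice(options) for name, options in self.params.items()}
        return self.template.format(**values)

schema = ParametricSchema(
    template="Do not use the built-in {fn} function in your code",
    params={"fn": ["max", "sum", "sorted"]},
)
print(schema.instantiate(random.Random(0)))  # one concrete constraint
```

Nothing in this sketch enforces that two instantiated constraints remain jointly satisfiable, which is exactly the gap the premise asserts away.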
What would settle it
Train an otherwise identical base model on data generated by this method versus standard coding data. The premise would be undermined if instruction-following accuracy on multi-constraint problems showed no gain, or if the generated code exhibited logical errors traceable to incompatible constraints.
Original abstract
Interpreting and following human instructions is a critical capability of large language models (LLMs) in automatic programming. However, synthesizing large-scale instruction-paired coding data remains largely unexplored and is particularly challenging when ensuring logical compatibility among multiple constraints. In this study, we propose IFCodeEvolve, an actor-schema co-evolution framework for instruction following coding data generation. By representing instructions as parametric function schema, we construct a library that covers the vast instruction space via dynamic constraint instantiation. Building upon this, Monte Carlo Tree Search (MCTS) sampler is applied to efficiently navigate this space, utilizing actor model feedback as a dynamic termination signal. Furthermore, to progressively explore challenging problems, we introduce a co-evolving paradigm that iteratively advances both the actor model and the schema library, via schema composition and mutation, based on sampler statistics. Empirical results demonstrate that IFCodeEvolve significantly boosts base model performance, with our 32B model achieving parity with proprietary SOTA models. Additionally, we contribute IFCodeBench, a comprehensive human-verified benchmark equipped with solutions and robust AST-based verification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes IFCodeEvolve, an actor-schema co-evolution framework for synthesizing large-scale instruction-following coding data. Instructions are represented as parametric function schemas with dynamic constraint instantiation to cover the instruction space; MCTS sampling uses actor-model feedback as a termination signal, and a co-evolution loop iteratively refines both the actor and the schema library via composition and mutation based on sampler statistics. The central empirical claim is that fine-tuning on the resulting data significantly boosts base-model performance, with the authors' 32B model reaching parity with proprietary SOTA models; the paper also contributes the human-verified IFCodeBench benchmark with AST-based verification.
Significance. If the performance claims are substantiated, the work would be significant for the field of LLM-based code generation: it directly tackles the open problem of scalable, logically consistent instruction-paired data synthesis and demonstrates that open 32B models can match closed SOTA systems on instruction following. The introduction of a verified benchmark with robust AST checking would also provide a reusable resource for future research.
Major comments (2)
- [Abstract / Experimental Results] Abstract and Experimental Results section: the claim that the 32B model achieves parity with proprietary SOTA models is presented without any description of the experimental setup, baselines, evaluation metrics, number of runs, or error bars. This information is load-bearing for the central performance claim and must be supplied before the result can be assessed.
- [Method] Method section (schema co-evolution and dynamic constraint instantiation): the construction does not include an explicit check or proof that simultaneously instantiated constraints remain satisfiable. The termination signal comes only from downstream actor utility; without an upstream consistency verifier, it is possible that a non-negligible fraction of the synthetic pairs contain incompatible constraints, which would undermine the attribution of gains to the steerable synthesis method.
Minor comments (1)
- [Abstract] The abstract refers to 'IFCodeBench' but provides no citation or pointer to its release location or exact composition; this should be added for reproducibility.
Simulated Author's Rebuttal
Thank you for your valuable feedback on our manuscript. We address the major comments point by point below, agreeing that additional details and checks are warranted, and will update the paper accordingly.
Point-by-point responses
-
Referee: [Abstract / Experimental Results] Abstract and Experimental Results section: the claim that the 32B model achieves parity with proprietary SOTA models is presented without any description of the experimental setup, baselines, evaluation metrics, number of runs, or error bars. This information is load-bearing for the central performance claim and must be supplied before the result can be assessed.
Authors: We fully agree with this observation. The current manuscript provides insufficient detail on the experimental protocol supporting the performance claims. In the revised version, we will include a dedicated subsection in the Experimental Results that describes the full setup: the models compared (including specific proprietary SOTA systems), the evaluation metrics used on IFCodeBench, the number of independent runs performed, and statistical measures such as standard deviations or error bars. This will allow proper assessment of the claim that the 32B model reaches parity with closed-source SOTA. revision: yes
-
Referee: [Method] Method section (schema co-evolution and dynamic constraint instantiation): the construction does not include an explicit check or proof that simultaneously instantiated constraints remain satisfiable. The termination signal comes only from downstream actor utility; without an upstream consistency verifier, it is possible that a non-negligible fraction of the synthetic pairs contain incompatible constraints, which would undermine the attribution of gains to the steerable synthesis method.
Authors: This is a valid concern. Although the actor feedback in MCTS serves as a practical filter by assigning low utility to unsatisfiable or inconsistent cases, an explicit upstream verification would enhance reliability. We will revise the Method section to incorporate a consistency verification step during dynamic constraint instantiation. This verifier will check for logical compatibility of simultaneously instantiated constraints using a simple satisfiability solver or rule-based checks before proceeding with sampling. We believe this addition will strengthen the attribution of performance gains to the proposed synthesis method. revision: yes
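To illustrate the kind of upstream check being proposed, here is a minimal rule-based sketch (our construction, not the authors' code): each constraint declares the AST features it requires or forbids, and a set is flagged incompatible when one constraint requires what another forbids, as in the 'exactly one list comprehension' versus 'no for loops' example from the paper's appendix.

```python
# Hypothetical rule table: each constraint is summarized by the AST
# features it requires / forbids. A list comprehension counts as a for
# loop here, matching the paper's own incompatibility example.
CONSTRAINT_RULES = {
    "use exactly one list comprehension": {"requires": {"comprehension"}, "forbids": set()},
    "do not use any for loops":           {"requires": set(), "forbids": {"for_loop", "comprehension"}},
    "use the set data structure":         {"requires": {"set"}, "forbids": set()},
}

def compatible(constraints):
    """Return (ok, reason). Flags a clash when one constraint requires a
    feature another forbids; a SAT solver could replace this for richer rules."""
    required, forbidden = set(), set()
    for c in constraints:
        rule = CONSTRAINT_RULES[c]
        required |= rule["requires"]
        forbidden |= rule["forbids"]
    clash = required & forbidden
    return (not clash, f"clashing features: {sorted(clash)}" if clash else "ok")

print(compatible(["use exactly one list comprehension", "do not use any for loops"]))
# -> (False, "clashing features: ['comprehension']")
```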
Circularity Check
No circularity in empirical synthesis and evaluation pipeline
Full rationale
The paper describes an empirical data-generation pipeline (parametric schemas + MCTS + actor feedback + co-evolution) whose performance claims rest on downstream fine-tuning results and a separately contributed human-verified benchmark (IFCodeBench) with AST verification. No equations, predictions, or uniqueness claims are shown to reduce by construction to fitted inputs, self-citations, or ansatzes imported from prior author work. The central result is therefore an observed empirical outcome rather than a self-referential derivation.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tagged: unclear)
Rationale for the tag: the relation between the paper passage and the cited Recognition theorem is unclear.
Cited passage: "representing instructions as parametric function schema... MCTS sampler... actor-schema co-evolution... proof-by-construction instantiation"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- [2] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et al. Program synthesis with large language models. arXiv preprint arXiv:2108.07732, 2021.
- [3] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70):1–53, 2024.
- [4] Kaustubh Deshpande, Ved Sirdeshmukh, Johannes Baptist Mols, Lifeng Jin, Ed-Yeremai Hernandez-Cardona, Dean Lee, Jeremy Kritz, Willow E Primack, Summer Yue, and Chen Xing. MultiChallenge: A realistic multi-turn conversation evaluation benchmark challenging to frontier LLMs. In Findings of the Association for Computational Linguistics: ACL 2025, pages 18632–..., 2025.
- [5] Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, and Jingren Zhou. Self-play with execution feedback: Improving instruction-following capabilities of large language models. arXiv preprint arXiv:2406.13542, 2024.
- [6] Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou. ClassEval: A manually-crafted benchmark for evaluating LLMs on class-level code generation. arXiv preprint arXiv:2308.01861, 2023.
- [7] Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G Rittig, Kunyang Sun, Yikun Zhang, Zhangde Song, Bo Zhou, et al. Accelerating scientific discovery with autonomous goal-evolving agents. arXiv preprint arXiv:2512.21782, 2025.
- [8] Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence. arXiv preprint arXiv:2507.21046, 2025.
- [9] Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. DeepSeek-Coder: When the large language model meets programming, the rise of code intelligence. arXiv preprint arXiv:2401.14196, 2024.
- [10] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
- [11] Shengran Hu, Cong Lu, and Jeff Clune. Automated design of agentic systems. arXiv preprint arXiv:2408.08435, 2024.
- [12] Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, and Dong Yu. R-Zero: Self-evolving reasoning LLM from zero data. arXiv preprint arXiv:2508.05004, 2025.
- [13] Yuxin Jiang, Yufei Wang, Xingshan Zeng, Wanjun Zhong, Liangyou Li, Fei Mi, Lifeng Shang, Xin Jiang, Qun Liu, and Wei Wang. FollowBench: A multi-level fine-grained constraints following benchmark for large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4667–4688, 2024.
- [14] Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? arXiv preprint arXiv:2310.06770, 2023.
- [15] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
- [16] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022.
- [17] Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, et al. SE-Agent: Self-evolution trajectory optimization in multi-step reasoning with LLM-based agents. arXiv preprint arXiv:2508.02085, 2025.
- [18] Renze Lou, Kai Zhang, and Wenpeng Yin. A comprehensive survey on instruction following. arXiv preprint arXiv:2303.10475, 2023.
- [19] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-Refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36:46534–46594, 2023.
- [20] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.
- [21] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
- [22] Valentina Pyatkin, Saumya Malik, Victoria Graf, Hamish Ivison, Shengyi Huang, Pradeep Dasigi, Nathan Lambert, and Hannaneh Hajishirzi. Generalizing verifiable instruction following. arXiv preprint arXiv:2507.02833, 2025.
- [23] ByteDance Seed, Yuyu Zhang, Jing Su, Yifan Sun, Chenguang Xi, Xia Xiao, Shen Zheng, Anxiang Zhang, Kaibo Liu, Daoguang Zan, et al. Seed-Coder: Let the code model curate data for itself. arXiv preprint arXiv:2506.03524, 2025.
- [24] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
- [25] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- [26] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022.
- [27] Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A Smith, Daniel Khashabi, and Hannaneh Hajishirzi. Self-Instruct: Aligning language models with self-generated instructions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13484–13508, 2023.
- [28] Yunhui Xia, Wei Shen, Yan Wang, Jason Klein Liu, Huifeng Sun, Siyue Wu, Jian Hu, and Xiaolong Xu. LeetCodeDataset: A temporal dataset for robust evaluation and efficient training of code LLMs. arXiv preprint arXiv:2504.14655, 2025.
- [29] Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data. arXiv preprint arXiv:2405.14333, 2024.
- [30] Ran Xin, Chenguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, and Ming Ding. BFS-Prover: Scalable best-first tree search for LLM-based automatic theorem proving. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32588–32599, 2025.
- [31] Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Qingwei Lin, and Daxin Jiang. WizardLM: Empowering large pre-trained language models to follow complex instructions. In The Twelfth International Conference on Learning Representations, 2024.
- [32] Kaiwen Yan, Hongcheng Guo, Xuanqing Shi, Jingyi Xu, Yaonan Gu, and Zhoujun Li. CodeIF: Benchmarking the instruction-following capabilities of large language models for code generation. arXiv preprint arXiv:2502.19166, 2025.
- [33] Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V Le, Denny Zhou, and Xinyun Chen. Large language models as optimizers. In The Twelfth International Conference on Learning Representations, 2023.
- [34] Jian Yang, Wei Zhang, Shukai Liu, Linzheng Chai, Yingshui Tan, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou, Guanglin Niu, Zhoujun Li, et al. IFEvalCode: Controlled code generation. arXiv preprint arXiv:2507.22462, 2025.
- [35] Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Pan Lu, Zhi Huang, Carlos Guestrin, and James Zou. Optimizing generative AI by backpropagating language model feedback. Nature, 639(8055):609–616, 2025.
- [36] Daoguang Zan, Zhirong Huang, Wei Liu, Hanwu Chen, Linhao Zhang, Shulin Xin, Lu Chen, Qi Liu, Xiaojian Zhong, Aoyan Li, et al. Multi-SWE-bench: A multilingual benchmark for issue resolving. arXiv preprint arXiv:2504.02605, 2025.
- [37] Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D Goodman. STaR: Self-taught reasoner bootstrapping reasoning with reasoning. In Proceedings of the 36th International Conference on Neural Information Processing Systems, volume 1126, 2024.
- [38] Chengxiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In ACM SIGIR Forum, volume 51, pages 268–276. ACM, 2017.
- [39] Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, et al. AFlow: Automating agentic workflow generation. arXiv preprint arXiv:2410.10762, 2024.
- [40] Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, and Gao Huang. Absolute Zero: Reinforced self-play reasoning with zero data. arXiv preprint arXiv:2505.03335, 2025.
- [41] Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911, 2023.
Appendix excerpts
The extraction runs past the bibliography into the paper's appendix; the recoverable fragments are grouped below, keeping the source's bracketed numbering.
Appendix A, Limitation (truncated in the source): "A limitation of our current study is its focus on Python, chosen for its dominance in algorithmic reasoning benchmarks and comp..."
Example constraint instructions ([43]–[65]; [42] and [50] did not survive extraction)
- [43] It is forbidden to use the built-in 'max' function in your code.
- [44] The code must make use of list comprehension. (Followed in the source by example I/O for a name-score aggregation task: input [('Juan Whelan',90),('Sabah Colley',88),('Peter Nichols',7),('Juan Whelan',122),('Sabah Colley',84)] gives output ('Juan Whelan', 212); the second example's output is truncated.)
- [45] Develop the solution using Python.
- [46] Define a variable called found and set its initial value to False in your code.
- [47] Utilize the set data structure in your implementation.
- [48] Include a switch (or match/case) statement within the code.
- [49] Make sure to import the math library in your code. (Followed by an example for sum_Square(n): input 25 gives True since 3² + 4² = 25, input 24 gives False; the reference solution, truncated in the source, uses math.isqrt, a set of squares, and a found flag.)
- [51] Ensure every variable name in your code adheres to the snake_case naming convention.
- [52] Do not import the math library in your code.
- [53] Make use of list comprehension in your implementation.
- [54] Avoid using any for loops in your code.
- [55] Incorporate the set data structure in your solution.
- [56] Include exactly one while loop in your code. (Followed by an example for check_abundant(n): input 12 gives True, input 15 gives False; the reference solution, truncated in the source, collects divisors in a set inside a single while loop.)
- [57] Implement the solution using Python.
- [58] The code must include exactly one list comprehension and one while loop.
- [59] Ensure your code does not define a variable with the name "index".
- [60] It is mandatory to incorporate a switch (or match/case) statement in the code.
- [61] The solution must make use of a generator expression.
- [62] You must employ the built-in collections.Counter function.
- [63] A global variable named palindrome_check_enabled must be defined in the code.
- [64] Full type annotations (type hints) are required for every function and variable.
- [65] The built-in sum function is strictly prohibited from use in the solution. (Followed by an example for canPermutePalindrome(s: str) -> bool: "code" gives false, "aab" gives true, "carerac" gives true; the reference solution, truncated in the source, uses collections.Counter.)
Mutation prompt: inputs and workflow ([66]–[73])
Inputs:
- <question>: the original problem description, existing instructions, the reference solution (code), and the programming language.
- <Mutations>: a list of <Mutation> tags; each mutation is a template for a new instruction plus the <params> required to instantiate that template.
Workflow (reasoning steps performed inside a <thought> tag before the final XML output):
- Analyze the original code: understand the algorithm, complexity, and existing instructions of the seed code.
- Strategic parameter selection and applicability check: iterate through each provided <Mutation>; for each candidate, evaluate compatibility (does this mutation make sense for this problem and the current instructions?) and challenge level (how much does it force a refactor?); the selection strategy discards... (truncated in the source).
- Parameter instantiation: inspect the selected mutation's <params>. If empty, skip parameter selection and proceed directly to conflict detection; the instruction text is fixed. If not empty, maximize challenge by choosing parameter options that contradict the current implementation (e.g., ... truncated in the source).
- Conflict detection: ensure the selected mutation and its parameters do not contradict the original <instruction>; for example, 'Your code must utilize exactly 1 list comprehension.' and 'Your code must not use any for loops.' are incompatible with each other.
- Refactor code (only if successful): rewrite the reference code to strictly adhere to the new instantiated constraint; the modified code must produce the exact same output for the same inputs as the original code.
- Synthesize output: generate the <success> tag first. If <success>false</success>, stop after closing the tag and do not generate params or question. If <success>true</success>, generate <instantiated_params> and the modified <question>. If the input <Mutation>/<params> was empty, <instantiated_params> must also be empty (truncated in the source).
Mutation prompt: thought and output template ([74]–[76], reconstructed from letter-spaced extraction garble)
Evaluation: Mutation ID 1: [Compatibility: Yes/No] | [Challenge: Low/Med/High] | [Reasoning]; Mutation ID 2: ...
Decision: Selected Mutation ID [X] because ...
Parameter Selection: [Reasoning for chosen params]
Refactoring Strategy: [How the code will change]
</thought> <success>[true/false]</success> <instantiated_params> <param> <name>[Parameter Name, e.g., variable_name]</name> <value>[Selected Value, e.g., total_score]</value> </param> </in... (truncated in the source)
Evolved-instruction rules ([77]–[80])
- Subsumption: any program that satisfies the EVOLVED instruction must also satisfy the ORIGINAL instruction, and this subsumption must be explicitly justified; the evolved instruction must introduce at most one new AST-level constraint beyond those already enforced by the original instruction, and it should not be overly complicated.
- Examples-as-negative: all provided example programs must fail the evolved instruction, and these failures must arise from a general, structural, AST-level constraint; specific identifiers, literals, or fingerprints unique to the examples must not be referenced.
- AST-checkability: the evolved instruction must be checkable using static AST analysis; no runtime execution, no I/O, no performance, no semantic reasoning (a sketch follows this list).
- Generalization: the evolved instruction must generalize beyond the given examples; constraints that merely exclude the examples without structural meaning are forbidden, including mentioning or encoding specific variable names, constants, or literals that appear only in the examples, or referring to line counts, whitespace, formatting, or ... (truncated in the source).
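The AST-checkability rule implies verifiers of roughly this shape. Below is a minimal sketch using Python's standard ast module, covering three of the example constraints above; this is our simplification, not IFCodeBench's actual verifier.

```python
import ast

def check(code: str) -> dict:
    """Statically verify a few of the example constraints.
    Pure AST analysis: no execution, no I/O, per the AST-checkability rule."""
    nodes = list(ast.walk(ast.parse(code)))
    calls_max = any(isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                    and n.func.id == "max" for n in nodes)
    # Per the paper's incompatibility example, a comprehension counts as a for loop.
    has_for = any(isinstance(n, (ast.For, ast.AsyncFor, ast.comprehension))
                  for n in nodes)
    return {
        "[43] forbids built-in max":    not calls_max,
        "[53] uses list comprehension": any(isinstance(n, ast.ListComp) for n in nodes),
        "[54] avoids for loops":        not has_for,
    }

sample = "def biggest(xs):\n    return sorted([x for x in xs])[-1]\n"
print(check(sample))
# {'[43] forbids built-in max': True, '[53] uses list comprehension': True,
#  '[54] avoids for loops': False}  -- the comprehension itself contains a for
```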
Instruction-variation tools ([81]–[82]; a schema-level sketch follows the list)
- Rephrase: the main tool; use synonyms, change sentence structures, and vary the tone to express the exact same requirement.
- Combine: merge two or multiple instructions into a single instruction.
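Rephrase and Combine act on instruction text; at the schema level, the core claim describes the library advancing through composition and mutation. Here is a toy, self-contained sketch of what those two operators could look like over parametric schemas; the dict format and both function signatures are our invention, not the paper's.

```python
# Toy schemas in a hypothetical dict format: template text plus parameter slots.
s1 = {"template": "Do not use the built-in {fn} function in your code",
      "params": {"fn": ["max", "sum"]}}
s2 = {"template": "Include exactly {k} while loop(s) in your code",
      "params": {"k": ["one"]}}

def compose(a, b):
    """Composition (cf. [82] Combine): merge two schemas into one joint instruction."""
    tail = b["template"][0].lower() + b["template"][1:]
    return {"template": a["template"] + "; additionally, " + tail,
            "params": {**a["params"], **b["params"]}}

def mutate(schema, slot, extra_options):
    """Mutation: widen one parameter slot with harder options, so later
    instantiations can draw constraints the current actor has not seen."""
    out = {"template": schema["template"], "params": dict(schema["params"])}
    out["params"][slot] = out["params"][slot] + extra_options
    return out

combo = compose(s1, s2)
harder = mutate(combo, "fn", ["sorted"])
print(harder["template"])
print(harder["params"])   # {'fn': ['max', 'sum', 'sorted'], 'k': ['one']}
```

In the paper's loop, sampler statistics would decide which schemas get composed or mutated; this sketch leaves that policy out.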