Social Bias in LLM-Generated Code: Benchmark and Mitigation
Pith reviewed 2026-05-09 19:30 UTC · model grok-4.3
The pith
The Fairness Monitor Agent reduces social bias in LLM-generated code by 65.1 percent while improving functional correctness from 75.80 to 83.97 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Social bias reaches up to 60.58 percent in code generated by leading LLMs across seven demographic dimensions. Standard prompt interventions and diffuse fairness instructions often worsen outcomes. The Fairness Monitor Agent, inserted into any existing pipeline, first scopes which attributes the code should consider and which it must ignore, then detects and repairs violations without needing an executable test suite. Across all 343 tasks this reduces bias by 65.1 percent relative to a baseline developer agent and raises the rate of functionally correct code from 75.80 percent to 83.97 percent, outperforming the other approaches tested.
What carries the argument
The Fairness Monitor Agent, a plug-in module that extracts from the task description which demographic attributes the code should consider or restrict, then iteratively detects and corrects violations in the generated code.
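To make the mechanism concrete, here is a minimal sketch of how such a plug-in monitor could operate. It assumes a generic `llm` callable mapping a prompt string to a reply string; the prompts, the `FairnessScope` structure, and the iteration budget are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a Fairness-Monitor-style plug-in; the prompts, data types,
# and iteration budget are assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class FairnessScope:
    considered: list[str]   # attributes the task legitimately needs
    restricted: list[str]   # attributes the code must not branch on

def scope_attributes(llm: Callable[[str], str], task: str) -> FairnessScope:
    """Step 1: read the task description and decide which demographic
    attributes the generated code may consider and which it must ignore."""
    reply = llm(
        "For the task below, list demographic attributes the code should CONSIDER "
        "on one line and attributes it must RESTRICT on a second line, comma-separated.\n"
        f"Task: {task}"
    )
    def split(s: str) -> list[str]:
        return [a.strip() for a in s.split(",") if a.strip()]
    lines = (reply.splitlines() + ["", ""])[:2]
    return FairnessScope(considered=split(lines[0]), restricted=split(lines[1]))

def fairness_monitor(llm: Callable[[str], str], task: str, code: str,
                     max_rounds: int = 3) -> str:
    """Steps 2-3: iteratively detect and repair scope violations in the code,
    without executing it and without an executable test suite."""
    scope = scope_attributes(llm, task)
    for _ in range(max_rounds):
        findings = llm(
            f"Restricted attributes: {', '.join(scope.restricted) or 'none'}\n"
            "Does the code below branch on or reference any of them? "
            f"Reply NONE or describe each violation.\n{code}"
        )
        if findings.strip().upper().startswith("NONE"):
            break  # no remaining violations detected
        code = llm("Rewrite the code to remove these violations, preserving behavior "
                   f"otherwise:\n{findings}\n---\n{code}")
    return code
```

Because the monitor only reads the task description and the candidate source, a component of this shape can be appended after an existing developer agent without changing the base model or the rest of the pipeline.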
If this is right
- Bias reduction is possible by adding a targeted review step without changing the base model or generation process.
- Explicit fairness instructions given to every agent role increase bias compared with giving none.
- Early definition of which demographic attributes the code must ignore produces better fairness than later or distributed instructions.
- The method works without access to executable test cases or runtime verification.
Where Pith is reading between the lines
- A similar scoped-review component could be added to pipelines that generate other artifacts such as documentation or test cases.
- Separating fairness enforcement into one dedicated role may generalize better than embedding fairness prompts throughout an entire workflow.
- The same scoping logic could be applied to other constraints such as security or performance rules that are hard to test automatically.
Load-bearing premise
The 343 tasks and the Code Bias Score together capture the social biases that matter in real deployed software.
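The exact definition of the Code Bias Score is not reproduced on this page. As a hedged illustration only, a metric in this family could be computed as the percentage of generated solutions that reference an attribute the task scoped as restricted; the regex heuristic and example snippets below are assumptions, not the paper's metric.

```python
# Illustrative stand-in for a Code-Bias-Score-like metric; the paper's actual
# definition is not given here, so treat this detector as an assumption.
import re

def references_restricted(code: str, restricted: list[str]) -> bool:
    """Flag code that mentions any restricted attribute as a word-level token."""
    return any(re.search(rf"\b{re.escape(attr)}\b", code, re.IGNORECASE)
               for attr in restricted)

def code_bias_score(solutions: list[str], restricted: list[str]) -> float:
    """Percentage of generated solutions containing at least one violation."""
    if not solutions:
        return 0.0
    flagged = sum(references_restricted(code, restricted) for code in solutions)
    return 100.0 * flagged / len(solutions)

# Two of three candidate solutions branch on a restricted attribute -> ~66.7
print(code_bias_score(
    ["if gender == 'male': rate *= 1.1",
     "score = years_experience * weight",
     "if race == 'X': return reject(application)"],
    restricted=["gender", "race"],
))
```

Whether a detector of this kind captures the biases that matter in deployed software is precisely the premise flagged above.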
What would settle it
Applying the Fairness Monitor Agent to an independent set of production coding tasks and finding that bias scores do not drop by a comparable margin or that functional correctness does not increase.
original abstract
Large Language Models (LLMs) are increasingly deployed to generate code for human-centered applications where demographic fairness is critical. However, existing evaluations focus almost exclusively on functional correctness, leaving social bias in LLM-generated code largely unexamined. Extending our prior work on Solar, we conduct a comprehensive empirical study using SocialBias-Bench, a benchmark of 343 real-world coding tasks spanning seven demographic dimensions. We evaluate four prominent LLMs and find severe bias across all models, with Code Bias Scores reaching up to 60.58%. We further show that standard prompt-level interventions, such as Chain-of-Thought reasoning and fairness persona assignment, inadvertently amplify bias rather than reduce it. We then investigate whether structured multi-agent software process frameworks can improve fairness, finding that structured pipelines reduce bias when early roles correctly scope what the code should and should not consider. However, adding explicit fairness instructions to all agent roles produces worse outcomes than providing none, suggesting that diffused responsibility goes unaddressed. To address these limitations, we propose the Fairness Monitor Agent (FMA), a modular component that plugs into any existing code generation pipeline without modifying it. FMA analyzes the task description to determine which attributes should be considered or restricted, then detects and corrects violations through an iterative review process, without requiring an executable test suite. Evaluated on all 343 tasks, FMA reduces bias by 65.1% compared to a developer agent alone and improves functional correctness from 75.80% to 83.97%, outperforming all other studied approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SocialBias-Bench, a benchmark of 343 real-world coding tasks spanning seven demographic dimensions, to evaluate social bias in code generated by four prominent LLMs. It reports high bias levels (Code Bias Scores up to 60.58%), shows that standard prompt interventions like Chain-of-Thought and fairness personas can amplify bias, and finds that structured multi-agent pipelines reduce bias only when early roles properly scope demographic considerations. The authors propose the Fairness Monitor Agent (FMA), a modular plug-in component that analyzes tasks, detects violations, and corrects them iteratively without needing test suites; on the full benchmark, FMA reduces bias by 65.1% relative to a baseline developer agent while raising functional correctness from 75.80% to 83.97%.
Significance. If the benchmark and metric prove valid, the work is significant for software engineering and responsible AI: it fills a gap in fairness evaluation for code generation, demonstrates counter-intuitive effects of common interventions, and supplies a practical, non-intrusive mitigation that integrates with existing pipelines. The dual improvement in fairness and correctness is a notable strength.
major comments (2)
- [§3, §4] SocialBias-Bench construction and Code Bias Score definition: All headline quantitative results (65.1% bias reduction, correctness lift from 75.80% to 83.97%) rest exclusively on the Code Bias Score applied to the 343 tasks. The manuscript reports no inter-rater agreement with human fairness judgments, no coverage analysis across the seven demographic dimensions, and no comparison against existing bias benchmarks; without such validation the deltas could be artifacts of the chosen metric rather than evidence of improved real-world fairness.
- [§5] FMA evaluation: The claim that FMA outperforms all other studied approaches is load-bearing for the contribution, yet the text does not clarify whether the comparison agents received equivalent prompting resources or iteration budgets, nor does it report statistical significance tests or variance across the 343 tasks; this weakens the superiority conclusion.
minor comments (2)
- [Abstract] The abstract references 'extending our prior work on Solar' but the manuscript does not supply a citation or brief summary of that work, which would help readers situate the current contribution.
- [§3] Notation for the seven demographic dimensions is introduced without an explicit table or figure summarizing their distribution in the 343 tasks; adding this would improve clarity.
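A coverage summary of the kind requested is cheap to produce. Below is a minimal sketch, assuming the benchmark is stored as a JSONL file with one task per line and a `dimension` field; the file name and field name are hypothetical.

```python
# Hypothetical sketch of the requested per-dimension coverage table; the file
# and field names are assumptions about how SocialBias-Bench might be stored.
import json
from collections import Counter

with open("socialbias_bench.jsonl", encoding="utf-8") as f:
    tasks = [json.loads(line) for line in f if line.strip()]

counts = Counter(task["dimension"] for task in tasks)
for dimension, n in counts.most_common():
    print(f"{dimension:<20} {n:>4}  ({100 * n / len(tasks):5.1f}%)")
```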
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.
point-by-point responses
- Referee: [§3, §4] SocialBias-Bench construction and Code Bias Score definition: All headline quantitative results (65.1% bias reduction, correctness lift from 75.80% to 83.97%) rest exclusively on the Code Bias Score applied to the 343 tasks. The manuscript reports no inter-rater agreement with human fairness judgments, no coverage analysis across the seven demographic dimensions, and no comparison against existing bias benchmarks; without such validation the deltas could be artifacts of the chosen metric rather than evidence of improved real-world fairness.
Authors: We agree that further validation would strengthen the work. SocialBias-Bench consists of 343 manually curated tasks drawn from real-world coding scenarios across seven demographic dimensions, and the Code Bias Score quantifies inappropriate demographic references in generated code. We will add a coverage analysis table showing task distribution across dimensions and a new subsection relating the metric to prior fairness benchmarks in NLP and SE. However, a full inter-rater agreement study with human judges is not feasible within the current scope. We believe the observed deltas reflect genuine differences on the curated tasks rather than artifacts of the metric, but we accept the need for these additions. revision: partial
- Referee: [§5] FMA evaluation: The claim that FMA outperforms all other studied approaches is load-bearing for the contribution, yet the text does not clarify whether the comparison agents received equivalent prompting resources or iteration budgets, nor does it report statistical significance tests or variance across the 343 tasks; this weakens the superiority conclusion.
Authors: We thank the referee for highlighting this. All agents in the comparisons (baseline developer agent, Chain-of-Thought, persona-based, and other multi-agent pipelines) received equivalent prompting structures and iteration budgets, as specified in the experimental protocol. We will revise §5 to explicitly document this equivalence and add statistical significance tests (e.g., paired Wilcoxon tests) together with variance measures (standard deviation of bias and correctness scores across the 343 tasks); a minimal sketch of such a paired test follows this list. These changes will be presented in updated tables and text. revision: yes
- A complete inter-rater agreement study involving human fairness judgments on the 343 tasks would require new large-scale annotation that exceeds the resources and scope of the present manuscript.
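As referenced in the second response above, here is a minimal sketch of the paired test the authors commit to, assuming per-task bias scores for the baseline developer agent and for FMA are available as two aligned arrays; the numbers generated below are synthetic placeholders, not the paper's measurements.

```python
# Placeholder paired Wilcoxon signed-rank test over per-task scores; the data
# are synthetic stand-ins, not results from the paper.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
baseline_bias = rng.uniform(0.0, 100.0, size=343)   # per-task Code Bias Scores, developer agent alone
fma_bias = np.clip(baseline_bias * 0.35 + rng.normal(0.0, 5.0, 343), 0.0, 100.0)  # with FMA attached

res = wilcoxon(baseline_bias, fma_bias)
print(f"baseline mean±sd: {baseline_bias.mean():6.2f} ± {baseline_bias.std(ddof=1):5.2f}")
print(f"FMA      mean±sd: {fma_bias.mean():6.2f} ± {fma_bias.std(ddof=1):5.2f}")
print(f"Wilcoxon statistic = {res.statistic:.1f}, p = {res.pvalue:.2e}")
```

Reporting per-task standard deviations alongside the paired p-value would directly address the referee's variance concern.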
Circularity Check
Minor self-citation to prior Solar work; empirical evaluation otherwise self-contained
full rationale
This is an empirical study that introduces SocialBias-Bench (343 tasks) and the FMA agent, then reports measured bias reduction and correctness gains on that benchmark. No equations, fitted parameters, or derivations exist that could reduce to inputs by construction. The single self-reference to prior Solar work appears only as scene-setting and is not invoked to justify uniqueness, forbid alternatives, or carry any quantitative claim. All headline numbers (65.1% bias reduction, 75.80% to 83.97% correctness) are direct empirical outputs on the newly defined benchmark and metric, which is the normal, non-circular pattern for benchmark papers.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The Code Bias Score is an appropriate and sufficient metric for quantifying social bias in generated code.
- domain assumption: The 343 tasks in SocialBias-Bench represent a comprehensive and representative sample of real-world coding scenarios involving demographic considerations.
invented entities (1)
- Fairness Monitor Agent (FMA): no independent evidence
Forward citations
Cited by 4 Pith papers
- HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair. The HEJ-Robust benchmark shows LLM-based program repair models drop over 50% in accuracy when buggy code is rewritten with equivalent syntax.
- HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair. LLM-based Java program repair models lose over 50% of their bug-fixing success rate when presented with equivalent but syntactically varied buggy code.
- Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation. Many reported failures in LLM-based code translation are false negatives due to evaluation pipeline issues such as improper compilation flags, missing library links, and unconfigured runtime environments rather than incorrect translations.
- Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation. A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.