When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation
Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3
The pith
Large language models often fail to incorporate API updates into their code generation, resulting in low rates of executable output even when current specifications are supplied.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper constructs a benchmark of 270 real-world API updates across eight Python libraries and evaluates eleven models from four families on code generation under API deprecation, modification, and addition. Without comprehensive documentation, only 42.55 percent of the generated examples execute correctly in the target environment. Structured documentation and larger model scales raise this figure to 66.36 percent, and reasoning-based strategies such as Self-Reflection add a further 11 percent to the executable rate. The central observation is that outdated internal patterns continue to influence outputs even when explicit update information is provided.
What carries the argument
Context-memory conflict between an LLM's static parametric knowledge and external API update specifications, measured by whether generated code examples execute successfully in the target environment.
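The executability measure can be illustrated with a minimal harness. This is a sketch only, not the paper's actual evaluation setup; in particular, escalating DeprecationWarning to an error is an assumption about how deprecated calls might be scored as failures.

```python
import os
import subprocess
import sys
import tempfile


def is_executable(code: str, timeout: float = 10.0) -> bool:
    """Return True if a generated snippet runs to completion in the
    current interpreter. A rough stand-in for 'executable in the target
    environment'; treating DeprecationWarning as an error is an assumption."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-W", "error::DeprecationWarning", path],
            capture_output=True,
            timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)
```

Running each generated example through such a harness and averaging the boolean outcomes yields an executable rate of the kind the paper reports.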
If this is right
- Structured documentation improves LLMs' adoption of API changes but leaves more than one-third of outputs non-executable.
- Increasing model scale helps modestly yet does not remove the underlying conflict with outdated internal knowledge.
- Reasoning strategies such as Self-Reflection deliver an 11 percent gain in executable code on these tasks.
- The persistence of stale patterns indicates a need for benchmarks and techniques explicitly designed around ongoing API evolution.
Where Pith is reading between the lines
- Teams that integrate LLMs into development pipelines may need extra verification steps whenever libraries they depend on release updates.
- Training methods that allow continuous incorporation of new facts could reduce reliance on post-hoc retrieval for time-sensitive information.
- Similar knowledge conflicts are likely to appear in other generative tasks where facts change, such as legal drafting or medical advice.
- Repeating the evaluation on libraries from additional programming languages would show whether the observed rates are specific to Python or more general.
Load-bearing premise
The 270 updates drawn from eight libraries represent typical API evolution, and the rate at which generated code runs correctly captures the practical impact of knowledge conflicts on development work.
What would settle it
Run the same benchmark on a new set of libraries and models after supplying documentation that is both more complete and formatted differently; if the executable rate remains below 70 percent for the largest models, the conflict persists beyond the tested conditions.
Original abstract
The rapid evolution of software libraries creates a significant challenge for Large Language Models (LLMs), whose static parametric knowledge often becomes stale post-training. While retrieval-augmented generation (RAG) is commonly used to provide up-to-date API specifications, "context-memory conflict" arises when external instructions contradict a model's internal parametric knowledge. This paper presents a systematic empirical study of LLM code generation under API evolution (e.g., API deprecation, API modification, and API addition), by constructing a benchmark of 270 real-world updates from eight Python libraries. We evaluate four LLM families of 11 models. Our results show that without comprehensive documentation, LLMs struggle to prioritize external context, averaging only 42.55% of generated code examples are executable in the target environment. While structured documentation and larger model scales improve LLMs' ability to update adoption, they do not fully resolve executability issues with a low 66.36% executable rate. In addition, reasoning-based strategies (e.g., Self-Reflection) significantly boost LLMs' performance with 11% improvement on executable rate. Our findings highlight the persistence of outdated patterns from LLMs, even when API update specifications are provided, and emphasize the need for evolution-aware benchmarks and techniques.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts an empirical study of LLMs' code generation under API evolution in Python libraries, constructing a benchmark of 270 real-world updates (deprecations, modifications, additions) from eight libraries. It evaluates 11 models across four families and reports that LLMs achieve only 42.55% executability without documentation (rising to 66.36% with structured docs), with further gains from scale and reasoning strategies such as self-reflection (11% improvement). The central claim is that LLMs struggle to override stale parametric knowledge with external context even when API updates are provided, motivating evolution-aware benchmarks and techniques.
Significance. If the results hold under a properly validated benchmark, the work provides concrete evidence of a persistent limitation in RAG-augmented code generation for dynamic software ecosystems. The use of real-world API updates across multiple libraries and model families, combined with the evaluation of mitigation strategies, offers actionable insights for improving LLM reliability in software engineering tasks.
major comments (2)
- [Benchmark construction and evaluation protocol (likely §3–4)] The interpretation that low executability rates (42.55% without docs, 66.36% with) demonstrate failure to resolve context-memory conflicts requires that the 270 selected updates are breaking changes where old-API code fails in the target environment. The manuscript provides no explicit confirmation or table documenting that each update triggers runtime or deprecation errors under the evaluation setup; without this, the percentages may primarily reflect baseline code-generation quality rather than specific prioritization of external context.
- [Abstract and §4 (Experiments)] The abstract and results sections report precise executability percentages and an 11% improvement from self-reflection, yet supply no details on benchmark construction methodology, prompting templates, exact definition of 'executable' (syntax vs. functional correctness), target environment versions, or controls for selection bias in the 270 updates. These omissions make it impossible to assess whether the numbers robustly support the stated conclusions about knowledge conflicts.
minor comments (2)
- [§3] The paper would benefit from a table or appendix listing the eight libraries, the distribution of update types (deprecation/modification/addition), and the specific criteria used to verify that each update is a breaking change.
- [Evaluation metrics] Clarify whether 'executable' includes runtime success only or also checks for correct functional behavior against expected outputs; this distinction affects how strongly the results speak to practical workflow impact.
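The distinction this comment asks for can be made concrete. A minimal sketch, assuming a stdout-comparison convention for functional correctness; the function names are illustrative, not the paper's protocol:

```python
import subprocess
import sys


def runtime_ok(code: str) -> bool:
    """Runtime success only: the snippet exits without an exception."""
    r = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return r.returncode == 0


def functionally_ok(code: str, expected_stdout: str) -> bool:
    """Stricter check: the snippet must also print the expected output."""
    r = subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True)
    return r.returncode == 0 and r.stdout == expected_stdout
```

A snippet such as `print(sum(range(5)))` passes the runtime check regardless of what value was wanted, but passes the functional check only against the matching expected output; which of the two definitions the paper uses changes how strongly its percentages speak to practical impact.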
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our empirical study of LLM code generation under API evolution. The comments highlight important aspects of benchmark validation and methodological transparency that we have addressed in the revision.
Point-by-point responses
- Referee: [Benchmark construction and evaluation protocol (likely §3–4)] The interpretation that low executability rates (42.55% without docs, 66.36% with) demonstrate failure to resolve context-memory conflicts requires that the 270 selected updates are breaking changes where old-API code fails in the target environment. The manuscript provides no explicit confirmation or table documenting that each update triggers runtime or deprecation errors under the evaluation setup; without this, the percentages may primarily reflect baseline code-generation quality rather than specific prioritization of external context.
  Authors: We agree that confirming the updates as breaking changes is essential to isolate context-memory conflicts from general generation quality. The original selection drew from official release notes and deprecation warnings across the eight libraries, but we did not include per-update error documentation. In the revised manuscript, we have added Table 2 in §3.1, which lists for each update the specific runtime error (e.g., AttributeError, TypeError, or DeprecationWarning treated as failure) observed when executing outdated code in the target environment. We also describe the verification procedure: generating minimal old-API snippets and confirming failure before including the update. This directly supports that the reported executability gaps reflect prioritization of stale parametric knowledge over provided context. revision: yes
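The verification procedure the authors describe (run a minimal old-API snippet and confirm it fails before admitting the update) can be sketched with the standard library. Here `old_api` and `new_api` are hypothetical stand-ins, not functions from the benchmarked libraries:

```python
import warnings


def old_api():
    """Hypothetical stand-in for a deprecated library function."""
    warnings.warn("old_api is deprecated; use new_api", DeprecationWarning)
    return 42


def new_api():
    """Hypothetical replacement that emits no warning."""
    return 42


def is_breaking(fn) -> bool:
    """Admit an update only if the old-API call fails once
    DeprecationWarning is escalated to an error, mirroring the
    'DeprecationWarning treated as failure' rule in the rebuttal."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            fn()
        except Exception:  # includes the escalated DeprecationWarning
            return True
    return False
```

Only updates for which `is_breaking` returns True would enter the benchmark, which is what isolates context-memory conflict from baseline generation quality.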
- Referee: [Abstract and §4 (Experiments)] The abstract and results sections report precise executability percentages and an 11% improvement from self-reflection, yet supply no details on benchmark construction methodology, prompting templates, exact definition of 'executable' (syntax vs. functional correctness), target environment versions, or controls for selection bias in the 270 updates. These omissions make it impossible to assess whether the numbers robustly support the stated conclusions about knowledge conflicts.
  Authors: We acknowledge that the abstract and §4 omitted key methodological specifics needed for reproducibility and evaluation of robustness. The full construction details appear in §3, but we have now expanded the abstract with a concise description of the 270-update benchmark (stratified sampling from release notes of eight libraries, covering deprecation, modification, and addition). We added §3.3 on evaluation protocol, including: (i) exact prompting templates (now in Appendix A), (ii) definition of 'executable' as code that runs to completion without exceptions or deprecation-induced failures in the target environment, (iii) target versions (Python 3.9 with latest stable library releases), and (iv) bias controls via library-stratified and type-balanced sampling. These revisions allow direct assessment of whether the results demonstrate knowledge conflicts. revision: yes
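The library-stratified, type-balanced sampling the authors mention might look like the following sketch; the record fields, cell size, and seed are illustrative assumptions, not the paper's actual procedure:

```python
import random


def stratified_sample(updates, per_cell=5, seed=0):
    """Group candidate updates by (library, update type) and draw an
    equal-sized sample from each cell, so that no single library or
    update type dominates the benchmark."""
    rng = random.Random(seed)
    cells = {}
    for u in updates:
        cells.setdefault((u["library"], u["type"]), []).append(u)
    sample = []
    for _, items in sorted(cells.items()):
        items = list(items)
        rng.shuffle(items)
        sample.extend(items[:per_cell])
    return sample
```

With eight libraries and three update types, drawing equal cells in this way is one plausible route to a balanced pool from which the 270 updates could be selected.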
Circularity Check
No circularity: purely empirical reporting of observed executability rates
full rationale
The paper constructs a benchmark of 270 real-world API updates and reports direct experimental measurements of LLM code executability (42.55% without docs, 66.36% with structured docs, 11% gain from self-reflection). No equations, fitted parameters, derived predictions, or load-bearing self-citations appear in the derivation chain; all central claims are observational outcomes from the evaluation setup rather than reductions to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs possess static parametric knowledge that becomes stale after the training cutoff.
- domain assumption: Executability of generated code in the target environment is a valid measure of whether external context has overridden internal outdated knowledge.
Forward citations
Cited by 1 Pith paper
- When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
  Stale repository context in code RAG actively induces models to produce obsolete helper references, raising stale outputs by 76–88 percentage points over current-only retrieval in a 17-sample diagnostic study.