Multi-agent AI systems outperform human teams in creativity
Pith reviewed 2026-05-20 11:21 UTC · model grok-4.3
The pith
Multi-agent LLM teams generate more creative ideas than human teams by exploring semantic space more efficiently.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Multi-agent LLM teams not only surpass single agents, but also substantially outperform human teams in creativity (Cohen's d=1.50) across 4,541 multi-agent LLM ideas and 341 human-team ideas on six diverse problem-solving tasks. This advantage is driven by novelty while maintaining comparable usefulness. Both LLM and human teams produce more creative ideas when conversations range widely rather than staying centered on a single theme (low global coherence). However, the additional patterns that predict creativity differ: LLM teams benefit from efficient exploration (high semantic spread, shorter paths), while human teams benefit from maintaining smooth conversational flow (high local coherence, frequent pivots).
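For readers unfamiliar with the effect size quoted above, the following is a minimal sketch of how Cohen's d is conventionally computed from two groups of ratings, using the pooled sample standard deviation. The function name `cohens_d` and the toy ratings are assumptions made for illustration; they are not the paper's code or data, and the paper may have used a different variant of the statistic.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical creativity ratings, for illustration only (not data from the paper).
llm_scores = [6.0, 7.0, 8.0, 7.0, 7.0]
human_scores = [5.0, 6.0, 7.0, 6.0, 6.0]
d = cohens_d(llm_scores, human_scores)
print(round(d, 3))
```

A value of d=1.50 would mean the two group means sit one and a half pooled standard deviations apart.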
What carries the argument
Representation of team conversations as paths through semantic space using neural language model embeddings, which quantifies global coherence, semantic spread, local coherence, and path length to compare generative processes.
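The four trajectory features named above can be illustrated with a small sketch. The definitions used here are assumptions chosen for clarity, since this page does not state the paper's exact formulas: global coherence is taken as the mean cosine similarity of each turn to the conversation centroid, local coherence as the mean cosine similarity between consecutive turns, semantic spread as the mean Euclidean distance to the centroid, and path length as the summed Euclidean distance between consecutive turns. The function `path_metrics` and the two-dimensional toy vectors are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cosine(u, v):
    return _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))


def _distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def path_metrics(turns):
    """Summarize a conversation given one embedding vector per turn.

    Assumes at least two turns and no all-zero vectors.
    """
    n, dim = len(turns), len(turns[0])
    centroid = [sum(t[i] for t in turns) / n for i in range(dim)]
    steps = list(zip(turns, turns[1:]))  # consecutive turn pairs
    return {
        "global_coherence": sum(_cosine(t, centroid) for t in turns) / n,
        "local_coherence": sum(_cosine(a, b) for a, b in steps) / len(steps),
        "semantic_spread": sum(_distance(t, centroid) for t in turns) / n,
        "path_length": sum(_distance(a, b) for a, b in steps),
    }


# Toy 2-D "embeddings" that jump between two orthogonal themes.
toy_path = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
metrics = path_metrics(toy_path)
print(metrics)
```

In a real analysis the vectors would come from a sentence-embedding model; the toy path only shows how abrupt topic jumps drive local coherence toward zero while lengthening the path.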
If this is right
- LLM teams reach higher creativity via high semantic spread combined with shorter paths.
- Human teams reach higher creativity via high local coherence and frequent pivots.
- Model choice and discussion structure together explain 26.8 percent of variance in LLM conversational dynamics.
- These levers enable systematic design of multi-agent systems for greater creative output.
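The "26.8 percent of variance" figure is a variance-explained statistic. As a rough illustration of what such a number means, the sketch below computes R² for an additive two-factor model (model choice plus discussion structure) on a balanced, fully crossed toy design, where the least-squares fit reduces to row mean plus column mean minus grand mean. The function `additive_r_squared`, the factor labels, and the toy values are all hypothetical; the paper's actual regression specification is not given on this page.

```python
from statistics import mean


def additive_r_squared(cells):
    """R^2 of an additive two-factor fit.

    `cells` maps (model, structure) -> list of outcome values.
    Assumes a balanced, fully crossed design (same count in every cell).
    """
    all_values = [y for ys in cells.values() for y in ys]
    grand = mean(all_values)
    models = {m for m, _ in cells}
    structures = {s for _, s in cells}
    model_mean = {
        m: mean([y for (mm, _), ys in cells.items() if mm == m for y in ys])
        for m in models
    }
    struct_mean = {
        s: mean([y for (_, ss), ys in cells.items() if ss == s for y in ys])
        for s in structures
    }
    ss_total = sum((y - grand) ** 2 for y in all_values)
    ss_resid = sum(
        (y - (model_mean[m] + struct_mean[s] - grand)) ** 2
        for (m, s), ys in cells.items()
        for y in ys
    )
    return 1 - ss_resid / ss_total


# Hypothetical outcome values (e.g. a conversational-dynamics score per run).
toy_cells = {
    ("A", "open"): [1.0, 3.0],
    ("A", "turns"): [3.0, 5.0],
    ("B", "open"): [2.0, 4.0],
    ("B", "turns"): [4.0, 6.0],
}
r2 = additive_r_squared(toy_cells)
print(round(r2, 3))
```

An unbalanced design would need a proper least-squares fit rather than this marginal-means shortcut.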
Where Pith is reading between the lines
- Hybrid human-AI teams could combine human conversational flow with AI semantic spread to exceed either alone.
- The semantic path technique could be applied to study creativity in scientific or design collaborations beyond the tested tasks.
- Training objectives that reward efficient wide exploration might further increase LLM team performance.
Load-bearing premise
Creativity can be measured comparably for AI and human ideas through novelty and usefulness ratings without systematic bias from the chosen tasks or evaluation method.
What would settle it
Independent raters blind to idea source rate human-team ideas as equally or more novel than multi-agent LLM ideas on the same six tasks, or the performance gap vanishes when the tasks are replaced with new real-world innovation challenges.
Original abstract
Although artificial intelligence (AI) now matches or exceeds human performance across numerous cognitive tasks, creativity remains a highly contested frontier. As AI systems based on large language models (LLMs) are increasingly adopted in research and innovation, it is essential to understand and augment their creativity. Here we demonstrate that multi-agent LLM teams not only surpass single agents, but also substantially outperform human teams in creativity (Cohen's d=1.50) across 4,541 multi-agent LLM ideas and 341 human-team ideas on six diverse problem-solving tasks. This advantage is driven by novelty while maintaining comparable usefulness. To investigate the generative processes in both groups, we represent conversations as paths through semantic space using neural language model representations. Both LLM and human teams produce more creative ideas when conversations range widely rather than staying centered on a single theme (low global coherence). However, the additional patterns that predict creativity differ: LLM teams benefit from efficient exploration (high semantic spread, shorter paths), while human teams benefit from maintaining smooth conversational flow (high local coherence, frequent pivots). Additionally, we identify model choice and discussion structure as orthogonal design levers that together explain 26.8% of variance in LLM conversational dynamics, paving the way for systematic approaches to developing multi-agent systems with augmented creative capabilities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical comparison of creative idea generation by multi-agent LLM teams versus single LLMs and human teams across six problem-solving tasks. It reports that multi-agent LLM teams substantially outperform human teams in overall creativity (Cohen's d=1.50) based on pooled novelty and usefulness ratings of 4,541 LLM-generated ideas and 341 human-team ideas, with the advantage driven by novelty. The work further represents team conversations as paths in semantic embedding space, showing that both groups benefit from low global coherence but differ in other predictors (efficient exploration for LLMs, local coherence for humans), and identifies model choice plus discussion structure as levers explaining 26.8% of variance in LLM dynamics.
Significance. If the measurement and comparison protocols prove robust, the finding that multi-agent LLM systems can exceed human teams in creativity would be a notable contribution to AI-augmented innovation research, with direct implications for designing collaborative AI systems. The semantic-path representation of conversational trajectories provides a useful quantitative lens for comparing generative processes, and the variance decomposition offers actionable design insights. The scale of the LLM idea sample is a clear empirical strength.
major comments (3)
- [Methods] Methods section: the rating protocol description provides no inter-rater reliability statistics (e.g., ICC or Cohen's kappa), no details on rater blinding verification, and no source-guessing test or residual-source regression to quantify whether stylistic cues allowed raters to detect LLM versus human origin. This directly undermines confidence in the commensurability of novelty scores that drive the reported Cohen's d=1.50.
- [Results] Results section on creativity comparison: the headline effect size is presented without error bars, confidence intervals, or explicit reporting of data exclusion criteria and participant-matching procedures between the human-team (n=341 ideas) and LLM conditions. These omissions are load-bearing for interpreting the magnitude and generalizability of the claimed superiority.
- [Semantic path analysis] Semantic path analysis subsection: the assumption that the same embedding space represents LLM and human conversational trajectories without differential distortion is not tested (e.g., via domain-adaptation checks or cross-source alignment metrics), yet it underpins the differential predictor findings for LLM versus human teams.
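To make the second major comment concrete, here is a hedged sketch of the kind of interval the referee asks for. It uses the common large-sample approximation for the standard error of Cohen's d and plugs in the sample sizes reported on this page. It treats every idea as an independent observation, which is an assumption that probably does not hold (ideas are nested within teams and tasks), so the interval is illustrative and likely too narrow; it is not a result from the paper. The function name `cohens_d_ci` is hypothetical.

```python
import math


def cohens_d_ci(d, n_a, n_b, z=1.96):
    """Approximate confidence interval for Cohen's d (large-sample normal approximation)."""
    # Standard error approximation; assumes independent observations in both groups.
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se


# Reported effect size and idea counts; independence of ideas is assumed here.
low, high = cohens_d_ci(1.50, 4541, 341)
print(round(low, 3), round(high, 3))
```

Under these simplifying assumptions the interval runs from roughly 1.39 to 1.61. A cluster-aware method such as a team-level bootstrap would be the more defensible choice and would be expected to widen it.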
minor comments (2)
- [Abstract] Abstract: the six tasks are referred to as 'diverse' but not enumerated; adding their names would improve immediate clarity for readers.
- [Figures] Figure captions for semantic path visualizations: axis labels and color legends could be expanded to explicitly state what 'global coherence' and 'semantic spread' quantify in the embedding space.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important aspects of methodological transparency and analytical rigor. We address each major comment below and outline revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Methods] Methods section: the rating protocol description provides no inter-rater reliability statistics (e.g., ICC or Cohen's kappa), no details on rater blinding verification, and no source-guessing test or residual-source regression to quantify whether stylistic cues allowed raters to detect LLM versus human origin. This directly undermines confidence in the commensurability of novelty scores that drive the reported Cohen's d=1.50.
  Authors: We agree that inter-rater reliability and blinding details are essential for validating the novelty and usefulness ratings. In the revised manuscript, we will report intraclass correlation coefficients (ICC) for the ratings across the four raters. We will also add explicit confirmation that raters were blinded to idea origin (LLM vs. human) and include a source-guessing test on a subset of ideas to quantify any detectable stylistic cues, along with residual-source regression if appropriate. These additions will directly support the commensurability of the scores underlying the d=1.50 effect.
  Revision: yes
- Referee: [Results] Results section on creativity comparison: the headline effect size is presented without error bars, confidence intervals, or explicit reporting of data exclusion criteria and participant-matching procedures between the human-team (n=341 ideas) and LLM conditions. These omissions are load-bearing for interpreting the magnitude and generalizability of the claimed superiority.
  Authors: We acknowledge that error bars, confidence intervals, and detailed exclusion/matching criteria are necessary for proper interpretation. The revised results section will include 95% confidence intervals for Cohen's d and mean novelty/usefulness scores. We will also explicitly report all data exclusion criteria applied to both the 4,541 LLM ideas and 341 human ideas, as well as the participant- and task-matching procedures used to align the human-team and multi-agent LLM conditions.
  Revision: yes
- Referee: [Semantic path analysis] Semantic path analysis subsection: the assumption that the same embedding space represents LLM and human conversational trajectories without differential distortion is not tested (e.g., via domain-adaptation checks or cross-source alignment metrics), yet it underpins the differential predictor findings for LLM versus human teams.
  Authors: This is a fair point regarding the embedding space. In revision, we will add explicit tests for differential distortion, including cross-source alignment metrics (e.g., Procrustes superimposition) and domain-adaptation checks comparing intra- vs. inter-source trajectory distances. Results of these checks will be reported, and any implications for the distinct predictor patterns (efficient exploration for LLMs vs. local coherence for humans) will be discussed.
  Revision: yes
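The source-guessing test promised in the first response can be sketched as a one-sided exact binomial test: raters guess whether each idea came from an LLM or a human team, and the number of correct guesses is compared against chance. This is one plausible way to run such a check, not the authors' stated procedure; the function `binomial_tail` and the counts (60 correct out of 100) are hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical outcome: raters identify the source correctly on 60 of 100 ideas.
p_value = binomial_tail(60, 100)
print(p_value)
```

A small p-value would suggest that stylistic cues let raters detect the source better than chance, which is the leakage the referee is concerned about. The test assumes independent guesses with a 50/50 chance baseline, so an unbalanced mix of LLM and human ideas would call for a different baseline.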
Circularity Check
No significant circularity in empirical comparison
Full rationale
The paper conducts an empirical study generating and rating 4,541 multi-agent LLM ideas against 341 human-team ideas across six tasks, using human novelty/usefulness scores and semantic path representations from pre-trained embeddings. No equations or derivations are presented that reduce any claimed result (such as Cohen's d=1.50 or the 26.8% variance figure) to a fitted parameter defined by the same data or to a self-citation chain. The variance result is reported as an observational outcome from regression on model choice and discussion structure, not as a definitional prediction. All central claims rest on external human ratings and standard statistical methods without self-referential loops, rendering the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard assumptions underlying Cohen's d effect size and linear variance partitioning hold for the creativity ratings and semantic coherence metrics.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  multi-agent LLM teams ... outperform human teams in creativity (Cohen’s d=1.50) ... represent conversations as paths through semantic space using neural language model representations ... trajectory features ... global coherence, semantic spread, local coherence
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Jcost uniqueness, φ-ladder, 8-tick period, parameter-free constants
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Proposal-and-evaluation cycle
- One agent generated a new candidate idea intended to be more creative than existing ideas.
- All agents rated the new candidate, the current idea, and recently explored past ideas on creativity (1-10 scale).
- The highest-rated idea became the new current idea for the next iteration. This allowed the system to either adopt the new modification, retain the current idea, or backtrack to a recently considered alternative.
This proposal-and-evaluation cycle continued until one of two stopping criteria was met:
- Convergence: The same idea was selected (rated highes...
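The cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the source text is truncated before the stopping criteria are fully given, so the `patience` convergence rule and the `max_rounds` cap below are hypothetical stand-ins, as are the `memory` window for past ideas, the use of a simple mean to combine agent ratings, and the toy `propose` and `rate` callables (in a real system these would be LLM calls).

```python
def run_cycle(propose, rate, seed_idea, n_agents=3, memory=3, patience=2, max_rounds=20):
    """Iterate propose -> rate -> select until a (hypothetical) stopping rule fires."""
    current, history, stable_rounds = seed_idea, [], 0

    def team_score(idea):
        # Assumption: agent ratings are combined by a simple mean.
        return sum(rate(agent, idea) for agent in range(n_agents)) / n_agents

    for _ in range(max_rounds):  # hypothetical cap on the number of rounds
        candidate = propose(current)
        # Candidates: the new idea, the current idea, and recently explored ideas.
        pool = [candidate, current] + history[-memory:]
        best = max(pool, key=team_score)
        # Keep the ideas that lost this round so the team can backtrack to them.
        history = [idea for idea in history if idea != best]
        for idea in (candidate, current):
            if idea != best and idea not in history:
                history.append(idea)
        stable_rounds = stable_rounds + 1 if best == current else 0
        current = best
        if stable_rounds >= patience:  # hypothetical convergence rule
            break
    return current


# Toy stand-ins: ideas are integers, and every agent rates idea 5 highest (1-10 scale).
final_idea = run_cycle(
    propose=lambda idea: idea + 1,
    rate=lambda agent, idea: max(1, 10 - abs(idea - 5)),
    seed_idea=0,
)
print(final_idea)
```

Because the current idea and recent alternatives are always re-rated alongside the new candidate, the loop can adopt, retain, or backtrack in each round, which mirrors the three outcomes the text describes.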