Hypothesis generation and updating in large language models
Pith reviewed 2026-05-08 14:38 UTC · model grok-4.3
The pith
Large language models behave like Bayesian hypothesis updaters, but with systematic offsets: they favor narrower hypotheses and fail to extrapolate beyond the observed examples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the number game, LLMs' inferences over hypotheses supported by positive examples are well captured by a two-parameter Bayesian model, but with systematic offsets: a default strong-sampling assumption that implicitly favors narrower hypotheses, a shift toward prior reliance in thinking mode, a robust gap between hypothesis evaluation and generation, and a failure to extrapolate the pattern beyond the observed examples.
What carries the argument
The number game, in which a learner sees positive integers and infers the underlying rule or interval, with posteriors measured via prediction, evaluation, and generation probes compared against a Bayesian model.
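The machinery can be sketched in miniature. The hypothesis space below is a hypothetical toy, not the paper's actual space; it shows how strong sampling (the size principle, P(D|h) = (1/|h|)^n) concentrates the posterior on the narrowest consistent hypothesis:

```python
# Toy hypothesis space for illustration; the paper's actual space is larger.
hypotheses = {
    "powers of 2":   {2, 4, 8, 16, 32, 64},
    "even numbers":  set(range(2, 101, 2)),
    "numbers 1-70":  set(range(1, 71)),
    "numbers 1-100": set(range(1, 101)),
}
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform prior (a simplification)

def posterior(data, hypotheses, prior):
    """Bayesian posterior under strong sampling: P(D|h) = (1/|h|)^n for
    hypotheses containing every example (the size principle), else 0."""
    scores = {}
    for h, extension in hypotheses.items():
        if all(x in extension for x in data):
            scores[h] = prior[h] * (1.0 / len(extension)) ** len(data)
        else:
            scores[h] = 0.0
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

post = posterior([16, 8, 2, 64], hypotheses, prior)
best = max(post, key=post.get)  # the narrowest consistent hypothesis dominates
```

With four examples the size principle already pushes nearly all posterior mass onto "powers of 2"; under weak sampling (likelihood 1 for any consistent hypothesis) the posterior would simply mirror the prior.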
If this is right
- LLMs default to narrower hypotheses via strong sampling, acting as an implicit Occam's razor.
- Switching to thinking mode increases reliance on prior probabilities over likelihoods.
- LLMs evaluate hypotheses more accurately than they generate them, preferring rule-like outputs in generation.
- The Bayesian-with-bias behavior does not extend to generalization outside the trained examples.
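A minimal sketch of how the first two offsets could act as knobs on the number-game posterior. The parameter names gamma (sampling strength) and beta (prior reliance), and the toy two-hypothesis space, are assumptions for illustration, not the paper's actual parameterization:

```python
# Toy two-hypothesis space (hypothetical; not the paper's actual space).
hypotheses = {
    "powers of 2":  {2, 4, 8, 16, 32, 64},
    "even numbers": set(range(2, 101, 2)),
}
prior = {"powers of 2": 0.1, "even numbers": 0.9}  # prior favors the broad rule

def biased_posterior(data, gamma=1.0, beta=1.0):
    """gamma: sampling strength (1 = strong sampling, 0 = weak sampling);
    beta: prior reliance (0 = ignore the prior, larger = lean on it)."""
    scores = {}
    for h, ext in hypotheses.items():
        consistent = all(x in ext for x in data)
        lik = (1.0 / len(ext)) ** (gamma * len(data)) if consistent else 0.0
        scores[h] = (prior[h] ** beta) * lik
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

data = [16, 8, 2, 64]
strong = biased_posterior(data, gamma=1.0)  # Occam's razor: the narrow rule wins
weak = biased_posterior(data, gamma=0.0)    # flat likelihood: the prior wins
```

Raising gamma reproduces the implicit Occam's razor (narrow hypotheses win); raising beta reproduces the thinking-mode shift toward the prior.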
Where Pith is reading between the lines
- This pattern suggests LLMs may struggle with scientific discovery tasks requiring broad hypothesis exploration beyond given data.
- The evaluation-generation gap points to a need for better alignment between selection and creation mechanisms in model training.
- Future tests could examine whether the same biases appear in other structured inference domains beyond numbers.
Load-bearing premise
That the three different probes all tap into the same underlying posterior distribution over hypotheses inside the LLM.
What would settle it
A direct test would be to check whether LLMs continue to show the same parameter fits and biases when probed with new hypothesis spaces or when forced to generate hypotheses that extrapolate to unseen numbers in the game.
Original abstract
Large language models (LLMs) increasingly help people solve problems, from debugging code to repairing machinery. This process requires generating plausible hypotheses from partial descriptions, then updating them as more information arrives. Yet how LLMs perform this form of inference, and how close it is to optimal, remains unclear. We study this question in the number game, a controlled setting in which a learner infers the hypothesis supported by a few positive integers, such as $\{16, 8, 2, 64\}$: a rule like powers of 2 or an interval like numbers near 20. We measure the posterior over hypotheses using three complementary probes: posterior prediction, hypothesis evaluation, and hypothesis generation. We then compare LLM behavior with an optimal Bayesian model and human behavior, and test whether the same posterior is expressed across probes. LLMs are often well described by a two-parameter Bayesian fit, but with systematic offsets: by default they show a strong-sampling assumption that creates an implicit Occam's razor, favoring narrower hypotheses, while thinking mode shifts them toward greater prior reliance. We also find a robust evaluation--generation gap: LLMs select more correct hypotheses during hypothesis evaluation but generate simpler, more rule-like hypotheses. Finally, this Bayesian-with-bias pattern does not extrapolate. Models can behave as if they hold rule-like hypotheses over observed examples, yet generalize poorly to parts of the hypothesis domain not covered by those examples. Our results highlight a limitation of LLMs as general problem solvers, especially for scientific inference, where hypotheses must go beyond the data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies hypothesis generation and updating in LLMs using the number game, where models infer rules from positive examples like {16, 8, 2, 64}. It uses three probes—posterior prediction, hypothesis evaluation, and hypothesis generation—to measure posteriors over hypotheses, compares to Bayesian models and humans, and reports that LLMs fit a two-parameter Bayesian model with biases (strong-sampling assumption creating implicit Occam's razor, thinking mode increasing prior reliance), a robust evaluation-generation gap, and poor extrapolation of the pattern.
Significance. If substantiated, these results would be significant for understanding LLM limitations in inference tasks relevant to scientific discovery and problem-solving. The controlled experimental setup and multi-probe design allow for direct comparison to optimal Bayesian inference and human behavior, highlighting both strengths and biases in LLM reasoning. However, the non-extrapolation finding underscores a key limitation.
major comments (3)
- [Abstract] The claim that LLMs are 'often well described by a two-parameter Bayesian fit' is central but lacks specifics on how the two parameters were determined, whether the fits were pre-specified versus post-hoc, inclusion of error bars, or quantitative measures of fit quality such as R-squared or likelihood ratios.
- [Abstract] The assumption that the three probes (posterior prediction, hypothesis evaluation, and hypothesis generation) all measure the same underlying posterior distribution is load-bearing for the unified description of LLM inference with specific biases. The reported robust evaluation-generation gap suggests potential systematic differences in elicited behavior that could indicate probe-dependent posteriors rather than noise around a shared one.
- [Abstract] The non-extrapolation result, that the Bayesian-with-bias pattern does not hold for parts of the hypothesis domain not covered by observed examples, is a key claim but requires more detail on the specific generalization tests conducted and how they were designed to test extrapolation.
minor comments (1)
- [Abstract] Clarify what 'thinking mode' refers to, perhaps with a short description of the experimental manipulation.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our results. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract] The claim that LLMs are 'often well described by a two-parameter Bayesian fit' is central but lacks specifics on how the two parameters were determined, whether the fits were pre-specified versus post-hoc, inclusion of error bars, or quantitative measures of fit quality such as R-squared or likelihood ratios.
Authors: The two parameters are the strong-sampling bias strength and the prior weight, pre-specified from the model variants in Section 3.2. Fits used maximum likelihood with bootstrap-derived error bars (reported in the supplement). We will add quantitative fit metrics (e.g., log-likelihood ratios and R^2 values) to the abstract and results in revision. revision: yes
-
Referee: [Abstract] The assumption that the three probes (posterior prediction, hypothesis evaluation, and hypothesis generation) all measure the same underlying posterior distribution is load-bearing for the unified description of LLM inference with specific biases. The reported robust evaluation-generation gap suggests potential systematic differences in elicited behavior that could indicate probe-dependent posteriors rather than noise around a shared one.
Authors: The evaluation-generation gap does indicate probe-specific elicitation differences. However, the core biases remain consistent across probes, which we interpret as a shared posterior plus output-format effects. We will revise the abstract and discussion to explicitly note this distinction, report per-probe posterior estimates, and clarify that the unified description applies to the bias parameters rather than every hypothesis probability. revision: partial
-
Referee: [Abstract] The non-extrapolation result, that the Bayesian-with-bias pattern does not hold for parts of the hypothesis domain not covered by observed examples, is a key claim but requires more detail on the specific generalization tests conducted and how they were designed to test extrapolation.
Authors: We agree more detail is warranted. The tests used held-out numbers outside the observed range and measured whether rule-like hypotheses continued to be favored. We will expand the abstract and add a results subsection with explicit test design, example stimuli, and metrics to make the extrapolation failure transparent. revision: yes
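The maximum-likelihood-plus-bootstrap recipe described in the first response can be sketched generically. The response counts below are invented for illustration; only the percentile-bootstrap procedure itself is standard:

```python
import random

random.seed(0)
# Invented binary responses: 1 = narrow hypothesis chosen, 0 = broad one.
responses = [1] * 90 + [0] * 10

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for any statistic."""
    stats = sorted(
        stat([random.choice(data) for _ in data]) for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

def mean(s):
    return sum(s) / len(s)

lo, hi = bootstrap_ci(responses, mean)  # error bar around the observed rate 0.9
```

The same resampling wraps around any fitted quantity, so error bars on the two fitted parameters fall out of rerunning the fit per bootstrap sample.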
Circularity Check
A two-parameter Bayesian fit to LLM probe data is then interpreted as a 'strong-sampling bias' and a prior-reliance shift
specific steps
-
Fitted input presented as a prediction
[Abstract (and corresponding Results on Bayesian modeling)]
"LLMs are often well described by a two-parameter Bayesian fit, but with systematic offsets: by default they show a strong-sampling assumption that creates an implicit Occam's razor, favoring narrower hypotheses, while thinking mode shifts them toward greater prior reliance."
The two parameters are estimated by fitting the Bayesian model to the LLM's posterior-prediction, evaluation, and generation data. The 'strong-sampling assumption' and 'greater prior reliance' are then labeled as systematic offsets or predictions about LLM behavior, but these are exactly the values of the fitted parameters; the description is therefore equivalent to the input fit rather than an independent result.
full rationale
The paper fits a two-parameter Bayesian model (sampling assumption + prior weight) to responses from the three probes on the number game. It then presents the fitted values as evidence that LLMs exhibit a 'strong-sampling assumption that creates an implicit Occam's razor' by default and shift toward 'greater prior reliance' in thinking mode. Because the parameters are estimated from the same LLM data rather than fixed independently, the claimed systematic offsets reduce to the fit results by construction. The reported evaluation-generation gap further indicates the probes may not access a single shared posterior, undermining the unified two-parameter description, yet the fit is still used to characterize LLM inference overall.
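The concern can be made concrete with a sketch (the parameter names gamma and beta and the toy space are hypothetical): fit the two parameters to choice counts that were themselves generated by the model. The fit reproduces those counts by construction, and the two parameters can trade off against each other, so the recovered values are the fit, not independent evidence:

```python
import itertools
import math

# Toy setup (hypothetical; not the paper's actual space or parameters).
hypotheses = {
    "powers of 2":  {2, 4, 8, 16, 32, 64},
    "even numbers": set(range(2, 101, 2)),
}
prior = {"powers of 2": 0.1, "even numbers": 0.9}
data = [16, 8, 2, 64]

def model_posterior(gamma, beta):
    """gamma scales the size-principle likelihood; beta scales prior reliance."""
    scores = {}
    for h, ext in hypotheses.items():
        lik = (1.0 / len(ext)) ** (gamma * len(data)) if all(x in ext for x in data) else 0.0
        scores[h] = (prior[h] ** beta) * lik
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

# "Observed" choice counts are generated from the model itself at (1, 1).
counts = {h: round(1000 * p) for h, p in model_posterior(1.0, 1.0).items()}

def log_lik(params):
    post = model_posterior(*params)
    return sum(c * math.log(post[h]) for h, c in counts.items() if c > 0)

grid = [i / 10 for i in range(21)]  # gamma, beta in 0.0 .. 2.0
gamma_hat, beta_hat = max(itertools.product(grid, grid), key=log_lik)
fitted = model_posterior(gamma_hat, beta_hat)
# The fit matches the data it was estimated from by construction, and
# (gamma_hat, beta_hat) need not equal (1, 1): the parameters trade off.
```

Breaking the circle requires out-of-sample checks: fix (gamma, beta) on one probe or stimulus set and test whether they predict behavior on another.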