pith. sign in

arxiv: 2410.17448 · v3 · submitted 2024-10-22 · 💻 cs.CL · cs.AI

In Context Learning and Reasoning for Symbolic Regression with Large Language Models

Pith reviewed 2026-05-23 18:47 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords symbolic regressionlarge language modelsin-context learningchain-of-thought promptingequation discoveryscientific contextGPT-4
0
0 comments X

The pith

LLMs can rediscover physical equations from data when prompted to reason over scientific context.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests GPT-4 and GPT-4o on symbolic regression by having them propose expressions from datasets, which external Python tools then optimize and score. Chain-of-thought prompts instruct the models to examine data patterns, earlier proposals, and natural-language scientific background before suggesting new forms that balance complexity and loss. On adsorption data the models recover the Langmuir and dual-site Langmuir equations; on Nikuradse pipe-flow data they generate improved expressions even without a known target. Performance rises when a scratchpad is used and when background knowledge is supplied in plain language, though the method does not beat specialized symbolic-regression programs on harder targets.

Core claim

GPT-4 and GPT-4o, given iterative chain-of-thought prompts that include data analysis, prior expressions, and scientific context in natural language, rediscover the Langmuir and dual-site Langmuir models from adsorption datasets and produce better-fitting expressions for Nikuradse's rough-pipe data, with gains from scratchpad use and context.

What carries the argument

Iterative prompting loop in which the LLM analyzes data patterns and scientific context in natural language, proposes symbolic expressions, receives external optimization results, and refines the next proposals while respecting complexity and loss constraints.

If this is right

  • Natural-language prompts allow LLMs to incorporate domain theory directly into expression search.
  • Strategic chain-of-thought instructions raise the rate at which LLMs generate meaningful equations.
  • Symbolic constraints drawn from background knowledge can be enforced simply by stating them in the prompt.
  • The workflow lets models iterate toward lower loss while obeying explicit instructions on complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Scientists without coding expertise could use the same prompting style to explore equations from their own measurements.
  • The natural-language interface may let the method transfer to fields where context is rich but formal models are sparse.
  • Hybrid systems could route LLM-generated candidates into conventional symbolic-regression optimizers for final refinement.

Load-bearing premise

That scientific context supplied in natural language will steer the LLM toward expressions whose externally fitted parameters recover the actual physical relationships instead of coincidental or overfit forms.

What would settle it

Running the identical workflow on the Langmuir dataset but omitting the scientific-context portion of the prompt and checking whether the recovered functional forms still match the known models.

Figures

Figures reproduced from arXiv: 2410.17448 by Samiha Sharlin, Tyler R. Josephson.

Figure 1
Figure 1. Figure 1: Illustration of GPT-4 attempting symbolic regression. GPT-4 predicts expressions with optimized coefficients when passed a dataset for nitrogen adsorption on mica [69]. The Python code snippet from Prompt 2 output has been truncated to keep the figure concise. Note that the actual parameter values produced by running the code differ from what GPT-4 generates [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Workflow for using LLMs for SR. First, the dataset is sent to LLMs (with or without context) which is instructed to suggest expressions without optimizing parameters. The generated expressions are then parsed by Python and optimized using SciPy (Nelder-Mead [70] method with basin-hopping as the numerical optimizer [71]). Results for each expression are stored in a Python dictionary, and added to a list of … view at source ↗
Figure 3
Figure 3. Figure 3: Bias from examples GPT-4 generated exact expressions from the prompt examples that were given to illustrate the output syntax. The revised prompts (Prompt 2 and 3) eliminate this bias from the equation prediction process. This motivated our two-prompt setup, with an initial prompt tasking LLMs to generate unbiased expressions in LaTeX and an iteration prompt receiving previously-generated examples as Pytho… view at source ↗
Figure 4
Figure 4. Figure 4: Illustration of scratchpad approach. We observe substantial, qualitative improvements in the predicted expressions after implementing the scratchpad technique (Prompt 2). The suggested expressions for (a) Langmuir’s and (b) Kepler’s Law dataset include operators (/ and sqrt, respectively) present in the target models (y = c1∗x c2+x and y 2 = c1x 3 2 , respectively) 6 [PITH_FULL_IMAGE:figures/full_fig_p006… view at source ↗
Figure 5
Figure 5. Figure 5: ) [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Illustration of GPT-4 following restrictions from prompt. 3 Results We evaluated our workflow using experimental datasets associated with meaningful scientific context. SR benchmarks often use synthetic data; we designed our tests around a benchmark [88] specifically curated for evaluating SR algorithms for scientific data. From here, we selected three experimental datasets from astronomical observations (… view at source ↗
Figure 7
Figure 7. Figure 7: GPT-4 and GPT-4o results on Hubble’s data. In the GPT-4 model, all the trends except ‘Without Context’ overlap at score 5 for the “Easy Search” plot, and the trends ‘With All Tools’ and ‘Without Data’ overlap at score 4 for the “Hard Search”. In the GPT-4o model, the trends for ‘Without Data’ and ‘Without Context’ overlap at score 0 for the “Easy Search” plot, and all the trends except ‘Without Data’ overl… view at source ↗
Figure 8
Figure 8. Figure 8: GPT-4 and GPT-4o results on Bode’s data. For the “Hard Search” in the GPT-4 model, all the trends except ‘Without Context’ overlap at a score of 0. Transparency and different marker styles are used to differentiate these trends [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: GPT-4o reasoning on Bode’s data. The scratchpad snippet is from the first iteration in Run 4 with all tools on for hard search. 10 [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: GPT-4 and GPT-4o results on Kepler’s Third Law data. For the “Hard Search” in the GPT-4o model, the trends for ‘Without Data’ and ‘With All Tools’ overlap at a score of 0. Transparency and different marker styles are used to differentiate these trends. Langmuir: Langmuir’s model is more obscure, and was almost never guessed in the first round. In easier searches, both GPT-4 and GPT-4o consistently found i… view at source ↗
Figure 11
Figure 11. Figure 11: GPT-4 and GPT-4o results on Langmuir’s adsorption data. For the “Hard Search” in the GPT-4 model, all the trends except ‘With All Tools’ overlap at a score of 0. Transparency and different marker styles are used to differentiate these trends. Dual-site Langmuir: The dual-site Langmuir model is particularly challenging for SR, because the target model does not significantly fit the data much better than ma… view at source ↗
Figure 12
Figure 12. Figure 12: Pareto fronts for dual-site Langmuir dataset. The black line represents the best total front from the five independent runs. The target model is labeled as a blue star. Dual-site Langmuir dataset was previously explored in literature [9, 28] with three SR algorithms: Bayesian-based SR (BMS) [6], genetic programming-based SR (PySR) [88], and mixed-integer non-linear programming-based SR [92]. Because the t… view at source ↗
Figure 13
Figure 13. Figure 13: GPT-4 and GPT-4o on dual-site Langmuir’s adsorption data. All the trends in GPT-4 model overlap at a score of 0. Transparency and different marker styles are used to differentiate these trends. 13 [PITH_FULL_IMAGE:figures/full_fig_p013_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Instances of GPT-4o reasoning on dual-site Langmuir dataset. The top snippet of the scratchpad is from Run 1, and the bottom is from Run 2. Both are for runs with all tools on and iteration 4. 14 [PITH_FULL_IMAGE:figures/full_fig_p014_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Models for Nikuradse dataset. Here, P refers to prompt versions and S refers to dataset. The expressions generated by the GPT-4o model are marked with an ‘o’ at the end. We notice unphysical behavior with fewer data in prompt versions 1 and 2 ( [PITH_FULL_IMAGE:figures/full_fig_p015_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Illustration of GPT-3.5 attempting to perform SR [PITH_FULL_IMAGE:figures/full_fig_p018_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Context provided to LLMs for all the datasets 18 [PITH_FULL_IMAGE:figures/full_fig_p018_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Datasets explored using LLMs for SR 0.1 1 10 100 0 0.5 1 1.5 2 Target Model GPT-4 Model Pressure (p) Loading (q) [PITH_FULL_IMAGE:figures/full_fig_p019_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: GPT-4-generated model does not follow theory on extrapolation. It gives a negative value for adsorption loading at lower pressures. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Nikuradse Dataset. The red and yellow points represent the data sent to GPT-4 (selected randomly from the original dataset), while the grey ones show the original dataset sent to SciPy for optimization. Expression Mean Absolute Error (MAE) Complexity P1S1 0.02270419 13 P1S2 0.00978477 29 P2S1 0.00897093 69 P2S2 0.00931620 49 P3S1 0.01086397 41 P3S2 0.00992712 49 P1S1o 0.02007803 19 P1S2o 0.02141686 17 P2S… view at source ↗
read the original abstract

Large Language Models (LLMs) are transformer-based machine learning models that have shown remarkable performance in tasks for which they were not explicitly trained. Here, we explore the potential of LLMs to perform symbolic regression -- a machine-learning method for finding simple and accurate equations from datasets. We prompt GPT-4 and GPT-4o models to suggest expressions from data, which are then optimized and evaluated using external Python tools. These results are fed back to the LLMs, which propose improved expressions while optimizing for complexity and loss. Using chain-of-thought prompting, we instruct the models to analyze data, prior expressions, and the scientific context (expressed in natural language) for each problem before generating new expressions. We evaluated the workflow in rediscovery of Langmuir and dual-site Langmuir's model for adsorption, along with Nikuradse's dataset on flow in rough pipes, which does not have a known target model equation. Both the GPT-4 and GPT-4o models successfully rediscovered equations, with better performance when using a scratchpad and considering scientific context. GPT-4o model demonstrated improved reasoning with data patterns, particularly evident in the dual-site Langmuir and Nikuradse dataset. We demonstrate how strategic prompting improves the model's performance and how the natural language interface simplifies integrating theory with data. We also applied symbolic mathematical constraints based on the background knowledge of data via prompts and found that LLMs generate meaningful equations more frequently. Although this approach does not outperform established SR programs where target equations are more complex, LLMs can nonetheless iterate toward improved solutions while following instructions and incorporating scientific context in natural language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a workflow in which GPT-4 and GPT-4o are prompted (with chain-of-thought, scratchpad, and natural-language scientific context) to generate candidate symbolic expressions for given datasets; the expressions are then externally optimized and evaluated in Python, with results fed back for iterative refinement. It reports successful rediscovery of the Langmuir and dual-site Langmuir isotherms and generation of 'meaningful' expressions on the Nikuradse pipe-flow data (which lacks a known target equation), with improved performance when scientific context and constraints are supplied.

Significance. If the reported improvements can be shown to be robust under pre-specified quantitative criteria and controls, the work would illustrate a practical route for injecting domain knowledge expressed in natural language into symbolic regression, complementing existing SR algorithms that lack such interfaces. The external optimization step correctly avoids circularity in evaluation.

major comments (3)
  1. [Abstract] Abstract and evaluation sections: the central claim that both models 'successfully rediscovered equations' and that GPT-4o showed 'improved reasoning with data patterns' is asserted without reporting trial counts, success-rate definitions (e.g., exact functional match after optimization, tolerance on parameters), or quantitative metrics such as median loss or recovery frequency across repeated runs.
  2. [Abstract and results on Nikuradse] Nikuradse dataset (no known target): the statements that LLMs 'generate meaningful equations more frequently' when given context and that GPT-4o demonstrated 'improved reasoning with data patterns' lack an operational definition of 'meaningful,' a pre-specified success criterion, an inter-rater protocol, or a control condition (expressions generated without context or from random templates). Without these, improvement cannot be distinguished from post-hoc selection of plausible-looking fits.
  3. [Methods / workflow description] Iteration and prompting protocol: the manuscript does not specify the number of iterations performed, the precise loss-plus-complexity objective passed back to the LLM, or how scientific-context prompts are constructed and varied, making it impossible to assess reproducibility or isolate the contribution of the context component.
minor comments (2)
  1. Notation for the external optimizer and loss function should be introduced once and used consistently; currently the description mixes 'loss' and 'complexity' without explicit formulas.
  2. Figure captions should state the exact prompt templates and number of independent runs shown, rather than only qualitative outcomes.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and will revise the manuscript to improve quantitative reporting, definitions, and reproducibility details.

read point-by-point responses
  1. Referee: [Abstract] Abstract and evaluation sections: the central claim that both models 'successfully rediscovered equations' and that GPT-4o showed 'improved reasoning with data patterns' is asserted without reporting trial counts, success-rate definitions (e.g., exact functional match after optimization, tolerance on parameters), or quantitative metrics such as median loss or recovery frequency across repeated runs.

    Authors: We agree that the abstract and evaluation sections would benefit from greater precision. The presented results were obtained from a small number of targeted runs intended as demonstrations. In the revision we will report the exact trial counts performed, define success explicitly (exact functional form recovered after optimization with parameter values within 10% relative tolerance of known values where applicable), and include quantitative metrics such as median loss and recovery frequency. revision: yes

  2. Referee: [Abstract and results on Nikuradse] Nikuradse dataset (no known target): the statements that LLMs 'generate meaningful equations more frequently' when given context and that GPT-4o demonstrated 'improved reasoning with data patterns' lack an operational definition of 'meaningful,' a pre-specified success criterion, an inter-rater protocol, or a control condition (expressions generated without context or from random templates). Without these, improvement cannot be distinguished from post-hoc selection of plausible-looking fits.

    Authors: We accept that an operational definition and controls are required. We will add an explicit definition of 'meaningful' (low loss combined with consistency to the supplied scientific context) and include control runs without context in the revised results. Because the Nikuradse case has no ground-truth equation, evaluation will remain partly qualitative but will be anchored to the new controls and pre-specified fit-quality thresholds. revision: partial

  3. Referee: [Methods / workflow description] Iteration and prompting protocol: the manuscript does not specify the number of iterations performed, the precise loss-plus-complexity objective passed back to the LLM, or how scientific-context prompts are constructed and varied, making it impossible to assess reproducibility or isolate the contribution of the context component.

    Authors: We will expand the Methods section to document the iteration limit (maximum 10 iterations or until loss plateau), the exact objective (MSE loss plus a fixed complexity penalty term), and the prompt templates (including how scientific context is inserted). These details will be placed in the main text or a new appendix to support reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity: evaluation pipeline uses independent external optimization

full rationale

The paper's workflow has LLMs propose expressions from data and context, after which parameters are optimized and loss is computed via separate Python tools; success on known-target cases (Langmuir) is measured against ground-truth recovery, and the process does not define any quantity in terms of itself or rename a fit as a prediction. No self-citations, uniqueness theorems, or ansatzes are invoked to close the loop. The Nikuradse case uses the same external evaluation, so the central claim does not reduce to a self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the untested premise that current GPT-4-class models possess reliable in-context reasoning over both numerical data patterns and natural-language physical descriptions sufficient to guide equation search toward physically meaningful expressions.

axioms (1)
  • domain assumption GPT-4 and GPT-4o models can perform chain-of-thought analysis of data patterns together with scientific context expressed in natural language and thereby generate improved symbolic expressions
    The entire iterative workflow depends on this capability of the models.

pith-pipeline@v0.9.0 · 5820 in / 1514 out tokens · 34194 ms · 2026-05-23T18:47:43.155861+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM-driven design of physics-constrained constitutive models: two agents are better than one

    cs.LG 2026-05 unverdicted novelty 7.0

    A Creator-Inspector multi-agent LLM pipeline for constitutive artificial neural networks increases the rate of models satisfying all nine physical constraints to 100% or 56% depending on the LLM backbone.

  2. LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments

    cond-mat.mtrl-sci 2026-05 unverdicted novelty 7.0

    The framework uses symbolic regression to propose analytical expressions from piezoresponse force microscopy data and an LLM to rank them for physical plausibility, yielding voltage-time growth laws for ferroelectric ...

  3. Leveraging Mathematical Reasoning of LLMs for Efficient GPU Thread Mapping

    cs.DC 2026-04 unverdicted novelty 6.0

    Large language models derive exact analytical GPU thread mappings for complex 2D/3D domains and fractals via in-context learning, outperforming symbolic regression and enabling up to thousands-fold speedups and energy...

Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages · cited by 3 Pith papers · 11 internal anchors

  1. [1]

    Data-Driven Discovery of Physical Laws

    Pat Langley. Data-Driven Discovery of Physical Laws. Cognitive Science, 5(1):31–54, 1981

  2. [2]

    Application Issues of Genetic Programming in Industry

    Arthur Kordon, Flor Castillo, Guido Smits, and Mark Kotanchek. Application Issues of Genetic Programming in Industry. In Tina Yu, Rick Riolo, and Bill Worzel, editors, Genetic Programming Theory and Practice III, volume 9, pages 241–258. Kluwer Academic Publishers, Boston, 2006. Series Title: Genetic Programming

  3. [3]

    Distilling Free-Form Natural Laws from Experimental Data

    Michael Schmidt and Hod Lipson. Distilling Free-Form Natural Laws from Experimental Data. Science, 324(5923):81–85, 2009. ISBN: 0036-8075

  4. [4]

    Improvement and Validation of Genetic Programming Symbolic Regression Technique of Silva and Applications in Deriving Heat Transfer Correlations

    Yan Liu, Zhi Long Cheng, Jing Xu, Jian Yang, and Qiu Wang Wang. Improvement and Validation of Genetic Programming Symbolic Regression Technique of Silva and Applications in Deriving Heat Transfer Correlations. Heat Transfer Engineering, 37(10):862–874, 2016

  5. [5]

    Bayesian Symbolic Regression

    Ying Jin, Weilin Fu, Jian Kang, Jiadong Guo, and Jian Guo. Bayesian Symbolic Regression. arXiv preprint arXiv:1910.08892, 2019

  6. [6]

    Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales-Pardo

    Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales-Pardo. A Bayesian Machine Scientist to Aid in the Solution of Challenging Scientific Problems. Science Advances, January 2020. Publisher: American Association for the Advancement of Science

  7. [7]

    Globally Optimal Symbolic Regression

    Vernon Austel, Sanjeeb Dash, Oktay Gunluk, Lior Horesh, Leo Liberti, Giacomo Nannicini, and Baruch Schieber. Globally Optimal Symbolic Regression. NeurIPS, 2017

  8. [8]

    Sahinidis

    Alison Cozad and Nikolaos V . Sahinidis. A Global MINLP Approach to Symbolic Regression.Mathematical Programming, 170(1):97–119, 2018. Publisher: Springer Berlin Heidelberg

  9. [9]

    Combining data and theory for derivable scientific discovery with AI-Descartes

    Cristina Cornelio, Sanjeeb Dash, Vernon Austel, Tyler R Josephson, Joao Goncalves, Kenneth L Clarkson, Nimrod Megiddo, Bachir El Khadir, and Lior Horesh. Combining data and theory for derivable scientific discovery with AI-Descartes. Nature Communications, 14(1):1777, 2023

  10. [10]

    A Greedy Search Tree Heuristic for Symbolic Regression

    Fabrício Olivetti de França. A Greedy Search Tree Heuristic for Symbolic Regression. Information Sciences, 442:18–32, 2018

  11. [11]

    End-To-End Symbolic Regression with Transformers

    Pierre-Alexandre Kamienny, Stéphane d’Ascoli, Guillaume Lample, and François Charton. End-To-End Symbolic Regression with Transformers. Advances in Neural Information Processing Systems, 35:10269–10281, 2022

  12. [12]

    Deep Symbolic Regression for Recurrent Sequences

    Stéphane d’Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, and François Charton. Deep Symbolic Regression for Recurrent Sequences. arXiv preprint arXiv:2201.04600, 2022

  13. [13]

    Brunton, Joshua L

    Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering Governing Equations from Data: Sparse Identification of Nonlinear Dynamical Systems. Proceedings of the National Academy of Sciences, 113(15):3932– 3937, 2016. ISBN: 1091-6490 (Electronic) 0027-8424 (Linking)

  14. [14]

    Mangan, Steven L

    Niall M. Mangan, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Inferring Biological Networks by Sparse Identification of Nonlinear Dynamics. IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 2(1):52–63, 2016. Publisher: Institute of Electrical and Electronics Engineers Inc

  15. [15]

    Ghiringhelli

    Runhai Ouyang, Stefano Curtarolo, Emre Ahmetcik, Matthias Scheffler, and Luca M. Ghiringhelli. SISSO: A Compressed-Sensing Method for Identifying the Best Low-Dimensional Descriptor in an Immensity of Offered Candidates. Physical Review Materials, 2(8):1–11, 2018. Publisher: American Physical Society

  16. [16]

    Genetic Algorithms in Search

    David E Goldberg. Genetic Algorithms in Search. Optimization, Machine Learning, 1989

  17. [17]

    Fitting Potential-Energy Surfaces: A Search in the Function Space by Directed Genetic Programming

    Dmitrii E Makarov and Horia Metiu. Fitting Potential-Energy Surfaces: A Search in the Function Space by Directed Genetic Programming. The Journal of Chemical Physics, 108(2):590–598, 1998

  18. [18]

    Using Genetic Programming with Prior Formula Knowledge to Solve Symbolic Regression Problem

    Qiang Lu, Jun Ren, and Zhiguang Wang. Using Genetic Programming with Prior Formula Knowledge to Solve Symbolic Regression Problem. Computational intelligence and neuroscience, 2016:1–1, 2016

  19. [19]

    AI Feynman: A Physics-Inspired Method for Symbolic Regression

    Silviu-Marian Udrescu and Max Tegmark. AI Feynman: A Physics-Inspired Method for Symbolic Regression. Science Advances, 6(16):eaay2631, 2020

  20. [20]

    AI-DARWIN: A First Principles-Based Model Discovery Engine using Machine Learning

    Arijit Chakraborty, Abhishek Sivaram, and Venkat Venkatasubramanian. AI-DARWIN: A First Principles-Based Model Discovery Engine using Machine Learning. Computers & Chemical Engineering, 154:107470, November 2021

  21. [21]

    Multi-Objective Symbolic Regression for Physics-Aware Dynamic Modeling

    Jiˇrí Kubalík, Erik Derner, and Robert Babuška. Multi-Objective Symbolic Regression for Physics-Aware Dynamic Modeling. Expert Systems with Applications, 182:115210, 2021

  22. [22]

    Engle and Nikolaos V

    Marissa R. Engle and Nikolaos V . Sahinidis. Deterministic Symbolic Regression with Derivative Information: General Methodology and Application to Equations of State. AIChE Journal, 68(6):e17457, 2022. 21 LLMs for SR A PREPRINT

  23. [23]

    Shape-Constrained Symbolic Regression – Improving Extrapolation with Prior Knowledge

    Gabriel Kronberger, Fabricio Olivetti de França, Bogdan Burlacu, Christian Haider, and Michael Kommenda. Shape-Constrained Symbolic Regression – Improving Extrapolation with Prior Knowledge. Evolutionary Compu- tation, 30(1):75–98, March 2022. arXiv:2103.15624 [cs, stat]

  24. [24]

    Haider, F.O

    C. Haider, F.O. De Franca, B. Burlacu, and G. Kronberger. Shape-Constrained Multi-Objective Genetic Program- ming for Symbolic Regression. Applied Soft Computing, 132:109855, January 2023

  25. [25]

    Deep Symbolic Regression for Physics Guided by Units Constraints: Toward the Automated Discovery of Physical Laws

    Wassim Tenachi, Rodrigo Ibata, and Foivos I Diakogiannis. Deep Symbolic Regression for Physics Guided by Units Constraints: Toward the Automated Discovery of Physical Laws. arXiv preprint arXiv:2303.03192, 2023

  26. [26]

    A Computational Framework for Physics-Informed Symbolic Regression with Straightforward Integration of Domain Knowledge

    Liron Simon Keren, Alex Liberzon, and Teddy Lazebnik. A Computational Framework for Physics-Informed Symbolic Regression with Straightforward Integration of Domain Knowledge. Scientific Reports, 13(1):1249, 2023

  27. [27]

    Active Learning in Symbolic Regression Performance with Physical Constraints

    Jorge Medina and Andrew D White. Active Learning in Symbolic Regression Performance with Physical Constraints. arXiv preprint arXiv:2305.10379, 2023

  28. [28]

    Incorporating Background Knowledge in Symbolic Regression using a Computer Algebra System

    Charles Fox, Neil D Tran, F Nikki Nacion, Samiha Sharlin, and Tyler R Josephson. Incorporating Background Knowledge in Symbolic Regression using a Computer Algebra System. Machine Learning: Science and Technology, 5(2):025057, 2024

  29. [29]

    Fully Autonomous Programming with Large Language Models

    Vadim Liventsev, Anastasiia Grishina, Aki Härmä, and Leon Moonen. Fully Autonomous Programming with Large Language Models. In Proceedings of the Genetic and Evolutionary Computation Conference , pages 1146–1155, 2023

  30. [30]

    ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design

    Pier Luca Lanzi and Daniele Loiacono. ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1383–1390, 2023

  31. [31]

    Language Model Crossover: Variation through Few-Shot Prompting

    Elliot Meyerson, Mark J Nelson, Herbie Bradley, Adam Gaier, Arash Moradi, Amy K Hoover, and Joel Lehman. Language Model Crossover: Variation through Few-Shot Prompting. arXiv preprint arXiv:2302.12170, 2023

  32. [32]

    The OpenELM Library: Leveraging Progress in Language Models for Novel Evolutionary Algorithms

    Herbie Bradley, Honglu Fan, Theodoros Galanos, Ryan Zhou, Daniel Scott, and Joel Lehman. The OpenELM Library: Leveraging Progress in Language Models for Novel Evolutionary Algorithms. In Genetic Programming Theory and Practice XX, pages 177–201. Springer, 2024

  33. [33]

    Two Optimizers Are Better Than One: LLM Catalyst for Enhancing Gradient-Based Optimization

    Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, and Wangmeng Zuo. Two Optimizers Are Better Than One: LLM Catalyst for Enhancing Gradient-Based Optimization. arXiv preprint arXiv:2405.19732, 2024

  34. [34]

    Evolving Code with a Large Language Model

    Erik Hemberg, Stephen Moskal, and Una-May O’Reilly. Evolving Code with a Large Language Model. arXiv preprint arXiv:2401.07102, 2024

  35. [35]

    Large Language Models as Evolutionary Optimizers

    Shengcai Liu, Caishun Chen, Xinghua Qu, Ke Tang, and Yew-Soon Ong. Large Language Models as Evolutionary Optimizers. In 2024 IEEE Congress on Evolutionary Computation (CEC), pages 1–8. IEEE, 2024

  36. [36]

    In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery

    Matteo Merler, Katsiaryna Haitsiukevich, Nicola Dainese, and Pekka Marttinen. In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 589–606, 2024

  37. [37]

    LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

    Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K Reddy. LLM-SR: Scientific Equation Discovery via Programming with Large Language Models. arXiv preprint arXiv:2404.18400, 2024

  38. [38]

    Symbolic Regression with a Learned Concept Library

    Arya Grayeli, Atharva Sehgal, Omar Costilla-Reyes, Miles Cranmer, and Swarat Chaudhuri. Symbolic Regression with a Learned Concept Library. arXiv preprint arXiv:2409.09359, 2024

  39. [39]

    Attention is All You Need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is All You Need. Advances in Neural Information Processing Systems, 30, 2017

  40. [40]

    Language Models are Few-Shot Learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language Models are Few-Shot Learners. Advances in neural information processing systems, 33:1877–1901, 2020

  41. [41]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv:2108.07258, 2021

  42. [42]

    Who’s the Best Detective? Large Language Models vs

    Felipe Urrutia and Roberto Araya. Who’s the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers. Journal of Educational Computing Research, 61(8):187–218, 2024. 22 LLMs for SR A PREPRINT

  43. [43]

    Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs’ Mathematical Competency through Ontology-guided Perturbations

    Pengfei Hong, Deepanway Ghosal, Navonil Majumder, Somak Aditya, Rada Mihalcea, and Soujanya Poria. Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs’ Mathematical Competency through Ontology-guided Perturbations. arXiv preprint arXiv:2401.09395, 2024

  44. [44]

    Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange

    Ankit Satpute, Noah Gießing, Andre Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, and Bela Gipp. Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange. arXiv preprint arXiv:2404.00344, 2024

  45. [45]

    ChatGPT for good? On opportunities and Challenges of Large Language Models for Education

    Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, et al. ChatGPT for good? On opportunities and Challenges of Large Language Models for Education. Learning and individual differences, 103:102274, 2023

  46. [46]

    Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education using Large Language Models

    Tiffany H Kung, Morgan Cheatham, Arielle Medenilla, Czarina Sillos, Lorie De Leon, Camille Elepaño, Maria Madriaga, Rimel Aggabao, Giezel Diaz-Candido, James Maningo, et al. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education using Large Language Models. PLoS digital health, 2(2):e0000198, 2023

  47. [47]

    Material Transformers: Deep Learning Language Models for Generative Materials Design

    Nihang Fu, Lai Wei, Yuqi Song, Qinyang Li, Rui Xin, Sadman Sadeed Omee, Rongzhi Dong, Edirisuriya M Dilanga Siriwardane, and Jianjun Hu. Material Transformers: Deep Learning Language Models for Generative Materials Design. Machine Learning: Science and Technology, 4(1):015001, 2023

  48. [48]

    Future Directions for Chatbot Research: An Interdisciplinary Research Agenda

    Asbjørn Følstad, Theo Araujo, Effie Lai-Chong Law, Petter Bae Brandtzaeg, Symeon Papadopoulos, Lea Reis, Marcos Baez, Guy Laban, Patrick McAllister, Carolin Ischen, et al. Future Directions for Chatbot Research: An Interdisciplinary Research Agenda. Computing, 103(12):2915–2942, 2021

  49. [49]

    Quantifying Mental Health Signals in Twitter

    Glen Coppersmith, Mark Dredze, and Craig Harman. Quantifying Mental Health Signals in Twitter. InProceedings of the workshop on computational linguistics and clinical psychology: From linguistic signal to clinical reality, pages 51–60, 2014

  50. [50]

    Artificial Intelligence in Drug Discovery and Development

    Kit-Kay Mak, Yi-Hang Wong, and Mallikarjuna Rao Pichika. Artificial Intelligence in Drug Discovery and Development. Drug Discovery and Evaluation: Safety and Pharmacokinetic Assays, pages 1–38, 2023

  51. [51]

    What types of COVID-19 conspiracies are populated by Twitter bots? arXiv preprint arXiv:2004.09531, 2020

    Emilio Ferrara. What types of COVID-19 conspiracies are populated by Twitter bots? arXiv preprint arXiv:2004.09531, 2020

  52. [52]

    Machine Intelligence for Chemical Reaction Space

    Philippe Schwaller, Alain C Vaucher, Ruben Laplaza, Charlotte Bunne, Andreas Krause, Clemence Corminboeuf, and Teodoro Laino. Machine Intelligence for Chemical Reaction Space. Wiley Interdisciplinary Reviews: Computational Molecular Science, 12(5):e1604, 2022

  53. [53]

    A Review of Large Language Models and Autonomous Agents in Chemistry

    Mayk Caldas Ramos, Christopher J Collison, and Andrew D White. A Review of Large Language Models and Autonomous Agents in Chemistry. arXiv preprint arXiv:2407.01603, 2024

  54. [54]

    Neural Legal Judgment Prediction in English

    Ilias Chalkidis, Ion Androutsopoulos, and Nikolaos Aletras. Neural Legal Judgment Prediction in English. arXiv preprint arXiv:1906.02059, 2019

  55. [55]

    Legalbench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

    Neel Guha, Julian Nyarko, Daniel Ho, Christopher Ré, Adam Chilton, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel Rockmore, Diego Zambrano, et al. Legalbench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 36, 2024

  56. [56]

    Twitter Mood Predicts the Stock Market

    Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter Mood Predicts the Stock Market. Journal of computational science, 2(1):1–8, 2011

  57. [57]

    Artificial Intelligence and Machine Learning in Finance: Identifying Foundations, Themes, and Research Clusters from Bibliometric Analysis

    John W Goodell, Satish Kumar, Weng Marc Lim, and Debidutta Pattnaik. Artificial Intelligence and Machine Learning in Finance: Identifying Foundations, Themes, and Research Clusters from Bibliometric Analysis. Journal of Behavioral and Experimental Finance, 32:100577, 2021

  58. [58]

    Music Transformer

    Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M Dai, Matthew D Hoffman, Monica Dinculescu, and Douglas Eck. Music Transformer. arXiv preprint arXiv:1809.04281, 2018

  59. [59]

    A systematic Review of Artificial Intelligence-Based Music Generation: Scope, Applications, and Future Trends

    Miguel Civit, Javier Civit-Masot, Francisco Cuadrado, and Maria J Escalona. A systematic Review of Artificial Intelligence-Based Music Generation: Scope, Applications, and Future Trends. Expert Systems with Applications, 209:118190, 2022

  60. [60]

    Pre-trained Language Models as Prior Knowledge for Playing Text-Based Games

    Ishika Singh, Gargi Singh, and Ashutosh Modi. Pre-trained Language Models as Prior Knowledge for Playing Text-Based Games. arXiv preprint arXiv:2107.08408, 2021

  61. [61]

    A survey on large language model-based game agents

    Sihao Hu, Tiansheng Huang, Fatih Ilhan, Selim Tekin, Gaowen Liu, Ramana Kompella, and Ling Liu. A Survey on Large Language Model-Based Game Agents. arXiv preprint arXiv:2404.02039, 2024. 23 LLMs for SR A PREPRINT

  62. [62]

    Symbolicgpt: A generative trans- former model for symbolic regression

    Mojtaba Valipour, Bowen You, Maysum Panju, and Ali Ghodsi. SymbolicGPT: A Generative Transformer Model for Symbolic Regression. arXiv preprint arXiv:2106.14131, 2021

  63. [63]

    Program Synthesis with Large Language Models

    Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et al. Program Synthesis with Large Language Models. arXiv preprint arXiv:2108.07732, 2021

  64. [64]

    Evaluating Large Language Models Trained on Code

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374, 2021

  65. [65]

    Chain-Of-Thought Prompting Elicits Reasoning in Large Language Models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-Of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in neural information processing systems, 35:24824–24837, 2022

  66. [66]

    Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021

    Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021

  67. [67]

    How I Think about LLM Prompt Engineering

    François Chollet. How I Think about LLM Prompt Engineering. https://fchollet.substack.com/p/ how-i-think-about-llm-prompt-engineering . Accessed: 2024-08-13

  68. [68]

    Le, Denny Zhou, and Xinyun Chen

    Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V . Le, Denny Zhou, and Xinyun Chen. Large Language Models as Optimizers, 2024

  69. [69]

    The Adsorption of Gases on Plane Surfaces of Glass, Mica and Platinum

    Irving Langmuir. The Adsorption of Gases on Plane Surfaces of Glass, Mica and Platinum. Journal of the American Chemical Society, 40(9):1361–1403, September 1918. Publisher: American Chemical Society

  70. [70]

    A Simplex Method for Function Minimization

    John A Nelder and Roger Mead. A Simplex Method for Function Minimization. The Computer Journal , 7(4):308–313, 1965

  71. [71]

    Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms

    David J Wales and Jonathan PK Doye. Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms. The Journal of Physical Chemistry A, 101(28):5111–5116, 1997

  72. [72]

    Keep Me Updated! Memory Management in Long-Term Conversations

    Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woomyoung Park, and Nako Sung. Keep Me Updated! Memory Management in Long-Term Conversations. arXiv preprint arXiv:2210.08750, 2022

  73. [73]

    Beyond Goldfish Memory: Long-Term Open-Domain Conversation

    Jing Xu, Arthur Szlam, and Jason Weston. Beyond Goldfish Memory: Long-Term Open-Domain Conversation. arXiv preprint arXiv:2107.07567, 2021

  74. [74]

    Long time no see! open-domain conversation with long-term persona memory,

    Xinchao Xu, Zhibin Gou, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, and Shihang Wang. Long Time No See! Open-Domain Conversation with Long-Term Persona Memory. arXiv preprint arXiv:2203.05797, 2022

  75. [75]

    Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation

    Hanxun Zhong, Zhicheng Dou, Yutao Zhu, Hongjin Qian, and Ji-Rong Wen. Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation. arXiv preprint arXiv:2204.08128, 2022

  76. [76]

    Lost in the Middle: How Language Models use Long Contexts

    Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the Middle: How Language Models use Long Contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024

  77. [77]

    Large Language Models Can Be Easily Distracted by Irrelevant Context

    Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou. Large Language Models Can Be Easily Distracted by Irrelevant Context. In International Conference on Machine Learning, pages 31210–31227. PMLR, 2023

  78. [78]

    Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

    Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys, 55(9):1–35, 2023

  79. [79]

    Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

    Laria Reynolds and Kyle McDonell. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–7, 2021

  80. [80]

    Learning How to Ask: Querying LMs with Mixtures of Soft Prompts

    Guanghui Qin and Jason Eisner. Learning How to Ask: Querying LMs with Mixtures of Soft Prompts. arXiv preprint arXiv:2104.06599, 2021

Showing first 80 references.