pith. machine review for the scientific record.

arxiv: 2605.08368 · v1 · submitted 2026-05-08 · 💻 cs.AI · cond-mat.stat-mech · cs.LG

Recognition: no theorem link

On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 00:47 UTC · model grok-4.3

classification 💻 cs.AI · cond-mat.stat-mech · cs.LG
keywords capability elicitation · capability creation · post-training · accessible support · free-energy perspective · supervised fine-tuning · reinforcement learning · large language models

The pith

Post-training reweights behaviors within a pretrained model's accessible support to elicit capabilities, or expands that support to create new ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the common distinction between supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery is too coarse for understanding large language model post-training. It proposes instead distinguishing capability elicitation, which increases the probability of behaviors the model could already produce, from capability creation, which changes what the model can practically reach. The distinction is made operational through the concept of accessible support, defined as the set of behaviors reachable under finite budgets. From a free-energy perspective, both SFT and RL reweight a reference distribution according to external signals, and the key question is whether the update remains close to the base model. If the paper is right, research should focus on whether post-training expands the reachable behavioral space, through mechanisms like search or new information, rather than on the choice between SFT and RL.

Core claim

Post-training that reweights behaviors within the accessible support is capability elicitation, whereas post-training that changes the support itself is capability creation. Both SFT and RL can be seen as reweighting the pretrained reference distribution, differing only in their external signals, and when the update remains close to the base model, the main effect is local reweighting, not capability creation.
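To fix ideas, the objective this claim leans on is presumably the standard KL-regularized free energy written below; the notation (E, β, π_ref) is ours rather than the paper's, though the paper's appendix announces a derivation of exactly this Boltzmann reweighting solution.

```latex
% Free energy of a candidate policy \pi relative to the pretrained
% reference \pi_ref. E(x,y) is the energy set by the external signal:
% negative log-likelihood of demonstrations for SFT, negative reward
% for RL. \beta > 0 controls how close the update stays to the base.
\mathcal{F}[\pi]
  = \mathbb{E}_{y \sim \pi(\cdot\mid x)}\bigl[E(x,y)\bigr]
  + \beta\,\mathrm{KL}\bigl(\pi(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\bigr)

% Minimizing \mathcal{F} yields an exponential tilt of the reference:
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\bigl(-E(x,y)/\beta\bigr)

% Because \pi^*(y|x) = 0 wherever \pi_ref(y|x) = 0, a finite-energy
% tilt reweights within the reference support and cannot place mass
% outside it; that is the formal sense of "local reweighting".
```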

What carries the argument

Accessible support: the set of behaviors that a model can practically produce under finite budgets. Whether post-training reweights within this set or changes it is what separates capability elicitation from capability creation.
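The paper keeps accessible support at this descriptive level. One way to make it concrete, a construction of ours rather than the authors', indexes the set by a sampling budget B and a detection threshold δ:

```latex
% Budget-indexed accessible support (hypothetical formalization):
% the behaviors that B independent draws from \pi surface with
% probability at least \delta.
S_{B,\delta}(\pi) = \Bigl\{\, b \;:\;
  \Pr\bigl[\, b \in \{y_1,\dots,y_B\},\;\; y_i \sim \pi \,\bigr] \ge \delta \,\Bigr\}

% Under this reading, an update is elicitation if the post-trained
% model's behaviors stay inside S_{B',\delta}(\pi_base) for some
% feasible budget B', and creation if it reaches behaviors outside
% every feasible S_{B',\delta}(\pi_base).
```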

If this is right

  • The central question for post-training is no longer whether it is framed as SFT or RL, but whether it reweights behaviors already within reach or expands the model's reachable behavioral space through search, interaction, tool use, or new information.
  • When the update remains close to the base model, the main effect is local reweighting, not capability creation.
  • SFT and RL can both be seen as reweighting a pretrained reference distribution, differing only in their external signals: demonstrations define low-energy behavior for SFT, and rewards define low-energy behavior for RL.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This view suggests measuring post-training by testing whether new behaviors could have been reached with finite effort before the update; a sketch of such a test follows this list.
  • Training procedures that incorporate explicit search or external interaction are positioned as more likely to expand accessible support.
  • The framework could be used to reinterpret scaling curves as mixtures of elicitation and creation effects at different training stages.
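A minimal sketch of the reachability test the first bullet describes, in Python. Everything here is hypothetical scaffolding: sample, exhibits_behavior, and the budget ladder stand in for whatever sampling interface and behavior detector a real study would choose; the paper specifies none of them.

```python
def smallest_reaching_budget(sample, exhibits_behavior,
                             budgets=(10, 100, 1000)):
    """Return the smallest tried budget at which sampling surfaces the
    behavior, or None if no tried budget does.

    sample:            () -> str, one completion from the model under test
    exhibits_behavior: str -> bool, detector for the target behavior
    budgets:           escalating draw counts standing in for 'finite effort'
    """
    drawn = 0
    for budget in sorted(budgets):
        while drawn < budget:
            if exhibits_behavior(sample()):
                return budget
            drawn += 1
    return None
```

Run once against the base checkpoint and once against the post-trained one. A behavior that returns None for the base model at every budget tried but a finite budget afterwards is a candidate creation signature; a behavior that merely moves to a smaller budget was elicited.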

Load-bearing premise

The notion of accessible support can be made precise and measurable enough to distinguish elicitation from creation in practice.

What would settle it

An experiment showing a concrete behavior that a model produces after post-training but could not produce before under any finite budget of compute, data, or interaction, separate from mere reweighting of already-reachable outputs.
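In the budgeted notation sketched earlier (again ours, not the paper's), that settling condition is a support statement:

```latex
% Creation, stated as a support change: some behavior b is reachable
% after the update but outside every feasible budgeted support before it.
\exists\, b :\quad
  b \in S_{B,\delta}(\pi_{\mathrm{post}})
  \quad\text{and}\quad
  b \notin S_{B',\delta}(\pi_{\mathrm{base}})
  \;\;\text{for every feasible } B'

% In the energy picture this is the unsupported limit from Figure 1:
% the base model's effective energy at b diverges, so no finite-\beta
% reweighting of \pi_base can surface it.
```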

Figures

Figures reproduced from arXiv: 2605.08368 by Shengchao Liu, Yuhao Li.

Figure 1. Schematic energy landscape for accessible support. Behaviors in basins are easily produced by the base model, while tail behaviors are rare but reachable under larger sampling or search budgets. Barrier regions require crossing low-probability intermediate states. In the unsupported limit, the effective energy becomes divergent, and the local reweighting view no longer applies. This is a distributional eff…
read the original abstract

Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction is too coarse. What matters is whether a training procedure increases the probability of behaviors the pretrained model could already produce, or whether it changes what the model can practically reach. We argue that post-training research should distinguish between capability elicitation and capability creation. We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation; whereas changing the support itself corresponds to capability creation. We develop this argument through a free-energy view of post-training. SFT and RL can both be seen as reweighting a pretrained reference distribution, only with different external signals. Demonstration signals define low-energy behavior for SFT, and reward signals define low-energy behavior for RL. When the update remains close to the base model, the main effect is local reweighting, not capability creation. Within this framework, the central question is no longer whether post-training is framed as SFT or RL, but whether it reweights behaviors already within reach, or instead expands the model's reachable behavioral space through search, interaction, tool use, or the incorporation of new information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that debates on LLM post-training oversimplify SFT as imitation and RL as discovery. It introduces 'accessible support' (behaviors a model can practically produce under finite budgets) to distinguish capability elicitation (reweighting probabilities within this support) from capability creation (expanding the support itself). Both SFT and RL are reframed as reweighting a pretrained reference distribution under a free-energy perspective, where demonstration or reward signals define low-energy behaviors; the key question is whether updates remain local to the base model or expand reachable behaviors via search, interaction, or new information.

Significance. If operationalized, the distinction could usefully reorient post-training research toward explicit analysis of whether updates elicit existing behaviors or create new ones, moving beyond coarse SFT/RL labels. The free-energy analogy correctly highlights that both methods optimize energy-like objectives and that proximity to the base model favors reweighting. However, the manuscript is entirely conceptual with no derivations, data, benchmarks, or examples, so its significance remains prospective rather than demonstrated.

major comments (1)
  1. [Abstract] The central claim requires that 'accessible support' (behaviors reachable under finite budgets) can be identified precisely enough to classify any post-training update as reweighting inside the support or expansion of the support. The paper defines it only descriptively ('the set of behaviors that a model can practically produce under finite budgets') and states that SFT/RL are reweightings of a reference distribution, but supplies no mathematical characterization (e.g., no measure on behavior space, no budget parameterization, no decision procedure), no algorithm, and no worked example.
minor comments (2)
  1. The manuscript would benefit from a brief toy-model illustration showing how one would determine whether a specific update changes the accessible support.
  2. Clarify whether the free-energy perspective is intended as a strict analogy or as a formal mapping that could yield testable predictions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and insightful review. The comments accurately highlight the conceptual focus of the manuscript and the need for greater precision around the definition of accessible support. We respond to the major comment below, indicating the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] The central claim requires that 'accessible support' (behaviors reachable under finite budgets) can be identified precisely enough to classify any post-training update as reweighting inside the support or expansion of the support. The paper defines it only descriptively ('the set of behaviors that a model can practically produce under finite budgets') and states that SFT/RL are reweightings of a reference distribution, but supplies no mathematical characterization (e.g., no measure on behavior space, no budget parameterization, no decision procedure), no algorithm, and no worked example.

    Authors: We agree that the current treatment of accessible support is descriptive rather than equipped with a formal measure on behavior space, explicit budget parameterization, or a decision procedure. The manuscript is a perspective paper whose primary aim is to reframe post-training debates; it does not purport to deliver a complete operational framework. In the revised version we will (i) add a more explicit parameterization of the finite budget (in terms of sampling temperature, sequence length, and computational resources) and (ii) include a short worked example illustrating how one might assess whether a given behavior lies inside or outside the accessible support in a simplified setting. We will also state clearly that a full algorithm or classification procedure lies beyond the scope of this work and remains an open question for future research. These changes will make the central claim more precise while preserving the paper's conceptual character. revision: partial
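As a gloss on the parameterization promised in (i), a budget could be as simple as a record of decoding and compute knobs. The Budget type and its fields below are our invention, not anything the authors commit to.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budget:
    """Hypothetical 'finite budget' record: the decoding and compute
    knobs under which accessible support would be measured."""
    temperature: float = 1.0  # sampling temperature
    max_tokens: int = 2048    # per-draw sequence-length cap
    n_samples: int = 1000     # number of independent draws
    flops: float = 1e15       # total compute allowance

    def dominates(self, other: "Budget") -> bool:
        """Partial order on budgets, under the crude monotonicity
        assumption that more of every resource never shrinks the
        accessible support."""
        return (self.temperature >= other.temperature
                and self.max_tokens >= other.max_tokens
                and self.n_samples >= other.n_samples
                and self.flops >= other.flops)
```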

Circularity Check

1 step flagged

Central distinction introduced by definition of 'accessible support' without formalization or external derivation

specific steps
  1. self-definitional [Abstract]
    "We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation; whereas changing the support itself corresponds to capability creation."

    The elicitation-vs-creation distinction is defined exactly as reweighting inside versus expansion of the newly introduced 'accessible support' term. The term is presented as making the distinction operational, but the definition supplies no independent measure, budget parameterization, or decision procedure; the classification therefore holds by construction of the definition rather than by derivation from prior results or data.

full rationale

The paper's load-bearing claim—that post-training is elicitation when it reweights inside accessible support and creation when it expands the support—is made by introducing the term 'accessible support' and then defining the distinction directly in terms of it. This reduces the claimed operationalization to a definitional move rather than a derivation from independent equations, benchmarks, or measurable procedures. The free-energy framing is asserted as a perspective under which SFT and RL are both reweightings, but supplies no checked equations or external validation that would make the support concept falsifiable outside the definition itself. No self-citations or fitted parameters appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework introduces one invented entity and relies on a domain assumption about free-energy minimization; no fitted parameters or external benchmarks are used.

axioms (1)
  • domain assumption · Post-training procedures can be viewed as reweighting a pretrained reference distribution using external signals that define low-energy behaviors.
    Stated directly in the abstract as the basis for unifying SFT and RL.
invented entities (1)
  • accessible support · no independent evidence
    purpose: To operationalize the boundary between capability elicitation and capability creation.
    Defined as the set of behaviors a model can practically produce under finite budgets; no independent evidence or measurement protocol supplied.

pith-pipeline@v0.9.0 · 5548 in / 1284 out tokens · 57137 ms · 2026-05-12T00:47:44.749512+00:00 · methodology

discussion (0)

