Information Theoretic Adversarial Training of Large Language Models
Pith reviewed 2026-05-08 17:25 UTC · model grok-4.3
The pith
WARDEN reweights adversarial examples inside an f-divergence ball to cut attack success rates on large language models while keeping utility costs comparable to prior methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WARDEN optimizes the worst-case adversarial loss within an f-divergence ambiguity set around the empirical training distribution. Under the KL divergence this reduces to a log-sum-exp objective whose dynamical reweighting parameter automatically shifts weight toward harder examples. The reported result is substantially lower attack success rates across multiple LLMs and attack settings, at computational and utility costs comparable to the CAT, CAPO, and MixAT baselines.
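For orientation, the standard duality for a KL-constrained worst-case expectation, from which a log-sum-exp objective of this kind usually arises, can be written as below. The notation is assumed here for illustration and is not taken from the paper: \hat{P}_n is the empirical distribution over n training examples, \ell_i is the adversarial loss on example i, \rho is the radius of the KL ball, and \lambda is the dual variable. The paper's own parameterization may differ.

```latex
\sup_{Q \,:\, \mathrm{KL}(Q \,\|\, \hat{P}_n) \le \rho} \mathbb{E}_{Q}[\ell]
  \;=\;
  \inf_{\lambda > 0} \left\{ \lambda \rho
    + \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n} \exp\!\left( \frac{\ell_i}{\lambda} \right) \right) \right\}
```

Read this way, \lambda plays the role of a temperature: it sets how sharply the objective concentrates on high-loss examples.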
What carries the argument
The f-divergence ambiguity set with dynamical reweighting that converts worst-case loss minimization into an automatic emphasis on difficult adversarial examples.
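A minimal sketch of how a log-sum-exp objective turns into per-example weights, assuming a fixed temperature-like parameter `lam` and a plain list of per-example adversarial losses. The function names `kl_dro_objective` and `example_weights` are hypothetical and are not the paper's implementation; the paper's dynamical parameter may be defined or scheduled differently.

```python
import math


def kl_dro_objective(losses, lam):
    """Return lam * log(mean(exp(loss / lam))), computed stably.

    `lam` is an assumed name for the reweighting parameter.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = [loss / lam for loss in losses]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return lam * (log_sum_exp - math.log(len(losses)))


def example_weights(losses, lam):
    """Softmax of loss / lam: the weights the objective's gradient puts on each example."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = [loss / lam for loss in losses]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Illustrative placeholder losses, not values from the paper.
    losses = [0.5, 1.0, 3.0]
    print(example_weights(losses, lam=1.0))
    print(kl_dro_objective(losses, lam=1.0))
```

Smaller values of `lam` push almost all weight onto the highest-loss example, while larger values approach uniform weighting, which is the sense in which the parameter controls the strength of reweighting.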
If this is right
- Attack success rates drop substantially on the tested LLMs and attack types.
- Model utility on normal tasks stays comparable to non-robust baselines.
- Training compute remains in line with existing continuous adversarial methods.
- The framework supplies a new family of information-theoretic objectives for robust alignment.
Where Pith is reading between the lines
- The same reweighting mechanism could apply to other distribution-shift problems beyond adversarial prompting.
- Automatic focus on hard examples might lessen the need for manual design of attack prompts.
- Whether the dynamical parameter remains stable when models grow much larger is worth direct testing.
Load-bearing premise
The f-divergence ball around the empirical training distribution covers the adversarial perturbations that LLMs actually encounter, and the reweighting parameter can be chosen without introducing instability or overfitting.
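One way to make the parameter choice concrete, under the assumption that the parameter is the dual variable of a KL ball of radius `rho`, is to minimize the dual value over a grid of candidates. This is an illustrative sketch only: `kl_dual_value`, `best_lambda_on_grid`, `rho`, and the grid are hypothetical names and choices, and the paper may update its dynamical parameter in a different way.

```python
import math


def kl_dual_value(losses, lam, rho):
    """Dual value lam * rho + lam * log(mean(exp(loss / lam))) for a KL ball of radius rho."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = [loss / lam for loss in losses]
    m = max(scaled)  # stabilize the exponentials
    log_mean_exp = (
        m + math.log(sum(math.exp(s - m) for s in scaled)) - math.log(len(losses))
    )
    return lam * rho + lam * log_mean_exp


def best_lambda_on_grid(losses, rho, grid):
    """Pick the grid point that minimizes the dual value (a crude 1-D search)."""
    return min(grid, key=lambda lam: kl_dual_value(losses, lam, rho))


if __name__ == "__main__":
    # Illustrative placeholder losses and radii, not values from the paper.
    losses = [0.5, 1.0, 3.0]
    grid = [0.05 * k for k in range(1, 201)]
    for rho in (0.01, 0.5):
        lam = best_lambda_on_grid(losses, rho, grid)
        print(rho, lam, kl_dual_value(losses, lam, rho))
```

A larger radius tends to select a smaller parameter value and hence sharper reweighting, which is where the instability and overfitting concerns in the premise would show up in practice.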
What would settle it
A new attack strategy or previously untested LLM where WARDEN produces higher attack success rates than CAT or CAPO at matched utility and compute.
Original abstract
Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous adversarial training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradient-based perturbations in the embedding space, enabling more efficient and expressive attacks. Building on this paradigm, we propose WARDEN, a distributionally robust adversarial training framework for LLMs that dynamically reweights adversarial examples through an f -divergence ambiguity set around the empirical training distribution. Our method optimizes the worst-case adversarial loss within a divergence ball around the empirical data distribution, automatically emphasizing harder adversarial examples. Using the convex dual formulation, the objective reduces to a log-sum-exp form under the KL divergence, with a dynamical parameter controlling the strength of reweighting. This study leads to a new class of information-theoretic objectives that significantly reduce attack success rates while maintaining model utility. Across multiple LLMs and attack settings, WARDEN substantially reduces attack success rates with computational and utility costs comparable to CAT-, CAPO-, and MixAT-based baselines, making it a practical approach for scalable robust alignment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes WARDEN, a distributionally robust optimization (DRO) framework for adversarial training of large language models (LLMs). It uses an f-divergence ambiguity set around the empirical training distribution to optimize the worst-case adversarial loss, which under KL divergence reduces to a log-sum-exp objective with a dynamical reweighting parameter. The paper claims that this leads to significant reductions in attack success rates across multiple LLMs and attack settings, with costs comparable to baselines such as CAT, CAPO, and MixAT, while maintaining model utility.
Significance. If the empirical results are robust and the ambiguity set is appropriately chosen, this work offers a principled information-theoretic approach to scalable adversarial training for LLMs. It builds on continuous adversarial training methods by incorporating automatic reweighting of harder examples via DRO, potentially providing better robustness without excessive computational overhead. The provision of a convex dual formulation is a strength.
major comments (2)
- [Method description (abstract and §3)] The central claim that optimizing within the f-divergence ball yields robustness to practical adversarial prompts (e.g., jailbreaks) rests on the assumption that continuous embedding-space perturbations within the ball are representative of discrete, semantically coherent attacks. This is not obviously true, as the ball may not intersect the support of effective real-world attacks; additional justification or experiments showing coverage would be needed to support the robustness gains.
- [Experimental section] The abstract mentions reductions in attack success rates, but without details on statistical significance, the number of runs, or the specific attack success metrics used to tune the dynamical reweighting parameter, it is difficult to rule out overfitting or circularity in the reported improvements.
minor comments (2)
- [Abstract] The phrasing 'This study leads to a new class of information-theoretic objectives' is vague; specify what the new class is and how it differs from existing DRO objectives.
- [Abstract] Typo: 'f -divergence' should be 'f-divergence'.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments on our manuscript. We address each major comment point-by-point below and describe the revisions we will make to strengthen the paper.
- Referee: [Method description (abstract and §3)] The central claim that optimizing within the f-divergence ball yields robustness to practical adversarial prompts (e.g., jailbreaks) rests on the assumption that continuous embedding-space perturbations within the ball are representative of discrete, semantically coherent attacks. This is not obviously true, as the ball may not intersect the support of effective real-world attacks; additional justification or experiments showing coverage would be needed to support the robustness gains.
Authors: We appreciate the referee highlighting this key assumption. Our approach directly extends prior continuous adversarial training methods (CAT and CAPO) that have already demonstrated practical effectiveness against discrete jailbreak attacks via embedding perturbations. The f-divergence ball provides a principled neighborhood for robust optimization, and our empirical results across multiple LLMs and attack settings show consistent reductions in attack success rates. To address the concern, we will revise the abstract and Section 3 to include additional justification, referencing the literature on continuous attacks and providing qualitative examples from our experiments where embedding perturbations yield semantically coherent prompt variations. This will better articulate the coverage without requiring entirely new experiments.
Revision: partial
- Referee: [Experimental section] The abstract mentions reductions in attack success rates, but without details on statistical significance, the number of runs, or the specific attack success metrics used to tune the dynamical reweighting parameter, it is difficult to rule out overfitting or circularity in the reported improvements.
Authors: We agree that greater experimental transparency is essential. In the revised manuscript, we will expand the experimental section (and update the abstract accordingly) to report the number of independent runs, include statistical significance measures such as standard deviations and confidence intervals for attack success rate reductions, and provide a detailed description of the attack success rate metric along with the exact validation procedure used to tune the dynamical reweighting parameter. This will explicitly demonstrate that tuning was performed on held-out data to avoid circularity or overfitting.
Revision: yes
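The kind of reporting promised here (number of runs, standard deviation, confidence interval) could be produced along the following lines. This is a sketch under stated assumptions: `summarize_asr` is a hypothetical helper, the interval uses a normal approximation (a t-quantile would be more appropriate for a handful of runs), and the numbers in the usage section are placeholders, not results from the paper.

```python
import math
import statistics


def summarize_asr(asr_per_run, z=1.96):
    """Mean, sample standard deviation, and a normal-approximation CI for ASR across runs.

    `asr_per_run` holds one attack success rate (in [0, 1]) per independent run.
    """
    n = len(asr_per_run)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.mean(asr_per_run)
    std = statistics.stdev(asr_per_run)  # sample standard deviation (n - 1 denominator)
    half_width = z * std / math.sqrt(n)
    return {
        "runs": n,
        "mean": mean,
        "std": std,
        "ci": (mean - half_width, mean + half_width),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not reported results.
    print(summarize_asr([0.10, 0.20, 0.30]))
```

Reporting the interval alongside the mean makes it possible to judge whether a gap between two methods exceeds run-to-run variation.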
Circularity Check
No circularity; standard DRO dual applied without reduction to inputs
full rationale
The paper applies the convex dual of f-divergence DRO to produce a log-sum-exp objective with dynamical reweighting for adversarial LLM training. This is a standard mathematical reduction independent of the target robustness claims or evaluation metrics. The ambiguity set and parameter are explicit modeling choices, not fitted to the reported attack success rates by construction. No load-bearing self-citations, self-definitional equations, or renamings of known results as novel derivations appear in the abstract or the described chain. Empirical gains are measured on separate attack benchmarks, keeping the derivation self-contained against external evaluation.
Axiom & Free-Parameter Ledger
free parameters (1)
- dynamical reweighting parameter
axioms (1)
- [standard math] The convex dual formulation of the worst-case loss over an f-divergence ball reduces to a log-sum-exp expression under the KL divergence.
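Two textbook properties of a log-sum-exp objective help explain why a single parameter can control the emphasis on hard examples. The notation below is assumed for illustration and may not match the paper: \ell_i(\theta) is the adversarial loss on example i, n is the number of examples, and \lambda > 0 is the reweighting parameter held fixed during differentiation.

```latex
% Log-sum-exp objective over n examples (assumed notation).
L_\lambda(\theta) = \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n} e^{\ell_i(\theta)/\lambda} \right)

% Gradient: a softmax-weighted average of per-example gradients.
\nabla_\theta L_\lambda(\theta) = \sum_{i=1}^{n} w_i \, \nabla_\theta \ell_i(\theta),
\qquad
w_i = \frac{e^{\ell_i(\theta)/\lambda}}{\sum_{j=1}^{n} e^{\ell_j(\theta)/\lambda}}

% Limits: the average loss for large \lambda, the single worst example for small \lambda.
\lim_{\lambda \to \infty} L_\lambda(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell_i(\theta),
\qquad
\lim_{\lambda \to 0^{+}} L_\lambda(\theta) = \max_{1 \le i \le n} \ell_i(\theta)
```

The objective therefore interpolates between ordinary average-loss training and training on the single hardest example.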