Context-Instrumental Data Distillation for Kubernetes Manifest Generation: Method and Experimental Evaluation

Aleksandr Kozachok; Anatoliy Bakaev; Andrey Kozachok; Artem Noev; Shamil Magomedov

arxiv: 2605.25835 · v1 · pith:HS4F4RW3new · submitted 2026-05-25 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-29 23:04 UTC · model grok-4.3

keywords Kubernetes · manifest generation · small language models · data distillation · YAML · supervised fine-tuning · LoRA · DSL

The pith

Strict output-format requirements improve the quality of Kubernetes manifests generated by small language models more than adding training examples does.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a context-instrumental data distillation approach for specializing small language models to produce Kubernetes YAML manifests. Training pairs come from synthetic generation and from reverse instruction generation over real YAML files, and a pair is kept only if it passes external validators and fits a domain context model. Training itself is standard supervised fine-tuning on the verified set. In a resource-limited pilot with a 1.5B-parameter model, the method reached 91.5% full-pass@1 (183/200) on a held-out test set. The authors conclude that, in this pilot, enforcing a strict output format in the prompt contributed more than scaling up the number of training examples.

Core claim

The context-instrumental data distillation method forms a corpus of synthetic and reverse-generated instruction pairs for Kubernetes manifests, includes them only after validation by external tools and a domain context model, and fine-tunes small models via supervised learning on the filtered data. In the pilot experiment, this produced a full-pass@1 rate of 91.5% (183/200) on the test set with a stricter prompt formulation and max_new_tokens=768. The authors take this as evidence that format enforcement outweighed increases in corpus size.

What carries the argument

Context-instrumental data distillation, which filters synthetic and reverse-instruction pairs using external validators and a domain context model before supervised fine-tuning.
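The all-or-nothing gate described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Pair` type, both check functions, and `filter_pairs` are hypothetical names, and the two toy checks only stand in for the external validators and the domain context model, whose actual rules the source does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Pair:
    """One candidate training example (hypothetical structure)."""
    instruction: str
    manifest: str  # YAML text from the teacher or from a real file


# A check returns True when the pair passes. Stand-ins only, not real validators.
Check = Callable[[Pair], bool]


def has_required_keys(pair: Pair) -> bool:
    """Toy structural check: top-level apiVersion and kind lines exist."""
    lines = pair.manifest.splitlines()
    return any(l.startswith("apiVersion:") for l in lines) and any(
        l.startswith("kind:") for l in lines
    )


def kind_matches_instruction(pair: Pair) -> bool:
    """Toy context check: the manifest's kind is named in the instruction."""
    for line in pair.manifest.splitlines():
        if line.startswith("kind:"):
            kind = line.split(":", 1)[1].strip()
            return bool(kind) and kind.lower() in pair.instruction.lower()
    return False


def filter_pairs(pairs: Iterable[Pair], checks: List[Check]) -> List[Pair]:
    """Keep a pair only if every check passes."""
    return [p for p in pairs if all(check(p) for check in checks)]


candidates = [
    Pair("Create a Service for the web app", "apiVersion: v1\nkind: Service\n"),
    Pair("Create a Service for the web app", "apiVersion: v1\nkind: ConfigMap\n"),
    Pair("Create a Deployment", "kind: Deployment\n"),  # missing apiVersion
]
kept = filter_pairs(candidates, [has_required_keys, kind_matches_instruction])
print(len(kept), "of", len(candidates), "pairs kept")
```

Only the pairs in `kept` would go on to supervised fine-tuning; a rejected pair is dropped rather than repaired.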

If this is right

Quality in Kubernetes YAML generation depends more on strict output format requirements than on the volume of training examples.
A small language model with 1.5 billion parameters can exceed 90% full-pass@1 on manifest generation after fine-tuning on verified examples (91.5% in the pilot).
The method targets models of up to 4 billion parameters; the pilot shows that CPU-based LoRA fine-tuning is feasible under resource constraints for a 1.5-billion-parameter model.
Reverse instruction generation from real YAML files provides an additional source of training pairs when combined with validation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This filtering approach might reduce the need for massive datasets in other domain-specific language generation tasks.
Resource-constrained fine-tuning on CPU could extend to other infrastructure-as-code domains if similar validators exist.
The emphasis on prompt strictness suggests that prompt engineering may interact strongly with data quality in DSL tasks.

Load-bearing premise

The external validators and domain context model correctly identify high-quality examples without excluding valid ones or allowing invalid data through.

What would settle it

Re-run the fine-tuning on human-validated examples in place of the automatically validated ones and measure whether full-pass@1 falls below 91.5%.

Figures

Figures reproduced from arXiv: 2605.25835 by Aleksandr Kozachok, Anatoliy Bakaev, Andrey Kozachok, Artem Noev, Shamil Magomedov.

**Figure 1.** Pilot experiment pipeline: generation via API, L1–L4 filtering, canonicalization, deduplication, fixed split, LoRA training, and evaluation on test_200.
Adjacent text from the paper: 4.1 Stage 1: Assembly of Source Pairs. In the pilot implementation, the teacher model is the DeepSeek-V4 Flash API (deepseek-v4-flash). The primary stream used was synthetic_direct: the teacher receives a structured prompt including the target resource fami…

**Figure 2.** Kubernetes context model components and their relationship to instrumental verification levels L1–L4.
Adjacent text from the paper (training configuration):
– Method: LoRA, fp32
– LoRA rank: r = 4
– LoRA alpha: α = 8
– Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
– Optimizer: AdamW
– Mini-batch size: 1 with gradient accumulation
– Hardware platform: CPU, laptop with 32 GB RAM
– Evaluation mode: Hugging Face Transformers …

**Figure 3.** full-pass@1 trajectory on fixed test_200. The main jump 82.0% → 91.0% was obtained by changing the inference mode, without retraining the adapter.
Adjacent text from the paper: The best result was achieved in runs with a stricter prompt formulation and max_new_tokens=768. Increasing the training set to 2 000 examples with the same inference mode did not improve quality but reduced full-pass@1 to 78.5%. In contrast, changing the inferen…

**Figure 4.** Distribution of failures by L1–L4 levels for four comparable runs. In the best mode, residual errors concentrate primarily at L2.
Adjacent text from the paper: … 1K-adapter result from 82.0% to 91.0%. Subsequent residual correction gave a small improvement to 91.5%, corresponding to just one additional successfully passing test example. Resource characteristics of the best run are presented in …
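To give a sense of scale for the LoRA settings quoted next to Figure 2 (r = 4, α = 8, seven target modules), the sketch below computes the LoRA scaling factor α/r and the number of trainable adapter parameters. The rank, alpha, and module names come from the source; the per-module shapes and the layer count are assumptions about Qwen2.5-Coder-1.5B-Instruct that should be checked against the model's published configuration, and `LoraSettings` and `adapter_params` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 4    # r, from the source
    alpha: int = 8   # alpha, from the source
    target_modules: Tuple[str, ...] = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        """LoRA scales the low-rank update by alpha / r."""
        return self.alpha / self.rank


# ASSUMPTION: (in_features, out_features) per module and the layer count are
# not stated in the source; verify them against the model config before use.
ASSUMED_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (1536, 1536),
    "k_proj": (1536, 256),
    "v_proj": (1536, 256),
    "o_proj": (1536, 1536),
    "gate_proj": (1536, 8960),
    "up_proj": (1536, 8960),
    "down_proj": (8960, 1536),
}
ASSUMED_NUM_LAYERS = 28


def adapter_params(settings: LoraSettings,
                   shapes: Dict[str, Tuple[int, int]],
                   num_layers: int) -> int:
    """Each adapted linear layer adds r * (in_features + out_features) weights."""
    per_layer = sum(
        settings.rank * (shapes[name][0] + shapes[name][1])
        for name in settings.target_modules
    )
    return per_layer * num_layers


settings = LoraSettings()
total = adapter_params(settings, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS)
print(f"scaling = {settings.scaling}, trainable adapter parameters = {total:,}")
```

Under these assumed shapes the adapter stays in the low millions of parameters, well under 1% of the base model, which is consistent with training fitting on a CPU laptop with 32 GB RAM.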

Original abstract

This paper examines the specialization of Small Language Models (SLMs) with up to 4 billion parameters for generating artifacts in domain-specific languages (DSL). Kubernetes manifests are chosen as the target domain. We propose the context-instrumental data distillation method: the source corpus is formed through synthetic generation and, in an extended scheme, through reverse instruction generation from real Kubernetes YAML files, with pairs included in training only upon passing external validators and matching the domain context model. Unlike classical KL-divergence knowledge distillation, the baseline implementation reduces to supervised fine-tuning on instrumentally verified examples. The experimental section presents a pilot implementation under resource-constrained conditions: the DeepSeek-V4 Flash API serves as the teacher for synthetic generation, while Qwen2.5-Coder-1.5B-Instruct is fine-tuned via LoRA on CPU. On the K8s-Distill-Pilot corpus (train_1200, validation_100, test_200), we achieved full-pass@1 = 91.5% (183/200) with a stricter prompt formulation and max_new_tokens=768. The key empirical finding is that for Kubernetes YAML, result quality in the pilot depended more on strict output format requirements than on simply increasing the number of training examples.
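The percentages quoted in the abstract and in the Figure 3 and Figure 4 text can be converted back to pass counts on the 200-example test set, which makes the size of each change easier to judge. The sketch below does only that arithmetic; the run labels are shorthand introduced here, not the paper's names, and only the 183/200 count is stated explicitly in the source.

```python
TEST_SIZE = 200  # test_200, from the source

# Reported full-pass@1 percentages; the keys are hypothetical shorthand labels.
REPORTED_PERCENT = {
    "1K adapter, initial inference mode": 82.0,
    "2K training examples, same inference mode": 78.5,
    "1K adapter, changed inference mode": 91.0,
    "after residual correction": 91.5,
}


def to_count(percent: float, n: int = TEST_SIZE) -> int:
    """Convert a pass percentage to a whole number of passing examples."""
    value = percent * n / 100
    count = round(value)
    if abs(value - count) > 1e-9:
        raise ValueError(f"{percent}% of {n} is not a whole number of examples")
    return count


counts = {label: to_count(pct) for label, pct in REPORTED_PERCENT.items()}

# Change attributed to the inference mode (no retraining of the adapter).
inference_gain = (counts["1K adapter, changed inference mode"]
                  - counts["1K adapter, initial inference mode"])
# Change observed when the training set grew with the inference mode unchanged.
data_change = (counts["2K training examples, same inference mode"]
               - counts["1K adapter, initial inference mode"])
# Change from the final residual correction.
residual_gain = (counts["after residual correction"]
                 - counts["1K adapter, changed inference mode"])

for label, count in counts.items():
    print(f"{label}: {count}/{TEST_SIZE}")
print(inference_gain, data_change, residual_gain)
```

In counts, the inference-mode change corresponds to 18 more passing examples, the larger training set to 7 fewer, and the residual correction to one more, matching the "one additional successfully passing test example" noted in the Figure 4 text.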

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper reports a 91.5% pass rate on Kubernetes manifest generation after fine-tuning on 1200 filtered synthetic examples, but the abstract offers no comparisons to support the claim that format rules mattered more than data volume.

The main takeaway is a pilot result: after filtering synthetic data with external validators, fine-tuning Qwen2.5-Coder-1.5B-Instruct on 1200 examples produced 91.5% full-pass@1 on a 200-example test set under CPU-only LoRA training. The authors also note that stricter output format requirements appeared to drive quality more than simply adding more examples.

The work is straightforward supervised fine-tuning on verified pairs rather than anything more elaborate. They describe the corpus construction, the teacher model for synthetic generation, and the resource-constrained setup clearly enough for a reader to understand what was done. The high pass rate under those constraints is a concrete data point for anyone working on small-model specialization for infrastructure DSLs.

The central claim about format versus quantity is the weak point. The abstract gives only the single successful configuration and states the relative importance without showing pass rates for looser prompts at the same data size or for smaller corpora with strict prompts. Without those numbers the finding cannot be assessed. Details on test-set construction and checks for leakage from the synthetic generation step are also missing, which leaves open questions about how robust the 91.5% figure is.

This is useful for practitioners or researchers focused on Kubernetes automation or narrow DSL generation tasks. A reader looking for a worked example of filtered synthetic data for code-like output will find the numbers and setup worth seeing. It is not positioned as a general method advance.

I would send it to peer review. The empirical result is specific enough that referees can evaluate the missing ablations and data details, and the paper is honest about its pilot scope.

Referee Report

2 major / 1 minor

Summary. The paper proposes the context-instrumental data distillation method for fine-tuning small language models (≤4B parameters) to generate Kubernetes manifests. The source corpus is built via synthetic generation (DeepSeek-V4 teacher) and reverse instruction from real YAML, with pairs retained only after passing external validators and a domain context model. This reduces to supervised fine-tuning (LoRA on Qwen2.5-Coder-1.5B-Instruct under CPU constraints). On the K8s-Distill-Pilot corpus (train_1200 / val_100 / test_200), the pilot reports full-pass@1 = 91.5% (183/200) using stricter prompt formulation and max_new_tokens=768. The central empirical claim is that output-format strictness affected quality more than simply increasing the number of training examples.

Significance. If the comparative claim on format versus quantity were substantiated with controlled ablations, the work could usefully inform data-curation priorities for resource-constrained DSL generation. The emphasis on instrumentally verified filtering is a methodological strength of the pilot. At present the absence of the required comparative results limits the strength of that specific finding.

major comments (2)

[Abstract] The claim that 'result quality in the pilot depended more on strict output format requirements than on simply increasing the number of training examples' is unsupported. No pass rates, ablation tables, or descriptions are supplied for non-strict prompts at matched example counts or for varying example counts under fixed prompt strictness, so the relative magnitude of the two factors cannot be assessed.
[Experimental section, K8s-Distill-Pilot corpus description] The 91.5% full-pass@1 result is given without details on test-set construction, potential train/test leakage, or any baseline comparisons. This directly weakens the soundness of the key empirical finding.

minor comments (1)

[Abstract] The phrasing 'reduces to supervised fine-tuning on instrumentally verified examples' is accurate but could briefly contrast the approach with standard knowledge-distillation objectives for readers outside the subfield.
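One way to gauge how much weight a 200-example test set can bear, in the spirit of the robustness concern raised above, is to put a confidence interval around the reported 183/200. The sketch below uses the Wilson score interval for a binomial proportion; it is a reader-side illustration with a hypothetical function name, not an analysis reported in the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(183, 200)
print(f"full-pass@1 = {183 / 200:.3f}, approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

For 183/200 the interval spans roughly 87% to 95%, so a one-example difference between runs (91.0% versus 91.5%) is far inside the sampling noise. Comparisons between runs that share the same test items would need a paired analysis on per-example outcomes, which the source does not provide.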

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the central claim requires explicit supporting evidence and that the experimental section needs additional details on methodology and validation. We will revise the manuscript accordingly.

Referee: [Abstract] The claim that 'result quality in the pilot depended more on strict output format requirements than on simply increasing the number of training examples' is unsupported. No pass rates, ablation tables, or descriptions are supplied for non-strict prompts at matched example counts or for varying example counts under fixed prompt strictness, so the relative magnitude of the two factors cannot be assessed.

Authors: We acknowledge that the manuscript does not provide the ablation studies or pass-rate comparisons needed to substantiate the relative impact of strict output formatting versus training example count. This claim was based on internal pilot observations that were not reported with quantitative details. In the revised version we will add the required ablation tables (comparing strict vs. non-strict prompts at matched example counts and varying example counts under fixed prompt strictness) or, if space constraints prevent full inclusion, we will qualify or remove the claim from the abstract.
Revision: yes

Referee: [Experimental section, K8s-Distill-Pilot corpus description] The 91.5% full-pass@1 result is given without details on test-set construction, potential train/test leakage, or any baseline comparisons. This directly weakens the soundness of the key empirical finding.

Authors: We agree that the experimental section lacks necessary details on test-set construction, checks for train/test leakage, and baseline comparisons, which limits evaluation of the 91.5% result. We will expand this section to describe how the 200 test examples were selected, any deduplication or leakage detection steps performed against the training set, and comparisons against baselines including the untuned Qwen2.5-Coder-1.5B-Instruct model under identical prompting conditions.
Revision: yes
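A leakage check of the kind promised in this response could look like the sketch below: normalize each manifest, hash it, and report test items whose fingerprint also occurs in the training set. The normalization here (dropping comment lines, blank lines, and trailing whitespace) is a deliberately crude assumption, not the paper's canonicalization step; a real check would parse the YAML so that key order and formatting differences do not hide duplicates. All function names are hypothetical.

```python
import hashlib
from typing import List


def normalize(manifest: str) -> str:
    """Crude text normalization: drop blank lines, comment lines, trailing spaces."""
    kept = []
    for line in manifest.splitlines():
        stripped = line.rstrip()
        if not stripped or stripped.lstrip().startswith("#"):
            continue
        kept.append(stripped)
    return "\n".join(kept)


def fingerprint(manifest: str) -> str:
    """SHA-256 of the normalized manifest text."""
    return hashlib.sha256(normalize(manifest).encode("utf-8")).hexdigest()


def leaked_indices(train: List[str], test: List[str]) -> List[int]:
    """Indices of test manifests whose fingerprint also appears in train."""
    seen = {fingerprint(m) for m in train}
    return [i for i, m in enumerate(test) if fingerprint(m) in seen]


train_manifests = [
    "apiVersion: v1\nkind: Service\nmetadata:\n  name: web\n",
    "apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: settings\n",
]
test_manifests = [
    "# generated\napiVersion: v1\nkind: Service   \nmetadata:\n  name: web\n\n",
    "apiVersion: v1\nkind: Secret\nmetadata:\n  name: token\n",
]
print(leaked_indices(train_manifests, test_manifests))
```

Exact-match fingerprints only catch verbatim or near-verbatim overlap; near-duplicates that differ in a resource name or a replica count would need a similarity measure on top.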

Circularity Check

0 steps flagged

No circularity: purely empirical method and measurement

The paper presents a data-generation and fine-tuning pipeline evaluated on held-out test examples. No equations, fitted parameters, or derivations are described. The reported 91.5% full-pass@1 is a direct count on the test_200 set after training on the filtered train_1200 set; it is not obtained by re-using the same quantity as an input or by any self-referential definition. The claim that format strictness mattered more than example count is an informal observation from a single pilot run and does not rely on any of the enumerated circular patterns. No self-citations or uniqueness theorems are invoked as load-bearing premises.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical study and does not rely on or introduce any mathematical axioms, free parameters, or invented entities beyond standard machine learning practices.
