arxiv: 2305.15717 · v1 · submitted 2023-05-25 · 💻 cs.CL

The False Promise of Imitating Proprietary LLMs

Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

Pith reviewed 2026-05-18 06:49 UTC · model grok-4.3

classification 💻 cs.CL

keywords: language model imitation, finetuning, ChatGPT, open-source LLMs, capabilities gap, instruction following, factuality

The pith

Finetuning open models on proprietary LLM outputs like ChatGPT fails to close the capabilities gap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether finetuning weaker open language models on outputs from systems like ChatGPT can cheaply close the performance difference. Models ranging from 1.5B to 13B parameters were trained on imitation datasets of 0.3M to 150M tokens. Crowd raters judged the resulting outputs competitive with ChatGPT in instruction following, yet targeted automatic evaluations showed that imitation closed little to none of the gap between the base model and ChatGPT on tasks not heavily supported in the imitation data. The authors conclude that imitation primarily reproduces ChatGPT's style rather than its factuality or underlying capabilities, making it a less effective route than building stronger base models.

Core claim

Model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs.

What carries the argument

Finetuning runs that generate imitation models from base sizes 1.5B–13B on varying volumes of ChatGPT outputs, followed by side-by-side comparison of crowd ratings against automatic evaluations on held-out NLP tasks.
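To make the crowd-rating side of this comparison concrete: a common protocol shows a rater an imitation-model response next to a ChatGPT response and asks which is better, or whether they tie. The sketch below tallies such pairwise judgments into win/tie/loss rates for the imitation model. It is an illustration under assumptions, not the paper's annotation code: the label strings and the function name `preference_rates` are hypothetical, and "competitive" is read here as the share of comparisons the imitation model wins or ties.

```python
from collections import Counter
from typing import Iterable

# Hypothetical labels for one pairwise comparison against ChatGPT.
LABELS = ("imitation", "chatgpt", "tie")


def preference_rates(judgments: Iterable[str]) -> dict[str, float]:
    """Summarize pairwise crowd judgments as rates for the imitation model."""
    counts = Counter(judgments)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments supplied")
    win = counts["imitation"] / total
    tie = counts["tie"] / total
    loss = counts["chatgpt"] / total
    # "Better or equal" rate: one simple way to summarize "competitive".
    return {"win": win, "tie": tie, "loss": loss, "win_or_tie": win + tie}


print(preference_rates(["imitation", "tie", "chatgpt", "chatgpt"]))
# {'win': 0.25, 'tie': 0.25, 'loss': 0.5, 'win_or_tie': 0.5}
```

A high `win_or_tie` rate on its own says nothing about factual accuracy, which is exactly the gap the automatic evaluations are meant to expose.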

Load-bearing premise

That the targeted automatic evaluations on tasks not heavily supported in the imitation data accurately capture the meaningful capabilities gap, rather than reflecting only the distribution of the collected imitation data itself.

What would settle it

An imitation model trained on a moderate data volume that matches ChatGPT accuracy on a task whose required skills are absent from the imitation set would refute the claim of a persistent gap.

Original abstract

An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that imitate ChatGPT using varying base model sizes (1.5B–13B), data sources, and imitation data amounts (0.3M–150M tokens). We then evaluate the models using crowd raters and canonical NLP benchmarks. Initially, we were surprised by the output quality of our imitation models: they appear far better at following instructions, and crowd workers rate their outputs as competitive with ChatGPT. However, when conducting more targeted automatic evaluations, we find that imitation models close little to none of the gap from the base LM to ChatGPT on tasks that are not heavily supported in the imitation data. We show that these performance discrepancies may slip past human raters because imitation models are adept at mimicking ChatGPT's style but not its factuality. Overall, we conclude that model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs. In turn, we argue that the highest leverage action for improving open-source models is to tackle the difficult challenge of developing better base LMs, rather than taking the shortcut of imitating proprietary systems.
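The phrase "close little to none of the gap from the base LM to ChatGPT" suggests a normalized measure. One natural reading, sketched below, scales the imitation model's improvement over its base LM by the base LM's distance to ChatGPT on the same benchmark: (s_imitation − s_base) / (s_ChatGPT − s_base). The abstract does not state this exact formula, so treat the normalization and the name `fraction_of_gap_closed` as assumptions.

```python
def fraction_of_gap_closed(base: float, imitation: float, target: float) -> float:
    """Share of the base-LM-to-target gap recovered by imitation finetuning.

    0.0 means no better than the base LM; 1.0 means matching the target
    (e.g. ChatGPT); negative values mean imitation hurt the base LM.
    Assumes higher scores are better on the benchmark in question.
    """
    gap = target - base
    if gap <= 0:
        raise ValueError("target must outscore the base LM for the gap to be defined")
    return (imitation - base) / gap


# Illustrative numbers only, not results from the paper.
print(fraction_of_gap_closed(base=40.0, imitation=41.0, target=60.0))  # 0.05
```

Under this reading, a value near 0 on a held-out task would correspond to the "little to none" finding, while a value near 1 would mean the imitation model matches ChatGPT on that task.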

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Imitation mostly copies style but leaves a clear factuality gap on unsupported tasks, and the evaluation choices make that gap look larger than it might be with better data coverage.

Hey, the main thing to know is that this paper finds imitation finetuning on ChatGPT outputs mostly teaches style rather than substance on tasks outside the data, so the models still lag ChatGPT on targeted automatic checks even when crowd raters judge them competitive. They ran a controlled set of runs that vary base model size from 1.5B to 13B, data sources, and imitation data volume from 0.3M to 150M tokens, then compared the outputs to ChatGPT using both human ratings and standard benchmarks. That setup is more systematic than single-model reports like Alpaca, and the split they show between fluent style and missing factuality on low-support tasks is the clearest new empirical point. The results are consistent with the claim that, under current methods, closing the gap would take a lot more data or a stronger base model.

The soft spot is exactly the one named in the load-bearing premise above: the automatic evaluations target tasks not heavily supported in the imitation sets, so without per-task overlap numbers or an ablation that adds coverage while holding the base model fixed, the persistent gap could partly reflect data mismatch rather than an intrinsic limit. Their conclusion that open-source efforts should focus on better pretraining instead of imitation still follows from the data they have, but the evidence is stronger on the style-versus-factuality distinction than on proving the gap is unbridgeable at scale.

This is useful reading for anyone building or evaluating open models who might default to distillation. It shows straightforward empirical thinking and honest engagement with the limits of the approach. I would send it to peer review; the experiments are concrete enough to get useful feedback, even if task selection needs more detail.

Referee Report

2 major / 2 minor

Summary. The paper claims that finetuning open-source LMs (1.5B–13B) on ChatGPT outputs using 0.3M–150M tokens of imitation data produces models that human raters find competitive with ChatGPT on instruction following, yet targeted automatic evaluations show these imitation models close little to none of the capabilities gap to ChatGPT on tasks not heavily supported in the imitation data. The authors attribute the human-automatic discrepancy to stylistic mimicry without corresponding gains in factuality, concluding that imitation is a false promise that can only be overcome with impractically large data volumes or stronger base models, and that open-source progress should instead prioritize better base LMs.

Significance. If the central empirical findings hold, the work provides a timely cautionary result for the open-source LLM community by demonstrating that current imitation pipelines do not substitute for stronger base models. The systematic variation across base-model sizes, data sources, and data volumes, combined with dual human and automatic evaluation protocols, supplies concrete evidence that stylistic fluency can mask persistent factual and reasoning shortfalls. This strengthens the case for redirecting effort toward pretraining improvements rather than post-hoc distillation.

major comments (2)

[Evaluation and Results sections] The load-bearing claim that imitation closes little of the gap specifically on tasks 'not heavily supported in the imitation data' (abstract and results) lacks explicit quantification. No per-task coverage statistics, token-overlap metrics, or ablations that increase support while holding the base model fixed are reported; without these, the observed discrepancies risk being partly tautological with the data-collection process rather than evidence of an intrinsic capabilities ceiling.
[Human vs. Automatic Evaluation Comparison] The interpretation that human raters are fooled by style while automatic metrics reveal factuality gaps (results) is plausible but under-supported. Additional controls—such as factuality-specific probes on high- versus low-coverage tasks or inter-rater reliability broken down by factual accuracy—would be needed to rule out that the automatic benchmarks simply penalize distribution shift.
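Both major comments turn on how much of each benchmark is "supported" by the imitation data. One simple, commonly used proxy is n-gram overlap, in the spirit of test-set contamination checks. The sketch below reports the share of benchmark examples that share at least one n-gram with the imitation corpus. Whitespace tokenization, lower-casing, and the default `n=8` are illustrative assumptions rather than anything the paper reports, and `coverage` is a hypothetical helper name.

```python
from typing import Iterable


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased, whitespace-tokenized n-grams of one string."""
    tokens = text.lower().split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def coverage(imitation_texts: Iterable[str], benchmark_texts: Iterable[str], n: int = 8) -> float:
    """Fraction of benchmark examples sharing at least one n-gram with the imitation data."""
    pool: set[tuple[str, ...]] = set()
    for text in imitation_texts:
        pool |= ngrams(text, n)
    examples = list(benchmark_texts)
    if not examples:
        raise ValueError("benchmark has no examples")
    hits = sum(1 for ex in examples if ngrams(ex, n) & pool)
    return hits / len(examples)


imitation = ["The capital of France is Paris"]
benchmark = ["What is the capital of France", "How many legs does a spider have"]
print(coverage(imitation, benchmark, n=3))  # 0.5
```

Computed per benchmark, a score like this could serve as the per-task coverage statistic the referee asks for and as a basis for splitting tasks into high- and low-coverage groups.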

minor comments (2)

[Experimental Setup] Clarify the precise list of canonical NLP benchmarks, any data-filtering rules applied to the imitation sets, and whether statistical significance was assessed across the multiple experimental configurations.
[Figures] Label all curves and bars in the performance plots with exact base-model sizes and token counts so that trends across the 0.3M–150M range are immediately readable.
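On the significance question, one standard option for comparing two models on the same benchmark items is a paired bootstrap. The sketch below resamples test items with replacement and reports how often model A fails to outscore model B. It assumes per-example 0/1 correctness is available for both models, and the function name and defaults are hypothetical. This counting form is a simple approximation rather than an exact p-value, and testing many configurations at once would also call for a multiple-comparison correction.

```python
import random
from typing import Sequence


def paired_bootstrap(a: Sequence[int], b: Sequence[int], n_resamples: int = 10_000, seed: int = 0) -> float:
    """Fraction of bootstrap resamples in which model A does NOT outscore model B.

    a[i] and b[i] are the 0/1 correctness of the two models on the same item i.
    Small values suggest A's advantage is unlikely to be a resampling artifact.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty score lists")
    rng = random.Random(seed)
    n = len(a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample item indices with replacement, keeping each A/B pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(a[i] - b[i] for i in idx) <= 0:
            not_better += 1
    return not_better / n_resamples
```

For example, `a` could hold an imitation model's per-item correctness and `b` its base LM's, so the result estimates how reliably imitation improved that benchmark.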

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating where revisions will be made to improve the paper.

Referee: [Evaluation and Results sections] The load-bearing claim that imitation closes little of the gap specifically on tasks 'not heavily supported in the imitation data' (abstract and results) lacks explicit quantification. No per-task coverage statistics, token-overlap metrics, or ablations that increase support while holding the base model fixed are reported; without these, the observed discrepancies risk being partly tautological with the data-collection process rather than evidence of an intrinsic capabilities ceiling.

Authors: We appreciate this point and agree that more explicit quantification would help substantiate the claim. While our broad-coverage imitation data consists of general ChatGPT conversations collected from public sources (e.g., ShareGPT), which span a wide range of tasks, we acknowledge the value of direct metrics. In the revised version, we will add token-overlap statistics between the imitation dataset and each evaluation benchmark to quantify support levels. We will also include a per-task analysis showing performance gaps on low-overlap tasks. Regarding ablations that increase support while holding the base model fixed, these would require collecting additional targeted data for specific benchmarks, which is costly, but we will discuss this as a direction for future work and potentially include a small-scale experiment if space permits. revision: partial
Referee: [Human vs. Automatic Evaluation Comparison] The interpretation that human raters are fooled by style while automatic metrics reveal factuality gaps (results) is plausible but under-supported. Additional controls—such as factuality-specific probes on high- versus low-coverage tasks or inter-rater reliability broken down by factual accuracy—would be needed to rule out that the automatic benchmarks simply penalize distribution shift.

Authors: We agree that additional controls would strengthen the interpretation. The current evidence comes from the consistent pattern in which imitation models match ChatGPT on human ratings for instruction following but lag on automatic metrics for factual and reasoning tasks. To address this, we will incorporate factuality-specific probes (e.g., using datasets like TruthfulQA) and analyze performance on high- versus low-coverage tasks in the revision. For inter-rater reliability, we will report breakdowns by task type if the data permits, to test whether raters are consistent on stylistic aspects while the gaps appear in objective measures. We maintain that the automatic benchmarks are standard and do not merely penalize distribution shift, since the base models and imitation models are evaluated under the same conditions. revision: yes
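For the promised inter-rater reliability breakdown, Cohen's kappa is one standard chance-corrected agreement measure for two raters labeling the same items. The sketch below assumes exactly two raters with categorical labels; the paper's crowd setup may use more raters per item, in which case a multi-rater statistic such as Fleiss' kappa would be the closer fit. Computing it separately for each task type (for example, factual versus open-ended prompts) would give the kind of breakdown described above.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(r1)
    observed = sum(x == y for x, y in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected if each rater labeled independently at their own base rates.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


print(cohens_kappa(["a", "a", "b", "b"], ["a", "a", "b", "b"]))  # 1.0
```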

Circularity Check

0 steps flagged

No circularity: empirical comparisons on held-out tasks are self-contained

The paper conducts direct experimental finetuning of open LMs on imitation data from ChatGPT and evaluates performance gaps using crowd ratings and canonical NLP benchmarks on tasks not heavily supported in the data. No mathematical derivations, equations, or first-principles predictions are present that could reduce to fitted inputs by construction. The central claim rests on observable discrepancies between base models, imitation models, and the target, with no self-citation chains or ansatzes invoked to justify uniqueness or force results. This is a standard empirical study self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of crowd-sourced ratings as an initial quality signal and on the assumption that tasks outside the imitation data distribution are representative of general capabilities.

free parameters (1)

imitation data volume
Varied experimentally from 0.3M to 150M tokens to test scaling behavior.
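Because data volume is the swept parameter, the claim that only an "unwieldy amount" of imitation data would bridge the gap is, in effect, an extrapolation. The sketch below fits score against the base-10 logarithm of token count by ordinary least squares and inverts the fit to estimate the token count at which a target score would be reached. The log-linear form and both function names are assumptions for illustration, not a fit the paper reports, and extrapolating far beyond the 0.3M–150M range the sweep covers is fragile.

```python
import math
from typing import Sequence


def fit_log_linear(tokens: Sequence[float], scores: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of score = a + b * log10(tokens); returns (a, b)."""
    if len(tokens) != len(scores) or len(tokens) < 2:
        raise ValueError("need at least two (tokens, score) pairs")
    xs = [math.log10(t) for t in tokens]
    mx = sum(xs) / len(xs)
    my = sum(scores) / len(scores)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("token counts must not all be equal")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, scores)) / sxx
    return my - b * mx, b


def tokens_needed(a: float, b: float, target_score: float) -> float:
    """Invert the fit: token count at which the fitted line reaches target_score."""
    if b <= 0:
        return math.inf  # the fit shows no gain from more data
    return 10 ** ((target_score - a) / b)
```

Fitting per-volume scores on a held-out task and passing ChatGPT's score as `target_score` would give one rough, assumption-laden estimate of how large the required data volume is.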

axioms (1)

domain assumption: crowd worker ratings on instruction following provide a meaningful initial signal of model quality
The paper first reports positive crowd ratings before introducing automatic evaluations.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Foundation.DimensionForcing dimension_forced unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-Rollout On-Policy Distillation via Peer Successes and Failures
cs.LG 2026-05 unverdicted novelty 7.0

MOPD improves on-policy distillation for LLMs by using peer successes for positive patterns and failures for negative examples to create more informative teacher signals.
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
cs.AI 2026-04 unverdicted novelty 7.0

GFT uses group advantage learning and dynamic coefficient rectification to fix reward sparsity and optimization instability in SFT for LLMs, yielding better policies than standard SFT.
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
cs.SE 2025-10 conditional novelty 7.0

CodeRL+ integrates variable-level execution trajectory inference into RLVR training to align textual code representations with execution semantics, delivering 4.6% relative pass@1 gains and generalization to code-reas...
Self-Rewarding Language Models
cs.CL 2024-01 conditional novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
SOD: Step-wise On-policy Distillation for Small Language Model Agents
cs.CL 2026-05 unverdicted novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
cs.SE 2026-04 unverdicted novelty 6.0

Structured knowledge extracted from corpora enables test-driven data engineering for LLMs by mapping training data to source code, model training to compilation, benchmarking to unit testing, and failures to targeted ...
Hybrid Policy Distillation for LLMs
cs.CL 2026-04 unverdicted novelty 6.0

Hybrid Policy Distillation unifies existing knowledge distillation methods for LLMs into a reweighted log-likelihood objective and introduces a hybrid forward-reverse KL approach with mixed data sampling to improve st...
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
cs.LG 2025-07 unverdicted novelty 6.0

RaR uses aggregated rubric feedback as rewards in on-policy RL, delivering up to 31% relative gains on HealthBench and 7% on GPQA-Diamond versus direct Likert LLM-as-judge baselines.
Training Language Models to Self-Correct via Reinforcement Learning
cs.LG 2024-09 unverdicted novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
cs.AI 2024-08 conditional novelty 6.0

Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
cs.SE 2024-03 unverdicted novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
Zephyr: Direct Distillation of LM Alignment
cs.LG 2023-10 accept novelty 6.0

Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.
Towards Understanding Sycophancy in Language Models
cs.CL 2023-10 conditional novelty 6.0

Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
cs.CL 2023-10 unverdicted novelty 6.0

Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
Textbooks Are All You Need
cs.CL 2023-06 unverdicted novelty 6.0

A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.
MiniLLM: On-Policy Distillation of Large Language Models
cs.CL 2023-06 conditional novelty 6.0

MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
cs.CL 2023-06 accept novelty 6.0

GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.
A Survey on Knowledge Distillation of Large Language Models
cs.CL 2024-02 accept novelty 3.0

A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.

Reference graph

Works this paper leans on

299 extracted references · 299 canonical work pages · cited by 18 Pith papers · 32 internal anchors
