ConfusionPrompt: Practical Private Inference for Online Large Language Models

Peihua Mai; Ran Yan; Rui Ye; Yan Pang; Youjia Yang

arXiv: 2401.00870 · v5 · submitted 2023-12-30 · 💻 cs.CR · cs.AI

Pith reviewed 2026-05-24 04:56 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords: private inference · large language models · prompt decomposition · privacy-utility tradeoff · black-box LLMs · text perturbation · recomposition

The pith

ConfusionPrompt protects prompts sent to black-box LLMs by splitting them into sub-prompts mixed with generated pseudo-prompts that the user later filters and recomposes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ConfusionPrompt to address privacy risks when users send detailed prompts to online LLM services. It decomposes a user prompt into smaller genuine sub-prompts, creates accompanying pseudo-prompts, sends the mixed group to the server, and lets the user recompose the returned responses into the final answer. This design works with existing closed models without requiring changes to the LLM itself. It claims a better privacy-utility balance than prior text-perturbation approaches and lower memory use than running open-source models locally. The authors also define a (λ, μ, ρ)-privacy model for prompt groups and analyze the complexity savings from decomposition.

Core claim

ConfusionPrompt achieves private inference on black-box LLMs by decomposing the original prompt into sub-prompts, generating pseudo-prompts to form a privacy-preserving group, transmitting the mixed set to the server, and allowing the user to filter and recompose the responses into the correct output, yielding higher utility than local open-source inference or perturbation methods while using less memory than full local models.

What carries the argument

The ConfusionPrompt framework, which decomposes prompts into genuine sub-prompts, interleaves them with pseudo-prompts, and relies on user-side recomposition of server responses.
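A minimal Python sketch of that client-side loop, under stated assumptions. The callables `decompose`, `make_pseudo`, `query_llm`, and `recompose` are hypothetical stand-ins for the paper's decomposer, generator, black-box server call, and recomposer; the sketch also assumes the sub-prompts can be answered independently in a single round. In this sketch the client keeps the shuffle permutation locally, so genuine responses are recovered by position rather than by inspecting their content.

```python
import random
from typing import Callable, List, Optional, Sequence


def confusion_round(
    prompt: str,
    decompose: Callable[[str], List[str]],                  # hypothetical local decomposer
    make_pseudo: Callable[[str, int], List[str]],           # hypothetical pseudo-prompt generator
    query_llm: Callable[[Sequence[str]], List[str]],        # black-box server: one response per prompt
    recompose: Callable[[str, List[str], List[str]], str],  # hypothetical local recomposer
    n_pseudo: int = 3,
    seed: Optional[int] = None,
) -> str:
    rng = random.Random(seed)
    genuine = decompose(prompt)

    # Genuine sub-prompts occupy the first len(genuine) slots of the batch,
    # followed by n_pseudo decoys per sub-prompt.
    batch = list(genuine)
    for sub in genuine:
        batch.extend(make_pseudo(sub, n_pseudo))

    # Shuffle what the server sees; the permutation never leaves the client.
    order = list(range(len(batch)))
    rng.shuffle(order)
    responses = query_llm([batch[i] for i in order])
    if len(responses) != len(batch):
        raise ValueError("expected one response per prompt")

    # Undo the shuffle by position, so no content-based filtering is needed.
    by_slot: List[str] = [""] * len(batch)
    for sent_pos, batch_idx in enumerate(order):
        by_slot[batch_idx] = responses[sent_pos]

    return recompose(prompt, genuine, by_slot[: len(genuine)])


# Toy stand-ins, for illustration only.
def toy_decompose(p: str) -> List[str]:
    return ["Q1?", "Q2?"]

def toy_pseudo(sub: str, n: int) -> List[str]:
    return [f"{sub} decoy {k}" for k in range(n)]

def toy_llm(prompts: Sequence[str]) -> List[str]:
    return [f"answer to {p}" for p in prompts]

def toy_recompose(p: str, subs: List[str], answers: List[str]) -> str:
    return " | ".join(answers)


print(confusion_round("original", toy_decompose, toy_pseudo, toy_llm, toy_recompose, seed=1))
# prints: answer to Q1? | answer to Q2?
```

If sub-prompts depend on earlier answers, as in multi-hop questions, a round like this would have to run once per hop.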

If this is right

Black-box LLM services can be used privately without model changes or local model hosting.
Prompt decomposition reduces the computational burden compared to full local open-source models.
The (λ, μ, ρ)-privacy model quantifies the protection level of any mixed prompt group.
Complexity analysis shows decomposition lowers the effective query cost for privacy.
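A toy counting model can illustrate why decomposition might lower query cost; it is an assumption for illustration, not the paper's complexity theorem. Suppose a prompt carries k private attributes, each of which must be hidden among m candidate values, and suppose an undecomposed group must contain every combination of candidates so that no combination can be ruled out. The undecomposed group then grows as m^k, whereas k single-attribute sub-prompts, each hidden among m candidates, need only k·m.

```python
def undecomposed_group_size(k: int, m: int) -> int:
    """Full prompts needed when every combination of k attributes,
    each with m candidate values, must appear (toy assumption)."""
    return m ** k


def decomposed_group_size(k: int, m: int) -> int:
    """Sub-prompts needed when each of k sub-prompts carries one attribute
    and is hidden among m candidates (toy assumption)."""
    return k * m


for k in (1, 2, 3, 4):
    print(k, undecomposed_group_size(k, m=4), decomposed_group_size(k, m=4))
# prints:
# 1 4 4
# 2 16 8
# 3 64 12
# 4 256 16
```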

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach may extend to multi-turn conversations if recomposition logic can track context across exchanges.
If pseudo-prompt generation can be made domain-specific, utility loss could drop further for specialized tasks.
Adoption would require users to run a local client for decomposition and recomposition, shifting some compute from server to client.

Load-bearing premise

The user can reliably identify which server responses come from genuine sub-prompts and recombine them into the correct final output without large accuracy loss.

What would settle it

A test set of prompts on which an automated or human recomposer recovers the correct answer noticeably less often than direct LLM use does, or on which an adversary distinguishes genuine from pseudo sub-prompts at a rate above the (λ, μ, ρ) threshold.
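One way to make the adversary side of this test concrete, using the prior-probability framing the paper takes from [33]: if the adversary knows that exactly one prompt in a group is genuine and assigns each candidate a prior π(p), the posterior that p is genuine is π(p)/Σ_q π(q), and a simple attacker guesses the prompt with the highest posterior. This is an illustrative assumption, not the paper's (λ, μ, ρ) definition or its attack model; in practice the priors might come from a language-model plausibility score.

```python
from typing import Dict, List, Tuple


def posterior_genuine(priors: Dict[str, float]) -> Dict[str, float]:
    """Normalize priors pi(p) into posteriors, assuming exactly one prompt
    in the group is genuine (illustrative assumption)."""
    total = sum(priors.values())
    if total <= 0:
        raise ValueError("priors must have positive total mass")
    return {p: w / total for p, w in priors.items()}


def max_posterior_attack_accuracy(groups: List[Tuple[Dict[str, float], str]]) -> float:
    """Fraction of groups in which guessing the highest-posterior prompt
    recovers the genuine one. Compare against 1/len(group) for random guessing."""
    if not groups:
        raise ValueError("need at least one group")
    hits = 0
    for priors, genuine in groups:
        post = posterior_genuine(priors)
        guess = max(post, key=post.__getitem__)
        hits += int(guess == genuine)
    return hits / len(groups)


groups = [
    ({"a": 0.5, "b": 0.3, "c": 0.2}, "a"),    # genuine prompt is the most plausible: attack succeeds
    ({"x": 0.25, "y": 0.25, "z": 0.5}, "x"),  # a decoy is more plausible: attack fails
]
print(max_posterior_attack_accuracy(groups))  # prints: 0.5
```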

Figures

Figures reproduced from arXiv: 2401.00870 by Peihua Mai, Ran Yan, Rui Ye, Yan Pang, Youjia Yang.

**Figure 1.** Overview of ConfusionPrompt. …for the evaluation of privacy level and training of models (i.e., decomposer, generator, and recomposer). To explain the rationale of our privacy model, we follow [33] to quantify the privacy risk of the queries exposed to the server. Consider a set of prompts denoted as $P = \{p_1, p_2, \ldots, p_n\}$. For any $p \in P$, let $\pi(p)$ be the adversary's prior probability that $p$ is the genuine…

**Figure 2.** Example of decomposition savings in query complexity. Decomposition module reduces…

**Figure 3.** Prompt identification attack accuracy under various combinations of privacy parameters.

**Figure 4.** Attribute inference attack accuracy for ConfusionPrompt and LDP-based methods.

**Figure 5.** Monetary ratio of the StrategyQA and MuSiQue datasets before…
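Figure 5's monetary ratio is not defined in the caption shown here. A minimal sketch, assuming it is the API cost of serving the whole confusion group divided by the cost of sending the original prompt directly, with separate per-token prices for input and output tokens; all token counts and prices below are hypothetical.

```python
from typing import Sequence


def monetary_ratio(
    group_in: Sequence[int],   # input tokens of each prompt in the confusion group
    group_out: Sequence[int],  # output tokens of each corresponding response
    direct_in: int,            # input tokens of the original prompt sent directly
    direct_out: int,           # output tokens of the direct response
    price_in: float,           # hypothetical price per input token
    price_out: float,          # hypothetical price per output token
) -> float:
    group_cost = price_in * sum(group_in) + price_out * sum(group_out)
    direct_cost = price_in * direct_in + price_out * direct_out
    if direct_cost <= 0:
        raise ValueError("direct cost must be positive")
    return group_cost / direct_cost


# 8 short sub/pseudo-prompts versus one longer direct prompt (hypothetical sizes).
print(monetary_ratio([20] * 8, [30] * 8, direct_in=60, direct_out=50, price_in=1.0, price_out=2.0))
# prints: 4.0
```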

Original abstract

State-of-the-art large language models (LLMs) are typically deployed as online services, requiring users to transmit detailed prompts to cloud servers. This raises significant privacy concerns. In response, we introduce ConfusionPrompt, a novel framework for private LLM inference that protects user privacy by: (i) decomposing the original prompt into smaller sub-prompts, and (ii) generating pseudo-prompts alongside the genuine sub-prompts, which are then sent to the LLM. The server responses are later recomposed by the user to reconstruct the final output. This approach offers key advantages over previous LLM privacy protection methods: (i) it integrates seamlessly with existing black-box LLMs, and (ii) it delivers a significantly improved privacy-utility trade-off compared to existing text perturbation methods. We also develop a $(\lambda, \mu, \rho)$-privacy model to formulate the requirements for a privacy-preserving group of prompts and provide a complexity analysis to justify the role of prompt decomposition. Our empirical evaluation shows that ConfusionPrompt achieves significantly higher utility than local inference methods using open-source models and perturbation-based techniques, while also reducing memory consumption compared to open-source LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

ConfusionPrompt gives a black-box LLM privacy method via prompt decomposition and pseudo-prompt mixing, with a new (λ, μ, ρ) model, but recomposition accuracy is the unproven hinge.

The main thing to know is that this paper describes a way to add privacy to existing cloud LLM services: split a user prompt into sub-prompts, send generated pseudo-prompts alongside them, and let the client recompose the server responses afterward. It also defines a (λ, μ, ρ)-privacy model and includes a complexity argument for the decomposition step. The approach is positioned as compatible with black-box APIs, better on the privacy-utility curve than text-perturbation baselines, and lighter on memory than running open-source models locally. The empirical claim is that its utility stays higher than both alternatives.

What the work does well is keep the focus on deployable techniques that require neither model changes nor heavy client compute, which matches real constraints for many users. The privacy model and the decomposition analysis add structure that prior perturbation papers often lack.

The soft spot is the recomposition step. Its description stays high-level, and the stress-test concern holds: no clear mechanism is shown for how a user reliably isolates the genuine responses from the mixed batch without utility loss, especially under LLM nondeterminism or semantic overlap. If that filtering does not hold up in practice, the claimed advantage over baselines disappears. Because this reading is based on the abstract alone, the exact procedure and error bars cannot be checked, so the central empirical result remains unverified for now.

This paper is aimed at engineers and applied researchers who need workable privacy add-ons for commercial LLM APIs rather than new model-training methods. A reader focused on implementation trade-offs would find the framework and the privacy definition worth examining. It deserves a serious referee to test the recomposition details and the evaluation setup. Recommendation: send it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper proposes ConfusionPrompt, a framework for private inference on black-box online LLMs. The method decomposes a user prompt into sub-prompts, mixes them with generated pseudo-prompts according to a (λ, μ, ρ)-privacy model, sends the batch to the LLM server, and relies on the user to recompose the returned responses into the final output. It claims seamless integration with existing LLMs, a significantly improved privacy-utility tradeoff versus text-perturbation baselines, lower memory use than local open-source models, and supports these claims with a complexity analysis plus empirical evaluation.

Significance. If the recomposition step can be shown to recover outputs reliably, the approach would provide a practical black-box privacy mechanism that avoids both the utility loss of perturbation methods and the memory/compute cost of local open-source LLMs. The explicit (λ, μ, ρ) privacy formulation and complexity analysis are positive elements that could be built upon.

major comments (2)

[Framework description and recomposition step] The recomposition step is described only at high level in the framework overview. The central utility claim—that ConfusionPrompt delivers higher utility than perturbation baselines—rests on the unverified assumption that users can accurately isolate and recombine genuine sub-prompt responses from a batch of indistinguishable pseudo-prompt responses without substantial loss; no concrete filtering mechanism, algorithm, or experimental measurement of filtering fidelity (e.g., accuracy or semantic overlap under LLM nondeterminism) is supplied, rendering the reported gains unsupported.
[Empirical evaluation] Empirical evaluation section: the abstract states that ConfusionPrompt “achieves significantly higher utility” than local open-source inference and perturbation techniques, yet no dataset details, baseline implementations, error bars, statistical tests, or exact recomposition procedure are referenced. Without these, the quantitative privacy-utility claims cannot be assessed and the comparison to perturbation methods remains unverifiable.
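The error bars requested here could, for example, come from a percentile bootstrap over per-question correctness. The sketch below is one standard choice, not a procedure the paper is known to use; `correct` is a hypothetical list of per-question outcomes, such as whether the recomposed answer matches the reference.

```python
import random
from typing import List, Tuple


def bootstrap_accuracy_ci(
    correct: List[bool], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a percentile bootstrap."""
    if not correct:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)
    n = len(correct)
    point = sum(correct) / n
    # Resample questions with replacement and recompute accuracy each time.
    stats = sorted(
        sum(correct[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


outcomes = [True] * 80 + [False] * 20  # hypothetical: 80 of 100 answers correct
print(bootstrap_accuracy_ci(outcomes))
```

Reporting the interval for each method on the same questions makes the utility comparison against perturbation baselines checkable.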

minor comments (1)

[Privacy model] The (λ, μ, ρ) privacy model is introduced without an explicit equation or formal definition in the provided abstract; a numbered definition or boxed formulation would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the framework and evaluation. We address each major comment below and will revise the manuscript to provide the requested details.

Referee: [Framework description and recomposition step] The recomposition step is described only at high level in the framework overview. The central utility claim—that ConfusionPrompt delivers higher utility than perturbation baselines—rests on the unverified assumption that users can accurately isolate and recombine genuine sub-prompt responses from a batch of indistinguishable pseudo-prompt responses without substantial loss; no concrete filtering mechanism, algorithm, or experimental measurement of filtering fidelity (e.g., accuracy or semantic overlap under LLM nondeterminism) is supplied, rendering the reported gains unsupported.

Authors: We agree that the current manuscript describes the recomposition step at a high level and does not supply a concrete algorithm or fidelity measurements. In the revised version we will add a detailed filtering and recombination algorithm, including handling of nondeterminism, together with new experiments quantifying its accuracy and semantic overlap. revision: yes
Referee: [Empirical evaluation] Empirical evaluation section: the abstract states that ConfusionPrompt “achieves significantly higher utility” than local open-source inference and perturbation techniques, yet no dataset details, baseline implementations, error bars, statistical tests, or exact recomposition procedure are referenced. Without these, the quantitative privacy-utility claims cannot be assessed and the comparison to perturbation methods remains unverifiable.

Authors: We acknowledge that the empirical section lacks the listed details. The revised manuscript will include dataset descriptions, baseline implementations, error bars, statistical tests, and the precise recomposition procedure used in the experiments. revision: yes

Circularity Check

0 steps flagged

No load-bearing circularity; privacy model and recomposition described independently of results

The paper defines a (λ, μ, ρ)-privacy model to formulate prompt-group requirements and provides a complexity analysis for decomposition. These are presented as design choices rather than derived predictions that reduce to fitted inputs or self-citations. The recomposition step is described at a high level without equations that loop back to the privacy claims by construction. No self-citation chains or ansatzes are invoked to justify the core privacy-utility tradeoff. This yields a minor score reflecting normal self-referential method description without forcing the central result.

Axiom & Free-Parameter Ledger

1 free-parameter group · 1 axiom · 0 invented entities

Abstract-only view limits visibility; the (λ, μ, ρ)-privacy model introduces three parameters whose selection or fitting is not detailed, and the recomposition step relies on an unstated domain assumption that mixed responses remain separable.

free parameters (1)

λ, μ, ρ
Parameters defining the privacy model; their values are not specified as derived from first principles or external benchmarks in the abstract.

axioms (1)

Domain assumption: the user can accurately recompose the final output from mixed sub-prompt responses.
Invoked in the description of the reconstruction process; no justification or error analysis is provided in the abstract.

Reference graph

Works this paper leans on

57 extracted references

[1] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[2] Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950, 2023.
[3] Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li. Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model. arXiv preprint arXiv:2305.11176, 2023.
[4] Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language models in medicine. Nature Medicine, 29(8):1930–1940, 2023.
[5] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564, 2023.
[6] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
[7] OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[8] Xuanqi Liu and Zhuotao Liu. LLMs can understand encrypted prompt: Towards privacy-computing friendly transformers. arXiv preprint arXiv:2305.18396, 2023.
[9] Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, and Furu Wei. THE-X: Privacy-preserving transformer inference with homomorphic encryption. arXiv preprint arXiv:2206.00216, 2022.
[10] Minxin Du, Xiang Yue, Sherman SM Chow, Tianhao Wang, Chenyu Huang, and Huan Sun. DP-Forward: Fine-tuning and inference on language models with differential privacy in forward pass. arXiv preprint arXiv:2309.06746, 2023.
[11] Abbas Acar, Hidayet Aksu, A Selcuk Uluagac, and Mauro Conti. A survey on homomorphic encryption schemes: Theory and implementation. ACM Computing Surveys (CSUR), 51(4):1–35, 2018.
[12] Ronald Cramer, Ivan Bjerre Damgård, et al. Secure multiparty computation. Cambridge University Press, 2015.
[13] Lingjuan Lyu, Xuanli He, and Yitong Li. Differentially private representation for NLP: Formal guarantee and an empirical study on privacy and fairness. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2355–2365, 2020.
[14] Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming, pages 1–12. Springer, 2006.
[15] Chen Qu, Weize Kong, Liu Yang, Mingyang Zhang, Michael Bendersky, and Marc Najork. Natural language understanding with privacy-preserving BERT. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1488–1497, 2021.
[16] Peihua Mai, Ran Yan, Zhe Huang, Youjia Yang, and Yan Pang. Split-and-Denoise: Protect large language model inference with local differential privacy. In Forty-first International Conference on Machine Learning.
[17] Mohammad Malekzadeh and Fahim Kawsar. Salted inference: Enhancing privacy while maintaining efficiency of split inference in mobile computing. In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, pages 14–20, 2024.
[18] Mohamed Sabt, Mohammed Achemlal, and Abdelmadjid Bouabdallah. Trusted execution environment: What it is, and what it is not. In 2015 IEEE Trustcom/BigDataSE/ISPA, volume 1, pages 57–64. IEEE, 2015.
[19] Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, and Antoine Doucet. Named entity recognition and classification in historical documents: A survey. ACM Computing Surveys, 56(2):1–47, 2023.
[20] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360, 2016.
[21] Zhigang Kan, Linbo Qiao, Hao Yu, Liwen Peng, Yifu Gao, and Dongsheng Li. Protecting user privacy in remote conversational systems: A privacy-preserving framework based on text sanitization. arXiv preprint arXiv:2306.08223, 2023.
[22] Yu Chen, Tingxin Li, Huiming Liu, and Yang Yu. Hide and seek (HaS): A lightweight framework for prompt privacy protection. arXiv preprint arXiv:2309.03057, 2023.
[23] Balamurugan Anandan, Chris Clifton, Wei Jiang, Mummoorthy Murugesan, Pedro Pastrana-Camacho, and Luo Si. t-Plausibility: Generalizing words to desensitize text. Trans. Data Priv., 5(3):505–534, 2012.
[24] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210. PMLR, 2016.
[25] Meng Hao, Hongwei Li, Hanxiao Chen, Pengzhi Xing, Guowen Xu, and Tianwei Zhang. Iron: Private inference on transformers. Advances in Neural Information Processing Systems, 35:15718–15731, 2022.
[26] Gavin Kerrigan, Dylan Slack, and Jens Tuyls. Differentially private language models benefit from public pre-training. arXiv preprint arXiv:2009.05886, 2020.
[27] Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, et al. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500, 2021.
[28] Haonan Duan, Adam Dziedzic, Nicolas Papernot, and Franziska Boenisch. Flocks of stochastic parrots: Differentially private prompt learning for large language models, 2023.
[29] Yansong Li, Zhixing Tan, and Yang Liu. Privacy-preserving prompt tuning for large language model services, 2023.
[30] Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, and Tom Diethe. Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations. In Proceedings of the 13th International Conference on Web Search and Data Mining, pages 178–186, 2020.
[31] Saiteja Utpala, Sara Hooker, and Pin Yu Chen. Locally differentially private document generation using zero shot prompting. arXiv preprint arXiv:2310.16111, 2023.
[32] Justus Mattern, Benjamin Weggenmann, and Florian Kerschbaum. The limits of word level differential privacy. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 867–881, 2022.
[33] Hwee Hwa Pang, Xuhua Ding, and Xiaokui Xiao. Embellishing text search queries to protect user privacy. In Proceedings of the VLDB Endowment: 36th International Conference on Very Large Data Bases, Singapore, pages 13–17, 2010.
[34] Zongda Wu, Jie Shi, Chenglang Lu, Enhong Chen, Guandong Xu, Guiling Li, Sihong Xie, and S Yu Philip. Constructing plausible innocuous pseudo queries to protect user query intention. Information Sciences, 325:215–226, 2015.
[35] Ethan M Rasiel. The McKinsey Way. McGraw-Hill, New York, 1999.
[36] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9:346–361, 2021.
[37] Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. MuSiQue: Multihop questions via single-hop question composition. Transactions of the Association for Computational Linguistics, 10:539–554, 2022.
[38] Bhargav Srinivasa-Desikan. Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Packt Publishing Ltd, 2018.
[39] Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 54–59, 2019.
[40] Pradeep Vashisth and Kevin Meehan. Gender classification using Twitter text data. In 2020 31st Irish Signals and Systems Conference (ISSC), pages 1–6. IEEE, 2020.
[41] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W Cohen, Ruslan Salakhutdinov, and Christopher D Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600, 2018.
[42] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[43] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. See https://vicuna.lmsys.org (accessed 14 April 2023), 2023.
[44] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, 2020.
[45] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
[46] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70):1–53, 2024.
[47] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.
[48] Gopichand Kanumolu, Lokesh Madasu, Pavan Baswani, Ananya Mukherjee, and Manish Shrivastava. Unsupervised approach to evaluate sentence-level fluency: Do we really need reference? arXiv preprint arXiv:2312.01500, 2023.
[49] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[50] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, 2016.
[51] Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers...
[52] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, 2002.
Appendix A. Proof of Theorem 11 and 12. We begin with the proof for Theorem 11 as follows: Proof. For the prompt wi...
[53] and DROP [51]. Appendix C.3. Semantic Similarity Model and Discriminator. The comparison data collection for the generator involves a local similarity evaluation model and discriminator. Similarity evaluation model: We adopt a finetuned version of MiniLM-6L model [47] to extract the embedding of each private attribute. The semantic relevance between a pa...
[54] Score 1: Incomprehensible. Inarticulate/non-fluent sentence.
[55] Score 2: Low Quality. Partially fluent sentence: (a) only half of the sentence is fluent or (b) more than 1 missing words or (c) more than 1 misspelt words or (d) contains individual fluent word-groups with missing coherence between them.
[56] Score 3: Moderate. Sentence is predominantly fluent but contains either (a) misspelt word or (b) missing word or (c) multiple occurrence of a word.
[57]

Perfectly fluent sentence without any syntactic or grammatical error

Score 4: Perfect. Perfectly fluent sentence without any syntactic or grammatical error. Strictly respond in the form of JSON with the following format: {”S1”: the score, ”S2”: the score }. Sentences: {dictionary of sentences} On obtaining 4000 training and 700 validation samples, we finetune a Bert-base (110M parameters) to train a local discriminator. Ap...

work page

[1] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.

[2] Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950, 2023.

[3] Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li. Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model. arXiv preprint arXiv:2305.11176, 2023.

[4] Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language models in medicine. Nature Medicine, 29(8):1930–1940, 2023.

[5] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564, 2023.

[6] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.

[7] OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

[8] Xuanqi Liu and Zhuotao Liu. LLMs can understand encrypted prompt: Towards privacy-computing friendly transformers. arXiv preprint arXiv:2305.18396, 2023.

[9] Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, and Furu Wei. THE-X: Privacy-preserving transformer inference with homomorphic encryption. arXiv preprint arXiv:2206.00216, 2022.

[10] Minxin Du, Xiang Yue, Sherman SM Chow, Tianhao Wang, Chenyu Huang, and Huan Sun. DP-Forward: Fine-tuning and inference on language models with differential privacy in forward pass. arXiv preprint arXiv:2309.06746, 2023.

[11] Abbas Acar, Hidayet Aksu, A Selcuk Uluagac, and Mauro Conti. A survey on homomorphic encryption schemes: Theory and implementation. ACM Computing Surveys (CSUR), 51(4):1–35, 2018.

[12] Ronald Cramer, Ivan Bjerre Damgård, et al. Secure multiparty computation. Cambridge University Press, 2015.

[13] Lingjuan Lyu, Xuanli He, and Yitong Li. Differentially private representation for NLP: Formal guarantee and an empirical study on privacy and fairness. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2355–2365, 2020.

[14] Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming, pages 1–12. Springer, 2006.

[15] Chen Qu, Weize Kong, Liu Yang, Mingyang Zhang, Michael Bendersky, and Marc Najork. Natural language understanding with privacy-preserving BERT. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1488–1497, 2021.

[16] Peihua Mai, Ran Yan, Zhe Huang, Youjia Yang, and Yan Pang. Split-and-denoise: Protect large language model inference with local differential privacy. In Forty-first International Conference on Machine Learning.

[17] Mohammad Malekzadeh and Fahim Kawsar. Salted inference: Enhancing privacy while maintaining efficiency of split inference in mobile computing. In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, pages 14–20, 2024.

[18] Mohamed Sabt, Mohammed Achemlal, and Abdelmadjid Bouabdallah. Trusted execution environment: What it is, and what it is not. In 2015 IEEE Trustcom/BigDataSE/ISPA, volume 1, pages 57–64. IEEE, 2015.

[19] Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, and Antoine Doucet. Named entity recognition and classification in historical documents: A survey. ACM Computing Surveys, 56(2):1–47, 2023.

[20] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360, 2016.

[21] Zhigang Kan, Linbo Qiao, Hao Yu, Liwen Peng, Yifu Gao, and Dongsheng Li. Protecting user privacy in remote conversational systems: A privacy-preserving framework based on text sanitization. arXiv preprint arXiv:2306.08223, 2023.

[22] Yu Chen, Tingxin Li, Huiming Liu, and Yang Yu. Hide and seek (HaS): A lightweight framework for prompt privacy protection. arXiv preprint arXiv:2309.03057, 2023.

[23] Balamurugan Anandan, Chris Clifton, Wei Jiang, Mummoorthy Murugesan, Pedro Pastrana-Camacho, and Luo Si. t-plausibility: Generalizing words to desensitize text. Trans. Data Priv., 5(3):505–534, 2012.

[24] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210. PMLR, 2016.

[25] Meng Hao, Hongwei Li, Hanxiao Chen, Pengzhi Xing, Guowen Xu, and Tianwei Zhang. Iron: Private inference on transformers. Advances in Neural Information Processing Systems, 35:15718–15731, 2022.

[26] Gavin Kerrigan, Dylan Slack, and Jens Tuyls. Differentially private language models benefit from public pre-training. arXiv preprint arXiv:2009.05886, 2020.

[27] Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, et al. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500, 2021.

[28] Haonan Duan, Adam Dziedzic, Nicolas Papernot, and Franziska Boenisch. Flocks of stochastic parrots: Differentially private prompt learning for large language models, 2023.

[29] Yansong Li, Zhixing Tan, and Yang Liu. Privacy-preserving prompt tuning for large language model services, 2023.

[30] Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, and Tom Diethe. Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations. In Proceedings of the 13th International Conference on Web Search and Data Mining, pages 178–186, 2020.

[31] Saiteja Utpala, Sara Hooker, and Pin Yu Chen. Locally differentially private document generation using zero shot prompting. arXiv preprint arXiv:2310.16111, 2023.

[32] Justus Mattern, Benjamin Weggenmann, and Florian Kerschbaum. The limits of word level differential privacy. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 867–881, 2022.

[33] Hwee Hwa Pang, Xuhua Ding, and Xiaokui Xiao. Embellishing text search queries to protect user privacy. In Proceedings of the VLDB Endowment: 36th International Conference on Very Large Data Bases: Singapore, pages 13–17, 2010.

[34] Zongda Wu, Jie Shi, Chenglang Lu, Enhong Chen, Guandong Xu, Guiling Li, Sihong Xie, and S Yu Philip. Constructing plausible innocuous pseudo queries to protect user query intention. Information Sciences, 325:215–226, 2015.

[35] Ethan M Rasiel. The McKinsey Way. McGraw-Hill, New York, 1999.

[36] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9:346–361, 2021.

[37] Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. MuSiQue: Multihop questions via single-hop question composition. Transactions of the Association for Computational Linguistics, 10:539–554, 2022.

[38] Bhargav Srinivasa-Desikan. Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Packt Publishing Ltd, 2018.

[39] Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 54–59, 2019.

[40] Pradeep Vashisth and Kevin Meehan. Gender classification using Twitter text data. In 2020 31st Irish Signals and Systems Conference (ISSC), pages 1–6. IEEE, 2020.

[41] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W Cohen, Ruslan Salakhutdinov, and Christopher D Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600, 2018.

[42] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

[43] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. See https://vicuna.lmsys.org (accessed 14 April 2023), 2023.

[44] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, 2020.

[45] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.

[46] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70):1–53, 2024.

[47] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, 2019.

[48] Gopichand Kanumolu, Lokesh Madasu, Pavan Baswani, Ananya Mukherjee, and Manish Shrivastava. Unsupervised approach to evaluate sentence-level fluency: Do we really need reference? arXiv preprint arXiv:2312.01500, 2023.

[49] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.

[50] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, 2016.

[51] Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers...

[52] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, 2002.

Appendix A. Proof of Theorems 11 and 12

We begin with the proof of Theorem 11 as follows: Proof. For the prompt wi...

... and DROP [51].

Appendix C.3. Semantic Similarity Model and Discriminator

The comparison data collection for the generator involves a local similarity evaluation model and a discriminator.

Similarity evaluation model: We adopt a finetuned version of the MiniLM-6L model [47] to extract the embedding of each private attribute. The semantic relevance between a pa...
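The similarity evaluation model maps each private attribute to an embedding, but the text breaks off before it says how semantic relevance is scored. The sketch below assumes that relevance is the cosine similarity between two embedding vectors, which is a common choice for Sentence-BERT-style encoders [47]. The function name and the toy vectors are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1].

    Assumption: relevance is scored by cosine similarity; the appendix
    text is truncated before the actual measure is stated.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-d vectors standing in for encoder outputs (hypothetical values).
attr_a = [0.2, 0.7, 0.1]
attr_b = [0.25, 0.6, 0.05]
print(round(cosine_similarity(attr_a, attr_b), 3))
```

With real encoder outputs, `u` and `v` would be the MiniLM embeddings of two attributes. Values near 1 indicate closely related meanings, and values near 0 indicate unrelated ones.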

Score 1: Incomprehensible. Inarticulate/non-fluent sentence.

Score 2: Low Quality. Partially fluent sentence: (a) only half of the sentence is fluent, (b) more than 1 missing word, (c) more than 1 misspelt word, or (d) contains individual fluent word-groups with missing coherence between them.

Score 3: Moderate. Sentence is predominantly fluent but contains either (a) a misspelt word, (b) a missing word, or (c) multiple occurrences of a word.

Score 4: Perfect. Perfectly fluent sentence without any syntactic or grammatical error.

Strictly respond in the form of JSON with the following format: {"S1": the score, "S2": the score}. Sentences: {dictionary of sentences}

On obtaining 4000 training and 700 validation samples, we finetune a Bert-base (110M parameters) to train a local discriminator. Ap...
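The judging prompt asks the model to return a JSON object that maps each sentence label to a fluency score on the four-point rubric. Below is a minimal sketch of how such a reply might be validated before it is used as training data, using Python's standard `json` module. `parse_fluency_scores` and `pairwise_label` are hypothetical helpers. Turning two scores into a preference label is an assumption about how comparison data could be derived; the appendix text does not state this step.

```python
import json
from typing import Dict, Optional

# Scores allowed by the four-point rubric (1 = Incomprehensible ... 4 = Perfect).
VALID_SCORES = {1, 2, 3, 4}


def parse_fluency_scores(response: str, n_sentences: int) -> Dict[str, int]:
    """Parse a reply of the form {"S1": score, "S2": score, ...}.

    Raises ValueError if the reply is not a JSON object, if its keys are
    not exactly S1..Sn, or if any score falls outside the 1-4 rubric.
    """
    try:
        scores = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(scores, dict):
        raise ValueError("reply must be a JSON object")
    expected_keys = {f"S{i}" for i in range(1, n_sentences + 1)}
    if set(scores) != expected_keys:
        raise ValueError(f"expected keys {sorted(expected_keys)}, got {sorted(scores)}")
    for key, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value not in VALID_SCORES:
            raise ValueError(f"{key} has invalid score {value!r}")
    return scores


def pairwise_label(scores: Dict[str, int]) -> Optional[int]:
    """Hypothetical: 1 if S1 is more fluent, 0 if S2 is, None on a tie."""
    if scores["S1"] == scores["S2"]:
        return None
    return 1 if scores["S1"] > scores["S2"] else 0


reply = '{"S1": 4, "S2": 2}'
scores = parse_fluency_scores(reply, n_sentences=2)
print(scores, pairwise_label(scores))
```

Rejecting malformed or out-of-range replies early keeps noisy judge outputs out of the samples used to fine-tune the discriminator. Dropping ties is one possible policy among several.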