Federated Co-tuning Framework for Large and Small Language Models

Guoqiang Ma; Kai Chen; Lixin Fan; Qiang Yang; Shuoling Liu; Tao Fan; Yan Kang

arxiv: 2411.11707 · v3 · submitted 2024-11-18 · 💻 cs.CL · cs.AI

Tao Fan, Yan Kang, Guoqiang Ma, Lixin Fan, Shuoling Liu, Kai Chen, Qiang Yang

Pith reviewed 2026-05-23 17:03 UTC · model grok-4.3

keywords federated learning · large language models · small language models · co-tuning · knowledge transfer · parameter-efficient · NLP text generation · privacy-preserving

The pith

A federated co-tuning framework lets server LLMs and client SLMs mutually improve performance through private adapter-based knowledge exchange.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FedCoLLM as a method for co-tuning a central large language model with multiple client small language models. Lightweight adapters enable the server model to send knowledge to the clients while receiving domain-specific updates back, all without sharing raw client data. A sympathetic reader would care if this holds because it opens a path for distributed teams to upgrade their local models using powerful external resources and simultaneously strengthen the shared model, under typical privacy limits in NLP applications. The reported experiments across public models and text generation tasks indicate gains for the small models and near-equivalent results for the large model compared to direct fine-tuning.

Core claim

FedCoLLM is a parameter-efficient federated framework that uses lightweight adapters attached to SLMs to transfer server LLM knowledge to clients while enriching the LLM with client domain insights, achieving this exchange in a privacy-preserving way with low computational and communication overhead. Evaluations across various public LLMs and SLMs on NLP text generation tasks show that client SLMs improve notably with LLM assistance, while the co-tuned LLMs reach performance levels comparable to those from direct fine-tuning on client data.

What carries the argument

lightweight adapters attached to SLMs that enable bidirectional knowledge transfer in the federated co-tuning process
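
To make "lightweight" concrete, here is a minimal sketch of how few parameters a low-rank adapter trains compared with the dense weight matrix it sits beside. It assumes a LoRA-style adapter (LoRA appears in the paper's reference list, but this page does not confirm which adapter FedCoLLM uses), and the function names and layer sizes are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameter count of one dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count of a low-rank pair: A is (rank, d_in), B is (d_out, rank)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the dense layer's parameters that the adapter trains."""
    return lora_params(d_in, d_out, rank) / full_params(d_in, d_out)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_in, d_out, rank = 4096, 4096, 8
    print(lora_params(d_in, d_out, rank))         # adapter parameters
    print(trainable_fraction(d_in, d_out, rank))  # fraction of the dense layer
```

Under these assumed sizes the adapter holds 1/256 of the dense layer's parameters, which is the kind of saving that would also shrink what has to be communicated each round.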

If this is right

Client SLMs achieve notable performance improvements on NLP text generation tasks when assisted by the server LLM.
The server LLM enhanced through FedCoLLM reaches performance comparable to direct fine-tuning on client data.
Knowledge exchange occurs while respecting data privacy and keeping computational and communication overhead low.
The framework works with various public LLMs and SLMs across multiple text generation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the adapters scale reliably, the same co-tuning pattern could apply to other distributed settings where large models serve many small-device clients.
Groups holding sensitive data might use this pattern to gain from external LLM capacity without exposing raw records.
Testing the method on non-text tasks or with much larger client counts would clarify whether the overhead savings persist.

Load-bearing premise

Lightweight adapters attached to SLMs can enable effective bidirectional knowledge transfer between server LLMs and client SLMs while preserving privacy and keeping computational and communication costs low.
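
The page does not state which loss carries the bidirectional transfer. One plausible reading, suggested by the knowledge-distillation and deep-mutual-learning works in the reference list, is a pair of KL terms in which each model is pulled toward the other's softened output distribution. The sketch below is an assumption for illustration, not the authors' objective; all names and the temperature value are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mutual_distillation_losses(
    llm_logits: List[float], slm_logits: List[float], temperature: float = 2.0
) -> Tuple[float, float]:
    """Return (slm_loss, llm_loss) for one token position.

    slm_loss = KL(LLM || SLM): the SLM is trained toward the LLM's distribution.
    llm_loss = KL(SLM || LLM): the LLM is trained toward the SLM's distribution.
    """
    p_llm = softmax(llm_logits, temperature)
    p_slm = softmax(slm_logits, temperature)
    return kl_divergence(p_llm, p_slm), kl_divergence(p_slm, p_llm)


if __name__ == "__main__":
    # Hypothetical logits over a three-token vocabulary.
    print(mutual_distillation_losses([2.0, 0.5, -1.0], [1.0, 1.0, 0.0]))
```

In a real setup these terms would typically be added to each model's task loss and backpropagated only into the adapter parameters.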

What would settle it

An experiment in which SLMs trained under FedCoLLM show no performance gain over independent local training on the same tasks, or in which the co-tuned LLM underperforms a version directly fine-tuned on the pooled client data.

Figures

Figures reproduced from arXiv: 2411.11707 by Guoqiang Ma, Kai Chen, Lixin Fan, Qiang Yang, Shuoling Liu, Tao Fan, Yan Kang.

**Figure 1.** FedCoLLM: federated parameter-efficient co-tuning of clients' domain SLMs and the server's LLM. Clients' SLMs learn from each other via federated fine-tuning of their adapter modules and transfer knowledge from and to the server's LLM.
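
The caption's "federated fine-tuning of their adapter modules" can be sketched as a sample-weighted average of client adapter parameters, in the style of FedAvg (McMahan et al., in the reference list). This is a minimal illustration under the assumption that only adapter weights are exchanged; the parameter names, the flat-list representation, and the weighting by sample count are hypothetical choices, not details confirmed by this page.

```python
from typing import Dict, List

# An adapter is modelled as a mapping from parameter name to a flat list of floats.
Adapter = Dict[str, List[float]]


def fedavg_adapters(client_adapters: List[Adapter], num_samples: List[int]) -> Adapter:
    """Average client adapters, weighting each client by its local sample count."""
    if not client_adapters or len(client_adapters) != len(num_samples):
        raise ValueError("need one sample count per client adapter")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Adapter = {}
    for name, reference in client_adapters[0].items():
        acc = [0.0] * len(reference)
        for adapter, n in zip(client_adapters, num_samples):
            weight = n / total
            for i, value in enumerate(adapter[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients holding the same adapter tensor, flattened.
    clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 6.0]}]
    print(fedavg_adapters(clients, num_samples=[1, 3]))
```

Only the adapter tensors cross the network in this sketch, so the per-round payload scales with adapter size rather than with the full SLM.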

Original abstract

By adapting Large Language Models (LLMs) to domain-specific tasks or enriching them with domain-specific knowledge, we can fully harness the capabilities of LLMs. Nonetheless, a gap persists in achieving simultaneous mutual enhancement between the server's LLM and the downstream clients' Small Language Models (SLMs). To address this, we propose FedCoLLM, a novel and parameter-efficient federated framework designed for co-tuning LLMs and SLMs. This approach is aimed at adaptively transferring server-side LLMs' knowledge to clients' SLMs while simultaneously enriching the LLMs with domain insights from the clients. To accomplish this, FedCoLLM utilizes lightweight adapters in conjunction with SLMs, facilitating knowledge exchange between server and clients in a manner that respects data privacy while also minimizing computational and communication overhead. Our evaluation of FedCoLLM, utilizing various public LLMs and SLMs across a range of NLP text generation tasks, reveals that the performance of clients' SLMs experiences notable improvements with the assistance of the LLMs. Simultaneously, the LLMs enhanced via FedCoLLM achieve comparable performance to that obtained through direct fine-tuning on clients' data. Our code has been contributed to the FATE open-source project and is now publicly accessible at https://github.com/FederatedAI/FATE-LLM/tree/main/python/fate_llm/algo/fedcollm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

FedCoLLM gives a concrete adapter-based federated setup for bidirectional LLM-SLM help without data sharing, but the abstract supplies no numbers to back the claimed gains.

The main thing to know is that this paper describes FedCoLLM, a federated framework that attaches lightweight adapters to client SLMs so a server LLM can send knowledge their way while the clients' domain knowledge flows back to improve the LLM, all without sharing raw client data and with low overhead. The code is already in the FATE project, which helps anyone who wants to test it directly. That bidirectional angle fills a gap that most federated LLM work has left open, and the parameter-efficient design is a sensible practical choice.

The evaluation uses public models on text generation tasks and reports that SLMs improve noticeably while the LLM stays close to what direct fine-tuning would achieve. The stress-test note finds no internal contradictions in the argument structure, which matches what the abstract lays out. The soft spot is straightforward: the abstract asserts the improvements and comparability but shows zero quantitative results, baselines, error bars, or ablation details. That leaves the actual size of the gains and the reliability of the transfer uncheckable from the provided text, even though the full manuscript is referenced.

This is aimed at people working on federated NLP or efficient domain adaptation who need a ready-to-try method rather than pure theory. It deserves a serious referee because the problem is timely, the framework is specified enough to implement, and the open code lowers the barrier to verification, even if the experiments will need closer scrutiny in review.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces FedCoLLM, a parameter-efficient federated co-tuning framework that uses lightweight adapters attached to SLMs to facilitate bidirectional knowledge transfer between a server-side LLM and client-side SLMs. The framework aims to improve SLM performance through LLM assistance while enriching the LLM with domain-specific knowledge from clients, all in a privacy-preserving manner with low computational and communication costs. Evaluations on public LLMs and SLMs for NLP text generation tasks are claimed to show notable SLM improvements and LLM performance comparable to direct fine-tuning. The implementation is open-sourced in the FATE project.

Significance. If the empirical claims hold, the work could contribute to the field by providing a practical method for co-adapting heterogeneous language models in federated settings. The open-sourcing of the code in FATE is a clear strength for reproducibility.

major comments (1)

[Abstract] The central claims of 'notable improvements' in SLM performance and LLM performance 'comparable' to direct fine-tuning are asserted without any quantitative results, baselines, error bars, ablation details, or specific metrics. This is load-bearing for the evaluation component of the central claim.

minor comments (1)

The description of the adapter mechanism and knowledge exchange protocol would benefit from a high-level diagram or pseudocode to clarify the bidirectional transfer process.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central claims of 'notable improvements' in SLM performance and LLM performance 'comparable' to direct fine-tuning are asserted without any quantitative results, baselines, error bars, ablation details, or specific metrics. This is load-bearing for the evaluation component of the central claim.

Authors: We agree that the abstract would be strengthened by including concrete quantitative support for the central claims. In the revised manuscript we will update the abstract to report key metrics (e.g., average relative improvement on client SLMs across the evaluated tasks and the performance delta versus direct fine-tuning on the server LLM), while referencing the corresponding tables and figures. The full set of baselines, error bars, and ablation studies already appears in Sections 4–5; the revision will simply surface the most salient numbers in the abstract itself.

Revision: yes
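
The two summary metrics named in the rebuttal can be computed as follows. This is a minimal sketch under the assumption that "average relative improvement" means the mean of per-task relative gains and that "performance delta" is a plain score difference; the function names and every number in the usage example are hypothetical placeholders, not results from the paper.

```python
from typing import Iterable, Tuple


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, e.g. 0.10 for a 10% gain."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (improved - baseline) / baseline


def average_relative_improvement(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean relative gain over (baseline, improved) score pairs, one pair per task."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (baseline, improved) pair")
    return sum(relative_improvement(b, i) for b, i in pairs) / len(pairs)


def performance_delta(co_tuned: float, direct_finetuned: float) -> float:
    """Signed gap between the co-tuned LLM and the directly fine-tuned LLM."""
    return co_tuned - direct_finetuned


if __name__ == "__main__":
    # Placeholder scores for illustration only; not taken from the paper.
    slm_scores = [(50.0, 55.0), (40.0, 50.0)]
    print(average_relative_improvement(slm_scores))
    print(performance_delta(co_tuned=61.0, direct_finetuned=62.5))
```

Reporting both numbers with their baselines would let a reader judge the "notable" and "comparable" claims without opening the full tables.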

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes FedCoLLM, a parameter-efficient federated co-tuning framework that uses lightweight adapters for bidirectional knowledge transfer between server LLMs and client SLMs. Claims of SLM improvement and LLM performance comparable to direct fine-tuning rest on empirical evaluation across public models and NLP text generation tasks, with code released in FATE. No mathematical derivation chain, equations, fitted parameters renamed as predictions, or self-citations appear as load-bearing elements; the argument is self-contained via experimental results rather than any reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are introduced; the approach builds on standard federated learning and adapter tuning concepts.

pith-pipeline@v0.9.0 · 5783 in / 1133 out tokens · 39483 ms · 2026-05-23T17:03:11.443752+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model
cs.CR · 2025-06 · unverdicted · novelty 5.0

FedShield-LLM integrates pruning and FHE on LoRA parameters to support secure, scalable federated fine-tuning of LLMs such as Llama-2.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 8 internal anchors

[1] Adriana, R., Nicolas, B., Ebrahimi, K.S., Antoine, C., Carlo, G., Yoshua, B.: Fitnets: Hints for thin deep nets. Proc. ICLR 2(3), 1 (2015)

[2] Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H.B., Patel, S., Ramage, D., Segal, A., Seth, K.: Practical secure aggregation for federated learning on user-held data. arXiv preprint arXiv:1611.04482 (2016)

[3] Cai, D., Wu, Y., Wang, S., Lin, F.X., Xu, M.: Autofednlp: An efficient fednlp framework. arXiv preprint arXiv:2205.10162 (2022)

[4] Chen, G., Choi, W., Yu, X., Han, T., Chandraker, M.: Learning efficient object detection models with knowledge distillation. Advances in neural information processing systems 30 (2017)

[5] Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457 (2018)

[6] Gao, L., Tow, J., Abbasi, B., Biderman, S., Black, S., DiPofi, A., Foster, C., Golding, L., Hsu, J., Le Noac’h, A., Li, H., McDonell, K., Muennighoff, N., Ociepa, C., Phang, J., Reynolds, L., Schoelkopf, H., Skowron, A., Sutawika, L., Tang, E., Thite, A., Wang, B., Wang, K., Zou, A.: A... https://doi.org/10.5281/zenodo.10256836 (2023)

[7] Gou, J., Yu, B., Maybank, S.J., Tao, D.: Knowledge distillation: A survey. International Journal of Computer Vision 129(6), 1789–1819 (2021)

[8] Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

[9] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

[10] McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial intelligence and statistics. pp. 1273–1282. PMLR (2017)

[11] Meng, Z., Li, J., Zhao, Y., Gong, Y.: Conditional teacher-student learning. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 6445–6449. IEEE (2019)

[12] Mihaylov, T., Clark, P., Khot, T., Sabharwal, A.: Can a suit of armor conduct electricity? a new dataset for open book question answering. arXiv preprint arXiv:1809.02789 (2018)

[13] OpenAI: Gpt-4 (2023)

[14] Park, W., Kim, D., Lu, Y., Cho, M.: Relational knowledge distillation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3967–3976 (2019)

[15] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)

[16] Ren, C., Yu, H., Peng, H., Tang, X., Li, A., Gao, Y., Tan, A.Z., Zhao, B., Li, X., Li, Z., et al.: Advances and open challenges in federated learning with foundation models. arXiv preprint arXiv:2404.15381 (2024)

[17] Talmor, A., Herzig, J., Lourie, N., Berant, J.: Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937 (2018)

[18] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)

[19] Xia, M., Gao, T., Zeng, Z., Chen, D.: Sheared llama: Accelerating language model pre-training via structured pruning. arXiv preprint arXiv:2310.06694 (2023)

[20] Yang, Q., Liu, Y., Cheng, Y., Kang, Y., Chen, T., Yu, H.: Federated learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 13(3), 1–207 (2019)

[21] Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X.V., et al.: Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022)

[22] Zhang, Y., Xiang, T., Hospedales, T.M., Lu, H.: Deep mutual learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4320–4328 (2018)

[23] Zhang, Z., Yang, Y., Dai, Y., Qu, L., Xu, Z.: When federated learning meets pre-trained language models’ parameter-efficient tuning methods. arXiv preprint arXiv:2212.10025 (2022)

[24] Zhao, H., Du, W., Li, F., Li, P., Liu, G.: Reduce communication costs and preserve privacy: Prompt tuning method in federated learning. arXiv preprint arXiv:2208.12268 (2022)
