pith. machine review for the scientific record.

arxiv: 1903.10635 · v1 · submitted 2019-03-26 · 💻 cs.CL

Recognition: unknown

Federated Learning Of Out-Of-Vocabulary Words

classification: 💻 cs.CL
keywords: words, federated, learning, approach, character-level, demonstrate, good, learn
read the original abstract

We demonstrate that a character-level recurrent neural network is able to learn out-of-vocabulary (OOV) words under federated learning settings, for the purpose of expanding the vocabulary of a virtual keyboard for smartphones without exporting sensitive text to servers. High-frequency words can be sampled from the trained generative model by drawing from the joint posterior directly. We study the feasibility of the approach in two settings: (1) using simulated federated learning on a publicly available non-IID per-user dataset from a popular social networking website, (2) using federated learning on data hosted on user mobile devices. The model achieves good recall and precision compared to ground-truth OOV words in setting (1). With (2) we demonstrate the practicality of this approach by showing that we can learn meaningful OOV words with good character-level prediction accuracy and cross entropy loss.
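The pipeline the abstract describes can be illustrated with a minimal sketch: train a character-level generative model across devices, sample whole words from it, keep the most frequent out-of-vocabulary samples, and score them against ground-truth OOV words. This is not the paper's method. A character bigram count model stands in for the recurrent network, and summing per-client counts on the server stands in for federated averaging of RNN weights. The function names (`local_counts`, `federated_aggregate`, `sample_word`, `top_k_sampled`, `precision_recall`), the `^`/`$` boundary markers, and the example corpora are all hypothetical.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "^", "$"  # hypothetical word-start / word-end markers


def local_counts(words):
    """Client-side step: character-transition counts from one device's words.

    In this sketch only these aggregate counts leave the device, never the text.
    """
    counts = Counter()
    for word in words:
        chars = [BOS] + list(word) + [EOS]
        for cur, nxt in zip(chars, chars[1:]):
            counts[(cur, nxt)] += 1
    return counts


def federated_aggregate(client_updates):
    """Server-side step: sum client counts into one global bigram model.

    Summing raw counts yields the same P(next | cur) as averaging each client's
    own estimate, weighted by how often that client saw `cur`. This stands in
    for the weighted averaging FedAvg applies to neural-network weights.
    """
    total = Counter()
    for update in client_updates:
        total.update(update)
    model = defaultdict(dict)  # model[cur][nxt] = global transition count
    for (cur, nxt), count in total.items():
        model[cur][nxt] = count
    return model


def sample_word(model, rng, max_len=20):
    """Draw one whole word from the model, one character at a time."""
    out, cur = [], BOS
    while len(out) < max_len:
        successors = model[cur]
        if not successors:  # empty model: nothing to sample
            break
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == EOS:
            break
        out.append(nxt)
        cur = nxt
    return "".join(out)


def top_k_sampled(model, n_samples, k, vocab, seed=0):
    """Sample many words; return the k most frequent ones not already in vocab."""
    rng = random.Random(seed)
    freq = Counter(sample_word(model, rng) for _ in range(n_samples))
    candidates = [w for w, _ in freq.most_common() if w and w not in vocab]
    return candidates[:k]


def precision_recall(predicted, ground_truth):
    """Set-based precision and recall of predicted OOV words."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    hits = len(predicted & ground_truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical per-device corpora; raw words are only read by local_counts.
    clients = [["lol", "the", "lol"], ["omg", "lol"], ["the", "omg"]]
    model = federated_aggregate(local_counts(c) for c in clients)
    found = top_k_sampled(model, n_samples=2000, k=5, vocab={"the"})
    print(found, precision_recall(found, {"lol", "omg"}))
```

Because sampling can chain any transitions seen on clients, fragments such as "ol" or "lomg" may also surface among the frequent samples, which is one way precision against the ground-truth list can drop in a setup like this.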

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributed Online Convex Optimization with Compressed Communication: Optimal Regret and Applications

    cs.LG 2026-04 unverdicted novelty 7.0

    Optimal regret bounds O(δ^{-1/2}√T) for convex and O(δ^{-1} log T) for strongly convex losses are achieved in distributed online convex optimization under compressed communication.

  2. A Catalog of Data Errors

    cs.DB 2026-04 unverdicted novelty 6.0

    A new catalog classifying 35 data error types into missing, incorrect, and redundant categories for tabular data, with definitions and examples to improve data quality management.