Learning Compressed Sentence Representations for On-Device Text Processing

Asli Celikyilmaz; Dhanasekar Sundararaman; Dinghan Shen; Lawrence Carin; Meng Tang; Pengyu Cheng; Qian Yang; Xinyuan Zhang

arXiv: 1906.08340 · v1 · pith:PXUOUUHY · submitted 2019-06-19 · 💻 cs.CL · cs.LG


Dinghan Shen, Pengyu Cheng, Dhanasekar Sundararaman, Xinyuan Zhang, Qian Yang, Meng Tang, Asli Celikyilmaz, Lawrence Carin

Pith reviewed 2026-05-25 20:07 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords: sentence embeddings · binarization · semantic similarity · on-device NLP · Hamming distance · model compression · vector quantization


The pith

Binarized sentence embeddings stay within about 2 percent of their continuous counterparts on downstream tasks while cutting storage by over 98 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that continuous sentence embeddings, trained on large text corpora, can be converted into binary form through four specific strategies without substantial loss of meaning. These binary versions support the same downstream NLP tasks with only about 2 percent relative performance drop, yet require far less memory and enable faster similarity checks via Hamming distance rather than inner products. A sympathetic reader would care because this change removes a key barrier to running semantic text processing on phones and other low-resource hardware. The work focuses on preserving the original embeddings' utility rather than training new models from scratch.

Core claim

Four binarization strategies convert generic continuous sentence embeddings into binary representations that preserve rich semantic information. Across a range of downstream tasks the binarized embeddings show only about 2 percent relative performance degradation compared with their continuous counterparts while reducing storage needs by over 98 percent. Semantic relatedness between two sentences can then be measured simply by computing their Hamming distance, which is more computationally efficient than the inner-product operation on continuous vectors.
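The storage figure can be sanity-checked with simple arithmetic. This is a minimal sketch, assuming 32-bit floats and a hypothetical 4096-dimensional continuous embedding (the dimension and code lengths are illustrative, not taken from the paper). Binarizing each dimension alone saves 1 − 1/32 ≈ 96.9%, so a reduction above 98% also requires a binary code shorter than the original vector.

```python
def storage_reduction(dim: int, float_bits: int, code_bits: int) -> float:
    """Fraction of storage saved when one dim-dimensional float vector
    (float_bits bits per entry) is replaced by a binary code of code_bits bits."""
    continuous_bits = dim * float_bits
    return 1.0 - code_bits / continuous_bits


# Hypothetical 4096-dimensional float32 embedding (illustrative sizes only).
one_bit_per_dim = storage_reduction(dim=4096, float_bits=32, code_bits=4096)
shorter_code = storage_reduction(dim=4096, float_bits=32, code_bits=2048)
print(f"{one_bit_per_dim:.1%}")  # 96.9%
print(f"{shorter_code:.1%}")     # 98.4%
```

For a d-dimensional float32 vector, any code shorter than 0.64·d bits clears the 98% mark, since code_bits / (32·d) must fall below 0.02.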

What carries the argument

Four binarization strategies that map continuous sentence vectors to binary form while retaining semantic content.
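The four strategies are not spelled out on this page, though the Figure 2 notes name a hard threshold method and an autoencoder strategy. As an illustrative sketch only (assuming NumPy; the threshold value 0 and the Gaussian projection are assumptions, not the paper's exact formulations), two of the simplest continuous-to-binary maps look like this: a per-dimension hard threshold, and a sign random projection whose code length can be chosen freely.

```python
import numpy as np


def hard_threshold(emb: np.ndarray, s: float = 0.0) -> np.ndarray:
    """Map each dimension to 1 if it exceeds the threshold s, else 0.
    The code length always equals the embedding dimension."""
    return (emb > s).astype(np.uint8)


def random_projection_codes(emb: np.ndarray, n_bits: int, seed: int = 0) -> np.ndarray:
    """Sign random projection: multiply by a fixed Gaussian matrix of shape
    (d, n_bits) and keep the sign of each output as one bit.
    Unlike the hard threshold, n_bits is a free choice."""
    rng = np.random.default_rng(seed)  # fixed seed -> same projection every call
    projection = rng.standard_normal((emb.shape[-1], n_bits))
    return (emb @ projection > 0).astype(np.uint8)


# Hypothetical batch of three 8-dimensional "sentence embeddings".
emb = np.random.default_rng(42).standard_normal((3, 8))
print(hard_threshold(emb).shape)               # (3, 8)
print(random_projection_codes(emb, 16).shape)  # (3, 16)
```

The projection matrix must be shared by every sentence that will later be compared, which is why the sketch fixes the seed.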

If this is right

Sentence-level semantic search and classification become practical on devices with tight memory limits.
Similarity computations switch from floating-point inner products to simple bit-count operations.
Embedding storage scales to much larger sentence collections without proportional hardware growth.
On-device NLP pipelines can reuse existing continuous embedding models after a one-time binarization step.
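A minimal sketch of the bit-count similarity, assuming NumPy and codes stored as packed bytes (a storage layout chosen here for illustration). XOR marks the bits where two codes differ, and counting those bits gives the Hamming distance; on real hardware the counting step typically maps to a popcount instruction, with NumPy's `unpackbits` standing in for it here.

```python
import numpy as np


def pack(bits: np.ndarray) -> np.ndarray:
    """Pack a 0/1 vector into bytes (8 bits per byte, zero-padded at the end)."""
    return np.packbits(bits.astype(np.uint8))


def hamming(a_packed: np.ndarray, b_packed: np.ndarray) -> int:
    """Hamming distance between two packed codes of equal length:
    XOR marks the differing bits, then the set bits are counted."""
    diff = np.bitwise_xor(a_packed, b_packed)
    return int(np.unpackbits(diff).sum())


a = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=np.uint8)
b = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 1], dtype=np.uint8)
d = hamming(pack(a), pack(b))
print(d)  # 3

# For +/-1 codes, inner product and Hamming distance give the same ranking:
# <a, b> = n - 2 * hamming(a, b)
sa, sb = 2 * a.astype(int) - 1, 2 * b.astype(int) - 1
print(int(sa @ sb), len(a) - 2 * d)  # 4 4
```

The last two lines show why swapping inner products for Hamming distance loses nothing in ranking once the codes are binary: for n-bit codes mapped to ±1, the two quantities differ only by a fixed affine transform.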

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same binarization approach might apply directly to word or document embeddings with similar efficiency gains.
Hamming-distance lookup tables could enable constant-time nearest-neighbor search on mobile hardware.
Combining these binary vectors with lightweight on-device fine-tuning could further reduce the performance gap on specific tasks.
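One way to read the lookup-table idea, as a hypothetical sketch (the paper does not describe this): bucket binary codes in a hash table and probe every code within a small Hamming radius of the query. Each probe is a constant-time dictionary lookup, and the number of probes depends only on code length and radius, not on collection size. It grows combinatorially, though, so the approach is practical only for short codes or very small radii. The function names and the 4-bit codes are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes: list[int]) -> dict[int, list[int]]:
    """Bucket sentence ids by their exact binary code (stored as an int)."""
    table: dict[int, list[int]] = defaultdict(list)
    for sentence_id, code in enumerate(codes):
        table[code].append(sentence_id)
    return table


def radius_lookup(table: dict[int, list[int]], query: int,
                  n_bits: int, radius: int) -> list[int]:
    """Return ids whose code lies within `radius` bit flips of `query`
    by probing every code at distance 0..radius. The probe count,
    the sum of C(n_bits, r), does not depend on how many sentences are stored."""
    hits: list[int] = []
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            probe = query
            for p in positions:
                probe ^= 1 << p  # flip bit p
            hits.extend(table.get(probe, []))
    return hits


codes = [0b1010, 0b1011, 0b0101, 0b1010]  # hypothetical 4-bit codes
table = build_table(codes)
print(sorted(radius_lookup(table, 0b1010, n_bits=4, radius=1)))  # [0, 1, 3]
```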

Load-bearing premise

The four binarization methods keep enough of the original semantic information for the tested downstream tasks to serve as a reliable stand-in for general on-device use.

What would settle it

A new downstream task or different embedding model where the binarized versions show more than a 5 percent relative performance drop or fail to achieve at least 90 percent storage reduction.
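The settling criterion is simple enough to state as a check. This is a minimal sketch; the thresholds mirror the sentence above, while the scores and bit counts in the usage lines are hypothetical, not results from the paper.

```python
def relative_drop(continuous_score: float, binary_score: float) -> float:
    """Relative degradation of the binary embedding vs. the continuous one."""
    return (continuous_score - binary_score) / continuous_score


def would_settle_against(continuous_score: float, binary_score: float,
                         continuous_bits: int, binary_bits: int,
                         max_drop: float = 0.05,
                         min_reduction: float = 0.90) -> bool:
    """True if a result breaks either threshold: a relative performance
    drop above max_drop, or a storage reduction below min_reduction."""
    reduction = 1.0 - binary_bits / continuous_bits
    too_lossy = relative_drop(continuous_score, binary_score) > max_drop
    return too_lossy or reduction < min_reduction


# Hypothetical scores: 80.0 continuous vs 78.4 binary is a 2% relative drop.
print(would_settle_against(80.0, 78.4, continuous_bits=4096 * 32, binary_bits=2048))  # False
print(would_settle_against(80.0, 75.0, continuous_bits=4096 * 32, binary_bits=2048))  # True
```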

Figures

Figures reproduced from arXiv: 1906.08340 by Asli Celikyilmaz, Dhanasekar Sundararaman, Dinghan Shen, Lawrence Carin, Meng Tang, Pengyu Cheng, Qian Yang, Xinyuan Zhang.

**Figure 3.** The test accuracy of different models on the …

**Figure 2.** The comparison between deterministic and stochastic sampling for the autoencoder strategy.

Section 5.3.3, The effect of embedding dimension: Except for the hard threshold method, the other three proposed strategies all possess the flexibility of adaptively choosing the dimension of the learned binary representations. To explore the sensitivity of …

[Plot: accuracy (%) versus number of bits (512, 1024, 2048, 4096).]
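Figure 2 compares deterministic and stochastic sampling for the autoencoder strategy. As an assumption (a common formulation in hashing autoencoders, not confirmed here as the paper's exact scheme), the encoder produces one probability per bit: deterministic sampling thresholds it at 0.5, while stochastic sampling draws the bit at random with that probability. The logits below are hypothetical encoder outputs, and the sketch assumes NumPy.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def binarize_deterministic(logits: np.ndarray) -> np.ndarray:
    """Set a bit to 1 when its probability sigmoid(logit) exceeds 0.5."""
    return (sigmoid(logits) > 0.5).astype(np.uint8)


def binarize_stochastic(logits: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw each bit from Bernoulli(sigmoid(logit))."""
    probs = sigmoid(logits)
    return (rng.random(probs.shape) < probs).astype(np.uint8)


logits = np.array([-3.0, 0.2, 5.0])  # hypothetical encoder outputs
rng = np.random.default_rng(0)
print(binarize_deterministic(logits))    # [0 1 1]
print(binarize_stochastic(logits, rng))  # random; the middle bit is 1 about 55% of the time
```

Deterministic sampling yields the same code for the same sentence every time, which is what retrieval needs at inference; stochastic sampling injects noise that is typically used only during training.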

Original abstract

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner product operation between continuous embeddings. Detailed analysis and case study further validate the effectiveness of the proposed methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Binarizing sentence embeddings cuts storage by over 98% with a ~2% task drop and swaps dot products for Hamming distance, but judging the four strategies requires their full details.


The main point is that this paper converts existing continuous sentence embeddings into binary vectors via four strategies, keeping downstream performance within about 2% of the originals while shrinking storage by over 98% and letting you measure similarity with a simple Hamming distance instead of inner products. That directly targets on-device constraints such as limited memory and compute on phones. The work is empirical and reports new numbers for those specific strategies on a range of tasks, which is the concrete advance here.

It does a solid job laying out the practical payoff. The storage math is straightforward: going from 32-bit floats to 1-bit codes cuts storage by about 97% on its own, so the >98% figure implies binary codes that are also shorter than the original vectors. The efficiency claim follows immediately from the binary format, and the small degradation suggests the semantic content survives well enough for the tested uses. Credit for focusing on generic sentence representations rather than just model weights or word embeddings.

The soft spots are mostly about pieces missing from the abstract. Without the method descriptions, the exact task list, error bars, or ablations, the 2% figure is hard to verify or generalize. If the full paper shows consistent results across base models and includes controls against other compression techniques, that would tighten it up; otherwise it risks reading as a direct application of known quantization without enough new controls. The assumption that these tasks stand in for general on-device NLP is reasonable but could be probed further.

This is for people working on efficient NLP deployments rather than core theory. A reader who needs to ship embeddings on low-resource hardware would find the trade-off numbers useful. It deserves a serious referee to check the details and confirm the results hold, even though the novelty is incremental rather than a new capability.

Referee Report

0 major / 2 minor

Summary. The paper proposes four strategies to binarize continuous sentence embeddings while preserving semantic content. These binarized representations are evaluated on downstream NLP tasks, where they show approximately 2% relative performance degradation compared to continuous embeddings, achieve over 98% storage reduction, and enable efficient semantic similarity computation via Hamming distance rather than inner product.

Significance. If the empirical results hold across the claimed tasks, the work has clear practical significance for on-device NLP applications by drastically cutting memory footprint and inference cost with only minor accuracy loss. The direct use of Hamming distance is a useful engineering contribution. The purely empirical framing with task-specific numbers (rather than universal claims) is a strength.

minor comments (2)

[Abstract] The quantitative claim of '~2% degradation' and the 'wide range of downstream tasks' would be more informative if the exact tasks, metrics, and any error bars or variance were named, even at a high level.
[Methods] The four binarization strategies are introduced but their precise formulations, hyperparameters, and any training details should be cross-referenced to a dedicated methods subsection or table for reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; purely empirical evaluation


The paper introduces four binarization strategies for sentence embeddings and reports empirical results on downstream tasks showing ~2% average degradation and 98% storage reduction. No equations, derivations, or predictions are present that reduce by construction to fitted inputs or self-citations within the paper. All claims rest on direct experimental measurements rather than any self-referential mathematical structure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract supplies no explicit free parameters, invented entities, or non-standard axioms; the work rests on the standard NLP premise that continuous embeddings encode semantics.

axioms (1)

domain assumption: Continuous sentence embeddings trained on massive corpora capture rich semantic information usable across downstream tasks.
Stated as background in the abstract.

pith-pipeline@v0.9.0 · 5724 in / 1091 out tokens · 27855 ms · 2026-05-25T20:07:11.424893+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OPT: Open Pre-trained Transformer Language Models
cs.CL 2022-05 unverdicted novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · cited by 1 Pith paper · 16 internal anchors

