Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings

Guoyin Wang; Jiwei Li; Minsik Oh

arXiv: 2305.14299 · v3 · submitted 2023-05-23 · cs.CL · cs.AI

Pith reviewed 2026-05-24 09:09 UTC · model grok-4.3

keywords: dialogue sentence embeddings, contrastive learning, template-aware augmentation, task-oriented dialogue, self-supervised learning, slot filling, semantic compression test

The pith

Template information enables stronger sentence embeddings for dialogues through contrastive learning with slot-filling augmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TaDSE to learn high-quality utterance embeddings by feeding template knowledge into a self-supervised contrastive framework. Token-level annotations such as templates and slots are far cheaper to obtain than full sentence-relation labels, so the method aims to reduce annotation cost while still producing embeddings useful for downstream dialogue tasks. A synthetic dataset created by slot-filling further diversifies the template-utterance pairs seen during training. Results on five benchmark datasets show gains over prior state-of-the-art dialogue embedding methods, and a new semantic compression test is introduced that correlates with standard uniformity and alignment metrics.

Core claim

TaDSE is a template-aware contrastive learning method that treats utterances sharing the same template as positive pairs and uses a preliminary slot-filling step to create a synthetically augmented dataset that strengthens those associations; the resulting embeddings outperform previous methods on multiple dialogue benchmarks.

What carries the argument

Template-aware contrastive objective that pulls together utterances linked by shared templates while the slot-filling augmentation enlarges the set of such associations.
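As a rough illustration of the idea of pulling together utterances that share a template, the sketch below computes a generic InfoNCE-style loss in which every other batch item with the same template id counts as a positive. This is an assumption-laden toy, not the paper's actual objective: the function names (`cosine`, `template_info_nce`), the template id strings, and the temperature value are all hypothetical, and the exact pair construction used by TaDSE is not given on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def template_info_nce(embeddings, template_ids, temperature=0.05):
    """Mean InfoNCE loss where positives share the anchor's template id.

    Hypothetical sketch: anchors with no positive in the batch are skipped,
    and every other batch item acts as a candidate in the denominator.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and template_ids[j] == template_ids[i]
        ]
        if not positives:
            continue
        # Scaled similarities between the anchor and every other item.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(x - peak) for x in logits.values())
        )
        for j in positives:
            losses.append(log_denom - logits[j])
    if not losses:
        raise ValueError("batch contains no same-template pairs")
    return sum(losses) / len(losses)


# Toy batch: items 0 and 1 point one way, items 2 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
matched = template_info_nce(toy_embeddings, ["t1", "t1", "t2", "t2"])
mismatched = template_info_nce(toy_embeddings, ["t1", "t2", "t1", "t2"])
```

In this toy batch the loss is lower when the template ids agree with the geometry of the embeddings (`matched`) than when they cut across it (`mismatched`), which is the behaviour a template-driven positive signal is meant to encourage.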

If this is right

Dialogue embeddings become obtainable at lower annotation cost than sentence-level labeling approaches.
Performance improves on five standard task-oriented dialogue benchmarks.
A semantic compression test can serve as an analytic check that tracks uniformity and alignment of the learned embeddings.
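For readers unfamiliar with the two metrics mentioned in the last point, the sketch below computes alignment and uniformity in the commonly used form from Wang and Isola (2020): alignment is the mean squared distance between L2-normalized positive pairs, and uniformity is the log of the mean Gaussian potential over all distinct pairs. Lower is better for both. The function names and the toy vectors are illustrative assumptions; the paper's own evaluation code may differ, and the semantic compression test itself is not sketched because its definition is not given on this page.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(positive_pairs):
    """Mean squared distance between normalized positive pairs."""
    dists = [
        squared_distance(l2_normalize(x), l2_normalize(y))
        for x, y in positive_pairs
    ]
    return sum(dists) / len(dists)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs."""
    normed = [l2_normalize(e) for e in embeddings]
    potentials = [
        math.exp(-t * squared_distance(u, v))
        for u, v in combinations(normed, 2)
    ]
    return math.log(sum(potentials) / len(potentials))


# Toy check: collapsed embeddings are maximally non-uniform (value 0),
# while points spread around the circle score lower.
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed_score = uniformity(collapsed)
spread_score = uniformity(spread)
```

A diagnostic that tracks these two quantities can be evaluated on held-out embeddings alone, without training a downstream classifier.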

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same template-driven contrastive signal could be tested on non-dialogue sentence embedding tasks where partial structural annotations exist.
If slot-filling augmentation works because it forces the model to ignore surface form, similar controlled perturbations might help other contrastive embedding methods.
The correlation between semantic compression and embedding quality metrics suggests a cheap way to monitor training without full downstream evaluation.

Load-bearing premise

Template and slot annotations must be available or cheap to obtain, and the synthetic slot-filling step must not create distribution shift or noise that hurts downstream performance.
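To make the premise concrete, here is a minimal sketch of slot-filling augmentation that only ever samples slot values already observed in the annotated data, which is one way to limit distribution shift. Everything in it is an assumption for illustration: the `{slot}` placeholder syntax, the function names (`collect_slot_values`, `fill_template`, `augment`), and the toy annotations are hypothetical and are not taken from the paper or its repository.

```python
import random
import re
from collections import defaultdict

# Hypothetical placeholder syntax: slots appear in templates as {slot_name}.
SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def collect_slot_values(annotated):
    """Gather observed values per slot from (template, slots) annotations.

    Duplicates are kept so that sampling follows empirical frequencies.
    """
    values = defaultdict(list)
    for _template, slots in annotated:
        for name, value in slots.items():
            values[name].append(value)
    return values


def fill_template(template, slot_values, rng):
    """Replace each {slot} placeholder with a sampled observed value."""
    return SLOT_PATTERN.sub(
        lambda match: rng.choice(slot_values[match.group(1)]), template
    )


def augment(annotated, per_template, seed=0):
    """Return (template, synthetic utterance) pairs for each template."""
    rng = random.Random(seed)
    slot_values = collect_slot_values(annotated)
    templates = sorted({template for template, _ in annotated})
    return [
        (template, fill_template(template, slot_values, rng))
        for template in templates
        for _ in range(per_template)
    ]


# Toy annotations, invented for this example only.
toy_annotations = [
    ("flights from {origin} to {destination}",
     {"origin": "boston", "destination": "denver"}),
    ("flights from {origin} to {destination}",
     {"origin": "dallas", "destination": "miami"}),
    ("play {artist}", {"artist": "queen"}),
]
toy_pairs = augment(toy_annotations, per_template=3)
```

Sampling each slot independently, as done here, can still produce implausible combinations (for example, a value that never co-occurs with a given template), which is exactly the kind of noise the premise warns about.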

What would settle it

Run TaDSE on a dialogue corpus that lacks any template annotations and measure whether downstream task scores still exceed the previous best embedding method.

Figures

Figures reproduced from arXiv: 2305.14299 by Guoyin Wang, Jiwei Li, Minsik Oh.

**Figure 1.** Illustration of how our method improves the dataset to learn a better representation space, from (a), (b) to (c). Ellipses denote sentence representations from the dataset, belonging to unique intent groups. (a) shows the limited data in the original non-augmented data, (b) shows the augmented dataset from utterance-only augmentation methods where we observe overlaps between intent clusters, and (c) shows enha…

**Figure 2.** We show our template contrastive learning methods in this diagram. The first diagram displays template …

**Figure 3.** TaDSE system diagram. Inference only concerns test set utterances, with optional augmented templates.

**Figure 4.** t-SNE diagram for SNIPS models. Left: SimCSE; middle: TaDSE; right: TaDSE-compressed.

**Figure 5.** Uniformity / alignment plot for ATIS models, …

**Figure 6.** MASSIVE intent accuracy performance with TaDSE and SimCSE. The horizontal axis is the …

**Figure 7.** Intent classification performance on the test set during 10 training epochs.

**Figure 8.** Uniformity / alignment plot for SNIPS models, trained on augmented source data. Lower values are …

**Figure 9.** t-SNE diagram for ATIS representation hyperspace from the SimCSE model, trained with our data. The …

**Figure 10.** t-SNE diagram for ATIS representation hyperspace from our optimal TaDSE model (…

Original abstract

Learning high-quality sentence embeddings from dialogues has drawn increasing attention, as it is essential for solving a variety of dialogue-oriented tasks with low annotation cost. Annotating and gathering utterance relationships in conversations is difficult, while token-level annotations, e.g., entities, slots and templates, are much easier to obtain. Other sentence embedding methods are usually sentence-level self-supervised frameworks and cannot utilize token-level extra knowledge. We introduce Template-aware Dialogue Sentence Embedding (TaDSE), a novel augmentation method that utilizes template information to learn utterance embeddings via a self-supervised contrastive learning framework. We further enhance the effect with a synthetically augmented dataset that diversifies utterance-template association, in which slot-filling is a preliminary step. We evaluate TaDSE performance on five downstream benchmark dialogue datasets. The experimental results show that TaDSE achieves significant improvements over previous SOTA methods for dialogue. We further introduce a novel analytic instrument, the semantic compression test, for which we discover a correlation with uniformity and alignment. Our code is available at https://github.com/minsik-ai/Template-Contrastive-Embedding

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TaDSE folds templates into contrastive learning for dialogue embeddings and adds synthetic slot-filling, but the augmentation risks noise that could drive the reported gains.

The punchline is that TaDSE brings template info into contrastive learning for task-oriented dialogue sentence embeddings and uses synthetic slot-filling to expand the training data. It reports better performance on downstream tasks and introduces a semantic compression test.

What is new is the way they incorporate the template structure directly into the positive and negative pair construction for the contrastive loss, plus the augmentation that diversifies the associations. This is a reasonable way to use the token-level annotations that are already available in many dialogue datasets.

The paper does well in evaluating on multiple benchmarks and in linking the new test to properties like uniformity and alignment. Having the code public is a plus for anyone wanting to build on it or verify the results.

Where it could be soft is in the reliance on the synthetic data generation. Slot-filling to create new pairs might produce examples that don't match real dialogue distributions, which could mean the model learns spurious correlations instead of true semantic content. The concern about label noise or distribution shift is worth checking closely in the experiments. If the full paper has solid ablations showing the augmentation's contribution without those issues, that would strengthen it. Otherwise the gains over SOTA might be overstated.

This paper is for researchers focused on improving sentence embeddings specifically for dialogue systems, especially in settings where full supervision is expensive. Someone looking for ways to bootstrap better representations from existing annotations would find it relevant.

It deserves a serious referee because it has a clear technical contribution, multiple evaluation datasets, and open code. The central idea holds up as a practical extension even if the augmentation needs more scrutiny.

Recommendation: Yes, send to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Template-aware Dialogue Sentence Embedding (TaDSE), a self-supervised contrastive learning approach for task-oriented dialogue utterances that incorporates template information. It augments the training data synthetically via a slot-filling step to diversify utterance-template associations and evaluates the resulting embeddings on five downstream dialogue benchmarks, claiming significant gains over prior SOTA sentence-embedding methods. A new semantic compression test is also proposed and shown to correlate with uniformity and alignment metrics.

Significance. If the performance gains prove robust, the method could meaningfully lower annotation costs for dialogue tasks by exploiting readily available token-level labels (slots, templates) rather than sentence-level relations. The semantic compression diagnostic offers a potentially useful new lens on embedding properties in dialogue settings, and the public code release aids reproducibility.

major comments (3)

[Method (augmentation procedure)] The synthetic slot-filling augmentation (described as diversifying utterance-template pairs) is load-bearing for the self-supervised claim, yet no analysis is supplied on whether generated pairs preserve contextual plausibility or avoid label noise/distribution shift; if implausible slot values are introduced, the contrastive objective may learn spurious correlations rather than semantic content.
[Experiments and results] The central claim of 'significant improvements over previous SOTA methods' on five downstream datasets is asserted without any reported numbers, baselines, ablation results, or statistical tests, preventing verification of whether the gains are real, consistent, or attributable to the template-aware component.
[Semantic compression test] The novel semantic compression test is introduced with a claimed correlation to uniformity and alignment, but its precise definition, computation, and controls are not detailed enough to evaluate whether it constitutes an independent diagnostic or simply restates existing embedding-quality metrics.

minor comments (2)

[Abstract] The abstract states performance claims without any quantitative support or specific metrics, which is atypical for an empirical NLP paper.
[Method] Notation for templates, slots, and the contrastive loss is introduced without an explicit equation or diagram showing how positive/negative pairs are constructed from the augmented data.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify important areas where additional detail and analysis would strengthen the presentation. We address each major comment below and indicate planned revisions.

Referee: [Method (augmentation procedure)] The synthetic slot-filling augmentation (described as diversifying utterance-template pairs) is load-bearing for the self-supervised claim, yet no analysis is supplied on whether generated pairs preserve contextual plausibility or avoid label noise/distribution shift; if implausible slot values are introduced, the contrastive objective may learn spurious correlations rather than semantic content.

Authors: We agree that explicit validation of the augmentation quality is important for supporting the self-supervised claim. The slot-filling step draws replacement values from the empirical distribution of slots observed in the original training data, which is intended to limit distribution shift. However, we acknowledge that no quantitative or qualitative analysis of plausibility or noise was included. In the revised manuscript we will add an analysis subsection that reports (a) the fraction of augmented utterances whose slot values appear in the original corpus and (b) a small-scale human plausibility rating on a random sample of 100 augmented pairs, together with any observed impact on downstream performance.

revision: yes

Referee: [Experiments and results] The central claim of 'significant improvements over previous SOTA methods' on five downstream datasets is asserted without any reported numbers, baselines, ablation results, or statistical tests, preventing verification of whether the gains are real, consistent, or attributable to the template-aware component.

Authors: The full manuscript contains tables in Section 4 that report absolute performance numbers, comparisons against prior sentence-embedding baselines (including SOTA methods), and component ablations for TaDSE. Nevertheless, the referee is correct that statistical significance tests across multiple random seeds are not reported. We will add these tests (paired t-tests or bootstrap confidence intervals) in the revision so that readers can assess whether observed gains are consistent and attributable to the template-aware contrastive objective.

revision: partial

Referee: [Semantic compression test] The novel semantic compression test is introduced with a claimed correlation to uniformity and alignment, but its precise definition, computation, and controls are not detailed enough to evaluate whether it constitutes an independent diagnostic or simply restates existing embedding-quality metrics.

Authors: We will expand the description of the semantic compression test in the revised manuscript. The expansion will include the exact mathematical definition, the algorithm used to compute the compression ratio, the precise controls employed when measuring correlation with uniformity and alignment, and an explicit comparison showing that the test captures a distinct property (information density under template masking) not reducible to the two existing metrics.

revision: yes
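The bootstrap confidence intervals mentioned in the second response could look roughly like the following paired percentile bootstrap over per-seed scores. This is a sketch under stated assumptions: the function name `paired_bootstrap_ci` is hypothetical, and the accuracy values are placeholders invented for the example, not results from the paper.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10000,
                        confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of paired score differences.

    scores_a[i] and scores_b[i] must come from the same seed or split.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    tail = (1.0 - confidence) / 2.0
    lower_idx = int(tail * n_resamples)
    upper_idx = max(lower_idx, int((1.0 - tail) * n_resamples) - 1)
    return means[lower_idx], means[upper_idx]


# Placeholder per-seed accuracies (NOT values reported in the paper).
method_scores = [0.91, 0.92, 0.90, 0.93, 0.92]
baseline_scores = [0.88, 0.89, 0.88, 0.90, 0.89]
ci_low, ci_high = paired_bootstrap_ci(method_scores, baseline_scores)
```

If the interval excludes zero, the gain is consistent across the resampled seeds; with only a handful of seeds the interval is coarse, so it complements rather than replaces reporting the per-seed numbers.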

Circularity Check

0 steps flagged

No circularity; augmentation uses provided annotations and downstream evaluation is external

The paper presents TaDSE as a contrastive learning method that incorporates template and slot annotations (explicitly noted as easier to obtain) to construct positive/negative pairs, followed by evaluation on five held-out downstream dialogue benchmarks. No equations, derivations, or claims reduce a 'prediction' to a fitted input by construction, nor do any steps rely on self-citation chains, uniqueness theorems from the same authors, or ansatzes smuggled via prior work. The synthetic slot-filling step is an explicit design choice whose output is tested externally rather than assumed to match the target distribution by definition. The derivation chain therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that templates and slots are cheaper to annotate than full utterance relations and that contrastive learning can exploit them; no explicit free parameters, invented entities, or non-standard axioms are stated in the abstract.

axioms (1)

domain assumption: Token-level annotations such as templates are much easier to obtain than utterance-level relationship labels. Stated in the abstract as motivation for the method.

[1] [1]

Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, and Verena Rieser. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.588 SLURP : A spoken language understanding resource package . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7252--7262, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.emnlp-main.588 2020

[2] [2]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020 a . A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597--1607. PMLR

work page 2020

[3] [3]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020 b . A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709

work page internal anchor Pith review Pith/arXiv arXiv 2020

[4] [4]

Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey Hinton. 2020 c . Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029

work page arXiv 2020

[5] [5]

Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljacic, Shang-Wen Li, Scott Yih, Yoon Kim, and James Glass. 2022. https://doi.org/10.18653/v1/2022.naacl-main.311 D iff CSE : Difference-based contrastive learning for sentence embeddings . In Proceedings of the 2022 Conference of the North American Chapter of the Association...

work page doi:10.18653/v1/2022.naacl-main.311 2022

[6] [6]

Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, Maël Primet, and Joseph Dureau. 2018. http://arxiv.org/abs/1805.10190 Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT : Pre-training of deep bidirectional transformers for language understanding . In Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long a...

work page doi:10.18653/v1/n19-1423 2019

[8] [8]

Kawin Ethayarajh. 2019. https://doi.org/10.18653/v1/D19-1006 How contextual are contextualized word representations? C omparing the geometry of BERT , ELM o, and GPT -2 embeddings . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCN...

work page doi:10.18653/v1/d19-1006 2019

[9] [9]

Jack FitzGerald, Christopher Hench, Charith Peris, Scott Mackie, Kay Rottmann, Ana Sanchez, Aaron Nash, Liam Urbach, Vishesh Kakarala, Richa Singh, Swetha Ranganath, Laurie Crist, Misha Britan, Wouter Leeuwis, Gokhan Tur, and Prem Natarajan. 2022. http://arxiv.org/abs/2204.08582 Massive: A 1m-example multilingual natural language understanding dataset wit...

work page arXiv 2022

[10] [10]

Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tieyan Liu. 2019. https://openreview.net/forum?id=SkEYojRqtm Representation degeneration problem in training natural language generation models . In International Conference on Learning Representations

[11]

Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. https://doi.org/10.18653/v1/2021.emnlp-main.552 SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics

[12]

John Giorgi, Osvald Nitski, Bo Wang, and Gary Bader. 2021. https://doi.org/10.18653/v1/2021.acl-long.72 DeCLUTR: Deep contrastive learning for unsupervised textual representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volum...

[13]

R. Hadsell, S. Chopra, and Y. LeCun. 2006. https://doi.org/10.1109/CVPR.2006.100 Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 1735–1742

[14]

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738

[15]

Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990. https://aclanthology.org/H90-1021 The ATIS spoken language systems pilot corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

[16]

R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. 2018. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670

[17]

Yutai Hou, Wanxiang Che, Yongkui Lai, Zhihan Zhou, Yijia Liu, Han Liu, and Ting Liu. 2020. Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network. arXiv preprint arXiv:2006.05702

[18]

Ting Jiang, Jian Jiao, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Denvy Deng, and Qi Zhang. 2022. https://doi.org/10.48550/ARXIV.2201.04337 PromptBERT: Improving BERT sentence embeddings with prompts

[19]

Mihir Kale and Abhinav Rastogi. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.527 Template guided text generation for task-oriented dialogue. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6505–6520, Online. Association for Computational Linguistics

[20]

Young-Bum Kim, Dongchan Kim, Joo-Kyung Kim, and Ruhi Sarikaya. 2018. https://doi.org/10.18653/v1/N18-3003 A scalable neural shortlisting-reranking approach for large-scale domain classification in natural language understanding. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human La...

[21]

Jason Krone, Yi Zhang, and Mona Diab. 2020. Learning to classify intents and slot labels given a handful of examples. arXiv preprint arXiv:2004.10793

[22]

Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, and Jason Mars. 2019. https://doi.org/10.18653/v1/D19-1131 An evaluation dataset for intent classification and out-of-scope prediction. In Proceedings of the 2019 Conference on Empirical M...

[23]

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.733 On the sentence embeddings from pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9119–9130, Online. Association for Computational Linguistics

[24]

Han Li, Sunghyun Park, Aswarth Dara, Jinseok Nam, Sungjin Lee, Young-Bum Kim, Spyros Matsoukas, and Ruhi Sarikaya. 2021. https://doi.org/10.48550/ARXIV.2103.03373 Neural model robustness for skill routing in large-scale conversational ai systems: A design choice exploration

[25]

Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, and James Zou. 2022. https://doi.org/10.48550/ARXIV.2203.02053 Mind the gap: Understanding the modality gap in multi-modal contrastive representation learning. In Thirty-sixth Conference on Neural Information Processing Systems, NeurIPS 2022

[26]

Che Liu, Rui Wang, Jinghua Liu, Jian Sun, Fei Huang, and Luo Si. 2021. DialogueCSE: Dialogue-based contrastive learning of sentence embeddings. arXiv preprint arXiv:2109.12599

[27]

Xingkun Liu, Arash Eshghi, Pawel Swietojanski, and Verena Rieser. 2019. http://arxiv.org/abs/1903.05566 Benchmarking natural language understanding services for building conversational agents

[28]

Sosuke Nishikawa, Ryokan Ri, Ikuya Yamada, Yoshimasa Tsuruoka, and Isao Echizen. 2022. https://doi.org/10.18653/v1/2022.naacl-main.284 EASE: Entity-aware contrastive learning of sentence embedding. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3870...

[29]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. http://proceedings.mlr.press/v139/radford21a.html Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Co...

[30]

Nils Reimers and Iryna Gurevych. 2019. https://doi.org/10.18653/v1/D19-1410 Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, Chi...

[31]

Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30

[32]

Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. 2016. Matching networks for one shot learning. Advances in neural information processing systems, 29

[33]

Tongzhou Wang and Phillip Isola. 2020. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International Conference on Machine Learning, pages 9929–9939. PMLR

[34]

Dian Yu, Luheng He, Yuan Zhang, Xinya Du, Panupong Pasupat, and Qi Li. 2021. Few-shot intent classification and slot filling with retrieved examples. arXiv preprint arXiv:2104.05763

[35]

Zhihan Zhou, Dejiao Zhang, Wei Xiao, Nicholas Dingwall, Xiaofei Ma, Andrew O Arnold, and Bing Xiang. 2022. Learning dialogue representations from consecutive utterances. NAACL
