SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

Huan Sun; Simon Lin; Soheil Moosavinasab; Xiang Yue; Yungui Huang; Zhen Wang

arXiv: 1906.09285 · v1 · pith:XCVZTTPQ · submitted 2019-06-21 · 💻 cs.CL


Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun

Pith reviewed 2026-05-25 18:43 UTC · model grok-4.3

classification 💻 cs.CL

keywords: synonym discovery · privacy-aware clinical data · surface form · global context · medical terms · out-of-vocabulary · co-occurrence counts · clinical texts


The pith

SurfCon discovers medical synonyms from aggregated co-occurrence counts without raw clinical texts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Privacy rules often prevent access to full clinical texts, even though those texts contain rich health information. SurfCon addresses synonym discovery using only the medical terms extracted from such texts and their aggregated co-occurrence counts. The method combines surface-form similarity, which catches terms that look alike, with global context patterns, which identify semantically related terms written in different forms. It also handles query terms that are absent from the supplied data. Experiments on publicly available privacy-aware clinical data show consistent gains over strong baselines across multiple settings.

Core claim

SurfCon is a framework that leverages surface form information to detect synonyms with similar appearances and global context information from aggregated co-occurrence counts to detect semantically similar synonyms, allowing synonym discovery on privacy-aware clinical data while also addressing out-of-vocabulary query terms.
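To make the surface form signal concrete, the sketch below scores two term strings by the Jaccard overlap of their character trigrams. It is a minimal, non-learned stand-in chosen for illustration, not SurfCon's surface form module; the function names and the trigram size are assumptions. Because it needs only the two strings, it can score a query that never appears in the co-occurrence data.

```python
def char_ngrams(term: str, n: int = 3) -> set:
    """Character n-grams of a lowercased term, padded with '<' and '>' boundary markers."""
    padded = f"<{term.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def surface_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two terms' n-gram sets (0.0 = nothing shared, 1.0 = same set)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:  # both terms too short to yield any n-gram of this size
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(grams_a & grams_b) / len(union)


# Look-alike terms share many trigrams; a synonym with a different surface form shares none.
print(surface_similarity("hypertension", "hypertensive"))         # 0.6
print(surface_similarity("hypertension", "high blood pressure"))  # 0.0
```

The second pair is a true synonym pair that this signal alone cannot find, which is the gap a complementary context signal is meant to fill.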

What carries the argument

SurfCon framework with a surface form module and a complementary global context module that together operate on aggregated term co-occurrences.
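One simple way to picture two complementary modules is to interpolate their scores and rank candidates by the sum. The sketch below assumes precomputed per-candidate scores and a hypothetical weight `alpha`; SurfCon's actual scoring and training objective are not described on this page, so treat it only as an illustration of complementarity. The toy scores are invented.

```python
from typing import Dict, List


def combined_scores(surface: Dict[str, float], context: Dict[str, float],
                    alpha: float = 0.5) -> Dict[str, float]:
    """Score each candidate as surface + alpha * context.

    A candidate missing from either map contributes 0.0 for that part (an assumed convention).
    """
    candidates = set(surface) | set(context)
    return {c: surface.get(c, 0.0) + alpha * context.get(c, 0.0) for c in candidates}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Candidates with the k highest scores; ties are broken alphabetically for determinism."""
    return [c for c, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


# Toy scores for the query "heart attack" (invented for illustration).
surface = {"heart failure": 0.5, "myocardial infarction": 0.0}
context = {"heart failure": 0.2, "myocardial infarction": 0.9}

print(top_k(combined_scores(surface, context, alpha=0.0), 1))  # ['heart failure']
print(top_k(combined_scores(surface, context, alpha=1.0), 1))  # ['myocardial infarction']
```

With the context weight at zero, the look-alike but non-synonymous "heart failure" wins; giving the context signal weight lets the differently spelled synonym rank first.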

If this is right

SurfCon identifies both surface-similar and semantically similar synonyms from the same aggregated input.
The framework handles out-of-vocabulary query terms not present in the given data.
All processing stays within privacy-aware aggregated counts and never requires raw patient texts.
Performance exceeds strong baseline methods by large margins under varied experimental conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hospitals could generate and share synonym resources derived from their local aggregates while keeping raw records private.
The same dual-module approach may transfer to other domains that release only aggregated term statistics.
Combining string-level features with global statistics can offset the loss of full sentence context.

Load-bearing premise

Surface form details together with global co-occurrence patterns in aggregated data contain enough signal to identify accurate synonyms for both similar and dissimilar surface forms.

What would settle it

Run SurfCon on a held-out list of established medical synonyms and measure whether correct synonyms are ranked substantially lower than incorrect ones or whether many known synonyms are missed entirely.
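This check can be scored with standard ranking metrics: recall@k measures how many known synonyms are missed from the top of the list, and mean reciprocal rank (MRR) measures how far down the first correct synonym sits. Which metrics the paper itself reports is not stated here; these are common choices, and the function names and toy ranking are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the known synonyms in `gold` that appear in the top k of `ranked`."""
    if not gold:
        raise ValueError("gold synonym set must be non-empty")
    return len(set(ranked[:k]) & gold) / len(gold)


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first correct synonym, or 0.0 if no correct synonym is retrieved."""
    for rank, term in enumerate(ranked, start=1):
        if term in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked list, gold set) pairs, one pair per query."""
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


# Hypothetical ranking for the query "heart attack" against a held-out synonym list.
ranked = ["heart failure", "myocardial infarction", "angina", "mi"]
gold = {"myocardial infarction", "mi"}
print(recall_at_k(ranked, gold, 2))  # 0.5
print(recall_at_k(ranked, gold, 4))  # 1.0
print(reciprocal_rank(ranked, gold))  # 0.5
```

Low recall@k signals missed synonyms; low MRR signals that correct synonyms are ranked below incorrect ones.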

Figures

Figures reproduced from arXiv: 1906.09285 by Huan Sun, Simon Lin, Soheil Moosavinasab, Xiang Yue, Yungui Huang, Zhen Wang.

**Figure 2.** Framework overview. For each query term, a list …

**Figure 3.** Dynamic Context Matching Mechanism. In contrast to the static approach, we propose the dynamic context matching mechanism (as shown in …

**Figure 4.** Performance w.r.t. (a) the coefficient of context …

Original abstract

Unstructured clinical texts contain rich health-related information. To better utilize the knowledge buried in clinical texts, discovering synonyms for a medical query term has become an important task. Recent automatic synonym discovery methods leveraging raw text information have been developed. However, to preserve patient privacy and security, it is usually quite difficult to get access to large-scale raw clinical texts. In this paper, we study a new setting named synonym discovery on privacy-aware clinical data (i.e., medical terms extracted from the clinical texts and their aggregated co-occurrence counts, without raw clinical texts). To solve the problem, we propose a new framework SurfCon that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information, and the global context information for synonym discovery. In particular, the surface form module enables us to detect synonyms that look similar while the global context module plays a complementary role to discover synonyms that are semantically similar but in different surface forms, and both allow us to deal with the OOV query issue (i.e., when the query is not found in the given data). We conduct extensive experiments and case studies on publicly available privacy-aware clinical data, and show that SurfCon can outperform strong baseline methods by large margins under various settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

SurfCon defines a useful privacy-constrained synonym task and combines surface plus context signals, but the OOV claim for the global module does not hold up.


The paper's real contribution is framing synonym discovery when only term lists and aggregated co-occurrence counts are available, no raw clinical notes. SurfCon splits the work into a surface-form module for look-alike terms and a global-context module for semantic matches, then tests on public privacy-aware data. That setting matches a genuine access problem in clinical NLP, and the two-module split is a reasonable way to cover different synonym types without needing full text.

Referee Report

1 major / 1 minor

Summary. The paper proposes SurfCon, a framework for synonym discovery on privacy-aware clinical data (extracted medical terms and aggregated co-occurrence counts, without raw text). It uses a surface form module for similar-looking synonyms and a global context module for semantically similar but dissimilar-surface synonyms; both are claimed to handle OOV queries, with extensive experiments showing large-margin outperformance over strong baselines under various settings.

Significance. If the results are robust, the work addresses a practical constraint in medical NLP by enabling synonym discovery from privacy-preserving aggregated data rather than raw clinical notes, which could facilitate knowledge extraction in regulated settings.

major comments (1)

[Abstract] The claim that 'both [modules] allow us to deal with the OOV query issue' is inconsistent with the global context module's definition via aggregated co-occurrence counts in the privacy-aware data. An OOV query term is absent from that data by definition and therefore has no associated counts, so the global context module supplies no signal; only the surface form module remains. This directly affects the central claim that the two information types together suffice for semantically similar synonyms on OOV queries, which underpins the reported large-margin gains under 'various settings' that include OOV.

minor comments (1)

[Abstract] The abstract supplies no experimental details (datasets, baselines, metrics, or OOV-specific evaluation protocol), making it impossible to assess whether the outperformance claim is supported.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading of the manuscript and for highlighting the inconsistency in the abstract's claim regarding OOV queries. We address this point directly below.


Referee: [Abstract] The claim that 'both [modules] allow us to deal with the OOV query issue' is inconsistent with the global context module's definition via aggregated co-occurrence counts in the privacy-aware data. An OOV query term is absent from that data by definition and therefore has no associated counts, so the global context module supplies no signal; only the surface form module remains. This directly affects the central claim that the two information types together suffice for semantically similar synonyms on OOV queries, which underpins the reported large-margin gains under 'various settings' that include OOV.

Authors: We agree that the abstract wording is imprecise and inconsistent with the technical definition of the global context module. By construction, an OOV query has no co-occurrence counts in the privacy-aware data, so the global context module cannot contribute any signal for such queries; only the surface form module applies. The two modules are complementary for in-vocabulary queries, where global context can identify semantically similar terms with dissimilar surface forms. We will revise the abstract (and any corresponding statements in the introduction or method sections) to state clearly that the surface form module handles OOV queries while the global context module augments performance on in-vocabulary queries. We will also verify that the experimental results under the OOV setting are presented without implying contribution from the global context module.
Revision: yes
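The conceded point can be made concrete. Under the referee's reading, where the global context signal is read directly from the stored counts, a query missing from the counts yields no context score at all and scoring falls back to surface form alone. In the sketch below, raw-count cosine similarity and Python's `difflib.SequenceMatcher` are illustrative stand-ins for the two modules, not SurfCon's learned components; the function names and toy counts are hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt
from typing import Dict, Optional

# term -> {co-occurring term -> aggregated co-occurrence count}
Counts = Dict[str, Dict[str, int]]


def context_cosine(a: str, b: str, counts: Counts) -> Optional[float]:
    """Cosine similarity of two terms' count vectors; None if either term is out of vocabulary."""
    if a not in counts or b not in counts:
        return None  # an OOV term has no counts, hence no global-context signal
    u, v = counts[a], counts[b]
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def score(query: str, candidate: str, counts: Counts, alpha: float = 0.5) -> float:
    """Surface similarity, plus a weighted context term only when both terms have counts."""
    surface = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    context = context_cosine(query, candidate, counts)
    return surface if context is None else surface + alpha * context


counts = {
    "heart attack": {"chest pain": 3, "troponin": 2},
    "myocardial infarction": {"chest pain": 6, "troponin": 4},
}
print(context_cosine("heart attack", "myocardial infarction", counts))  # close to 1.0 (proportional counts)
print(context_cosine("heart atack", "heart attack", counts))            # None: the misspelled query is OOV
```

For the misspelled query, `score` returns the surface similarity unchanged, which is exactly the behavior the revised abstract is meant to describe.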

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper proposes the SurfCon framework, which applies surface-form and global-context modules to aggregated privacy-aware clinical data for synonym discovery, including OOV handling. Neither the abstract nor the text describes any equation, derivation, or parameter-fitting step that reduces a prediction or result to its inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing elements. The method is presented as a direct combination of two information types, without self-referential definitions or renaming of known results. The derivation is self-contained as an empirical method proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review limited to abstract; ledger populated from stated assumptions in the problem definition and module descriptions.

axioms (2)

Domain assumption: aggregated co-occurrence counts encode semantic similarity between medical terms.
Invoked by the global context module description in the abstract.
Domain assumption: surface form similarity is a reliable signal for synonymy in clinical terminology.
Invoked by the surface form module description in the abstract.
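The first domain assumption is the classic distributional hypothesis. One standard way to turn aggregated counts into a similarity, shown here as a sketch, is to reweight them with positive pointwise mutual information (PPMI) and compare terms by cosine similarity. This is a conventional baseline technique, not necessarily what SurfCon's global context module does; the function names and toy counts are invented for illustration.

```python
from collections import defaultdict
from math import log, sqrt
from typing import Dict

Counts = Dict[str, Dict[str, float]]  # term -> {context term -> aggregated co-occurrence count}


def ppmi(counts: Counts) -> Counts:
    """Reweight counts as PPMI(t, c) = max(0, log(n(t, c) * N / (n(t) * n(c))))."""
    total = sum(sum(row.values()) for row in counts.values())
    row_sums = {t: sum(row.values()) for t, row in counts.items()}
    col_sums: Dict[str, float] = defaultdict(float)
    for row in counts.values():
        for ctx, n in row.items():
            col_sums[ctx] += n
    weighted: Counts = {}
    for t, row in counts.items():
        weighted[t] = {}
        for ctx, n in row.items():
            if n > 0:
                pmi = log(n * total / (row_sums[t] * col_sums[ctx]))
                if pmi > 0:  # keep only positive associations
                    weighted[t][ctx] = pmi
    return weighted


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts (0.0 if either is all zeros)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


counts = {
    "heart attack": {"chest pain": 4, "troponin": 3},
    "myocardial infarction": {"chest pain": 5, "troponin": 2},
    "fracture": {"x-ray": 6, "cast": 4},
}
vectors = ppmi(counts)
synonym_sim = cosine(vectors["heart attack"], vectors["myocardial infarction"])
unrelated_sim = cosine(vectors["heart attack"], vectors["fracture"])
print(synonym_sim > unrelated_sim)  # True
print(unrelated_sim)                # 0.0 (no shared contexts)
```

If the assumption fails, for example because aggregation washes out the contexts that distinguish synonyms from merely related terms, this kind of score stops separating the two cases.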

pith-pipeline@v0.9.0 · 5763 in / 1163 out tokens · 24236 ms · 2026-05-25T18:43:27.925573+00:00

