Conjuring Semantic Similarity

Stefano Soatto; Tian Yu Liu

arxiv: 2410.16431 · v4 · submitted 2024-10-21 · 💻 cs.AI

Pith reviewed 2026-05-23 18:20 UTC · model grok-4.3

classification 💻 cs.AI

keywords semantic similarity · diffusion models · Jeffreys divergence · text-to-image generation · Monte Carlo sampling · generative model evaluation

The pith

Semantic similarity between two texts is measured by the Jeffreys divergence between the image distributions each evokes from a diffusion model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines semantic similarity of two textual expressions as the distance between the distributions of images generated when each is used as a prompt to a text-to-image diffusion model. It identifies this distance specifically with the Jeffreys divergence of the reverse-time diffusion stochastic differential equations conditioned on each text. This quantity is estimated by drawing Monte Carlo samples from the diffusion process. The resulting similarity scores match human annotations on standard benchmarks. The approach also provides a new tool for assessing how well generative models capture textual meaning.

Core claim

Semantic similarity between textual expressions is characterized as the Jeffreys divergence between the reverse-time diffusion SDEs induced by each expression, which can be computed via Monte-Carlo sampling of the conditioned generative process and aligns with human-annotated similarity scores.
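For reference, the Jeffreys divergence is the symmetrized Kullback-Leibler divergence. The display below writes it out and then gives the form it takes for two reverse-time SDEs. The second line is a sketch under assumptions this summary does not confirm: both SDEs share the same diffusion coefficient $g(t)$ and the same initial noise law, so that Girsanov's theorem applies and the drifts differ only through the conditional scores. The symbols $P_{y}$, $p_t$, $s_\theta$ and $g$ are notation chosen here, not taken from the paper.

```latex
\begin{aligned}
D_J(P \,\|\, Q) &= D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P), \\
D_J(P_{y_1} \,\|\, P_{y_2}) &= \frac{1}{2} \int_0^T g(t)^2
  \Big( \mathbb{E}_{x \sim p_t(\cdot \mid y_1)} \big\| s_\theta(x, t, y_1) - s_\theta(x, t, y_2) \big\|^2
      + \mathbb{E}_{x \sim p_t(\cdot \mid y_2)} \big\| s_\theta(x, t, y_1) - s_\theta(x, t, y_2) \big\|^2 \Big) \, dt .
\end{aligned}
```

Both expectations are over states visited by the generative process under one of the two prompts, which is what makes the quantity amenable to Monte Carlo sampling.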

What carries the argument

The Jeffreys divergence between reverse-time diffusion stochastic differential equations (SDEs) induced by conditioning a diffusion model on each textual prompt.
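A minimal sketch of how such a divergence could be estimated by sampling, using a toy stand-in rather than a real diffusion model. Each "prompt" is a one-dimensional Gaussian whose noised marginals and scores are known in closed form, so no network is needed. The names `make_gaussian_prompt` and `jeffreys_mc`, the noising schedule, and the choice `g2(t) = 1` are all assumptions made for illustration; the estimator also assumes both reverse processes start from the same noise prior, so no initial-law term appears.

```python
import math
import random


def make_gaussian_prompt(mu, s2=1.0):
    """Toy stand-in for a prompt: data law N(mu, s2), noised as x_t = x_0 + sqrt(t) * eps."""

    def sample_t(t, rng):
        # The marginal at time t is N(mu, s2 + t).
        return rng.gauss(mu, math.sqrt(s2 + t))

    def score(x, t):
        # d/dx log p_t(x) for that Gaussian marginal.
        return -(x - mu) / (s2 + t)

    return sample_t, score


def jeffreys_mc(prompt_a, prompt_b, g2=lambda t: 1.0,
                t_min=1e-3, t_max=1.0, n_t=64, n_x=128, seed=0):
    """Monte Carlo estimate of 0.5 * integral of g2(t) * (E_a + E_b)[(s_a - s_b)^2] dt."""
    rng = random.Random(seed)
    (sample_a, score_a), (sample_b, score_b) = prompt_a, prompt_b
    total = 0.0
    for _ in range(n_t):
        t = rng.uniform(t_min, t_max)          # uniform prior over time
        for sample in (sample_a, sample_b):    # one expectation per prompt
            sq = 0.0
            for _ in range(n_x):
                x = sample(t, rng)
                sq += (score_a(x, t) - score_b(x, t)) ** 2
            total += g2(t) * sq / n_x
    # Averaging over uniform t and rescaling by the interval length approximates the integral.
    return 0.5 * (t_max - t_min) * total / n_t


if __name__ == "__main__":
    cat, kitten, truck = (make_gaussian_prompt(m) for m in (0.0, 0.5, 3.0))
    print("cat vs kitten:", jeffreys_mc(cat, kitten))
    print("cat vs truck: ", jeffreys_mc(cat, truck))
```

With a real text-to-image model, `sample_t` would be replaced by running the reverse process under one prompt down to time `t`, and `score` by the network's conditional prediction at that state.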

If this is right

This measure can be used to evaluate the quality of text-to-image models by how well their distributions capture semantic relations.
It offers better interpretability of learnt representations in generative models.
It opens new avenues for the evaluation of text-conditioned generative models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This definition implies that meaning can be probed through generative processes rather than static embeddings.
The method could extend to measuring similarity in other modalities if similar generative models exist.
Discrepancies between this measure and human scores might reveal biases in current diffusion models' understanding of language.

Load-bearing premise

The image distributions generated by a diffusion model conditioned on a text prompt faithfully represent the meaning of that text, so that divergence between distributions measures semantic similarity.

What would settle it

Compute the Jeffreys divergence scores for a dataset of text pairs with known human similarity annotations using a standard diffusion model and check if they correlate as claimed; a significant mismatch would falsify the alignment.
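The check described above can be sketched as a rank correlation between divergences and human ratings. Because a divergence grows as texts become less similar, the sketch negates it before comparing with human similarity scores. The function names are hypothetical, the implementation is plain Python rather than any particular statistics library, and the numbers in the usage lines are placeholders, not results from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def similarity_agreement(divergences, human_scores):
    """Correlate negated divergences (higher = more similar) with human similarity ratings."""
    return spearman([-d for d in divergences], human_scores)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    divergences = [0.1, 0.5, 2.0, 4.0]
    human_scores = [5, 4, 2, 1]
    print(similarity_agreement(divergences, human_scores))
```

A correlation near zero or negative on a benchmark with trusted annotations would count against the claimed alignment.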

Figures

Figures reproduced from arXiv: 2410.16431 by Stefano Soatto, Tian Yu Liu.

**Figure 2.** Qualitative evaluation of conjured semantic similarity.

**Figure 3.** (Left) We ablate over different choices of priors over timesteps: a uniform distribution over timesteps {T′, …, T} where T′ ≤ T = 10, represented by the blue line (cumulative), and the Dirac delta on any particular timestep T′ ∈ {1, …, T}, represented by the orange line (pointwise). We show that a uniform prior over all timesteps gives the best results. The same plot also ablates over the …

**Figure 4.** “Merlion” vs “Mermaid Lion”: While both prompts express compositions of the same …

**Figure 5.** “Bag of Chips” vs “Bag of Fries”: The interpretation of “chips” depends on cultural …
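The two timestep priors named in the Figure 3 caption can be sketched as weightings over per-timestep divergence estimates. This is an illustration only: the function names and the dictionary `per_step` (timestep to divergence estimate) are hypothetical, and how the paper aggregates across timesteps is not stated in this summary.

```python
def uniform_prior(t_start, t_end):
    """Equal weight on every timestep in t_start..t_end inclusive (the 'cumulative' setting)."""
    n = t_end - t_start + 1
    return {t: 1.0 / n for t in range(t_start, t_end + 1)}


def dirac_prior(t):
    """All weight on a single timestep (the 'pointwise' setting)."""
    return {t: 1.0}


def expected_divergence(per_step, prior):
    """Average the per-timestep divergence estimates under the given prior."""
    return sum(weight * per_step[t] for t, weight in prior.items())


if __name__ == "__main__":
    T = 10
    # Placeholder per-timestep values for illustration only.
    per_step = {t: float(t) for t in range(1, T + 1)}
    print(expected_divergence(per_step, uniform_prior(1, T)))
    print(expected_divergence(per_step, dirac_prior(3)))
```

Sweeping `t_start` from `T` down to 1 traces the cumulative curve, while sweeping the Dirac location traces the pointwise curve.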

Original abstract

The semantic similarity between sample expressions measures the distance between their latent 'meaning'. These meanings are themselves typically represented by textual expressions. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather based on the imagery they evoke. While this is not possible with humans, generative models allow us to easily visualize and compare generated images, or their distribution, evoked by a textual prompt. Therefore, we characterize the semantic similarity between two textual expressions simply as the distance between image distributions they induce, or 'conjure.' We show that by choosing the Jeffreys divergence between the reverse-time diffusion stochastic differential equations (SDEs) induced by each textual expression, this can be directly computed via Monte-Carlo sampling. Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens up new avenues for the evaluation of text-conditioned generative models while offering better interpretability of their learnt representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper equates semantic similarity to Jeffreys divergence on reverse diffusion SDEs from text prompts, which is a concrete new reduction but rests on the untested claim that those distributions capture meaning rather than model artifacts.

The main thing here is a direct reduction of text semantic similarity to the Jeffreys divergence between the reverse-time SDEs induced by each prompt in a diffusion model, computed by Monte Carlo sampling of the paths. This moves away from rephrasing or embedding comparisons and instead uses the image distributions the prompts conjure. That specific move is not in the cited prior work and gives a workable procedure where none existed in quite this form. As a side benefit, the measure can be used to check what text-conditioned generators actually represent. The claim of alignment with human scores is the practical selling point, and the method is explicit enough that others could implement and test it.

The soft spot is the load-bearing assumption that the conditional image laws faithfully encode the text's latent meaning. Nothing in the abstract rules out the divergence reflecting training-data biases, mode gaps, or conditioning artifacts instead. The validation is presented as external to the definition, which is fine in principle, but without error analysis, data rules, or robustness checks against different models the central claim stays thin. If the full paper supplies those, the limitation shrinks; otherwise it stays the main open question.

This is for people working on generative model evaluation and interpretability metrics. A reader who wants a fresh, computable angle on what prompts mean to these systems would find it useful to discuss. It deserves a serious referee because the technique is spelled out and the idea is distinct enough to test, even if the assumption needs more work. Send it for review with requests for the missing validation details and checks on model dependence.

Referee Report

2 major / 1 minor

Summary. The paper claims that semantic similarity between textual expressions is given by the Jeffreys divergence between the reverse-time diffusion SDEs induced by conditioning a generative diffusion model on each prompt; this quantity is computable by Monte-Carlo sampling of the SDEs and is asserted to align with human-annotated similarity scores, thereby providing a new imagery-based measure of meaning and an evaluation tool for text-conditioned models.

Significance. If the central claim holds, the work supplies a parameter-free, sampling-based similarity measure grounded in the path measures of a diffusion process rather than in textual embeddings or rephrasings. The explicit reduction to Monte-Carlo estimation of Jeffreys divergence on reverse SDEs is a concrete technical contribution that could be reproduced and extended. The approach also suggests a route to auditing what text-to-image models have internalized, which would be valuable if the divergence can be shown to track semantics rather than model-specific artifacts.

major comments (2)

[Abstract / experimental validation] Abstract and experimental validation: the claim that the method 'aligns with human-annotated scores' is presented without any reported quantitative metrics (e.g., correlation coefficients, sample sizes, or statistical tests), dataset identifiers, or controls for prompt length, model choice, or conditioning strength. Because this alignment is the sole empirical support for the semantic interpretation, the absence of these details is load-bearing for the central claim.
[Method definition] The reduction of semantic similarity to Jeffreys divergence on the induced reverse-time SDEs presupposes that the conditional law p(x|text) faithfully encodes latent meaning rather than training-data biases or mode-coverage gaps. No ablation (different backbone models, varying classifier-free guidance scales, or comparison against a non-diffusion generative model) is described that would test whether the divergence remains stable under changes that preserve semantics but alter the generative distribution. A concrete test would be to recompute the divergence on the same prompt pair using two independently trained diffusion models and report the rank correlation of the resulting scores.
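The cross-model test proposed in the second major comment can be sketched with Kendall's tau, one of several rank correlations that would serve. The function name and inputs are hypothetical: `scores_a` and `scores_b` stand for divergences computed on the same list of prompt pairs by two independently trained models. This is the tau-a variant, which does not correct for ties.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall tau-a: (concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1      # both models order this pair of items the same way
        elif sign < 0:
            discordant += 1      # the two models disagree on the ordering
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Placeholder divergences for four prompt pairs, for illustration only.
    model_1 = [0.2, 1.1, 3.0, 0.7]
    model_2 = [0.3, 0.9, 2.5, 1.0]
    print(kendall_tau_a(model_1, model_2))
```

A value near 1 would suggest the ordering of prompt pairs is stable across backbones; a value near 0 would suggest the divergence mostly reflects model-specific behavior.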

minor comments (1)

[Method] Notation for the reverse-time SDE and the precise form of the Jeffreys divergence (e.g., whether it is evaluated on the full path measure or only on the terminal marginal) should be stated explicitly with equation numbers in the main text.
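The distinction raised in this comment matters because the two choices are not interchangeable. Writing $P_y$ for the law of the whole trajectory $(x_t)_{t \in [0,T]}$ under prompt $y$ and $p_0(\cdot \mid y)$ for the law of the final image (notation assumed here, not taken from the paper), the data-processing inequality applied to each KL term gives:

```latex
D_J\big(p_0(\cdot \mid y_1) \,\|\, p_0(\cdot \mid y_2)\big)
\;\le\;
D_J\big(P_{y_1} \,\|\, P_{y_2}\big).
```

A divergence evaluated on path measures is therefore an upper bound on the divergence between the image distributions themselves, and the two can differ substantially.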

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.

Referee: [Abstract / experimental validation] Abstract and experimental validation: the claim that the method 'aligns with human-annotated scores' is presented without any reported quantitative metrics (e.g., correlation coefficients, sample sizes, or statistical tests), dataset identifiers, or controls for prompt length, model choice, or conditioning strength. Because this alignment is the sole empirical support for the semantic interpretation, the absence of these details is load-bearing for the central claim.

Authors: We agree that the abstract and experimental sections require explicit quantitative support. The manuscript contains experiments comparing the proposed divergence to human similarity annotations, but these lack the requested metrics and controls. In the revision we will report Pearson and Spearman correlations, sample sizes, p-values, the specific datasets used, and ablations on prompt length and classifier-free guidance scale.
Revision: yes
Referee: [Method definition] The reduction of semantic similarity to Jeffreys divergence on the induced reverse-time SDEs presupposes that the conditional law p(x|text) faithfully encodes latent meaning rather than training-data biases or mode-coverage gaps. No ablation (different backbone models, varying classifier-free guidance scales, or comparison against a non-diffusion generative model) is described that would test whether the divergence remains stable under changes that preserve semantics but alter the generative distribution. A concrete test would be to recompute the divergence on the same prompt pair using two independently trained diffusion models and report the rank correlation of the resulting scores.

Authors: The current work demonstrates the measure on a single, publicly available diffusion backbone and relies on the observed alignment with human scores as initial evidence. We will add an explicit discussion of the modeling assumption and include new experiments that vary the guidance scale and compare two different publicly released diffusion checkpoints on the same prompt pairs, reporting rank correlation of the resulting scores. A broader comparison against non-diffusion generators lies outside the scope of the present study and will be noted as future work.
Revision: partial
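One reason the guidance-scale ablation is informative can be seen in a short derivation. Under one common parameterization of classifier-free guidance with scale $w$ (an assumption here; the summary does not say which form the paper uses), the unconditional term cancels when two prompts are compared at the same state $x$ and time $t$:

```latex
\begin{aligned}
\tilde{\epsilon}_w(x, t, y) &= \epsilon_\theta(x, t, \varnothing) + w \,\big( \epsilon_\theta(x, t, y) - \epsilon_\theta(x, t, \varnothing) \big), \\
\tilde{\epsilon}_w(x, t, y_1) - \tilde{\epsilon}_w(x, t, y_2) &= w \,\big( \epsilon_\theta(x, t, y_1) - \epsilon_\theta(x, t, y_2) \big).
\end{aligned}
```

The pointwise squared difference thus scales as $w^2$. The overall divergence need not scale exactly quadratically, because the states $x$ at which the difference is evaluated are themselves sampled from a process that depends on $w$.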

Circularity Check

0 steps flagged

No significant circularity; explicit definition with external validation

The paper explicitly defines semantic similarity as the Jeffreys divergence between reverse-time diffusion SDEs induced by each prompt (abstract: 'we characterize the semantic similarity between two textual expressions simply as the distance between image distributions they induce'). This is a first-principles proposal, not a derivation claiming to recover or predict human scores from other quantities. Alignment with annotations is presented only as empirical validation, not as part of the definitional chain. No equations, self-citations, fitted parameters, or ansatzes appear in the provided text that would reduce the central claim to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that generative models produce image distributions that meaningfully represent textual semantics; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: Image distributions produced by text-conditioned diffusion models capture the latent meaning of the text.
Central to equating divergence of those distributions with semantic similarity.

