Nomic embed vision: Expanding the latent space

Nussbaum, Z · 2024 · arXiv 2406.18587

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

GELATO extends frozen text embedding models with locked image and audio encoders, training minimal connectors to produce a single semantic embedding space for text, image, audio, and video while keeping original text performance unchanged.

HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval

cs.IR · 2026-04-08 · unverdicted · novelty 6.0

HIVE raises multimodal retrieval nDCG@10 to 41.7 on the MM-BRIGHT benchmark by inserting LLM-driven hypothesis generation and verification between retrieval passes, delivering +9.5 over the best text-only baseline and +14.1 over the best multimodal baseline.

Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

stat.ML · 2026-04-23 · unverdicted · novelty 4.0

Augmenting zone-level MTPL claim frequency models with coordinates, environmental features at 5 km scale, and image embeddings improves predictive accuracy on unseen postcodes across GLM, regularized GLM, and tree-based models.

Multimodal Contextualized Support for Enhancing Video Retrieval System

cs.CV · 2024-12-10 · unverdicted · novelty 3.0

Proposes a multimodal pipeline for video retrieval that incorporates information from multiple frames to enable higher-level abstraction beyond single-image object detection.

citing papers explorer

Showing 4 of 4 citing papers.

jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers cs.CL · 2026-05-08 · unverdicted · none · ref 31 · 2 links
GELATO extends frozen text embedding models with locked image and audio encoders, training minimal connectors to produce a single semantic embedding space for text, image, audio, and video while keeping original text performance unchanged.
HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval cs.IR · 2026-04-08 · unverdicted · none · ref 26
HIVE raises multimodal retrieval nDCG@10 to 41.7 on the MM-BRIGHT benchmark by inserting LLM-driven hypothesis generation and verification between retrieval passes, delivering +9.5 over the best text-only baseline and +14.1 over the best multimodal baseline.
Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors stat.ML · 2026-04-23 · unverdicted · none · ref 41
Augmenting zone-level MTPL claim frequency models with coordinates, environmental features at 5 km scale, and image embeddings improves predictive accuracy on unseen postcodes across GLM, regularized GLM, and tree-based models.
Multimodal Contextualized Support for Enhancing Video Retrieval System cs.CV · 2024-12-10 · unverdicted · none · ref 8
Proposes a multimodal pipeline for video retrieval that incorporates information from multiple frames to enable higher-level abstraction beyond single-image object detection.

Nomic embed vision: Expanding the latent space

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer