TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.
Jina clip: Your clip model is also your text retriever
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6verdicts
UNVERDICTED 6representative citing papers
MARVEL reaches 37.9 nDCG@10 on the MM-BRIGHT benchmark by combining LLM query expansion, a reasoning-enhanced dense retriever, and GPT-4o CoT reranking, beating prior multimodal encoders by 10.3 points.
GELATO extends frozen text embedding models with locked image and audio encoders, training minimal connectors to produce a single semantic embedding space for text, image, audio, and video while keeping original text performance unchanged.
Multimodal embedding distance predicts typographic attack success rate across VLMs, and optimizing it under bounded perturbations on surrogates exposes two co-occurring failure modes of lost readability and reduced safety refusals.
Text-image embedding distance negatively correlates with typographic attack success rates (r = -0.71 to -0.93) on VLMs, with font size and image degradations strongly modulating effectiveness.
BRIDGE reaches 29.7 nDCG@10 on MM-BRIGHT by RL-aligning multimodal queries to text and using a reasoning retriever, beating multimodal encoders and, when combined with Nomic-Vision, exceeding the best text-only retriever at 33.3.
citing papers explorer
-
TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.
-
MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL
MARVEL reaches 37.9 nDCG@10 on the MM-BRIGHT benchmark by combining LLM query expansion, a reasoning-enhanced dense retriever, and GPT-4o CoT reranking, beating prior multimodal encoders by 10.3 points.
-
jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers
GELATO extends frozen text embedding models with locked image and audio encoders, training minimal connectors to produce a single semantic embedding space for text, image, audio, and video while keeping original text performance unchanged.
-
One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations
Multimodal embedding distance predicts typographic attack success rate across VLMs, and optimizing it under bounded perturbations on surrogates exposes two co-occurring failure modes of lost readability and reduced safety refusals.
-
Reading Between the Pixels: Linking Text-Image Embedding Alignment to Typographic Attack Success on Vision-Language Models
Text-image embedding distance negatively correlates with typographic attack success rates (r = -0.71 to -0.93) on VLMs, with font size and image degradations strongly modulating effectiveness.
-
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
BRIDGE reaches 29.7 nDCG@10 on MM-BRIGHT by RL-aligning multimodal queries to text and using a reasoning retriever, beating multimodal encoders and, when combined with Nomic-Vision, exceeding the best text-only retriever at 33.3.