Title resolution pending

Sigmoid Loss for Language Image Pre-Training · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Geometry Guided Self-Consistency for Physical AI

cs.RO · 2026-05-09 · unverdicted · novelty 6.0

KeyStone improves task success rates in diffusion-based physical AI models by up to 13.3% by sampling K trajectories in parallel, clustering them in action space, and returning the medoid of the largest cluster.

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

cs.CL · 2024-02-18 · unverdicted · novelty 6.0

ALLaVA creates 1.3M GPT4V-synthesized samples enabling 4B VLMs to achieve competitive results on 17 benchmarks and match 7B/13B models on some tasks.

RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

RTPrune introduces a reading-twice inspired two-stage pruning technique for DeepSeek-OCR that retains 84.25% of tokens while delivering 99.47% accuracy and 1.23x faster prefill on OmniDocBench.

Unified Pix Token And Word Token Generative Language Model

cs.CV · 2026-05-13 · unverdicted · novelty 4.0

A new model unifies per-pixel and word tokens in a generative language model with per-pixel embeddings, color folding, and unsupervised image pretraining, reporting good performance on small models with limited data.

citing papers explorer

Showing 4 of 4 citing papers.

Geometry Guided Self-Consistency for Physical AI
cs.RO · 2026-05-09 · unverdicted · none · ref 7

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
cs.CL · 2024-02-18 · unverdicted · none · ref 69

RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference
cs.CV · 2026-05-01 · unverdicted · none · ref 34

Unified Pix Token And Word Token Generative Language Model
cs.CV · 2026-05-13 · unverdicted · none · ref 4
