Sigmoid loss for language image pre-training

· 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

Q-Zoom achieves up to 4.39x inference speedup in high-resolution MLLM scenarios via query-aware gating and region localization, matching or exceeding baseline accuracy on document and high-res benchmarks.

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

cs.RO · 2025-11-10 · unverdicted · novelty 5.0

SlotVLA uses slot attention to model object-relation representations for multitask robotic manipulation, reducing visual tokens while achieving competitive generalization on the new LIBERO+ benchmark.

citing papers explorer

Showing 2 of 2 citing papers.

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models cs.CV · 2026-04-08 · unverdicted · none · ref 39
Q-Zoom achieves up to 4.39x inference speedup in high-resolution MLLM scenarios via query-aware gating and region localization, matching or exceeding baseline accuracy on document and high-res benchmarks.
SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation cs.RO · 2025-11-10 · unverdicted · none · ref 7
SlotVLA uses slot attention to model object-relation representations for multitask robotic manipulation, reducing visual tokens while achieving competitive generalization on the new LIBERO+ benchmark.

Sigmoid loss for language image pre-training

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer