pith. machine review for the scientific record.

arxiv: 2510.27584 · v3 · submitted 2025-10-31 · 💻 cs.CV · cs.IR · cs.LG

Recognition: unknown

Image Hashing via Cross-View Code Alignment in the Age of Foundation Models

Authors on Pith: no claims yet
classification 💻 cs.CV cs.IR cs.LG
keywords hashing · codes · alignment · binary · crovca · single · across · balanced
read the original abstract

Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor search in these high-dimensional spaces is computationally expensive. Hashing offers an efficient alternative by enabling fast Hamming distance search with binary codes, yet existing approaches often rely on complex pipelines, multi-term objectives, designs specialized for a single learning paradigm, and long training times. We introduce CroVCA (Cross-View Code Alignment), a simple and unified principle for learning binary codes that remain consistent across semantically aligned views. A single binary cross-entropy loss enforces alignment, while coding-rate maximization serves as an anti-collapse regularizer to promote balanced and diverse codes. To implement this, we design HashCoder, a lightweight MLP hashing network with a final batch normalization layer to enforce balanced codes. HashCoder can be used as a probing head on frozen embeddings or to adapt encoders efficiently via LoRA fine-tuning. Across benchmarks, CroVCA achieves state-of-the-art results in just 5 training epochs. At 16 bits, it performs particularly well; for instance, unsupervised hashing on COCO completes in under 2 minutes and supervised hashing on ImageNet100 in about 3 minutes on a single GPU. These results highlight CroVCA's efficiency, adaptability, and broad applicability.
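The core idea in the abstract — a hashing head whose bits from one view are trained, via binary cross-entropy, to match the binarized codes of a semantically aligned view, with a final batch-norm keeping bits balanced — can be sketched numerically. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the single linear layer, the per-batch standardization standing in for batch norm, the noisy "views", and all variable names are hypothetical, and the coding-rate anti-collapse regularizer is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, bits, n = 32, 16, 64  # embedding dim, code length, batch size (illustrative)

# Hypothetical HashCoder-style head: one linear projection followed by a
# batch-norm-like standardization, so each bit is zero-mean over the batch
# (this is what pushes codes toward being balanced).
W = rng.normal(size=(dim, bits)) / np.sqrt(dim)

def code_logits(x):
    z = x @ W
    return (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-5)  # "final BN"

def cross_view_bce(logits_a, logits_b):
    """Cross-view alignment: bits predicted from view A are pushed toward
    the binarized code of view B (treated as a fixed target)."""
    target = (logits_b > 0).astype(float)   # binary code of the other view
    p = 1.0 / (1.0 + np.exp(-logits_a))     # sigmoid over this view's logits
    eps = 1e-9
    return -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

# Two semantically aligned "views": the same embedding plus small noise
# (in practice these would come from augmentations or paired modalities).
x = rng.normal(size=(n, dim))
view_a = x + 0.1 * rng.normal(size=x.shape)
view_b = x + 0.1 * rng.normal(size=x.shape)

la, lb = code_logits(view_a), code_logits(view_b)
loss = 0.5 * (cross_view_bce(la, lb) + cross_view_bce(lb, la))  # symmetrized
codes = (la > 0).astype(np.uint8)  # final binary codes for retrieval
print(loss, codes.mean())          # zero-mean logits -> bit mean near 0.5
```

Because the standardization centers every bit over the batch, roughly half the entries of `codes` are 1, which is the "balanced codes" property the abstract attributes to the final batch-normalization layer.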

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval

    cs.IR 2026-01 unverdicted novelty 6.0

    Compact binary hypercube embeddings enable efficient text-to-image and text-to-audio retrieval in wildlife databases with performance competitive to continuous embeddings but far lower memory and search costs.