ImageNet: A large-scale hierarchical image database

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset 1

citation-polarity summary

fields: cs.CV 5

roles: dataset 1

polarities: use dataset 1

representative citing papers

Hierarchically Robust Zero-shot Vision-language Models

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

A hierarchical adversarial fine-tuning method for VLMs aligns image and text embeddings at multiple hierarchy depths, with theoretical margin connections, to boost robustness to leaf and superclass attacks, while using multiple trees for semantic variety.

Seed1.5-VL Technical Report

cs.CV · 2025-05-11 · unverdicted · novelty 4.0

Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

citing papers explorer

Showing 5 of 5 citing papers.

  • Hierarchically Robust Zero-shot Vision-language Models cs.CV · 2026-04-20 · unverdicted · none · ref 9

    A hierarchical adversarial fine-tuning method for VLMs aligns image and text embeddings at multiple hierarchy depths, with theoretical margin connections, to boost robustness to leaf and superclass attacks, while using multiple trees for semantic variety.

  • OD3: Optimization-free Dataset Distillation for Object Detection cs.CV · 2025-06-02 · unverdicted · none · ref 7

    OD3 presents an optimization-free dataset distillation framework for object detection that reports new state-of-the-art accuracy on COCO and VOC at compression ratios from 0.25% to 5%.

  • End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer cs.CV · 2026-05-01 · unverdicted · none · ref 7

    An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.

  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis cs.CV · 2023-07-04 · conditional · none · ref 4

    SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.

  • Seed1.5-VL Technical Report cs.CV · 2025-05-11 · unverdicted · none · ref 22

    Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.