Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Scene parsing through ADE20K dataset

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CHASM: Cross-frequency Harmonized Axis-Separable Mixing for Spectral Token Operators

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

CHASM introduces a cross-frequency harmonized axis-separable spectral mixer using a shared channel eigenbasis plus per-frequency positive gains, yielding consistent gains over same-backbone baselines in medical and natural image tasks.

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0 · 2 refs

TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.

Information theoretic underpinning of self-supervised learning by clustering

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.

citing papers explorer

Showing 4 of 4 citing papers.

CHASM: Cross-frequency Harmonized Axis-Separable Mixing for Spectral Token Operators cs.CV · 2026-05-14 · unverdicted · none · ref 24
VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing cs.CL · 2026-05-07 · unverdicted · none · ref 89
Temporal Aware Pruning for Efficient Diffusion-based Video Generation cs.CV · 2026-05-18 · unverdicted · none · ref 125 · 2 links
Information theoretic underpinning of self-supervised learning by clustering cs.LG · 2026-05-12 · unverdicted · none · ref 99
