Unictokens: Boosting personalized understand- ing and generation via unified concept tokens

Ruichuan An, Sihan Yang, Renrui Zhang, Zijun Shen, Ming Lu, Gaole Dai, Hao Liang, Ziyu Guo, Shilin Yan, Yulin Luo, et al · 2025 · arXiv 2505.14671

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Omni-Persona benchmark with 18 tasks shows open-source models have audio-visual grounding gaps, RLVR narrows them but leads to conservative outputs, and scale or recall alone fail as diagnostics.

VABench: A Comprehensive Benchmark for Audio-Video Generation

cs.CV · 2025-12-10 · unverdicted · novelty 7.0

VABench is a new multi-dimensional benchmark for evaluating synchronous audio-video generation across text-to-AV, image-to-AV, and stereo tasks.

MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

MICo-150K is a new 150K-image dataset with 7 tasks, a De&Re real-image subset, MICo-Bench, and Weighted-Ref-VIEScore metric that improves AI models for generating consistent composites from arbitrary numbers of reference images.

Personal Visual Context Learning in Large Multimodal Models

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.

Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration

cs.AI · 2026-02-03 · unverdicted · novelty 6.0

A diffusion model with dynamic modality gating and cross-modal mutual learning restores missing features in VLMs bi-directionally while preserving the original model's generalization.

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

cs.CV · 2025-11-24 · unverdicted · novelty 6.0

A new dataset with high-fidelity close-up garment images and full/close-up try-on videos plus the VGID metric enables better texture and structure preservation in high-resolution video virtual try-on.

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

cs.CV · 2026-04-06

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

cs.CV · 2025-12-14

citing papers explorer

Showing 8 of 8 citing papers.

Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization cs.CV · 2026-05-11 · unverdicted · none · ref 16
Omni-Persona benchmark with 18 tasks shows open-source models have audio-visual grounding gaps, RLVR narrows them but leads to conservative outputs, and scale or recall alone fail as diagnostics.
VABench: A Comprehensive Benchmark for Audio-Video Generation cs.CV · 2025-12-10 · unverdicted · none · ref 1
VABench is a new multi-dimensional benchmark for evaluating synchronous audio-video generation across text-to-AV, image-to-AV, and stereo tasks.
MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition cs.CV · 2025-12-08 · unverdicted · none · ref 2
MICo-150K is a new 150K-image dataset with 7 tasks, a De&Re real-image subset, MICo-Bench, and Weighted-Ref-VIEScore metric that improves AI models for generating consistent composites from arbitrary numbers of reference images.
Personal Visual Context Learning in Large Multimodal Models cs.CV · 2026-05-11 · unverdicted · none · ref 3
Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.
Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration cs.AI · 2026-02-03 · unverdicted · none · ref 5
A diffusion model with dynamic modality gating and cross-modal mutual learning restores missing features in VLMs bi-directionally while preserving the original model's generalization.
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on cs.CV · 2025-11-24 · unverdicted · none · ref 2
A new dataset with high-fidelity close-up garment images and full/close-up try-on videos plus the VGID metric enables better texture and structure preservation in high-resolution video virtual try-on.
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models cs.CV · 2026-04-06 · unreviewed · ref 3
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling cs.CV · 2025-12-14 · unreviewed · ref 2

Unictokens: Boosting personalized understand- ing and generation via unified concept tokens

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer