LatentOmni proposes a latent-space cross-modal reasoning framework that uses feature-level supervision and Omni-Sync Position Embedding to align and synchronize audio-visual latents, supported by a new 35K interleaved reasoning dataset and showing gains over text CoT baselines.
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4representative citing papers
UniGraphLM uses a multi-domain multi-task GNN encoder and adaptive alignment to create unified graph tokens for LLMs across diverse domains and tasks.
CVA aggregates frozen VFM embeddings via latent reasoning to create compact video embeddings for efficient micro-video recommendation, delivering consistent performance gains and orders-of-magnitude efficiency improvements.
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
citing papers explorer
-
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
LatentOmni proposes a latent-space cross-modal reasoning framework that uses feature-level supervision and Omni-Sync Position Embedding to align and synchronize audio-visual latents, supported by a new 35K interleaved reasoning dataset and showing gains over text CoT baselines.
-
A Unified Graph Language Model for Multi-Domain Multi-Task Graph Alignment Instruction Tuning
UniGraphLM uses a multi-domain multi-task GNN encoder and adaptive alignment to create unified graph tokens for LLMs across diverse domains and tasks.
-
Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation
CVA aggregates frozen VFM embeddings via latent reasoning to create compact video embeddings for efficient micro-video recommendation, delivering consistent performance gains and orders-of-magnitude efficiency improvements.
-
Large Language Model-Brained GUI Agents: A Survey
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.