arXiv preprint arXiv:2401.15688 (2024)
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 5
roles: background 2
polarities: background 2
citing papers explorer
- MetaPoint: Unlocking Precise Spatial Control in Agentic Visual Generation
  MetaPoint represents 2D coordinates as special tokens in visual generative models to enable precise spatial control using existing positional encodings without architectural modifications.
- Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data
  DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  ELLA introduces a timestep-aware semantic connector to link LLMs with diffusion models for improved dense prompt following, validated on a new 1K-prompt benchmark.
- ReGRPO: Reflection-Augmented Policy Optimization for Tool-Using Agents
  ReGRPO augments group-relative policy optimization with a reflective data engine that generates ErrorType-Evidence-FixPlan triplets from near-miss tool actions to improve recovery in multimodal agents.
- GenED-SC: Generative Editing Semantic Communication with Integrated Multi-Modal LLMs
  A two-stage framework uses JSCC for discriminative transmission of important image regions followed by MLLM-driven generative editing to improve semantic fidelity and perceptual quality under bandwidth limits and varying channel conditions.