Learning transferable visual models from natural language supervision
3 Pith papers cite this work.
citing papers explorer
- Dependency-Aware Discrete Diffusion for Scene Graph Generation
  A new discrete diffusion model for scene graph generation from text captures object-relation dependencies via hierarchical constraints and training-free conditioning, yielding better graph metrics and downstream image alignment than prior baselines.
- OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
  Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.
- Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding
  Using lexical concreteness to guide contrastive negative mining and a new margin-based Cement loss, the Slipform framework reaches state-of-the-art on compositional benchmarks for vision-language models.