ShadeBench is a multimodal benchmark dataset for urban shade understanding that includes temporally varying shade maps, satellite imagery, building representations, and text to support shade generation, segmentation, and 3D reconstruction tasks.
12 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted 12
Roles: background 1
Polarities: background 1
citing papers explorer
- ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society
  ShadeBench is a multimodal benchmark dataset for urban shade understanding that includes temporally varying shade maps, satellite imagery, building representations, and text to support shade generation, segmentation, and 3D reconstruction tasks.
- From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation
  RLFSeg repurposes pretrained generative models via Rectified Flow for direct latent-space image-to-mask mapping in text-based segmentation, outperforming diffusion-based methods especially in zero-shot cases.
- FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
  FlowAnchor stabilizes editing signals in flow-based inversion-free video editing via spatial-aware attention refinement and adaptive magnitude modulation for improved faithfulness and temporal coherence.
- CFSR: Geometry-Conditioned Shadow Removal via Physical Disentanglement
  CFSR reframes shadow removal as a physics-constrained process using geometric and semantic priors from depth, DINO, CLIP, and frequency decoupling to achieve claimed state-of-the-art results.
- OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
  OVS-DINO structurally aligns DINO with SAM to revitalize attenuated boundary features, achieving SOTA gains of 2.1% average and 6.3% on Cityscapes in weakly-supervised open-vocabulary segmentation.
- Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation
  PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
- Low-Cost Hard-Label Adversarial Attack with Theoretical Foundations
  Presents a new theoretically grounded hard-label attack with zero-query initialization and low-complexity optimization that outperforms prior methods across image datasets and models.
- MiXR: Harvesting and Recomposing Geometry from Real-World Objects for In-Situ 3D Design
  MiXR enables in-situ 3D compositional modeling by harvesting real-world geometry in XR and using generative AI to synthesize coherent models from user-defined assemblies.
- SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes
  SandSim reconstructs temporally coherent sand painting processes from single images using curve-guided Gaussian splatting, subtractive compositing for accumulation, and semantic-guided stroke planning.
- Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models
  Delta-LLaVA adds Change-Enhanced Attention, Change-SEG with prior embeddings, and Local Causal Attention to MLLMs to overcome temporal blindness, outperforming general models on a new unified benchmark for bi- and tri-temporal remote sensing tasks.
- Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping
  SSDM decouples global geospatial embeddings into structural modulation and semantic injection pathways to improve accuracy and consistency in high-resolution remote sensing land cover mapping.
- Interactive Interface For Semantic Segmentation Dataset Synthesis
  SynthLab provides a modular visual data synthesis platform and interactive drag-and-drop interface for semantic segmentation datasets, shown accessible via user studies across diverse users.