CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.
Title resolution pending
18 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
other 1polarities
unclear 1representative citing papers
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.
SurFITR is a new collection of 137k+ surveillance-style forged images that causes existing detectors to degrade while enabling substantial gains when used for training in both in-domain and cross-domain settings.
COCO-Inpaint supplies a large-scale dataset and evaluation protocol focused on inpainting-based image forgeries to benchmark existing detection methods.
Knowledge of the ViT backbone alone enables highly effective gray-box adversarial attacks on synthetic image detectors, often nearing white-box performance.
A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than static baselines.
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
Creo scaffolds text-to-image generation through progressive stages with editable abstractions and decision locking to improve controllability, agency, and output diversity.
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
HRFD aligns multi-dimensional preferences in text-to-image diffusion via hierarchical relevance feedback and statistical distribution divergence measurement between liked and disliked image sets, remaining training-free and model-agnostic.
A diffusion model with dynamic modality gating and cross-modal mutual learning restores missing features in VLMs bi-directionally while preserving the original model's generalization.
LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.
FGINet uses a band-masked frequency encoder and layer-wise gated injection to fuse frequency artifacts with vision foundation model semantics, plus hyperspherical compactness learning, to achieve better generalization in AI-generated image detection.
OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.
PrefPaint uses D3PO and a Model Tree web interface to incorporate gastroenterologist feedback into Stable Diffusion inpainting, producing anatomically accurate polyp images that outperform prior methods in user studies.
citing papers explorer
-
A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation
CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.
-
LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.
-
ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent
ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
-
Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling
SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.
-
SurFITR: A Dataset for Surveillance Image Forgery Detection and Localisation
SurFITR is a new collection of 137k+ surveillance-style forged images that causes existing detectors to degrade while enabling substantial gains when used for training in both in-domain and cross-domain settings.
-
COCO-Inpaint: A Benchmark for Detecting and Localizing Inpainting-Based Image Manipulations
COCO-Inpaint supplies a large-scale dataset and evaluation protocol focused on inpainting-based image forgeries to benchmark existing detection methods.
-
Backbone is All You Need: Assessing Vulnerabilities of Frozen Foundation Models in Synthetic Image Forensics
Knowledge of the ViT backbone alone enables highly effective gray-box adversarial attacks on synthetic image detectors, often nearing white-box performance.
-
Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence
A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than static baselines.
-
Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
-
Deepfake Detection Generalization with Diffusion Noise
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
-
Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation
Creo scaffolds text-to-image generation through progressive stages with editable abstractions and decision locking to improve controllability, agency, and output diversity.
-
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
-
Bridging the Intention-Expression Gap: Aligning Multi-Dimensional Preferences via Hierarchical Relevance Feedback in Text-to-Image Diffusion
HRFD aligns multi-dimensional preferences in text-to-image diffusion via hierarchical relevance feedback and statistical distribution divergence measurement between liked and disliked image sets, remaining training-free and model-agnostic.
-
Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration
A diffusion model with dynamic modality gating and cross-modal mutual learning restores missing features in VLMs bi-directionally while preserving the original model's generalization.
-
The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor
LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.
-
Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection
FGINet uses a band-masked frequency encoder and layer-wise gated injection to fuse frequency artifacts with vision foundation model semantics, plus hyperspherical compactness learning, to achieve better generalization in AI-generated image detection.
-
OOPrompt: Reifying Intents into Structured Artifacts for Modular and Iterative Prompting
OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.
-
PrefPaint: Enhancing Medical Image Inpainting through Expert Human Feedback
PrefPaint uses D3PO and a Model Tree web interface to incorporate gastroenterologist feedback into Stable Diffusion inpainting, producing anatomically accurate polyp images that outperform prior methods in user studies.