Promptception: How Sensitive Are Large Multimodal Models to Prompts?

2025 · DOI 10.18653/v1/2025.findings-emnlp.1302

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

FraudBench shows that current multimodal LLMs and specialized AI-image detectors often fail to spot AI-generated fake damage in refund evidence, with true positive rates frequently below 50% on synthetic subsets while producing false positives on real damage.

PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses

cs.CL · 2026-03-11 · unverdicted · novelty 7.0

PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.

citing papers explorer


FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence · cs.CV · 2026-05-09 · unverdicted · none · ref 36
PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses · cs.CL · 2026-03-11 · unverdicted · none · ref 46
