pith:KJOUMWUN
CROP: Expert-Aligned Image Cropping via Compositional Reasoning and Optimizing Preference
A vision-language model crops images to match expert aesthetics by reasoning through scene analysis, composition rules, and preference alignment.
arxiv:2605.12545 v1 · 2026-05-09 · cs.CV · cs.AI
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{KJOUMWUNJKBWB4OVMXNSUXGMFI}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We design a Compositional Reasoning and Optimizing Preference method (CROP) that directs the VLM to think like a professional photographer. It deconstructs a complex and subjective aesthetic problem into an 'analysis-proposal-decision' process, reasoning step by step through the analysis of scene elements and compositional principles. Meanwhile, our expert preference alignment module makes the model's decision consistent with human expert aesthetics.
The central claim is that a VLM can be reliably guided through an analysis-proposal-decision process, and aligned via an expert preference module, to produce cropping decisions that consistently outperform saliency-based and retrieval-based baselines across varied scenes.
CROP uses compositional reasoning and expert preference alignment in VLMs to produce aesthetic crops that match human experts more closely than previous methods.
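The analysis-proposal-decision process described above can be illustrated with a toy pipeline. This is a minimal sketch under loud assumptions, not CROP's implementation: the VLM's scene analysis is replaced by a single given subject point, proposals are sliding windows at a few hypothetical scales, and the decision uses a rule-of-thirds heuristic as a stand-in for the model's reasoned, expert-aligned judgment. The expert preference alignment module is not modeled at all. All names (`Box`, `analyze`, `propose`, `thirds_score`, `decide`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def contains(self, px, py):
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def analyze(subject):
    """Hypothetical 'analysis' stage: the scene is reduced to one subject point.
    In CROP this role is played by the VLM reasoning about scene elements."""
    return {"subject": subject}


def propose(img_w, img_h, scales=(0.5, 0.7, 0.9), steps=5):
    """Hypothetical 'proposal' stage: same-aspect windows slid over the image.
    `steps` must be at least 2 so that the grid spans both image edges."""
    boxes = []
    for s in scales:
        w, h = img_w * s, img_h * s
        for i in range(steps):
            for j in range(steps):
                x = (img_w - w) * i / (steps - 1)
                y = (img_h - h) * j / (steps - 1)
                boxes.append(Box(x, y, w, h))
    return boxes


def thirds_score(box, subject):
    """Stand-in compositional score: how close the subject sits to the nearest
    rule-of-thirds intersection, normalized by the crop diagonal.
    Higher (closer to 0) is better."""
    sx, sy = subject
    points = [(box.x + box.w * a, box.y + box.h * b)
              for a in (1 / 3, 2 / 3) for b in (1 / 3, 2 / 3)]
    d = min(math.hypot(sx - px, sy - py) for px, py in points)
    return -d / math.hypot(box.w, box.h)


def decide(analysis, candidates):
    """Hypothetical 'decision' stage: keep crops that contain the subject and
    return the best-scoring one, or None if no candidate contains it."""
    subject = analysis["subject"]
    valid = [b for b in candidates if b.contains(*subject)]
    return max(valid, key=lambda b: thirds_score(b, subject), default=None)


best = decide(analyze((70.0, 30.0)), propose(100.0, 100.0))
print(best)
```

Splitting the pipeline into three functions mirrors the staged structure the claim describes: each stage can be inspected or replaced independently, which is the point of decomposing a subjective judgment into explicit steps.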
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:10:02.243785Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
525d465a8d4a8360f1d565db2a5ccc2a22e7bce3be758f8ae979af9468e12311
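For this record, the 26-character Pith number used in `\pithnumber{...}` appears to be the first 128 bits (16 bytes) of the canonical hash, Base32-encoded with the RFC 4648 alphabet and the `=` padding removed; the `pith:` label in the page header shows its first 8 characters. We inferred this by decoding this one record by hand; the page does not document the derivation, so treat it as an assumption. The function names and the `length=8` default are hypothetical.

```python
import base64


def pith_number(canonical_hash_hex):
    """Assumed derivation: Base32 of the first 16 digest bytes, padding stripped."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def short_form(number, length=8):
    """Abbreviated label in the style of the page header (hypothetical helper)."""
    return "pith:" + number[:length]


h = "525d465a8d4a8360f1d565db2a5ccc2a22e7bce3be758f8ae979af9468e12311"
print(pith_number(h))              # KJOUMWUNJKBWB4OVMXNSUXGMFI
print(short_form(pith_number(h)))  # pith:KJOUMWUN
```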
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KJOUMWUNJKBWB4OVMXNSUXGMFI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 525d465a8d4a8360f1d565db2a5ccc2a22e7bce3be758f8ae979af9468e12311
```
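To check a record offline, for example one saved from the endpoint above, the canonicalization in that one-liner can be written as reusable functions. This is a sketch that mirrors the command: sorted keys, no whitespace between tokens, non-ASCII characters kept as raw UTF-8, then SHA-256. The function names are ours, and we assume, as the command implies, that the `.canonical_record` object is exactly what gets hashed.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize as the one-liner does: sorted keys, compact separators, raw UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record, expected_hex):
    """True when the record hashes to the published canonical hash."""
    return canonical_hash(record) == expected_hex.lower()


# Key order does not affect the digest, which is what makes the hash canonical.
a = {"schema_version": "1.0", "source": {"kind": "arxiv", "id": "2605.12545"}}
b = {"source": {"id": "2605.12545", "kind": "arxiv"}, "schema_version": "1.0"}
print(canonical_hash(a) == canonical_hash(b))  # True
```

Sorting keys and fixing the separators removes the formatting freedom JSON normally allows, so any two parties holding the same data produce byte-identical input to SHA-256.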
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "75344f0094e27d42850036b76748cc0e826d8798da475221f301f2fd9ffec282",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-09T10:21:51Z",
    "title_canon_sha256": "ea5b40783cdc39c1e045672bc0b75b9565633522cf0f7ecf26d3949bd09e2153"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12545",
    "kind": "arxiv",
    "version": 1
  }
}
```