Towards Context-Aware Image Anonymization with Multi-Agent Reasoning
Pith reviewed 2026-05-14 21:11 UTC · model grok-4.3
The pith
A multi-agent system using vision-language models anonymizes context-dependent personal information in street images by distinguishing private from public properties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The agentic workflow with scout-and-zoom detection, open-vocabulary segmentation on localized crops, and IoU-based deduplication enables large vision-language models to classify context-dependent PII accurately, supporting targeted diffusion-based anonymization that lowers re-identification risks without harming downstream semantic segmentation.
What carries the argument
The multi-agent system with round-robin speaker selection in a PDCA cycle that performs spatially-filtered coarse-to-fine detection and applies modal-specific diffusion guidance with appearance decorrelation.
If this is right
- Downstream semantic segmentation performance stays intact after anonymization.
- Human-interpretable audit trails support GDPR transparency requirements.
- Failed cases are automatically flagged for human review.
- Non-direct PII instances are caught across multiple object categories.
Where Pith is reading between the lines
- The on-premise design could support deployment in regulated environments that prohibit cloud APIs.
- The modular agent structure might allow straightforward addition of new object categories or context rules.
- Similar reasoning loops could be tested on video sequences to handle motion-based identifiers.
Load-bearing premise
Large vision-language models in the multi-agent setup can accurately tell private from public objects based on spatial context without misclassifications that would either over-anonymize or leave identifiers exposed.
What would settle it
A test set of images containing deliberately ambiguous contexts, such as vehicles or people near property boundaries, to measure whether the agents classify and anonymize only the private instances.
Abstract
Street-level imagery contains personally identifiable information (PII), some of which is context-dependent. Existing anonymization methods either over-process images or miss subtle identifiers, while API-based solutions compromise data sovereignty. We present an agentic framework CAIAMAR (Context-Aware Image Anonymization with Multi-Agent Reasoning) for context-aware PII segmentation with diffusion-based anonymization, combining pre-defined processing for high-confidence cases with multi-agent reasoning for indirect identifiers. Three specialized agents coordinate via round-robin speaker selection in a Plan-Do-Check-Act (PDCA) cycle, enabling large vision-language models to classify PII based on spatial context (private vs. public property) rather than rigid category rules. The agents implement spatially-filtered coarse-to-fine detection where a scout-and-zoom strategy identifies candidates, open-vocabulary segmentation processes localized crops, and IoU-based deduplication (30% threshold) prevents redundant processing. Modal-specific diffusion guidance with appearance decorrelation substantially reduces re-identification (Re-ID) risks. On CUHK03-NP, our method reduces person Re-ID risk by 73% (R1: 16.9% vs. 62.4% baseline). For image quality preservation on CityScapes, we achieve KID: 0.001, and FID: 9.1, significantly outperforming existing anonymization. The agentic workflow detects non-direct PII instances across object categories, and downstream semantic segmentation is preserved. Operating entirely on-premise with open-source models, the framework generates human-interpretable audit trails supporting EU's GDPR transparency requirements while flagging failed cases for human review.
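The IoU-based deduplication step mentioned in the abstract can be illustrated with a minimal sketch. It assumes axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates, a greedy keep-first policy, and that a candidate counts as a duplicate when its IoU with an already kept box exceeds 0.30. The function names `iou` and `deduplicate` are hypothetical; the abstract does not say whether the paper compares boxes or masks, or which duplicate it keeps.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def deduplicate(boxes: List[Box], threshold: float = 0.30) -> List[Box]:
    """Greedily keep a box unless it overlaps an already kept box above the threshold."""
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, k) <= threshold for k in kept):
            kept.append(box)
    return kept


# Toy candidates: the first two overlap heavily, the third is separate.
candidates = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 260, 260)]
unique = deduplicate(candidates)
```

In this toy input the first two boxes have an IoU of roughly 0.68, so only the first and third survive. A lower threshold merges more aggressively and risks dropping distinct neighbouring objects; a higher one lets near-duplicates through for repeated processing.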
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CAIAMAR, a multi-agent framework for context-aware anonymization of street-level images. It employs three specialized agents coordinating via round-robin selection in a PDCA cycle, using scout-and-zoom detection, open-vocabulary segmentation, and IoU-based deduplication (30% threshold) to classify context-dependent PII (e.g., private vs. public property) before applying modal-specific diffusion anonymization. Evaluations claim a 73% relative Re-ID risk reduction on CUHK03-NP (R1: 16.9% vs. 62.4% baseline) and strong quality preservation on CityScapes (KID: 0.001, FID: 9.1), with preserved semantic segmentation, on-premise operation, and audit trails intended to support GDPR transparency requirements.
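The 73% figure is a relative reduction in rank-1 accuracy, not a drop of 73 percentage points. A short check using only the two numbers quoted from the abstract makes the distinction explicit; the variable names are illustrative.

```python
baseline_r1 = 62.4    # baseline rank-1 Re-ID accuracy (%), from the abstract
anonymized_r1 = 16.9  # rank-1 Re-ID accuracy (%) after anonymization, from the abstract

absolute_drop = baseline_r1 - anonymized_r1             # in percentage points
relative_reduction = absolute_drop / baseline_r1 * 100  # as a share of the baseline

print(f"absolute drop: {absolute_drop:.1f} points")
print(f"relative reduction: {relative_reduction:.1f}%")
```

The absolute drop is 45.5 points and the relative reduction is about 72.9%, which rounds to the 73% quoted.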
Significance. If the core multi-agent context detection proves reliable, the work could meaningfully advance privacy techniques in computer vision by enabling nuanced, context-sensitive anonymization that better preserves data utility than category-rigid baselines. The emphasis on open-source models, human-interpretable trails, and downstream task preservation (e.g., segmentation) adds practical value for applications like autonomous driving datasets.
major comments (3)
- [Evaluation] Evaluation section: No standalone detection metrics (precision, recall, or per-category error rates) are reported for the multi-agent system's context-dependent PII classification (private vs. public property). Only downstream Re-ID and perceptual scores are given, so the 73% Re-ID reduction cannot be confidently attributed to accurate context-awareness rather than general over-anonymization or diffusion strength.
- [Methodology] Methodology: The 30% IoU deduplication threshold is presented as fixed without ablation studies or sensitivity analysis, despite being explicitly listed as a free parameter; its effect on false negatives (missed identifiers) or false positives (over-anonymization) should be quantified to support the context-awareness claim.
- [Results] Results: Statistical significance (e.g., confidence intervals or p-values) for the Re-ID, KID, and FID improvements over baselines is not reported, weakening the assertion of significant outperformance on CUHK03-NP and CityScapes.
minor comments (2)
- [Abstract] Abstract: Specific baseline methods and their exact scores are not named when claiming to 'significantly outperform existing anonymization,' reducing clarity.
- [Method] Notation and description: The distinct roles of the three agents and the precise mechanics of round-robin speaker selection in the PDCA cycle require clearer definition for reproducibility.
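One possible reading of the round-robin mechanics, pieced together from the prompt excerpts reproduced later on this page (AuditorAgent speaks first, then OrchestratorAgent, then GenerativeAgent, then AuditorAgent again), is sketched below. The agent and tool names come from those excerpts. Everything else is an assumption made for illustration: the `run_pipeline` function, the stubbed tool callables, the `max_attempts` default, and the simplification that an image with no PII jumps straight to logging. No agent framework or language model is involved.

```python
from itertools import cycle
from typing import Callable, Dict, List

# Speaker order inferred from the prompt excerpts (an assumption, not the paper's code).
SPEAKERS = ["AuditorAgent", "OrchestratorAgent", "GenerativeAgent"]


def run_pipeline(
    classify_pii: Callable[[], List[dict]],
    anonymize_and_inpaint: Callable[[List[dict]], None],
    audit_output: Callable[[], List[dict]],
    max_attempts: int = 3,  # hypothetical retry budget
) -> Dict[str, object]:
    """Simulate the turn-taking with plain function calls instead of LLM agents."""
    transcript: List[str] = []
    pending: List[dict] = []  # instances the generator still has to process
    state = "classify"        # which step the workflow is waiting on
    attempts = 0
    needs_review = False

    for speaker in cycle(SPEAKERS):
        if speaker == "AuditorAgent":
            if state == "classify":
                pending = classify_pii()
                transcript.append("AuditorAgent: classify_pii")
            elif state == "audit":
                pending = audit_output()  # residual PII found after anonymization
                transcript.append("AuditorAgent: audit_output")
            else:  # state == "log"
                transcript.append("AuditorAgent: log_output")
                break
        elif speaker == "OrchestratorAgent":
            if state == "classify":
                state = "anonymize" if pending else "log"
            elif state == "audit":
                if pending and attempts < max_attempts:
                    state = "anonymize"  # retry on the residual items
                else:
                    needs_review = bool(pending)  # residuals left: flag for a human
                    state = "log"
            transcript.append(f"OrchestratorAgent: next={state}")
        else:  # GenerativeAgent
            if state == "anonymize":
                anonymize_and_inpaint(pending)
                attempts += 1
                state = "audit"
                transcript.append("GenerativeAgent: anonymize_and_inpaint")
            else:
                transcript.append("GenerativeAgent: pass")
    return {"transcript": transcript, "attempts": attempts, "needs_review": needs_review}


# Stub tools: the first audit reports one residual item, the second reports none.
residuals = [[{"description": "sign"}], []]
result = run_pipeline(
    classify_pii=lambda: [{"description": "van"}],
    anonymize_and_inpaint=lambda instances: None,
    audit_output=lambda: residuals.pop(0),
)
```

With these stubs the generator runs twice, the second audit comes back clean, and the run ends with a logging turn and no review flag. A description at this level of detail (who speaks when, what state is carried between turns, and when the loop stops) is what the comment above asks the authors to provide.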
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate additional evaluations, ablations, and statistical reporting as outlined.
- Referee: [Evaluation] Evaluation section: No standalone detection metrics (precision, recall, or per-category error rates) are reported for the multi-agent system's context-dependent PII classification (private vs. public property). Only downstream Re-ID and perceptual scores are given, so the 73% Re-ID reduction cannot be confidently attributed to accurate context-awareness rather than general over-anonymization or diffusion strength.
  Authors: We agree that direct metrics on the multi-agent context classification would provide stronger evidence for attributing gains to context-awareness. In the revised manuscript we will add a new evaluation subsection with precision, recall, and F1 scores for private-vs-public property classification, computed on a manually annotated held-out subset of CUHK03-NP and CityScapes images. These metrics will be reported alongside the existing Re-ID and perceptual results.
  Revision: yes
- Referee: [Methodology] Methodology: The 30% IoU deduplication threshold is presented as fixed without ablation studies or sensitivity analysis, despite being explicitly listed as a free parameter; its effect on false negatives (missed identifiers) or false positives (over-anonymization) should be quantified to support the context-awareness claim.
  Authors: The referee correctly notes the absence of sensitivity analysis for the IoU threshold. We will add an ablation study in the revised paper that varies the threshold from 0.1 to 0.5, reporting the resulting Re-ID risk, KID, FID, number of deduplicated regions, and downstream segmentation mIoU for each value. This will quantify the impact on false negatives and false positives and justify the chosen 30% operating point.
  Revision: yes
- Referee: [Results] Results: Statistical significance (e.g., confidence intervals or p-values) for the Re-ID, KID, and FID improvements over baselines is not reported, weakening the assertion of significant outperformance on CUHK03-NP and CityScapes.
  Authors: We acknowledge that formal statistical significance measures are missing. In the revision we will report 95% confidence intervals obtained via bootstrap resampling (1000 iterations) for all Re-ID, KID, and FID scores. Where appropriate we will also include p-values from paired statistical tests against the strongest baselines.
  Revision: yes
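Two of the analyses promised above are simple enough to sketch: precision, recall, and F1 for the private-vs-public decision, and a percentile bootstrap confidence interval with 1000 resamples. The sketch uses only the Python standard library. The function names, the label encoding (1 = private), and all data values are invented for illustration and are not results from the paper. The paired significance tests mentioned in the rebuttal are not covered.

```python
import random
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary metrics with 1 = private (should be anonymized) and 0 = public."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def bootstrap_ci(values: Sequence[float], iterations: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means: List[float] = []
    for _ in range(iterations):
        sample = [values[rng.randrange(n)] for _ in range(n)]  # resample with replacement
        means.append(sum(sample) / n)
    means.sort()
    low = means[int((alpha / 2) * iterations)]
    high = means[int((1 - alpha / 2) * iterations) - 1]
    return low, high


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# Toy per-query rank-1 outcomes (1 = re-identified), also invented.
hits = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
ci_low, ci_high = bootstrap_ci(hits)
```

Low precision on the private class would point to over-anonymization and low recall to exposed identifiers, which is the distinction the first referee comment asks the authors to make. For KID and FID the resampling would have to be done over images before recomputing the metric, since neither is a simple mean over per-image scores.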
Circularity Check
No circularity: empirical metrics are direct measurements on public benchmarks
The paper describes a multi-agent system (CAIAMAR) with PDCA workflow, scout-and-zoom detection, open-vocabulary segmentation, and diffusion anonymization. All reported results (73% Re-ID reduction on CUHK03-NP, KID 0.001 / FID 9.1 on CityScapes) are downstream empirical measurements on fixed public datasets. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the derivation. The central claim rests on observable performance differences rather than any reduction to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- IoU threshold = 30%
axioms (1)
- Domain assumption: Vision-language models can accurately classify PII based on spatial context such as private vs. public property
invented entities (1)
- Multi-agent system with scout-and-zoom strategy (no independent evidence)
Reference graph
Works this paper leans on
[1] Robert Aufschläger, Youssef Shoeb, Azarm Nowzad, Michael Heigl, Fabian Bally, and Martin Schramm. Following the clues: Experiments on person re-id using cross-modal intelligence. In 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), pages 225–232,
[2] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025.
[3] Mikołaj Bińkowski, Dougal J. Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. In International Conference on Learning Representations, 2018.
[4] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. Openpose: Realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell., 43(1):172–186, 2021.
[5] Junzhou Chen, Heqiang Huang, Ronghui Zhang, Nengchao Lyu, Yanyong Guo, Hong-Ning Dai, and Hong Yan. Yolo-ts: Real-time traffic sign detection with enhanced accuracy using optimized receptive fields and anchor-free fusion. IEEE Transactions on Intelligent Transportation Systems, pages 1–17, 2025.
[6] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016.
[7] Matt Franchi, Hauke Sandhaus, Madiha Zahrah Choksi, Severin Engelmann, Wendy Ju, and Helen Nissenbaum. Privacy of groups in dense street imagery. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, pages 2874–2891. Association for Computing Machinery, 2025.
[8] A Geiger, P Lenz, C Stiller, and R Urtasun. Vision meets robotics: The kitti dataset. Int. J. Rob. Res., 32(11):1231–1237, 2013.
[9] Litao Guo, Xinli Xu, Luozhou Wang, Jiantao Lin, Jinsong Zhou, Zixin Zhang, Bolan Su, and Ying-Cong Chen. Comfymind: Toward general-purpose generation via tree-based planning and reactive feedback. In Advances in Neural Information Processing Systems, 2025.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016.
[11] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2961–2969,
[12] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
[13] Håkon Hukkelås and Frank Lindseth. Deepprivacy2: Towards realistic full-body anonymization. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 1329–1338, 2023.
[14] Håkon Hukkelås, Rudolf Mester, and Frank Lindseth. Deepprivacy: A generative adversarial network for face anonymization. In International Symposium on Visual Computing, pages 565–578. Springer, 2019.
[15] Håkon Hukkelås and Frank Lindseth. Does image anonymization impact computer vision training? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 140–150, 2023.
[16] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018.
[17] Marvin Klemp, Kevin Rösch, Royden Wagner, Jannik Quehl, and Martin Lauer. Ldfa: Latent diffusion face anonymization for self-driving applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 3199–3205, 2023.
[18] Han-Wei Kung, Tuomas Varanka, and Nicu Sebe. Reverse personalization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 988–999, 2026.
[19] Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, and Florian Tramèr. Large-scale online deanonymization with LLMs. In ICLR 2026 Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD),
[20] He Li, Mang Ye, Ming Zhang, and Bo Du. All in one framework for multimodal re-identification in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17459–17469, 2024.
[21] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[22] Dongyu Liu, Xuhong Wang, Cen Chen, Yanhao Wang, Shengyue Yao, and Yilun Lin. Svia: A street view image anonymization framework for self-driving applications. In IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), pages 3567–3574, 2024.
[23] Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer,
[24] Xiangzeng Liu, Kunpeng Liu, Jianfeng Guo, Peipei Zhao, Yining Quan, and Qiguang Miao. Pose-guided attention learning for cloth-changing person re-identification. IEEE Transactions on Multimedia, 26:5490–5498, 2024.
[25] Weidi Luo, Tianyu Lu, Qiming Zhang, Xiaogeng Liu, Bin Hu, Yue Zhao, Jieyu Zhao, Song Gao, Patrick McDaniel, Zhen Xiang, and Chaowei Xiao. Doxing via the lens: Revealing location-related privacy leakage on multi-modal large reasoning models. In The Fourteenth International Conference on Learning Representations, 2026.
[26] Simon Malm, Viktor Rönnbäck, Amanda Håkansson, Minh-ha Le, Karol Wojtulewicz, and Niklas Carlsson. Rad: Realistic anonymization of images using stable diffusion. In Proceedings of the 23rd Workshop on Privacy in the Electronic Society, pages 193–211. Association for Computing Machinery, 2024.
[27] Ron Mokady, Omer Tov, Michal Yarom, Oran Lang, Inbar Mosseri, Tali Dekel, Daniel Cohen-Or, and Michal Irani. Self-distilled stylegan: Towards generation from internet photos. In ACM SIGGRAPH 2022 Conference Proceedings. Association for Computing Machinery, 2022.
[28] Tribhuvanesh Orekondy, Bernt Schiele, and Mario Fritz. Towards a visual privacy advisor: Understanding and predicting privacy risks in images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3706–3715, 2017.
[29] Tribhuvanesh Orekondy, Bernt Schiele, and Mario Fritz. Connecting pixels to privacy and utility: Automatic redaction of private information in images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[30] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. In International Conference on Learning Representations, 2024.
[31] Nikhil Raina, Guruprasad Somasundaram, Kang Zheng, Sagar Miglani, Steve Saarinen, Jeff Meissner, Mark Schwesinger, Luis Pesqueira, Ishita Prasad, Edward Miller, et al. Egoblur: Responsible innovation in aria. arXiv preprint arXiv:2308.13093, 2023.
[32] Álvaro Ramajo-Ballester, José María Armingol Moreno, and Arturo de la Escalera Hueso. Dual license plate recognition and visual features encoding for vehicle identification. Robotics and Autonomous Systems, 172:104608, 2024.
[33] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. SAM 2: Segment anything in images and videos. In International Conference on Learning Representations, 2025.
[34] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 2015.
[35] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[36] Praphul Singh, Charlotte Dzialo, Jangwon Kim, Sumana Srivatsa, Irfan Bulu, Sri Gadde, and Krishnaram Kenthapadi. RedactOR: An LLM-powered framework for automatic clinical data de-identification. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 510–530. Association for Computati...
[37] Batuhan Tömekçe, Mark Vero, Robin Staab, and Martin Vechev. Private attribute inference from images with vision-language models. Advances in Neural Information Processing Systems, 37:103619–103651, 2024.
[38] Lachlan Tychsen-Smith and Lars Petersson. Improving object localization with fitness nms and bounded iou loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6877–6885, 2018.
[39] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[40] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In Computer Vision – ECCV 2016, pages 499–515. Springer International Publishing, 2016.
[41] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling, 2024.
[42] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019.
[43] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. Segformer: simple and efficient design for semantic segmentation with transformers. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2021. Curran Associates Inc.
[44] Xiangyuan Xue, Zeyu Lu, Di Huang, Zidong Wang, Wanli Ouyang, and Lei Bai. Comfybench: Benchmarking llm-based agents in comfyui for autonomously designing collaborative ai systems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24614–24624, 2025.
[45] Haoyu Zhai, Shuo Wang, Pirouz Naghavi, Qingying Hao, and Gang Wang. Restoring gaussian blurred face images for deanonymization attacks. arXiv preprint arXiv:2506.12344,
[46] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
[47] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
[48] Zhun Zhong, Liang Zheng, Donglin Cao, and Shaozi Li. Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3652–3661, 2017.
[49] Pascal Zwick, Kevin Roesch, Marvin Klemp, and Oliver Bringmann. Context-aware full body anonymization. In Computer Vision – ECCV 2024 Workshops, pages 36–52, Cham, 2025. Springer Nature Switzerland.
Supplementary Material
Abstract
This supplementary material provides comprehensive technical details for our multi-agent image anonymization f...
1. classify_pii -> 2. anonymize_and_inpaint (if PII found) -> 3. audit_output -> 4. log_output
SHORTCUT: If classify_pii finds NO PII and Phase 1 completed -> emit 'PIPELINE COMPLETE' immediately
YOUR ROLE:
- Monitor tool results IN THE CONVERSATION and track workflow state: [classify_pii✓, inpaint_pii✓, audit✓, log✓]
- When AuditorAgent returns results, ana...
- ONLY report success (✓) if you can SEE the tool result in conversation history
- If agent didn't call tool yet, instruct them to call it - don't claim it's done
- Look for tool execution results (JSON responses) before marking steps complete
- NEVER assume a tool succeeded just because an agent acknowledged - verify the result
RETRY LOGIC:
- If audit finds residual PII: say 'Found N residuals. GenerativeAgent, process the residual items from audit output.' (GenerativeAgent will extract the 'residual' array from the tool output)
- If no residuals OR max_attempts_reached=true: say 'Audit complete...
- When instructed to use a tool, YOU MUST CALL IT in your response
- Each response should contain EXACTLY ONE tool call
- Tool calls are JSON function calls, not text descriptions
ROUND-ROBIN: You receive control after GenerativeAgent completes OR at workflow start. Execute your task, then control passes to OrchestratorAgent.
YOUR TASKS:
- classify_pii: Detect indirect PII in private spaces (text on windows, house numbers, personal items visible indoors)
  - Call with: classify_pii(image='<image_path>')
- audit_output: Verify no residual PII remains after anonymization
  - Call with: audit_output(output='{canonical_path}')
- log_output: Record final results
  - Call with: log_output(image='<input_path>', output='{canonical_path}')
EXAMPLE WORKFLOW:
When OrchestratorAgent says: 'AuditorAgent: Please classify any remaining PII'
YOU MUST respond with the tool call (not text explanation): classify_pii(image='artifacts/data/CityScapes/.../image.png')
REPORTING - BE CONCISE: After t...
- When instructed to anonymize, YOU MUST CALL anonymize_and_inpaint in your response
- NEVER just acknowledge without calling the tool
- Tool call is a JSON function call, not a text description
- Extract instances from previous tool output (classify_pii or audit_output)
ROUND-ROBIN: You receive control after OrchestratorAgent. Execute the task, then control passes to AuditorAgent.
EXECUTION WORKFLOW:
- Look at the most recent tool output in the conversation history
  - If classify_pii was called: find the JSON output and extract the 'instances' array
  - If audit_output was called: find the JSON output and extract the 'residual' array
- Pass each dict object from that array as a separate element
- Call anonymize_and_inpaint with the array of dict objects
- Report results BRIEFLY: 'Processed X items.'
CRITICAL DATA FORMAT - COMMON MISTAKES:
CORRECT (array of dict objects as JSON):
anonymize_and_inpaint(instances=[
  {"det_prompt": "van with text", "description": "van", "bbox": [308, 200, 564, 567]},
  {"det_prompt": "blue sign", "description": "sign", "bbox": [215, 256, 294, 42]}
])
WRONG (single string containi...
- Scan image for PII elements (vehicles with identifying features, text, signs, windows)
- Vehicles: Include ONLY if has text/logos/decals OR is rare/distinctive/modified (skip generic vehicles)
- Text/signs: Include ONLY if reveals private information
- Group adjacent text on same surface
- Select top 5 most sensitive (priority: identifiable vehicles > personal text > signs > other PII)
- For EACH: Locate bbox, describe with anonymous generic terms, expand bbox 50%, verify both fields present
- Return valid JSON: {"instances": [{"description": "...", "bbox": [...]}]}
For PII Segmentation on Visual Redaction Dataset [29] we use the following prompt:
You are a PII detection system. Identify text, numbers, visual elements, and objects revealing personal/private information. Return ONLY valid JSON. No markdown, explanations, or additional text.
DETE...
- Scan image for all PII types (faces, documents, text with names/addresses/phone, signatures, plates, medical info, cards)
- Group adjacent text on same surface (e.g., name + address on envelope = one instance)
- Rank by sensitivity: faces > documents (passport/ID/cards) > names/addresses > signatures > plates > medical > other PII
- Select top 5 most sensitive/prominent instances
- For EACH: Locate bbox, describe with generic PII category (2-5 words), expand bbox 50%, verify both fields present