arXiv preprint arXiv:2307.12980

A systematic survey of prompt engineering on vision-language foundation models · 2023 · arXiv 2307.12980

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Does Your VFM Speak Plant? The Botanical Grammar of Vision Foundation Models for Object Detection

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

Optimized prompts for vision foundation models improve cowpea detection accuracy by over 0.35 mAP on synthetic data and transfer effectively to real fields without manual annotations.

Are vision-language models ready to zero-shot replace supervised classification models in agriculture?

cs.CV · 2025-12-17 · unverdicted · novelty 4.0

Zero-shot VLMs reach at most 62% accuracy on agricultural classification tasks, while supervised models such as YOLO11 perform markedly higher, indicating that zero-shot VLMs are not yet ready to replace task-specific systems.

A Survey of Personalized Federated Foundation Models for Privacy-Preserving Recommendation

cs.LG · 2025-06-13 · unverdicted · novelty 3.0

A survey of personalization techniques and foundation model adaptations in federated settings for privacy-preserving recommendations, emphasizing their architectural intersection.

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

cs.CL · 2026-05-16 · unverdicted · novelty 2.0

A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

citing papers explorer

Showing 5 of 5 citing papers.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · none · ref 176

Does Your VFM Speak Plant? The Botanical Grammar of Vision Foundation Models for Object Detection

cs.CV · 2026-04-10 · unverdicted · none · ref 8

Are vision-language models ready to zero-shot replace supervised classification models in agriculture?

cs.CV · 2025-12-17 · unverdicted · none · ref 41

A Survey of Personalized Federated Foundation Models for Privacy-Preserving Recommendation

cs.LG · 2025-06-13 · unverdicted · none · ref 24

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

cs.CL · 2026-05-16 · unverdicted · none · ref 29
