Flamingo: a visual language model for few-shot learning

2022

7 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

cs.CV · 2026-03-28 · unverdicted · novelty 7.0

RailVQA-bench supplies 21,168 QA pairs for ATO visual cognition while RailVQA-CoM combines large-model reasoning with small-model efficiency via transparent modules and temporal sampling.

$M^2$-VLA: Boosting Vision-Language Models for Generalizable Manipulation via Layer Mixture and Meta-Skills

cs.RO · 2026-04-27 · unverdicted · novelty 6.0

M²-VLA shows that generalized VLMs can serve as direct backbones for robotic manipulation by selectively extracting task-critical features via Mixture of Layers and adding Meta Skill Modules for efficient trajectory learning.

Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following

cs.CV · 2026-03-19 · unverdicted · novelty 6.0

Instruction-free tuning of LVLMs on medical image-description pairs via momentum proxy instructions and response shuffling achieves SOTA accuracy on VQA tasks across SKINCON, WBCAtt, CBIS, and MIMIC-CXR.

Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

CVA aggregates frozen VFM embeddings via latent reasoning to create compact video embeddings for efficient micro-video recommendation, delivering consistent performance gains and orders-of-magnitude efficiency improvements.

Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models

cs.LG · 2026-04-29 · unverdicted · novelty 5.0

A Meta AutoEncoder framework enables adaptive, progressive compression of visual features for low-latency edge-cloud VLM inference without model fine-tuning.

Personalized Cross-Modal Emotional Correlation Learning for Speech-Preserving Facial Expression Manipulation

cs.CV · 2026-04-28 · unverdicted · novelty 5.0

PCMECL improves speech-preserving facial expression manipulation by learning personalized prompts from individual visuals and using feature differencing to align visual and semantic changes from VLMs.

Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems

eess.SY · 2026-04-03 · unverdicted · novelty 2.0

A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.

citing papers explorer

Showing 7 of 7 citing papers.

RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation
cs.CV · 2026-03-28 · unverdicted · none · ref 8

$M^2$-VLA: Boosting Vision-Language Models for Generalizable Manipulation via Layer Mixture and Meta-Skills
cs.RO · 2026-04-27 · unverdicted · none · ref 2

Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following
cs.CV · 2026-03-19 · unverdicted · none · ref 30

Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation
cs.LG · 2026-05-09 · unverdicted · none · ref 38

Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models
cs.LG · 2026-04-29 · unverdicted · none · ref 4

Personalized Cross-Modal Emotional Correlation Learning for Speech-Preserving Facial Expression Manipulation
cs.CV · 2026-04-28 · unverdicted · none · ref 7

Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems
eess.SY · 2026-04-03 · unverdicted · none · ref 167
