Emo-llama: Enhancing facial emotion understanding with instruction tuning

Bohao Xing, Zitong Yu, Xin Liu, Kaishen Yuan, Qilang Ye, Weicheng Xie, Huanjing Yue, Jingyu Yang, Heikki Kälviäinen · 2024 · arXiv 2408.11424

3 Pith papers cite this work.

representative citing papers

ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning

cs.CV · 2026-04-10 · unverdicted · novelty 7.0 · ref 46

ActFER reformulates facial expression recognition as active, tool-augmented visual reasoning with a custom reinforcement learning algorithm, UC-GRPO, and outperforms passive MLLM baselines on action unit (AU) prediction.

FPBench: A Comprehensive Benchmark of Multimodal Large Language Models for Fingerprint Analysis

cs.CV · 2025-12-19 · conditional · novelty 6.0 · ref 58

FPBench evaluates 20 MLLMs across 8 fingerprint tasks on 7 datasets and shows that fine-tuning the vision and language encoders improves performance by 7-39%.

Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer

cs.CV · 2026-04-08 · unverdicted · novelty 5.0 · ref 93

The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining a global glance path with a local gaze path.