hub

Rtmpose: Real-time multi-person pose estimation based on mmpose

· 2023 · arXiv 2303.07399

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

method 2 background 1

citation-polarity summary

use method 2 background 1

representative citing papers

iMiGUE-3K: A Large-Scale Benchmark for Micro-Gesture Analysis with Self-Supervised Learning

cs.CV · 2026-05-16 · unverdicted · novelty 8.0

iMiGUE-3K is the largest in-the-wild micro-gesture video dataset with 3.4K clips and 37M frames from real interviews, supporting self-supervised foundation models and benchmarks that show micro-gestures improve emotion understanding.

PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

PoseBridge recovers semantic information lost during skeletonization by extracting pose-anchored cues from human pose estimation and transferring them via skeleton-conditioned bridging and semantic prototype adaptation, yielding 13.3-17.4 point gains on the Kinetics PURLS benchmark.

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

BarbieGait: An Identity-Consistent Synthetic Human Dataset with Versatile Cloth-Changing for Gait Recognition

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

BarbieGait is a new synthetic gait dataset with identity-consistent cloth changes paired with the GaitCLIF model that improves cross-clothing recognition on the new data and existing benchmarks.

Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

The paper presents a multimodal framework, dataset, and reconstruction pipeline to create immersive volumetric videos supporting large 6-DoF audiovisual interaction from real multi-view captures.

LA-Sign: Looped Transformers with Geometry-aware Alignment for Skeleton-based Sign Language Recognition

cs.CV · 2026-03-30 · unverdicted · novelty 7.0

LA-Sign achieves state-of-the-art skeleton-based sign language recognition on WLASL and MSASL by using recurrent looped transformers with adaptive hyperbolic geometry alignment.

Emotion Recognition in Sign Language Conversation

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

Introduces the eJSL Dialog dataset (1,920 videos in 480 dialogues from STUDIES corpus) for conversational sign language emotion recognition and benchmarks models revealing a domain gap with generic multimodal approaches.

High-Fidelity Single-Image Head Modeling with Industry-Grade Topology

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning

cs.CV · 2025-09-24 · unverdicted · novelty 6.0

EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.

OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

cs.CV · 2026-05-21 · accept · novelty 5.0

The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.

Inferring World Belief States in Dynamic Real-World Environments

cs.RO · 2026-04-13 · unverdicted · novelty 5.0

A robot infers human world belief states from observations in dynamic 3D household environments to enable fluent human-robot teamwork.

Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation

cs.CV · 2025-09-12 · unverdicted · novelty 5.0

Extends online 2D multi-camera tracking to 3D via depth-based point cloud reconstruction, clustering for 3D boxes, and local ID consistency for global data association, placing 3rd on 2025 AI City Challenge 3D MTMC dataset.

World Simulation with Video Foundation Models for Physical AI

cs.CV · 2025-10-28 · unverdicted · novelty 4.0

Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

cs.CV · 2026-05-20

DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos

cs.CV · 2026-05-18

citing papers explorer

Showing 3 of 3 citing papers after filters.

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning cs.CV · 2025-09-24 · unverdicted · none · ref 10
EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.
Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation cs.CV · 2025-09-12 · unverdicted · none · ref 16
Extends online 2D multi-camera tracking to 3D via depth-based point cloud reconstruction, clustering for 3D boxes, and local ID consistency for global data association, placing 3rd on 2025 AI City Challenge 3D MTMC dataset.
World Simulation with Video Foundation Models for Physical AI cs.CV · 2025-10-28 · unverdicted · none · ref 37
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.

Rtmpose: Real-time multi-person pose estimation based on mmpose

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer