MediaPipe: A Framework for Building Perception Pipelines

Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays · 2019 · cs.DC · arXiv 1906.08172

Mixed citation behavior. Most common role is background (60%).

47 Pith papers citing it

Background 60% of classified citations

abstract

Building applications that perceive the world around them is challenging. A developer needs to (a) select and develop corresponding machine learning algorithms and models, (b) build a series of prototypes and demos, (c) balance resource consumption against the quality of the solutions, and finally (d) identify and mitigate problematic cases. The MediaPipe framework addresses all of these challenges. A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms. We show that these features enable a developer to focus on the algorithm or model development and use MediaPipe as an environment for iteratively improving their application with results reproducible across different devices and platforms. MediaPipe will be open-sourced at https://github.com/google/mediapipe.

citation-role summary

background 3 · method 2

citation-polarity summary

background 3 · use method 2

representative citing papers

Recognizing Co-Speech Gestures in-the-Wild

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

Introduces the first large-scale GRW dataset for semantic co-speech gesture classification, word recognition, and temporal localization in unconstrained videos, along with benchmarks for the three tasks.

CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

CHOIR reconstructs articulated hand motion, object shape with 6D pose, and contact from monocular videos via coarse initialization, generative spatial rectification, and contact-aware joint optimization.

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

cs.DC · 2026-05-20 · conditional · novelty 7.0

LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

EgoEV-HandPose uses stereo event cameras and a bird's-eye-view fusion module to achieve 30.54 mm MPJPE and 86.87% gesture accuracy on a new large-scale egocentric dataset, outperforming prior RGB and event methods especially in low light and occlusion.

SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

cs.HC · 2026-05-07 · unverdicted · novelty 7.0

SIGMA-ASL is a multimodal dataset with 93,545 word-level ASL clips from Kinect RGB-D, mmWave radar, and dual IMUs, plus benchmarking protocols for single- and multi-modal recognition.

Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Tamaththul3D releases SMPL-X annotations for the Ishara-500 dataset and a decoupled reconstruction pipeline using geometric inverse kinematics that cuts hand error by up to 32% and runs 32x faster while generalizing across sign languages.

D-Rex : Diffusion Rendering for Relightable Expressive Avatars

cs.GR · 2026-04-30 · conditional · novelty 7.0

D-Rex applies a LoRA-fine-tuned video diffusion model as an image-space post-process to add consistent relighting to any expressive full-body avatar pipeline while preserving motion and facial detail.

Intervention-Based Self-Supervised Learning: A Causal Probe Paradigm for Remote Photoplethysmography

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

A new intervention-based SSL paradigm for rPPG uses video editing and falsifiability checks to learn the true physiological signal instead of dominant artifacts.

AvatarPointillist: AutoRegressive 4D Gaussian Avatarization

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.

The DeepSpeak Dataset

cs.CV · 2024-08-09 · unverdicted · novelty 7.0

DeepSpeak provides over 100 hours of consented, identity-matched real and modern deepfake audiovisual content focused on talking heads, with evaluations showing existing detectors fail to generalize without retraining.

MirrorPPR: Exemplar-Based Portrait Photo Retouching

cs.CV · 2026-06-28 · unverdicted · novelty 6.0

MirrorPPR extracts retouching operations from exemplar pairs via a dedicated extractor and transfers them to query images through a LoRA-adapted Diffusion Transformer, enabled by a new 47-million-pair dataset and self-augmentation for alignment.

Monocular Avatar Reconstruction via Cascaded Diffusion Priors and UV-Space Differentiable Shading

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

A cascaded LoRA diffusion pipeline in UV space with cross-intrinsic attention and differentiable BRDF shading produces 4K PBR avatar assets from single images after training on under 100 scans.

EMOSH: Expressive Motion and Shape Disentanglement for Human Animation

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

EMOSH proposes an Expressive Human Model with disentangled parameters, coarse-to-fine motion injection, and spatially-aligned conditioning to generate high-fidelity expressive human videos without driving-subject shape leakage.

Bengal-HP_RU: A Dataset of Bengal People For Head Pose Estimation

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

Bengal-HP_RU is the first publicly available head pose dataset for Bengali subjects, with 12,894 images collected from Wikimedia Commons and partitioned by uploader identity.

Mining Multi-Modality Spatio-Temporal Cues for Video Important Person Identification

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

Introduces VIP identification task, releases Temporal-VIP dataset, and presents VIP-Net framework that achieves 67.3% accuracy on identifying important persons in videos while providing rationale similarity of 0.63.

CogPortrait: Fine-Grained Eye-Region Control in Portrait Animation via Hierarchical Agent Planning

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

CogPortrait uses MLLM-based hierarchical planning to convert high-level labels into eye keypoints and a conditioned DiT model to produce portrait animations with improved eye-region accuracy on the new EMH benchmark.

PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

PaintCopilot models painting as an open-ended autoregressive process that predicts coherent brushstrokes from partial canvas observations using a ViT target predictor, flow-matching stroke generator, and VAE region sampler.

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

SignVerse-2M provides a 2-million-clip multilingual pose-native dataset for sign language derived from public videos via DWPose preprocessing to enable robust modeling in real-world conditions.

FaceValue: Exploring Real-Time Self-View Overlays to Prompt Meaning-Oriented Self-Awareness in Remote Meetings

cs.HC · 2026-04-30 · unverdicted · novelty 6.0

A technology probe called FaceValue uses real-time self-view overlays to support meaning-oriented self-awareness in remote meetings, with participants reporting increased cue awareness and communication improvements.

FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

A multimodal CNN on 87,547 Vogue images classifies fashion houses at 78.2% top-1 accuracy, decades at 88.6%, and years at 58.3% with 2.2-year mean error, and shows texture and luminance carry most of the house-identity signal.

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

CoInteract adds a human-aware mixture-of-experts and spatially-structured co-generation to a diffusion transformer to synthesize videos with stable structures and physically plausible human-object contacts.

AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

AIFIND stabilizes incremental face forgery detection by aligning volatile features to invariant semantic anchors from low-level artifacts using attention and harmonization modules.

Bootstrapping Sign Language Annotations with Sign Language Models

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

A pseudo-annotation pipeline combines fingerspelling and isolated sign recognizers with K-Shot LLM estimation to produce ranked time-aligned gloss annotations from signed video and English input.

A Synthetic Eye Movement Dataset for Script Reading Detection: Real Trajectory Replay on a 3D Simulator

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

A replay pipeline on a 3D eye simulator generates 144 sessions of synthetic eye movement video that preserves source temporal dynamics for script-reading detection.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

cs.DC · 2026-05-20 · conditional · none · ref 41 · internal anchor

LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.
