MediaPipe: A Framework for Building Perception Pipelines

Mixed citation behavior. Most common role is background (60%).

47 Pith papers citing it
Background 60% of classified citations
abstract

Building applications that perceive the world around them is challenging. A developer needs to (a) select and develop corresponding machine learning algorithms and models, (b) build a series of prototypes and demos, (c) balance resource consumption against the quality of the solutions, and finally (d) identify and mitigate problematic cases. The MediaPipe framework addresses all of these challenges. A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms. We show that these features enable a developer to focus on the algorithm or model development and use MediaPipe as an environment for iteratively improving their application with results reproducible across different devices and platforms. MediaPipe will be open-sourced at https://github.com/google/mediapipe.

citation-role summary

background 3 · method 2

representative citing papers

Recognizing Co-Speech Gestures in-the-Wild

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

Introduces the first large-scale GRW dataset for semantic co-speech gesture classification, word recognition, and temporal localization in unconstrained videos, along with benchmarks for the three tasks.

CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

CHOIR reconstructs articulated hand motion, object shape with 6D pose, and contact from monocular videos via coarse initialization, generative spatial rectification, and contact-aware joint optimization.

D-Rex : Diffusion Rendering for Relightable Expressive Avatars

cs.GR · 2026-04-30 · conditional · novelty 7.0

D-Rex applies a LoRA-fine-tuned video diffusion model as an image-space post-process to add consistent relighting to any expressive full-body avatar pipeline while preserving motion and facial detail.

AvatarPointillist: AutoRegressive 4D Gaussian Avatarization

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.

The DeepSpeak Dataset

cs.CV · 2024-08-09 · unverdicted · novelty 7.0

DeepSpeak provides over 100 hours of consented, identity-matched real and modern deepfake audiovisual content focused on talking heads, with evaluations showing existing detectors fail to generalize without retraining.

MirrorPPR: Exemplar-Based Portrait Photo Retouching

cs.CV · 2026-06-28 · unverdicted · novelty 6.0

MirrorPPR extracts retouching operations from exemplar pairs via a dedicated extractor and transfers them to query images through a LoRA-adapted Diffusion Transformer, enabled by a new 47-million-pair dataset and self-augmentation for alignment.

EMOSH: Expressive Motion and Shape Disentanglement for Human Animation

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

EMOSH proposes an Expressive Human Model with disentangled parameters, coarse-to-fine motion injection, and spatially-aligned conditioning to generate high-fidelity expressive human videos without driving-subject shape leakage.

PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

PaintCopilot models painting as an open-ended autoregressive process that predicts coherent brushstrokes from partial canvas observations using a ViT target predictor, flow-matching stroke generator, and VAE region sampler.
