Pith Number

pith:Z43EXVQM

pith:2026:Z43EXVQM5WJVAK22YLY345VZPJ
not attested · not anchored · not stored · refs resolved

RHINO: Reconstructing Human Interactions with Novel Objects from Monocular Videos

Chen Guo, Chengwei Zheng, Dimitrios Tzionas, Georgios Paschalidis, Juan Zarate, Lixin Xue, Manuel Kaufmann

A three-step method reconstructs 3D humans, novel objects, and scenes together from monocular videos of interactions.

arxiv:2605.17014 v1 · 2026-05-16 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Z43EXVQM5WJVAK22YLY345VZPJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

RHINO recovers, in 3D, a human, a novel (unseen) manipulated object, and the static scene in a common world frame from a monocular RGB video.

C2 · weakest assumption

The pipeline assumes that off-the-shelf 3D-aware foundation models supply cues accurate enough to stabilize SfM on low-texture object regions, and that subtracting the estimated camera motion cleanly isolates the true object motion, leaving no residual errors that propagate into the final registration.

C3 · one-line summary

RHINO recovers a 3D human, a novel manipulated object, and the static scene from monocular video by stabilizing SfM with foundation models, separating camera motion from object motion, and refining with compositional neural SDFs plus contact priors.

References

79 extracted · 79 resolved · 0 Pith anchors

[1] SDFit: 3D object pose and shape by fitting a morphable SDF to a single image 2025
[2] BEHAVE: Dataset and method for tracking human object interactions 2022
[3] FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models 2024
[4] Back on track: Bundle adjustment for dynamic scene reconstruction 2025
[5] Holistic++ scene understanding: Single-view 3D holistic scene parsing and human pose estimation with human-object interaction and physical commonsense 2019

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:03:35.965999Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

cf364bd60ced93502b5ac2f1be76b97a5e5ce8ad4a3183aae6f073255c0659c1

Aliases

arxiv: 2605.17014 · arxiv_version: 2605.17014v1 · doi: 10.48550/arxiv.2605.17014 · pith_short_12: Z43EXVQM5WJV · pith_short_16: Z43EXVQM5WJVAK22 · pith_short_8: Z43EXVQM
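
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, and the short aliases are prefixes of that string. The sketch below checks this relationship for this record. It is inferred from the values shown here, not taken from a published specification, and the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "cf364bd60ced93502b5ac2f1be76b97a5e5ce8ad4a3183aae6f073255c0659c1"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex SHA-256 digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; it is not a documented Pith API.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)       # Z43EXVQM5WJVAK22YLY345VZPJ
print(pith_id[:8])   # Z43EXVQM          (pith_short_8)
print(pith_id[:12])  # Z43EXVQM5WJV      (pith_short_12)
print(pith_id[:16])  # Z43EXVQM5WJVAK22  (pith_short_16)
```

If a recomputed canonical hash encodes to a different prefix, the record and the identifier no longer agree, at least under this assumed scheme.
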
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Z43EXVQM5WJVAK22YLY345VZPJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: cf364bd60ced93502b5ac2f1be76b97a5e5ce8ad4a3183aae6f073255c0659c1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c828be70067aa72d278955354f68fdde4adf106f2c13e34620e22365dd60d0d1",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T14:25:42Z",
    "title_canon_sha256": "83dbab86800097e3504e2d7d56337c726de38c422cc70663cd831fd98b0c201f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17014",
    "kind": "arxiv",
    "version": 1
  }
}
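
The shell one-liner under "Verify this Pith Number yourself" can also be written as a small Python function, which is easier to reuse against a locally saved copy of the record. This is a minimal sketch that applies the same canonicalization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The function name `canonical_sha256` is hypothetical, and the digest should be expected to equal the canonical hash only if the input is the exact `canonical_record` returned by the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the canonicalization from the curl/jq one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c828be70067aa72d278955354f68fdde4adf106f2c13e34620e22365dd60d0d1",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-16T14:25:42Z",
        "title_canon_sha256": "83dbab86800097e3504e2d7d56337c726de38c422cc70663cd831fd98b0c201f",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17014", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare against the value under "Canonical hash"
```

Because keys are sorted before hashing, reordering the fields of the record does not change the digest, which is what lets independent mirrors arrive at the same value.
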