
arxiv: 2510.18976 · v2 · submitted 2025-10-21 · 💻 cs.CV · cs.HC

Ninja Codes: Neurally Generated Fiducial Markers for Stealthy 6-DoF Tracking

Pith reviewed 2026-05-18 04:28 UTC · model grok-4.3

classification 💻 cs.CV cs.HC
keywords: fiducial markers · neural image encoding · 6-DoF pose estimation · stealthy tracking · augmented reality · robotics vision · marker detection

The pith

Neural networks generate printable markers that blend into surroundings while supporting accurate 6-DoF tracking via RGB cameras.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to turn ordinary images into fiducial markers using a neural encoder that makes only small visual changes. These Ninja Codes can be printed on regular paper and then detected by standard RGB cameras running inference to give precise position and orientation in three dimensions. This matters because traditional markers are obvious and can spoil the appearance of a space, limiting their use in homes, offices, or artistic installations. If the approach works as described, tracking technology becomes practical in more everyday settings without the visual clutter. A reader would care if they want to add AR elements or robot navigation to real environments without obvious stickers or patterns.

Core claim

Ninja Codes are created by an encoder network that applies modest alterations to arbitrary images, allowing the printed results to provide stealthy 6-DoF location tracking when detected by RGB camera inference, while naturally blending into various real-world environmental textures under typical indoor lighting.

What carries the argument

The encoder network that modifies input images with subtle alterations to embed tracking information while maintaining visual similarity to the original texture.
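
As a rough illustration of what "subtle alterations" could mean at the pixel level, the sketch below adds a residual to an 8-bit grayscale image while capping each per-pixel change. This is not the paper's encoder, which is a trained network whose architecture is not described on this page; the function name `apply_bounded_residual`, the `max_delta` budget, and the toy residual values are all hypothetical.

```python
def apply_bounded_residual(image, residual, max_delta=8):
    """Add a residual to an 8-bit grayscale image, limiting each change to +/- max_delta.

    `image` and `residual` are equally sized lists of rows of integers.
    In a learned system the residual would come from a network; here it is given.
    """
    out = []
    for row_img, row_res in zip(image, residual):
        out_row = []
        for pixel, delta in zip(row_img, row_res):
            # Clamp the alteration so it stays within the (assumed) visual budget.
            delta = max(-max_delta, min(max_delta, delta))
            # Clamp the result to the valid 8-bit intensity range.
            out_row.append(max(0, min(255, pixel + delta)))
        out.append(out_row)
    return out


image = [[100, 250], [3, 128]]
residual = [[20, 10], [-10, -2]]
encoded = apply_bounded_residual(image, residual, max_delta=8)
print(encoded)
```

A hard cap like this only illustrates the trade-off named in the load-bearing premise: a smaller budget keeps the image closer to the original but leaves less signal for a detector to recover.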

If this is right

  • Tracking systems can operate in visually sensitive areas where standard fiducial markers would be too noticeable.
  • Applications in robotics and augmented reality gain access to reliable pose data from everyday surfaces.
  • Deployment requires only standard color printers and consumer devices with cameras and inference support.
  • Markers adapt to different backgrounds by starting from images that match the target texture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending training to more lighting conditions could allow use outdoors or in dynamic environments.
  • This method might combine with other computer vision tasks like object recognition on the same images.
  • Future work could test durability of the printed codes over time or under wear.

Load-bearing premise

The alterations made by the network are small enough to blend into many different textures and lighting conditions but large enough to allow reliable extraction of 6-DoF pose information.

What would settle it

Print Ninja Codes on multiple surface types, place them in varied indoor scenes under common lighting, and check whether detection yields correct 6-DoF estimates in most cases. A significant drop in accuracy across these conditions would count against the claim.
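
One way to make "correct 6-DoF estimates in most cases" concrete is to score each trial by rotation and translation error and report the fraction within tolerance. The sketch below assumes poses are given as 3x3 rotation matrices and 3-vectors; the helper names and the thresholds (5 degrees, 0.05 in the translation unit) are hypothetical choices, not values taken from the paper.

```python
import math


def rotation_error_deg(R_est, R_true):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_est^T R_true) equals the sum of elementwise products.
    trace = sum(R_est[i][j] * R_true[i][j] for i in range(3) for j in range(3))
    # Guard against rounding pushing the cosine slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_true):
    """Euclidean distance between estimated and true positions."""
    return math.dist(t_est, t_true)


def success_rate(trials, max_rot_deg=5.0, max_trans=0.05):
    """Fraction of (R_est, t_est, R_true, t_true) trials within both thresholds."""
    hits = sum(
        1
        for R_est, t_est, R_true, t_true in trials
        if rotation_error_deg(R_est, R_true) <= max_rot_deg
        and translation_error(t_est, t_true) <= max_trans
    )
    return hits / len(trials)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

trials = [
    (IDENTITY, (0.0, 0.0, 1.0), IDENTITY, (0.0, 0.0, 1.01)),  # small position error
    (ROT_Z_90, (0.0, 0.0, 1.0), IDENTITY, (0.0, 0.0, 1.0)),   # 90 degree rotation error
]
print(success_rate(trials))
```

Reporting the rate per surface type and lighting condition, rather than one pooled number, would show where tracking degrades.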

Figures

Figures reproduced from arXiv: 2510.18976 by Shunya Kato, Yuichiro Takeuchi, Yusuke Imoto.

Figure 1. We present Ninja Codes, inconspicuous fiducial markers that can be made to blend into various real-world environ…
Figure 2. Ninja Codes end-to-end training architecture. A total of five modules are trained simultaneously: encoder, decoder, …
Figure 3. Noise functions to simulate perturbations. Perturbations owing to the printing method/material are simulated using …
Figure 4. We employ a two-phase training process. After the …
Figure 5. The 25 digital images used to evaluate code detection …
Figure 6. Poster boards used to evaluate 6-DoF tracking per…
Figure 8. Average time from image display to code detection …
Figure 9. Ninja Codes (top) and a simple augmented reality …
Figure 10. Artifacts register more strongly for plain or light …
Figure 11. A reverse encoder that takes a Ninja Code as input …
Figure 12. Faulty color calibration results in color discontinu…
Original abstract

In this paper we describe Ninja Codes, neurally generated fiducial markers that can be made to naturally blend into various real-world environments. An encoder network converts arbitrary images into Ninja Codes by applying visually modest alterations; the resulting codes, printed and pasted onto surfaces, can provide stealthy 6-DoF location tracking for a wide range of applications including robotics and augmented reality. Ninja Codes can be printed using standard color printers on regular printing paper, and can be detected using any device equipped with a modern RGB camera and capable of running inference. Through experiments, we demonstrate Ninja Codes' ability to provide reliable location tracking under common indoor lighting conditions, while successfully concealing themselves within diverse environmental textures. We expect Ninja Codes to offer particular value in scenarios where the conspicuous appearance of conventional fiducial markers makes them undesirable for aesthetic and other reasons.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces Ninja Codes, neurally generated fiducial markers created by an encoder network that applies visually modest alterations to arbitrary images. These markers can be printed on standard paper with color printers and detected via RGB cameras on devices running inference, enabling stealthy 6-DoF pose estimation for robotics and AR. Experiments claim reliable tracking under common indoor lighting while the markers blend into diverse environmental textures.

Significance. If the central claims hold, Ninja Codes could provide a practical advance over traditional conspicuous fiducial markers by enabling aesthetically integrated tracking. This has potential value in applications where visible markers are undesirable, combining neural image synthesis with pose regression in a way that could influence future work on unobtrusive computer vision systems.

major comments (1)
  1. [Experiments] The experimental description provides no quantitative perceptual metrics (e.g., SSIM, LPIPS, or user studies) to verify that the neural alterations remain modest enough to blend naturally, nor failure-case analysis or accuracy metrics (with error bars) for 6-DoF estimation across varied textures and lighting. This directly bears on the central claim, as modest perturbations can easily eliminate the structured signals needed for reliable pose regression once printing quantization, camera noise, and real-world lighting are introduced.
minor comments (1)
  1. The abstract and introduction would benefit from a concise statement of the encoder and detector network architectures and training objectives to improve clarity for readers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address the major comment regarding the experimental section below and agree that additional quantitative analysis will strengthen the presentation of our results.

Point-by-point responses
  1. Referee: [Experiments] The experimental description provides no quantitative perceptual metrics (e.g., SSIM, LPIPS, or user studies) to verify that the neural alterations remain modest enough to blend naturally, nor failure-case analysis or accuracy metrics (with error bars) for 6-DoF estimation across varied textures and lighting. This directly bears on the central claim, as modest perturbations can easily eliminate the structured signals needed for reliable pose regression once printing quantization, camera noise, and real-world lighting are introduced.

    Authors: We agree that the current experiments would be strengthened by quantitative support for both the perceptual blending and the tracking accuracy claims. In the revised manuscript we will add SSIM and LPIPS scores computed between the original textures and the generated Ninja Codes to provide an objective measure of visual modesty. We will also include results from a user study in which participants rate the naturalness of the markers when placed in diverse indoor scenes. For 6-DoF estimation we will report mean pose errors with standard error bars across multiple textures and lighting conditions, together with a dedicated failure-case analysis that examines cases where tracking degrades due to printing artifacts, camera noise, or extreme lighting. These additions will be placed in an expanded Experiments section and will directly address the concern that modest perturbations may not survive real-world imaging conditions.

    Revision: yes
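
To give a sense of the kind of numbers requested here, the sketch below computes a simplified single-window SSIM between two intensity lists and a mean with its standard error. Standard SSIM averages the statistic over local windows, so this global variant is an assumption made for brevity; the function names and the sample values are hypothetical and do not come from the paper.

```python
import math
import statistics


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length lists of pixel intensities."""
    # Stabilising constants from the usual SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = statistics.fmean(x), statistics.fmean(y)
    var_x = statistics.pvariance(x, mu_x)
    var_y = statistics.pvariance(y, mu_y)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / len(x)
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean (sample stdev / sqrt(n))."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


original = [10, 20, 30, 40]
altered = [12, 18, 33, 39]  # toy stand-in for an encoded image
print(global_ssim(original, altered))

pose_errors = [1.0, 2.0, 3.0]  # toy per-trial errors, arbitrary unit
print(mean_and_standard_error(pose_errors))
```

LPIPS requires a pretrained network and is omitted from this sketch.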

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via neural training and empirical tests

full rationale

The paper describes an encoder network that converts arbitrary images into Ninja Codes via modest visual alterations, with the stealthy 6-DoF tracking capability shown through experiments on printed markers under indoor lighting. No load-bearing steps reduce by construction to inputs: there are no self-definitional equations, fitted parameters renamed as predictions, or self-citation chains that justify the core claim. The abstract and description indicate a standard generative ML pipeline whose outputs are validated externally rather than assumed by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review limits visibility into training details; the approach implicitly relies on standard neural network assumptions for image-to-marker conversion and detection without explicit free parameters or invented entities stated.

axioms (1)
  • Domain assumption: Neural networks can be trained to apply visually modest alterations that preserve detectability for pose estimation.
    Invoked in the description of the encoder network converting arbitrary images into Ninja Codes.

pith-pipeline@v0.9.0 · 5676 in / 1202 out tokens · 28204 ms · 2026-05-18T04:28:11.099465+00:00 · methodology


