
arxiv: 2605.21865 · v1 · pith:CVBGFD53new · submitted 2026-05-21 · 💻 cs.CR · cs.MM · eess.IV

PEMark: Watermarking API Responses Based on Proxy Gateways and Position Encoding

Pith reviewed 2026-05-22 06:13 UTC · model grok-4.3

classification 💻 cs.CR · cs.MM · eess.IV
keywords API response watermarking · position encoding · proxy gateway · JSON key reordering · distortion-free watermark · data leakage traceability · XML permutation

The pith

A proxy gateway can embed traceable watermarks into API responses by reordering JSON or XML keys without changing any data values or business code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a watermarking approach that places a proxy gateway in front of existing API servers. The gateway applies position encoding to permute the order of keys in JSON and XML responses, creating a unique signature for each response. This signature enables tracing of leaked data back to its origin. Because the actual data values remain untouched and no changes are made to the source systems, the method preserves full compatibility with current operations. The approach targets the common problem of unregulated API data leakage by turning an overlooked redundancy in data formatting into an encoding space.

Core claim

The central claim is that a watermark proxy gateway combined with position encoding can embed traceable watermarks into API responses by reordering keys in JSON and XML structures. This exploits the inherent permutation redundancy in key ordering, which carries no semantic information, to achieve distortion-free watermarking that requires zero modification to existing business systems or data values.

What carries the argument

Position Encoding-based Watermarking (PEMark) over a proxy gateway, which encodes watermark bits by selecting specific permutations of key positions in the response.
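
One way to picture position encoding is as a mapping between an integer watermark and a permutation of keys. The sketch below uses the factorial number system (a Lehmer-code style mapping). This is an assumption made for illustration, since the abstract does not specify the paper's actual mapping, grouping, or keying. The names `embed` and `extract` are hypothetical, only the top-level keys of a JSON object are handled, and the sorted key list is assumed to be the canonical order shared by embedder and extractor.

```python
import json
from math import factorial


def embed(obj: dict, watermark: int) -> dict:
    """Return a copy of obj whose top-level key order encodes `watermark`."""
    pool = sorted(obj)  # assumed canonical order, rebuildable at extraction
    if not 0 <= watermark < factorial(len(pool)):
        raise ValueError("watermark exceeds the n! available permutations")
    ordered = []
    for i in range(len(pool), 0, -1):
        # Each factorial-base digit selects one of the remaining keys.
        digit, watermark = divmod(watermark, factorial(i - 1))
        ordered.append(pool.pop(digit))
    return {k: obj[k] for k in ordered}  # values are copied unchanged


def extract(obj: dict) -> int:
    """Recover the integer encoded by the key order of obj."""
    pool = sorted(obj)
    value = 0
    for key in obj:  # dict iteration follows the order in the response
        digit = pool.index(key)
        value += digit * factorial(len(pool) - 1)
        pool.pop(digit)
    return value


response = {"id": 7, "login": "octocat", "name": "Mona", "type": "User"}
marked = embed(response, 13)
wire = json.dumps(marked)              # what a gateway would send on
print(list(marked))                    # ['name', 'id', 'type', 'login']
print(extract(json.loads(wire)))       # 13
print(json.loads(wire) == response)    # True: same keys, same values
```

Four keys give 4! = 24 distinguishable orderings, so this toy response can carry an integer from 0 to 23. The equality check passes because Python compares dictionaries without regard to key order, which mirrors the distortion-free property the paper claims.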

If this is right

  • Returned API data remains fully traceable to its source even after distribution.
  • The watermark survives tampering and insertion attacks with 100 percent similarity.
  • Normal business operations continue without interruption because data values are never altered.
  • The scheme provides robustness against certain levels of deletion attacks.
  • Existing systems require no code changes to gain watermarking capability.
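
The robustness bullets above follow from what an order-based carrier does and does not depend on. The sketch below is a hedged illustration, not the paper's extraction algorithm: it assumes the extractor knows which keys carried the watermark (`carrier_keys`, a hypothetical parameter) and reads only their relative order. Under that assumption, changing values or inserting extra keys leaves the decoded integer intact, while deleting a carrier key destroys it. The paper reports tolerance to some deletion, presumably through grouping or redundancy that this single-group sketch does not model.

```python
from math import factorial
from typing import Optional


def embed(obj: dict, watermark: int) -> dict:
    """Reorder top-level keys so that their order encodes `watermark`."""
    pool = sorted(obj)
    if not 0 <= watermark < factorial(len(pool)):
        raise ValueError("watermark exceeds permutation capacity")
    ordered = []
    for i in range(len(pool), 0, -1):
        digit, watermark = divmod(watermark, factorial(i - 1))
        ordered.append(pool.pop(digit))
    return {k: obj[k] for k in ordered}


def extract(obj: dict, carrier_keys: set) -> Optional[int]:
    """Decode from the relative order of the carrier keys only."""
    present = [k for k in obj if k in carrier_keys]
    if len(present) != len(carrier_keys):
        return None  # a carrier key was deleted; this sketch gives up
    pool = sorted(carrier_keys)
    value = 0
    for key in present:
        digit = pool.index(key)
        value += digit * factorial(len(pool) - 1)
        pool.pop(digit)
    return value


original = {"id": 7, "login": "octocat", "name": "Mona", "type": "User"}
carrier = set(original)
marked = embed(original, 13)

# Tampering: every value is overwritten, key order is untouched.
tampered = {k: "REDACTED" for k in marked}

# Insertion: extra keys are added at the front and in the middle.
items = list(marked.items())
items.insert(2, ("injected", True))
inserted = dict([("aaa", 0)] + items)

# Deletion: one carrier key is dropped.
deleted = {k: v for k, v in marked.items() if k != "type"}

print(extract(tampered, carrier))   # 13
print(extract(inserted, carrier))   # 13
print(extract(deleted, carrier))    # None
```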

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reordering principle could apply to other structured data formats that tolerate key permutation.
  • Integration with rate-limiting or logging gateways might allow watermark embedding at scale with minimal added latency.
  • If key ordering proves stable across multiple API versions, the watermarks could support long-term audit trails for data provenance.

Load-bearing premise

Reordering the keys in JSON or XML responses carries no semantic information and will not disrupt client parsing or normal business operations.

What would settle it

Deploy the method on a production API and check whether any client application fails to parse responses or produces incorrect results after key reordering.
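
A small-scale version of that check can be run offline before any production deployment. The sketch below is illustrative only: the two response bodies and the three client functions are hypothetical stand-ins for real consumers. It compares each client's result on an original body and on a reordered body, and shows the kinds of client that would and would not notice the permutation.

```python
import hashlib
import json

# Same keys and values; only the key order differs.
original = '{"id": 7, "login": "octocat", "type": "User"}'
reordered = '{"type": "User", "id": 7, "login": "octocat"}'


def by_name(body: str):
    """Typical client: looks a field up by its key."""
    return json.loads(body)["login"]


def by_position(body: str):
    """Fragile client: assumes the first field is always the same one."""
    return next(iter(json.loads(body).values()))


def body_digest(body: str):
    """Client that hashes or signs the raw body, e.g. for caching."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


clients = {
    "by_name": by_name,
    "by_position": by_position,
    "body_digest": body_digest,
}
unaffected = {name: fn(original) == fn(reordered) for name, fn in clients.items()}
print(unaffected)
# {'by_name': True, 'by_position': False, 'body_digest': False}
```

Clients that parse the body and read fields by name are unaffected. Clients that rely on field position or on the exact bytes of the response are the cases a production test would need to look for.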

Figures

Figures reproduced from arXiv: 2605.21865 by Lansheng Han, Ming Liu, Xianjun Gu, Xinyu Dai, Yifei Zhou.

Figure 1: API data leakage risks and watermarking-based solutions.
Figure 2: Proxy gateway positioned between API server and client.
Figure 3: Overview of PEMark watermark embedding and extraction process.
Figure 4: Key position determination inside a group during reordering.
Figure 5: Watermark length L vs. capacity threshold T.
  Accompanying text captured with the caption: "1) Watermark Length vs. Capacity Threshold: A longer watermark requires more permutations to encode all possible values. The capacity threshold T is the minimum number of keys needed in a group to embed a watermark of length L bits. The condition is given by: $2^{L} \le T!$ (12)"
Figure 6: Time overhead comparison of embedding and extraction on six datasets.
Figure 7: Robustness comparison of PEMark with baseline methods under different attack types.
Figure 8: Comparison of GitHub API JSON response before (left)
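
The text captured with the Figure 5 caption states the capacity condition 2^L ≤ T!, where T is the minimum number of keys a group needs to carry an L-bit watermark. The sketch below simply evaluates that inequality. The function names `capacity_threshold` and `max_bits` are hypothetical, and the output is arithmetic on the stated condition rather than a reproduction of the paper's plotted data.

```python
from math import factorial


def capacity_threshold(bits: int) -> int:
    """Smallest T such that 2**bits <= T! (the condition in Eq. 12)."""
    t = 1
    while factorial(t) < 2 ** bits:
        t += 1
    return t


def max_bits(keys: int) -> int:
    """Largest L such that 2**L <= keys!, i.e. floor(log2(keys!))."""
    return factorial(keys).bit_length() - 1


for bits in (8, 16, 32, 64):
    print(bits, capacity_threshold(bits))
# 8 6
# 16 9
# 32 13
# 64 21

print(max_bits(6))  # 9, since 2**9 = 512 <= 720 = 6!
```

Because T! grows faster than 2^L, the number of keys needed rises slowly with watermark length: under this condition a 64-bit watermark fits in a group of 21 keys.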
Original abstract

Data leakage from API responses has drawn wide attention. APIs are often not fully regulated, making them easy to abuse. One common solution is to embed watermarks into API responses for traceability. However, existing watermarking methods often require modifying database content or API response data. This forces changes to business system code, and may even disrupt normal business operations because data values are altered. In this paper, we propose an original pluggable watermarking scheme based on a watermark proxy gateway and PEMark (Position Encoding-based Watermarking). The key novelty of our approach is exploiting the inherent permutation redundancy in the ordering of JSON/XML key-value pairs, an overlooked dimension that carries no semantic information yet provides abundant encoding capacity. First, we forward server responses to the watermark proxy gateway, a design that requires zero modification to existing business systems. Then, we embed a watermark into each API response using position encoding, which reorders keys without altering any data values. To the best of our knowledge, this is the first work to achieve distortion-free API response watermarking via position encoding over a proxy gateway. Our method does not modify any data values, so normal business operations continue seamlessly after watermark embedding. Experimental results show that our framework maintains business usability while ensuring that returned API data is traceable. Compared with current mainstream schemes, our method is robust against tampering and insertion attacks (100% similarity), and can withstand certain levels of deletion attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes PEMark, a pluggable watermarking scheme for API responses that employs a proxy gateway to embed watermarks by reordering keys in JSON/XML responses via position encoding. This requires zero modification to backend business systems or data values, claims to be the first distortion-free approach of its kind, and reports experimental robustness with 100% similarity against tampering/insertion attacks and resilience to certain deletion attacks while preserving usability and traceability.

Significance. If validated, the proxy-gateway design enabling seamless integration without code changes represents a practical strength for real-world deployment in API security. The exploitation of key-order permutation redundancy as an encoding channel is a potentially useful novelty if the semantic-neutrality assumption holds across diverse client environments. The work could advance non-intrusive traceability for data-leak prevention.

major comments (3)
  1. [Abstract] Abstract: the robustness claims rest on 'experimental results' showing '100% similarity' and resilience to 'certain levels of deletion attacks,' yet no quantitative metrics, error bars, attack models, test-case counts, or implementation specifics are supplied. This renders the central traceability and robustness assertions unverifiable from the manuscript.
  2. [Method] Method description (position encoding over proxy): the load-bearing claim of distortion-free operation assumes that reordering JSON/XML keys 'carries no semantic information' and does not affect client parsing or business logic. This is not supported by tests against order-preserving implementations (e.g., Python 3.7+ dicts, modern JS) or order-dependent applications, directly risking the 'normal business operations continue seamlessly' assertion.
  3. [Experiments] Experiments section: the manuscript provides no details on the APIs tested, number of responses, specific deletion-attack parameters, or how 'business usability' was measured, making it impossible to evaluate whether the scheme maintains the claimed properties under realistic conditions.
minor comments (2)
  1. [Introduction] The related-work discussion should explicitly compare against prior proxy-based or order-manipulation watermarking techniques to strengthen the 'first work' claim.
  2. [Method] Notation for the position-encoding mapping (how permutations encode bits) would benefit from a concrete worked example or pseudocode to improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate the revisions planned for the next manuscript version to improve verifiability and support for the core claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the robustness claims rest on 'experimental results' showing '100% similarity' and resilience to 'certain levels of deletion attacks,' yet no quantitative metrics, error bars, attack models, test-case counts, or implementation specifics are supplied. This renders the central traceability and robustness assertions unverifiable from the manuscript.

    Authors: We agree that the abstract would benefit from additional quantitative context to make the robustness claims more verifiable. In the revised manuscript we will expand the abstract to report the number of test responses evaluated, the specific deletion-attack parameters used (e.g., percentage of keys removed), and the exact similarity metric employed, while retaining the high-level summary of results. revision: yes

  2. Referee: [Method] Method description (position encoding over proxy): the load-bearing claim of distortion-free operation assumes that reordering JSON/XML keys 'carries no semantic information' and does not affect client parsing or business logic. This is not supported by tests against order-preserving implementations (e.g., Python 3.7+ dicts, modern JS) or order-dependent applications, directly risking the 'normal business operations continue seamlessly' assertion.

    Authors: The JSON specification defines an object as an unordered collection of name/value pairs, and XML attribute order is likewise not significant, so we treat key order as semantically neutral. We will nevertheless strengthen the method section by adding an explicit discussion of this assumption together with new experimental validation against order-preserving parsers (Python 3.7+ dicts and current JavaScript engines) and a small set of order-sensitive client applications to confirm that business logic remains unaffected. revision: partial

  3. Referee: [Experiments] Experiments section: the manuscript provides no details on the APIs tested, number of responses, specific deletion-attack parameters, or how 'business usability' was measured, making it impossible to evaluate whether the scheme maintains the claimed properties under realistic conditions.

    Authors: We concur that the experiments section requires substantially more detail. The revised manuscript will specify the APIs under test, the total number of responses processed, the exact deletion-attack parameters (including deletion ratios and selection method), and the concrete metrics and procedures used to assess business usability (response-time overhead, functional equivalence checks, and client compatibility). revision: yes

Circularity Check

0 steps flagged

No circularity: new construction via proxy and key reordering


The paper introduces PEMark as an original pluggable scheme that forwards responses through a watermark proxy gateway and embeds information by reordering JSON/XML keys without changing values. No equations, fitted parameters, predictions, or derivation steps are presented that reduce the central claim to prior inputs by construction. The approach is framed as a novel exploitation of permutation redundancy rather than a mathematical reduction or self-referential fit; the novelty claim and robustness statements rest on the described mechanism itself, not on self-citations or renamed prior results. The derivation chain is therefore self-contained as an engineering proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that key ordering in JSON/XML is semantically neutral and that sufficient permutation capacity exists for watermark encoding. No free parameters, invented entities, or additional axioms are specified.

axioms (1)
  • domain assumption: Reordering of keys in JSON/XML responses carries no semantic information and preserves business usability.
    Invoked implicitly when stating that the method does not disrupt normal operations.



