pith. sign in

arxiv: 2605.00566 · v1 · submitted 2026-05-01 · 💻 cs.DS

Set Parameterized Matching via Multi-Layer Hashing

Pith reviewed 2026-05-09 18:47 UTC · model grok-4.3

classification 💻 cs.DS
keywords set parameterized matchingparameterized string matchingKarp-Rabin fingerprintingrandomized algorithmsstring algorithmshashing schemes
0
0 comments X

The pith

A three-layer hashing scheme solves set parameterized matching in linear time with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces set parameterized matching, where both pattern and text consist of sets of characters instead of single symbols, and two such strings match if there is a bijection between their alphabets that works set-wise. It presents a randomized algorithm that runs in O(N + M) time with high probability by using a novel three-layer hashing construction based on Karp-Rabin fingerprinting. This approach directly tackles the representation blowup from sets, the need to compare sets rather than individual symbols, and the fact that text substring encodings must be updated dynamically while scanning. A sympathetic reader cares because the result restores linear-time efficiency for a strictly more general matching problem than classical parameterized matching.

Core claim

The authors present a randomized algorithm that decides whether a pattern set-string occurs in a text set-string in O(N + M) time with high probability. The algorithm relies on a three-layer hashing scheme built on Karp-Rabin fingerprints that simultaneously handles the size blowup of set representations, performs set-to-set comparisons, and maintains consistent encodings of sliding text windows.

What carries the argument

A three-layer hashing scheme based on Karp-Rabin fingerprinting that resolves size blowup in set representations, enables set-to-set matching, and maintains dynamic encodings of text substrings.

If this is right

  • Set parameterized matching can be decided in the same asymptotic time as ordinary string matching.
  • The same three-layer construction can be reused for other matching problems whose symbols are small sets rather than single characters.
  • High-probability linear-time algorithms become available for parameterized variants that were previously believed to require super-linear resources due to representation size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may extend to approximate versions of set parameterized matching by relaxing the exact bijection requirement inside the hash layers.
  • Similar multi-layer fingerprinting could be applied to other string problems that combine combinatorial matching with dynamic symbol aggregation.

Load-bearing premise

The three-layer hashing scheme correctly resolves set-to-set matching and dynamic substring encodings while keeping the collision probability low enough to support the high-probability guarantee.

What would settle it

An input where the algorithm outputs a match but no bijection between the alphabets exists, or a family of inputs on which the collision probability exceeds any inverse polynomial in the input size.

read the original abstract

We study the "set parameterized matching" problem, a generalization of the classical parameterized matching problem introduced by Baker. In set parameterized matching, both the pattern and text are sequences where each position contains a set of characters rather than a single character. Two set-strings parameterized match if there exists a bijection between their alphabets that maps one to the other set-wise. Boussidan introduced this problem for the case of equal-length set-strings. We present a randomized algorithm running in $O(N + M)$ time with high probability, where $N$ is the text size and $M$ is the pattern size. Our approach employs a novel three-layer hashing scheme based on Karp-Rabin fingerprinting that addresses the challenges of (1) the size blowup in representations of the problem, (2) set-to-set matching, and (3) the dynamic nature of encodings of text substrings during pattern scanning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper studies the set parameterized matching problem, a generalization of Baker's parameterized matching where both pattern and text are sequences of sets rather than single characters. Two such set-strings match if there exists a bijection on their alphabets that maps one to the other set-wise. Building on Boussidan's work for equal-length cases, the manuscript claims a randomized algorithm that solves the problem in O(N + M) time with high probability, where N is text length and M is pattern length. The approach relies on a novel three-layer hashing scheme extending Karp-Rabin fingerprinting to simultaneously handle size blowup in set representations, correct set-to-set matching under bijections, and dynamic re-encoding of sliding text windows.

Significance. If the claimed linear-time high-probability bound holds, the result would be a notable advance in string algorithms, extending efficient parameterized matching to set-valued alphabets without incurring cardinality-dependent overheads. The multi-layer hashing construction, if shown to support O(1) amortized set operations and sufficiently low composite collision probability, could serve as a template for other dynamic matching problems involving composite symbols.

major comments (1)
  1. [Abstract] Abstract and technical sections: The O(N+M) high-probability claim rests on the three-layer scheme simultaneously delivering constant-time set-to-set comparisons (independent of set cardinality) and dynamic window updates while keeping the union-bound failure probability over all N-M+1 windows and all possible bijections at 1/poly(N). No layer definitions, hash-independence assumptions, or explicit collision-probability calculation appear in the provided description, leaving open whether set hashing introduces hidden logarithmic factors or whether dynamic re-encodings create dependencies that invalidate the bound.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and for highlighting the need for greater clarity on the technical foundations of our three-layer hashing scheme. We address the major comment below and will revise the manuscript accordingly to improve exposition without altering the core claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract and technical sections: The O(N+M) high-probability claim rests on the three-layer scheme simultaneously delivering constant-time set-to-set comparisons (independent of set cardinality) and dynamic window updates while keeping the union-bound failure probability over all N-M+1 windows and all possible bijections at 1/poly(N). No layer definitions, hash-independence assumptions, or explicit collision-probability calculation appear in the provided description, leaving open whether set hashing introduces hidden logarithmic factors or whether dynamic re-encodings create dependencies that invalidate the bound.

    Authors: The three-layer hashing scheme is fully defined in Section 3 of the manuscript. Layer 1 applies element-wise Karp-Rabin fingerprints using a pairwise-independent hash family (explicitly stated in Section 3.2 under standard assumptions). Layer 2 aggregates these into fixed-size set fingerprints via XOR or polynomial hashing to achieve O(1) set-to-set comparison time independent of cardinality. Layer 3 manages dynamic re-encoding of sliding windows by maintaining an invertible mapping table updated in amortized O(1) time per step. The collision probability analysis appears in the proof of Theorem 4.1: each window has failure probability at most 1/N^3 (accounting for all possible bijections via the third layer), so the union bound over N-M+1 windows yields 1/poly(N) overall. No hidden logarithmic factors arise because all fingerprints are of constant size and updates use precomputed tables. Dynamic re-encodings preserve independence across windows under the random hash choices. To address the referee's concern about the abstract, we will add a concise paragraph in the introduction summarizing the layer definitions, independence assumptions, and probability bound. revision: partial

Circularity Check

0 steps flagged

No circularity: new algorithmic construction with independent hashing layers

full rationale

The paper presents a novel three-layer hashing scheme based on Karp-Rabin fingerprinting as a direct construction to achieve O(N+M) time whp for set parameterized matching. No equations, fitted parameters, self-citations, or ansatzes are invoked that reduce the claimed running time or correctness to the inputs by definition. The derivation chain consists of an explicit algorithmic design addressing size blowup, set-to-set bijections, and dynamic encodings, without self-referential definitions or load-bearing prior results from the same authors that would force the outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on standard properties of randomized hashing and the correctness of the novel multi-layer construction; no free parameters or new entities are introduced.

axioms (1)
  • standard math Karp-Rabin fingerprinting provides collision-resistant hashes with high probability under standard assumptions on prime moduli and random choices.
    Invoked to support the high-probability linear-time bound for the three-layer scheme.

pith-pipeline@v0.9.0 · 5447 in / 1200 out tokens · 35091 ms · 2026-05-09T18:47:16.851126+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

  1. [1]

    Muthukrishnan

    Amihood Amir, Martin Farach, and S. Muthukrishnan. Alphabet dependence in paramet- erized matching.Information Processing Letters, 49(3):111–115, 1994

  2. [2]

    Parameterized matching on non-linear structures

    Amihood Amir and Gonzalo Navarro. Parameterized matching on non-linear structures. Information Processing Letters, 109(15):864–867, 2009

  3. [3]

    Erdős, and Alpár Jüttner

    Alberto Apostolico, Péter L. Erdős, and Alpár Jüttner. Parameterized searching with mismatches for run-length encoded strings.Theoretical Computer Science, 454:23–29, 2012

  4. [4]

    Erdős, and Moshe Lewenstein

    Alberto Apostolico, Péter L. Erdős, and Moshe Lewenstein. Parameterized matching with mismatches.Journal of Discrete Algorithms, 5(1):135–140, 2007

  5. [5]

    Periodicity and repetitions in parameterized strings.Discrete Applied Mathematics, 156(9):1389–1398, 2008

    Alberto Apostolico and Raffaele Giancarlo. Periodicity and repetitions in parameterized strings.Discrete Applied Mathematics, 156(9):1389–1398, 2008

  6. [6]

    Brenda S. Baker. A program for identifying duplicated code. InComputing Science and Statistics: Proceedings of the 24th Symposium on the Interface, 1992

  7. [7]

    Atheoryofparameterizedpatternmatching: Algorithmsandapplications

    BrendaS.Baker. Atheoryofparameterizedpatternmatching: Algorithmsandapplications. InProceedings of the 25th Annual ACM Symposium on Theory of Computing (STOC),pages 71–80, 1993. 13

  8. [8]

    Brenda S. Baker. Parameterized duplication in strings: Algorithms and an application to software maintenance.SIAM Journal on Computing, 26(5):1343–1362, 1997

  9. [9]

    Brenda S. Baker. Parameterized diff. InProceedings of the 10th Annual ACM-SIAM Sym- posium on Discrete Algorithms (SODA), pages 854–855. SIAM, 1999

  10. [10]

    Richard Beal and Donald A. Adjeroh. Compressed parameterized pattern matching.Theor. Comput. Sci., 609:129–142, 2016

  11. [11]

    On distances between words with parameters

    Pierre Bourhis, Aaron Boussidan, and Philippe Gambette. On distances between words with parameters. In34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, volume 259 ofLIPIcs, pages 6:1–6:23, 2023

  12. [12]

    Phd dissertation, Gustave Eiffel University, 2025

    Aaron Boussidan.Comparaison automatique de pièces de théâtre à l’aide de distances d’édition et d’alignement. Phd dissertation, Gustave Eiffel University, 2025

  13. [13]

    Two-dimensional para- meterized matching.ACM Transactions on Algorithms, 11(2):12, 2014

    Richard Cole, Carmit Hazay, Moshe Lewenstein, and Dekel Tsur. Two-dimensional para- meterized matching.ACM Transactions on Algorithms, 11(2):12, 2014

  14. [14]

    Deguchi, F

    S. Deguchi, F. Higashijima, H. Bannai, S. Inenaga, and M. Takeda. Parameterized suffix arrays for binary strings. InProceedings of the Prague Stringology Conference, pages 84–94, 2008

  15. [15]

    Thankachan, and Y

    Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Y. Yang. Space-efficient dictionaries for parameterized and order-preserving pattern matching. InLIPIcs-Leibniz International Proceedings in Informatics, volume 54. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016

  16. [16]

    Thankachan

    Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. InProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 397–407. SIAM, 2017

  17. [17]

    R. Garg, R. Prasad, and S. Agarwal. Fast parameterized word matching on compressed text. InInternational Conference on Computer and Communication Technology (ICCCT), pages 317–321. IEEE, 2014

  18. [18]

    Approximate parameterized matching

    Carmit Hazay, Moshe Lewenstein, and Dina Sokol. Approximate parameterized matching. ACM Transactions on Algorithms, 3(3):29, 2007

  19. [19]

    Karp and Michael O

    Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249–260, 1987

  20. [20]

    On the longest common paramet- erized subsequence.Theoretical Computer Science, 410(51):5347–5353, 2009

    Orgad Keller, Tsvi Kopelowitz, and Moshe Lewenstein. On the longest common paramet- erized subsequence.Theoretical Computer Science, 410(51):5347–5353, 2009

  21. [21]

    Khetan, S

    R. Khetan, S. Agarwal, and R. Prasad. An efficient approach towards compressed para- meterized word matching using wavelet tree.Journal of Information and Optimization Sciences, 37(2):285–301, 2016

  22. [22]

    I. Lee, J. Mendivelso, and Yoan J. Pinzón.δγ-parameterized matching. InString Processing and Information Retrieval (SPIRE), volume 5280 ofLNCS, pages 236–248, 2008

  23. [23]

    Parameterized matching

    Moshe Lewenstein. Parameterized matching. InEncyclopedia of Algorithms, pages 635–638. Springer, 2008

  24. [24]

    Thankachan, and Yoan J

    Juan Mendivelso, Sharma V. Thankachan, and Yoan J. Pinzón. A brief history of paramet- erized matching problems.Discrete Applied Mathematics, 274:103–115, 2020. 14

  25. [25]

    Generalization of a suffix tree for RNA structural pattern matching.Al- gorithmica, 39(1):1–19, 2004

    Tetsuo Shibuya. Generalization of a suffix tree for RNA structural pattern matching.Al- gorithmica, 39(1):1–19, 2004. 15 A Proof of Correctness for the Offset Sets The following is the proof of Lemma 5. Proof.Direction (⇒):SupposeS 1 ≈p S2. There exists a bijectionπ: Σ→Σsuch that π(S1[i]) =S 2[i]for alli. This implies that for everyσ, the absolute positio...

  26. [26]

    They have the sameOffset SetO(relative spacing)

  27. [27]

    LetN S(s, O)denote the number of distinct characters in stringSthat start at positions and have offset setO

    They have the sameStart Positions(absolute placement). LetN S(s, O)denote the number of distinct characters in stringSthat start at positions and have offset setO. To prove a bijection exists, it is sufficient to show that for all positions s∈[1..m]and all offset setsO: N S1(s, O) =N S2(s, O) If this equality holds, we can map thekcharacters of type(s, O)...

  28. [28]

    Global Shift

    OffSet(σ) =O. 2.σappears ati. This happens ifσstarted at some positions≤isuch that(i−s)∈O. We can express the total count at positionias the sum of characters startingnow(s=i) and characters that startedearlier(s < i) and are re-appearing: Count S(i, O) =N S(i, O)| {z } Starts ati + X δ∈O,δ>0 N S(i−δ, O)| {z } Started ati−δ Rearranging to solve for the cu...