Set Parameterized Matching via Multi-Layer Hashing

Ely Porat; Moshe Lewenstein

arxiv: 2605.00566 · v1 · submitted 2026-05-01 · 💻 cs.DS

Set Parameterized Matching via Multi-Layer Hashing

Moshe Lewenstein , Ely Porat This is my paper

Pith reviewed 2026-05-09 18:47 UTC · model grok-4.3

classification 💻 cs.DS

keywords set parameterized matchingparameterized string matchingKarp-Rabin fingerprintingrandomized algorithmsstring algorithmshashing schemes

0 comments

The pith

A three-layer hashing scheme solves set parameterized matching in linear time with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces set parameterized matching, where both pattern and text consist of sets of characters instead of single symbols, and two such strings match if there is a bijection between their alphabets that works set-wise. It presents a randomized algorithm that runs in O(N + M) time with high probability by using a novel three-layer hashing construction based on Karp-Rabin fingerprinting. This approach directly tackles the representation blowup from sets, the need to compare sets rather than individual symbols, and the fact that text substring encodings must be updated dynamically while scanning. A sympathetic reader cares because the result restores linear-time efficiency for a strictly more general matching problem than classical parameterized matching.

Core claim

The authors present a randomized algorithm that decides whether a pattern set-string occurs in a text set-string in O(N + M) time with high probability. The algorithm relies on a three-layer hashing scheme built on Karp-Rabin fingerprints that simultaneously handles the size blowup of set representations, performs set-to-set comparisons, and maintains consistent encodings of sliding text windows.

What carries the argument

A three-layer hashing scheme based on Karp-Rabin fingerprinting that resolves size blowup in set representations, enables set-to-set matching, and maintains dynamic encodings of text substrings.

If this is right

Set parameterized matching can be decided in the same asymptotic time as ordinary string matching.
The same three-layer construction can be reused for other matching problems whose symbols are small sets rather than single characters.
High-probability linear-time algorithms become available for parameterized variants that were previously believed to require super-linear resources due to representation size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique may extend to approximate versions of set parameterized matching by relaxing the exact bijection requirement inside the hash layers.
Similar multi-layer fingerprinting could be applied to other string problems that combine combinatorial matching with dynamic symbol aggregation.

Load-bearing premise

The three-layer hashing scheme correctly resolves set-to-set matching and dynamic substring encodings while keeping the collision probability low enough to support the high-probability guarantee.

What would settle it

An input where the algorithm outputs a match but no bijection between the alphabets exists, or a family of inputs on which the collision probability exceeds any inverse polynomial in the input size.

read the original abstract

We study the "set parameterized matching" problem, a generalization of the classical parameterized matching problem introduced by Baker. In set parameterized matching, both the pattern and text are sequences where each position contains a set of characters rather than a single character. Two set-strings parameterized match if there exists a bijection between their alphabets that maps one to the other set-wise. Boussidan introduced this problem for the case of equal-length set-strings. We present a randomized algorithm running in $O(N + M)$ time with high probability, where $N$ is the text size and $M$ is the pattern size. Our approach employs a novel three-layer hashing scheme based on Karp-Rabin fingerprinting that addresses the challenges of (1) the size blowup in representations of the problem, (2) set-to-set matching, and (3) the dynamic nature of encodings of text substrings during pattern scanning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims an O(N+M) randomized algorithm for set parameterized matching via a new three-layer Karp-Rabin scheme, but the abstract supplies no analysis or details to support the high-probability bound.

read the letter

The core claim is a linear-time randomized algorithm for set parameterized matching that extends Baker's parameterized matching and Boussidan's equal-length version to general-length set-strings. It uses a three-layer hashing construction to handle set representations without size blowup, set-to-set bijection matching, and dynamic re-encoding of text windows as the pattern slides. That combination is the actual novelty here; prior fingerprinting work does not directly address all three issues at once in this setting. If the layers deliver constant-time operations per window with the stated collision probability, the result would be useful for sequence analysis tasks where positions carry multiple symbols. The abstract correctly flags the three technical challenges and positions the hashing scheme as the solution. The main weakness is the complete absence of any supporting material. There is no layer definition, no independence argument for the hashes, no union-bound calculation over the N-M+1 windows, and no discussion of how set cardinality affects the per-operation cost. The high-probability guarantee therefore cannot be checked from what is given. The stress-test concern about hidden logarithmic factors or dependent hashes is reasonable given the lack of evidence; nothing in the abstract rules it out or confirms it. This work is aimed at researchers in string algorithms and parameterized matching who already know the classical results. A reader looking for a concrete, verifiable extension could extract the problem statement and the high-level approach, but would need the full proofs to decide whether to build on it. The paper deserves a serious referee to examine the technical details rather than a desk rejection, because the claimed result is non-routine if the analysis holds.

Referee Report

1 major / 0 minor

Summary. The paper studies the set parameterized matching problem, a generalization of Baker's parameterized matching where both pattern and text are sequences of sets rather than single characters. Two such set-strings match if there exists a bijection on their alphabets that maps one to the other set-wise. Building on Boussidan's work for equal-length cases, the manuscript claims a randomized algorithm that solves the problem in O(N + M) time with high probability, where N is text length and M is pattern length. The approach relies on a novel three-layer hashing scheme extending Karp-Rabin fingerprinting to simultaneously handle size blowup in set representations, correct set-to-set matching under bijections, and dynamic re-encoding of sliding text windows.

Significance. If the claimed linear-time high-probability bound holds, the result would be a notable advance in string algorithms, extending efficient parameterized matching to set-valued alphabets without incurring cardinality-dependent overheads. The multi-layer hashing construction, if shown to support O(1) amortized set operations and sufficiently low composite collision probability, could serve as a template for other dynamic matching problems involving composite symbols.

major comments (1)

[Abstract] Abstract and technical sections: The O(N+M) high-probability claim rests on the three-layer scheme simultaneously delivering constant-time set-to-set comparisons (independent of set cardinality) and dynamic window updates while keeping the union-bound failure probability over all N-M+1 windows and all possible bijections at 1/poly(N). No layer definitions, hash-independence assumptions, or explicit collision-probability calculation appear in the provided description, leaving open whether set hashing introduces hidden logarithmic factors or whether dynamic re-encodings create dependencies that invalidate the bound.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and for highlighting the need for greater clarity on the technical foundations of our three-layer hashing scheme. We address the major comment below and will revise the manuscript accordingly to improve exposition without altering the core claims.

read point-by-point responses

Referee: [Abstract] Abstract and technical sections: The O(N+M) high-probability claim rests on the three-layer scheme simultaneously delivering constant-time set-to-set comparisons (independent of set cardinality) and dynamic window updates while keeping the union-bound failure probability over all N-M+1 windows and all possible bijections at 1/poly(N). No layer definitions, hash-independence assumptions, or explicit collision-probability calculation appear in the provided description, leaving open whether set hashing introduces hidden logarithmic factors or whether dynamic re-encodings create dependencies that invalidate the bound.

Authors: The three-layer hashing scheme is fully defined in Section 3 of the manuscript. Layer 1 applies element-wise Karp-Rabin fingerprints using a pairwise-independent hash family (explicitly stated in Section 3.2 under standard assumptions). Layer 2 aggregates these into fixed-size set fingerprints via XOR or polynomial hashing to achieve O(1) set-to-set comparison time independent of cardinality. Layer 3 manages dynamic re-encoding of sliding windows by maintaining an invertible mapping table updated in amortized O(1) time per step. The collision probability analysis appears in the proof of Theorem 4.1: each window has failure probability at most 1/N^3 (accounting for all possible bijections via the third layer), so the union bound over N-M+1 windows yields 1/poly(N) overall. No hidden logarithmic factors arise because all fingerprints are of constant size and updates use precomputed tables. Dynamic re-encodings preserve independence across windows under the random hash choices. To address the referee's concern about the abstract, we will add a concise paragraph in the introduction summarizing the layer definitions, independence assumptions, and probability bound. revision: partial

Circularity Check

0 steps flagged

No circularity: new algorithmic construction with independent hashing layers

full rationale

The paper presents a novel three-layer hashing scheme based on Karp-Rabin fingerprinting as a direct construction to achieve O(N+M) time whp for set parameterized matching. No equations, fitted parameters, self-citations, or ansatzes are invoked that reduce the claimed running time or correctness to the inputs by definition. The derivation chain consists of an explicit algorithmic design addressing size blowup, set-to-set bijections, and dynamic encodings, without self-referential definitions or load-bearing prior results from the same authors that would force the outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on standard properties of randomized hashing and the correctness of the novel multi-layer construction; no free parameters or new entities are introduced.

axioms (1)

standard math Karp-Rabin fingerprinting provides collision-resistant hashes with high probability under standard assumptions on prime moduli and random choices.
Invoked to support the high-probability linear-time bound for the three-layer scheme.

pith-pipeline@v0.9.0 · 5447 in / 1200 out tokens · 35091 ms · 2026-05-09T18:47:16.851126+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1]

Muthukrishnan

Amihood Amir, Martin Farach, and S. Muthukrishnan. Alphabet dependence in paramet- erized matching.Information Processing Letters, 49(3):111–115, 1994

work page 1994
[2]

Parameterized matching on non-linear structures

Amihood Amir and Gonzalo Navarro. Parameterized matching on non-linear structures. Information Processing Letters, 109(15):864–867, 2009

work page 2009
[3]

Erdős, and Alpár Jüttner

Alberto Apostolico, Péter L. Erdős, and Alpár Jüttner. Parameterized searching with mismatches for run-length encoded strings.Theoretical Computer Science, 454:23–29, 2012

work page 2012
[4]

Erdős, and Moshe Lewenstein

Alberto Apostolico, Péter L. Erdős, and Moshe Lewenstein. Parameterized matching with mismatches.Journal of Discrete Algorithms, 5(1):135–140, 2007

work page 2007
[5]

Periodicity and repetitions in parameterized strings.Discrete Applied Mathematics, 156(9):1389–1398, 2008

Alberto Apostolico and Raffaele Giancarlo. Periodicity and repetitions in parameterized strings.Discrete Applied Mathematics, 156(9):1389–1398, 2008

work page 2008
[6]

Brenda S. Baker. A program for identifying duplicated code. InComputing Science and Statistics: Proceedings of the 24th Symposium on the Interface, 1992

work page 1992
[7]

Atheoryofparameterizedpatternmatching: Algorithmsandapplications

BrendaS.Baker. Atheoryofparameterizedpatternmatching: Algorithmsandapplications. InProceedings of the 25th Annual ACM Symposium on Theory of Computing (STOC),pages 71–80, 1993. 13

work page 1993
[8]

Brenda S. Baker. Parameterized duplication in strings: Algorithms and an application to software maintenance.SIAM Journal on Computing, 26(5):1343–1362, 1997

work page 1997
[9]

Brenda S. Baker. Parameterized diff. InProceedings of the 10th Annual ACM-SIAM Sym- posium on Discrete Algorithms (SODA), pages 854–855. SIAM, 1999

work page 1999
[10]

Richard Beal and Donald A. Adjeroh. Compressed parameterized pattern matching.Theor. Comput. Sci., 609:129–142, 2016

work page 2016
[11]

On distances between words with parameters

Pierre Bourhis, Aaron Boussidan, and Philippe Gambette. On distances between words with parameters. In34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, volume 259 ofLIPIcs, pages 6:1–6:23, 2023

work page 2023
[12]

Phd dissertation, Gustave Eiffel University, 2025

Aaron Boussidan.Comparaison automatique de pièces de théâtre à l’aide de distances d’édition et d’alignement. Phd dissertation, Gustave Eiffel University, 2025

work page 2025
[13]

Two-dimensional para- meterized matching.ACM Transactions on Algorithms, 11(2):12, 2014

Richard Cole, Carmit Hazay, Moshe Lewenstein, and Dekel Tsur. Two-dimensional para- meterized matching.ACM Transactions on Algorithms, 11(2):12, 2014

work page 2014
[14]

Deguchi, F

S. Deguchi, F. Higashijima, H. Bannai, S. Inenaga, and M. Takeda. Parameterized suffix arrays for binary strings. InProceedings of the Prague Stringology Conference, pages 84–94, 2008

work page 2008
[15]

Thankachan, and Y

Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Y. Yang. Space-efficient dictionaries for parameterized and order-preserving pattern matching. InLIPIcs-Leibniz International Proceedings in Informatics, volume 54. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016

work page 2016
[16]

Thankachan

Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. InProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 397–407. SIAM, 2017

work page 2017
[17]

R. Garg, R. Prasad, and S. Agarwal. Fast parameterized word matching on compressed text. InInternational Conference on Computer and Communication Technology (ICCCT), pages 317–321. IEEE, 2014

work page 2014
[18]

Approximate parameterized matching

Carmit Hazay, Moshe Lewenstein, and Dina Sokol. Approximate parameterized matching. ACM Transactions on Algorithms, 3(3):29, 2007

work page 2007
[19]

Karp and Michael O

Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249–260, 1987

work page 1987
[20]

On the longest common paramet- erized subsequence.Theoretical Computer Science, 410(51):5347–5353, 2009

Orgad Keller, Tsvi Kopelowitz, and Moshe Lewenstein. On the longest common paramet- erized subsequence.Theoretical Computer Science, 410(51):5347–5353, 2009

work page 2009
[21]

Khetan, S

R. Khetan, S. Agarwal, and R. Prasad. An efficient approach towards compressed para- meterized word matching using wavelet tree.Journal of Information and Optimization Sciences, 37(2):285–301, 2016

work page 2016
[22]

I. Lee, J. Mendivelso, and Yoan J. Pinzón.δγ-parameterized matching. InString Processing and Information Retrieval (SPIRE), volume 5280 ofLNCS, pages 236–248, 2008

work page 2008
[23]

Parameterized matching

Moshe Lewenstein. Parameterized matching. InEncyclopedia of Algorithms, pages 635–638. Springer, 2008

work page 2008
[24]

Thankachan, and Yoan J

Juan Mendivelso, Sharma V. Thankachan, and Yoan J. Pinzón. A brief history of paramet- erized matching problems.Discrete Applied Mathematics, 274:103–115, 2020. 14

work page 2020
[25]

Generalization of a suffix tree for RNA structural pattern matching.Al- gorithmica, 39(1):1–19, 2004

Tetsuo Shibuya. Generalization of a suffix tree for RNA structural pattern matching.Al- gorithmica, 39(1):1–19, 2004. 15 A Proof of Correctness for the Offset Sets The following is the proof of Lemma 5. Proof.Direction (⇒):SupposeS 1 ≈p S2. There exists a bijectionπ: Σ→Σsuch that π(S1[i]) =S 2[i]for alli. This implies that for everyσ, the absolute positio...

work page 2004
[26]

They have the sameOffset SetO(relative spacing)

work page
[27]

LetN S(s, O)denote the number of distinct characters in stringSthat start at positions and have offset setO

They have the sameStart Positions(absolute placement). LetN S(s, O)denote the number of distinct characters in stringSthat start at positions and have offset setO. To prove a bijection exists, it is sufficient to show that for all positions s∈[1..m]and all offset setsO: N S1(s, O) =N S2(s, O) If this equality holds, we can map thekcharacters of type(s, O)...

work page
[28]

Global Shift

OffSet(σ) =O. 2.σappears ati. This happens ifσstarted at some positions≤isuch that(i−s)∈O. We can express the total count at positionias the sum of characters startingnow(s=i) and characters that startedearlier(s < i) and are re-appearing: Count S(i, O) =N S(i, O)| {z } Starts ati + X δ∈O,δ>0 N S(i−δ, O)| {z } Started ati−δ Rearranging to solve for the cu...

work page

[1] [1]

Muthukrishnan

Amihood Amir, Martin Farach, and S. Muthukrishnan. Alphabet dependence in paramet- erized matching.Information Processing Letters, 49(3):111–115, 1994

work page 1994

[2] [2]

Parameterized matching on non-linear structures

Amihood Amir and Gonzalo Navarro. Parameterized matching on non-linear structures. Information Processing Letters, 109(15):864–867, 2009

work page 2009

[3] [3]

Erdős, and Alpár Jüttner

Alberto Apostolico, Péter L. Erdős, and Alpár Jüttner. Parameterized searching with mismatches for run-length encoded strings.Theoretical Computer Science, 454:23–29, 2012

work page 2012

[4] [4]

Erdős, and Moshe Lewenstein

Alberto Apostolico, Péter L. Erdős, and Moshe Lewenstein. Parameterized matching with mismatches.Journal of Discrete Algorithms, 5(1):135–140, 2007

work page 2007

[5] [5]

Periodicity and repetitions in parameterized strings.Discrete Applied Mathematics, 156(9):1389–1398, 2008

Alberto Apostolico and Raffaele Giancarlo. Periodicity and repetitions in parameterized strings.Discrete Applied Mathematics, 156(9):1389–1398, 2008

work page 2008

[6] [6]

Brenda S. Baker. A program for identifying duplicated code. InComputing Science and Statistics: Proceedings of the 24th Symposium on the Interface, 1992

work page 1992

[7] [7]

Atheoryofparameterizedpatternmatching: Algorithmsandapplications

BrendaS.Baker. Atheoryofparameterizedpatternmatching: Algorithmsandapplications. InProceedings of the 25th Annual ACM Symposium on Theory of Computing (STOC),pages 71–80, 1993. 13

work page 1993

[8] [8]

Brenda S. Baker. Parameterized duplication in strings: Algorithms and an application to software maintenance.SIAM Journal on Computing, 26(5):1343–1362, 1997

work page 1997

[9] [9]

Brenda S. Baker. Parameterized diff. InProceedings of the 10th Annual ACM-SIAM Sym- posium on Discrete Algorithms (SODA), pages 854–855. SIAM, 1999

work page 1999

[10] [10]

Richard Beal and Donald A. Adjeroh. Compressed parameterized pattern matching.Theor. Comput. Sci., 609:129–142, 2016

work page 2016

[11] [11]

On distances between words with parameters

Pierre Bourhis, Aaron Boussidan, and Philippe Gambette. On distances between words with parameters. In34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, volume 259 ofLIPIcs, pages 6:1–6:23, 2023

work page 2023

[12] [12]

Phd dissertation, Gustave Eiffel University, 2025

Aaron Boussidan.Comparaison automatique de pièces de théâtre à l’aide de distances d’édition et d’alignement. Phd dissertation, Gustave Eiffel University, 2025

work page 2025

[13] [13]

Two-dimensional para- meterized matching.ACM Transactions on Algorithms, 11(2):12, 2014

Richard Cole, Carmit Hazay, Moshe Lewenstein, and Dekel Tsur. Two-dimensional para- meterized matching.ACM Transactions on Algorithms, 11(2):12, 2014

work page 2014

[14] [14]

Deguchi, F

S. Deguchi, F. Higashijima, H. Bannai, S. Inenaga, and M. Takeda. Parameterized suffix arrays for binary strings. InProceedings of the Prague Stringology Conference, pages 84–94, 2008

work page 2008

[15] [15]

Thankachan, and Y

Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Y. Yang. Space-efficient dictionaries for parameterized and order-preserving pattern matching. InLIPIcs-Leibniz International Proceedings in Informatics, volume 54. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016

work page 2016

[16] [16]

Thankachan

Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. InProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 397–407. SIAM, 2017

work page 2017

[17] [17]

R. Garg, R. Prasad, and S. Agarwal. Fast parameterized word matching on compressed text. InInternational Conference on Computer and Communication Technology (ICCCT), pages 317–321. IEEE, 2014

work page 2014

[18] [18]

Approximate parameterized matching

Carmit Hazay, Moshe Lewenstein, and Dina Sokol. Approximate parameterized matching. ACM Transactions on Algorithms, 3(3):29, 2007

work page 2007

[19] [19]

Karp and Michael O

Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249–260, 1987

work page 1987

[20] [20]

On the longest common paramet- erized subsequence.Theoretical Computer Science, 410(51):5347–5353, 2009

Orgad Keller, Tsvi Kopelowitz, and Moshe Lewenstein. On the longest common paramet- erized subsequence.Theoretical Computer Science, 410(51):5347–5353, 2009

work page 2009

[21] [21]

Khetan, S

R. Khetan, S. Agarwal, and R. Prasad. An efficient approach towards compressed para- meterized word matching using wavelet tree.Journal of Information and Optimization Sciences, 37(2):285–301, 2016

work page 2016

[22] [22]

I. Lee, J. Mendivelso, and Yoan J. Pinzón.δγ-parameterized matching. InString Processing and Information Retrieval (SPIRE), volume 5280 ofLNCS, pages 236–248, 2008

work page 2008

[23] [23]

Parameterized matching

Moshe Lewenstein. Parameterized matching. InEncyclopedia of Algorithms, pages 635–638. Springer, 2008

work page 2008

[24] [24]

Thankachan, and Yoan J

Juan Mendivelso, Sharma V. Thankachan, and Yoan J. Pinzón. A brief history of paramet- erized matching problems.Discrete Applied Mathematics, 274:103–115, 2020. 14

work page 2020

[25] [25]

Generalization of a suffix tree for RNA structural pattern matching.Al- gorithmica, 39(1):1–19, 2004

Tetsuo Shibuya. Generalization of a suffix tree for RNA structural pattern matching.Al- gorithmica, 39(1):1–19, 2004. 15 A Proof of Correctness for the Offset Sets The following is the proof of Lemma 5. Proof.Direction (⇒):SupposeS 1 ≈p S2. There exists a bijectionπ: Σ→Σsuch that π(S1[i]) =S 2[i]for alli. This implies that for everyσ, the absolute positio...

work page 2004

[26] [26]

They have the sameOffset SetO(relative spacing)

work page

[27] [27]

LetN S(s, O)denote the number of distinct characters in stringSthat start at positions and have offset setO

They have the sameStart Positions(absolute placement). LetN S(s, O)denote the number of distinct characters in stringSthat start at positions and have offset setO. To prove a bijection exists, it is sufficient to show that for all positions s∈[1..m]and all offset setsO: N S1(s, O) =N S2(s, O) If this equality holds, we can map thekcharacters of type(s, O)...

work page

[28] [28]

Global Shift

OffSet(σ) =O. 2.σappears ati. This happens ifσstarted at some positions≤isuch that(i−s)∈O. We can express the total count at positionias the sum of characters startingnow(s=i) and characters that startedearlier(s < i) and are re-appearing: Count S(i, O) =N S(i, O)| {z } Starts ati + X δ∈O,δ>0 N S(i−δ, O)| {z } Started ati−δ Rearranging to solve for the cu...

work page