Towards Efficient Matching of Regexes with Backreferences using Register Set Automata (Technical Report)

Jan Va\v{s}\'ak; Luk\'a\v{s} Hol\'ik; Ond\v{r}ej Leng\'al; Sab\'ina Gul\v{c}\'ikov\'a; Vojt\v{e}ch Havlena

arxiv: 2205.12114 · v5 · submitted 2022-05-24 · 💻 cs.LO · cs.FL

Towards Efficient Matching of Regexes with Backreferences using Register Set Automata (Technical Report)

Vojt\v{e}ch Havlena , Luk\'a\v{s} Hol\'ik , Ond\v{r}ej Leng\'al , Jan Va\v{s}\'ak , Sab\'ina Gul\v{c}\'ikov\'a This is my paper

Pith reviewed 2026-05-24 11:38 UTC · model grok-4.3

classification 💻 cs.LO cs.FL

keywords regex matchingbackreferencesregister set automatadeterministic automatadata wordsderivative constructionemptiness problem

0 comments

The pith

Register set automata transform backreference regex matching into deterministic linear or quadratic time procedures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces register set automata to handle regexes with backreferences efficiently without backtracking slowdowns. It shows how a large class of register automata can be converted to deterministic RSAs that support set-valued registers and specific operations. A derivative-based method converts suitable regexes with single-letter captures and backreferences into register automata. This yields matching algorithms whose time is linear in input length for finite alphabets and quadratic for infinite alphabets. The work also establishes that RSA emptiness is decidable and F_omega-complete while the model is incomparable to other data-word automata.

Core claim

Register set automata (RSAs) extend register automata so that registers hold sets of symbols and support adding the current input symbol, merging or clearing registers, and testing whether a register contains a value. A large class of register automata can be transformed into deterministic RSAs that serve as the basis for fast matching of regexes with single-letter capture groups and backreferences. A derivative-based algorithm converts a large class of such regexes into register automata, and the resulting matchers run in time linear in the input length for finite alphabets and quadratic for infinite alphabets.

What carries the argument

Register set automata (RSAs), an extension of register automata allowing registers to hold sets of symbols with operations for adding input values, merging or clearing registers, and testing membership.

If this is right

Deterministic matching without backtracking is possible for the supported family of backreference regexes.
Matching runs in time linear in input length when the alphabet is finite.
Matching runs in time quadratic in input length when the alphabet is infinite.
The emptiness problem for RSAs is decidable and complete for the F_omega class.
RSAs are incomparable in expressive power to other popular automata models over data words.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-letter restriction might be relaxed by encoding multi-letter captures through multiple single-letter registers if additional bookkeeping preserves the set semantics.
Integration into production regex engines could reduce exposure to ReDoS attacks on patterns that fit the supported class.
Determinism of the resulting matchers opens the possibility of efficient online or streaming implementations for data streams.
The decidability result for emptiness suggests RSAs could serve as a target for verification tasks involving properties of data words.

Load-bearing premise

The transformations and complexity results apply only to regexes restricted to single-letter capture groups and backreferences, assuming this restricted class remains practically relevant.

What would settle it

An input string together with a single-letter backreference regex whose RSA matcher either exceeds linear time on a finite alphabet or accepts a language different from the original regex.

read the original abstract

Matching regexes (regular expressions) is a common problem in many areas of computer science, with requirements on high speed and robust performance. Regexes with backreferences allow one to express certain patterns (even beyond regular) concisely, however, since the matching is usually done by backtracking, the matching speed can degrade to a degree that constitutes a service failure or a security threat. To facilitate high-speed matching of such regexes, we propose register set automata (RSAs), an extension of register automata where registers can contain sets of symbols (from a potentially infinite alphabet) and the following operations are supported: adding input values to registers, merging or clearing registers, and testing whether a register contains a value. We show that a large class of register automata can be transformed into deterministic RSAs, which can serve as a basis for fast matching of a family of regexes with single-letter capture groups and backreferences. We also give a derivative-based algorithm for transforming a large class of regexes with backreferences to register automata and show that the time complexity of matching is linear and quadratic to the length of the input for finite and infinite alphabets respectively. Our prototype implementation of a regex matcher shows that our approach can significantly improve the robustness of state-of-the-art regex matchers on regexes with backreferences. We also study the theoretical properties of the model and show that the emptiness problem for RSAs is decidable and complete for the $\mathbf{F}_\omega$ class and that RSAs are incomparable in expressive power to other popular automata models over data words.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces register set automata and derivative constructions for efficient matching on a narrow class of single-letter backreference regexes, with linear/quadratic time bounds and some decidability results.

read the letter

The main new elements are the RSA model where registers hold sets and support add/merge/clear/test operations, plus a transformation turning a large class of register automata into deterministic RSAs. They also give a derivative algorithm that converts a restricted family of backreference regexes into register automata and prove the matching complexities stated in the abstract. The prototype is said to improve robustness over existing matchers on those cases, and they add emptiness decidability in F_omega plus an incomparability result with other data-word models. These constructions are presented directly rather than as reductions of earlier register automata work, which is the clearest advance here. The practical angle on ReDoS is addressed through the automata route instead of backtracking. The restriction to single-letter capture groups is stated up front, so the claims hold within that scope and do not overreach. That scope is the main soft spot: it covers some security-relevant patterns but excludes many common backreference uses with longer captures, which limits how far the efficiency gains travel. The abstract shows no internal contradictions or circularity in the complexity or decidability statements. Proofs and prototype details would need verification in the full text, but the stated results line up without obvious gaps. This is for people working on automata extensions for data words or on regex engine robustness. A reader already following register automata or ReDoS mitigation would get concrete constructions and bounds to examine. It deserves peer review because the model and algorithms are specific enough to check and the complexity claims are falsifiable.

Referee Report

0 major / 3 minor

Summary. The paper introduces register set automata (RSAs) as an extension of register automata in which registers hold sets of symbols and support add, merge, clear, and membership-test operations. It shows that a large class of register automata can be converted to deterministic RSAs, gives a derivative-based construction that translates a restricted family of regexes with single-letter capture groups and backreferences into register automata, proves linear-time matching for finite alphabets and quadratic-time matching for infinite alphabets, reports a prototype that improves robustness over existing matchers, and establishes that RSA emptiness is F_ω-complete while RSAs are incomparable to several other data-word automata models.

Significance. If the claimed semantic equivalences hold, the work supplies a deterministic, automaton-based route to efficient matching for an explicitly delimited but practically motivated subclass of backreference regexes, directly addressing performance and denial-of-service concerns. The prototype evaluation and the F_ω-completeness result constitute concrete strengths; the explicit restriction to single-letter groups and the separate treatment of finite versus infinite alphabets allow the complexity claims to be assessed within a well-defined scope.

minor comments (3)

[§2] §2 (RSA definition): the precise cost model for set-union and membership tests on potentially unbounded sets is not stated; a sentence clarifying whether these are assumed O(1) or depend on set cardinality would aid reproducibility of the quadratic bound.
[§4] §4 (derivative construction): the algorithm is described at a high level; adding a small pseudocode fragment or an explicit inductive invariant would make the translation from regex to register automaton easier to verify.
[Table 1] Table 1 (prototype benchmarks): the column headings and units for input length versus matching time are not fully aligned with the text description of the test suite; a caption clarifying the alphabet size for each row would remove ambiguity.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on register set automata and the recommendation for minor revision. No specific major comments appear in the provided report, so there are no individual points requiring point-by-point rebuttal or revision at this stage.

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper introduces register set automata (RSAs) as a novel extension of register automata with explicit set operations on registers, then defines transformations to deterministic RSAs and a derivative-based conversion from a restricted class of backreference regexes directly from these new definitions. Complexity bounds (linear/quadratic matching time) and the F_ω-completeness of emptiness follow from the model semantics and standard automata constructions without any equations that reduce the claimed results to fitted parameters, prior self-citations, or renamed known results. The restricted scope (single-letter capture groups) is stated explicitly, and all load-bearing steps remain self-contained within the introduced formalism.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the newly defined RSA operations and the existence of transformations for a 'large class' of inputs; no free parameters are introduced, and the model itself is the main invented construct.

axioms (1)

standard math Standard automata constructions and derivative rules for regular expressions extend to register automata with set registers.
Invoked for the derivative-based algorithm and the register automata to RSA conversion.

invented entities (1)

Register Set Automata (RSAs) no independent evidence
purpose: Provide deterministic automata for efficient backreference regex matching using set-valued registers.
New model defined in the paper with specific operations on sets.

pith-pipeline@v0.9.0 · 5855 in / 1304 out tokens · 41689 ms · 2026-05-24T11:38:46.685473+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1]

19 Google

Springer International Publishing.doi:10.1007/978-3-030-63461-2_2. 19 Google. RE2, 2022. URL:https://github.com/google/re2. 20 Google. RE2 issue tracker: Feature request #101: Add backreference support, 2022. URL: https://github.com/google/re2/issues/101. 21 Radu Grigore, Dino Distefano, Rasmus Lerchedahl Petersen, and Nikos Tzevelekos. Runtime Veriﬁcatio...

work page doi:10.1007/978-3-030-63461-2_2 2022
[2]

doi:10.1007/978-3-642-36742-7_19

Springer. doi:10.1007/978-3-642-36742-7_19. 22 Lukáš Holík, Ondřej Lengál, Olli Saarikivi, Lenka Turoňová, Margus Veanes, and Tomáš Vojnar. Succinct Determinisation of Counting Automata via Sphere Construction. In Anthony Widjaja Lin, editor,Programming Languages and Systems, Lecture Notes in Computer Science, pages 468–489, Cham, 2019. Springer Internati...

work page doi:10.1007/978-3-642-36742-7_19 2019
[3]

41 Sylvain Schmitz and Philippe Schnoebelen

URL: https://cel.archives-ouvertes.fr/cel-00727025. 41 Sylvain Schmitz and Philippe Schnoebelen. The power of well-structured systems. In Pedro R. D’Argenio and Hernán C. Melgratti, editors,CONCUR 2013, Buenos Aires, Argentina, August 27-30, 2013. Proceedings, volume 8052 ofLecture Notes in Computer Science, pages 5–24. Springer, 2013. doi:10.1007/978-3-6...

work page doi:10.1007/978-3-642-40184-8 2013
[4]

t is enabled inM and

work page
[5]

positive

M ′ is the marking such that for everyp ∈P we have M ′(p) = Out(p) + ∑ {Maux(p′) | Transfer(p′) =p} where Maux =M − In. (1) That is, the successor markingM ′ is obtained by (i) removingIn tokens from inputs oft, (ii) transferring tokens according toTransfer, and (iii) addingOut tokens tot’s outputs. We say that a markingM is reachableif there is a (possib...

work page 2016
[6]

The Lossy Removalprotogadget, which simulates a (lossy) removal of one token from a placep is the RsA deﬁned as LossyRm(p) = ({q1,q 2,q 3}, R, ∆′, {q1}, {q3}) where ∆′ contains the following three transitions (cf. Figure 9a): ∆′ =    q1 a | {rp}, ∅, {rin ↦→ {in}, rtmp ↦→ ∅} q2, q2 a | {rp}, {rin}, {rtmp ↦→ {rtmp, in}} q2, q2 a | ∅, ∅, {rp ↦→ {rtmp}} q3...

work page
[7]

Intuitively, the protogadet empties out the register which represents the placep1

The Move protogadget, which simulatesmoving all tokens from a placep1 to a placep2, is the followingRsA (also depicted in Figure 9b): Move(p1,p 2) = ({q1,q 2}, R, {q1 a | ∅, ∅, {rp1 ↦→ ∅, rp2 ↦→ {rp1 , rp2 }} q2}, {q1}, {q2}). Intuitively, the protogadet empties out the register which represents the placep1. Its previous value is assigned to register repr...

work page
[8]

Intuitively, the protogadget adds the unique data value from the input tape into the register representing the placep

The New Tokenprotogadget, which simulates adding a token to a placep, is deﬁned as follows (depiction is in Figure 9c): NewToken(p) = ({q1,q 2}, R, {q1 a | ∅, R, {rp ↦→ {rp, in}} q2}, {q1}, {q2}). Intuitively, the protogadget adds the unique data value from the input tape into the register representing the placep. The uniqueness of the data value is ensur...

work page
[9]

· AIn(pn) where every AIn(pi) is deﬁned as AIn(pi) = LossyRm(pi)[In(pi)], i.e., it is a concatenation ofIn(pi) copies of LossyRm(pi)

First, we transformIn into the RsA AIn = AIn(p1) ·... · AIn(pn) where every AIn(pi) is deﬁned as AIn(pi) = LossyRm(pi)[In(pi)], i.e., it is a concatenation ofIn(pi) copies of LossyRm(pi)

work page
[10]

· AOut(pn) with AOut(pi) deﬁned as AOut(pi) = NewToken(pi)[Out(pi)], i.e., it is a concatenation ofOut(pi) copies of NewToken(pi)

Second, from Out we create the RsA AOut = AOut(p1) ·... · AOut(pn) with AOut(pi) deﬁned as AOut(pi) = NewToken(pi)[Out(pi)], i.e., it is a concatenation ofOut(pi) copies of NewToken(pi)

work page
[11]

· ATransfer(pn) · Aunprime(p1) ·

Third, from Transfer we obtain theRsA ATransfer = ATransfer(p1) ·... · ATransfer(pn) · Aunprime(p1) · ... · A unprime(pn) such that ATransfer(pi) = Move(rpi,r p′ j ) with pj = Transfer(pi) and Aunprime(pi) = Move(p′ i,p i). Intuitively, ATransfer ﬁrst moves the contents of all registers according toTransfer to primed instances of the target registers (in ...

work page
[12]

most precise

Finally, we combine theRsAs created above into the single gadgetAt = AIn · ATransfer · AOut. The initial marking will be encoded by a gadget that puts one new data value in the register representing the placep1. For this, we construct theRsA AM0 = NewToken(p1) and rename its initial state toqinit. The last ingredient we need is to create a gadget that wil...

work page 2016
[13]

RsA is closed under union and intersection

work page
[14]

RsA is not closed under complement. Proof. The proofs for closure under union and intersection are standard: for twoRsAs A1 = (Q1, R1, ∆1,I 1,F 1) and A2 = (Q2, R2, ∆2,I 2,F 2) with disjoint sets of states and registers, the RsA A∪ accepting the union of their languages is obtained asA∪ = (Q1 ∪ Q2, R1 ∪R2, ∆1 ∪∆2,I 1 ∪I2,F 1 ∪F2). Similarly, A\ accepting ...

work page
[15]

each increment is matched with a decrement

∈ ∆′ iﬀ s1 a | g∈ 1 , g /∈ 1 , up1 s′ 1 ∈ ∆1 and s2 a | g∈ 2 , g /∈ 2 , up2 s′ 2 ∈ ∆2. Correctness of the constructions is clear. For showing the non-closure under complement, consider the languageL¬∀repeat from Example 5, which can be accepted byRsA. Let us show that for the complement of the language, namely, the language L∀repeat = {w | ∀i∃j : i ̸=j ∧ ...

work page
[16]

The ﬁrst instruction is of the form(q1, ·, ·) for q1 being the initial state ofM

work page
[17]

Each instruction of the form(·, ·,q i) is followed by an instruction of the form(qi, ·, ·)

work page
[18]

All increments have diﬀerent data values and all decrements have diﬀerent data values

work page
[19]

there is an increment not followed by a decrement with the same data value

Between every two(·, ifzeroi, ·) instructions (or between the start and the ﬁrst such an ifzeroi instruction), a. every (·, deci, ·) needs to be preceded by an(·, inci, ·) instruction with the same data value and b. every (·, inci, ·) needs to be followed by a(·, deci, ·) instruction with the same data value. Properties 1 and 2 can be easily expressed usi...

work page
[20]

RsAn is closed under union

work page
[21]

RsAn is not closed under intersection and complement. Proof. The proof of closure under union is the same as in the proof of Theorem 9 with the exception that the results uses only registersR1 (we assume |R1| = n): all references to registers r ∈ R2 are changed to references tof(r) where f : R2 → R1 is an injection. CVIT 2016 23:30 Register Set Automata (...

work page 2016
[22]

∈ ∆′ iﬀ s1 a | g∈ 1 , g /∈ 1 , up1 s′ 1 ∈ ∆1 and s2 a | g∈ 2 , g /∈ 2 , up2 s′ 2 ∈ ∆2, and F ′ = {(q1,q 2) |q1 ∈F1 ∨q2 ∈F2}. The construction of theDRsA A\ = (Q1 ×Q2, R1 ∪ R2, ∆′,I 1 ×I2,F ′ \ ) accepting the intersection of L(A1) and L(A2) is similar to the construction ofA∪, with the exception of F ′ \, which is obtained asF ′ \ =F1 ×F2. The complement ...

work page
[23]

DRsAn is closed under complement

work page
[24]

DRsAn is not closed under union and intersection. Proof. The closure under complement is trivial (complete theDRsA and swap ﬁnal and non-ﬁnal states; no new register is introduced). To show thatDRsAn is not closed under intersection, we use languagesLA n and LB n from the proof of Theorem 10. In particular, both these languages are inDRAn (the DRAn needs ...

work page
[26]

We will show that ρ′ is accepting, and sow ∈ L(A′)

⊢⟨a2,d2⟩ t′ 2 · · · ⊢⟨an,dn⟩ t′n ((S′ n,c ′ n),f ′ n) be the run ofA′ on w (A′ is deterministic and complete, soρ′ is unique). We will show that ρ′ is accepting, and sow ∈ L(A′). Let us show that for all0 ≤i ≤n, the following conditions hold:

work page
[27]

∀r ∈ R: fi(r) ̸= ⊥ =⇒ fi(r) ∈f ′ i(r), and

work page
[28]

We proceed by induction oni: i = 0: Since the (only) initial state ofA′ is the macrostate(I,c 0 = {ri ↦→ 0 |ri ∈ R}) (cf

∀r ∈ R: c′ i(r) =    0 iﬀ f ′ i(r) = ∅, 1 iﬀ |f ′ i(r)| = 1, ω iﬀ |f ′ i(r)| ≥ 2. We proceed by induction oni: i = 0: Since the (only) initial state ofA′ is the macrostate(I,c 0 = {ri ↦→ 0 |ri ∈ R}) (cf. Line 25), andq0 ∈I, then condition 1 holds. Moreover,f0(r) = ⊥ for everyr ∈ R, so condition 2 holds trivially, and so does condition 3 (thec′ i of...

work page 2016
[29]

∀r ∈ R: fj+1(r) ̸= ⊥ =⇒ fj+1(r) ∈f ′ j+1(r), and

work page
[30]

∀r ∈ R: c′ j+1(r) =    0 iﬀ f ′ j+1(r) = ∅, 1 iﬀ |f ′ j+1(r)| = 1, ω iﬀ |f ′ j+1(r)| ≥ 2. Proof. 1. Trivial

work page
[31]

Follows from the induction hypothesis and the deﬁnition of the update functionup′ on Lines 9–20

work page
[32]

◁ This concludes the ﬁrst direction of the proof

Follows from the induction hypothesis and the deﬁnition ofc′ on Line 15. ◁ This concludes the ﬁrst direction of the proof. (⊇) Let w = ⟨a1,d 1⟩... ⟨an,d n⟩ ∈ L (A′). Then there is an accepting runρ′ of A′ on w, such that ρ′ : ((S′ 0,c ′ 0),f ′

work page
[33]

⊢⟨a1,d1⟩ t′ 1 ((S′ 1,c ′ 1),f ′

work page
[34]

⊢⟨a2,d2⟩ t′ 2 · · · ⊢⟨an,dn⟩ t′n ((S′ n,c ′ n),f ′ n) with (S′ n,c ′ n) ∈ F ′. We will construct in a backward manner a sequence of sets of A’s conﬁgurations (i.e., pairs containing a state and assignment to registers) U0,U 1,...,U n−1,U n, such that ∀1 ≤ i ≤ n: Ui ∈ Q × (R → D), that represents all accepting runs ofA over w, and show thatU0 contains a co...

work page
[35]

for every registerr ∈ R, it holds thatfi(r) ∈f ′ i(r) ∪ {⊥},

work page
[36]

Let us now show that for all0 ≤ i ≤ n, the setUi is nonempty

there is a transitionti+1 : qi ai+1 | g=, g̸=, up qi+1 ∈ ∆ such that ∗ (qi+1,f i+1) ∈Ui+1 and ∗ (qi,f i) ⊢⟨ai+1,di+1⟩ ti+1 (qi+1,f i+1). Let us now show that for all0 ≤ i ≤ n, the setUi is nonempty. We proceed by backward induction. ∗ i =n (base case):ρ′ is accepting, so there is at least one state inS′ n \ F ′. ∗ i =j + 1 (induction hypothesis): we assum...

work page
[37]

Non-complemented literals: if the literal is not complemented, we use Theorem 15(a) to obtain a DRsA accepting it

work page
[38]

Complemented literals: ﬁrst, we use Fact 1 to obtain theURA= 1 accepting the comple- mented language and then use Theorem 15(b) to convert it into aDRsA

work page
[39]

◀ C Other Examples ▶ Example 34

Unions and intersections: we simply use Theorem 11 to obtain the resultingDRsA. ◀ C Other Examples ▶ Example 34. Consider the languageLdisjoint [31, Example 2.5] of wordsw =uv where u and v are non-empty and the data values inu and v are disjoint, i.e., Ldisjoint = { w | ∃u,v ∈ Σ × D: w =uv ∧u ̸=ϵ ∧v ̸=ϵ ∧ ∀i,j : D[ui] ̸= D[vj] } For instance, the languag...

work page 2016
[40]

Intuitively, theEmptyEq simulates a register emptiness test

The EmptyEq protogadget (depicted in Figure 10a), which simulates the inhibitor arc leading from placep is an RsA=r deﬁned as: EmptyEq(p) = ({q1,q 2,q 3}, R, {q1 a | ∅, ∅, ∅, {re ↦→ ∅} q2,q 2 a | ∅, ∅, {rp ↦→ {re}}, ∅ q3}, {q1}, {q3}). Intuitively, theEmptyEq simulates a register emptiness test. At ﬁrst, the protogadget explicitly assigns the value of emp...

work page
[41]

Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing the destination

The New Token(depicted in Figure 10b) which simulates adding a token to a placep, is an RsArm =∅ deﬁned in the following way: NewToken(p) = ({q1,q 2}, R, {q1 a | ∅, R, ∅, {rp ↦→ {rp, in}}} q2}, {q1}, {q2}). Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing the destinatio...

work page
[42]

Intuitively, for each arc originating in a placep and terminating in transition, respective number of values has to be removed from the register representing placep

The Non-lossy Remove Token(depicted in Figure 10c) which simulates removal of a token from a placep is an RsA=r deﬁned in the following way: NonLossyRm(p) = ({q1,q 2,q 3}, R, {q1 a | {rp}, ∅, ∅, {rin ↦→ {in}, rs ↦→ {in}} q2, q2 a | {rp}, {rin}, ∅, {rtmp ↦→ {rtmp, inp}, rs ↦→ {rtmp, rin, in}} q2, q2 a | ∅, ∅, {rp ↦→ {rs}}, {rp ↦→ {rtmp}} q3}, {q1}, {q2}). ...

work page 2016
[43]

The Empty protogadget (depicted in Figure 11a), which simulates the inhibitor arc leading from the placep is an RsArm =∅ deﬁned as: Empty(p) = ({q1,q 2}, R, {q1 a | ∅, ∅, {rp}, ∅, {rp ↦→ ∅}} q2}, {q1}, {q2}). S. Gulčíková and O. Lengál 23:37 q1 q2 a rp = ∅ (a) The Empty(p) protogadget. q1 q2 a {in /∈ r}r∈R rp ← rp ∪ {in} (b) The NewToken(p) protogadget. q...

work page
[44]

Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing the destination

The New Tokenprotogadget (depicted in Figure 11b) which simulates adding a token to a placep, is anRsArm =∅ deﬁned in the following way: NewToken(p) = ({q1,q 2}, R, {q1 a | ∅, R, ∅, ∅, {rp ↦→ {rp, in}}} q2}, {q1}, {q2}). Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing ...

work page
[45]

Intuitively, for each token removed from the source place, one value is removed from the register which represents that place

The Remove Tokenprotogadget (depicted in Figure 11c) which simulates the removal of a token from a placep, is anRsArm =∅ deﬁned in the following way: TokenRem(p) = ({q1,q 2}, R, {q1 a | ∅, {rp}, ∅, {rp}, {rp ↦→ {rp}}} q2}, {q1}, {q2}). Intuitively, for each token removed from the source place, one value is removed from the register which represents that p...

work page 2016

[1] [1]

19 Google

Springer International Publishing.doi:10.1007/978-3-030-63461-2_2. 19 Google. RE2, 2022. URL:https://github.com/google/re2. 20 Google. RE2 issue tracker: Feature request #101: Add backreference support, 2022. URL: https://github.com/google/re2/issues/101. 21 Radu Grigore, Dino Distefano, Rasmus Lerchedahl Petersen, and Nikos Tzevelekos. Runtime Veriﬁcatio...

work page doi:10.1007/978-3-030-63461-2_2 2022

[2] [2]

doi:10.1007/978-3-642-36742-7_19

Springer. doi:10.1007/978-3-642-36742-7_19. 22 Lukáš Holík, Ondřej Lengál, Olli Saarikivi, Lenka Turoňová, Margus Veanes, and Tomáš Vojnar. Succinct Determinisation of Counting Automata via Sphere Construction. In Anthony Widjaja Lin, editor,Programming Languages and Systems, Lecture Notes in Computer Science, pages 468–489, Cham, 2019. Springer Internati...

work page doi:10.1007/978-3-642-36742-7_19 2019

[3] [3]

41 Sylvain Schmitz and Philippe Schnoebelen

URL: https://cel.archives-ouvertes.fr/cel-00727025. 41 Sylvain Schmitz and Philippe Schnoebelen. The power of well-structured systems. In Pedro R. D’Argenio and Hernán C. Melgratti, editors,CONCUR 2013, Buenos Aires, Argentina, August 27-30, 2013. Proceedings, volume 8052 ofLecture Notes in Computer Science, pages 5–24. Springer, 2013. doi:10.1007/978-3-6...

work page doi:10.1007/978-3-642-40184-8 2013

[4] [4]

t is enabled inM and

work page

[5] [5]

positive

M ′ is the marking such that for everyp ∈P we have M ′(p) = Out(p) + ∑ {Maux(p′) | Transfer(p′) =p} where Maux =M − In. (1) That is, the successor markingM ′ is obtained by (i) removingIn tokens from inputs oft, (ii) transferring tokens according toTransfer, and (iii) addingOut tokens tot’s outputs. We say that a markingM is reachableif there is a (possib...

work page 2016

[6] [6]

The Lossy Removalprotogadget, which simulates a (lossy) removal of one token from a placep is the RsA deﬁned as LossyRm(p) = ({q1,q 2,q 3}, R, ∆′, {q1}, {q3}) where ∆′ contains the following three transitions (cf. Figure 9a): ∆′ =    q1 a | {rp}, ∅, {rin ↦→ {in}, rtmp ↦→ ∅} q2, q2 a | {rp}, {rin}, {rtmp ↦→ {rtmp, in}} q2, q2 a | ∅, ∅, {rp ↦→ {rtmp}} q3...

work page

[7] [7]

Intuitively, the protogadet empties out the register which represents the placep1

The Move protogadget, which simulatesmoving all tokens from a placep1 to a placep2, is the followingRsA (also depicted in Figure 9b): Move(p1,p 2) = ({q1,q 2}, R, {q1 a | ∅, ∅, {rp1 ↦→ ∅, rp2 ↦→ {rp1 , rp2 }} q2}, {q1}, {q2}). Intuitively, the protogadet empties out the register which represents the placep1. Its previous value is assigned to register repr...

work page

[8] [8]

Intuitively, the protogadget adds the unique data value from the input tape into the register representing the placep

The New Tokenprotogadget, which simulates adding a token to a placep, is deﬁned as follows (depiction is in Figure 9c): NewToken(p) = ({q1,q 2}, R, {q1 a | ∅, R, {rp ↦→ {rp, in}} q2}, {q1}, {q2}). Intuitively, the protogadget adds the unique data value from the input tape into the register representing the placep. The uniqueness of the data value is ensur...

work page

[9] [9]

· AIn(pn) where every AIn(pi) is deﬁned as AIn(pi) = LossyRm(pi)[In(pi)], i.e., it is a concatenation ofIn(pi) copies of LossyRm(pi)

First, we transformIn into the RsA AIn = AIn(p1) ·... · AIn(pn) where every AIn(pi) is deﬁned as AIn(pi) = LossyRm(pi)[In(pi)], i.e., it is a concatenation ofIn(pi) copies of LossyRm(pi)

work page

[10] [10]

· AOut(pn) with AOut(pi) deﬁned as AOut(pi) = NewToken(pi)[Out(pi)], i.e., it is a concatenation ofOut(pi) copies of NewToken(pi)

Second, from Out we create the RsA AOut = AOut(p1) ·... · AOut(pn) with AOut(pi) deﬁned as AOut(pi) = NewToken(pi)[Out(pi)], i.e., it is a concatenation ofOut(pi) copies of NewToken(pi)

work page

[11] [11]

· ATransfer(pn) · Aunprime(p1) ·

Third, from Transfer we obtain theRsA ATransfer = ATransfer(p1) ·... · ATransfer(pn) · Aunprime(p1) · ... · A unprime(pn) such that ATransfer(pi) = Move(rpi,r p′ j ) with pj = Transfer(pi) and Aunprime(pi) = Move(p′ i,p i). Intuitively, ATransfer ﬁrst moves the contents of all registers according toTransfer to primed instances of the target registers (in ...

work page

[12] [12]

most precise

Finally, we combine theRsAs created above into the single gadgetAt = AIn · ATransfer · AOut. The initial marking will be encoded by a gadget that puts one new data value in the register representing the placep1. For this, we construct theRsA AM0 = NewToken(p1) and rename its initial state toqinit. The last ingredient we need is to create a gadget that wil...

work page 2016

[13] [13]

RsA is closed under union and intersection

work page

[14] [14]

RsA is not closed under complement. Proof. The proofs for closure under union and intersection are standard: for twoRsAs A1 = (Q1, R1, ∆1,I 1,F 1) and A2 = (Q2, R2, ∆2,I 2,F 2) with disjoint sets of states and registers, the RsA A∪ accepting the union of their languages is obtained asA∪ = (Q1 ∪ Q2, R1 ∪R2, ∆1 ∪∆2,I 1 ∪I2,F 1 ∪F2). Similarly, A\ accepting ...

work page

[15] [15]

each increment is matched with a decrement

∈ ∆′ iﬀ s1 a | g∈ 1 , g /∈ 1 , up1 s′ 1 ∈ ∆1 and s2 a | g∈ 2 , g /∈ 2 , up2 s′ 2 ∈ ∆2. Correctness of the constructions is clear. For showing the non-closure under complement, consider the languageL¬∀repeat from Example 5, which can be accepted byRsA. Let us show that for the complement of the language, namely, the language L∀repeat = {w | ∀i∃j : i ̸=j ∧ ...

work page

[16] [16]

The ﬁrst instruction is of the form(q1, ·, ·) for q1 being the initial state ofM

work page

[17] [17]

Each instruction of the form(·, ·,q i) is followed by an instruction of the form(qi, ·, ·)

work page

[18] [18]

All increments have diﬀerent data values and all decrements have diﬀerent data values

work page

[19] [19]

there is an increment not followed by a decrement with the same data value

Between every two(·, ifzeroi, ·) instructions (or between the start and the ﬁrst such an ifzeroi instruction), a. every (·, deci, ·) needs to be preceded by an(·, inci, ·) instruction with the same data value and b. every (·, inci, ·) needs to be followed by a(·, deci, ·) instruction with the same data value. Properties 1 and 2 can be easily expressed usi...

work page

[20] [20]

RsAn is closed under union

work page

[21] [21]

RsAn is not closed under intersection and complement. Proof. The proof of closure under union is the same as in the proof of Theorem 9 with the exception that the results uses only registersR1 (we assume |R1| = n): all references to registers r ∈ R2 are changed to references tof(r) where f : R2 → R1 is an injection. CVIT 2016 23:30 Register Set Automata (...

work page 2016

[22] [22]

∈ ∆′ iﬀ s1 a | g∈ 1 , g /∈ 1 , up1 s′ 1 ∈ ∆1 and s2 a | g∈ 2 , g /∈ 2 , up2 s′ 2 ∈ ∆2, and F ′ = {(q1,q 2) |q1 ∈F1 ∨q2 ∈F2}. The construction of theDRsA A\ = (Q1 ×Q2, R1 ∪ R2, ∆′,I 1 ×I2,F ′ \ ) accepting the intersection of L(A1) and L(A2) is similar to the construction ofA∪, with the exception of F ′ \, which is obtained asF ′ \ =F1 ×F2. The complement ...

work page

[23] [23]

DRsAn is closed under complement

work page

[24] [24]

DRsAn is not closed under union and intersection. Proof. The closure under complement is trivial (complete theDRsA and swap ﬁnal and non-ﬁnal states; no new register is introduced). To show thatDRsAn is not closed under intersection, we use languagesLA n and LB n from the proof of Theorem 10. In particular, both these languages are inDRAn (the DRAn needs ...

work page

[25] [26]

We will show that ρ′ is accepting, and sow ∈ L(A′)

⊢⟨a2,d2⟩ t′ 2 · · · ⊢⟨an,dn⟩ t′n ((S′ n,c ′ n),f ′ n) be the run ofA′ on w (A′ is deterministic and complete, soρ′ is unique). We will show that ρ′ is accepting, and sow ∈ L(A′). Let us show that for all0 ≤i ≤n, the following conditions hold:

work page

[26] [27]

∀r ∈ R: fi(r) ̸= ⊥ =⇒ fi(r) ∈f ′ i(r), and

work page

[27] [28]

We proceed by induction oni: i = 0: Since the (only) initial state ofA′ is the macrostate(I,c 0 = {ri ↦→ 0 |ri ∈ R}) (cf

∀r ∈ R: c′ i(r) =    0 iﬀ f ′ i(r) = ∅, 1 iﬀ |f ′ i(r)| = 1, ω iﬀ |f ′ i(r)| ≥ 2. We proceed by induction oni: i = 0: Since the (only) initial state ofA′ is the macrostate(I,c 0 = {ri ↦→ 0 |ri ∈ R}) (cf. Line 25), andq0 ∈I, then condition 1 holds. Moreover,f0(r) = ⊥ for everyr ∈ R, so condition 2 holds trivially, and so does condition 3 (thec′ i of...

work page 2016

[28] [29]

∀r ∈ R: fj+1(r) ̸= ⊥ =⇒ fj+1(r) ∈f ′ j+1(r), and

work page

[29] [30]

∀r ∈ R: c′ j+1(r) =    0 iﬀ f ′ j+1(r) = ∅, 1 iﬀ |f ′ j+1(r)| = 1, ω iﬀ |f ′ j+1(r)| ≥ 2. Proof. 1. Trivial

work page

[30] [31]

Follows from the induction hypothesis and the deﬁnition of the update functionup′ on Lines 9–20

work page

[31] [32]

◁ This concludes the ﬁrst direction of the proof

Follows from the induction hypothesis and the deﬁnition ofc′ on Line 15. ◁ This concludes the ﬁrst direction of the proof. (⊇) Let w = ⟨a1,d 1⟩... ⟨an,d n⟩ ∈ L (A′). Then there is an accepting runρ′ of A′ on w, such that ρ′ : ((S′ 0,c ′ 0),f ′

work page

[32] [33]

⊢⟨a1,d1⟩ t′ 1 ((S′ 1,c ′ 1),f ′

work page

[33] [34]

⊢⟨a2,d2⟩ t′ 2 · · · ⊢⟨an,dn⟩ t′n ((S′ n,c ′ n),f ′ n) with (S′ n,c ′ n) ∈ F ′. We will construct in a backward manner a sequence of sets of A’s conﬁgurations (i.e., pairs containing a state and assignment to registers) U0,U 1,...,U n−1,U n, such that ∀1 ≤ i ≤ n: Ui ∈ Q × (R → D), that represents all accepting runs ofA over w, and show thatU0 contains a co...

work page

[34] [35]

for every registerr ∈ R, it holds thatfi(r) ∈f ′ i(r) ∪ {⊥},

work page

[35] [36]

Let us now show that for all0 ≤ i ≤ n, the setUi is nonempty

there is a transitionti+1 : qi ai+1 | g=, g̸=, up qi+1 ∈ ∆ such that ∗ (qi+1,f i+1) ∈Ui+1 and ∗ (qi,f i) ⊢⟨ai+1,di+1⟩ ti+1 (qi+1,f i+1). Let us now show that for all0 ≤ i ≤ n, the setUi is nonempty. We proceed by backward induction. ∗ i =n (base case):ρ′ is accepting, so there is at least one state inS′ n \ F ′. ∗ i =j + 1 (induction hypothesis): we assum...

work page

[36] [37]

Non-complemented literals: if the literal is not complemented, we use Theorem 15(a) to obtain a DRsA accepting it

work page

[37] [38]

Complemented literals: ﬁrst, we use Fact 1 to obtain theURA= 1 accepting the comple- mented language and then use Theorem 15(b) to convert it into aDRsA

work page

[38] [39]

◀ C Other Examples ▶ Example 34

Unions and intersections: we simply use Theorem 11 to obtain the resultingDRsA. ◀ C Other Examples ▶ Example 34. Consider the languageLdisjoint [31, Example 2.5] of wordsw =uv where u and v are non-empty and the data values inu and v are disjoint, i.e., Ldisjoint = { w | ∃u,v ∈ Σ × D: w =uv ∧u ̸=ϵ ∧v ̸=ϵ ∧ ∀i,j : D[ui] ̸= D[vj] } For instance, the languag...

work page 2016

[39] [40]

Intuitively, theEmptyEq simulates a register emptiness test

The EmptyEq protogadget (depicted in Figure 10a), which simulates the inhibitor arc leading from placep is an RsA=r deﬁned as: EmptyEq(p) = ({q1,q 2,q 3}, R, {q1 a | ∅, ∅, ∅, {re ↦→ ∅} q2,q 2 a | ∅, ∅, {rp ↦→ {re}}, ∅ q3}, {q1}, {q3}). Intuitively, theEmptyEq simulates a register emptiness test. At ﬁrst, the protogadget explicitly assigns the value of emp...

work page

[40] [41]

Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing the destination

The New Token(depicted in Figure 10b) which simulates adding a token to a placep, is an RsArm =∅ deﬁned in the following way: NewToken(p) = ({q1,q 2}, R, {q1 a | ∅, R, ∅, {rp ↦→ {rp, in}}} q2}, {q1}, {q2}). Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing the destinatio...

work page

[41] [42]

Intuitively, for each arc originating in a placep and terminating in transition, respective number of values has to be removed from the register representing placep

The Non-lossy Remove Token(depicted in Figure 10c) which simulates removal of a token from a placep is an RsA=r deﬁned in the following way: NonLossyRm(p) = ({q1,q 2,q 3}, R, {q1 a | {rp}, ∅, ∅, {rin ↦→ {in}, rs ↦→ {in}} q2, q2 a | {rp}, {rin}, ∅, {rtmp ↦→ {rtmp, inp}, rs ↦→ {rtmp, rin, in}} q2, q2 a | ∅, ∅, {rp ↦→ {rs}}, {rp ↦→ {rtmp}} q3}, {q1}, {q2}). ...

work page 2016

[42] [43]

The Empty protogadget (depicted in Figure 11a), which simulates the inhibitor arc leading from the placep is an RsArm =∅ deﬁned as: Empty(p) = ({q1,q 2}, R, {q1 a | ∅, ∅, {rp}, ∅, {rp ↦→ ∅}} q2}, {q1}, {q2}). S. Gulčíková and O. Lengál 23:37 q1 q2 a rp = ∅ (a) The Empty(p) protogadget. q1 q2 a {in /∈ r}r∈R rp ← rp ∪ {in} (b) The NewToken(p) protogadget. q...

work page

[43] [44]

Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing the destination

The New Tokenprotogadget (depicted in Figure 11b) which simulates adding a token to a placep, is anRsArm =∅ deﬁned in the following way: NewToken(p) = ({q1,q 2}, R, {q1 a | ∅, R, ∅, ∅, {rp ↦→ {rp, in}}} q2}, {q1}, {q2}). Intuitively, for each arc originating in the transition and ending in a particular place, a token is added to the register representing ...

work page

[44] [45]

Intuitively, for each token removed from the source place, one value is removed from the register which represents that place

The Remove Tokenprotogadget (depicted in Figure 11c) which simulates the removal of a token from a placep, is anRsArm =∅ deﬁned in the following way: TokenRem(p) = ({q1,q 2}, R, {q1 a | ∅, {rp}, ∅, {rp}, {rp ↦→ {rp}}} q2}, {q1}, {q2}). Intuitively, for each token removed from the source place, one value is removed from the register which represents that p...

work page 2016