arxiv: 1906.10666 · v1 · pith:WWUZA3B6new · submitted 2019-06-25 · 💻 cs.AR

Automatic Conversion from Flip-flop to 3-phase Latch-based Designs

Pith reviewed 2026-05-25 15:44 UTC · model grok-4.3

classification 💻 cs.AR
keywords flip-flop to latch conversion · 3-phase latch design · master-slave comparison · automated RTL transformation · latch count reduction · low-power circuit design · timing preservation

The pith

An automated flow converts flip-flop designs into 3-phase latch-based circuits that match master-slave performance while using fewer latches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new automated design flow that transforms conventional flip-flop RTL specifications into 3-phase latch-based implementations. Unlike earlier methods limited to master-slave pairs with two non-overlapping clocks, the 3-phase approach allows greater latch sharing while keeping the same clock speed and timing behavior. On standard benchmark suites the converted circuits show measurable reductions in latch count, total area, and power draw. The method therefore removes a key barrier that has kept latch-based styles from wider use in RTL flows.

Core claim

The central claim is that an automated conversion algorithm can map flip-flop circuits to 3-phase latch designs that preserve original functionality and timing yet require substantially fewer latches than the conventional master-slave conversion, yielding average savings of 21.3 percent in latch count, 5.8 percent in area, and 16.3 percent in power across ISCAS, CEP, and CPU benchmarks.

What carries the argument

The automated design flow that performs the flip-flop to 3-phase latch conversion by generating three non-overlapping clock phases and inserting latches accordingly.
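
To make the clocking idea concrete, the following is a minimal sketch of three non-overlapping phases sampled on a discrete time grid. The function names, the equal one-third spacing, and the `high` pulse width are illustrative assumptions; the paper's actual waveforms (its Figure 5) may use different offsets and duty cycles.

```python
from typing import List


def three_phase_clocks(period: int, high: int, cycles: int = 1) -> List[List[int]]:
    """Sample three clock phases; returns one list of 0/1 values per phase.

    Toy model (an assumption, not the paper's clock generator): phase k
    goes high at tick k * period / 3 of every cycle and stays high for
    `high` ticks.
    """
    if period % 3 != 0:
        raise ValueError("period must be divisible by 3 in this toy model")
    slot = period // 3
    if not 0 < high <= slot:
        raise ValueError("high time must fit inside one third of the period")
    waves = []
    for k in range(3):
        start = k * slot
        waves.append([
            1 if start <= (t % period) < start + high else 0
            for t in range(period * cycles)
        ])
    return waves


def non_overlapping(waves: List[List[int]]) -> bool:
    """True when at most one phase is high at every sampled tick."""
    return all(sum(column) <= 1 for column in zip(*waves))


p1, p2, p3 = three_phase_clocks(period=12, high=3)
for name, wave in (("p1", p1), ("p2", p2), ("p3", p3)):
    print(name, "".join(str(bit) for bit in wave))
print("non-overlapping:", non_overlapping([p1, p2, p3]))
```

Choosing `high` smaller than one third of the period leaves a gap between consecutive phases, which is the usual margin against clock skew in non-overlapping schemes.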

If this is right

  • The resulting 3-phase circuits deliver identical performance to master-slave latch versions.
  • Latch count drops by an average of 21.3 percent relative to master-slave conversion.
  • Area decreases by 5.8 percent and power by 16.3 percent on the tested benchmarks.
  • The flow works on a variety of ISCAS, CEP, and CPU circuits without manual redesign.
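
As a rough illustration of what the averages above would mean for a single design, the sketch below applies the reported percentages to a hypothetical circuit. The design size and the assumption that the master-slave baseline uses exactly two latches per flip-flop (before any retiming) are not from the paper, and an average saving need not hold for any individual benchmark.

```python
def apply_saving(baseline: float, saving_percent: float) -> float:
    """Scale a baseline value down by a percentage saving."""
    return baseline * (1.0 - saving_percent / 100.0)


flip_flops = 1000                      # hypothetical design size
master_slave_latches = 2 * flip_flops  # assumption: one latch pair per flip-flop
three_phase_latches = apply_saving(master_slave_latches, 21.3)

print("master-slave latches:", master_slave_latches)
print("3-phase latches (estimate):", round(three_phase_latches))
print("latches per original flip-flop:", round(three_phase_latches / flip_flops, 3))

# Area and power relative to a master-slave baseline normalized to 100.
print("relative area:", round(apply_saving(100.0, 5.8), 1))
print("relative power:", round(apply_saving(100.0, 16.3), 1))
```

Under these assumptions the 2000-latch baseline drops to about 1574 latches, or roughly 1.57 latches per original flip-flop.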

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the conversion proves reliable on larger industrial designs, existing flip-flop RTL libraries could be reused with lower power cost.
  • The 3-phase style may open new trade-offs between clock distribution complexity and storage element count in future low-power flows.
  • Verification tools would need to handle three-phase timing checks to make the method fully automatic end-to-end.

Load-bearing premise

The conversion algorithm preserves the original circuit's functionality and timing when it replaces flip-flops with 3-phase latches.

What would settle it

A converted 3-phase netlist that produces different outputs from the original flip-flop design on the same input vectors or violates the original timing constraints.
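
The functional half of that test can be sketched as a small vector-comparison harness. Everything here is hypothetical: `ff_pipeline` is a toy two-stage flip-flop reference, and `latch_pipeline` is a simple master-slave stand-in rather than the paper's 3-phase netlist. The harness checks cycle-level outputs only and says nothing about timing constraints.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A model maps a sequence of per-cycle input bits to per-cycle output bits.
Model = Callable[[Sequence[int]], List[int]]


def ff_pipeline(inputs: Sequence[int]) -> List[int]:
    """Reference: two edge-triggered flip-flops in series, reset to 0."""
    q1 = q2 = 0
    outputs = []
    for d in inputs:
        outputs.append(q2)   # value visible during this cycle
        q1, q2 = d, q1       # both flip-flops update on the clock edge
    return outputs


def latch_pipeline(inputs: Sequence[int]) -> List[int]:
    """Stand-in design: each flip-flop replaced by a master-slave latch pair."""
    m1 = s1 = m2 = s2 = 0
    outputs = []
    for d in inputs:
        # Clock high: slave latches are transparent, masters hold.
        s1, s2 = m1, m2
        outputs.append(s2)
        # Clock low: master latches are transparent, slaves hold.
        m1, m2 = d, s1
    return outputs


def first_mismatch(ref: Model, dut: Model,
                   vectors: Sequence[Sequence[int]]) -> Optional[Tuple[int, int]]:
    """Return (vector index, cycle) of the first differing output, else None."""
    for index, vector in enumerate(vectors):
        for cycle, (expected, actual) in enumerate(zip(ref(vector), dut(vector))):
            if expected != actual:
                return index, cycle
    return None


rng = random.Random(0)  # fixed seed so the run is repeatable
vectors = [[rng.randint(0, 1) for _ in range(16)] for _ in range(100)]
print(first_mismatch(ff_pipeline, latch_pipeline, vectors))
```

Random vectors can only expose a mismatch, not prove equivalence; a formal equivalence check would be needed for the stronger claim.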

Figures

Figures reproduced from arXiv: 1906.10666 by Huimei Cheng, Peter A. Beerel, Yichen Gu.

Figure 1: Converting a linear FF-based pipeline (a) to a 2-phase
Figure 2: Example non-linear pipeline requiring 4-phase clocking
Figure 3: Enabled (a) to gated clock (b) transformation
Figure 4: Duplicated clock gating logic for phase conversion
Figure 5: 3-phase clocks for modified retiming assignments.
Original abstract

Latch-based designs have many benefits over their flip-flop-based counterparts but have limited use partially because most RTL specifications are flop-centric and automatic conversion of FF to latch-based designs is challenging. Conventional conversion algorithms target master-slave latch-based designs with two non-overlapping clocks. This paper presents a novel automated design flow that converts flip-flop to 3-phase latch-based designs. The resulting circuits have the same performance as the master-slave based designs but require significantly fewer latches. Our experimental results demonstrate the potential for savings in the number of latches (21.3%), area (5.8%), and power (16.3%) on a variety of ISCAS, CEP, and CPU benchmark circuits, compared to the master-slave conversions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents a novel automated design flow that converts flip-flop based designs to 3-phase latch-based designs. It claims the resulting circuits achieve the same performance as master-slave latch designs while requiring significantly fewer latches, with experimental results on ISCAS, CEP, and CPU benchmarks demonstrating average savings of 21.3% in latches, 5.8% in area, and 16.3% in power compared to master-slave conversions.

Significance. If the conversion algorithm is shown to preserve functionality and timing, the work could meaningfully advance practical adoption of latch-based designs by automating conversion from common FF-centric RTL, potentially enabling area and power reductions without performance trade-offs on standard benchmarks.

major comments (2)
  1. [Abstract] Abstract: the central claims of functional equivalence, timing preservation, and quantitative savings rest on an automated conversion algorithm whose description, correctness argument, and verification method are entirely absent from the provided text. This is load-bearing because the reported 21.3% latch reduction is only meaningful if the 3-phase designs are provably equivalent to the original FF designs.
  2. [Experimental results] Experimental results section (implied by abstract): no details are given on the experimental setup, benchmark synthesis flow, timing analysis method, or any functional verification (simulation or formal) that would confirm the claimed performance parity and savings. Without these, the 5.8% area and 16.3% power figures cannot be assessed for reproducibility or validity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and for highlighting areas where the manuscript requires greater detail. We address the major comments point by point below and will revise the paper accordingly.

point-by-point responses
  1. Referee: [Abstract] Abstract: the central claims of functional equivalence, timing preservation, and quantitative savings rest on an automated conversion algorithm whose description, correctness argument, and verification method are entirely absent from the provided text. This is load-bearing because the reported 21.3% latch reduction is only meaningful if the 3-phase designs are provably equivalent to the original FF designs.

    Authors: We acknowledge that the abstract does not contain the algorithm description or verification details. The full manuscript body includes the conversion algorithm, but to strengthen the paper we will revise the abstract to briefly outline the algorithm's approach to functional equivalence and timing preservation, and add explicit references to the correctness argument and verification method. revision: yes

  2. Referee: [Experimental results] Experimental results section (implied by abstract): no details are given on the experimental setup, benchmark synthesis flow, timing analysis method, or any functional verification (simulation or formal) that would confirm the claimed performance parity and savings. Without these, the 5.8% area and 16.3% power figures cannot be assessed for reproducibility or validity.

    Authors: We agree that the experimental results section is insufficiently detailed. The revised manuscript will expand this section to fully describe the synthesis flow (tools, libraries, and constraints), timing analysis method, area/power estimation approach, benchmark details, and the functional verification process (including simulation and any formal checks) used to confirm equivalence and performance parity. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper describes an algorithmic conversion flow from flip-flop to 3-phase latch designs and validates it via direct experimental comparison on standard ISCAS/CEP/CPU benchmarks, reporting measured savings in latches, area, and power. No equations, fitted parameters, or uniqueness theorems appear in the provided material; the central claims rest on the existence of the implemented flow and its benchmark outcomes rather than any self-referential reduction or self-citation chain. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on any free parameters, axioms, or invented entities in the method.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1]

    One of two devices: edge-triggered flip-flops (FFs) or level-sensitive latches are typically used as synchronization and state storage

    INTRODUCTION The growing use of portable/wireless electronic systems and Internet-of-Things (IoT) applications motivates the desire of smaller and more energy-efficient designs in today’s very large scale integration (VLSI) circuits. One of two devices: edge-triggered flip-flops (FFs) or level-sensitive latches are typically used as synchronization and st...

  2. [2]

    BACKGROUND The Sakallah, Mudge, and Olukotun (SMO) model [18] defines an optimal framework for multi-phase latch-based designs. It defines a k-phase clock as a collection of k periodic signals with a common cycle time and associated timing constraints, called the General System Timing Constraints (GSTC). The phases (...

  3. [3]

    This section explores the implicit trade-offs associated with these constraints and motivates our three-phase clocking approach

    LATCH-BASED DESIGNS This paper’s goal is to convert an FF-based to latch-based design minimizing the number of latches based on a reasonable set of constraints. This section explores the implicit trade-offs associated with these constraints and motivates our three-phase clocking approach. 3.1 Minimal Constraints There are two constraints we adopt that ar...

  4. [4]

    The group of FFs converted to a single latch are assigned to clock phase p1

    CONVERSION ALGORITHM Our conversion approach is to automatically decompose the FFs into two groups, ones that will be converted to back-to-back connected latches and ones that will be converted into a single latch. The group of FFs converted to a single latch are assigned to clock phase p1. The remaining FFs are converted to latches clocked by either p1 ...

  5. [5]

    pi” program, ARM-M0 was running the “hello world

    EXPERIMENTAL RESULTS This section quantifies the benefits of the proposed conversion algorithm comparing the resulting 3-phase design to the original FF-based as well as traditional master-slave latch-based designs. The experiments rely on an industrial 28-nm FDSOI CMOS cell library and a range of circuits that include, ISCAS89 benchmark circuits [19], CE...

  6. [6]

    CONCLUSIONS This paper presents an algorithm to automatically convert a FF-based design into a 3-phase latch-based design that uses an ILP to minimize the number of required latches. Our experimental synthesis results on a broad range of benchmark circuits show significant savings are possible in both area and power with practical computational run-times...

  7. [7]

    Blue gene/L compute chip: Control, test, and bring-up infrastructure,

    R. A. Haring, R. Bellofatto, A. A. Bright, P. G. Crumley, M. B. Dombrowa, S. M. Douskey, M. R. Ellavsky, B. Gopalsamy, D. Hoenicke, T. A. Liebsch, J. A. Marcella, and M. Ohmacht, “Blue gene/L compute chip: Control, test, and bring-up infrastructure,” IBM Journal of Research and Development, vol. 49, no. 2.3, pp. 289–301, March 2005

  8. [8]

    Low power latch based design with smart retiming,

    K. Singh, H. Jiao, J. Huisken, H. Fatemi, and J. P. De Gyvez, “Low power latch based design with smart retiming,” in Quality Electronic Design (ISQED), International Symposium on. IEEE, 2018, pp. 329–334

  9. [9]

    Sub-threshold latch-based icyflex2 32-bit processor with wide supply range operation,

    M. Pons, T. Le, C. Arm, D. Séverac, J. Nagel, M. Morgan, and S. Emery, “Sub-threshold latch-based icyflex2 32-bit processor with wide supply range operation,” in 2016 46th European Solid-State Device Research Conference (ESSDERC), Sept 2016, pp. 33–36

  10. [10]

    The advantages of latch-based design under process variation,

    A. P. Hurst and R. K. Brayton, “The advantages of latch-based design under process variation,” in Proceedings of the IWLS, 2006

  11. [11]

    Bubble razor: An architecture-independent approach to timing-error detection and correction,

    M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. Harris, D. Blaauw, and D. Sylvester, “Bubble razor: An architecture-independent approach to timing-error detection and correction,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International. IEEE, 2012, pp. 488–490

  12. [12]

    Blade–a timing violation resilient asynchronous template,

    D. Hand, M. T. Moreira, H.-H. Huang, D. Chen, F. Butzke, Z. Li, M. Gibiluka, M. Breuer, N. L. V. Calazans, and P. A. Beerel, “Blade–a timing violation resilient asynchronous template,” in ASYNC. IEEE, 2015, pp. 21–28

  13. [13]

    Low-power pulse-triggered flip-flop design based on a signal feed-through scheme,

    J.-F. Lin, “Low-power pulse-triggered flip-flop design based on a signal feed-through scheme,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 1, pp. 181–185, 2014

  14. [14]

    Statistical time borrowing for pulsed-latch circuit designs,

    S. Paik, L.-e. Yu, and Y. Shin, “Statistical time borrowing for pulsed-latch circuit designs,” in Proceedings of the 2010 Asia and South Pacific Design Automation Conference. IEEE Press, 2010, pp. 675–680

  15. [15]

    Multi-bit pulsed-latch based low power synchronous circuit design,

    K. Singh, O. A. R. Rosas, H. Jiao, J. Huisken, and J. P. de Gyvez, “Multi-bit pulsed-latch based low power synchronous circuit design,” in Circuits and Systems (ISCAS), 2018 IEEE International Symposium on. IEEE, 2018, pp. 1–5

  16. [16]

    Short path padding with multiple-Vt cells for wide-pulsed-latch based circuits at ultra-low voltage,

    Y. Ding, W. Jin, G. He, and W. He, “Short path padding with multiple-Vt cells for wide-pulsed-latch based circuits at ultra-low voltage,” in 2017 IEEE 12th International Conference on ASIC (ASICON), Oct 2017, pp. 985–988

  17. [17]

    Pulsed-latch circuits: A new dimension in asic design,

    Y. Shin and S. Paik, “Pulsed-latch circuits: A new dimension in asic design,” IEEE Design & Test of Computers, vol. 28, no. 6, pp. 50–57, 2011

  18. [18]

    Timing optimization by replacing flip-flops to latches,

    K. Yoshikawa, Y. Hagihara, K. Kanamaru, Y. Nakamura, S. Inui, and T. Yoshimura, “Timing optimization by replacing flip-flops to latches,” in Proceedings of the Asia and South Pacific Design Automation Conference. IEEE Press, 2004, pp. 186–191

  19. [19]

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications,

    J. Cortadella, A. Kondratyev, L. Lavagno, and C. P. Sotiriou, “Desynchronization: Synthesis of asynchronous circuits from synchronous specifications,” IEEE Trans. on CAD, vol. 25, no. 10, pp. 1904–1921, 2006

  20. [20]

    Asynchronous design by conversion: Converting synchronous circuits into asynchronous ones,

    A. Branover, R. Kol, and R. Ginosar, “Asynchronous design by conversion: Converting synchronous circuits into asynchronous ones,” in Proceedings of the conference on Design, Automation and Test in Europe-Volume 2. IEEE Computer Society, 2004, pp. 870–875

  21. [21]

    Performance and area optimization of a bundled-data Intel processor through resynthesis,

    A. Saifhashemi, D. Hand, P. A. Beerel, W. Koven, and H. Wang, “Performance and area optimization of a bundled-data Intel processor through resynthesis,” in ASYNC, May 2014, pp. 110–111

  22. [22]

    Challenges in building an open-source flow from RTL to bundled-data design,

    Y. Zhang, H. Cheng, D. Chen, H. Fu, S. Agarwal, M. Lin, and P. A. Beerel, “Challenges in building an open-source flow from RTL to bundled-data design,” in Asynchronous Circuits and Systems (ASYNC), IEEE International Symposium on, 2018

  23. [23]

    Automatic retiming of two-phase latch-based resilient circuits,

    H. Cheng, H.-L. Wang, M. Zhang, D. Hand, and P. A. Beerel, “Automatic retiming of two-phase latch-based resilient circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018

  24. [24]

    Optimal clocking of synchronous systems,

    K. A. Sakallah, T. N. Mudge, and O. A. Olukotun, “Optimal clocking of synchronous systems,” in ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 1990, pp. 1–21

  25. [25]

    ISCAS89: International symposium on circuits and systems sequential benchmark. http://www.pld.ttu.ee/~maksim/benchmarks/iscas89/verilog/

    “ISCAS89: International symposium on circuits and systems sequential benchmark. http://www.pld.ttu.ee/~maksim/benchmarks/iscas89/verilog/.”

  26. [26]

    MIT-LL common evaluation platform (CEP),

    “MIT-LL common evaluation platform (CEP),” https://github.com/mit-ll/CEP, available: 2019

  27. [27]

    Plasma CPU,

    “Plasma CPU,” http://opencores.org/project,plasma, available: 2014

  28. [28]

    Rocket chip,

    “Rocket chip,” https://github.com/freechipsproject/rocket-chip, available: 2016

  29. [29]

    ARM Cortex M0,

    “ARM Cortex M0,” https://developer.arm.com/products/processors/cortex-m/cortex-m0

  30. [30]

    The case for retiming with explicit reset circuitry,

    V. Singhal, S. Malik, and R. K. Brayton, “The case for retiming with explicit reset circuitry,” in Proceedings of the 1996 IEEE/ACM international conference on Computer-aided design. IEEE Computer Society, 1997, pp. 618–625

  31. [31]

    Gurobi optimizer reference manual,

    L. Gurobi Optimization, “Gurobi optimizer reference manual,” 2018. [Online]. Available: http://www.gurobi.com

  32. [32]

    Closing the gap between ASIC & custom: tools and techniques for high-performance ASIC design,

    D. Chinnery and K. Keutzer, Closing the gap between ASIC & custom: tools and techniques for high-performance ASIC design. Springer Science & Business Media, 2002

  33. [33]

    Combinational profiles of sequential benchmark circuits,

    F. Brglez, D. Bryan, and K. Kozminski, “Combinational profiles of sequential benchmark circuits,” in IEEE International Symposium on Circuits and Systems, 1989, pp. 1929–1934