<table>
<thead>
<tr>
<th>Title</th>
<th>A blind dynamic fingerprinting technique for sequential circuit intellectual property protection</th>
</tr>
</thead>
<tbody>
<tr>
<td>Author(s)</td>
<td>Chang, Chip Hong; Zhang, Li</td>
</tr>
<tr>
<td>Date</td>
<td>2014</td>
</tr>
<tr>
<td>URL</td>
<td><a href="http://hdl.handle.net/10220/19302">http://hdl.handle.net/10220/19302</a></td>
</tr>
<tr>
<td>Rights</td>
<td>© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [<a href="http://dx.doi.org/10.1109/TCAD.2013.2282282">http://dx.doi.org/10.1109/TCAD.2013.2282282</a>].</td>
</tr>
</tbody>
</table>
A Blind Dynamic Fingerprinting Technique for Sequential Circuit Intellectual Property Protection

Chip-Hong Chang, Senior Member, IEEE and Li Zhang, Student Member, IEEE

Abstract—Design fingerprinting is a means to trace the illegally redistributed intellectual property (IP) by creating a unique IP instance with a different signature for each user. Existing fingerprinting techniques for hardware IP protection focus on lowering the design effort to create a large number of different IP instances without paying much attention on the ease of fingerprint detection upon IP integration. This paper presents the first dynamic fingerprinting technique on sequential circuit IP to enable both the owner and legal buyers of an IP embedded in a chip to be readily identified in the field. The proposed fingerprint is an oblivious ownership watermark independently endorsed by each user through a blind signature protocol. Thus the authorship can also be proved through the detection of different user’s fingerprints without the need to separately embed an identical IP owner’s signature in all fingerprinted instances. The proposed technique is applicable to both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) IPs. Our analyses show that the fingerprint is immune to collusion attack and can withstand all conceivable attacks, with a lower probability of removal than state-of-the-art FSM watermarking schemes. The probability of coincidence of a 32-bit fingerprint in the order of $10^{-9}$ and up to $10^{35}$ 32-bit fingerprinted instances can be generated for a small design of 100 flip-flops.

Index Terms—VLSI IPP, fingerprinting, watermarking, field authentication, system-on-chip, synthesis-for-testability

I. INTRODUCTION

Over the last decade and for many years to come, IP-based design methodology has been and will remain a key enabler to improving integrated circuit (IC) design productivity at advanced design processes. According to a recent market research: “The semiconductor IP market is growing in both the IC IP and System-on-Chip (SoC) IP sub-sectors, but the revenues from the SoC IP segment are expected to grow faster at an estimated compound annual growth rate of 19.16% from 2012 to 2017” [1]. This design paradigm has over years dramatically increased the organizational complexity of IC design and pushes for a vertical specialization of organization where stages of IC design are disintegrated, and outsourced to firms and relocated across national boundaries that possess the tacit knowledge and expertise at lower cost. A rising concern of this geographical dispersion of chip design activities is the contaminated chip supply due to the infiltration of counterfeit chips. Electronics Resellers Association International (ERAI), a private group that tracks and fights counterfeit electronics reported that dubious chips are increasing year by year [2]. The perils are alarming when thousands of fake chips from brand names such as Motorola, Intel, Cypress, Altera and National Semiconductors had been sold by a counterfeit-chip broker VisionTech before it was prosecuted in 2010 [3].

Prevailing means of IP protection includes external communications of legal protection, such as patents, copyrights and contracts, to deter illegal IP infringement, and licensing agreement and encryption to deter unauthorized use of IPs. Such approaches, while effective against the benign users, are inadequate to suppress misappropriation of IPs by aggressors, malicious clients, competitors and curious contractors. Owing to the easily accessible, easily integratable and uncommitted physical manifestation nature of reusable IPs, the fraudsters can easily dispute or challenge the validity of the patent or copyright behind an IP, especially if it is not broad, fundamental, or strong with regards to the obviousness test, novelty of idea, or prior art. The lengthy and costly legal process, together with the high degree of uncertainty over the outcome of IP litigation tends to discourage the pursuit of IP infringements. As cases of IP violations by customers are more difficult to prove, IP vendors usually do not take legal action against their customers, resulting in the buoyant IP theft.

As long as infringement remains easy and perpetration is difficult to prove, sizeable portion of the investment in IP development will be extorted. IP providers are in dire need of an effective mechanism to protect not only their ownership rights, but also the legitimacy of the usage of their IPs. Existing watermarking schemes [4-13], though capable of identifying the legal ownership of an IP in case of piracy, is unable to trace the guilty buyer from the unauthorized resold copies of his legally owned IP. This is because the distributed IP instances to different buyers are identical and carry the same mark. This problem can be solved by fingerprinting, which makes the IP instance distributed to each system integrator unique and distinguishable. In this way, when some user illegally redistributes his legally owned IP instance to another party for use in an unauthorized application, the fingerprint embedded in it can be extracted and used to identify the source of misuse.

IP fingerprinting shares much of the desiderata and challenges as IP watermarking, like functionality preservation, high credibility of ownership proof, low design overheads, robustness against removal attacks, transparency to existing design flow and ease of ownership verification. In addition, it
must be able to generate many different high-quality instances with reasonably low design effort. Thus, the intuitive way to fingerprint a design by repeating the entire optimization process of a typical constraint-based watermarking technique [5] to embed a different mark for each user is unacceptable. Very few proposals [14-17] can avoid the complexity of generating a sufficiently large number of quality fingerprinted instances.

The technique proposed in [14, 15] protects FPGA IPs at the physical design level. Fingerprint bits are embedded into the unused logic blocks in unique locations of different tile instances, and FPGA design tiling and partitioning are used to lower the cost of generating many different functionally equivalent circuit instances. As the logic blocks that host the marks are not the only differences between different fingerprinted instances, it is unlikely that tile comparison collusion will succeed in removing a large portion of the fingerprint. To facilitate the recovery of the recipient fingerprint, the owner signature and the recipient fingerprint are always placed in a constant location relative to one another. The reliance on the identification of the locations of ownership marks for the extraction of the correct instance recipient fingerprint may escalate the risk of common ownership mark locations as targets of attack. After all, marks hidden in unused logic blocks can be easily removed without altering the functionality. Moreover, the retrieval of FPGA configuration for mark validation restricts its applicability to mainly static memory based devices. On the other hand, the fingerprinting technique in [16] creates a large solution space by solving the problem only once. The problem solved is not the original problem, but a modified one. From the generated solution, different solutions are derived for the original problem. The general technique is proved by an NP-complete graph coloring problem. For specific IPs, finding the modification to obtain different solutions can be a challenge. The method in [17] solves the problem once to create a seed solution, from which smaller problems are generated. Fingerprinted solutions are then created by re-solving these smaller problems. While the design effort has been greatly reduced, the overhead is still unacceptable when the required number of different IP instances is large. All the above mentioned methods validate the fingerprint by indirect detection. Depending on the method and the level of abstraction where the mark was applied, non-trivial effort may be needed to check if the constraints generated by the fingerprint are satisfied from a deeply integrated IP. In short, none of the existing schemes are able to conveniently detect the embedded fingerprint off-chip after the IP is integrated and packaged as in the dynamic watermarking schemes of [7, 8].

In this paper, a new fingerprinting technique for the protection of sequential circuit IPs is proposed. The method is unique in many ways. It is the first dynamic fingerprinting scheme in which the embedded ownership and buyer’s identity of an embedded IP can be conveniently detected off-chip by injecting a specific input sequence. A single fused signature that carries the watermark for ownership proof and fingerprint for buyer identification, as opposed to the concatenation of two separate and independent signatures, is generated through a message exchange in a blind signature protocol. Both the watermark and fingerprint can be detected by running the fingerprinted design or the IC that contains the fingerprinted instance with a specific sequence of input bound to a buyer. The IP provider can easily identify the buyer without divulging any sensitive information about the fingerprint yet the buyer’s endorsement of the detected fingerprint is indisputable. These are in stark contrast to other fingerprinting techniques and are made possible by recoding the state variables through the embedding of a test machine into the sequential circuit to be fingerprinted. To the best of our knowledge, this is also the first work that exploits state encoding for fingerprinting sequential circuit IPs. Unlike finite state machine (FSM) watermarking on state transitions or topologies [9-12], which have a tendency to introduce obtrusive dummy inputs, replicated states or redundant transitions, state encoding modifies the state values without introducing new input variable or new state variable and have a global influence on the circuit structure. The fingerprint is thus inherently collusion-resistant as the signatures encoded into the state variables form an integral part of the solution space, and the marked and unmarked circuit components cannot be discriminated by the structural disparities between different fingerprinted instances. To reduce the effort of generating a large number of fingerprinted instances, variable chaining heuristic and dependency-directed partitioning are proposed to break a large sequential circuit into multiple smaller interconnected segments for test machine embedding. The optimized testable segmented circuits are used as seed design for state recoding by the signature. The overheads of different fingerprinted instances have been greatly reduced by the fusion of watermark and fingerprint, the state variable dependency minimization and the pre-optimization of partitioned seed designs. Our method is applicable to sequential circuit IPs targeted for both ASIC and FPGA implementations. As there is no prerequisite of specialized scan flip-flops (FFs) or specific FPGA configuration fabrics, our scheme is transparent and applicable to different technology families of FPGAs. With additional locking and activation mechanism, our fingerprinting technique can also be devised to perform active hardware metering like [18-21]. These schemes can effectively prevent instead of passively confirming IC piracy and chip overproduction by controlling the access to part of the chip’s functionality with a specific input sequence. Although schemes [18, 19] embed a watermark into the added obfuscation FSM, the watermark can only authenticate the IP author but not its buyer, and cannot be easily detected off-chip. By combining the active metering technique with our scheme, even if the locking function is cracked, the IP authorship and the buyer responsible for the fraudulent IP instance can still be traced by the fingerprint.

The rest of this paper is organized as follows. Section II introduces the property of test machine and the fundamentals of test machine embedding. The fingerprint generation, embedding and detection of the proposed fingerprinting method are presented in Section III. Section IV analyzes the security of the proposed scheme with respect to its credibility, robustness and feasibility. Experimental results are presented in Section V. The paper is concluded in Section VI.
II. PRELIMINARIES ON TEST MACHINE INSERTION

Our fingerprinting method is founded on the test machine embedding problem of synchronous sequential circuits [22-25]. This section presents the construction of a n-FF test machine and explains how its test property can be embedded into an n-FF sequential function to synthesize a testable design.

A. Test Machine Fundamentals

An FSM is a quintuple \( M = (S, I, O, \Delta, \Lambda) \), where \( S \) is a finite set of \( N \) states \( \{S_1, \ldots, S_N\} \), \( I \) is a finite set of \( p \) primary inputs \( \{y_1, \ldots, y_p\} \), \( O \) is a finite set of \( q \) primary outputs \( \{z_1, \ldots, z_q\} \), \( \Delta : S \times I \rightarrow S \) is the state transition function and \( \Lambda : S \times I \rightarrow O \) is the output function [26].

\( M \) can be represented by a state transition graph (STG), which is a directed graph whose vertices and edges correspond to the states and state transitions of \( M \); each edge is labeled with the input and output associated with the transition.

An FSM can be tested by checking if its transitions \( t_i = (S_i, S_j, y, z) \), \( S_i, S_j \in S \), \( y \in I \), \( z \in O \), \( S_j = \Delta(S_i, y) \) and \( z = \Lambda(S_i, y) \) are implemented as specified by its STG. This simple conformance test requires a sequence of inputs \( y \in I \) to bring the physical FSM from any state to \( S_j \) and a sequence of inputs to verify that the FSM reaches the designated state \( S_j \) after the stimulating transition \( t_i \).

A synchronizing sequence (SS) is defined as an input sequence which, when applied to any initial state of \( M \), will drive \( M \) to a single specific state [27]. Likewise, a distinguishing sequence (DS) can also be defined as a sequence of inputs such that for any state of \( M \), the sequence of outputs generated in executing that sequence uniquely identifies the state. The output response that uniquely identifies the state is called the distinguishing response (DR) of the state.

An \( n \)-FF test machine is a special FSM, denoted by \( M' = (S', I', O', \Delta', \Lambda') \), with \( N = 2^n \) states, one input \( y' \) and one output \( z' \). Any states \( S_i \in S', i \in \{0, N - 1\} \), of it can be set by applying an \( n \)-bit SS and be distinguished at the output by applying an \( n \)-bit DS. It can be constructed as follows:

\[
\Delta'(S_i, y') = \left\{ \begin{array}{ll}
S_i & \text{if } 2i < N \\
S_{2i-N} & \text{otherwise}
\end{array} \right. \quad (1)
\]

\[
\Delta'(S_i, y') = \left\{ \begin{array}{ll}
S_{2i+1} & \text{if } 2i + 1 < N \\
S_{2i+1-N} & \text{otherwise}
\end{array} \right.
\]

\[
\Lambda'(S_i, y') = \left\{ \begin{array}{ll}
b & \text{if } i < N / 2 \\
\bar{b} & \text{otherwise}
\end{array} \right. \quad (2)
\]

where \( a, b \in \{0, 1\} \) are the arbitrary constants assigned to the input and output of \( M' \).

Fig. 1 shows the STG for a 3-FF test machine. Let \( SS(S_i) \) be the binary representation of SS of state \( S_i \). By convention, the least significant bit (LSB) of \( SS(S_i) \) is input first. One important property of the test machine is that the output response is different from each initial state \( S_i \) regardless of the input sequence. Hence, any 3-bit sequence can be used as the DS. Let \( DR(S_i) \) denotes the binary representation of DR of state \( S_i \) such that the LSB of \( DR(S_i) \) is output first. The binary representations of SS and DR for the 3-FF machine in Fig. 1 are shown in Table I. If \( a = b = 0 \), then \( SS(S_i) \) and \( DR(S_i) \) will be equal to the bit reversal of the binary index \( i \) of \( S_i \).

![Fig. 1. STG of a 3-FF test machine.](image)

<table>
<thead>
<tr>
<th>SS</th>
<th>To state</th>
<th>( S_0 )</th>
<th>( S_1 )</th>
<th>( S_2 )</th>
<th>( S_3 )</th>
<th>( S_4 )</th>
<th>( S_5 )</th>
<th>( S_6 )</th>
<th>( S_7 )</th>
</tr>
</thead>
</table>
| \( ssa \) | \( S_0 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( sba \) | \( S_1 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( a\bar{a}a \) | \( S_2 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( \bar{a}\bar{a}a \) | \( S_3 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( \bar{a}a\bar{a} \) | \( S_4 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( a\bar{a}\bar{a} \) | \( S_5 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( \bar{a}a\bar{a}a \) | \( S_6 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)
| \( \bar{a}\bar{a}\bar{a} \) | \( S_7 \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \) | \( bbb \)

B. Test machine Insertion

For an arbitrary FSM, due to the limited controllability and observability, it is not always possible to obtain the SS and DS for all the states. These limitations are usually overcome by replacing the FFs of FSM by specialized scan FFs and chaining them to two externally added serial scan input and serial scan output pins. However, such scan insertion procedure is not a useful platform for fingerprinting a sequential design as it is difficult to keep the embedded signature stealthy due to the obtrusive and easily traceable scan FFs and scan IOs. Without introducing specialized scan FFs and additional IO pins, an alternative approach is to insert the test function into the gate-level design of \( M \) before logic synthesis by test machine insertion [25].

An \( n \)-FF test machine \( M' = (S', I', O', \Delta', \Lambda') \) can be embedded into an arbitrary \( n \)-FF FSM \( M = (S, I, O, \Delta, \Lambda) \) by isomorphic mapping of the states and IO symbols of \( M \) to those of \( M' \). The input \( y' \) and output \( z' \) of \( M' \) can be shared with an input \( y \in I \) and an output \( z \in O \) of \( M \), respectively by using multiplexers. To share the same set of FFs for multiplexing the next state functions, \( \Delta'(S', y') \) and \( \Delta(S, y) \), and the output functions, \( \Lambda'(S', y') \) and \( \Lambda(S, y) \), of \( M' \) and \( M \), respectively, the state variables, \( x_1', x_2', \ldots, x_n' \), of \( M' \) must be mapped to the state variables, \( x_1, x_2, \ldots, x_n \), of \( M \). The mapping can be performed in polynomial time for completely specified function without adding new transition to \( M' \) [24]. If \( M \) has unspecified transitions for input \( y \) at some state in \( S \), \( M' \) may not be compatible to \( M \).
obtain an isomorphic mapping, a minimum number of additional transitions may have to be introduced into $M$ to implicitly specify the responses of some of these unspecified transitions. A new test-enable (TE) input can then be introduced into $M$ to select between the normal function and the test function.

Let $\pi : X' \rightarrow X$ be the mapping of the set of variables from $X' = \{x'_j\}$ to $X = \{x_i\}$ such that $x_i = \pi(x'_j), \forall i, j \in [0, n-1]$ and $S' = S(\pi(X'))$. Then the next state and output equations of $M$ will be modified to

$$X = \Delta'(S(\pi(X'))), y \cdot TE + \Delta(S, y) \cdot \overline{TE} \quad (3)$$

$$O = \Delta'(S(\pi(X'))), y \cdot TE + \Lambda(S, y) \cdot \overline{TE} \quad (4)$$

where $\Delta$ and $\Delta'$ are the next state equations of $X$ and $X'$ of $M$ and $M'$, respectively.

The complexity of the merged next state and output decoders can be reduced by keeping its state variable dependency low in state variable mapping, with the help of a dependency graph (DG). A DG is a directed graph with each vertex representing a FF and each arc representing the signal flow from a FF’s output to the same or a different FF’s input by abstracting away the combinational logic between them. DG of the object machine $M$ can be obtained by tracing the netlist, but DG of the test machine $M'$ cannot be obtained before state assignment.

A two-block partition [26] on the state set $S$ of $M$ divides $S$ into two disjoint subsets, i.e., $B_1 \cup B_2 = S$ and $B_1 \cap B_2 = \emptyset$; Each subset of states, $B_1, B_2 \subset S$, is called a block. Let $s_i \equiv s_j$ denote that states $s_i$ and $s_j$ are in the same block of partition. A partition for $M$ is said to be closed if $\Delta(s_i, y) \equiv \Delta(s_j, y)$ for all $s_i \equiv s_j$ and $y \in I$, where $s_i, s_j \in S$.

To obtain a state assignment for $M'$, a two-block partition is induced by each state variable such that it is assigned the same value for all states in the same block of the partition. As there are $n$ two-block partitions and each block of a partition can be assigned either a ‘0’ or ‘1’ value for its state variable, there are $2^n$ different state encodings. The dependency of state variables is minimized if the $n$ two-block partitions of $M'$ are closed [26].

Fig. 2 shows the DG with the least state-variable dependency for the 3-FF test machine with the following state assignment:

$\text{FF1'}$: $(S_0, S_2, S_3, S_5), (S_1, S_3, S_5, S_7)$,
$\text{FF2'}$: $(S_0, S_1, S_3, S_5), (S_2, S_3, S_5, S_7)$,
$\text{FF3'}$: $(S_0, S_1, S_2, S_4), (S_5, S_6, S_7)$.

![Fig. 2. Minimum dependency graph of 3-FF test machine.](image)

As an example, consider the insertion of the 3-FF test machine into the ISCAS'89 benchmark circuit S27 shown in Fig. 3, with its DG extracted in Fig. 4. If $(\text{FF1'}, \text{FF2'}, \text{FF3'})$ of the test machine is mapped to $(\text{FF3}, \text{FF1}, \text{FF2})$ or $(\text{FF3}, \text{FF2}, \text{FF1})$ of S27, there is no increase in dependencies among the state variables.

![Fig. 4. State variable dependency graph of S27.](image)

Let $a = b = 0$ and $S'(x'_1, x'_2, x'_3)$ of $M'$ be encoded as: $S_0(101), S_1(100), S_2(111), S_3(110), S_4(001), S_5(000), S_6(011)$, and $S_7(010)$. From Fig. 1, the next state and output equations of $M'$ are given by: $x'_1 = y', x'_2 = \overline{x'_1}, x'_3 = \overline{x'_1}$ and $z' = \overline{x'_1}$, where $x'_1$ denotes the next state value of $x'$.

If $(x'_1, x'_2, x'_3)$ are mapped to $(x_1, x_2, x_3)$ of $M$, its state are encoded as: $S_0(101), S_1(001), S_2(111), S_3(011), S_4(100), S_5(000), S_6(110)$, and $S_7(010)$. From Fig. 3, the next state and output equations of $M$ are given by: $x'_1 = \overline{G_0} + \overline{f_1}$, $x'_2 = \overline{G_2} + f_2$ and $\text{OUT } = f_1$, where $f_1 = x_1 + f_3 \cdot f_4$, $f_2 = G_1 + x_3$, $f_3 = G_5 + \overline{G_0} \cdot x_2$ and $f_4 = \overline{G_0} \cdot x_2 + f_2$.

As $M$ has four input pins, if $G_0$ is arbitrarily selected to share with the input $y'$ of $M'$, then the next state and output equations after test insertion are given by:

$$x'_1 = \overline{G_0} \cdot \overline{TE} + \overline{G_2} + f_2 \cdot \overline{TE} \quad (5)$$

$$x'_2 = \overline{x_1} \cdot \overline{TE} + \overline{f_1} \cdot \overline{TE} \quad (6)$$

$$x'_3 = \overline{x_3} \cdot \overline{TE} + \overline{G_0} \cdot \overline{f_1} \cdot \overline{TE} \quad (7)$$

$$\text{OUT } = x_1 \cdot \overline{TE} + f_1 \cdot \overline{TE} \quad (8)$$

The circuits S27 after test insertion is shown in Fig. 5.

![Fig. 5. Schematic of S27 after test machine insertion.](image)

Table II shows the synthesis results of S27 after it has been embedded with each of the $2^3$ different encodings of $M$ and resynthesized by Synopsys Design Compiler (DC) using TSMC 0.18 µm technology. Different state encodings of the test machine results in different instances of varying implementation cost. The largest instance is 4.7% larger than the smallest one, the slowest instance is 11.5% slower than the fastest one, and the most power hungry instance consumes 5.1% more power than the least one.
III. PROPOSED FINGERPRINTING SCHEME

One unique challenge of IP fingerprinting over IP watermarking is the design effort required to derive a large number of distinct high-quality solutions of the same functionality for different buyers. Another challenge is the ease of recovering the signature of the embedded IP core off-chip without compromising its security against erasure and collusion attacks. Last but not least is the difficulty to lower the area and timing overheads of embedding both the authorship proof and IP buyer’s identification in the same design instance. The latter can be addressed by merging the watermark and fingerprint into one single signature but an associated problem is on keeping this signature unobtrusive without losing the non-repudiability of the authorship proof and origin of misappropriation. This section presents a fingerprinting method based on the test machine insertion problem. For ease of exposition, we refer to the original sequential circuit as object design M and the sequential circuit after the test machine insertion as testable design M. The testable design with the best synthesis result in terms of area or timing is chosen as the seed design M for fingerprinting.

A. Fingerprint Generation

Instead of independently generating the watermark and fingerprint as two independent signatures, our scheme uses a unitary signature to provide an undeniable proof of IP ownership and for traceability of illegal IP distribution. This signature is generated through a blind signature protocol [28] to preserve the anonymity of the IP authorship information endorsed by the buyer. The blind signature protocol requires a digital signature mechanism and a commuting function. For simplicity, we consider Chaum’s blind signature protocol based on RSA. Let \((n_A, e_A)\) and \(d_A\) be the certified RSA public and private keys of Buyer A, where \(n_A = p_A \cdot q_A\) is the product of two large random primes. The fingerprint \(F\) to be embedded into the IP instance for Buyer A can be generated through the following message exchange between the IP provider and Buyer A.

1. **Watermark generation:** The IP provider selects a message to convey its ownership information. The ASCII encoded message is converted into a binary string and reduced by a message digest (MD) such as SHA-2 [29] to an integer \(W\), where \(0 \leq W < n_A\).

2. **Blinding phase:** When Buyer A makes a purchase request, the IP provider randomly selects a secret integer \(k_A\) satisfying \(0 \leq k_A < n_A\) and \(\gcd(n_A, k_A) = 1\) to compute \(W_A = W \cdot (k_A)^e_A \mod n_A\). This concealed watermark \(W_A\) is then sent to Buyer A.

3. **Signing phase:** To endorse the purchase, Buyer A signs \(W_A\) with his secret key \(d_A\) and returns his blinded signature \(F_A = (W_A)^{d_A} \mod n_A\) to the IP provider.

4. **Unblinding phase:** By computing \(F = F_A \cdot k_A^{-1} \mod n_A\), the IP provider obtains the signature of Buyer A on his watermark \(W\), which is the fingerprint to be inserted into the IP instance sold to Buyer A.

The IP provider can verify if the fingerprint \(F\) is genuine by decrypting it with Buyer A’s public key. Since \(F\) is the digital signature of Buyer A on the IP provider’s watermark \(W\), it provides an undeniable proof for tracing the redistribution of the copies of IP bought by him. Meantime, as \(W\) is concealed in \(W_A\), Buyer A has no knowledge of the IP provider’s watermark \(W\) and his own signature on \(W\). The blindness of \(F\) increases the deterrent effect of uncertainty.

B. Fingerprint Insertion

By applying automatic test pattern generation on the seed design, a set of combinational test vectors can be obtained. Each test vector is then converted to a test sequence which includes an SS to excite a testable design to a specific state \(S_i\) at test mode (i.e., \(TE = ‘1’\)) and a parallel input stimulus to produce the output of a transition from \(S_i\) to some destination state \(S_d\) at capture mode (i.e., \(TE = ‘0’\)). Irrespective of the input stimulus, the destination state \(S_d\) from any starting state \(S_i\) can always be identified by its DR based on the test machine property. As demonstrated in Table I for \(n = 3\), the DR of any destination state can be observed from the primary output \(z\) by applying an arbitrary input sequence of length \(n\) through the primary input \(y\) at test mode.

To insert the fingerprint \(F = \{f_j\}_{j=0}^{m-1}\) into an object design, a test vector \(V\) for the seed design \(M\) is randomly selected. Let \(S_i\) and \(S_d\) be the initial and destination states of \(M\) for the test vector \(V\). To randomly modulate \(m\) out of \(n\) \((m << n)\) binary bits of \(DR(S_d)\) by \(F\), a keyed one-way pseudorandom number generator (PNG) is used to generate an order set \(L = \{l_i\}_{i=0}^{m-1}\), of \(m\) unique integers between 0 and \(n-1\) such that \(l_i \in [0, n-1]\), \(\forall i = 0, \ldots, m-1\) and \(l_i \neq l_j, \forall i \neq j\). \(l_i\) corresponds to the location of \(f_i\) where \(f_0\) is the LSB of \(F\). The keyed one-way PNG can be realized by SHA-2 or any other cryptographically secure hash function, as long as it is computationally infeasible to find a collision of the same group of numbers without the knowledge of the secret key.

According to \(L\), a fingerprint-compatible \(DR^* = \{r_j\}_{j=0}^{m-1}\) is generated by setting \(r_j = f_i\) if \(j = l_i\) for all \(i = 0, \ldots, m-1\) and \(r_j = -1\) otherwise, where \(-1\) denotes a don’t-care value.
A binary \(\{0, 1\}\) sequence \(A\) is said to be compatible to a ternary \(\{0, 1, \_\}\) sequence \(B\) of the same length, denoted as \(A \subseteq B\), if all but the don’t care values of \(A\) and \(B\) in the same bit positions are matched, i.e., \(a_i = b_i \lor a_i \in A, b_i \in B\) and \(b_i \neq \_\). If \(DR(S_d)\) is compatible with \(DR^*\), \(M_S\) will be used as the fingerprinted instance \(M'\). Otherwise, a subset \(S' = \{S_v \in |DR(S_d) \subseteq DR^*\}\) is extracted from the state set \(S\) of \(M_S\) such that the DRs of all its members are compatible with \(DR^*\). The fingerprinted instance can then be generated by modifying the state encoding of \(M_S\) by \(\phi: x_i \rightarrow \overline{x}, \forall i = 1, \ldots, n\), such that the encoded value of one of its states \(S''_d \in S'\) is equal to the encoded value of \(S_d\), i.e., \(S''_d = \phi(S_d)\). This recoding of state variables can be performed through the embedded test machine of \(M_S\). If there are more than one state encodings that satisfy this requirement, the testable design instance with the least overhead can be derived by replacing the encoded values \(S_i\) in Design 7, Design 3, Design 8 and Design 6 are mapped to the encoded value of \(S_i = "001"\) of \(M_S\). Among them, Design 3 has the least area and is selected as the fingerprinted design \(M'\), with \(\phi: x_i \rightarrow \overline{x}, S''_d = \phi(S_d) = S_d\) and \(S''_d = \phi(S_d) = S_d\). From Table I, \(SS(S'_d) = "001"\) and \(DR(S'_d) = "110"\). Thus, \(V^* = "001; 0100; \ldots "\). By applying \(V^*\) to \(M', f_0 = \_1\) can be detected from the second bit (i.e., \(l_0 = 1\)) of \(DR^*\).

### C. Partitioning for Efficient Fingerprinting

It is difficult to obtain an optimized seed design for fingerprinting by resynthesis when the number of FFs \(n\) is large. The resynthesis can be made more efficient by partitioning the object design into smaller interconnected segments. If each segment contains only \(c\) FFs, it becomes affordable to synthesize all the \(2^c\) testable design instances to obtain an optimized seed design for each segment if \(c < \_n\). The total number of resynthesis processes is reduced to \(\left[\frac{n}{c}\right] \cdot 2^c + [a]_b \cdot 2^H\), where \([\_]\) is the floor function and \([a]_b\) denotes the remainder of \(a\) divided by \(b\), and only a very small part of the design is resynthesized in each process.

As the DG of the test machine has a chain-like structure as shown in Fig. 2, multiple \(c\)-FF test machines can be easily cascaded to form a linear chain of state variables. To minimize the dependencies added in the DGs of the partitioned design, the longest chain removal technique [24] is used to chain the state variables of the object design before it is partitioned into smaller interconnected segments. The result is an ordered list of FFs that can be used to direct the partitioning of an \(n\)-FF object design \(M\). The following procedure outlines the state variable chaining heuristic.

1. Extract the DG of \(M\) by omitting the self-dependencies of the FFs.
2. Compute strongly connected components (SCCs)\(^1\) of the DG and replace each SCC in the DG with a single vertex to form a directed acyclic graph (DAG).
3. Obtain the longest chain of the DAG.
4. Obtain the longest chain in each SCC.
5. Replace the vertices representing the SCCs in the chain obtained in Step (3) with the chains obtained in Step (4) to form the longest chain of the DG.

---

\(^1\) A SCC of a directed graph \(G(V, E)\) is a maximal set of vertices \(C \subseteq V\) such that for every pair of vertices \(\{a, v\} \in C\), there is a directed path from \(a\) to \(v\) and a directed path from \(v\) to \(a\).
(6) Remove the vertices in the current longest chain and their associated edges from the DG. Repeat Steps (1) to (5) for the new DG. The newly extracted longest chain obtained in Step (5) is then prefixed to the existing longest chain.

(7) Repeat Step (6) until there is no vertex left in the DG.

To find the longest chain for a directed graph containing loops is an NP-complete problem [30]. Therefore, loops are removed in Step (2) to simplify the DG to DAG so that dynamic programming [31] with linear computational complexity can be used to extract the longest chain from the DAG. Finding the longest chain in a SCC is also an NP-complete problem. This problem in Step (4) is simplified by determining the longest chain from the spanning tree of SCC. The DG is reduced by pruning its longest chain in Step (6). The process repeats until no vertex is left in the DG. A linear chain of state variables for the object design is obtained with dependencies added merely for the connections from the tail of a newly extracted longest chain to the head of an existing chain between two iterations.

The algorithm for the partitioning of the object design into c-FF segments is shown in Fig. 7. The output list of each segment after the partitioning is recorded. The output list of segment i that are either primary outputs or inputs to FFs or gates in other segments.

dependency_directed_partitioning(M, FF, c)
{
M: Sequential design to be partitioned;
FF: List of FFs of M ordered according to state variable chain;
Gate: List of gates of M;
IO: List of primary inputs and primary outputs of M;
c: Predefined maximum number of FFs in each partition;
Gate[i].used = 0; FF[i].used = 0; // Initialize all gates and FFs of M
j = 0;
while (FF is not empty) {
Partition[j] = create_partition(); //create a new partition
if (number of unused FFs in FF > c)
Partition[j].FF += last c unused FFs of FF;
else
Partition[j].FF += all unused FFs of FF;
mark_used_FFs(); // mark those FFs added to Partition as used;
for each (Partition[j].element) { //element can be Gate or FF
Predecessor = get_predecessor(Partition[j].element);
if (Predecessor == FF[k] and FF[k] ≠ Partition[j].FF)
Partition[j].input += FF[k].output;
else if (Predecessor == Gate[k])
if (Gate[k].used == 0) { //Gate[k] is unused
Partition[j].gate += Gate[k];
Gate[k].used = 1;
} else //Gate[k] is used
if (Gate[k] ≠ Partition[j].Gate)
Partition[j].input += Gate[k].output;
else Partition[j].input += IO[k]; //Predecessor is IO[k]
} //end loop
j++;
} //end while loop
return Partition;
}

Fig. 7. Dependency-directed partitioning algorithm.

The object design M is partitioned into t smaller interconnected segments, M_0, M_1, ..., M_{t-1}, each with n_i FFs, where i ∈ [0, t-1] and M_0 denotes the segment that is nearest to the output z. Then an n_i-FF test machine is inserted into each M_i independently as described in Section II with 2^n_i possible state encodings. These 2^n_i different testable designs for M_i are resynthesized and the most optimized testable design is selected as the seed design M_S for M.

To disperse the fingerprint F = {f_j}_{j=0}^{m-1} with L = {l_i}_{i=0}^{m-1} location indices into the partitioned seed design, the fingerprint-compatible DR is divided into t ternary strings, DR_i = {t_{i,j}}_{j=0}^{n_i-1} for i = 0, ..., t-1 and r_{i,j} ∈ {0, 1, ‘-‘}. DR_i can be generated by setting r_{i,j} = f_k if \sum_{h=0}^{n_i} n_h + j = l_k, \forall k = 0, ..., m-1 and r_{i,j} = ‘-‘ otherwise. If all bits of DR_i are don’t cares, no fingerprint bit needs to be inserted into M_S and the fingerprinted instance M_i' ≡ M_S. Otherwise, the fingerprinting algorithm in Fig. 6 is used to insert the fingerprint bits in DR_i to produce the fingerprinted instance M_i'. All fingerprinted instances M_i' are then cascaded back together and resynthesized to produce the final fingerprinted design M'.

As an example, consider the ISCAS benchmark S344. Its DG after removing the self-dependencies is shown in Fig. 8, where the FF dependencies are represented by solid directed edges. Using the state variable chain-based heuristic, the existing longest FF chain FF1 → FF2 → FF3 → FF15 → FF4 → FF5 → FF6 → FF7 → FF8 → FF9 → FF10 → FF11 is first extracted. After removing the FFs in the longest chain from the DG, FF14, FF13 and FF12 are individually extracted as the longest chains in the subsequent iterations. By appending the existing chain to the newly extracted longest chains, the final chain of FFs is given by FF12 → FF13 → FF14 → FF1 → FF2 → FF3 → FF15 → FF4 → FF5 → FF6 → FF7 → FF8 → FF9 → FF10 → FF11. New dependencies are added to form this FF chain, which are indicated by dashed directed edges in Fig. 8.

Fig. 8. Dependency graph of S344 excluding self-dependencies. Solid edges are the existing dependencies; dark and dashed edges are respectively the selected and added dependencies of the final FF chain.
Assume the maximum number of FFs per segment is 3, the algorithm in Fig. 7 returns the following partitions:

\[ M_0 \text{ with } \{x_3(FF11), x_2(FF10), x_1(FF9)\}, \]
\[ M_1 \text{ with } \{x_3(FF8), x_2(FF7), x_1(FF6)\}, \]
\[ M_2 \text{ with } \{x_3(FF5), x_2(FF4), x_1(FF15)\}, \]
\[ M_3 \text{ with } \{x_3(FF3), x_2(FF2), x_1(FF1)\}, \]
\[ M_4 \text{ with } \{x_3(FF14), x_2(FF13), x_1(FF12)\}. \]

Table III shows the synthesized areas for each partitioned testable design \( M_i \). Designs 1 to 8 correspond to the 8 different state encodings of the embedded 3-FF test machine in the second column of Table II. The embedding is performed by \( \phi(x_1, x_2, x_3) \) of the test machine to \( (x_1, x_2, x_3) \) of each segment \( M_i \). The best designs, Design 6 for \( M_{S0} \), Design 3 for \( M_{S1} \), Design 5 for \( M_{S2} \), Design 1 for \( M_{S3} \) and \( M_{S4} \) are selected as the seed designs for fingerprinting.

Table III

<table>
<thead>
<tr>
<th>Design</th>
<th>( M_0 )</th>
<th>( M_1 )</th>
<th>( M_2 )</th>
<th>( M_3 )</th>
<th>( M_4 )</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>439.1</td>
<td>575.5</td>
<td>459.0</td>
<td>319.3</td>
<td>282.7</td>
</tr>
<tr>
<td>2</td>
<td>415.8</td>
<td>595.4</td>
<td>485.7</td>
<td>349.3</td>
<td>316.0</td>
</tr>
<tr>
<td>3</td>
<td>439.1</td>
<td>568.8</td>
<td>452.4</td>
<td>319.3</td>
<td>292.7</td>
</tr>
<tr>
<td>4</td>
<td>415.8</td>
<td>602.1</td>
<td>479.0</td>
<td>349.3</td>
<td>312.7</td>
</tr>
<tr>
<td>5</td>
<td>439.1</td>
<td>575.5</td>
<td>445.7</td>
<td>319.3</td>
<td>282.7</td>
</tr>
<tr>
<td>6</td>
<td>409.1</td>
<td>595.4</td>
<td>472.3</td>
<td>342.6</td>
<td>316.0</td>
</tr>
<tr>
<td>7</td>
<td>439.1</td>
<td>568.8</td>
<td>452.4</td>
<td>349.3</td>
<td>312.7</td>
</tr>
<tr>
<td>8</td>
<td>415.8</td>
<td>602.1</td>
<td>479.0</td>
<td>349.3</td>
<td>312.7</td>
</tr>
</tbody>
</table>

Assume a fingerprint \( F = "110001" \) and its location indices \( L = \{5, 11, 9, 2, 4, 0\} \). Then \( DR' = "---10---10--" \) is the fingerprint compatible \( DR \). It can be partitioned into \( DR'R_0 = 0-1", \ DR'R_1 = "10", \ DR'R_2 = "---", \ DR'R_3 = "1-0" \) and \( DR'R_4 = "---" \). Since \( DR'R_2 = DR'R_1 = "---", \ M_{S2} \) and \( M_{S4} \) can be used as fingerprinted instances \( M_i' \) and \( M_i'' \), respectively. Let \( V = "01110001011001; 001010111; " \) be the test vector selected for the interconnected seed designs \( M_{S4} \to M_{S3} \to M_{S2} \to M_{S1} \to M_{S0} \) with \( S_i = "S_6S_5S_4S_3S_2S_1S_0", \ S_d = "S_7S_6S_5S_4S_3S_2S_1S_0" \) and \( DR(S_d) = "11101011111001" \). The underlined bits of \( DR(S_i) \) are incompatible with \( DR' \). Since \( DR_0 = "001" \subset "0-1", \ M_{S0} \) can be used as the fingerprinted instance \( M_i' \), \( DR_1 = "111" \not\subset "10-" \). From Table I, for \( a = b = 0 \), \( DR(S_1) = "100" \) and \( DR(S_2) = "101" \) are compatible with "10-".

From Table II, the encoded value of \( S_i \) in Design 1 and \( S_i \) in Design 2 can be mapped to the encoded value of \( S_4 = S_5 = "101" \) of \( M_{S1} \). Design 1 has lower area than Design 2 of \( M_i \), and is selected as \( M_i' \), then \( \phi(x_1, x_2, x_3) \rightarrow (x_1, x_2, x_3) \), and \( S_i' = \phi(S_0) = S_4 \) and \( S_i'' = \phi(S_1) = S_5 \). From Table I, \( SS(S_0) = "001" \) and \( DR(S_0) = "101" \). Similarly, \( DR_1 = "010" \not\subset "1-0", \ DR(S_1) = "100" \) and \( DR(S_2) = "110" \) are compatible with "1-0". The encoded value of \( S_i \) in Design 7 and \( S_i \) in Design 5 can be mapped to the encoded value of \( S_4 = S_5 = "010" \) of \( M_{S2} \). Both designs have the same cost and either of them can be selected as the fingerprinted instance \( M_i' \).

If Design 7 is selected, then \( \phi(x_1, x_2, x_3) \rightarrow (x_1, x_2, x_3) \), and \( S_i' = \phi(S_0) = S_0 \) and \( S_i'' = \phi(S_1) = S_1 \). From Table I, \( SS(S_0) = "000" \) and \( DR(S_0) = "100" \). The fingerprinted instances can be chained to form \( M_{S4} \to M_i' \to M_{S2} \to M_i'' \to M_{S0} \) and resynthesized to produce the fingerprinted design \( M_i' \). Hence, \( V' = "011000001010101; 001011011; " \) and \( DR = "111010111101001" \), where the underlined bits are the modified values of \( S_i' \) and \( S_i'' \) of \( M_i' \) and \( M_i'' \) after fingerprinting. The area, delay and power consumption of the seed and final fingerprinted instances are (1616.6 \mu m^2, 1.19 ns, 1.13 mW) and (1623.3 \mu m^2, 1.18 ns, 1.18 mW), respectively.

D. Fingerprint Detection and Authorship Verification

The IP provider keeps a safe repository of records. Each record consists of the fingerprint \( F \), the secret integer \( k \) used in the blinding phase, the buyer’s public key \( (n, e) \), the fingerprint location vector \( L \) and the test vectors \( V' \) for each fingerprinted instance. To detect the authorship of a fingerprinted design, the IP provider can apply the test vector \( V' \) to obtain a \( DR \), from which the fingerprint \( F \) can be recovered at the bit locations specified by \( L \). The buyer’s public key \( (n, e) \) corresponding to \( F \) can then be retrieved from the database. The IP provider’s watermark can be detected by \( W = F^e \mod n \). The IP provider can prove his ownership of the IP by demonstrating that his legal ownership message can be hashed to \( W \). Since the IP provider’s watermark \( W \) can be successfully recovered by decrypting the fingerprint \( F \) with the specific IP buyer’s public key, it proves that the fingerprinted design is sold to that buyer. Thus the buyer of a fingerprinted design can be tracked and held responsible if the fingerprinted design was found to be illegally copied and redistributed.

Decryption \( F \) to detect \( W \) needs only to be done once but to detect \( F \) from a fingerprinted instance of unknown buyer may require the application of all the different test vectors \( V' \). To aid the detection of \( F \), the IP provider can store the responses \( DR \) to a common test vector \( V_o \) for all fingerprinted instances into the corresponding entries of the database. With high probability, \( DR \) for the same test vector \( V_o \) will be different for different instances. By applying \( V_o \) to the instance, the number of test vectors \( V' \) required to detect \( F \) will be reduced dramatically to only those few matching records of \( DR \).

E. Augmented Fingerprint

To increase the attackers’ difficulty to successfully erase the fingerprint \( F \), error correction code (ECC) can be added to it. More fingerprint bits will have to be modified in order to prevent \( F \) from being correctly detected by the IP provider. Owing to the randomized dispersion by \( L \), the ECC bits are automatically interleaved with the fingerprint bits in \( DR \). If necessary, \( F \) can be compressed by cryptographic hash function to adapt to the embedding capacity of the design. The IP provider needs only to demonstrate additionally that \( F \) can be hashed to the embedded bit stream recovered from the sold design.

IV. SECURITY ANALYSIS OF PROPOSED SCHEME

No hardware IP protection scheme is complete without an analysis of its credibility, robustness and feasibility. Credibility refers to the strength of ownership proof, robustness refers to the resistance against known and perceivable attacks, and feasibility refers to the effort in finding the number of quality
solutions required for the protection scheme. These attributes are analyzed below.

A. Fingerprint Credibility

A common measure to quantify and compare the credibility or strength of a fingerprinted solution is the probability of coincidence \(P_c\), which are the odds that a non-fingerprinted design carries the fingerprint by coincidence. A lower \(P_c\) implies a stronger proof of ownership and original recipient. In our scheme, since each bit of DR of a design is equally probable to be '0' or '1', the probability that \(m\) randomly selected bits in DR match a specific binary string of fingerprint is given by:

\[
P_c = \frac{1}{2^m}
\]  

(9)

With a 64-bit fingerprint, the \(P_c\) of a fingerprinted instance is as low as \(5.4 \times 10^{-20}\). This probability reduces to the order of \(10^{-40}\) for a 128-bit fingerprint.

B. Fingerprint Robustness

The following attack scenarios are analyzed with Alice as the IP provider using our fingerprinting scheme to protect her IP core and Bob as a malefactor attempting to steal her IP.

1) Collusion attack. This attack requires a coalition of multiple copies of fingerprinted instances in possession by one or more malicious users in order to create a forged copy without the fingerprint for redistribution. Bob may collude with other buyers to compare multiple protected instances to find out their structural dissimilarities, with the hope that these dissimilar parts of the circuits can be modified at an acceptable cost and effort to remove the fingerprint without losing the usefulness and correct functionality of the IP. To succeed in such an attack, the fingerprint must be altered to an extent that none of the users in the coalition can be identified by Alice.

While this attack is the biggest threat to data hiding algorithms on multimedia content in which the hidden signature (watermark or fingerprint) does not depend on the host data, several features in our fingerprinted instance make such attack ineffective. First, the fingerprint bits randomly modulate the values of state variables that affect the next state and output decoding logic in both the normal and test functions, making them inextricable from the functional components of the IP. This makes it very challenging for Bob to figure out the functionality of any structural mismatch in order to successfully remove even a single fingerprint bit. Secondly, the same segment \(M_s^i\) of different fingerprinted instances can be identical (i.e., both equal to \(M_s^b\)) or different irrespective of the value and number of fingerprint bits embedded in them. Lastly and most importantly, the entire chain of testable machines is resynthesized as a whole to obtain \(M_s^i\). The domino effect of the hard optimization problem involved in the resynthesis process whitens the differences between the fingerprint-modulated and unmodulated subcircuits, causing the entropy of identifying and associating the matching or mismatching subcircuits between any two instances to increase with the number of fingerprinted instances being compared.

2) Combinational logic resynthesis. Bob may attempt to remove the fingerprint by performing a combinational logic resynthesis through various approaches [32]. Although this attack can successfully modify the circuit structure by changing the combinational logic implementations without affecting the functionality, the input and output behaviors of the sequential elements are preserved. Thus, Alice can still recover the fingerprint \(F\) from the DR of the fingerprinted design modified by Bob. One characteristic of the test machine is that it can be decomposed into multiple smaller test machines and independently embedded into a correspondingly partitioned large machine. Although the original circuit topologies of these interconnected machines may not be maintained upon global optimization of the final testable design, the full controllability and observability of all FFs are not affected. As an example, consider the STGs of one-FF and two-FF test machines shown in Fig. 9. If they are interconnected such that the output of the first test machine is the input of the second test machine, then the \(2^1 \times 2^2 = 8\) combined states can be mapped to the 8 states for the 3-FF test machine of Fig. 1. Let the state of the final test machine be represented as \((S_{1,i}, S_{2,j})\) when the first test machine is in state \(S_{1,i}\) and the second test machine is in state \(S_{2,j}\). If the bijection maps \((S_{1,i}, S_{2,j})\) to \(S_1\) of the 3-FF test machine, then \(SS((S_{1,i}, S_{2,j}))\) and \(DR((S_{1,i}, S_{2,j}))\) of the combined machine can still be determined from \(SS(S_{1,i})\) and \(DR(S_{1,j})\), respectively in Table I for the 3-FF test machine. Our method utilizes this property to efficiently distribute the fingerprint bits and propagate the embedded fingerprint \(F\) to all subsequent stages of the design flow. The fact that \(SS(S_{1,i})\) and \(DR(S_{1,i})\) remain intact after the resynthesis of the final testable design implies that \(F\) will survive all forms of logic synthesis and optimization performed on the fingerprinted instance.

\[
\begin{align*}
\text{a) } & \quad S_{1,0} \quad S_{1,1} \\
\text{b) } & \quad S_{2,0} \quad S_{2,1} \quad S_{2,2} \quad S_{2,3}
\end{align*}
\]

(9)

Fig. 9. STGs of (a) 1-FF and (b) 2-FF test machines.

3) Circuit retiming. Circuit retiming relocates the FFs of sequential circuit to improve its performance while preserving the functionality. Bob may retine the fingerprinted instance. As retiming may change the STG of the circuit, it seems possible for Bob to partially remove the fingerprint inserted by Alice if the altered parts of the STG host the fingerprint bits.

According to [33], retiming merges two states that are one-step equivalent or splits one existing state into two new states that are one-step equivalent. Two states are said to be one-step equivalent if they have the same output and the same destination state for any input. Fortunately, each state of the test machine is distinct and the fingerprinted instance contains no one-step equivalent state. Hence, none of the states in the fingerprinted instance can be merged by circuit retiming. Meantime, as the test machine is fully specified, it is also not possible to split one state of the fingerprinted instance into two one-step equivalent states without adding extra sequential elements. Even if Bob manages to create one-step equivalent states at the cost of additional circuitry without losing the usefulness of Alice’s IP, the states and their associated
transitions involved in the fingerprint extraction are retained. Thus Alice can still recover the correct fingerprint from the DR of the retimed fingerprinted instance redistributed by Bob.

(4) State recoding. There are two possible ways by which Bob can recode the state variables of Alice’s protected design. The first way is to extract the STG directly from the fingerprinted design to carry out the state recoding. It has been proven that extracting the STG from a large sequential circuit after logic synthesis is computationally intractable [9]. Thus this approach is infeasible in practice. Alternatively, Bob may perform a partitioning on the fingerprinted instance to obtain many smaller segments of the design. With no knowledge of the design or architecture of Alice’s IP, Bob may only be able to extract a limited number of STGs for some segments to recode their state variables. Since the buyer has no knowledge of the embedded fingerprint $F$ and input vector $V$ for its recovery, Bob can only select the segments by wild guess. Due to the existence of loops (SCCs) in the dependency graph, the solution for the chaining of state variables with minimum additional dependencies is not unique. Various new SCCs have been formed after the embedding of test machines, and the final resynthesis of the entire fingerprinted design further masks the partitioning and chaining of state variables. Hence, it is highly unlikely that Bob could precisely relate the fingerprint bits to the corresponding state variables. It is reasonable to assume that Bob is incapable of extracting the STG for every partition, recode and re-synthesize the STG to forge a completely new design as this task incurs no less effort than the complete redesign of the IP functionality with a high, and almost certain risk of violating the circuit functionality or performance. Even if he succeeds in forging a new design, he will not be able to generate the proper test vectors for the counterfeit design without knowing the encodings used by the test machines, which will devalue his design.

(5) State reduction. IP protection schemes, such as [9], that leverage on redundancy to mark the states to display some specific property for authorship proof are susceptible to this kind of attack. This vulnerability does not exist in our fingerprinting scheme because the embedded test machine is a fully specified machine with distinct states. The final fingerprinted design contains no redundant state as, if any, it would have been reduced upon the resynthesis performed after fingerprint insertion. In addition, state reduction is usually performed on explicit STG [34], which implies a very high cost of attack on a fingerprinted design instance released at the gate level or lower level of design abstraction.

C. Fingerprint Feasibility

Generally, the feasibility of a fingerprinting scheme is evaluated by three main criteria. First, the solution space required for the generation of a large number of unique instances. Second, the additional cost and effort required to produce a fingerprinted instance. Preferably, the fingerprinting method is compatible with existing design tools and no new investment of tool is needed to embed or detect the fingerprint. The additional effort required should also be substantially lower than designing the IP from scratch. Finally, the impact on quality degradation of any fingerprinted instance should be minimized and kept within a tolerable limit.

The first criterion can be evaluated by the number of possible solutions to fingerprint an IP. The number of combinations to select $m$ state variables from a design of $n$ FFs to host the $m$ fingerprint bits is given by:

$$C_m(n) = \frac{n!}{m!(n-m)!}$$

(10)

As each state variable can be assigned either a value of ‘0’ or ‘1’, the number of different fingerprinted instances that can be generated by our scheme is:

$$2^n \cdot C_m(n) = \frac{2^n \cdot n!}{m!(n-m)!}$$

(11)

For $n = 100$ and $m = 32$, the solution space has exceeded $6 \times 10^{35}$.

For the second criterion, the fingerprint embedding process of our scheme is completely transparent and can be used with existing design tools and standard cell libraries for ASIC implementation. As no specialized scan FF is required, the scheme is also applicable to fingerprint IPs for different configuration technologies of FPGAs, including the antifuse- or flash-memory based FPGAs for which [15] may not be applicable. The additional design effort to generate each fingerprinted design instance is marginal after the seed design is obtained. Only those segments that are not compatible with its corresponding fingerprinted sequence $DR'$ at the fingerprint locations specified by $L$ need to be remapped and a final re-synthesis is needed to globally optimize the entire fingerprinted design after the segments have been restitched.

Lastly, our fingerprinting scheme incurs insignificant overhead on the design quality. This is attributed to the efficient dependency-directed partitioning algorithm and the utilization of seed designs, which are optimized for each partitioned testable machine. The low fingerprinting overhead will be demonstrated by experiments on different benchmark circuits in the next section.

V. Experimental Results

Sequential circuits from ISCAS’89 benchmark suite are used to analyze the quality of the fingerprinted instances of our scheme. Depending on the circuit size, the first 16, 32, 64, 128 or 256 bits of the keyed SHA-2 hash value of the fingerprint $F$ were inserted. The location integers $L$ were generated with a SHA-2 based PNG. The design partitioning, test machine embedding as well as the fingerprint insertion algorithms were coded in C++. Combinational test generation on the seed design was carried out with Dft tool DFTadvisor and ATPG tool Fastscan from Mentor Graphics. As part of the fingerprint insertion algorithm, the selected combinational test pattern is converted into a sequential test pattern of the form $\{SS; test vector; DR\}$. The synthesis was performed using Synopsis DC with TSMC 0.18 µm technology library, with C-Shell scripts written to automate the synthesis processes for the encoding instances of each segment as well as the whole design.

The synthesis results are shown in Table IV. The area and delay were simulated by Synopsys DC, and the power by Synopsys PrimeTime PX with back annotation. The area, delay and power overheads due to fingerprint insertion are on average
Let $R_F = m/n$ represents the fingerprinting ratio, which is the number of fingerprint bits to the number of bits in DR. The area overhead increases with $R_F$ for all designs. This is expected as the originally optimized encoding instance will have to be recoded in more segments as $R_F$ increases. However, the overhead of fingerprinting one design with a smaller $R_F$ may not necessarily be lower than that of another design with a larger $R_F$. This could be due to the design and topology dependent merging of functional and test logic of different encoding instances. For some circuits, the differences in area, delay and power among different encoding instances are very small. The fingerprinting overheads of these designs may only increase mildly with $R_F$ as opposed to those that have a large quality gap between a suboptimal and the most optimized encoding instances.

As area is used as the selection criterion for the seed design, the delay and power overheads have wider spreads than the area overhead among different designs. For instance, some fingerprinted designs are more than 18% slower than their seed designs while some other could be about 17% faster. This inconsistency is because a smaller instance may not necessarily be faster or consume less power. Hence, the fingerprinted segment that uses a suboptimal encoding instance may have shorter delay or consumes less power than the optimal encoding instance. This explains why the delays or power consumptions of some fingerprinted designs in Table IV are smaller than their seed designs.

The choice of state encoding influences the complexity of the logic functions of the FSM and hence the area, timing and power dissipation of the synthesis results. Thus, the policy for selecting the optimal state encoding can be adapted to the optimization target. Table V shows the synthesis results using area-delay product as the selection criterion if the area and delay of the fingerprinted design are equally important. This selection policy reduces the delays of fingerprinted instances from those of Table IV by 0.17% on average but their areas are also correspondingly increased by 0.12% on average.

**Table IV**

Fingerprinting overheads with state-encoding for area reduction. $FF$ is the number of FFs; $A_o$, and $D_o$, $P_o$, and $P_n$, $A_n$, and $D_n$, $P_n$ are areas in $\mu$m$^2$, delays in ns and powers in mW of the non-fingerprinted optimized seed design and fingerprinted design, respectively. $A_{\Delta}$, $D_{\Delta}$ and $A_{\Delta}P_{\Delta}$ are the respective percentage increases.

<table>
<thead>
<tr>
<th>Circuit</th>
<th>$FF$</th>
<th>$A_o$</th>
<th>$D_o$</th>
<th>$P_o$</th>
<th>$m$</th>
<th>$A_n$</th>
<th>$D_n$</th>
<th>$P_n$</th>
<th>$A_{\Delta}$ (%)</th>
<th>$D_{\Delta}$ (%)</th>
<th>$A_{\Delta}P_{\Delta}$ (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>S382</td>
<td>21</td>
<td>2212</td>
<td>1.35</td>
<td>1.26</td>
<td>16</td>
<td>2229</td>
<td>1.42</td>
<td>1.33</td>
<td>0.77</td>
<td>5.19</td>
<td>5.32</td>
</tr>
<tr>
<td>S388</td>
<td>32</td>
<td>3323</td>
<td>3.60</td>
<td>1.82</td>
<td>16</td>
<td>3453</td>
<td>3.31</td>
<td>1.71</td>
<td>3.91</td>
<td>-8.06</td>
<td>-6.04</td>
</tr>
<tr>
<td>S1423</td>
<td>74</td>
<td>8452</td>
<td>5.39</td>
<td>5.10</td>
<td>16</td>
<td>8472</td>
<td>5.38</td>
<td>5.42</td>
<td>0.24</td>
<td>-0.19</td>
<td>6.28</td>
</tr>
<tr>
<td>S378</td>
<td>179</td>
<td>17347</td>
<td>1.68</td>
<td>12.0</td>
<td>32</td>
<td>17414</td>
<td>1.65</td>
<td>12.5</td>
<td>0.39</td>
<td>-1.79</td>
<td>4.17</td>
</tr>
<tr>
<td>S9234</td>
<td>211</td>
<td>24143</td>
<td>2.67</td>
<td>14.3</td>
<td>64</td>
<td>17540</td>
<td>1.78</td>
<td>12.6</td>
<td>1.11</td>
<td>5.95</td>
<td>5.00</td>
</tr>
<tr>
<td>S15850</td>
<td>534</td>
<td>54151</td>
<td>4.04</td>
<td>34.5</td>
<td>64</td>
<td>54862</td>
<td>3.96</td>
<td>38.8</td>
<td>1.31</td>
<td>-9.82</td>
<td>12.46</td>
</tr>
<tr>
<td>S1307</td>
<td>638</td>
<td>56496</td>
<td>3.18</td>
<td>35.5</td>
<td>128</td>
<td>57633</td>
<td>3.16</td>
<td>36.1</td>
<td>2.01</td>
<td>-6.33</td>
<td>1.69</td>
</tr>
<tr>
<td>S3854</td>
<td>1426</td>
<td>159274</td>
<td>3.94</td>
<td>95.5</td>
<td>128</td>
<td>161234</td>
<td>3.57</td>
<td>98.1</td>
<td>1.23</td>
<td>3.78</td>
<td>2.72</td>
</tr>
<tr>
<td>S38417</td>
<td>1636</td>
<td>160097</td>
<td>3.40</td>
<td>67.7</td>
<td>128</td>
<td>167677</td>
<td>3.78</td>
<td>67.7</td>
<td>0.95</td>
<td>11.18</td>
<td>0.00</td>
</tr>
<tr>
<td>S35932</td>
<td>1728</td>
<td>150506</td>
<td>1.49</td>
<td>112</td>
<td>256</td>
<td>155572</td>
<td>1.52</td>
<td>120.9</td>
<td>3.49</td>
<td>2.01</td>
<td>7.95</td>
</tr>
</tbody>
</table>

In order to evaluate the area and timing overheads of our fingerprinted circuits on FPGA, designs in Table IV are implemented on Xilinx XC6VCX240T device. The results are shown in Table VI. The number of slice registers used by each design is the same as $FF$ in Table IV. For most designs, similar trend of increasing overhead with $R_F$ is observed. On average, the area and delay overheads are 1.8% and 1.4% respectively. Unlike the fingerprinted solutions of [15], the area and delay of our fingerprinted solutions can be improved as manifested by the negative $A_{\Delta}$ and $D_{\Delta}$ values in Table VI. This is possible because the resynthesist after state recoding may perturb the optimization heuristic to the FPGA fabric mapping, placement and routing problems to escape the local minima. The fingerprinted design may be easier to be placed and routed than the seed design, leading to the use of less logic slices or shorter interconnections for nearly half of the fingerprinted solutions. In addition, our technique can also be applied to IPs targeting antiuse and flash memory based FPGA technologies, which are not feasible for [15]. Due to the dissimilar approaches to logic minimization, the optimal encodings for FPGA may be different from ASIC. The impact of encoding and decoding logic on power consumption is usually higher in FPGA than in ASIC, particularly for large FSMs. Among the 2$^{n}$ different minimal-bit encodings, Gray encoding and those with reduced hamming distance of the most probable state transitions are more likely to lead to lower resource utilization and power consumption.

Reported experimental results of fingerprinting methods [14–17] based on only one or a few fingerprint instances for each design are inadequate to demonstrate their capability to generate a large number of reasonably high-quality fingerprinted designs. To evaluate this capability, the same experimental settings for obtaining the data in Table IV were used to synthesize 100 different fingerprinted instances with...
randomly generated fingerprints and location integers for each design. The minimum, maximum and average areas and delays of the 100 fingerprinted designs for each circuit are listed in Table VII. The minute margin of error percentages, %Aerr and %Derr, indicate that different fingerprinted instances of the same IP produced by our technique have consistent quality. By comparing the mean area and mean delay in Table VII with the seed design area $A_s$ and delay $D_s$ in Table IV, a more informative group average fingerprinting overheads, $\Delta A$ and $\Delta D$, of 1.4% and 2.3% respectively are obtained.

**TABLE VI**

<table>
<thead>
<tr>
<th>Circuit</th>
<th>$A_s$</th>
<th>$D_s$</th>
<th>$m$</th>
<th>$A_m$</th>
<th>$D_m$</th>
<th>$\Delta A$ (%)</th>
<th>$\Delta D$ (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>S382</td>
<td>43</td>
<td>2.58</td>
<td>16</td>
<td>41</td>
<td>2.34</td>
<td>-4.65</td>
<td>-9.44</td>
</tr>
<tr>
<td>S838</td>
<td>57</td>
<td>2.61</td>
<td>16</td>
<td>60</td>
<td>2.79</td>
<td>5.26</td>
<td>7.18</td>
</tr>
<tr>
<td>S1423</td>
<td>174</td>
<td>4.99</td>
<td>16</td>
<td>175</td>
<td>5.60</td>
<td>1.15</td>
<td>12.19</td>
</tr>
<tr>
<td>S5378</td>
<td>223</td>
<td>3.54</td>
<td>32</td>
<td>166</td>
<td>5.51</td>
<td>-4.60</td>
<td>10.85</td>
</tr>
<tr>
<td>S9234</td>
<td>422</td>
<td>4.16</td>
<td>32</td>
<td>237</td>
<td>3.79</td>
<td>4.04</td>
<td>7.01</td>
</tr>
<tr>
<td>S15850</td>
<td>784</td>
<td>6.05</td>
<td>64</td>
<td>799</td>
<td>5.43</td>
<td>1.91</td>
<td>-10.31</td>
</tr>
<tr>
<td>S13207</td>
<td>709</td>
<td>6.04</td>
<td>64</td>
<td>698</td>
<td>5.68</td>
<td>1.27</td>
<td>-5.98</td>
</tr>
<tr>
<td>S35854</td>
<td>2660</td>
<td>6.43</td>
<td>128</td>
<td>2669</td>
<td>6.48</td>
<td>0.34</td>
<td>0.82</td>
</tr>
<tr>
<td>S38417</td>
<td>2565</td>
<td>6.94</td>
<td>128</td>
<td>2730</td>
<td>6.97</td>
<td>0.43</td>
<td>0.42</td>
</tr>
<tr>
<td>S35932</td>
<td>1729</td>
<td>3.58</td>
<td>128</td>
<td>1749</td>
<td>4.37</td>
<td>1.16</td>
<td>-0.31</td>
</tr>
</tbody>
</table>

To the best of our knowledge, no FSM fingerprinting technique has been reported so far. Among the FSM watermarking schemes, [8] is the only technique that deals explicitly with the problem of field authentication (or off-chip detection) of embedded sequential circuit IP ownership. To facilitate the comparison of our technique with [8], the same synthesis tool SIS [35], technology library msu.genlib, optimization script script.algebraic and ISCAS benchmark circuits in blif format as [8] are used to produce our fingerprinted circuit results in Table VIII. In most cases, our fingerprinted designs have lower area than the watermarked designs of [8], with an average area reduction of about 2% even though reduction of hardware overhead has been well acknowledged as a more challenging problem in fingerprinting than in watermarking. Fingerprinting by state variable encoding of the embedded partitioned test machine has not only resulted in better area improvement but also provided a stronger integration of test and functional logic than the scan-chain based synthesis-for-testability technique. The delay comparison is inconclusive however. Due to the cost function used by the algebraic script from SIS, its metaheuristic tends to create a larger delay discrepancy between the originally optimized cost surface and the alternate cost surfaces than Synopsys design compiler.

**TABLE VII**

<table>
<thead>
<tr>
<th>Circuit</th>
<th>$m$</th>
<th>$A_m$</th>
<th>$D_m$</th>
<th>$\Delta A$ (%)</th>
<th>$\Delta D$ (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>S3330</td>
<td>32</td>
<td>26906</td>
<td>45</td>
<td>30024</td>
<td>33504</td>
</tr>
<tr>
<td>S3384</td>
<td>32</td>
<td>42704</td>
<td>69.6</td>
<td>42920</td>
<td>47736</td>
</tr>
<tr>
<td>S9234</td>
<td>32</td>
<td>57824</td>
<td>67.6</td>
<td>58752</td>
<td>59520</td>
</tr>
<tr>
<td>S6669</td>
<td>32</td>
<td>69720</td>
<td>142.8</td>
<td>70728</td>
<td>71536</td>
</tr>
<tr>
<td>S15850</td>
<td>64</td>
<td>12918</td>
<td>132.6</td>
<td>130568</td>
<td>130928</td>
</tr>
<tr>
<td>S13207</td>
<td>64</td>
<td>11863</td>
<td>128.4</td>
<td>123890</td>
<td>124384</td>
</tr>
<tr>
<td>S38584</td>
<td>64</td>
<td>338816</td>
<td>469.8</td>
<td>343424</td>
<td>348572</td>
</tr>
<tr>
<td>S38417</td>
<td>64</td>
<td>389320</td>
<td>387</td>
<td>391768</td>
<td>372288</td>
</tr>
<tr>
<td>S35932</td>
<td>64</td>
<td>330968</td>
<td>354.8</td>
<td>333712</td>
<td>366528</td>
</tr>
<tr>
<td>B14</td>
<td>64</td>
<td>137512</td>
<td>143.6</td>
<td>137896</td>
<td>139664</td>
</tr>
<tr>
<td>B15</td>
<td>64</td>
<td>220610</td>
<td>132.4</td>
<td>222208</td>
<td>219152</td>
</tr>
<tr>
<td>B21</td>
<td>64</td>
<td>287080</td>
<td>154.2</td>
<td>288184</td>
<td>298468</td>
</tr>
<tr>
<td>B22</td>
<td>64</td>
<td>428008</td>
<td>215.4</td>
<td>429448</td>
<td>429832</td>
</tr>
<tr>
<td>B17</td>
<td>64</td>
<td>710784</td>
<td>374.4</td>
<td>710960</td>
<td>690296</td>
</tr>
</tbody>
</table>

We also compared the areas and delays of our fingerprinted circuits from Table V with those of [9] for the same signature lengths of 64 and 128 bits. The percentage area and delay overheads for each circuit are shown in Table IX. Overall, our scheme incurs smaller area and delay overheads. Moreover, being essentially a watermarking scheme, [8] and [9] are incapable of generating many high-quality topologically different designs (for different marks) of the same IP.
Table IX

<table>
<thead>
<tr>
<th>Circuit</th>
<th>m</th>
<th>$\Delta A$ (%)</th>
<th>$\Delta D$ (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>proposed [9]</td>
<td>proposed [9]</td>
</tr>
<tr>
<td>S5378</td>
<td>64</td>
<td>1.17</td>
<td>2.50</td>
</tr>
<tr>
<td></td>
<td></td>
<td>-2.38</td>
<td>-7.50</td>
</tr>
<tr>
<td>S9234</td>
<td>64</td>
<td>2.93</td>
<td>2.97</td>
</tr>
<tr>
<td></td>
<td></td>
<td>-7.17</td>
<td>-1.50</td>
</tr>
<tr>
<td>S15850</td>
<td>64</td>
<td>0.96</td>
<td>0.86</td>
</tr>
<tr>
<td></td>
<td>128</td>
<td>1.58</td>
<td>2.06</td>
</tr>
<tr>
<td></td>
<td></td>
<td>-2.40</td>
<td>-2.28</td>
</tr>
<tr>
<td>S13207</td>
<td>64</td>
<td>1.39</td>
<td>3.40</td>
</tr>
<tr>
<td></td>
<td>128</td>
<td>1.46</td>
<td>5.15</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1.38</td>
<td>-3.38</td>
</tr>
<tr>
<td>S38584</td>
<td>128</td>
<td>0.86</td>
<td>0.75</td>
</tr>
<tr>
<td></td>
<td>128</td>
<td>0.80</td>
<td>-5.17</td>
</tr>
<tr>
<td></td>
<td></td>
<td>-8.71</td>
<td>-2.99</td>
</tr>
<tr>
<td>S38477</td>
<td>128</td>
<td>1.91</td>
<td>-5.02</td>
</tr>
<tr>
<td>S35932</td>
<td>128</td>
<td>1.91</td>
<td>-5.02</td>
</tr>
</tbody>
</table>

The probability of successful removal of fingerprint bits by state recoding is also analyzed. Based on the resilience analysis of Section IV.B, Bob is able to recode only a small number $r$ ( $r \ll n$ ) of state variables from a few randomly created partitions. The probability that the $r$ state variables he recoded had already been embedded with exactly $j$ fingerprint bits is given by $\left(\frac{mC_i}{mC_i} \cdot \frac{mC_i}{mC_i} \cdots \frac{mC_i}{mC_i}\right)^j / mC_r$, and the probability of modifying exactly $i$ bit values by recoding a $j$-bit binary code is given by $C_r \left(\frac{j}{2}\right) \frac{(\frac{j}{2})^i}{i!} = C_r \left(\frac{j}{2}\right)^i$. Thus, the probability of successfully removing $k$ ($k \leq m$) or more fingerprint bits by randomly recoding $r$ out of $n$ state variables is given by:

$$P_r(E \geq k) = \sum_{j = 0}^{m} \left(\frac{mC_i}{mC_i} \cdot \frac{mC_i}{mC_i} \cdots \frac{mC_i}{mC_i}\right)^j / mC_r^{j!}$$ (12)

Table X compares the robustness of the proposed method with [8] by the probability of successfully erasing a quarter of the mark, assuming conservatively that the attacker is able to recode $r = m$ state variables for our method and randomly derange $r = m$ FFs for [8]. It can be seen that the proposed method is generally more robust against removal attacks than [8]. As we assume $r = m$, $P_r$ is relatively high when $R_{F}$ is large. When $R_{F}$ is around 0.1, $P_r$ reduces drastically to the order of $10^{-8}$. The probability of removal is expected to be reduced across-the-board if $r < m$. This observation suggests that it is unlikely for state recoding attack to succeed in altering a large fraction of the fingerprint bits even with a moderate $R_{F}$. With ECC, it is highly improbable to remove even a single bit of fingerprint for a reasonable-size design. Given a minimum fingerprint length $m_{\text{min}}$ for an allowable probability of coincidence and an empirically determined $R_{F_{\text{max}}}$ for an allowable probability of erasure, the smallest IPs should contain no less than $n_{\text{max}}$ FFs, where

$$n_{\text{max}} = \frac{m_{\text{min}}}{R_{F_{\text{max}}}}$$ (13)

VI. CONCLUSION

A new and robust fingerprinting technique on sequential circuits that offers a convenient way to identify the legal IP owner and users has been proposed. A large number of high quality fingerprinted instances can be created from an originally optimized seed design of the IP with incremental design effort. The overheads incurred by different instances have been effectively bounded to a reasonably low level by our method due to its state variable chaining heuristic, dependency based partitioning algorithm and the use of a single blind signature to double as IP ownership and buyer identity marks. Our security analysis shows that the fingerprint credibility increases exponentially with fingerprint length and is immune to collusion attack, circuit resynthesis, retiming and state reduction attacks. The risk of fingerprint removal by state recoding attack is significantly reduced with the augmentation of ECC or by keeping the ratio of fingerprint length to state variable number low. From 100 fingerprinted instances for each of the tenISCAS benchmark circuits tested, the average area and delay overheads were found to be 1.4% and 2.3% respectively with negligible margins of error for 95% confident interval.

Table X

<table>
<thead>
<tr>
<th>Circuit</th>
<th>#FF</th>
<th>m</th>
<th>$P_r(E \geq m/4)$</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>proposed [8]</td>
</tr>
<tr>
<td>S3330</td>
<td>132</td>
<td>32</td>
<td>2.6E-02</td>
</tr>
<tr>
<td>S3384</td>
<td>183</td>
<td>32</td>
<td>3.5E-03</td>
</tr>
<tr>
<td>S9234</td>
<td>228</td>
<td>32</td>
<td>8.1E-04</td>
</tr>
<tr>
<td>S6669</td>
<td>239</td>
<td>32</td>
<td>5.8E-04</td>
</tr>
<tr>
<td>S15850</td>
<td>597</td>
<td>64</td>
<td>1.0E-08</td>
</tr>
<tr>
<td>S13207</td>
<td>669</td>
<td>64</td>
<td>9.8E-14</td>
</tr>
<tr>
<td>S38584</td>
<td>1452</td>
<td>64</td>
<td>1.6E-14</td>
</tr>
<tr>
<td>S38417</td>
<td>1636</td>
<td>64</td>
<td>1.6E-14</td>
</tr>
<tr>
<td>S35932</td>
<td>1728</td>
<td>64</td>
<td>6.8E-15</td>
</tr>
<tr>
<td>B14</td>
<td>245</td>
<td>32</td>
<td>4.9E-04</td>
</tr>
<tr>
<td>B15_1</td>
<td>449</td>
<td>64</td>
<td>2.7E-06</td>
</tr>
<tr>
<td>B21</td>
<td>490</td>
<td>64</td>
<td>8.1E-07</td>
</tr>
<tr>
<td>B22</td>
<td>735</td>
<td>64</td>
<td>2.6E-09</td>
</tr>
<tr>
<td>B17</td>
<td>1415</td>
<td>64</td>
<td>1.5E-13</td>
</tr>
</tbody>
</table>

REFERENCES


Technological, and C.

ation of

est function

Provably Secure Active IC Metering Techniques for

, "On the optimization power of retiming and

ower

3rd ed., Cambridge, Mass.:

, GA,

"A robust FSM watermarking scheme for IP protection of


L. Zhang, and C. H. Chang, "State encoding watermarking for field

authentication of sequential circuit intellectual property," in Proc. IEEE

Int. Conf. on Circuits and Syst., Seoul, Korea, May 2012, pp. 3013-3016.

J. Lach, W. H. Mangione-Smith, and M. Potkonjak, “FPGA

fingerprinting techniques for protecting intellectual property,” in Proc.


J. Lach, W. H. Mangione-Smith, and M. Potkonjak, “Fingerprinting

techniques for field-programmable gate array intellectual property


G. Qu, and M. Potkonjak, “Fingerprinting intellectual property using

constraint-addition,” in Proc. Des. Autom. Conf., CA, USA, June 2000,

pp. 587-592.

A. E. Caldwell et al., “Effective interactive techniques for fingerprinting


R. S. Chakraborty, and S. Bhunia, "Hardware protection and

authentication through netlist level obfuscation," in Proc. IEEE/ACM


R. S. Chakraborty, and S. Bhunia, "HARPOON: An obfuscation-based

SoC design methodology for hardware protection,” IEEE Trans.


F. Koushanfar, Y. Alkabani, “Active control and digital rights

management of integrated circuit IP cores,” in Proc. Int. Conf.

on Compilers, architectures, and synthesis for embedded systems.,


F. Koushanfar, “Provably Secure Active IC Metering Techniques for


V. Agrawal, and K.-T. Cheng, “Finite state machine synthesis with

embedded test function,” Journal of Electronic Testing, vol. 1, no. 3,


architecture for interconnected finite state machines,” in Proc. Int. Conf.


S. Kanjalil, S. T. Chakradhar, and V. D. Agrawal, “Test function

embedding algorithms with application to interconnected finite-state


S. Kanjalil, S. T. Chakradhar, and V. D. Agrawal, “A partition and

resynthesis approach to testable design of large circuits,” IEEE Trans.


Z. Kohavi, and N. K. Jha, Switching and Finite Automata Theory. 3rd ed.,


C. Pixley, S. W. Jeong, and G. D. Hachtel, “Exact calculation of

synchronization sequences based on binary decision diagrams,” in Proc.


D. Chaum, “Blind signatures for untraceable payments,” in Advances in

Cryptography, D. Chaum, R.L. Rivest, & A.T. Sherman Eds., Plenum,

1982, pp. 199-203.

National Institute of Standards and Technology (NIST). Secure Hash


G. D. Hachtel, and F. Somenzi, Logic Synthesis and Verification


R. K. Ranjan et al., “On the optimization power of retiming and

resynthesis transformations,” in Proc. ACM/IEEE Int. Conf.


J. K. Rho et al., “Exact and heuristic algorithms for the minimization of


E. M. Sentovich et al., “Sequential circuit design using synthesis and


Chip-Hong Chang (S’92–M’98–SM’03) received the B.Eng. (Hons.) degree from

the National University of Singapore in 1989, and the M. Eng. and Ph.D. degrees from Nanyang

Technological University (NTU), Singapore, in 1993 and 1998, respectively. He served as

a Technical Consultant in industry prior to joining the School of Electrical and Electronic

Engineering (EEE), NTU, in 1999, where he is currently an Associate Professor. He holds

joint appointments with the university as Assistant Chair of Alumni of the School of

EEE since June 2008, Deputy Director of the Center for High Performance Embedded Systems from 2000 to 2011, and the

Program Director of the Center for Integrated Circuits and Systems from 2003 to

2009. He has coedited one book, published four book chapters and around 200 research papers in refereed international journals and conferences. His current research interests include hardware security and trust, low power arithmetic circuits and digital filter design.

Li Zhang (S’11) received the B.Eng. (Hons.) degree in Electrical and Electronics

Engineering from Nanyang Technological University (NTU), Singapore, in 2010. He is

currently pursuing the Ph.D degree in the division of Circuits and Systems, School of

EEE, NTU, Singapore.

His research interests are in hardware security, including hardware IP watermarking

and fingerprinting, active IC metering, and hardware Trojan detection and prevention.