<table>
<thead>
<tr>
<th>Title</th>
<th>Circuit-simulated obstacle-aware Steiner routing</th>
</tr>
</thead>
<tbody>
<tr>
<td>Author(s)</td>
<td>Shi, Yiyu; Mesa, Paul; Yu, Hao; He, Lei</td>
</tr>
<tr>
<td>Date</td>
<td>2007</td>
</tr>
<tr>
<td>URL</td>
<td><a href="http://hdl.handle.net/10220/8742">http://hdl.handle.net/10220/8742</a></td>
</tr>
<tr>
<td>Rights</td>
<td>© 2007 ACM. This is the author created version of a work that has been peer reviewed and accepted for publication by ACM Transactions on Design Automation of Electronic Systems, ACM. It incorporates referee’s comments but changes resulting from the publishing process, such as copyediting, structural formatting, may not be reflected in this document. The published version is available at: [<a href="http://dx.doi.org/10.1145/1255456.1255465">http://dx.doi.org/10.1145/1255456.1255465</a>].</td>
</tr>
</tbody>
</table>
Circuit-Simulated Obstacle-Aware Steiner Routing

YIYU SHI, PAUL MESA, HAO YU, and LEI HE
University of California, Los Angeles

This article develops circuit-simulated routing algorithms. We model the routing graph by an RC network with terminals as inputs, and show that the faster an output reaches its peak, the higher the possibility for the corresponding Hanan or escape node to become a Steiner point. This enables us to select Steiner points and then apply any minimum spanning tree algorithm to obtain obstacle-free or obstacle-aware Steiner routing. Compared with existing algorithms, our algorithms have significant gain on either wirelength or runtime for obstacle-free routing, and on both wirelength and runtime for obstacle-aware routing.

Categories and Subject Descriptors: B.7.2 [Integrated Circuits]: Design Aids
General Terms: Algorithms, Design, Performance
Additional Key Words and Phrases: Routing, simulation, RSMT, OARSMT

1. INTRODUCTION

Rectilinear Steiner minimum tree (RSMT) construction is a fundamental research problem in VLSI design. For a given set of terminals, the RSMT problem is to find a set of additional points, namely, Steiner points, such that the rectilinear minimal spanning tree (RMST) connecting all terminals and Steiner points has minimal length. The RSMT problem is NP-complete [Garey and Johnson 1977]. Yet, a few properties have been revealed to help solve this problem: An optimal RSMT can be found in the Hanan grid, which is composed of the

Some initial results of this article appeared in the 2006 Proceedings of the IEEE/ACM Design Automation Conference (DAC) [Shi et al. 2006]. This work was partially supported by NSF CAREER Award CCR-0093273/0401682.
Authors' addresses: Y. Shi, P. Mesa, H. Yu, L. He (contact author), Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA 90095; email: lhe@ee.ucla.edu.
horizontal and vertical lines from each terminal. Also, at most \( n - 2 \) Steiner points are required to construct an optimal RSMT [Hwang et al. 1992].

**GeoSteiner** [Warme et al. 2007] is an exact algorithm with high complexity. The following heuristics have been proposed to improve algorithm efficiency:

- **1-Steiner** [Kahng and Robins 1995] iteratively adds one Steiner point each time to reduce wirelength. A primal-dual approach based on 1-Steiner has also been proposed [Mandoiu and et al. 2000]. Recent work to further reduce runtime complexity includes the following: A spanning-graph-based \( O(n \log n) \) heuristic was proposed in Zhou [2003]. It uses a spanning graph to help in both generating the initial spanning tree as well as finding good candidates for the edge substitution. A batched greedy triple-based \( O(n \log^2 n) \) heuristic **FastSteiner** [Kahng et al. 2003] was also proposed, which is based on a batched version of the greedy triple contraction algorithm of Zelikovsky [1993]. The lookup-table-based heuristic **FLUTE** [Chu and Wong 2005] is another fast and accurate technique to perform rectilinear Steiner minimal tree (RSMT) construction. There is a user-defined parameter to control the tradeoff between accuracy and runtime. FLUTE is optimal and extremely fast for nets up to degree 9. As it handles low-degree nets particularly well, it is most suitable for VLSI applications in which most nets have degree 30 or less.

The aforementioned approaches all assume no obstacles for routing. In practice, macrocells, IP blocks, and prerouted nets are considered as obstacles for routing. Therefore obstacle-avoiding RSMT (OARSMT) construction must be studied. An escape graph [Ganley and Cohoon 1994] is often used to convert the OARSMT problem to a RSMT problem on a graph. The distance between two points in the presence of obstacles is calculated based on the obstacle-avoiding shortest path algorithm [Zheng et al. 1996]. Zachariasen and Winter [1999] presented an exact algorithm for obstacle-avoiding Euclidean Steiner tree construction, but its high complexity prohibits it from practical use. As opposed to the case for RSMT with no obstacles, few heuristics have been proposed for OARSMT due to the difficulty in handling obstacles. A line search heuristics was introduced in Highower [1969] and Mikami and Tabuchi [1968], but the routing quality is not good for multiple terminals. Most existing obstacle-avoiding RSMT algorithms (e.g., Akers [1967], Soukup [1978], Hadlock [1977], and Rubin [1974]) use multiterminal variants of the maze algorithms, with a high space demand but a result far from optimal. An-OARSMan was proposed recently [Hu et al. 2005] based on ant colony optimization. A greedy obstacle penalty-distance (OP-distance) local heuristic was used in the algorithm and performed on the track graph. It achieves short wirelength with reduced, but still long, runtime.

The quality of an RSMT is measured by the so-called Steiner ratio [Hwang et al. 1992], namely, the ratio of Steiner tree length over MST length. The existing heuristics for RSMT and OARSMT improve either runtime (e.g., Chu and Wong [2005]) or quality (e.g., Zhou [2003], Hu et al. [2005]), but not both, for a large number of terminals. Ideally, we need an algorithm to achieve the best quality and efficiency of existing work simultaneously. To this end, we propose an algorithm, cktSteiner, which simulates the routing problem by circuit behavior. In our algorithm, the routing graph is modeled as an RC mesh. When impulse currents are applied at the terminals, we use dominant voltage
responses at Hanan nodes to decide Steiner points. The faster a node reaches its peak voltage, the higher the possibility for the node to become a Steiner point. Therefore, we can easily select Steiner points from Hanan nodes and build high-quality RSMT and OARSMT efficiently. We call the resulting algorithms cktSteiner. Similar to 1-Steiner, we develop both 1-cktSteiner and B-cktSteiner algorithms, depending on whether one or multiple Steiner points are iteratively added to build or improve the routing.

cktSteiner has a few advantageous algorithmic features. It uses numerical circuit simulation to determine Steiner points, while virtually all existing approaches use combinatorial algorithms. cktSteiner applies to both RSMT and OARSMT with a small runtime difference, but existing RSMT algorithms either cannot be extended to the OARSMT problem or suffer a big runtime increase. Because cktSteiner simulates routing by circuit behavior, it is a new addition to existing simulation-based algorithms such as simulated annealing, genetic, and force-based (placement) algorithms that have been successfully used in VLSI design.

We have also observed exciting experimental results. When there are no routing obstacles, 1-cktSteiner obtains similar wirelength compared with the best existing heuristic FastSteiner [Kahng et al. 2003]. Both are less than 1% worse than the exact solution, but 1-cktSteiner is up to 11.3x faster than FastSteiner. Compared with the fastest existing heuristic FLUTE, B-cktSteiner has a similar runtime but up to 1.9% shorter wirelength. Without modifications, 1-cktSteiner and B-cktSteiner can directly be applied to routing with obstacles and the runtime increase is minimal. Compared with the best existing obstacle-avoiding An-OARSMAn algorithm, 1-cktSteiner has similar runtimes and reduces wirelength by 6.12% and B-cktSteiner has an average speedup of 357x with similar wirelength. Compared with the Magma routing package [Magma 2007] containing eight routing algorithms, 1-cktSteiner reduces the chip-level total wirelength by up to 1.23% and net-based wirelength by up to 8.15%.

The remainder of the article is organized as follows: Section 2 describes the problem modeling and validates the key observation upon which our cktSteiner algorithm is proposed. Section 3 introduces the cktSteiner algorithm and speedup techniques. Section 4 presents experimental results. Section 5 concludes.

2. CIRCUIT MODEL FOR ROUTING

In this section, we first show how a routing graph can be mapped into an RC circuit. Then the relationship between the Steiner points and time-domain responses for the RC circuit is revealed, which is the basis of the cktSteiner algorithm.

2.1 Problem Formulation

In this article we start with the routing model as in Albrecht [2000] and tessellate the routing area into rectangular partitions as global tiles, and pins within the same global tile are mapped into the center of the tile. The routing plane can be formally modeled by an undirected graph $G_h(V, E)$, namely, the
Hanan grid, where each vertex $v \in V$ represents a global tile, and each edge $e \in E$ represents the routing area between two adjacent tiles. An example of the Hanan grid is shown in Figure 1(a). To consider the impact of obstacles such as hard macros or prerouted nets on routing, the routing graph $G_e$ should be constructed by intersecting lines from vertices, as well as the edges of the obstacles. We call the routing graph an escape graph [Ganley and Cohoon 1994]. An example of an escape graph is shown in Figure 1(b).

Without loss of generality, in this article, we use a uniform $G$, or global routing graph (GRG), which is fine enough so that all the Hanan nodes, or nodes of the escape graph are located on its nodes, that is, the Hanan grid $G_h$ or the escape graph $G_e$ is the subgraph of $G$. It is obvious that any Steiner tree constructed in the Hanan grid or escape graph can also be found in our routing graph. Such a routing graph is shown in Figure 1(c), which applies to both (a) and (b). By introducing this routing graph, routing with and without obstacles are indistinguishable from each other, except that the distance between two nodes should be the obstacle-avoiding distance in the case of obstacle-avoiding routing.

With respect to the preceding discussions, we formulate the following problem.

**Formulation 1.** Given a routing graph $G$ as constructed before with an embedded multiterminal net, find a set of Steiner points in $G$ such that the resulting Steiner routing of the multiterminal net has minimum routing wirelength.

2.2 Circuit Model and Its Implication

To map the GRG to an RC-mesh circuit model, we model each edge of the GRG with a unit resistor, and connect each vertex of the GRG to ground via a unit
Circuit-Simulated Obstacle-Aware Steiner Routing

Fig. 2. (a) Illustration of a routing graph with a 3-terminal net (circle) and its corresponding Hanan nodes (triangle); (b) time-domain responses at nodes nos. 39, 61, and 63 in (a). Note that for the y-axis, a logarithm scale is used. For node no. 61 a different x-axis range is used. It reaches voltage peak faster than others and is the Steiner point.

capacitor and a unit resistor in parallel. Terminals are modeled as input ports, each with a unit impulse-current source. The Hanan nodes or nodes of the escape graph are modeled as output ports. Such a circuit model is illustrated in Figure 1(d). Note that when there are obstacles, we still keep the resistors and capacitors in the obstacle area.1

With a unit impulse-current source at each terminal at time \( t = 0 \), the signals start to propagate until the steady state is reached. It takes a finite time for the signal to propagate throughout the mesh and to charge the capacitors. Then the signal at one node starts to decay through the DC path of the grounded resistors. We define peak time as the time for the voltage response to reach its last peak value.

Take Figure 2 as an example. Shown in Figure 2(a) is a net with three terminals (labeled with circles) embedded within a routing graph \( G \) where the corresponding Hanan nodes are also marked (with triangles). For ease of presentation, we assign a label to each node in \( G \). The voltage responses at the Hanan nodes (vertices nos. 39, 61, and 63) are shown in Figure 2(b). The peak time at vertex no. 61 is smaller than those at the other two nodes, and vertex no. 61 is the Steiner point for the 3-terminal set.

In the preceding example, the fastest voltage response indicates the Steiner point. Similar phenomena can be observed in other cases, too. We randomly generate 20 test cases with 100 terminals, and construct the optimal RSMT by GeoSteiner [Warne et al. 2007] to get all the Steiner points. We sort all the Hanan nodes in sequence with increasing peak times and calculate the probability for a Hanan node with a given order in the sequence to become a Steiner point in one of these optimal solutions. The results are shown in Figure 3. The x-axis is the order in the sequence, and the y-axis is the normalized probability. The probability decreases when the order of the Hanan node in

1Therefore, the circuit model is independent of obstacles. This enables us to precalculate circuit behavior and apply it to nets with different obstacles, as described in Section 3.2.
Fig. 3. The probability for a Hanan node to become a Steiner point with respect to its order in the sequence for 20 test cases, each with 100 terminals.

the sequence decreases and the peak time increases. One can conclude the following.

Observation 1. A Hanan node is more likely to become a Steiner point when the voltage response of the corresponding node in the RC mesh reaches its peak earlier.

Note that RSMT is not necessarily unique for a multipin net. For example, we might find that, in fact, nodes 1, 3, and 7 form an optimal RSMT, while nodes 2, 4, and 6 form another. We consider multiple optimal solutions in our tree construction and probability calculation.

2.3 Theoretical Justification

In this section, we theoretically justify Observation 1 by giving an exact proof for 3-terminal cases followed by a qualitative extension to cases with more terminals.

Given a 3-terminal net, we label the rectilinear distance between node A and terminal no. 1 as $r_1$, and similarly define $r_2$ and $r_3$. Then the RSMT problem becomes

$$\min_p r_1 + r_2 + r_3 \quad (1)$$

such that $p \in$ the set of Hanan nodes.

If the vertex $p$ that minimizes Eq. (1) is one of the three terminals, then no Steiner points need to be added and an example for such a case is shown in Figure 4(a). Otherwise, the Steiner point $p$ is added to achieve the minimum wirelength, as shown in Figure 4(b).
Now we prove that the vertex \( p \) that minimizes Eq. (1) is exactly the vertex that has the minimum peak-time \( t_{peak} \). This is done by demonstrating that the objective

\[
\min_p t_{peak}
\]

is a good approximation of Eq. (1).

The effective resistance \( R_{x,y} \) between two vertices \((0, 0)\) and \((x, y)\) in a uniform resistor mesh can be calculated from the following formula [Atkinson and van Steenwijk 1999]:

\[
R_{x,y} = \frac{1}{\pi} \int_0^\pi \frac{d\beta}{\sinh|\alpha|} [1 - e^{-|\alpha|\cos(p\beta)}],
\]

(3)

where \( \alpha \) and \( \beta \) are complex numbers satisfying

\[
\cos\alpha + \cos\beta = 2.
\]

(4)

Moreover, Eq. (3) can be well approximated by

\[
R_{x,y} \approx \frac{1}{2} \sqrt{r},
\]

where \( r \) is the rectilinear distance between the two vertices. This is demonstrated in Figure 5.

A similar result can be derived for the effective capacitance. Therefore, we can compute the time constant between any two vertices from the lumped RC model, namely,

\[
t_{peak} \propto RC \propto \sqrt{r} \times \sqrt{r} = r,
\]

(6)

which indicates that the delay is linearly proportional to the rectilinear distance. We denote the constant coefficient as \( k \), that is, \( t_{peak} = kr \).

With this important result, we proceed to show that Eq. (3) well approximates (1) in different cases. Without loss of generality, we assume \( r_1 \leq r_2 \leq r_3 \). We plot the waveforms for the three terminals separately in Figure 6: \( V_1 \) is generated by keeping the current source at the first terminal and removing the sources at the second and third terminals. \( V_2 \) is computed by only keeping the source at the second terminal and \( V_3 \) by only keeping the source at the third. The real
Fig. 5. The relationship between effective resistance and the rectilinear distance of two vertices, and its square-root approximation.

Fig. 6. Peak time with respect to the delay to each of the three terminals in three different cases.
waveform at this node is the superposition of the three waveforms plotted. We discuss the following three cases.

**Case 1:** \( r_2 \ll r_3 \) (Figure 6(a)). As the figure clearly states, the peak time for the waveform, after superposing waveforms from the three terminals, is mainly decided by \( t_3 = kr_3 \). Therefore, Eq. (3) becomes

\[
\min_{p} kr_3. \tag{7}
\]

Since \( r_2 \ll r_3 \), (1) becomes

\[
\min_{p} r_3. \tag{8}
\]

**Case 2:** \( r_1 \ll r_2 \leq r_3 \); \( r_2 \) and \( r_3 \) are close (Figure 6(b)). As the figure clearly states, the peak time for the superposed waveform is approximately \( (t_2 + t_3)/2 = k(r_2 + r_3)/2 \). Therefore, (3) becomes

\[
\min_{p} k(r_2 + r_3)/2. \tag{9}
\]

Since \( r_2 \ll r_3 \), (1) becomes

\[
\min_{p} r_2 + r_3. \tag{10}
\]

**Case 3:** \( r_1 \leq r_2 \leq r_3 \); \( r_1, r_2, \) and \( r_3 \) are close (Figure 6(c)). As the figure clearly states, the peak time for the superposed waveform is approximately \( (t_1 + t_2 + t_3)/3 = k(r_1 + r_2 + r_3)/2 \). Therefore, (3) becomes

\[
\min_{p} k(r_1 + r_2 + r_3)/3. \tag{11}
\]

Obviously, (1) and (11) are the same.

In conclusion, (3) well approximates (1) for all 3-terminal nets. Accordingly, Observation 1 is valid for all 3-terminal nets.

In general, for nets with more terminals, Observation 1 can be explained as follows: Steiner points tend to have small distances to all nearby terminals. Similarly, in the mesh, the time constant between two points is nearly proportional to their distance. Therefore, the more likely that a node is a Steiner point, the smaller the weighted distance from this node to the terminals, and in turn the smaller the RC time constant for the node, and finally the smaller its peak time.

Due to the earlier described nature of our algorithm, it can work well for most nets. However, for those nets where the pins are extremely unevenly distributed (some terminals of the nets are far away), the accuracy of our algorithm will be impacted. This is because the effects of those terminals are too weak to impact the location of the Steiner points. As a remedy for those cases, we scale the magnitude of the input current source proportional to the sum of its distances to all other terminals. Further improving our algorithm by scaling the magnitudes of the input sources in general cases is still a topic of ongoing research.

3. CKTSTEINER ALGORITHMS

In this section, we propose our circuit-simulation-based algorithm to construct the RSMT for a given set of terminals. We map the global routing graph (GRG)
Fig. 7. A Hanan node must reside inside the convex boundary defined by all terminals to become a possible Steiner point and an output port.

to an uniform RC mesh and add impulse-current sources at the terminals. Then Observation 1 is used to select potential Steiner points. We want to emphasize that once the potential Steiner points are known, any minimum spanning tree (MST) algorithm can be used to construct the RSMT.

3.1 Steiner Tree Construction Algorithm

We first build the circuit model for the routing graph, which has been discussed in detail in Section 2: Each edge of the routing graph is replaced with a uniform resistor, and each node is connected to the ground via a parallel resistor and capacitor. Then an impulse-current source is added at each terminal. Note that not all, but only those Hanan nodes satisfying the following constraint, are considered as output ports and potential Steiner points.

For obstacle-free routing, only Hanan nodes inside the convex boundary defined by all terminals can become Steiner points, as shown in Figure 7.

After the circuit is constructed, we can simulate it and build a sorted Hanan node sequence in ascending order according to their peak times (i.e., in descending order of the likelihood of being a Steiner point). Because at most \( n - 2 \) points need to be added into an RSMT [Hwang et al. 1992], we can use our ordered Hanan node sequence to construct the RSMT based on the 1-Steiner heuristic. Our algorithm is faster than other 1-Steiner based heuristics in the sense that it need not employ a special algorithm to select Steiner points during the tree construction.

In detail, we first calculate the wirelength of the MST, given a set of input terminals. Then an iterated 1-Steiner idea can be employed. We iteratively add one Hanan node according to its order in the sequence. If one Hanan node is already in the tree we have constructed, we skip this step. Then we compare the wirelength of the new MST with the previous MST. If the new wirelength is shorter, then the node is selected. Note that the MST can be constructed incrementally [Kahng and Robins 1992]. We continue this step until we have added \( n - 2 \) Steiner points (which is the maximum possible value), or we have examined a user-defined consecutive number (which is \( n/8 \) in this article) of Hanan nodes that fail to decrease the wirelength. The 1-cktSteiner algorithm is summarized in Figure 8.
INPUT: Routing graph and terminal locations
OUTPUT: Steiner point set $S$, $RSMT = MST(S \cup T)$

Initialization: Steiner point set $S = \Phi$
Initialization: $l_0 = MST(T)$, $flag = 0$

Build the circuit model according to the routing graph and the terminal locations;
Simulate the circuit;
Sort the Hanan node into a sequence $E$ according to their peak time;

WHILE {There are less than $n - 2$ nodes in $S$ and $E \neq \Phi$ and $flag < n/8$}
  WHILE the 1st node $A$ in $E$ is in the current tree
    Remove $A$ from $E$;
  END
  Select the first node $A$ in $E$;
  $l_1 = MST(S \cup T \cup A)$;
  IF {$l_1 < l_0$}
    $S = S \cup A$;
    $l_0 = l_1$;
    $flag = flag + 1$;
  END
  ELSE
    $flag = 0$;
  END
  Remove $A$ from $E$;
END

Fig. 8. 1-cktSteiner algorithm.

Fig. 9. A 4-terminal test case to illustrate the steps for the 1-cktSteiner algorithm.

We use a test case with four terminals $\{W, X, Y, Z\}$, as shown in Figure 9, to demonstrate the procedure of our algorithm. According to the constraint for Steiner points, only nodes labeled with $a$, $b$, and $c$ can become Steiner points. We simulate the corresponding circuits and find that the peak times for those three points are $13.5\, ns$, $21.5\, ns$, and $13.7\, ns$, respectively. Therefore, we sort them as $\{a, c, b\}$. We then add $a$ and construct the MST on the nodes set $\{W, X, Y, Z, a\}$. The total wirelength for the tree is 6, which is smaller than the wirelength of the MST for the node set $\{W, X, Y, Z\}$ (i.e., 7). So we keep node $a$ as the Steiner point. Then we construct an MST on the node set $\{W, X, Y, Z, a, c\}$ and the wirelength is still 6. So we discard $c$, and compute the wirelength of the MST on the node set $\{W, X, Y, Z, a, b\}$, which is still 6. Therefore, we return with Steiner point $a$, and an optimal wirelength of 6.

In general, more than one Steiner node can be added each time for the algorithm in Figure 8. We call the nodes to be simultaneously added a block of nodes. If the total wirelength using a block is reduced, then we take all the nodes in the block as Steiner points; otherwise, we check the vertices in the
block one-by-one. In this case, 1-cktSteiner becomes a block-based algorithm, and the number of points in a block is called the block size. In Figure 10, we study the interaction between wirelength, runtime, and block size. When the block size increases from 1 to 10, clearly the runtime decreases, but the wirelength increases. It is easy to see that block size is an effective knob for tradeoff between wirelength and runtime. To accommodate different numbers of terminals, we can use a self-adjustable block size. In experiments we find that given the total terminal number $n$, setting the block size to $B \in (\frac{n-2}{16}, \frac{n-2}{4})$ can result in a good balance between wirelength and runtime, which leads to the B-cktSteiner algorithm.

3.2 Practical Implementation Issues

It might take a long time to simulate the circuits, especially when the net has a large number of terminals. To reduce circuit simulation-time, we present a lookup-table approach. The table is built for each routing plane with a given routing grid and can be used for any net in this routing plane.

To build the table, we compute the output waveforms at all nodes with an impulse-current source at only one node. This results in $N^3$ waveforms for an $N \times N$ routing grid. All these waveforms are stored as a table. To more efficiently simulate the circuits, techniques such as model order reduction [Odabasioglu et al. 1998] and random walk [Qian et al. 2003] can be applied.

When the table is set-up, we can directly get the simulation results for any net by superposition, that is, we sum the waveforms generated by the current source at each terminal to obtain the waveform. Take a 3-terminal case for example. The terminals are located at nodes nos. 1, 6, and 10. We have precomputed the waveform for the case where there is an impulse-current source at node no. 1, as well as the waveforms when the single-current source is added at node no. 6 (no. 10). By superposing these three waveforms, we can get the final waveform, which is equivalent to imposing three current sources simultaneously at the three terminals, as required by our algorithm. Then we look for the peak time for the superposed waveform. An example of such waveform superposition has already been shown in Figure 6. The procedure is illustrated in Figure 11.

As discussed in Footnote 1, the circuit model, and therefore this table-based waveform calculation, is independent of obstacles.
Finally, numerical error and instability usually prevent us from accurately finding the peak time for a particular waveform. Therefore, in practical implementations, rather than seeking for the exact peak time, we seek at the rising edge for the time where the voltage response reaches $\alpha V_{\text{max}}$ ($0 < \alpha < 1$), which allows us more flexibility.

4. EXPERIMENTS AND DISCUSSIONS

We implement the circuit construction and simulation in MATLAB and the tree construction in C. We run experiments on a few groups of test cases, each group for a selected number of terminals. We generate 20 test cases for each group, with terminals randomly placed in a routing plane with 1,000 × 1,000 grid. The circuit models (RC meshes of different granularities) are prebuilt before routing for different nets, as discussed in Section 3.2. We report the average wirelength and runtime for each group. All experiments are conducted on a UNIX workstation with a 1.9GHz P4 processor and 2GB RAM. In addition, we compare our algorithm with a commercial tool Talus PX [Magma 2007] and test on real industrial circuits. The wirelength and runtime are reported as well. For all the experiments to follow, we always use $B = \frac{n-2}{n}$ for B-cktSteiner.

4.1 Impact of Granularity of the RC Mesh

Intuitively, if we divide GRG (general routing graph) into a finer grid, we may simulate the routing plane more accurately and obtain shorter wirelength, but longer runtime. Figure 12 illustrates how grid granularity influences wirelength and runtime. We use a test case with 20 terminals routed by 1-cktSteiner. When the grid becomes finer, runtime increases and wirelength reduces. A nice tradeoff between runtime and wirelength is achieved by a 4x finer grid, where wirelength is reduced by 6% and runtime increases by 3x compared to using the original grid (equivalent to GRG). A similar tradeoff
Fig. 12. Normalized runtime and wirelength with respect to RC-mesh granularity. The test case has 20 terminals and is routed by 1-cktSteiner.

Table I. Comparison Between Geo-Steiner, FastSteiner, 1-cktSteiner, FLUTE, and B-cktSteiner

<table>
<thead>
<tr>
<th>Terminal #</th>
<th>Wirelength</th>
<th>Runtime (ms)</th>
<th>Wirelength</th>
<th>Runtime (ms)</th>
<th>Wirelength</th>
<th>Runtime (ms)</th>
<th>Wirelength</th>
<th>Runtime (ms)</th>
<th>Wirelength</th>
<th>Runtime (ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Geo</td>
<td>FastSteiner</td>
<td>1-ckt</td>
<td>Flute</td>
<td>B-ckt</td>
<td>Geo</td>
<td>FastSteiner</td>
<td>1-ckt</td>
<td>Flute</td>
<td>B-ckt</td>
</tr>
<tr>
<td>5</td>
<td>9</td>
<td>9</td>
<td>9</td>
<td>9</td>
<td>9</td>
<td>3.05</td>
<td>0.23</td>
<td>0.06</td>
<td>0.0007</td>
<td>0.0006</td>
</tr>
<tr>
<td>10</td>
<td>27</td>
<td>27</td>
<td>27</td>
<td>27</td>
<td>27</td>
<td>3.63</td>
<td>0.32</td>
<td>0.09</td>
<td>0.008</td>
<td>0.009</td>
</tr>
<tr>
<td>20</td>
<td>77</td>
<td>78</td>
<td>78</td>
<td>79</td>
<td>80</td>
<td>14.4</td>
<td>1.8</td>
<td>0.80</td>
<td>0.043</td>
<td>0.038</td>
</tr>
<tr>
<td>50</td>
<td>290</td>
<td>291</td>
<td>292</td>
<td>303</td>
<td>305</td>
<td>38.6</td>
<td>8.1</td>
<td>1.53</td>
<td>0.18</td>
<td>0.23</td>
</tr>
<tr>
<td>100</td>
<td>811</td>
<td>821</td>
<td>819</td>
<td>862</td>
<td>848</td>
<td>298</td>
<td>15</td>
<td>3.12</td>
<td>0.47</td>
<td>0.62</td>
</tr>
<tr>
<td>500</td>
<td>8305</td>
<td>8377</td>
<td>8395</td>
<td>9032</td>
<td>8861</td>
<td>12600</td>
<td>140</td>
<td>12.4</td>
<td>3.97</td>
<td>3.31</td>
</tr>
<tr>
<td>Average</td>
<td>1.000</td>
<td>1.006</td>
<td>1.007</td>
<td>1.037</td>
<td>1.034</td>
<td>1.000</td>
<td>0.093</td>
<td>0.025</td>
<td>0.002</td>
<td>0.002</td>
</tr>
</tbody>
</table>

has been observed for other test cases as well. Therefore, we use a 4x finer grid in all of the following experiments.

4.2 Obstacle-Free Routing

Table I presents experiments for obstacle-free routing. We first compare 1-cktSteiner with the exact solution of GeoSteiner [Warne et al. 2007] and FastSteiner, which generates the shortest wirelength among existing heuristics. Both 1-cktSteiner and FastSteiner [Kahng et al. 2003] are about less than 1% worse than GeoSteiner. 1-cktSteiner is, on average, 5.2x faster (11.3x faster for the largest example) than FastSteiner, which in turn is, on average, 208x faster than GeoSteiner. Table I also compares B-cktSteiner with FLUTE [Chu and Wong 2005], the fastest algorithm among existing heuristics. B-cktSteiner reduces up to 1.9% of the wirelength with a similar runtime when compared to FLUTE. On average, B-cktSteiner is 3.4% worse than the exact solution in terms of wirelength, and FLUTE is 3.7% worse. Compared to 1-cktSteiner, B-cktSteiner obtains similar wirelength for up to 20 terminals, but 2.7% longer wirelength for larger numbers of terminals. The runtime of B-cktSteiner is 24x smaller.
We also compare our algorithm with the Magma routing package [Magma 2007] on real industrial circuits. Both packages are implemented inside the Magma tool flow, and the experiments are run on AMD Opteron double-CPU servers (2.6G CPUs, 4–8G RAM). The results are reported in Table II. As we can see from the table, compared with the Magma routing package containing eight routing algorithms, 1-cktSteiner reduces the chip-level total wirelength by up to 1.23% and reduces net-based wirelength by up to 8.15%.

### 4.3 Obstacle-Avoiding Routing

Table III presents experiments for routing with obstacles. Compared to An-OARSMan [Hu et al. 2005], the best existing heuristic for obstacle-avoiding routing, 1-cktSteiner, reduces wirelength by up to 6.12% for large test cases with a similar runtime, and B-cktSteiner has an average speedup of 352x with wirelength similar to that produced by An-OARSMan. Because the global router in Magma tool Talux PX is not able to consider obstacles, we cannot make a comparison in an industrial setting.

### 4.4 The Number of Steiner Points Required for the Optimal Solution

We also illustrate the relationship between the number of terminals \( n \) and number of Steiner points \( q \) used by 1-cktSteiner. For each terminal number, we use 10 randomly generated test cases for both obstacle-free and
obstacle-avoiding routings. The result is shown in Figure 13. Clearly, the maximum number of Steiner points required is less than \( n - 2 \), and in most cases, about \( \frac{n}{2} \) Steiner points are required for both RSMT and OARSMT. This observation may be used to develop more efficient algorithms in the future.

5. CONCLUSIONS AND DISCUSSIONS

Using an RC network to simulate routing, we show in this article that Steiner points can be selected based on circuit behavior. Then, any routing algorithm can apply the selected Steiner points to construct a (rectilinear) Steiner minimum tree (RSMT). In this article, we apply selected Steiner points to 1-Steiner algorithms and develop the 1cktSteiner algorithm and a faster version, the bcktSteiner algorithm. When constructing RSMT without obstacles, 1cktSteiner
obtains similar length but runs 11.3x faster than FastSteiner, the existing algorithm with minimum wirelength. B-cktSteiner reduces wirelength by up to 1.9% with a similar runtime compared to FLUTE, the existing most efficient algorithm. In addition, our algorithms can deal with obstacle-avoiding cases at similar runtimes compared with obstacle-free cases. 1-cktSteiner reduces up to 6.12% of the wirelength and runs 352x faster than An-OARSMan, the existing best algorithm for obstacle-avoiding routing.

The package of cktSteiner can be downloaded at http://eda.ee.ucla.edu/tools.html. In the future, we will extend circuit-simulated routing to routing congestion estimation, and routing for performance and other routing objectives and constraints.

ACKNOWLEDGMENT

The authors would like to thank Hsiao-Ping Tseng and Yangdong Deng at Magma Design Automation for help on chip-level comparison with the Magma router. The authors would also like to thank the anonymous reviewers for their review comments, which helped to improve the presentation of this work significantly.

REFERENCES


