# Reliable 3-D Clock-Tree Synthesis Considering Nonlinear Capacitive TSV Model With Electrical–Thermal–Mechanical Coupling

Sai Manoj P. D., Student Member, IEEE, Hao Yu, Member, IEEE, Yang Shang, Student Member, IEEE, Chuan Seng Tan, Member, IEEE, and Sung Kyu Lim, Senior Member, IEEE

Abstract-A robust physical design of 3-D ICs requires investigation of through-silicon vias (TSVs). Large temperature and stress gradients can severely affect TSV delay, causing large delay variation. The traditional physical model treats a TSV as a resistor with linear electrical-thermal dependence, which ignores the fundamental device physics. In this paper, a physics-based electrical-thermal-mechanical delay model is developed for signal TSVs in 3-D ICs. Considering both the liner material and stress, a nonlinear model is established that relates electrical delay to temperature and stress. Moreover, sensitivity analysis is performed to relate the reduction of temperature and stress gradients to dummy TSV insertion. Taking the design of a 3-D clock tree as a case study, we formulate a nonlinear optimization problem for clock-skew reduction. By allocating dummy TSVs to reduce the temperature and stress gradients, the clock skew introduced by signal TSVs and drivers can be minimized. A number of 3-D clock-tree benchmarks are used in the experiments. We observe that with dummy TSV insertion, clock skew is reduced by 61.3% on average when the accurate nonlinear electrical-thermal-mechanical delay model is applied.

*Index Terms*-Clock-skew reduction, electrical-thermal-mechanical coupling, nonlinear MOSCAP, stress gradient, temperature gradient, thermal TSV, through-silicon via (TSV), TSV stress.

## I. INTRODUCTION

**3**-D INTEGRATED circuits (3-D ICs) have regained interest for their high bandwidth in the design of many-core microprocessor servers. The use of through-silicon vias (TSVs) in 3-D ICs can significantly reduce the latency and power dissipation of global interconnects such as memory buses and clock trees [1]–[18]. However, a robust 3-D IC design requires careful examination of TSVs, whose electrical behavior, such as delay, is coupled across multiple physical domains. First, thermal reliability is a primary concern, since the top layer is far from the heat sink. Moreover, dynamic voltage and frequency scaling of many-core processors can result in highly nonuniform power density. As such, a large temperature gradient can exist, which causes TSV delay variation through electrical-thermal coupling. Furthermore, because the coefficients of thermal expansion (CTEs) of the TSV material and the substrate differ, large mechanical stress can be introduced, which in turn causes driver delay variation through electrical-mechanical coupling. Since delay variation introduces skew in delay-sensitive clock-tree design, a robust physical design of 3-D ICs with TSVs needs to consider optimization across the coupled electrical, mechanical, and thermal domains.

Manuscript received February 22, 2013; revised April 13, 2013, May 15, 2013; accepted June 9, 2013. Date of current version October 16, 2013. This paper was recommended by Associate Editor C.-N. Chu.

S. Manoj P. D., H. Yu, Y. Shang, and C. S. Tan are with the School of Electrical and Electronic Engineering, Nanyang Technological University 639798, Singapore (e-mail: SAIMANOJ002@e.ntu.edu.sg; haoyu@ntu.edu.sg; yshang1@e.ntu.edu.sg; TanCS@ntu.edu.sg).

S. K. Lim is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: limsk@ece.gatech.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2013.2270285

Clock-tree design is primarily concerned with reducing the delay difference among sinks, known as skew [19]–[26]. Compared with a clock tree in a 2-D IC, a clock tree in a 3-D IC experiences much larger temperature and stress gradients, both vertically and horizontally. As signal TSVs are deployed to route the 3-D clock tree over the entire chip, such nonuniform temperatures can lead to significant clock skew through the electrical-thermal coupling of signal TSVs [24]–[26]. This electrical-thermal coupling becomes nonlinear when the liner material is considered. Moreover, TSV-induced stress also affects the mobility and delay of drivers, which further worsens the clock skew across the chip through the electrical-mechanical coupling of drivers [10]. As such, traditional clock-tree design methods [19]–[21] that do not consider temperature and stress gradients become inaccurate and unreliable. Thermal-aware 3-D clock-tree synthesis considering the thermal profile has been discussed in [4]. A 3-D embedding method was developed in [17] to reduce wire length, and further optimization in [11] reduces power and slew rate. However, previous methods optimize the clock network using linear or nonlinear electrical-thermal-mechanical coupling without a physical TSV model, and are hence not accurate. To date, no specific problem has been formulated for clock-tree design based on reducing both thermal and stress gradients in 3-D ICs.

In this paper, based on recent measurement results in [6], [8], [10], [12], and [13], a nonlinear electrical-thermal-mechanical delay model is developed for 3-D clock-tree design. Further, sensitivity analysis is performed for dummy TSV density with respect to the reduction of temperature or stress gradient. Based on these accurate TSV models, a reliable 3-D clock-tree design is formulated that uses dummy TSV insertion to balance the clock skew by reducing temperature and stress gradients, considering the nonlinear electrical-thermal-mechanical delay model. A nonlinear programming-based optimization is developed and implemented to determine the allocation of dummy TSVs. Experimental results show that by inserting a reasonable number of dummy TSVs, the average clock skew can be reduced by 61.3% for the clock-tree benchmarks in [27] implemented in 3-D design [11]. Compared with clock-tree design using a linear delay model, our approach with the nonlinear delay model reduces clock skew by a further 12.2% under the same thermal conditions and the same TSV density.

Fig. 1. Simplified TSV fabrication procedure.

The rest of this paper is organized as follows. The reliable 3-D clock-tree design problem formulation is discussed in Section II. Modeling of signal TSVs and drivers under electrical-thermal-mechanical coupling is discussed in Section III, and the dummy TSV model and the sensitivity study are discussed in Section IV. Section V discusses the nonlinear electrical-thermal-mechanical delay and skew model. Nonlinear optimization for reduction of clock skew is presented in Section VI. Numerical experimental results for modeling and optimization are presented in Section VII. The paper is concluded in Section VIII.

## II. 3-D RELIABLE CLOCK-TREE DESIGN PROBLEM

Many approaches have been applied in 2-D clock-tree design to reduce clock skew, such as buffer sizing [4], merging-point adjustment [22], and wire-length balancing [21]. Due to the nonuniform power densities of many-core microprocessors, there is a large nonuniform temperature gradient. Moreover, a large stress gradient is induced by TSVs during annealing. Accordingly, a 3-D clock tree will experience significant clock skew under large temperature and stress gradients, which calls for a new problem formulation.

### A. TSV Fabrication

An accurate physical model of a TSV requires a detailed investigation of its fabrication. As shown in Fig. 1, TSVs are used as vertical interconnects between stacked dies, providing electrical connection as well as heat dissipation. To perform these functions, TSV materials should have good electrical and thermal conductivity. Tungsten (W), polysilicon, and copper (Cu) can be used as TSV fill materials. Due to its low resistivity and cost, copper is the most widely used TSV fill material [10]. To fabricate TSVs, etching is performed by deep reactive-ion etching (DRIE), laser, or chemical etching. After etching, the liner material is deposited to prevent ion diffusion. After the liner layer is formed, the TSV material, such as copper or tungsten, is filled into the etched region at high temperature. After annealing to low temperature, the substrate is thinned, and the current layer can be aligned and integrated with other layers. Because of the liner and the annealing, the physical TSV model couples the electrical, thermal, and mechanical domains.

Fig. 2. 3-D clock-tree distribution network at different tiers. (a) Clock tree with 14 TSV bundle locations in an H-tree. (b) Clock tree with 28 TSV bundle locations in an H-tree. (c) Layer configuration under nonuniform temperature distribution.

### B. Problem Formulation

The 3-D clock tree shown in Fig. 2 uses TSVs as vertical interconnects, which can have significant delay. As such, unlike 2-D clock-tree synthesis, 3-D clock-tree design must also consider the impact of TSVs. Stacking dies vertically increases both the overall temperature and the temperature gradient, due to the increased and nonuniform power density and the longer heat-dissipation path. Because of electrical-thermal coupling, device characteristics vary with this temperature gradient. The temperature distribution on a 3-D clock tree is shown in Fig. 2. Conventionally, the RC delay $D_{\rm RC}$ of a TSV is modeled as an RC interconnect with linear electrical-thermal coupling

$$R_T = R_0(1 + \alpha \cdot \delta T); D_{\rm RC} = R_0 C_0(1 + \alpha \cdot \delta T)$$
(1)

where $R_T$ is the temperature-dependent resistance, $R_0$ and $C_0$ are the resistance and capacitance at room temperature, and $\delta T$ is the difference between the operating temperature $T$ and room temperature $T_0$. The coefficient $\alpha$ is the temperature coefficient of resistance, whose value is determined experimentally.
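For reference, the traditional linear model in (1) can be evaluated as in the minimal Python sketch below. The function name `linear_tsv_rc` and all numerical values (including the copper-like temperature coefficient) are hypothetical placeholders chosen for illustration, not parameters from this paper.

```python
def linear_tsv_rc(r0: float, c0: float, alpha: float, temp: float, temp0: float = 25.0):
    """Traditional linear electrical-thermal TSV model of Eq. (1).

    r0, c0 : resistance (ohm) and capacitance (F) at room temperature temp0 (deg C)
    alpha  : temperature coefficient of resistance (1/deg C)
    Returns (R_T, D_RC); the capacitance is held constant in this model.
    """
    delta_t = temp - temp0                    # delta T = T - T0
    r_t = r0 * (1.0 + alpha * delta_t)        # R_T = R0 (1 + alpha * delta T)
    d_rc = r0 * c0 * (1.0 + alpha * delta_t)  # D_RC = R0 C0 (1 + alpha * delta T)
    return r_t, d_rc


# Hypothetical values: 50 mohm, 50 fF, alpha = 3.9e-3 /deg C, T = 85 deg C
r_t, d_rc = linear_tsv_rc(r0=0.05, c0=50e-15, alpha=3.9e-3, temp=85.0)
print(f"R_T = {r_t * 1e3:.2f} mohm, D_RC = {d_rc * 1e15:.3f} fs")
```

Because only the resistance varies, the delay in this model scales by exactly the same factor $(1 + \alpha \cdot \delta T)$ as the resistance; a temperature-dependent capacitance breaks this proportionality.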

However, the liner material around the TSV fill forms a nonlinear MOSCAP that can significantly affect TSV delay. According to the measurement results in [8], the contribution of the second-order temperature-dependent term in the MOSCAP model rises to 30% of the overall TSV capacitance at 150 °C. As such, a TSV needs to be characterized as a capacitor with nonlinear temperature dependence instead of a resistor with linear temperature dependence. Moreover, TSVs exert mechanical stress on the silicon substrate due to mismatched CTEs, and this stress can affect the mobility and delay of drivers. Therefore, a nonlinear electrical-thermal-mechanical coupled delay model is needed for 3-D ICs.

Considering all the aforementioned effects, the delay at each clock sink *i* of a 3-D clock tree needs to be modeled as a nonlinear function of the temperature and stress gradient $\Gamma$, i.e., $\delta D_i = f(\Gamma)$. Clock skew *S* is defined as the maximum delay difference between any two clock sinks. By adding dummy TSVs, one can balance the temperature and stress gradients and thereby further reduce delay and skew. As such, we formulate the following problem for reliable 3-D clock-tree design under temperature and stress gradients.

Fig. 3. (a) Signal TSV and dummy TSV in 3-D IC. (b) 3-D view of TSV. (c) Equivalent RC circuit of TSV.

*Problem 1*: For a presynthesized zero-skew 3-D clock tree with $N_S$ sinks that uses signal TSVs for intertier connections, the clock skew **S** needs to be estimated by considering the position and number of TSVs, i.e., the temperature and stress gradient Γ, together with the nonlinear electrical-thermal-mechanical coupling. A number of dummy TSVs can then be inserted to minimize **S** under Γ:

$$\mathbf{S} = \max_{i,j} |\delta D_i - \delta D_j|, \quad 1 \le i, j \le N_S \tag{2}$$

where $\delta D_i$ and $\delta D_j$ are the delays of sinks *i* and *j*, respectively.
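The skew definition in (2) reduces to the spread between the largest and smallest sink delays, since the maximum of $|\delta D_i - \delta D_j|$ over all pairs is attained by that extreme pair. A minimal Python sketch follows; the helper names and the delay values (in picoseconds) are hypothetical.

```python
from itertools import combinations


def clock_skew(sink_delays):
    """Clock skew of Eq. (2): the largest |dD_i - dD_j| over all sink pairs.

    Uses max - min, which equals the pairwise maximum and runs in O(N_S).
    """
    if len(sink_delays) < 2:
        return 0.0
    return max(sink_delays) - min(sink_delays)


def clock_skew_pairwise(sink_delays):
    """Brute-force O(N_S^2) evaluation of Eq. (2), kept only as a cross-check."""
    return max((abs(a - b) for a, b in combinations(sink_delays, 2)), default=0.0)


delays_ps = [101.2, 99.8, 103.5, 100.4]  # hypothetical sink delays
print(clock_skew(delays_ps), clock_skew_pairwise(delays_ps))
```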

To solve this problem, the electrical, thermal, and mechanical couplings are first studied in Section III. Then, a sensitivity analysis is performed in Section IV to study the reduction of temperature and stress gradients when dummy TSVs are added. A nonlinear electrical-thermal-mechanical coupled delay model is derived in Section V to calculate clock-tree delay and skew. Finally, a nonlinear optimization is deployed in Section VI to further minimize the clock skew under nonuniform temperature and stress gradients. Note that the presynthesized zero-skew 3-D clock tree is based on [11], which considers wire length and drivers but not electrical-thermal-mechanical coupling.

## III. 3-D NONLINEAR ELECTRICAL-THERMAL-MECHANICAL COUPLING

### A. Signal TSV Model

A TSV used for interlayer signal connection is called a signal TSV; it connects the clock tree across two different layers. In previous 3-D clock-tree synthesis [4], [11], [22], vias are electrically modeled with a simple RC-delay model in which only the resistance (R) depends linearly on temperature and the capacitance (C) is constant. Because of the liner, however, the dependence on temperature is nonlinear. An electrical-thermal model of a single signal TSV is therefore studied in this section. Note that the mechanical stress on signal TSVs is small enough to be neglected.



Fig. 4. (a) Typical C-V curve of nonlinear MOSCAP model for signal TSV. (b) Exertion of stress on transistors/drivers in unit square grid.

1) Nonlinear MOSCAP Model: Because of the liner in TSVs, a MOS capacitor (MOSCAP) is formed between the signal TSV and the substrate. This nonlinear capacitance depends on the bias voltage ($V_{BIAS}$) as well as temperature [8]. The work-function difference between the TSV metal and the substrate creates a depletion region, whose radius varies with bias voltage and temperature, resulting in nonlinear capacitance. A typical C-V curve of a TSV is shown in Fig. 4(a); it can be divided into accumulation, depletion, and inversion regions, separated by the flat-band voltage ($V_{FB}$) and the threshold voltage ($V_T$). At higher frequencies (> 1 MHz), the inversion region gives way to deep depletion.

As such, the RC-delay parameters of TSV with dependence on temperature can be modeled as

$$R_T = \frac{\rho h}{\pi r_{\text{metal}}^2}; \frac{1}{C_T} = \frac{1}{C_{\text{ox}}} + \frac{1}{C_{\text{dep}}}$$
 (3)

with

$$C_{\rm ox} = \frac{2\pi\varepsilon_{\rm ox}h}{\ln(\frac{r_{\rm ox}}{r_{\rm metal}})}; C_{\rm dep} = \frac{2\pi\varepsilon_{\rm si}h}{\ln(\frac{r_{\rm dep}}{r_{\rm ox}})}$$
(4)

where $R_T$ and $C_T$ represent the temperature-dependent resistance and capacitance of the TSV, respectively; $C_{ox}$ and $C_{dep}$ are the liner and depletion-region capacitances of the TSV; $h$ is the TSV height; $\rho$ is the resistivity of the TSV metal; $\varepsilon_{ox}$ and $\varepsilon_{si}$ are the dielectric constants of silicon dioxide and silicon, respectively; and $r_{metal}$, $r_{ox}$, and $r_{dep}$ are the outer radii of the TSV metal, the oxide liner, and the depletion region, respectively. Since the thermal conductivity of the liner (SiO<sub>2</sub>) is about 100 times lower than that of the silicon substrate, the liner impedes heat dissipation into the substrate, resulting in hotspots at signal TSVs. As shown by the measured results in [8], the TSV capacitance can approach the liner capacitance at high temperature due to these hotspots and the nonlinear temperature dependence.
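To make (3) and (4) concrete, the Python sketch below evaluates the coaxial liner and depletion capacitances, their series combination $C_T$, and the fill resistance $R_T$. The geometry (50 µm height, 2.5 µm metal radius, 0.1 µm liner, 0.5 µm depletion width) is hypothetical; the relative permittivities 3.9 (SiO2) and 11.7 (Si) and the copper resistivity of about 1.7e-8 Ω·m are standard textbook values.

```python
import math

EPS0 = 8.854e-12       # vacuum permittivity, F/m
EPS_OX = 3.9 * EPS0    # SiO2 liner permittivity
EPS_SI = 11.7 * EPS0   # silicon permittivity


def coax_cap(eps: float, h: float, r_in: float, r_out: float) -> float:
    """Capacitance of a coaxial cylindrical shell of height h, as in Eq. (4)."""
    return 2.0 * math.pi * eps * h / math.log(r_out / r_in)


def tsv_mos_cap(h: float, r_metal: float, r_ox: float, r_dep: float) -> float:
    """Series combination 1/C_T = 1/C_ox + 1/C_dep, Eq. (3)."""
    c_ox = coax_cap(EPS_OX, h, r_metal, r_ox)
    c_dep = coax_cap(EPS_SI, h, r_ox, r_dep)
    return 1.0 / (1.0 / c_ox + 1.0 / c_dep)


def tsv_resistance(rho: float, h: float, r_metal: float) -> float:
    """Resistance of the TSV fill, R_T = rho * h / (pi * r_metal^2), Eq. (3)."""
    return rho * h / (math.pi * r_metal ** 2)


# Hypothetical geometry (meters)
h, r_metal, r_ox, r_dep = 50e-6, 2.5e-6, 2.6e-6, 3.1e-6
c_ox = coax_cap(EPS_OX, h, r_metal, r_ox)
c_t = tsv_mos_cap(h, r_metal, r_ox, r_dep)
r_t = tsv_resistance(1.7e-8, h, r_metal)
print(f"C_ox = {c_ox * 1e15:.1f} fF, C_T = {c_t * 1e15:.1f} fF, R_T = {r_t * 1e3:.1f} mohm")
```

Shrinking the depletion width ($r_{dep} \to r_{ox}$) makes $C_{dep}$ very large, so $C_T$ approaches $C_{ox}$, which is the limiting behavior described above.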

2) Electrical–Thermal Coupling: At higher frequencies, in the deep-depletion region, the signal TSV C-V curve tends to be flat with changing $V_{\text{BIAS}}$. However, the deep-depletion capacitance of the TSV still varies nonlinearly with temperature through $r_{\text{dep}}$. The resistance of the TSV can be modeled as linearly dependent on temperature, as given in (1). Based on measurement results from fabricated test TSVs [8], the nonlinear temperature-dependent capacitance of the TSV can be given by

$$C_T = C_0 + \beta_1 T + \beta_2 T^2$$
 (5)

where $C_T$ and $C_0$ are the temperature-dependent TSV capacitance and the capacitance at zero temperature, respectively, and *T* is the temperature. $\beta_1$ and $\beta_2$ are the first- and second-order temperature coefficients of $C_T$, which are determined experimentally and reported in [8]. This nonlinear variation of capacitance differs from the traditional via characterization, which treats via parameters as varying linearly with temperature. As shown in this paper, the nonlinearity in TSVs arises mainly from the liner material. As such, the nonlinear electrical-thermal coupling of signal TSVs can have a significant impact on delay when TSVs are used in a 3-D clock tree.
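The quadratic model (5) can be evaluated directly. The Python sketch below also reports the share of $C_T$ contributed by the second-order term $\beta_2 T^2$, the quantity highlighted by the measurements in [8]. The coefficients are hypothetical placeholders, not the fitted values from [8]; capacitances are in fF and temperatures in °C.

```python
def tsv_cap_nonlinear(c0: float, beta1: float, beta2: float, temp: float) -> float:
    """Nonlinear temperature-dependent TSV capacitance, Eq. (5)."""
    return c0 + beta1 * temp + beta2 * temp ** 2


def second_order_share(c0: float, beta1: float, beta2: float, temp: float) -> float:
    """Fraction of C_T contributed by the second-order term beta2 * T^2."""
    return beta2 * temp ** 2 / tsv_cap_nonlinear(c0, beta1, beta2, temp)


# Hypothetical coefficients: C0 = 40 fF, beta1 = 0.02 fF/degC, beta2 = 5e-4 fF/degC^2
C0, B1, B2 = 40.0, 0.02, 5e-4
for t in (25.0, 85.0, 150.0):
    c_t = tsv_cap_nonlinear(C0, B1, B2, t)
    c_lin = tsv_cap_nonlinear(C0, B1, 0.0, t)  # same model with beta2 dropped
    share = second_order_share(C0, B1, B2, t)
    print(f"T = {t:5.1f} C: C_T = {c_t:6.2f} fF, linear-only = {c_lin:6.2f} fF, "
          f"second-order share = {share:.1%}")
```

Comparing `c_t` with `c_lin` shows how a fit that keeps only the linear term increasingly underestimates $C_T$ at high temperature.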

### B. Driver Model

TSVs also exert mechanical stress on the device layer, which affects the mobility and delay of drivers. As TSV density can be nonuniform across the chip, the resulting stress gradient can further introduce delay variation, i.e., skew.

1) Thermal–Mechanical Coupling: The TSV material and the device layer have different CTEs and can also be at different temperatures during operation. These differences cause the TSV and the substrate to expand by different amounts, resulting in mechanical stress. The mechanical stress exerted on the substrate by multiple TSVs can be found by superposition [10]:

$$\sigma_{i} = -\frac{B\Delta T\Delta\alpha}{2} \left(\frac{R_{i}}{r_{i}}\right)^{2}$$

$$\sigma = \sum_{i=1}^{n} \sigma_{i} = -\frac{Bn\Delta\alpha\Delta T}{2} \left(\frac{R}{r}\right)^{2}$$

$$n = \eta A$$
(6)

where $\sigma_i$ is the stress from the *i*th TSV, *B* is the biaxial modulus, $\Delta \alpha$ is the (constant) CTE difference between the TSV material and the substrate, $\Delta T$ is the annealing temperature difference, $R_i$ is the radius of the *i*th TSV, $r_i$ is the distance of a transistor or driver from the center of the *i*th TSV, and *n* is the number of TSVs at TSV density $\eta$ in area *A*. For simplicity, all TSVs are assumed to have the same radius *R*, and all drivers are approximated as lying at the same distance *r* from the TSV centers. Equation (6) thus captures the thermal–mechanical dependence of the stress between TSV and substrate; in this coupling, the main focus is on the exerted stress as a function of temperature and TSV density.
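A minimal Python sketch of the superposition in (6) follows. The biaxial modulus, CTE mismatch, annealing temperature difference, and geometry are hypothetical values (roughly in the range for Cu TSVs in Si, but not taken from [10]). Summing the per-TSV terms $\sigma_i$ reproduces the closed form $-Bn\Delta\alpha\Delta T/2\,(R/r)^2$ when all radii and distances are equal.

```python
def tsv_stress_single(b_mod, d_alpha, d_temp, radius, dist):
    """Stress at a driver from one TSV (first line of Eq. (6)), in Pa."""
    return -b_mod * d_temp * d_alpha / 2.0 * (radius / dist) ** 2


def tsv_stress_total(b_mod, d_alpha, d_temp, tsvs):
    """Superpose the stress from several TSVs; tsvs holds (R_i, r_i) pairs."""
    return sum(tsv_stress_single(b_mod, d_alpha, d_temp, r_i, dist_i)
               for r_i, dist_i in tsvs)


# Hypothetical parameters
B = 180e9          # biaxial modulus, Pa
D_ALPHA = 14.4e-6  # CTE mismatch, 1/K
D_TEMP = 250.0     # annealing temperature difference, K
R, DIST = 2.5e-6, 10e-6

# Four equidistant TSVs, e.g., at the corners of a square centered on the driver
corner_tsvs = [(R, DIST)] * 4
sigma = tsv_stress_total(B, D_ALPHA, D_TEMP, corner_tsvs)
closed_form = -B * len(corner_tsvs) * D_ALPHA * D_TEMP / 2.0 * (R / DIST) ** 2
print(f"superposed: {sigma / 1e6:.2f} MPa, closed form: {closed_form / 1e6:.2f} MPa")
```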

2) *Electrical–Mechanical Coupling:* Mechanical stress exerted by TSVs affects the carrier mobility of drivers [10], [12]: the larger the exerted stress, the larger its impact on carrier mobility. This mobility variation in turn affects the electrical delay of drivers. The variation in carrier mobility due to mechanical stress can be given by [10], [12]

$$\frac{\delta\mu}{\mu} = -\Pi \times \sigma; \quad m = -\Pi_x \tag{7}$$

where $\frac{\delta \mu}{\mu}$ is the relative mobility variation, $\Pi$ is the tensor of piezoresistive coefficients, and $\sigma$ represents mechanical stress. Note that *m* represents the maximum value of $\Pi$ among all directions (*x*, *y*, *z*), in order to capture its most significant impact on the transistor. For example, $\Pi_x$ is used in (7) to represent the maximum value in tensor $\Pi$. The value of *m* indicates the enhancement factor along the direction that results in the maximum stress. It can differ between pMOS and nMOS devices, resulting in different amounts of mobility variation [10], [12]. The ratio of mobilities with and without stress can be calculated as

Fig. 5. (a) 3-D heat-removal path by dummy TSVs. (b) 3-D view of dummy TSV insertion.

$$\frac{\mu_s}{\mu} = 1 + \frac{\delta\mu}{\mu} = 1 + m\sigma \tag{8}$$

where $\mu_s$ and $\mu$ represent the carrier mobility with and without the impact of stress, and $\frac{\delta\mu}{\mu}$ is the mobility variation ratio.

As the exerted stress increases, the ratio of mobility with and without stress also increases. This variation in mobility changes the source resistance of a driver, which can be given by

$$R_D^M = \frac{R_D}{1 + \frac{\delta\mu}{\mu}} = \frac{R_D}{1 + m\sigma} \tag{9}$$

where $R_D^M$ and $R_D$ are the driver resistances with and without the impact of stress, respectively. As such, when the TSV density differs, the impact of mechanical stress on driver delay differs, which further affects the clock-tree delay. Note that, through the thermal–mechanical coupling discussed previously, the driver delay depends jointly on thermal and mechanical effects, which can further worsen the delay variation.
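Equations (8) and (9) can be chained as in the Python sketch below, which maps a stress value to the stressed driver resistance $R_D^M$. The coefficient `m` and the stress value are hypothetical (the sign and magnitude of $m$ depend on device type and orientation, as noted above), and the guard, which is an addition of this sketch, rejects stresses for which the linearized mobility ratio $1 + m\sigma$ would become non-positive.

```python
def mobility_ratio(m: float, sigma: float) -> float:
    """mu_s / mu = 1 + m * sigma, Eq. (8)."""
    return 1.0 + m * sigma


def stressed_driver_resistance(r_d: float, m: float, sigma: float) -> float:
    """Driver resistance under TSV stress, Eq. (9): R_D^M = R_D / (1 + m * sigma)."""
    ratio = mobility_ratio(m, sigma)
    if ratio <= 0.0:
        raise ValueError("stress outside the range of the linearized mobility model")
    return r_d / ratio


# Hypothetical values: R_D = 1 kohm, m = 3e-10 1/Pa, sigma = -81 MPa
r_dm = stressed_driver_resistance(1000.0, 3e-10, -8.1e7)
print(f"R_D^M = {r_dm:.2f} ohm")
```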

## IV. 3-D DUMMY TSV INSERTION AND SENSITIVITY

In the previous section, signal TSVs and drivers were modeled under electrical-thermal-mechanical coupling. Characterizing dummy TSVs is equally important, in order to relate the reduction of thermal and stress gradients to the dummy TSV insertion density.

### A. Dummy TSV Insertion Density

A TSV inserted between layers without a signal connection is called a dummy TSV. As a dummy TSV is filled with Cu, which has a high thermal conductivity of about 400 W/mK, it provides a vertical heat-dissipation path that balances the temperature gradient. Moreover, adding dummy TSVs balances the TSV density distribution, which also helps to reduce the stress gradient.

1) Reduction of Thermal Gradient: For chip-level thermal analysis, the impact of a single dummy TSV is negligible. Dummy TSVs are therefore modeled in terms of local density, as shown in Fig. 5, where dummy TSVs occupy an area $\eta A$ within a regular chip area $A$. Considering vertical heat dissipation, the total thermal conductivity before and after inserting additional TSVs is given by

$$\lambda = \eta \lambda_{TSV} + (1 - \eta) \lambda_0;$$

$$\lambda_{Total} = (\eta + \delta \eta) \lambda_{TSV} + (1 - (\eta + \delta \eta)) \lambda_0; \quad \delta n = \delta \eta A$$
(10)

where $\lambda_0$ is the initial thermal conductivity, $\eta$ is the initial TSV density, $\delta\eta$ is the change in TSV density, and $\delta n$ is the corresponding change in the number of TSVs relative to the initial number *n*.

The temperature gradient reduction for a change $\delta\eta$ in TSV density is then given by

$$\delta T = T - T_0 = \frac{P \cdot l}{A\lambda_0} \cdot \frac{\delta \eta}{\frac{\lambda}{\lambda_{\text{TSV}} - \lambda_0} + \eta + \delta \eta}$$
(11)

where *P* is the heat power flowing from the chip to the heat sink, *l* is the length of the heat-transfer path, and *A* is the chip area. From (11), one can observe that as $\delta \eta$ approaches or exceeds $\lambda/(\lambda_{\text{TSV}} - \lambda_0) + \eta$, the temperature reduction due to dummy TSVs saturates.
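The saturation behavior can be checked numerically. The Python sketch below assumes a 1-D vertical conduction model in which the temperature rise is $P \cdot l/(A\lambda)$, consistent with the prefactor of (11), and uses the mixing rule (10) for the effective conductivity; it reports the temperature drop obtained by raising the TSV density from $\eta$ to $\eta + \delta\eta$. All parameter values and function names are hypothetical. For $\eta = 0$ the result coincides with (11), since then $\lambda = \lambda_0$.

```python
def effective_conductivity(eta: float, lam_tsv: float, lam0: float) -> float:
    """Area-weighted vertical thermal conductivity, Eq. (10)."""
    return eta * lam_tsv + (1.0 - eta) * lam0


def temp_rise(power: float, length: float, area: float, lam: float) -> float:
    """1-D conduction temperature rise P*l / (A*lambda) (modeling assumption)."""
    return power * length / (area * lam)


def temp_reduction(eta, d_eta, power, length, area, lam_tsv, lam0):
    """Temperature drop when the TSV density rises from eta to eta + d_eta."""
    lam = effective_conductivity(eta, lam_tsv, lam0)
    lam_total = effective_conductivity(eta + d_eta, lam_tsv, lam0)
    return temp_rise(power, length, area, lam) - temp_rise(power, length, area, lam_total)


# Hypothetical parameters: 50 W through a 50 um path over 1 cm^2
P, L_PATH, AREA, LAM0, LAM_TSV = 50.0, 50e-6, 1e-4, 1.5, 400.0
for d_eta in (0.001, 0.005, 0.01, 0.05, 0.1):
    drop = temp_reduction(0.0, d_eta, P, L_PATH, AREA, LAM_TSV, LAM0)
    print(f"d_eta = {d_eta:5.3f}: temperature reduction = {drop:6.2f} K")
```

The printed reductions grow quickly at first and then level off toward the total rise $P \cdot l/(A\lambda_0)$, which is the saturation noted above.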

2) Reduction of Stress Gradient: Consider a square with TSVs at its four corners; all transistors inside the square experience stress from all four TSVs. The stress contours from TSVs and their impact on neighboring transistors are shown in Fig. 4(b). The stress on each transistor from different TSVs can be calculated by (6). Moreover, inserting dummy TSVs at proper locations reduces the stress gradient. The stress gradient reduction $\delta\sigma$ caused by a TSV density difference can be given as

$$\delta\sigma = -\frac{B\Delta\alpha\Delta T}{2} \left(\frac{R}{r}\right)^2 \delta n = -\frac{B\Delta\alpha\Delta T}{2} \left(\frac{R}{r}\right)^2 \delta\eta A \tag{12}$$

where $\delta n$ is the number of additional TSVs and $\delta \eta$ is the resulting change in TSV density. As the density becomes more uniform, the stress gradient becomes smaller.
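As an illustration of (6) and (12), the Python sketch below compares two hypothetical regions with different local TSV counts and shows that adding dummy TSVs to the sparser region, with $\delta n$ equal to the count difference, removes the stress difference between them. All parameter values and counts are hypothetical, and the sketch keeps the equal-radius, equal-distance simplification of (6).

```python
def local_stress(n_tsv: float, b_mod: float, d_alpha: float, d_temp: float,
                 radius: float, dist: float) -> float:
    """Closed form of Eq. (6) for n equal TSVs at equal distance, in Pa."""
    return -b_mod * n_tsv * d_alpha * d_temp / 2.0 * (radius / dist) ** 2


def stress_change(d_n: float, b_mod: float, d_alpha: float, d_temp: float,
                  radius: float, dist: float) -> float:
    """Stress change from d_n additional TSVs, Eq. (12), with d_n = d_eta * A."""
    return -b_mod * d_alpha * d_temp / 2.0 * (radius / dist) ** 2 * d_n


params = dict(b_mod=180e9, d_alpha=14.4e-6, d_temp=250.0, radius=2.5e-6, dist=10e-6)
n_dense, n_sparse = 8, 2  # hypothetical local TSV counts in two regions

gradient_before = local_stress(n_dense, **params) - local_stress(n_sparse, **params)
d_n = n_dense - n_sparse  # dummy TSVs added to the sparse region
sparse_after = local_stress(n_sparse, **params) + stress_change(d_n, **params)
gradient_after = local_stress(n_dense, **params) - sparse_after
print(f"stress difference: before {gradient_before / 1e6:.1f} MPa, "
      f"after {gradient_after / 1e6:.1f} MPa")
```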

### B. Sensitivity of Temperature Gradient Reduction

As can be observed from (11), the reduction in temperature gradient tends to saturate beyond a certain dummy TSV density. The sensitivity of the temperature gradient reduction with respect to dummy TSV density is given by

$$\frac{\partial T}{\partial \eta} = \frac{P \cdot l}{A\lambda_0} \cdot \frac{\eta_0}{(\eta_0 + \eta)^2}; \quad \eta_0 = \frac{\lambda_0}{\lambda_{TSV} - \lambda_0}.$$
 (13)

From (13), it can be clearly concluded that, when the dummy TSV density  $\eta$  is smaller than its saturation value,  $\eta_0$ , i.e.,  $\eta \ll \eta_0$ , the sensitivity of temperature gradient with dummy TSV density remains almost constant; and as  $\eta \gg \eta_0$ , the sensitivity approaches zero, implying that reduction of temperature tends to saturate. Thus, temperature sensitivity function with dependence on dummy TSV density can be used during the optimization of dummy TSV insertion.
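The sensitivity in (13) can be cross-checked against a finite difference of (11). The Python sketch below assumes zero initial TSV density (so $\lambda = \lambda_0$ in (11)) and takes $\eta_0 = \lambda_0/(\lambda_{TSV} - \lambda_0)$, the value for which (13) is the exact derivative of (11) in that case. The parameter values and function names are hypothetical.

```python
def temp_reduction_eq11(eta, p, l, area, lam0, lam_tsv):
    """Eq. (11) with zero initial TSV density, as a function of dummy density eta."""
    eta0 = lam0 / (lam_tsv - lam0)
    return p * l / (area * lam0) * eta / (eta0 + eta)


def temp_sensitivity_eq13(eta, p, l, area, lam0, lam_tsv):
    """Analytic sensitivity of Eq. (13): (P*l/(A*lam0)) * eta0 / (eta0 + eta)^2."""
    eta0 = lam0 / (lam_tsv - lam0)
    return p * l / (area * lam0) * eta0 / (eta0 + eta) ** 2


args = (50.0, 50e-6, 1e-4, 1.5, 400.0)  # hypothetical P, l, A, lambda_0, lambda_TSV
h = 1e-7                                # central finite-difference step
for eta in (0.0005, 0.005, 0.05):
    fd = (temp_reduction_eq11(eta + h, *args) - temp_reduction_eq11(eta - h, *args)) / (2 * h)
    print(f"eta = {eta:6.4f}: analytic {temp_sensitivity_eq13(eta, *args):9.2f} K, "
          f"finite difference {fd:9.2f} K")
```

The sensitivity is large for $\eta \ll \eta_0$ and falls off roughly as $1/\eta^2$ once $\eta \gg \eta_0$, matching the saturation argument above.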

### C. Sensitivity of Stress Gradient Reduction

From (12), one can observe that the stress gradient reduction depends on the TSV density. The sensitivity of the stress gradient reduction with respect to TSV density is given by

$$\frac{\partial\sigma}{\partial\eta} = -\frac{B\Delta\alpha\Delta TA}{2} (\frac{R}{r})^2.$$
 (14)

The stress gradient reduction sensitivity may appear independent of TSV density, but it depends on the TSV radius and area, which constrain the achievable TSV density: for a given density within a given area, TSVs larger than a certain radius cannot be inserted. When the TSV radius becomes very small compared with the distance, the stress gradient saturates earlier than for TSVs with larger radii, due to the smaller mechanical stress from each TSV. These sensitivity analyses are used in the clock-skew optimization later in this paper.

## V. 3-D NONLINEAR DELAY MODEL

The effect of temperature and stress on electrical parameters can be used for a detailed delay analysis that includes signal TSVs. Moreover, one can further study the delay sensitivity with respect to dummy TSV insertion density. In this section, the delay model of a 3-D IC is developed based on electrical-thermal coupling, electrical-mechanical coupling, and eventually electrical-thermal-mechanical coupling, and its sensitivity with respect to dummy TSV density is derived. As shown in Fig. 6, the 3-D delay model for the clock tree includes the driver, 2-D wires, and the 3-D signal TSV.

### A. Delay Model With Electrical-Thermal Coupling

Temperature has a major impact on the nonlinear MOSCAP capacitance of a signal TSV, according to (3). Considering the nonlinear electrical-thermal coupling of the signal TSV in (5), the signal delay $D_{TSV1}$ of the clock tree shown in Fig. 6 is calculated as

$$\begin{aligned}
D_{\text{TSV1}} ={}& R_{\text{in}}\alpha\beta_2 T^3 + R_{\text{in}}[(1 - \alpha T_0)\beta_2 + \alpha\beta_1]T^2 \\
&+ [\alpha(D_0 + R_{\text{in}}C_0) + (1 - \alpha T_0)R_{\text{in}}\beta_1]T + (1 - \alpha T_0)(R_{\text{in}}C_0 + D_0)\\
R_{\rm in} ={}& \frac{R_D}{S_D} + S_{w1}R_{w1} + \frac{R_T}{2S_T}\\
D_0 ={}& \frac{1}{2}(S_{w1}^2R_{w1}C_{w1} + S_{w2}R_{w2}S_LC_L) + \left(\frac{R_T}{S_T} + S_{w1}R_{w1} + \frac{S_{w2}R_{w2}}{2}\right)(S_{w2}C_{w2} + S_LC_L)\\
&+ \frac{R_D}{S_D}(S_{w1}C_{w1} + S_{w2}C_{w2} + S_LC_L + S_DC_P)
\end{aligned} \tag{15}$$

where  $R_{in}$  is the total resistance counted from TSV capacitor,  $C_T$  is the total capacitance counted from the input,  $C_L$  is the load capacitance,  $\alpha$  is the temperature-dependent coefficient of  $R_T$ ,  $\beta_1$ ,  $\beta_2$  are the first-order and second-order temperaturedependent coefficients of  $C_T$ , and T represents the temperature. Note that  $D_0$  is the delay of the circuit shown in Fig. 6(b)



Fig. 6. Delay model of 3-D clock tree with nonlinear electrical-thermal-mechanical coupled model. (a) 3-D TSV clock tree with buffer. (b) Delay model with nonlinear electrical-thermal-mechanical coupling.

TABLE I PHYSICAL PARAMETERS USED IN TSV MODELING

| Physical geometries    |                                                                |  |  |  |  |  |
|------------------------|----------------------------------------------------------------|--|--|--|--|--|
| Notation               | Notation Definition                                            |  |  |  |  |  |
| rmetal. rdep. rox      | Radius of TSV metal, silicon and depletion region respectively |  |  |  |  |  |
| R                      | Radius of TSV                                                  |  |  |  |  |  |
| h                      | Height of TSV                                                  |  |  |  |  |  |
| $r_i, i = \{1, 2, 3\}$ | Distance to transistor from center of TSV                      |  |  |  |  |  |
| П                      | Tensor of piezo-resistive constant of material                 |  |  |  |  |  |
| T2, T4, T8, T10        | TSV bundles containing 2,4,8 and 10 TSVs respectively          |  |  |  |  |  |

TABLE II Electrical Parameters Used in TSV Modeling

| Electrical parameters                                   |                                                                           |  |  |  |  |
|---------------------------------------------------------|---------------------------------------------------------------------------|--|--|--|--|
| Notation                                                | Notation Definition                                                       |  |  |  |  |
| $R_0$                                                   | R <sub>0</sub> Interconnect resistance at room temperature T <sub>0</sub> |  |  |  |  |
| $C_0$                                                   | TSV capacitance at zero-temperature                                       |  |  |  |  |
| $\varepsilon_{ox}, \varepsilon_{si}$                    | Dielectric constant of silicon oxide and silicon                          |  |  |  |  |
| $C_{ox}, C_{dep}$                                       | Liner capacitance and depletion capacitance of TSV                        |  |  |  |  |
| $C_P, C_L$                                              | Unit buffer and load capacitances                                         |  |  |  |  |
| R <sub>w1</sub> , R <sub>w2</sub> Unit wire resistances |                                                                           |  |  |  |  |
| $R_D$                                                   | Unit buffer resistance without stress and with impact of stress           |  |  |  |  |
| μ                                                       | Mobility of charge carrier without stress                                 |  |  |  |  |
| $S_{w1}, S_{w2}$                                        | Scaling factor for wires                                                  |  |  |  |  |
| $S_D, S_T, S_L$                                         | Size scaling factor for driver, TSV and load respectively                 |  |  |  |  |
| $R_{in}$                                                | Total input resistance of Fig. 6b looking from $C_T$                      |  |  |  |  |
| $D_0$                                                   | Delay of 3D clock tree in Fig. 6b without $C_T$                           |  |  |  |  |
| $D_w$                                                   | Delay caused by wires in electrical-mechanical model                      |  |  |  |  |
| $D_c$                                                   | Delay caused by wires in electrical-thermal-mechanical model              |  |  |  |  |
| $S_c$                                                   | Temperature and stress gradient independent skew coefficient              |  |  |  |  |

TABLE III THERMAL PARAMETERS USED IN TSV MODELING

| Notation           | Definition                                                        |
|--------------------|-------------------------------------------------------------------|
| $T_0, T$           | Room temperature and operating temperature $T$, respectively      |
| $R_T$              | Interconnect resistance at temperature $T$                        |
| $\alpha$           | Temperature coefficient of resistance                             |
| $C_T$              | TSV capacitance at temperature $T$                                |
| $\beta_1, \beta_2$ | First- and second-order temperature coefficients of $C_T$         |
| $\delta T$         | Temperature gradient                                              |
| $\lambda_{Total}$  | Total heat conductivity of the chip with TSVs inserted            |
| $\lambda_0$        | Heat conductivity of regular chip area                            |
| $\lambda_{TSV}$    | TSV heat conductivity                                             |
| $P$                | Heat power flowing from chip to heat sink                         |
| P                  | Heat power flowing from chip to heat-sink                    |  |  |  |  |  |

without TSV; all other parameters can be found in Tables II and III.

In the calculation of delay with electrical–thermal coupling, the impacts of horizontal metal wires and buffers on delay are also considered. The nonlinearity in the delay arises mainly from the nonlinear temperature-dependent TSV capacitance, while each horizontal metal wire is modeled as a linear temperature-dependent resistor. The majority of the delay is contributed by the signal TSV.

## B. Delay Model With Electrical–Mechanical Coupling

In this section, the impact of mechanical stress on the transistors, i.e., the drivers, is further considered. According to (9), the mechanical stress from TSVs has a nonnegligible impact on the driver resistance. The clock-tree delay  $D_{TSV2}$  with electrical–mechanical coupling is then calculated as

$$
\begin{aligned}
D_{\text{TSV2}} &= \frac{D_{\sigma}}{1 + m\sigma} + D_{w}; \\
D_{\sigma} &= \frac{R_{D}}{S_{D}} \left[ C_{P}S_{D} + S_{w1}C_{w1} + S_{T}C_{0} + S_{w2}C_{w2} + S_{L}C_{L} \right]; \\
D_{w} &= S_{w1}R_{w1} \left[ \frac{S_{w1}C_{w1}}{2} + S_{T}C_{0} + S_{w2}C_{w2} + S_{L}C_{L} \right] \\
&\quad + S_{w2}R_{w2} \left[ \frac{S_{w2}C_{w2}}{2} + S_{L}C_{L} \right] + \frac{R_{0}}{S_{T}} \left( S_{w2}C_{w2} + S_{L}C_{L} \right) + \frac{R_{0}C_{0}}{2}
\end{aligned}
\tag{16}
$$

where  $D_{\sigma}$  and  $D_w$  are the stress-dependent and stress-independent delays, respectively;  $R_D$  is the driver resistance without the impact of stress; and  $R_0$  and  $C_0$  are the temperature-independent TSV resistance and capacitance, respectively. All other parameters can be found in Tables II–IV. Hence,  $D_{\text{TSV2}}$  is composed of a stress-dependent component and a stress-independent component, and the stress-dependent part stems mainly from the driver.
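As a small numerical illustration of (16), the sketch below evaluates $D_{\text{TSV2}}$ with stress entering only through the factor $1/(1+m\sigma)$ on the driver term. The driver setting $R_D/S_D = 100\,\Omega$, the 2-fF buffer and load capacitances, and $C_0 = 88.8$ fF follow Section VII; the 5-fF wire capacitances, the value of $m\sigma$, and the stress-independent delay $D_w$ (passed in as a plain number) are hypothetical placeholders, and $S_T = 1$ is assumed.

```python
def driver_delay(r_d_over_s_d, loads_f):
    """D_sigma in (16): stress-free driver resistance times total driven capacitance."""
    return r_d_over_s_d * sum(loads_f)


def delay_tsv2(r_d_over_s_d, loads_f, m_sigma, d_w):
    """D_TSV2 = D_sigma / (1 + m*sigma) + D_w from (16).

    r_d_over_s_d : R_D / S_D in ohms (driver resistance without stress)
    loads_f      : [S_D*C_P, S_w1*C_w1, S_T*C_0, S_w2*C_w2, S_L*C_L] in farads
    m_sigma      : dimensionless product m*sigma (hypothetical value)
    d_w          : stress-independent delay D_w in seconds (hypothetical value)
    """
    return driver_delay(r_d_over_s_d, loads_f) / (1.0 + m_sigma) + d_w


# Hypothetical wire capacitances of 5 fF each; S_T = 1 so S_T*C_0 = 88.8 fF.
loads = [2e-15, 5e-15, 88.8e-15, 5e-15, 2e-15]
d_w = 1e-12  # placeholder for D_w
no_stress = delay_tsv2(100.0, loads, 0.0, d_w)
with_stress = delay_tsv2(100.0, loads, 0.04, d_w)
print(f"{no_stress * 1e12:.2f} ps -> {with_stress * 1e12:.2f} ps")
```

A positive $m\sigma$ (mobility enhancement) lowers the effective driver resistance and hence the delay, while a negative value would raise it; $D_w$ is unaffected by stress in this model.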

## C. Delay Model With Electrical–Thermal–Mechanical Coupling

So far, the delay models have been studied by considering electrical–thermal and electrical–mechanical coupling separately. Considering all the couplings together yields an accurate TSV model for the delay and skew in 3-D clock-tree design. The clock-tree delay  $D_{\text{TSV}}$  considering electrical–thermal–mechanical coupling is given by

$$
\begin{aligned}
D_{\text{TSV}} &= D_c + D(T) + D(\sigma) + D(T, \sigma); \\
D_c &= D_0 - D_{\sigma} + R_0 C_0 (1 + \alpha T_0) - \frac{R_0 (S_{w2} C_{w2} + S_L C_L) \alpha T_0}{S_T}; \\
D(\sigma) &= \frac{R_D}{S_D (1 + m\sigma)} \left[ S_D C_P + S_{w1} C_{w1} + S_T C_0 + S_L C_L + S_{w2} C_{w2} \right]; \\
D(T) &= R_0 \alpha \beta_2 T^3 + R_0 \left[ (1 - \alpha T_0) \beta_2 + \alpha \beta_1 \right] T^2 \\
&\quad + \left[ R_0 \beta_1 (1 - \alpha T_0) + \alpha R_0 C_0 + S_{w1} R_{w1} S_T \beta_1 + \frac{R_0 \alpha (S_{w2} C_{w2} + S_L C_L)}{S_T} \right] T; \\
D(T, \sigma) &= \frac{R_D}{S_D (1 + m\sigma)} \left[ S_T \beta_1 T + S_T \beta_2 T^2 \right]
\end{aligned}
\tag{17}
$$

where  $D_c$  is a constant delay composed of  $D_0$  and  $D_{\sigma}$  given in (15) and (16); D(T) is the delay as a function of temperature only;  $D(\sigma)$  represents the delay as a function of stress only; and  $D(T, \sigma)$  is the delay as a function of both temperature and stress. All other parameters can be found from Tables II–IV.
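The cubic form of $D(T)$ can be checked by expanding the TSV product $R_T C_T$. The sketch below assumes $R_T = R_0[1 + \alpha(T - T_0)]$ and $C_T = C_0 + \beta_1 T + \beta_2 T^2$, the forms implied by the $T^3$, $T^2$, and $T$ terms of (17); equations (1) and (5) themselves are not reproduced in this section. It multiplies the two polynomials and compares the $T^3$, $T^2$, and TSV-only $T$ coefficients with (17); the wire terms $S_{w1}R_{w1}S_T\beta_1$ and $R_0\alpha(S_{w2}C_{w2}+S_LC_L)/S_T$ come from other RC products and are not covered. Parameter values follow Section VII, with $T_0 = 25$ assumed.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest order first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


R0, alpha, T0 = 0.044, 0.00125, 25.0          # ohm, 1/K, assumed reference temperature
C0, beta1, beta2 = 88.8e-15, 0.0667e-15, 0.0014e-15

r_T = [R0 * (1 - alpha * T0), R0 * alpha]     # assumed R_T = R0 [1 + alpha (T - T0)]
c_T = [C0, beta1, beta2]                      # assumed C_T = C0 + beta1 T + beta2 T^2
rc = poly_mul(r_T, c_T)                       # coefficients of T^0 .. T^3 in R_T * C_T

# TSV-only coefficients of D(T) as written in (17)
paper = {
    3: R0 * alpha * beta2,
    2: R0 * ((1 - alpha * T0) * beta2 + alpha * beta1),
    1: R0 * beta1 * (1 - alpha * T0) + alpha * R0 * C0,
}
for power, coeff in paper.items():
    assert math.isclose(rc[power], coeff, rel_tol=1e-12), power
print("T^3, T^2, and T^1 coefficients agree with D(T)")
```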

TABLE IV MECHANICAL PARAMETERS USED IN TSV MODELING

| Notation                | Definition                                                                      |
|-------------------------|---------------------------------------------------------------------------------|
| $\sigma, \sigma_i$      | Cumulative stress and individual stress exerted by the *i*th TSV                |
| $\delta \sigma$         | Stress gradient                                                                 |
| $m$                     | Mobility enhancement factor                                                     |
| $\mu_s$                 | Carrier mobility with impact of stress                                          |
| $R_D^M$                 | Driver resistance with impact of stress                                         |
| $\eta$                  | TSV density for regular chip area $A$                                           |
| $S_{T,T}^i$             | Temperature-gradient coefficient of temperature sensitivity in *i*th grid       |
| $S_{T,\sigma}^{i}$      | Temperature–stress gradient coefficient of temperature sensitivity in *i*th grid |
| $S^{i}_{\sigma,\sigma}$ | Stress-gradient coefficient of stress sensitivity in *i*th grid                 |
| $S^i_{\sigma,T}$        | Stress–temperature gradient coefficient of stress sensitivity in *i*th grid     |

In summary, unlike the 2-D case, in which delay depends only linearly on temperature, the delay here depends nonlinearly on temperature. The nonlinearity arises mainly from the temperature dependence of the signal TSV MOSCAP capacitance.

## D. Delay Sensitivity With Respect to Temperature and Stress Gradient

So far, the impact of the different couplings on delay and the dependence of delay on stress and temperature have been studied. Going one step further, we now derive the delay sensitivity with respect to the temperature and stress gradients.

The impact of the temperature and stress gradients on delay variation, i.e., clock skew, is given by

$$S = S_c + D(\delta T) + D(\delta \sigma) + D(\delta T, \delta \sigma)$$
(18)

with

$$\begin{split} S_c &= D_c + [R_0\beta_1 + \alpha R_0C_0 + S_{w1}R_{w1}S_T\beta_1 \\ &+ \frac{R_0\alpha(S_{w2}C_{w2} + S_LC_L)}{S_T}]T_0 + R_0\beta_2T_0^2 \end{split}$$

$$D(\delta T) = \left[ R_0\beta_1 + \alpha R_0C_0 + S_{w1}R_{w1}S_T\beta_1 + \frac{R_0\alpha(S_{w2}C_{w2} + S_LC_L)}{S_T} + 2R_0(\beta_2 + \alpha\beta_1)T_0 + R_0\alpha\beta_2T_0^2 \right]\delta T + \left[ 2R_0\alpha\beta_2T_0 + R_0(\beta_2 + \alpha\beta_1) \right](\delta T)^2 + R_0\alpha\beta_2(\delta T)^3$$

$$D(\delta\sigma) = -\frac{R_D}{S_D(1+m\sigma_0)^2} [(S_D C_P + S_{w1} C_{w1} + S_T C_0 + S_L C_L + S_{w2} C_{w2}) + S_T \beta_1 T_0 + S_T \beta_2 T_0^2] m \delta\sigma$$

$$D(\delta T, \delta \sigma) = -\frac{R_D}{S_D (1 + m\sigma_0)^2} [(S_T \beta_1 + 2S_T\beta_2T_0) \delta T + S_T \beta_2 (\delta T)^2] m \delta \sigma$$

$$T = T_0 + \delta T; \quad \sigma = \sigma_0 + \delta \sigma.$$

Here,  $\delta T$  and  $\delta \sigma$  represent the temperature and stress gradients, respectively;  $S_c$  is the temperature- and stress-gradient-independent coefficient, with  $D_c$  the constant delay given in (17);  $D(\delta T)$  and  $D(\delta \sigma)$  represent the dependence on the temperature and stress gradients, respectively; and  $D(\delta T, \delta \sigma)$  represents the dependence on both.  $T_0$  and  $\sigma_0$  represent the initial (room) temperature and stress, respectively. All other parameters can be found in Tables II–IV.

From (18), the temperature gradient has a larger impact on clock-skew sensitivity than the stress gradient: the nonlinear terms in temperature and its gradient dominate those from the stress gradient, as verified by the experiments. Inserting dummy TSVs reduces the temperature and stress gradients and, correspondingly, the clock skew; this relation is exploited by the optimization flow discussed in the next section.

## VI. 3-D CLOCK-SKEW REDUCTION BY NONLINEAR OPTIMIZATION

The nonlinear electrical–thermal–mechanical coupling transforms the 3-D clock-skew reduction problem into a nonlinear optimization problem. In this section, a nonlinear-programming-based algorithm is developed for inserting dummy TSVs to reduce the clock skew. Before the optimization is performed, the sensitivity of clock skew with respect to dummy TSV density is discussed.

The nonlinear optimization of the 3-D clock tree in this paper is performed at the microarchitecture level by dividing each layer into  $M \times N$  grids. If a TSV passes through grid  $g_i$ , the delay contributed by that grid is calculated by the developed coupled electrical–thermal–mechanical model. In general, temperature has a much higher impact than stress. The linear delay from horizontal metal wires and buffers is also considered in (19). Based on (18), the skew from an individual grid *i* becomes

$$S_{i} = \begin{cases} S_{c} + w_{1}\delta T_{i} + w_{2}(\delta T_{i})^{2} + w_{3}(\delta T_{i})^{3} + \left[s_{0} + s_{1}\delta T_{i} + s_{2}(\delta T_{i})^{2}\right]\delta\sigma_{i}, & \text{3-D signal TSVs} \\ z_{0} + z_{1}\delta T_{i}, & \text{2-D wires} \end{cases} \tag{19}$$

with

$$
\begin{aligned}
w_{1} &= 2R_{0}(\beta_{2} + \alpha\beta_{1})T_{0} + \left[R_{0}\beta_{1} + \alpha R_{0}C_{0} + S_{w1}R_{w1}S_{T}\beta_{1} + \frac{R_{0}\alpha(S_{w2}C_{w2} + S_{L}C_{L})}{S_{T}}\right]; \\
w_2 &= 2R_0\alpha\beta_2T_0 + R_0(\beta_2 + \alpha\beta_1); \quad w_3 = R_0\alpha\beta_2
\end{aligned}
\tag{20}
$$

and

$$
\begin{aligned}
s_{0} &= -\frac{mR_{D}}{S_{D}(1+m\sigma_{0})^{2}} \left[(S_{D}C_{P} + S_{w1}C_{w1} + S_{T}C_{0} + S_{L}C_{L} + S_{w2}C_{w2}) + S_{T}\beta_{1}T_{0} + S_{T}\beta_{2}T_{0}^{2}\right]; \\
s_1 &= -\frac{mR_D}{S_D(1+m\sigma_0)^2} \left[S_T\beta_1 + 2S_T\beta_2 T_0\right]; \\
s_2 &= -\frac{mR_D}{S_D(1+m\sigma_0)^2}S_T\beta_2; \\
z_0 &= R_0 C_0; \quad z_1 = R_0 C_0 \alpha
\end{aligned}
\tag{21}
$$

where  $w_i$  and  $s_i$  are the skew coefficients in the presence of a TSV;  $z_0$  and  $z_1$  are the skew coefficients in the absence of a TSV;  $\delta T_i$  and  $\delta \sigma_i$  represent the temperature and stress gradients in the *i*th grid, respectively; and  $S_c$  is the temperature- and stress-independent coefficient of clock skew given in (18). Other parameters can be found in Tables II–IV.
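A per-grid evaluation of (19) can be sketched as follows. The function name and the coefficient values in the usage lines are hypothetical; in practice, $w_i$, $s_i$, and $z_i$ would be computed from (20) and (21).

```python
def grid_skew(dT, dsigma, has_tsv, Sc, w, s, z):
    """Skew contribution S_i of one grid, following (19).

    dT, dsigma : temperature and stress gradients in the grid
    has_tsv    : True if a 3-D signal TSV passes through the grid
    Sc         : gradient-independent coefficient S_c from (18)
    w, s, z    : (w1, w2, w3), (s0, s1, s2), (z0, z1) from (20)-(21)
    """
    if not has_tsv:
        z0, z1 = z
        return z0 + z1 * dT                      # 2-D wires: linear in dT
    w1, w2, w3 = w
    s0, s1, s2 = s
    thermal = w1 * dT + w2 * dT**2 + w3 * dT**3  # cubic temperature-gradient term
    coupled = (s0 + s1 * dT + s2 * dT**2) * dsigma
    return Sc + thermal + coupled


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
coeffs = dict(Sc=1.0, w=(1.0, 2.0, 3.0), s=(4.0, 5.0, 6.0), z=(1.0, 2.0))
print(grid_skew(1.0, 1.0, True, **coeffs))   # 22.0
print(grid_skew(1.0, 1.0, False, **coeffs))  # 3.0
```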

## A. Clock-Skew Sensitivity With Respect to Dummy TSV Density

The sensitivity of the clock skew with respect to dummy TSV density plays an important role during the optimization. From (19), the sensitivity of clock skew in the *i*th grid can be derived as follows:

$$\frac{\partial S_i}{\partial \eta_i} = (S_{T,T}^i + S_{T,\sigma}^i) \cdot \frac{\partial T}{\partial \eta} + (S_{\sigma,\sigma}^i + S_{\sigma,T}^i) \cdot \frac{\partial \sigma}{\partial \eta}$$
(22)

with

$$S_{T,T}^{i} = w_1 + 2w_2\delta T_i + 3w_3(\delta T_i)^2$$
  

$$S_{T,\sigma}^{i} = (s_1 + 2s_2\delta T_i)\delta\sigma_i$$
  

$$S_{\sigma,\sigma}^{i} = s_0; \quad S_{\sigma,T}^{i} = s_1\delta T_i + s_2(\delta T_i)^2.$$

Here,  $S_{T,T}^i$  and  $S_{T,\sigma}^i$  represent the temperature and temperature–stress gradient coefficients of the temperature sensitivity in the *i*th grid;  $S_{\sigma,\sigma}^i$  and  $S_{\sigma,T}^i$  represent the stress and stress–temperature gradient coefficients of the stress sensitivity in the *i*th grid; and  $s_i$  and  $w_i$  are the skew coefficients in the presence of a TSV given in (20) and (21). The clock-skew sensitivity depends on the temperature and stress gradient sensitivities, which are in turn related to the dummy TSV density and can be determined from (13) and (14).

Based on the clock-skew sensitivities, the temperature and stress in the *i*th grid for an updated dummy TSV density  $\eta_i$  are given by

$$T_i^{new} = T_i + \gamma_T^i P_i \eta_i; \quad \sigma_i^{new} = \sigma_i + \gamma_\sigma^i \eta_i$$
(23)

where  $\gamma_T^i$  and  $\gamma_{\sigma}^i$  are the sensitivities of temperature and stress in the *i*th grid with respect to the dummy TSV density, given by  $\partial T_i / \partial \eta_i$  and  $\partial \sigma_i / \partial \eta_i$  and determined from (13) and (14), respectively.

In addition,  $T_i$  and  $\sigma_i$  represent the temperature and stress in the *i*th grid, and  $P_i$  and  $\eta_i$  are the heat power density and TSV density in the *i*th grid. Once the TSV density, and hence the temperature and stress, are updated, the skew in (19) and its sensitivity in (22) are updated accordingly.
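The sensitivity (22) is the chain rule applied to (19). The sketch below differentiates (19) with respect to $\delta T_i$ and $\delta\sigma_i$, multiplies by assumed rates $\partial T/\partial\eta$ and $\partial\sigma/\partial\eta$ (taken here as plain inputs, with the power factor $P_i$ of (23) folded in), and checks the result against a central finite difference. For the stress factor, the code uses the full derivative of (19) with respect to $\delta\sigma_i$, $s_0 + s_1\delta T_i + s_2(\delta T_i)^2$. All names and numbers are hypothetical.

```python
def skew_tsv(dT, ds, Sc, w, s):
    """TSV branch of (19)."""
    w1, w2, w3 = w
    s0, s1, s2 = s
    return Sc + w1 * dT + w2 * dT**2 + w3 * dT**3 + (s0 + s1 * dT + s2 * dT**2) * ds


def skew_sensitivity(dT, ds, w, s, dT_deta, ds_deta):
    """dS_i/d(eta_i) per (22), given assumed rates dT/d(eta) and dsigma/d(eta)."""
    w1, w2, w3 = w
    s0, s1, s2 = s
    S_TT = w1 + 2 * w2 * dT + 3 * w3 * dT**2   # d/d(dT) of the thermal terms
    S_Ts = (s1 + 2 * s2 * dT) * ds              # d/d(dT) of the coupled term
    dS_dsigma = s0 + s1 * dT + s2 * dT**2       # d/d(dsigma) of the coupled term
    return (S_TT + S_Ts) * dT_deta + dS_dsigma * ds_deta


w, s = (1.0, 0.5, 0.1), (-0.3, 0.2, -0.05)     # hypothetical coefficients
dT, ds, rate_T, rate_s = 0.5, 0.2, -2.0, -1.0   # hypothetical gradients and rates
analytic = skew_sensitivity(dT, ds, w, s, rate_T, rate_s)

# Central difference along the direction (rate_T, rate_s) as a cross-check.
h = 1e-5
numeric = (skew_tsv(dT + rate_T * h, ds + rate_s * h, 0.0, w, s)
           - skew_tsv(dT - rate_T * h, ds - rate_s * h, 0.0, w, s)) / (2 * h)
print(analytic, numeric)
```

Negative rates model the intended effect of dummy TSVs, which lower the local gradients as the density grows.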

### B. Nonlinear Optimization

A clock-tree branch  $B_k$  is the set of grids that branch $k$ passes through,  $B_k = \{g_i \mid \text{branch } k \text{ passes } g_i\}$ , with  $g_i$  representing the *i*th grid. Therefore, the clock skew of a clock-tree branch is the sum of the skews from all the grids it passes through:

$$\mathbf{S} = \sum_{i \in B_k} S_i \tag{24}$$

where  $S_i$  represents the skew from *i*th grid among set  $B_k$ . Based on the derived sensitivity and skew function, a nonlinear optimization can be performed as follows.

TABLE V NOTATIONS USED IN NONLINEAR OPTIMIZATION

| Notation              | Definition                                                          |
|-----------------------|---------------------------------------------------------------------|
| $N_s$                 | Number of clock sinks                                               |
| $S_i$                 | Clock skew of *i*th grid                                            |
| $s_i, w_i$            | Skew coefficients in presence of TSV                                |
| $z_0, z_1$            | Skew coefficients in absence of TSV                                 |
| $\eta_i$              | Dummy TSV density in *i*th grid                                     |
| $\delta T_i$          | Temperature gradient in *i*th grid                                  |
| $\delta \sigma_i$     | Stress gradient in *i*th grid                                       |
| $\gamma_T^i$          | Thermal sensitivity capturing thermal gradient in *i*th grid        |
| $\gamma_{\sigma}^{i}$ | Stress sensitivity capturing stress gradient in *i*th grid          |
| $\mathbf{S}, \bar{S}$ | Clock skew of *k*th branch and its overall mean                     |
| $B_k$                 | Clock-tree branch $k$, i.e., the set of grids $g_i$ it passes through |
| $f, \bar{f}$          | Vector of linear skew coefficients and its mean                     |
| $H, \bar{H}$          | Matrix of nonlinear skew coefficients and its mean                  |
| $\mathbf{x}$          | Dummy TSV density vector                                            |
| $f(\mathbf{x})$       | Objective function to be minimized                                  |
| $g_k$                 | Search gradient vector of $f^*(\mathbf{x}_k)$                       |
| $\alpha_k$            | Optimal step size                                                   |
| $d_k$                 | Search direction vector at *k*th iteration                          |
| $\xi$                 | Lagrange penalty factor                                             |

Substituting (19), (22), and (23) into (24), the clock-tree branch skew **S** becomes a quadratic function of the inserted dummy TSV density  $\eta_i$ . Considering the clock skew from each grid, the clock skew can be written in matrix form as

$$\mathbf{S} = c + f^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T H \mathbf{x}$$
(25)

with

$$c = S_{c}; \quad f = \begin{pmatrix} f_{0} \\ f_{1} \\ \vdots \\ f_{(M \times N)} \end{pmatrix}; \quad \mathbf{x} = \begin{pmatrix} \eta_{0} \\ \eta_{1} \\ \vdots \\ \eta_{(M \times N)} \end{pmatrix};$$

$$f_{i} = \begin{cases} (S_{T,T}^{i} + S_{T,\sigma}^{i})\gamma_{T}^{i} + (S_{\sigma,\sigma}^{i} + S_{\sigma,T}^{i})\gamma_{\sigma}^{i}, & \text{if } i \in B_{k}; \\ 0, & \text{else}; \end{cases} \qquad H = \begin{pmatrix} H_{0,0} & H_{0,1} & \cdots & H_{0,N_{s}} \\ H_{1,0} & H_{1,1} & \cdots & H_{1,N_{s}} \\ \vdots & \vdots & \ddots & \vdots \\ H_{(M \times N),0} & H_{(M \times N),1} & \cdots & H_{(M \times N),N_{s}} \end{pmatrix};$$

$$H_{i,j} = \begin{cases} (6w_{3}\delta T_{j}\gamma_{T}^{j} + 2w_{2}\gamma_{T}^{j} + s_{1}\gamma_{\sigma}^{j} + 2s_{2}\delta\sigma_{i}\gamma_{T}^{j} + 2s_{2}\delta T_{i}\gamma_{\sigma}^{j})\gamma_{T}^{i} \\ \quad + (s_{1}\gamma_{T}^{j} + 2s_{2}\delta T_{i}\gamma_{T}^{j})\gamma_{\sigma}^{i}, & \text{if } i, j \in B_{k}; \\ 0, & \text{else}. \end{cases}$$

Here, *c* represents the zero-order coefficient of clock skew; *f* and *H* represent the linear and nonlinear coefficients of clock skew; **x** represents the dummy TSV density vector;  $N_s$  represents the total number of sinks;  $M \times N$  is the total number of grids; and  $\gamma_T^i$  and  $\gamma_\sigma^i$  are the temperature and stress gradient sensitivities in the *i*th grid given by (23).
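Evaluating the matrix form (25) for a given density vector is straightforward; the sketch below also returns its gradient $f + \tfrac{1}{2}(H + H^T)\mathbf{x}$, which is what a gradient-based solver needs. Plain lists stand in for vectors and matrices, and the numbers are hypothetical.

```python
def quad_skew(c, f, H, x):
    """Equation (25): S = c + f^T x + 0.5 x^T H x."""
    n = len(x)
    fx = sum(f[i] * x[i] for i in range(n))
    xHx = sum(x[i] * H[i][j] * x[j] for i in range(n) for j in range(n))
    return c + fx + 0.5 * xHx


def quad_skew_grad(f, H, x):
    """Gradient of (25): f + 0.5 (H + H^T) x, which equals f + H x for symmetric H."""
    n = len(x)
    return [f[i] + 0.5 * sum((H[i][j] + H[j][i]) * x[j] for j in range(n))
            for i in range(n)]


c, f, H = 1.0, [1.0, 2.0], [[2.0, 0.0], [0.0, 4.0]]   # hypothetical coefficients
x = [1.0, 1.0]                                         # hypothetical densities
print(quad_skew(c, f, H, x))    # 7.0
print(quad_skew_grad(f, H, x))  # [3.0, 6.0]
```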

Since clock skew is the difference in delay between two clock sinks, the problem becomes minimizing the skew variance over all clock-tree branches  $B_k$ , i.e., minimizing the variance of **S** in (25):

$$\min: f(\mathbf{S}) = \frac{1}{N_s - 1} \sum_{k=1}^{N_s} (\mathbf{S}_k - \bar{S})^2 \tag{26}$$

where  $\mathbf{S}_k$  represents the clock skew of clock-tree branch  $B_k$ , and  $\bar{S}$  represents the average skew over the  $N_s$  sinks.
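Combining (24) and (26), the objective is the sample variance of the branch skews. The sketch below sums hypothetical per-grid skews $S_i$ over each branch's grid set $B_k$ and uses Python's `statistics.variance`, which divides by $N_s - 1$ as in (26). The grid skews and branch sets are made-up inputs.

```python
import statistics


def branch_skews(grid_skew, branches):
    """Equation (24): the skew of each branch is the sum of S_i over its grids B_k."""
    return [sum(grid_skew[i] for i in grid_set) for grid_set in branches]


def skew_variance(grid_skew, branches):
    """Equation (26): sample variance (N_s - 1 denominator) of the branch skews."""
    return statistics.variance(branch_skews(grid_skew, branches))


grid_skew = [1.0, 2.0, 3.0, 4.0]   # hypothetical S_i per grid
branches = [[0, 1], [2], [1, 3]]   # hypothetical B_k as lists of grid indices
print(branch_skews(grid_skew, branches))   # [3.0, 3.0, 6.0]
print(skew_variance(grid_skew, branches))  # 3.0
```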

The mean skew  $\bar{S}$  is likewise a quadratic function of **x**:

$$\bar{S} = \frac{1}{N_s} \sum_{k=1}^{N_s} \mathbf{S}_k = \bar{c} + \bar{f}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \bar{H} \mathbf{x}. \tag{27}$$

where c and  $\bar{c}$  represent the zero-order coefficient of clock skew and its mean, respectively; f and  $\bar{f}$  represent the linear coefficient vector of clock skew and its mean, respectively; and H and  $\bar{H}$  represent the nonlinear coefficient matrix of clock skew and its mean, respectively.

Substituting (27) into (26), the original problem can be rewritten as the following polynomial minimization:

Problem 2:

$$\min: f(\mathbf{x}) = \frac{1}{N_s - 1} \sum_{k=1}^{N_s} \left( \widehat{c}^2 + 2\widehat{c}\widehat{f}^T \mathbf{x} + \mathbf{x}^T (\widehat{f}\widehat{f}^T + \widehat{c}\widehat{H}_k) \mathbf{x} + (\widehat{f}^T \mathbf{x})(\mathbf{x}^T \widehat{H}_k \mathbf{x}) + \frac{1}{4} (\mathbf{x}^T \widehat{H}_k \mathbf{x})^2 \right) \tag{28}$$

where  $\hat{c} = c - \bar{c}$ ,  $\hat{f} = f - \bar{f}$ , and  $\hat{H} = H - \bar{H}$ . All these values represent the deviations from their means.

Although the clock skew depends on the dummy TSV density vector  $\mathbf{x}$ , neither an arbitrarily large nor an arbitrarily small number of TSVs can be inserted, because of the following design constraints:

$$lb \le \mathbf{x} \le ub \tag{29}$$

where the lower bound $lb$ is determined by the foundry process, e.g., the minimum metal density, and the upper bound $ub$ is determined by the maximum allowed overhead with respect to signal routing.

### C. Conjugate-Gradient Solving

The objective of minimizing clock skew now becomes minimizing (28), i.e., finding the optimal dummy TSV density vector  $\mathbf{x}$  that also satisfies the constraints in (29). The conjugate-gradient method with line search [28] is an efficient way to solve this constrained nonlinear problem.

To remove the inequality constraints in Problem 2, the Karush–Kuhn–Tucker optimization method along with a Lagrange penalty factor  $\xi$  is used to reformulate the original problem as

Problem 3:

$$\min: f^*(\mathbf{x}) = f(\mathbf{x}) + \xi h^2(\mathbf{x}) \tag{30}$$

with

$$h(\mathbf{x}) = \begin{cases} 0, & lb \le \mathbf{x} \le ub \\ \varphi \gg 0, & \text{otherwise} \end{cases} \tag{31}$$

where  $f^*(\mathbf{x})$  is the objective function with the boundary conditions included, i.e., with the inequalities of  $f(\mathbf{x})$  removed;  $\xi$  is the Lagrange penalty factor, which can be determined at the stationary point of  $f(\mathbf{x})$ ; and  $\varphi$  is a weighting parameter.
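A direct transcription of (30) and (31) is sketched below, with $f(\mathbf{x})$ passed in as any callable. The values of $\xi$, $\varphi$, the bounds, and the stand-in objective are hypothetical. Note that $h(\mathbf{x})$ as written is a step function, so $f^*$ jumps at the bounds and its gradient carries no information about how far $\mathbf{x}$ lies outside them; a common smooth alternative, which is not the formulation used here, penalizes the squared amount of bound violation instead.

```python
def penalized_objective(f, x, lb, ub, xi, phi):
    """Equations (30)-(31): f*(x) = f(x) + xi * h(x)^2, with h = 0 inside [lb, ub], phi outside."""
    inside = all(lo <= xv <= hi for xv, lo, hi in zip(x, lb, ub))
    h = 0.0 if inside else phi
    return f(x) + xi * h**2


def sum_sq(x):
    """Hypothetical stand-in for the skew-variance objective f(x)."""
    return sum(v * v for v in x)


lb, ub = [0.0, 0.0], [1.0, 1.0]
print(penalized_objective(sum_sq, [0.5, 0.5], lb, ub, xi=10.0, phi=100.0))  # 0.5
print(penalized_objective(sum_sq, [1.5, 0.5], lb, ub, xi=10.0, phi=100.0))  # 100002.5
```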

The conjugate-gradient method iteratively searches for the value of **x** that minimizes  $f^*(\mathbf{x})$ . The search gradient vector  $g_k$  points in the direction of the steepest decrease of the objective function  $f^*(\mathbf{x})$ . To reach a converged solution, in each iteration $k$ the search direction vector  $d_k$ , which indicates how the optimization variable is varied, is built from the negative gradient so as to reduce the objective. A new search direction vector  $d_{k+1}$  is thus obtained as a linear combination of the previous search direction vector  $d_k$  and the new negative gradient vector  $g_{k+1}$ .

Therefore, the next search direction vector becomes

$$d_{k+1} = -\nabla f^*(\mathbf{x}_{k+1}) + \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k} d_k; \quad g_k = -\nabla f^*(\mathbf{x}_k) \tag{32}$$

where  $d_{k+1}$  is the new search direction vector and  $g_k$  is the search gradient vector at  $\mathbf{x}_k$ ; the ratio  $g_{k+1}^T g_{k+1}/g_k^T g_k$  is the Fletcher–Reeves coefficient. Based on the search direction vector  $d_k$  and the gradient  $g_k$ , the optimal step size  $\alpha_k$  is chosen to minimize  $f^*(\mathbf{x}_k + \alpha_k d_k)$ .

The vector  $\mathbf{x}$  is then updated along the direction vector  $d_k$  by

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k d_k; \quad \alpha_k = \frac{g_k^T g_k}{g_k^T \nabla^2 f^*(\mathbf{x}_k)\, g_k}. \tag{33}$$

This iterative search for the optimal TSV density vector **x** stops when the difference between successive approximations  $\mathbf{x}_k$  falls below a certain threshold. To speed up convergence and help avoid local minima, the search starts from a randomly generated initial approximation  $\mathbf{x}_0$ . The final **x** satisfies (30) and the density constraints based on the stress and temperature gradients, leading to a reduction of the clock skew.
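The iteration (32)–(33) can be sketched for the simplest case of a quadratic objective $\tfrac{1}{2}\mathbf{x}^T A\mathbf{x} - b^T\mathbf{x}$ with symmetric positive-definite $A$, where the exact line-search step along $d_k$ is $g_k^T d_k / (d_k^T A d_k)$; on the first iteration, where $d_0 = g_0$, this reduces to (33). The objective (28) is quartic and the bounds (29) enter through the penalty in (30), so a practical solver would replace the closed-form step with a numerical line search. The matrix and vector below are hypothetical test data, and the loop stops on the gradient norm, a common alternative to the change-in-$\mathbf{x}$ criterion described above.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def fletcher_reeves_cg(A, b, x0, tol=1e-10, max_iter=100):
    """Minimize 0.5 x^T A x - b^T x (A symmetric positive definite) by conjugate gradients.

    g is the negative gradient b - A x; the Fletcher-Reeves ratio
    g_{k+1}^T g_{k+1} / g_k^T g_k mixes the previous direction into the new one, as in (32).
    """
    x = list(x0)
    g = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # g_0 = -grad f(x_0)
    d = list(g)                                         # first direction: steepest descent
    for _ in range(max_iter):
        gg = dot(g, g)
        if gg < tol * tol:                              # stop when the gradient is ~0
            break
        Ad = matvec(A, d)
        alpha = dot(g, d) / dot(d, Ad)                  # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi - alpha * adi for gi, adi in zip(g, Ad)]  # equals b - A x_{k+1}
        beta = dot(g_new, g_new) / gg                   # Fletcher-Reeves coefficient
        d = [gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(fletcher_reeves_cg(A, b, [0.0, 0.0]))  # close to [1/11, 7/11]
```

For an $n$-dimensional quadratic, exact-arithmetic conjugate gradients terminate in at most $n$ iterations, which is why two steps suffice here.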

### VII. EXPERIMENTAL RESULTS

The electrical analysis of signal TSVs is performed based on (3) and (5), and the mechanical analysis is based on (6) and (9). The COMSOL Multiphysics simulator [29] is used to verify the results. A four-layer 3-D IC for clock-tree design is constructed in which each of the upper three layers is  $40\,\mu\text{m}$  thick and the bottom layer is  $200\,\mu\text{m}$  thick; stacked vertically, they give a total height of  $320\,\mu\text{m}$ . The height of each TSV equals the thickness of its layer. The heat sink is placed at the bottom with a distributed thermal conductance of  $1.24 \times 10^5\,\text{W}/(\text{K} \cdot \text{m}^2)$ . The IBM clock-tree benchmarks [27] are used for synthesizing the 3-D clock tree using the method in [11]. Each layer of the 3-D IC is taken to be one Alpha-2 processor, and the four layers are stacked on top of one another. After a zero-skew initial clock tree is formed in the 3-D IC, the dummy TSVs are inserted and the clock skew is calculated with consideration of nonlinear electrical–thermal–mechanical coupling under temperature and stress gradients. The power traces and temperature profiles are generated with SPEC2000 [30] inputs by GEM5 [31] and 3-D ACME [32]. The experiments are run on an Intel Core i5-2400 CPU with a 3.1-GHz clock frequency and 8-GB RAM. All optimization procedures are programmed in C++.

### A. Device Modeling

In this section, the device modeling of the signal TSV and the driver, and the impact of dummy TSV insertion, are discussed.

1) *Signal TSV:* The variation of signal TSV capacitance and delay with temperature is discussed in this part. The signal TSV is modeled by considering the nonlinear temperature-dependent capacitance described in (5). For the 3-D clock tree shown in Fig. 2, we use a TSV with a height of 40  $\mu$m, a diameter of 15  $\mu$m, and a resistance of 44 m$\Omega$  at room temperature. Based on the measured results reported in [8], the coefficients of the temperature-dependent parameters  $\alpha$ ,  $C_0$ ,  $\beta_1$ , and  $\beta_2$  in (1) and (5) are set to 0.00125 K<sup>-1</sup>, 88.8 fF, 0.0667 fF/K, and 0.0014 fF/K<sup>2</sup>, respectively. In addition, for reliability, a bundle of TSVs is used for signal distribution instead of a single TSV. A TSV bundle is formed by grouping a few TSVs; bundles T2, T4, T8, and T10 contain 2, 4, 8, and 10 TSVs, respectively.

Fig. 7. (a) Variation of TSV capacitance with temperature. (b) Variation of TSV delay with temperature for different TSV bundles.

The nonlinear variation of the signal TSV MOSCAP capacitance based on (5) is shown in Fig. 7(a). At high temperatures, the TSV capacitance varies nonlinearly owing to the liner material, as explained mathematically in (4) and (5). For example, the capacitance of one signal TSV is 87 fF at room temperature (25 °C), nearly 93 fF at 75 °C, and 113 fF at 150 °C, which shows a nonlinear growth.
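The nonlinear growth can be made concrete with the average slope between the three quoted readings: it rises from 0.12 fF/K over 25–75 °C to about 0.27 fF/K over 75–150 °C, i.e., it more than doubles. The short check below only reuses the numbers stated above.

```python
# Capacitance readings quoted in the text: (temperature in degC, capacitance in fF)
points = [(25.0, 87.0), (75.0, 93.0), (150.0, 113.0)]

# Average slope dC/dT over each interval (a 1-degC step equals a 1-K step).
slopes = [(c1 - c0) / (t1 - t0) for (t0, c0), (t1, c1) in zip(points, points[1:])]
for (t0, _), (t1, _), k in zip(points, points[1:], slopes):
    print(f"{t0:.0f}-{t1:.0f} degC: {k:.3f} fF/K")
```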

Experiments with process variations in TSVs are carried out by varying the TSV capacitance by up to 10%. The resulting variation in signal TSV delay is less than 3%, which is negligible compared with the thermal or mechanical impact.

To isolate the pure signal TSV delay, the input and output 2-D wires to the signal TSV are assumed to be as short as possible. An inverter in a 22-nm CMOS process is used as the buffer with the following settings:  $\frac{R_D}{S_D} = 100\,\Omega$  and  $S_D C_P = S_L C_L = 2\,\text{fF}$ . The nonlinear effect of temperature on the RC delay of TSV bundles T2, T4, T8, and T10 is shown in Fig. 7(b). Although the nonlinear temperature-dependent MOSCAP contributes a significant amount of delay, the use of signal TSV bundles can help reduce temperature, thereby reducing the overall skew.

Fig. 7(b) shows that the delay of the T8 bundle at 120 °C reaches nearly 100 ps, which is 67% of the half clock cycle of a 3.3-GHz multiprocessor. At a normal temperature of 75 °C, the delay of the T8 bundle is nearly 60 ps; at the maximum temperature of 200 °C, it reaches nearly 140 ps, which is nearly 46% of a clock cycle of a 3.3-GHz multiprocessor. Such delay is a serious concern if no cooling is applied.
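The cycle fractions quoted above follow from the 3.3-GHz period of about 303 ps (half period about 151.5 ps). The short check below recomputes them from the read-off delays of 100 ps and 140 ps and gives about 66% and 46%; the quoted 67% corresponds to a delay slightly above 100 ps, so the figures agree within the precision of "nearly 100 ps".

```python
f_clk_hz = 3.3e9
period_ps = 1e12 / f_clk_hz          # about 303.0 ps
half_period_ps = period_ps / 2       # about 151.5 ps

frac_120c = 100.0 / half_period_ps   # T8 bundle at 120 degC vs half cycle
frac_200c = 140.0 / period_ps        # T8 bundle at 200 degC vs full cycle
print(f"{frac_120c:.0%} of half cycle, {frac_200c:.0%} of full cycle")
```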

These results are consistent with the discussion in Section III-A. It can also be observed that modeling the TSV with the traditional linear coupled model underestimates the delay. For example, for the T10 bundle at nearly 125 °C, the delay with the nonlinear coupled model is nearly 130 ps, whereas with the traditional linear coupled model it is 120 ps. This difference can have a significant impact on 3-D clock-tree designs.

Fig. 8. (a) Variation of TSV stress with distance. (b) Variation of TSV stress with temperature and TSV density.

2) *Driver:* A TSV exerts stress that affects the carrier mobility and hence the delay of the driver. Mechanical stress mainly affects the devices on the substrate, where it changes the driver resistance through mobility enhancement. In the following, the thermal–mechanical and electrical–mechanical impacts on the driver are presented in turn.

a) *Thermal–mechanical impact:* The mechanical stress from a TSV is caused by the mismatch of CTEs between the TSV and the substrate. Different layers of a 3-D IC have different temperatures, so TSVs and the substrate are at different temperatures, resulting in a stress gradient. In this paper, all TSVs are considered to have a diameter of  $15\,\mu\text{m}$  and a density of  $400/\text{mm}^2$ . The mechanical stress exerted by a TSV on a device is given by (6).

Considering the stress from all TSVs, the mechanical stress exerted on a driver placed inside a square surrounded by TSVs is shown in Fig. 8(a). The stress and mobility remain nearly uniform beyond a certain distance, which defines the keep-out zone, whereas inside this zone they vary significantly. In our experiment, a keep-out zone of  $3\,\mu\text{m}$  is considered.

The coupled impact of TSV density and temperature gradient on stress is shown in Fig. 8(b). The stress exerted on the substrate increases with both the temperature gradient and the TSV density; thus, to reduce TSV stress on the substrate, the temperature gradient has to be reduced as well. For example, at a temperature gradient of 150 °C, the stress is 28.93 MPa at a TSV density of 200/mm<sup>2</sup> and 57.87 MPa at 400/mm<sup>2</sup>, indicating that stress depends on the TSV density as well as on the temperature gradient.

b) *Electrical–mechanical impact:* The electrical–mechanical impact on the driver is studied in this part. As the exerted mechanical stress varies, the deformation of the lattice structure also varies, resulting in a variation of carrier mobility.

For illustration, a single TSV with a diameter of 15  $\mu$m is considered as the source of stress, and the resulting variation of mobility and delay with distance is discussed here. Considering a keep-out zone of 3  $\mu$m, the variation of mobility with distance inside this zone for a 22-nm metal-gate pMOS and nMOS placed on the substrate is shown in Fig. 9(a), and the delay of the corresponding driver is shown in Fig. 9(b). All simulations are carried out in a SPICE simulator [33].

Fig. 9. (a) Variation of nMOS/pMOS carrier mobility with distance under TSV stress. (b) Variation of driver delay with distance under TSV stress.

Fig. 10. Temperature gradient reduction in four-layer 3-D clock tree with dummy TSVs under different power densities.

Several observations can be made from Fig. 9(a). The exerted stress varies with distance. In addition, for the same stress, the variation in hole mobility is larger than that in electron mobility. For example, at a distance of  $2\,\mu\text{m}$ , the mobilities of holes and electrons vary by nearly 4% and 1.8%, respectively.

The variation of the driver delay under mechanical stress is shown in Fig. 9(b). As the distance increases, the variation in delay decreases because the exerted stress decreases with distance: the delay decreases by just 1.6% at a distance of 4  $\mu$m, compared with nearly 3.7% at 1  $\mu$m. Hence, enlarging the keep-out zone reduces the stress experienced by the driver, with a smaller variation in mobility and delay.

3) *Dummy TSV:* The reduction of temperature and stress gradient with the insertion of dummy TSVs is presented in this section.

a) *Temperature gradient reduction:* In this paper, each layer is provided with the same power density  $P$ , which serves as the heat source. The study is performed for  $P = \{6, 80, 115\}\,\text{W/m}^2$ . In the experiment, the temperature of each layer is first collected without dummy TSVs. The liner material of the dummy TSVs is Si<sub>3</sub>N<sub>4</sub> with a thickness of 200 nm. Based on the resulting temperature distribution, dummy TSVs are inserted within the density lower and upper bounds of (29).

The temperature reduction in each layer with the insertion of dummy TSVs is shown in Fig. 10. It can be observed that



Fig. 11. (a) Setup of TSV distributions for stress gradient calculation. (b) Variation of stress gradient reduction with TSV density and temperature gradient.



Fig. 12. Signal delay of one single TSV under different technologies.

the reduction in temperature initially increases with the dummy TSV density but saturates once a certain limit is reached, which is consistent with the discussion in Section IV-A. The temperature reduction in the bottom layer, i.e., layer 3, is smaller than in the other layers: because this layer is closest to the heat sink, inserting dummy TSVs makes less difference there. The maximum inserted dummy TSV density observed is 400/mm<sup>2</sup>, with a maximum temperature reduction of nearly 50 °C in layer 0, i.e., the top layer.
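One simple way to see why the reduction saturates is a one-dimensional thermal-resistance picture: dummy TSVs add vertical thermal conductance in parallel with the silicon, so the temperature rise $P/(G_{si} + n\,G_{tsv})$ drops quickly at first and then flattens toward zero, capping the reduction at $P/G_{si}$. The sketch below is such a toy model; the power and conductance values are hypothetical and are not taken from the paper's thermal simulations.

```python
def temperature_rise(power_w, g_si_w_per_k, g_tsv_w_per_k, n_tsv):
    """Steady-state temperature rise (K): P / (G_si + n * G_tsv)."""
    return power_w / (g_si_w_per_k + n_tsv * g_tsv_w_per_k)


def temperature_reduction(power_w, g_si_w_per_k, g_tsv_w_per_k, n_tsv):
    """Reduction (K) relative to the same layer with no dummy TSVs."""
    baseline = temperature_rise(power_w, g_si_w_per_k, g_tsv_w_per_k, 0)
    return baseline - temperature_rise(power_w, g_si_w_per_k, g_tsv_w_per_k, n_tsv)


# Hypothetical values: 1 W flowing vertically through a layer whose silicon
# path has G_si = 0.025 W/K; each dummy TSV adds G_tsv = 0.001 W/K in parallel.
P_W, G_SI, G_TSV = 1.0, 0.025, 0.001

for n in (0, 25, 50, 100, 200, 400):
    print(f"{n:4d} dummy TSVs -> reduction {temperature_reduction(P_W, G_SI, G_TSV, n):5.1f} K")

# Upper limit as n grows without bound: the whole baseline rise P / G_si.
print(f"saturation limit: {P_W / G_SI:.1f} K")
```

In this toy model the first 25 dummy TSVs remove 20 K, while going from 200 to 400 gains only about 2 K more, which is the diminishing-return shape described for Fig. 10.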

b) *Stress gradient reduction:* Both the stress and the stress gradient depend on the TSV density and on the temperature gradient. Fig. 11 shows the stress gradient at different temperature gradients and TSV densities. As shown in Fig. 11(a), block B has a constant TSV density of $400/\text{mm}^2$, while block A has a temperature gradient of 180 °C and a variable TSV density. The stress gradient between the two blocks is plotted in Fig. 11(b): as the TSV density of block A increases, the stress gradient decreases. The stress gradient reduction for the same setup with a temperature gradient of 250 °C is also plotted.

### B. Delay With Electrical–Thermal–Mechanical Coupling

This section presents the clock-tree delay when the nonlinear electrical–thermal–mechanical coupling is considered. For the zero-skew 3-D clock tree shown in Fig. 6(b), Fig. 12 presents the delay of a single signal TSV with drivers in different technologies.

In Fig. 12, delay values are shown for a single TSV of $15\,\mu\text{m}$ diameter with buffers at both ends, each placed $3\,\mu\text{m}$ from the TSV [see Fig. 6(a)]. A single signal TSV increases the delay by nearly four times, which indicates that the TSV delay is at least comparable to the delay of a minimum-sized driver. Because the signal TSV is modeled as a MOSCAP with a nonlinear temperature dependence, the delay increases significantly with temperature when electrical–thermal



Fig. 13. Distribution of clock-skew reduction before and after insertion of dummy TSVs.

coupling is considered: the delay increases by nearly 15% for all technologies when the temperature rises to 200 °C. With electrical–mechanical coupling considered, the delay under the stress gradient at a reference temperature of 75 °C is about 9% lower than the delay without TSV insertion, because the stress enhances mobility. Considering all these effects together, approximately 10% delay variation is introduced at 200 °C.
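To see why a single TSV can dominate a driver-to-driver stage, the sketch below evaluates an Elmore delay for a driver, a TSV lumped as a series resistance with its capacitance split as a π model, and a receiving buffer. All element values are hypothetical placeholders, and the TSV capacitance is passed in as a fixed number; in the paper it is a nonlinear, temperature- and stress-dependent MOSCAP, which this sketch does not model.

```python
import math


def elmore_stage_delay(r_drv, c_in, r_tsv=0.0, c_tsv=0.0):
    """50% delay (s) of driver -> TSV -> receiver, Elmore approximation.

    The TSV is lumped as a pi model: C_tsv/2 at each end around R_tsv.
    Elmore time constant at the receiver input:
        tau = R_drv * (C_tsv + C_in) + R_tsv * (C_tsv / 2 + C_in)
    and ln(2) * tau approximates the 50% crossing of a single-pole response.
    """
    tau = r_drv * (c_tsv + c_in) + r_tsv * (c_tsv / 2.0 + c_in)
    return math.log(2.0) * tau


# Hypothetical element values (not the paper's): a weak driver, a 20 fF
# receiver, and a TSV whose capacitance at this temperature/bias point is
# 60 fF with negligible series resistance.
R_DRV = 2e3      # ohm
C_IN = 20e-15    # F
R_TSV = 0.05     # ohm
C_TSV = 60e-15   # F

without_tsv = elmore_stage_delay(R_DRV, C_IN)
with_tsv = elmore_stage_delay(R_DRV, C_IN, R_TSV, C_TSV)
print(f"stage delay without TSV: {without_tsv * 1e12:.1f} ps")
print(f"stage delay with TSV:    {with_tsv * 1e12:.1f} ps")
print(f"ratio: {with_tsv / without_tsv:.2f}")
```

Because $R_{TSV}$ is tiny here, the ratio is essentially $1 + C_{TSV}/C_{in}$, so under this simple model a roughly fourfold increase corresponds to a TSV capacitance about three times the receiver load; a temperature-dependent $C_{TSV}$ would shift the ratio accordingly.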

## C. Skew Reduction by Nonlinear Optimization

This section presents the nonlinear optimization of clock-skew reduction for 3-D clock-tree design by inserting dummy TSVs. The temperature distribution of each layer is obtained with 3-D ACME [32], a 3-D IC thermal simulator based on Hotspot [34]. The temperature averaged over the SPEC2000 benchmarks [30] is used to avoid an application-specific temperature distribution [4]. To limit the area overhead of dummy TSVs, the dummy TSV insertion density is capped at 7% of the total area. All benchmarks are presynthesized into four-tier 3-D clock trees by [11], considering wire length and buffers.
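The 7% area cap can be translated into a maximum dummy-TSV density once a footprint is assumed. The sketch below assumes, hypothetically, that dummy TSVs share the $15\,\mu\text{m}$ diameter of the signal TSVs in Fig. 12 and counts only a disk of that diameter, ignoring any keep-out zone. Under those assumptions, 400 TSVs/mm<sup>2</sup> covers about 7.07% of the area, slightly above the cap.

```python
import math


def tsv_area_fraction(density_per_mm2: float, diameter_um: float) -> float:
    """Fraction of silicon area covered by circular TSVs of the given diameter."""
    radius_mm = diameter_um / 2.0 / 1000.0
    return density_per_mm2 * math.pi * radius_mm ** 2


def max_density_for_budget(budget_fraction: float, diameter_um: float) -> float:
    """Largest TSV density (per mm^2) whose footprint stays within the budget."""
    radius_mm = diameter_um / 2.0 / 1000.0
    return budget_fraction / (math.pi * radius_mm ** 2)


DIAMETER_UM = 15.0  # assumption: dummy TSVs match the signal-TSV diameter
print(f"area fraction at 400/mm^2: {tsv_area_fraction(400.0, DIAMETER_UM):.4f}")
print(f"max density for a 7% cap: {max_density_for_budget(0.07, DIAMETER_UM):.0f} TSVs/mm^2")
```

With a keep-out zone included in the footprint, the admissible density would be correspondingly lower.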

Clock skew is the difference in delay between two sinks. Fig. 13 shows the distribution of clock skew for one 3-D H-tree before and after dummy TSV insertion: the initial clock skew without dummy TSVs is higher than the clock skew after insertion. For example, at row grid 16 and column grid 24, inserting dummy TSVs reduces the clock skew by nearly 17 ps.
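For a whole tree, the skew usually reported is the spread between the latest and the earliest sink arrival times, while the pairwise definition above compares two specific sinks. A minimal sketch of both, with hypothetical sink names and arrival times:

```python
from typing import Dict


def clock_skew(arrival_ps: Dict[str, float]) -> float:
    """Global skew: latest minus earliest sink arrival time (ps)."""
    if not arrival_ps:
        raise ValueError("need at least one sink")
    times = arrival_ps.values()
    return max(times) - min(times)


def pairwise_skew(arrival_ps: Dict[str, float], a: str, b: str) -> float:
    """Signed skew between two specific sinks, arrival(a) - arrival(b) (ps)."""
    return arrival_ps[a] - arrival_ps[b]


# Hypothetical arrival times before and after dummy-TSV insertion.
before = {"s0": 310.0, "s1": 335.0, "s2": 352.0, "s3": 318.0}
after = {"s0": 318.0, "s1": 331.0, "s2": 344.0, "s3": 326.0}
print(clock_skew(before), clock_skew(after))
```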

For the same 3-D H-tree, Fig. 14 shows the clock skew for different TSV bundles, where orig denotes the clock skew without dummy TSVs. Modeling the electrical–thermal or the electrical–mechanical impact during clock-skew reduction yields a clock-skew reduction of 51–53%. With all impacts considered, i.e., the electrical–thermal–mechanical-coupled model, the overall clock skew is reduced by nearly 64%.

Next, the 3-D clock tree of the IBM benchmark r5 [27] is studied. Fig. 15 shows this clock tree after dummy TSV insertion, with the TSVs in each layer indicated by solid dots. Dummy TSV insertion is performed under nonlinear



Fig. 14. Comparison of clock-skew reduction with different coupled modelings.



Fig. 15. 3-D clock tree after insertion of dummy TSV (black dots) with balanced clock skews for (a) Tier 0, (b) Tier 1, (c) Tier 2, and (d) Tier 3.

optimization (Section VI), so that the clock skew is minimized under an optimized dummy TSV insertion. More dummy TSVs are inserted in the top layer, i.e., tier 0, than in the other layers, because the top layer is farthest from the heat sink.

Finally, Table VI summarizes the clock skew of different benchmarks before and after dummy TSV insertion, showing the impact of the 3-D electrical–thermal–mechanical-coupled delay model and of dummy TSV insertion for gradient reduction. Clock-skew values are in picoseconds, and runtimes are the seconds spent optimizing dummy TSV insertion under the linear and nonlinear models. Table VI also lists the numbers of buffers and signal TSVs of each clock-tree benchmark. With the nonlinear electrical–thermal–mechanical delay model (nonlin column), the clock skew is reduced by 61.3% on average compared with the clock skew without dummy TSVs (orig column). With linear modeling (lin column), the reduction is 49.1%. The average runtime of the nonlinear optimization is 611 s, compared with 238 s for the linear

### TABLE VI

3-D CLOCK-SKEW REDUCTION BY LINEAR AND NONLINEAR DELAY MODELS

| Type | Orig | Lin | Impr% | Time (s) | Nonlin | Impr% | Time (s) |
|------|------|-----|-------|----------|--------|-------|----------|
| **HTree1** (14 signal TSVs & 63 buffers) | | | | | | | |
| T2 | 15.39 | 9.54 | 38.01% | 16.98 | 2.45 | 84.08% | 76.15 |
| T4 | 26.62 | 8.07 | 69.68% | 16.85 | 3.92 | 85.27% | 77.11 |
| T8 | 47.41 | 11.38 | 76.00% | 17.05 | 7.03 | 85.17% | 77.19 |
| T10 | 58.64 | 14.80 | 74.76% | 17.22 | 10.02 | 82.91% | 77.36 |
| Mean | - | - | 64.61% | 17.02 | - | 84.36% | 76.95 |
| **HTree2** (28 signal TSVs & 64 buffers) | | | | | | | |
| T2 | 23.55 | 8.15 | 65.39% | 17.41 | 3.31 | 85.94% | 79.02 |
| T4 | 44.13 | 11.61 | 73.69% | 17.51 | 4.81 | 89.10% | 79.59 |
| T8 | 82.00 | 14.60 | 82.20% | 17.40 | 9.00 | 89.02% | 79.98 |
| T10 | 103.70 | 15.50 | 85.05% | 17.49 | 9.51 | 90.83% | 79.32 |
| Mean | - | - | 76.58% | 17.45 | - | 88.72% | 79.48 |
| **r1** (45 signal TSVs & 202 buffers) | | | | | | | |
| T2 | 30.50 | 17.48 | 42.69% | 49.80 | 14.46 | 52.59% | 127.29 |
| T4 | 61.87 | 32.66 | 47.21% | 35.16 | 25.55 | 58.71% | 164.01 |
| T8 | 121.10 | 66.14 | 45.39% | 38.28 | 53.18 | 56.08% | 182.71 |
| T10 | 152.70 | 83.56 | 45.28% | 44.64 | 69.17 | 54.70% | 197.49 |
| Mean | - | - | 45.14% | 42.00 | - | 55.52% | 167.88 |
| **r2** (60 signal TSVs & 365 buffers) | | | | | | | |
| T2 | 35.13 | 23.48 | 33.16% | 160.80 | 19.33 | 44.97% | 475.36 |
| T4 | 69.75 | 46.28 | 33.65% | 123.36 | 34.17 | 51.00% | 485.19 |
| T8 | 134.70 | 86.89 | 35.49% | 127.80 | 71.01 | 47.28% | 490.58 |
| T10 | 169.00 | 118.89 | 29.65% | 167.16 | 88.08 | 47.20% | 513.43 |
| Mean | - | - | 32.99% | 144.84 | - | … | 491.14 |
| **r3** (75 signal TSVs & 515 buffers) | | | | | | | |
| T2 | 32.36 | 20.19 | 37.60% | 264.84 | 19.21 | 40.63% | 751.53 |
| T4 | 64.80 | 39.97 | 38.32% | 204.04 | 31.17 | 51.89% | 682.90 |
| T8 | 125.60 | 79.86 | 36.41% | 213.36 | 60.71 | 51.67% | 658.80 |
| T10 | 157.70 | 95.47 | 39.46% | 277.20 | 82.80 | 47.50% | 708.39 |
| Mean | - | - | 37.95% | 240.12 | - | 47.92% | 700.40 |
| **r4** (90 signal TSVs & … buffers) | | | | | | | |
| T2 | 31.68 | 17.07 | 43.28% | 254.16 | 16.53 | 47.83% | … |
| T4 | 64.57 | 36.21 | 43.92% | 278.64 | 26.13 | 59.53% | 844.46 |
| T8 | 126.80 | 70.61 | 43.32% | 278.04 | 64.50 | 49.13% | 808.21 |
| T10 | … | … | 44.51% | … | 76.92 | 51.02% | … |
| Mean | - | - | 45.95% | 301.44 | - | 52.10% | 891.08 |
| **r5** (90 signal TSVs & 1479 buffers) | | | | | | | |
| T2 | 35.00 | 21.00 | 40.00% | 708.72 | 18.90 | 46.00% | 1992.60 |
| T4 | 68.40 | 38.40 | 43.80% | 834.00 | 28.50 | 58.33% | 1843.77 |
| T8 | 131.00 | 74.00 | 45.05% | 870.24 | 63.80 | 51.30% | 1705.82 |
| T10 | 164.10 | 93.40 | 43.08% | 1126.20 | 79.43 | 51.60% | 1934.66 |
| Mean | - | - | 42.98% | 907.32 | - | 51.81% | 1869.21 |
| **Overall** | - | - | 49.10% | - | - | 61.30% | - |

optimization. Although the nonlinear method is more time-consuming, its clock-skew reduction is 12.2 percentage points larger than that of the linear method (61.3% versus 49.1%).
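The Impr% columns of Table VI are relative reductions with respect to the orig skew, $(\text{orig} - \text{new})/\text{orig}$. The sketch below recomputes the HTree1 entries and the overall gap as a consistency check on the printed numbers; it is not part of the optimization, and the helper name is hypothetical.

```python
def improvement_pct(orig_ps: float, new_ps: float) -> float:
    """Relative clock-skew reduction in percent."""
    return 100.0 * (orig_ps - new_ps) / orig_ps


# HTree1 rows of Table VI: (type, orig, lin, nonlin), skew in ps.
HTREE1 = [
    ("T2", 15.39, 9.54, 2.45),
    ("T4", 26.62, 8.07, 3.92),
    ("T8", 47.41, 11.38, 7.03),
    ("T10", 58.64, 14.80, 10.02),
]

for name, orig, lin, nonlin in HTREE1:
    print(f"{name}: lin {improvement_pct(orig, lin):.2f}%  "
          f"nonlin {improvement_pct(orig, nonlin):.2f}%")

lin_mean = sum(improvement_pct(o, ln) for _, o, ln, _ in HTREE1) / len(HTREE1)
print(f"HTree1 mean (lin): {lin_mean:.2f}%")

# Overall averages reported in Table VI.
print(f"nonlinear minus linear: {61.30 - 49.10:.1f} percentage points")
```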

## VIII. CONCLUSION

There is an emerging need for robust 3-D IC design with TSVs. The signal TSVs used for interlayer connections act as MOSCAPs whose capacitance varies nonlinearly with temperature because of the liner material. In addition, TSVs exert stress on nearby drivers, which modifies their mobility and delay. As a result, nonuniform temperature and stress introduce large delay variation. In this paper, a nonlinear electrical–thermal–mechanical delay model was developed. Dummy TSVs were inserted to balance the temperature and stress gradients of a 3-D clock tree and thereby reduce clock skew. A nonlinear programming problem was formulated to determine the optimum dummy TSV density, guided by the sensitivity of clock-skew reduction. A number of 3-D clock-tree benchmarks were used to verify the model and the optimization. The results showed that clock skew is reduced by 61.3% on average when the nonlinear electrical–thermal–mechanical delay model is applied.

#### REFERENCES

- [1] J. Cong and Y. Zhang, "Thermal-driven multilevel routing for 3D-ICs," in *Proc. IEEE/ACM Asia South Pacific Design Autom. Conf.*, Jan. 2005, pp. 121–126.
- [2] B. Goplen and S. Sapatnekar, "Thermal via placement for 3D ICs," in *Proc. IEEE/ACM Int. Symp. Phys. Design*, Apr. 2005, pp. 167–174.
- [3] Y. Xie, G. H. Loh, B. Black, and K. Bernstein, "Design space exploration for 3D architectures," *ACM J. Emerging Technol. Comput. Syst.*, vol. 2, no. 2, pp. 65–103, Apr. 2006.
- [4] J. Minz, X. Zhao, and S. K. Lim, "Buffered clock tree synthesis for 3D ICs under thermal variations," in *Proc. IEEE/ACM Asia South Pacific Design Autom. Conf.*, Mar. 2008, pp. 504–509.
- [5] H. Yu, Y. Shi, L. He, and T. Karnik, "Thermal via allocation for 3D ICs considering temporally and spatially variant thermal power," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 16, no. 12, pp. 1609–1619, Dec. 2008.
- [6] T. Bandyopadhyay, R. Chatterjee, D. Chung, M. Swaminathan, and R. Tummala, "Electrical modeling of through silicon and package vias," in *Proc. IEEE Int. Conf. Syst. Integr.*, Sep. 2009, pp. 1–8.
- [7] H. Yu, J. Ho, and L. He, "Allocating power ground vias in 3D ICs for simultaneous power and thermal integrity," *ACM Trans. Design Autom. Electron. Syst.*, vol. 14, pp. 41:1–41:31, Jun. 2009.
- [8] G. Katti, A. Mercha, M. Stucchi, Z. Tokei, D. Velenis, J. Van Olmen, C. Huyghebaert, A. Jourdain, M. Rakowski, I. Debusschere, P. Soussan, H. Oprins, W. Dehaene, K. De Meyer, Y. Travaly, E. Beyne, S. Biesemans, and B. Swinnen, "Temperature dependent electrical characteristics of through-Si-via (TSV) interconnections," in *Proc. IEEE Int. Interconnect Technol. Conf.*, Jun. 2010, pp. 1–3.
- [9] J. Long, J. C. Ku, S. O. Memik, and Y. Ismail, "SACTA: A self-adjusting clock tree architecture for adapting to thermal-induced delay variation," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 18, no. 9, pp. 1323–1336, Sep. 2010.
- [10] J.-S. Yang, K. Athikulwongse, Y.-J. Lee, S. K. Kim, and D. Z. Pan, "TSV stress aware timing analysis with applications to 3D-IC layout optimization," in *Proc. 47th ACM/IEEE Design Autom. Conf.*, Jun. 2010, pp. 803–806.
- [11] X. Zhao, J. Minz, and S. K. Lim, "Low-power and reliable clock network design for through-silicon-via (TSV) based 3D ICs," *IEEE Trans. Compon., Packag. Manuf. Technol.*, vol. 1, no. 2, pp. 247–259, Feb. 2011.
- [12] D. F. Lim, K. C. Leong, and C. S. Tan, "Selection of underfill material in Cu hybrid bonding and its effect on the transistor keep-out-zone," in *Proc. IEEE 3D Syst. Integr. Conf.*, Jan.–Feb. 2012, pp. 1–4.
- [13] K. Ghosh, J. Zhang, L. Zhang, Y. Dong, H. Y. Li, C. M. Tan, G. Xia, and C. S. Tan, "Strategy for TSV scaling with consideration on thermo-mechanical stress and acceptable delay," in *Proc. IEEE Microsyst., Packag., Assem. Circuits Technol. Conf.*, Oct. 2012, pp. 49–51.
- [14] X. Huang, C. Zhang, H. Yu, and W. Zhang, "A nanoelectromechanical-switch based thermal management for 3-D integrated many-core memory-processor system," *IEEE Trans. Nanotechnol.*, vol. 11, no. 3, pp. 588–600, May 2012.
- [15] X.-W. Shih and Y.-W. Chang, "Fast timing-model independent buffered clock-tree synthesis," in *Proc. 47th ACM/IEEE Design Autom. Conf.*, Jun. 2010, pp. 80–85.
- [16] D. Lee and I. Markov, "Contango: Integrated optimization of SoC clock networks," in *Proc. Design Autom. Test Eur. Conf. Exhib.*, Mar. 2010, pp. 1468–1473.
- [17] T.-Y. Kim and T. Kim, "Clock tree embedding for 3D ICs," in *Proc. IEEE/ACM Asia South Pacific Design Autom. Conf.*, Jan. 2010, pp. 486–491.
- [18] T. Mittal and C.-K. Koh, "Cross link insertion for improving tolerance to variations in clock network synthesis," in *Proc. IEEE/ACM Int. Symp. Phys. Design*, Mar. 2011, pp. 29–36.
- [19] T.-H. Chao, Y.-C. Hsu, J.-M. Ho, and A. B. Kahng, "Zero skew clock routing with minimum wirelength," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 39, no. 11, pp. 799–814, Nov. 1992.
- [20] R. S. Tsay, "An exact zero-skew clock routing algorithm," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 12, no. 2, pp. 242–249, Feb. 1993.
- [21] J. Cong, A. B. Kahng, C.-K. Koh, and C.-W. Albert Tsao, "Bounded-skew clock and Steiner routing," *ACM Trans. Design Autom. Electron. Syst.*, vol. 3, no. 3, pp. 341–388, Jul. 1998.
- [22] M. Cho, S. Ahmed, and D. Z. Pan, "TACO: Temperature aware clock-tree optimization," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design*, Nov. 2005, pp. 582–587.
- [23] H. Yu, Y. Hu, C. Liu, and L. He, "Minimal skew clock embedding considering time variant temperature gradient," in *Proc. Int. Symp. Phys. Design*, Mar. 2007, pp. 173–180.
- [24] H. Wang, H. Yu, and S. X.-D. Tan, "Fast timing analysis of clock networks considering environmental uncertainty," *Integr., VLSI J.*, vol. 45, no. 4, pp. 376–387, Sep. 2012.
- [25] Y. Shang, C. Zhang, H. Yu, C. S. Tan, X. Zhao, and S. K. Lim, "Thermal-reliable 3D clock-tree synthesis considering nonlinear electrical-thermal-coupled TSV model," in *Proc. ACM/IEEE Asia South Pacific Design Autom. Conf.*, Jan. 2013, pp. 693–698.
- [26] S. Basir-Kazeruni, H. Yu, F. Gong, Y. Hu, C. Liu, and L. He, "SPECO: Stochastic perturbation based clock tree optimization considering temperature uncertainty," *Integr. VLSI J.*, vol. 46, no. 1, pp. 22–32, Jan. 2013.
- [27] IBM clock tree benchmarks [Online]. Available: http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/BST/
- [28] D. Luenberger and Y. Ye, *Linear and Nonlinear Programming*, vol. 116. Berlin, Germany: Springer, 2008.
- [29] COMSOL multiphysics simulation tool [Online]. Available: http://www.comsol.com/products/heat-transfer/
- [30] SPEC 2000 CPU benchmark suite [Online]. Available: http://www.spec.org/cpu/
- [31] GEM5, multicore system simulator [Online]. Available: http://gem5.org/Main_Page
- [32] 3D-ACME, 3D-IC steady state temperature simulator [Online]. Available: http://www.3dacme.allalla.com/
- [33] NVM SPICE [Online]. Available: http://www.nvmspice.org
- [34] Hotspot [Online]. Available: http://lava.cs.virginia.edu/hotspot/



Sai Manoj P. D. (S'13) received the B.Tech. degree from Jawaharlal Nehru Technological University, Anantapur, India, in 2010, and the M.Tech. degree from the International Institute of Information Technology Bangalore, Bengaluru, India, in 2012. He is currently pursuing the Ph.D. degree with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

In 2012, he joined Nanyang Technological University. His current research interests include 3-D ICs, networks-on-chip, and low-power system design.

Mr. Manoj P. D. is the recipient of the A. Richard Newton Young Research Fellow Award at the Design Automation Conference 2013.



**Hao Yu** (M'06) received the B.S. degree from Fudan University, Shanghai, China, and the Ph.D. degree from the Department of Electrical Engineering, University of California at Los Angeles, Los Angeles, CA, USA.

He was a Senior Research Staff at Berkeley Design Automation. Since October 2009, he has been an Assistant Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He has 94 peer-reviewed IEEE/ACM publications. His current research interests include 3-D IC and RF-IC at nano-tera scale.

Dr. Yu received the Best Paper Award from the ACM Transactions on Design Automation of Electronic Systems in 2010, Best Paper Award nominations at the Design Automation Conference in 2006, the IEEE/ACM International Conference on Computer-Aided Design in 2006, and the 17th Asia and South Pacific Design Automation Conference in 2012, the Best Student Paper (advisor) Finalist at Silicon Monolithic Integrated Circuits in RF Systems in 2013, and the Inventor Award from the Semiconductor Research Corporation in 2008. He is an Associate Editor and technical program committee member for a number of journals and conferences.



Yang Shang (S'11) received the B.S. and M.S. degrees in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2005 and 2009, respectively, where he is currently pursuing the Ph.D. degree with the School of Electrical and Electronic Engineering.

His current research interests include nonvolatile memory device model and simulation and metamaterial-based 60-GHz beyond phase-arrayed receiver design.



**Chuan Seng Tan** (S'00–M'07) received the B.Eng. degree in electrical engineering from the University of Malaya, Kuala Lumpur, Malaysia, in 1999, the M.Eng. degree in advanced materials from the National University of Singapore, Singapore, under the Singapore–Massachusetts Institute of Technology (MIT) Alliance Program, in 2001, and the Ph.D. degree in electrical engineering from MIT, Cambridge, MA, USA, in 2006.

In 2006, he joined Nanyang Technological University, Singapore, as a Lee Kuan Yew Post-Doctoral Fellow, where since July 2008, he has been a holder of the inaugural Nanyang Assistant Professorship. He is currently involved in research on process technology of 3-D ICs. His current research interests include semiconductor process technology and device physics.

Dr. Tan was the recipient of the Applied Materials Graduate Fellowship during 2003–2005.



**Sung Kyu Lim** (S'94–M'00–SM'05) received the B.S., M.S., and Ph.D. degrees from the Department of Computer Science, University of California at Los Angeles, Los Angeles, CA, USA, in 1994, 1997, and 2000, respectively.

In 2001, he joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, where he is currently an Associate Professor. His current research interests include the architecture, circuit, and physical design for 3-D ICs and 3-D system in packages.

Dr. Lim received the Design Automation Conference Graduate Scholarship in 2003 and the National Science Foundation Faculty Early Career Development (CAREER) Award in 2006. He was on the Advisory Board of the ACM Special Interest Group on Design Automation (SIGDA) from 2003 to 2008 and received the ACM SIGDA Distinguished Service Award in 2008.