# Q-Learning-Based Voltage-Swing Tuning and Compensation for 2.5-D Memory-Logic Integration

**Dongjun Xu and Ningmei Yu**, Xi'an University of Technology

**Hantao Huang**, Nanyang Technological University

**Sai Manoj P. D.**, Technische Universität Wien, Institut für Computertechnik

**Hao Yu**, Southern University of Science and Technology

### General Interest

#### Editor's note:

In this paper, an efficient I/O management with Q-learning-based transmitter swing adjustment and receiver compensation is developed for energy-efficient 2.5-D memory-logic integration. The proposed approach is able to achieve significant power reduction over other state-of-the-art methods. *—Xin Li, Carnegie Mellon University*

**DATA-ORIENTED COMPUTING SYSTEMS** involve heavy chip-to-chip communication and require high bandwidth; therefore, there is an emerging need for high-data-rate, low-power input/output (I/O) communication circuits [1], [2]. 3-D integration by through-silicon vias (TSVs) [1] can significantly improve memory-logic I/O communication bandwidth and overcome the limitations of conventional 2-D wire-line communication, such as large trace latency and poor signal-to-noise ratio. However, 3-D integration by TSVs has poor heat-removal capability, and its high power density leads to a temperature rise [3]. A recent development, 2.5-D integration by through-silicon interposer (TSI) [4], is one promising solution that integrates multiple dies on one common substrate. It has shown good thermal dissipation as well as high bandwidth and low power when realized as a transmission line (T-line) underneath the substrate.

Digital Object Identifier 10.1109/MDAT.2017.2764075. Date of publication: 17 October 2017; date of current version: 23 March 2018.

However, a large and constant output-voltage swing consumes high I/O communication power. To balance power reduction against the required bit error rate (BER), various techniques have been developed to compensate for signal loss in high-speed serial I/O links in traditional 2-D wire-line communication, including transmit pre-emphasis [5] and receiver (Rx) equalization [6]. However, each of these techniques consumes additional power, which can dominate the overall circuit current. For 2.5-D memory-logic integration, a Q-learning-based I/O management scheme has been applied to adjust the level of output-voltage swing at the transmitter (Tx) of 2.5-D TSI I/Os [7], achieving reduced power under a specified BER requirement. However, too few states degrade management quality, whereas too many states increase the control overhead of state transitions and slow Q-learning convergence [8].

In this paper, a Q-learning-based I/O management scheme adjusts the level of output-voltage swing at the Tx of 2.5-D TSI I/Os to reduce power under a specified BER requirement. To reduce state transitions and improve the quality of power management, a simple adaptive compensation circuit is employed to compensate for transmission loss at the Rx. A corresponding 2.5-D TSI I/O with multilevel output-voltage swing that balances power and BER is designed in a 65-nm CMOS process. The proposed algorithm is implemented in Matlab. Simulation results show that the adaptive 2.5-D TSI I/O circuit reduces communication power and achieves a 14% energy efficiency improvement compared with a traditional I/O with a constant output-voltage swing.

The remainder of this paper is organized as follows. In "2.5-D TSI I/O communication," we describe the memory-logic integration architecture based on 2.5-D integration with adaptive I/O management and formulate the corresponding problem. In "2.5-D TSI I/O circuit design," the circuit blocks of the 2.5-D TSI I/Os are presented. In "Q-learning-based adaptive tuning with compensation," the adaptive I/O management by Q-learning and the compensation mechanism are presented. Results are shown in "Experimental results," and the final section concludes the paper.

Figure 1. (a) PCB-based traditional 2-D interconnection. (b) TSI T-line-based 2.5-D memory-logic integration.

# 2.5-D TSI I/O communication

#### Memory-logic integration by 2.5-D TSI I/O

The printed circuit board (PCB)-based 2-D interconnection shown in Figure 1a has been widely used for communication between chips. However, PCB-based electrical interconnection has not kept up with increasing I/O bandwidth and performance demands. Bandwidth density is mainly restricted by the inherent limits of the minimum flip-chip bump diameter (~100 $\mu$m) and channel pitch (~500 $\mu$m) achievable on the PCB. Meanwhile, long traces and nonideal vias cause significant channel loss that grows with channel frequency; compensating for it requires extensive equalization circuitry, more drive power, and more layout area, all of which increase power dissipation and area [9]. The need to increase data transfer speed while preserving signal integrity between chips and keeping power consumption low has moved ICs toward 2.5-D and 3-D technologies. Figure 1b shows the 2.5-D TSI-based interconnection for memory-logic integration, which enables high-bandwidth and low-energy communication between chips. Relative to traditional backplane-based interconnects, 2.5-D TSIs achieve higher channel density with microbumps (~10 $\mu$m) and less routing overhead with shorter traces (a few mm). In addition, for a comparable interconnect at the same frequency, a 2.5-D TSI has lower channel loss and power dissipation than a traditional 2-D interconnection. Therefore, 2.5-D TSI T-line-based integration is selected for memory-logic integration.

#### Self-adaptive I/O management

The basic 2.5-D TSI I/O consists of a Tx and an Rx to enable full-duplex communication. To further reduce the I/O communication power between core logic and memory blocks, we propose a self-adaptive design that tunes the output-voltage swing and compensates the received signal based on the output of the I/O controller blocks, as shown in Figure 2. By adjusting the I/O output-voltage swing, I/O communication power can be reduced, improving energy efficiency compared with previous designs [5] that use a fixed full output-voltage swing. However, the BER increases when the output-voltage swing decreases. Hence, a tradeoff must be maintained between I/O communication power and BER, which requires optimized online management.

As mentioned previously, a Q-learning-based I/O management has been applied to balance power dissipation and communication BER [7]. To reduce state transitions of the output-voltage swing and improve the quality of Q-learning-based I/O power management, a signal compensation decision mechanism is deployed, as shown in the I/O controller block of Figure 2. Instead of increasing the output-voltage swing at the Tx, the compensation circuit can be enabled to strengthen the input signals of the Rx. Two additional states are set to the same output-voltage levels as the two lowest states of the basic Q-learning management; in these states the compensation circuit can be activated to achieve the same communication power with a smaller BER. The compensation mechanism is assigned a higher priority than output-voltage swing tuning. Sample training is done offline to form a state-action lookup table (LUT), and adaptive control is performed online. In this way, one can decrease the BER at the same low output-voltage swing and refine the power management states. However, compensating the input signal slightly increases the sampler power at the Rx, and too many added power states may also affect the management process, so an appropriate compensation circuit and a corresponding control flow are introduced in the design. The Tx, Rx, tuning, and compensating circuits are described in "2.5-D TSI I/O circuit design," and the control flow for adaptive tuning and compensation is presented in "Q-learning-based adaptive tuning with compensation."

BER increases as I/O communication power decreases. Hence, one needs to find an optimal output-voltage swing and compensation setting that balance I/O communication power and BER simultaneously, which can be defined as the following problem.

Problem: Tune the output-voltage swing and control the compensation circuit to achieve low power, at the cost of a higher BER, based on the I/O communication channel characteristics:

$$\text{Opt.}: \langle P_i, BER_i \rangle \quad \text{s.t.} \quad \text{(i) } P_i \le P_T, \quad \text{(ii) } BER_i \le BER_T.$$
(1)



Figure 2. Adaptive I/O design with output-voltage swing tuning and received-signal compensation.

where $P_i$ and $BER_i$ denote the I/O communication power and BER under the *i*th output-voltage swing level $V_i$. Note that both BER and power are functions of the output-voltage swing. $P_T$ and $BER_T$ represent the targeted I/O communication power and BER of one TSI I/O channel under normal operation. As the output-voltage swing increases, I/O communication power increases and BER decreases, and vice versa. On the other hand, the compensation circuit can be enabled to reduce the BER at the same output-voltage swing. As such, the output-voltage swing level $V_i$ and the compensation mechanism need to be adaptively controlled to optimize I/O communication power and BER simultaneously.
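
Read literally, (1) asks for a swing level whose power and BER both meet their targets. The sketch below makes that concrete by scanning the four main-state entries reported in the experiment setup and returning the lowest-power feasible level. The function name `select_swing` and the example targets are hypothetical, and the exhaustive scan only illustrates the constraints; it is not the online Q-learning procedure used in this paper.

```python
from typing import NamedTuple, Optional


class SwingLevel(NamedTuple):
    swing_mv: float  # output-voltage swing V_i (mV)
    power_mw: float  # I/O communication power P_i (mW)
    ber: float       # bit error rate BER_i


# Main-state values reported in the experiment setup.
LEVELS = [
    SwingLevel(100, 6.27e-2, 7.03e-2),
    SwingLevel(150, 1.41e-1, 1.35e-2),
    SwingLevel(250, 3.92e-1, 1.14e-4),
    SwingLevel(300, 5.64e-1, 4.93e-6),
]


def select_swing(levels, p_target_mw: float, ber_target: float) -> Optional[SwingLevel]:
    """Return the lowest-power level with P_i <= P_T and BER_i <= BER_T, or None."""
    feasible = [lv for lv in levels
                if lv.power_mw <= p_target_mw and lv.ber <= ber_target]
    return min(feasible, key=lambda lv: lv.power_mw, default=None)


choice = select_swing(LEVELS, p_target_mw=1.0, ber_target=1e-3)
```

With $P_T$ = 1 mW and $BER_T$ = 10<sup>-3</sup>, the 250-mV level is returned because it is the cheapest level meeting both constraints; tightening $BER_T$ to 10<sup>-5</sup> leaves only the 300-mV level.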

# 2.5-D TSI I/O circuit design

#### Adaptive transmitter and receiver

The Tx employs an 8:1 serializer to convert 8-bit parallel data into serial data, as shown in Figure 3a(i). A current-mode logic (CML) output driver drives the TSI T-line from the Tx to the Rx on the common substrate. The CML output stage is powered by a fixed 1.2-V supply. The I/O communication power *Pw* depends on the output-voltage swing and the tail current of the driver. For example, the control bits ($I_1I_2I_3$) switch the tail current of the CML driver to alter the output-voltage swing $V_t$.

As mentioned above, the signal loss in the TSI T-line channel is small, so no complex equalizer circuit is needed at the Rx. However, the BER increases when signals are transmitted with a low voltage swing at the Tx. To improve the performance of power management, a simple configurable compensation circuit is deployed to strengthen the received signal, as shown in Figure 3a(ii). The current sources $I_0$ are connected when the compensation mechanism is activated, which increases the output impedance and the comparison precision of the comparator. As a result, the signals (*OP* and *ON*) are enhanced and converted from current mode into digital levels; this conversion also isolates the additional power dissipation of the compensation. The data are then processed in the digital domain, which saves power compared with an analog demultiplexer-based implementation. A delay-locked-loop-based clock and data recovery circuit at the Rx deskews the sampling clocks.

#### Tuning and compensating circuit

Based on the BER calculated from the error-correcting code (ECC), feedback signals are sent to the I/O controller at the Rx. The I/O controller then generates the corresponding control bits. Part of the control bits is sent to the Tx to set the current of the digital-to-analog converter (DAC) at the tail of the CML buffer driving the TSI T-line. Thus, the output-voltage swing is tuned by varying the tail current of the CML buffer. As shown in Figure 3a(i), the output swing of the CML driver is set by the DAC tail current and the load resistor. The DAC tail current source consists of a group of parallel current sources with switches controlled by the control bits from the I/O controller, so varying the tail current changes the output-voltage swing. The load resistor is typically set to $50 \Omega$ to match the TSI T-line impedance. In this paper, the tail current is varied from 2 to 5 mA.
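
A first-order sketch of this tuning path follows. It assumes the three control bits $I_1I_2I_3$ each switch in a 1-mA unit source on top of a 2-mA base, a hypothetical thermometer encoding chosen only so that the tail current spans the stated 2 to 5 mA range, and it uses the ideal CML relation swing = tail current × load resistance with the 50-Ω load. The constant and function names are hypothetical.

```python
BASE_CURRENT_MA = 2.0  # hypothetical always-on share of the DAC tail current
UNIT_CURRENT_MA = 1.0  # hypothetical current added by each enabled switch
LOAD_OHMS = 50.0       # load resistor matched to the TSI T-line


def tail_current_ma(bits):
    """Tail current for control bits (I1, I2, I3), thermometer-coded (assumed)."""
    if len(bits) != 3 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected three control bits of 0 or 1")
    return BASE_CURRENT_MA + UNIT_CURRENT_MA * sum(bits)


def ideal_swing_mv(bits):
    """First-order CML swing V = I_t * R_L; mA times ohms gives mV."""
    return tail_current_ma(bits) * LOAD_OHMS


for code in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(code, tail_current_ma(code), "mA ->", ideal_swing_mv(code), "mV")
```

Under these assumptions the four codes span 2 to 5 mA. The printed swings are idealized estimates that ignore driver and T-line nonidealities; they are not the measured swing levels reported in the experiments.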

At the Rx, the remaining control bits enable the signal compensation circuit. The detected BER is compared with the BER threshold of the current I/O management state guided by Q-learning, and the compensation enable signal is activated when the BER exceeds the threshold. As shown in Figure 3a(ii), variable current sources with switches are controlled by the compensation enable signal. The input differential signal pair is enhanced by about 1.3× when the compensation mechanism is activated. The LUT data are formed by offline sample training and implemented in hardware with multiple AND/OR partial-matching logic circuits instead of a read-only memory (ROM). This LUT-based implementation has higher speed and lower power consumption than a ROM.
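
The enable decision reduces to a threshold comparison. The sketch below assumes the BER threshold is supplied per management state and models the roughly 1.3× enhancement as a constant gain on the differential input amplitude, which is an idealization of the circuit; the function names and example values are hypothetical.

```python
COMP_GAIN = 1.3  # approximate input enhancement when compensation is on


def compensation_enable(measured_ber, ber_threshold):
    """Enable compensation when the ECC-derived BER exceeds the state's threshold."""
    return measured_ber > ber_threshold


def comparator_input_mv(diff_input_mv, enabled):
    """Differential amplitude seen by the comparator, idealized as a constant gain."""
    return diff_input_mv * COMP_GAIN if enabled else diff_input_mv


enabled = compensation_enable(measured_ber=2.0e-3, ber_threshold=1.0e-3)
print(enabled, comparator_input_mv(50.0, enabled))
```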

# Q-learning-based adaptive tuning with compensation

In this section, we first present the basics of Q-learning theory, followed by the control flow for adaptive tuning. Then, the system power and BER models are discussed.

#### Q-learning theory

Q-learning [10] is generally used to find an optimal action-selection policy over a set of states *S*. To solve (1) using Q-learning, we take the I/O communication power $P_i$ and BER $BER_i$, which correspond to output-voltage swing level $V_i$, as the state vector *S*, and the change of output-voltage swing level $V_i$ as the action $a_n$. The state vector *S* is given as

$$S = \langle P_i, BER_i \rangle.$$

To obtain the state-action pairs and form an LUT, the input samples (voltage levels) are used for training, and the corresponding communication power and BER are recorded as outputs. A sample LUT is as follows:

| Action (voltage swing) | State: power | State: BER |
|------------------------|--------------|------------|
| $a_n(V_1)$             | $P_1$        | $BER_1$    |
| ⋮                      | ⋮            | ⋮          |
The input samples are collected at regular time intervals, called control cycles, on the order of nanoseconds. A control cycle is defined as the minimum time required for a state transition. The next state variable needs to be predicted with an action for the next input sample. This is done by calculating a reward function, based on the state vectors, to obtain an optimal estimated value:

$$R = f(P, BER).$$
(2)

Here, the reward R is a function of the state-vector components, communication power *P* and *BER*. The relation between the state variables and the reward value is presented later.

The next state and the current state can be the same, depending on workload characteristics. The reward R forms part of the expected Q-value, which decides the direction of state transition. The optimal estimate is chosen among the set of states that satisfy the required criteria by taking the corresponding action selected from the formed LUT. The expected Q-value is calculated as

$$\hat{\mathbf{Q}}(s_i, a_n) = Q(s_i, a_n)(1 - \alpha) + \alpha(R + \gamma E).$$
(3)

Here, $\alpha$ and $\gamma$ denote the learning rate and discount factor, respectively. The optimal estimate $\hat{E}$ of state $s_i$ is calculated as follows:

$$\hat{E} = \min\{\hat{\mathbf{Q}}(s_i, a_n)\}, n = 1, \dots, M.$$
(4)

Here, $\hat{\mathbf{Q}}(s_i, a_n)$ represents the expected Q-value after taking action $a_n$, and *M* denotes the number of possible actions available at state $s_i$. The optimal estimate can be a *min* or a *max* depending on the reward function: *min* when the reward is formulated as a cost, *max* when it is a gain.
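
Equations (3) and (4) map directly onto two small functions. This sketch assumes a tabular Q stored in a dictionary keyed by (state, action), treats pairs not yet visited as 0.0, and uses the min form of (4); the state and action labels and the example numbers are placeholders, while α = 0.5 and γ = 0.9 are the values used in the experiments.

```python
ALPHA = 0.5  # learning rate used in the experiments
GAMMA = 0.9  # discount factor used in the experiments


def q_update(q, state, action, reward, e, alpha=ALPHA, gamma=GAMMA):
    """Eq. (3): Q(s,a) <- Q(s,a)(1 - alpha) + alpha (R + gamma E)."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old * (1 - alpha) + alpha * (reward + gamma * e)
    return q[(state, action)]


def optimal_estimate(q, state, actions):
    """Eq. (4): E_hat = min over the actions available at `state`.

    Returns (E_hat, argmin action); unvisited pairs count as 0.0.
    """
    best = min(actions, key=lambda a: q.get((state, a), 0.0))
    return q.get((state, best), 0.0), best


q = {("s2", "a1"): 0.4, ("s2", "a2"): 0.8}
new_val = q_update(q, "s2", "a1", reward=0.2, e=0.1)
e_hat, action = optimal_estimate(q, "s2", ["a1", "a2"])
```

Here the update gives 0.4 × 0.5 + 0.5 × (0.2 + 0.9 × 0.1) = 0.345, which is lower than the 0.8 stored for `a2`, so `a1` is the min-selected action.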

#### Adaptive control flow with compensation

Figure 3b shows the state diagram of the adaptive control flow with compensation, depicting the change of states under the basic Q-learning algorithm and the compensation method. The I/O communication power and BER are the components of state $s_i$, and the change of output-voltage swing level is the action $a_n$. Power increases and BER decreases from main state $s_1$ to $s_4$. With compensation, two additional states $s_{12}$ ($s_{23}$) are set to the same output-voltage level as $s_1$ ($s_2$), so that the compensation mechanism yields the same communication power but a smaller BER [see Figure 3b(ii)]. Action $a_1$ ($a_2$) changes $s_i$ to $s_{i+1}$ ($s_{i-1}$), $a_3$ ($a_4$) changes $s_i$ to $s_{i+2}$ ($s_{i-2}$), and $a_5$ holds the state. For example, when the system is in state $s_1$ and action $a_1$ is selected by the Q-learning algorithm, the compensation circuit is activated and action $a_{11}$ is enabled; the system then changes to the additional state $s_{12}$ and further switches into $s_2$ when action $a_{12}$ is selected. The transition from $s_2$ through $s_{23}$ and the other state transitions happen similarly.
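
A compact encoding of the Figure 3b transitions, as a sketch. It follows the action definitions in the text ($a_1$/$a_2$ move one main state up/down, $a_3$/$a_4$ move two, $a_5$ holds) and the worked path $s_1 \to s_{12} \to s_2$. Several details are assumptions of this sketch rather than statements from the paper: moves past $s_1$ or $s_4$ are clamped, $s_{23}$ exits upward to $s_3$ by analogy with $s_{12}$, $a_2$ from a compensated state drops back to the main state at the same swing, and other actions hold the compensated state.

```python
MAIN_STATES = ["s1", "s2", "s3", "s4"]  # power rises, BER falls from s1 to s4
STEP = {"a1": +1, "a2": -1, "a3": +2, "a4": -2, "a5": 0}
COMP_ENTRY = {"s1": "s12", "s2": "s23"}      # compensated state at the same swing
COMP_EXIT_UP = {"s12": "s2", "s23": "s3"}    # a12: leave toward the next main state
COMP_EXIT_DOWN = {"s12": "s1", "s23": "s2"}  # assumed: a2 drops compensation


def next_state(state, action, compensation=False):
    """Return the state reached from `state` under `action`."""
    if state in COMP_EXIT_UP:
        if action == "a12":
            return COMP_EXIT_UP[state]
        if action == "a2":
            return COMP_EXIT_DOWN[state]
        return state  # assumed: other actions hold the compensated state
    if compensation and action == "a1" and state in COMP_ENTRY:
        return COMP_ENTRY[state]  # a1 triggers a11: enable compensation, keep swing
    i = MAIN_STATES.index(state) + STEP[action]
    return MAIN_STATES[max(0, min(len(MAIN_STATES) - 1, i))]  # clamp at the ends
```

For instance, `next_state("s1", "a1", compensation=True)` gives `"s12"`, and `next_state("s12", "a12")` then gives `"s2"`, matching the worked example.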

With compensation, the adaptive output-voltage swing tuning by Q-learning is presented in Algorithm 1. The LUT is formed with I/O communication power and BER as the state vectors and the output-voltage swing as the action. The tail current $I_t$ at the current control cycle is set by the analog design and can be obtained from measurement.

**Algorithm 1.** With compensation, Q-learning-based adaptive tuning of output-voltage swing

**Input:** Communication power trace $P_i$, BER feedback from the receiver, and look-up table (LUT)

**Output:** Adaptive tuning of output-voltage swing $V_i$

- 1: Predict tail current: $I_t(k+1) = \sum_{i=0}^{N-1} w_i I_t(k-i) + \xi$
- 2: Calculate the corresponding communication power and BER
- 3: Reward: $R_w(s_i, a_n, s_{i+1}) = b_1 \Delta(P_i) + b_2 \Delta(BER_i)$
- 4: $\hat{\mathbf{Q}}(s_i, a_n) \leftarrow Q(s_i, a_n)(1 - \alpha) + \alpha(R_w + \gamma E)$
- 5: Optimal value estimate: $\hat{E} = \min\{\hat{\mathbf{Q}}(s_i, a_n)\}, n = 1, \dots, M$
- 6: **if** compensation activated **then**
  - 7: Change to additional states, or change from additional states to main states
- 8: **else**
  - 9: Compute corresponding control bits
- 10: **end if**
- 11: Tune the corresponding $V_i$ by adjusting the tail current using the control bits


The tail current for the next control cycle can be predicted by autoregression (AR), as given in (5) and in Line 1 of Algorithm 1:

$$I_t(k+1) = \sum_{i=0}^{N-1} w_i I_t(k-i) + \xi.$$
 (5)

Here, $I_t(k+1)$ denotes the predicted tail current at the $(k+1)$th control cycle, $w_i$ represents the AR coefficients, $\xi$ is the prediction error, and $N$ represents the order of the AR prediction. During training, the AR coefficients can be determined by the ordinary least squares method. The I/O communication power for the next control cycle can then be calculated from the predicted tail current. Furthermore, the reward $R_w$ is determined from the present I/O communication power and BER values, as given in Line 3 of Algorithm 1. Since there are two factors, we use a weighted sum of I/O communication power and BER: $b_1$ and $b_2$ denote the weighting coefficients for the normalized rewards of the communication power $\Delta(P_i)$ and BER $\Delta(BER_i)$.
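
Lines 1 and 3 of Algorithm 1 can be sketched as below, assuming NumPy is available. The AR coefficients $w_i$ are fitted by ordinary least squares on a training trace, as described above, with $\xi$ treated as the residual. The reward is the weighted sum $b_1\Delta(P_i) + b_2\Delta(BER_i)$; the normalization (relative change from the current value), the cost sign convention matching the min in (4), the default weights, and the names `fit_ar`, `predict_next`, and `reward` are all hypothetical.

```python
import numpy as np


def fit_ar(trace, order):
    """OLS fit of I_t(k+1) = sum_{i=0}^{N-1} w_i I_t(k-i)."""
    x = np.asarray(trace, dtype=float)
    if len(x) <= order:
        raise ValueError("trace must be longer than the AR order")
    # Row for target x[k+1] holds [x[k], x[k-1], ..., x[k-N+1]].
    rows = [x[k - order + 1:k + 1][::-1] for k in range(order - 1, len(x) - 1)]
    w, *_ = np.linalg.lstsq(np.vstack(rows), x[order:], rcond=None)
    return w


def predict_next(trace, w):
    """One-step AR prediction from the N most recent samples (newest first)."""
    recent = np.asarray(trace[-len(w):], dtype=float)[::-1]
    return float(recent @ w)


def reward(p_now, p_next, ber_now, ber_next, b1=0.5, b2=0.5):
    """Weighted sum of normalized power and BER changes (cost: lower is better)."""
    d_p = (p_next - p_now) / p_now
    d_ber = (ber_next - ber_now) / ber_now
    return b1 * d_p + b2 * d_ber
```

On a noiseless geometric trace $I_t(k) = 2 \cdot 0.9^k$, an order-1 fit recovers $w_0 = 0.9$; the paper uses order 8 on measured traces.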

After the reward is calculated, the expected Q-value is computed using (3), and the optimal action is selected based on the Q-values using (4). If compensation is activated, the system changes to an additional state or from an additional state back to a main state. Otherwise, the corresponding control bits are computed to tune the output-voltage swing.

#### State vector models

We consider the I/O communication power $P_i$ and BER $BER_i$ as the state vector for the Q-learning algorithm. The state power model includes the I/O communication power of the driver and the TSI T-line, both of which are functions of the output-voltage swing $V_i$. For the CML-based driver with TSI T-line [11], the I/O communication power is given by

$$P_i = V_i \left( I_t + \frac{\eta \, V_{dd} \, \tau}{R_D + Z_{\text{diff}}} \, f \right).$$
(6)

Here, $I_t$ is the driver tail current; $\tau$ is the duration of a signal pulse; $\eta$ is the activity factor; $V_{dd}$ is the supply voltage; $f$ is the signaling frequency; $R_D$ is the resistance of the driver; and $Z_{\text{diff}}$ is the characteristic impedance of the TSI T-line.
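
Equation (6) can be evaluated directly, as in the sketch below. All parameter values in the example call are hypothetical placeholders chosen only to exercise the formula, so the result illustrates the units and structure of (6), not the paper's measured power.

```python
def io_power_w(v_swing, i_tail, eta, vdd, tau, r_d, z_diff, freq):
    """Eq. (6): P_i = V_i * (I_t + eta * Vdd * tau / (R_D + Z_diff) * f).

    SI units: volts, amperes, seconds, ohms, hertz -> watts.
    """
    dynamic_current = eta * vdd * tau / (r_d + z_diff) * freq
    return v_swing * (i_tail + dynamic_current)


# Hypothetical parameter values, for illustration only.
p = io_power_w(v_swing=0.15, i_tail=3e-3, eta=0.5, vdd=1.2,
               tau=250e-12, r_d=50.0, z_diff=100.0, freq=2e9)
```

With these placeholders the bracketed term is 3 mA + 2 mA, so $P_i$ = 0.15 V × 5 mA = 0.75 mW.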

The second component of the state vector is the BER of the I/O communication. With compensation taken into account, the BER depends on the output-voltage swing, the signal enhancement at the Rx front end, external noise, channel noise, etc. In a wire-line communication system [12], the BER can be estimated as a function of the output-voltage swing and the enhanced voltage as

$$BER_i = \frac{1}{2} erfc(\frac{V_i + \Delta(V_e)}{\sqrt{2} \sigma_v}).$$
(7)

Here, *erfc* is the complementary error function; $V_i$ refers to the *i*th output-voltage swing level; $\Delta(V_e)$ is the voltage enhancement by compensation at the Rx; and $\sigma_v$ is the standard deviation of the noise. In operation, the BER is obtained from the ECC, and during the learning process $\sigma_v$ is estimated by solving (7) for the measured BER.
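
Equation (7) and its inversion for $\sigma_v$ can be sketched with the Python standard library. `math.erfc` is the complementary error function; since the standard library has no inverse erfc, the sketch solves $\tfrac{1}{2}\operatorname{erfc}(x) = BER$ for $x$ by bisection and then sets $\sigma_v = (V_i + \Delta V_e)/(\sqrt{2}\,x)$. Bisection is a choice of this sketch, not a method stated in the paper, and the function names are hypothetical.

```python
import math


def ber_model(v_swing, delta_ve, sigma_v):
    """Eq. (7): BER = 0.5 * erfc((V_i + dV_e) / (sqrt(2) * sigma_v))."""
    return 0.5 * math.erfc((v_swing + delta_ve) / (math.sqrt(2) * sigma_v))


def estimate_sigma(ber, v_swing, delta_ve=0.0, iters=200):
    """Invert Eq. (7) for sigma_v given a measured BER in (0, 0.5)."""
    if not 0.0 < ber < 0.5:
        raise ValueError("BER must lie in (0, 0.5)")
    lo, hi = 0.0, 40.0  # 0.5*erfc(x) is 0.5 at x = 0 and underflows to 0 by x = 40
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid) > ber:
            lo = mid  # BER still too high: the erfc argument must grow
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return (v_swing + delta_ve) / (math.sqrt(2) * x)
```

Because $\tfrac{1}{2}\operatorname{erfc}$ is strictly decreasing, the bisection always brackets a unique root; the same `ber_model` also shows that a positive $\Delta V_e$ lowers the BER at a fixed swing.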

# Experimental results

#### Experiment setup

The 2.5-D adaptive TSI I/O circuit and control mechanism are verified in Cadence Virtuoso and Matlab. An eight-core MIPS microprocessor is integrated with an eight-bank SRAM memory by the TSI T-line, and the whole system is designed in GF 65-nm CMOS. The 2.5-D TSI T-line is 3 mm long and 10 $\mu$m wide and is driven by the CML buffer. The power traces are measured in Cadence Virtuoso, and the control cycle is set to 1 ns, longer than the switching time of the I/O controller. The I/O management controller uses the Q-learning output and the compensation mechanism to balance I/O communication power and BER. The LUT is formed with I/O communication power and BER and is used for main-state transitions. The four main voltage-swing levels, with their communication power and BER values in the LUT, are set as follows: (100 mV, 6.27E-2 mW, 7.03E-2 BER), (150 mV, 1.41E-1 mW, 1.35E-2 BER), (250 mV, 3.92E-1 mW, 1.14E-4 BER), and (300 mV, 5.64E-1 mW, 4.93E-6 BER). With compensation, the two additional states and their corresponding values are set as follows: (100 mV, 6.27E-2 mW, 3.81E-2 BER) and (150 mV, 1.41E-1 mW, 2.2E-3 BER). The learning rate $\alpha$ and discount factor $\gamma$ are set to 0.5 and 0.9, respectively. An AR model of order 8 is used for load current (or I/O communication power) prediction; the average error between predicted and actual values is less than 0.3%.

The adaptive voltage-swing tuning algorithm is run in Matlab, with sample training carried out offline. The I/O provides a minimum peak-to-peak signal swing of 76 mV at 4 Gb/s. The adaptive self-tuning of the output-voltage swing comes with a small area overhead of 0.04 mm<sup>2</sup> for the additional control circuits and a latency of 100–200 ps. Other system details are given in Table 1.

#### Eye diagrams under tuning and compensation

Figure 4a shows the eye diagrams under different output-voltage swings, which produce different eye openings under channel noise. A larger eye opening is associated with a higher output-voltage swing (or current-driving ability): the minimum effective opening is 76 mV in amplitude with a 77-ps timing margin at a 100-mV output-voltage swing [see Figure 4a(i)], and the opening increases with the output-voltage swing. Figures 4b and 4c show the eye diagrams under the compensation mechanism. In Figure 4c(i), the effective opening is (110 mV, 86 ps) at a 150-mV output-voltage swing, but it declines to (50 mV, 70 ps) [see Figure 4c(ii)] at the input of the Rx sampler; enabling the compensation circuit enhances the input signal to (82 mV, 75 ps) [see Figure 4c(iii)] while holding the same output-voltage swing. The compensation mechanism can thus attain a lower BER at the same communication power.

#### Adaptive tuning control with compensation mechanism

As the driver current increases, BER decreases, but at the cost of power. Because the compensation circuit enhances the input signal and thereby decreases the BER at the Rx, one can exploit the tradeoff between power reduction and the required BER. We further discuss an example with the compensation mechanism activated. As shown in Figure 4d, the current state is $s_2$ with four available actions, and the reward ($r_i$) for the next control cycle can be predicted for each corresponding action. For instance, if the predicted output-voltage swing is 160 mV, the power is 0.17 mW, and the BER is 1E-2, then $s_3$ should be selected under the Q-learning policy. When the compensation mechanism is activated, the additional states become available, and the predicted BER is recalculated using (7). State $s_{23}$ is then selected, and the driver tail current is maintained to keep the output swing at 150 mV, which leads to a much smaller communication power. Since the output voltage does not change and the BER value does not belong to the Q-learning state pairs, the LUT is not updated.

#### Performance comparison with benchmarks

Various SPEC benchmarks are used to verify the communication power saving, as shown in Figure 5.

Table 1. System settings for memory-logic integration with TSI I/O

| Item           | Description          | Value                       | Size        |
|----------------|----------------------|-----------------------------|-------------|
| Microprocessor | Technology node      | 65 nm                       | 0.3 mm$^2$  |
|                | Frequency            | 500 MHz                     |             |
|                | Power dissipation    | 15 mW                       |             |
| I/O controller | Output-voltage swing | 0.1 V, 0.15 V, 0.2 V, 0.3 V | 0.04 mm$^2$ |
|                | Driving current      | 2 mA, 3 mA, 4 mA, 5 mA      |             |
|                | Number of levels     | 4                           |             |
|                | Switching time       | 0.4 ns                      |             |
|                | Gain of compensation | 1.3×                        |             |
| TSI            | Length               | 3 mm                        | 3 mm$^2$    |
|                | Inductance           | 300 pH                      |             |
|                | Resistance           | 5 Ω                         |             |
|                | Capacitance          | 60 fF                       |             |
| Memory         | SRAM                 | 16 KB                       | 0.2 mm$^2$  |
|                | Power dissipation    | 6 mW                        |             |



Figure 4. (a) Eye-diagram under tuning output-voltage swing: (i) 100 mV, (ii) 150 mV, and (iii) 300 mV. (b) and (c) Eye-diagram under compensating: (i) output-voltage swing, (ii) normal received signal, and (iii) compensated received signal. (d) Example of one adaptive I/O control with activated compensation mechanism.





Results are reported for three cases: without Q-learning (Normal), with Q-learning only (Only-Q), and with Q-learning and activated compensation (Q-Comp). The results show that adaptive I/O management with the compensation circuit activated is the most power efficient. For example, for the bzip2 benchmark, the I/O communication power decreases from 0.267 to 0.224 mW when the system is tuned by Q-learning only, and is further reduced to 0.216 mW with the compensation circuit. On average, a 12.95% I/O communication power saving is achieved with Q-learning only, rising to 15.61% with the compensation circuit activated. Note that the power values in Figure 5 are the I/O communication power only and do not include the Tx and Rx power consumption. On average, the power consumption of the whole system (Tx, Rx, and I/O) is 19 mW, with an energy efficiency of 4.75 pJ/bit for the I/O without adaptive tuning; adaptive tuning reduces it to 13 mW.
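
The power and energy-per-bit figures are linked by $E_b = P / R_b$. Assuming the 4-Gb/s data rate stated in the experiment setup, the short check below reproduces the 4.75-pJ/bit figure from 19 mW and gives 3.25 pJ/bit for 13 mW, the value quoted in the conclusion. It also computes the bzip2 savings from the per-benchmark powers above; these are single-benchmark numbers and differ from the averages. The function names are hypothetical.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """E_b = P / R_b; mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps


def saving_pct(before, after):
    """Relative power saving in percent."""
    return 100.0 * (before - after) / before


print(energy_per_bit_pj(19, 4))  # without adaptive tuning
print(energy_per_bit_pj(13, 4))  # with adaptive tuning
print(round(saving_pct(0.267, 0.224), 1), round(saving_pct(0.267, 0.216), 1))
```

For bzip2 this gives about 16.1% (Q-learning only) and 19.1% (with compensation).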

**IN THIS PAPER**, a self-adaptive adjustment of I/O output-voltage swing is investigated for energy-efficient 2.5-D memory-logic integration. Based on the Q-learning algorithm and a compensation mechanism, the I/O management can exploit the tradeoff between power reduction and the required BER. Experimental results show that the adaptive 2.5-D I/Os designed in 65-nm CMOS achieve an average of 13 mW I/O power, 4-Gb/s bandwidth, and 3.25-pJ/bit energy efficiency for one channel under 10<sup>-6</sup> BER. Compared with an I/O using a uniform output-voltage swing, the Q-learning-controlled I/O management with compensation achieves a 15.61% communication power reduction and a 14% energy efficiency improvement.

# Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. 61471296 and No. 61771388.

# References

- [1] M. B. Healy et al., "Design and analysis of 3D-MAPS: A many-core 3D processor with stacked memory," in *Proc. IEEE Custom Integr. Circuits Conf.*, 2010, pp. 1–4.
- [2] S. Rusu et al., "Ivytown: A 22nm 15-core enterprise Xeon processor family," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2014, pp. 102–103.
- [3] D. Oh, C. C. P. Chen, and Y. H. Hu, "Efficient thermal simulation for 3-D IC with thermal through-silicon vias," *IEEE Trans. Comput.-Aided Design*, vol. 31, no. 11, pp. 1767–1771, 2012.
- [4] J. R. Cubillo, R. Weerasekera, Z. Z. Oo, and E. X. Liu, "Interconnect design and analysis for through silicon interposers (TSIs)," in *Proc. IEEE Int. Conf. 3D Syst. Integr.*, 2012, vol. 5, pp. 1–6.
- [5] J.-S. Seo et al., "High-bandwidth and low-energy on-chip signaling with adaptive pre-emphasis in 90nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2010, pp. 182–183.
- [6] F. Spagna, L. Chen, M. Deshpande, and Y. Fan, "A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2010, pp. 366–367.
- [7] D. Xu et al., "An energy-efficient 2.5D through-silicon interposer I/O with self-adaptive adjustment of output-voltage swing," in *Proc. Int. Symp. Low Power Electron. Design*, 2014, pp. 93–98.
- [8] M. Baumann and H. K. Buning, "State aggregation by growing neural gas for reinforcement learning in continuous state spaces," in *Proc. 10th Int. Conf. Machine Learning Appl. Workshops*, 2011, vol. 1, pp. 430–435.
- [9] M. Pozzoni et al., "A multi-standard 1.5 to 10Gb/s latch-based 3-tap DFE receiver with a SSC tolerant CDR for serial backplane communication," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1306–1315, 2009.
- [10] E. Even-Dar and Y. Mansour, "Learning rates for Q-learning," *J. Mach. Learn. Res.*, vol. 5, pp. 1–25, 2003.

- [11] I. Ndip et al., "High-frequency modeling of TSVs for 3-D chip integration and silicon interposers considering skin-effect, dielectric quasi-TEM and slow-wave modes," *IEEE Trans. Components, Packaging, Manufacturing Technol.*, vol. 1, no. 10, pp. 1627–1641, 2011.
- [12] R. A. Shafik et al., "On the extended relationships among EVM, BER and SNR as performance metrics," in *Proc. Int. Conf. Electrical Comput. Eng.*, 2006, pp. 408–411.

**Dongjun Xu** received a BS from the Xi'an University of Technology, Xi'an, China, in 2010, where he is currently pursuing a PhD. He was a Research Assistant at Nanyang Technological University, Singapore, from 2012 to 2015. His research interests include 3-D ICs, multicore, and low-power designs. He is a Student Member of the IEEE.

**Ningmei Yu** is currently a Professor with the Department of Electronic Engineering, Xi'an University of Technology, Xi'an, China. Her research interests include 3-D ICs, multicore, and low-power design. She received a PhD from Tohoku University, Sendai, Japan, in 1999.

**Hantao Huang** received a BS from Nanyang Technological University, Singapore, in 2013, where he has been pursuing a PhD from the School of Electrical and Electronic Engineering since 2014. His research interests are Internet of Things systems, machine-learning algorithms, and 3D-IC design. He is a Student Member of the IEEE.

**Sai Manoj P. D.** is a Post-Doctoral Research Scientist at TU Wien, Vienna, Austria. His research interests include self-aware SoC design, machine learning for on-chip data processing, and security in Internet of Things networks. He received a PhD in electronic and electrical engineering from Nanyang Technological University, Singapore, in 2015. He is a Member of the IEEE.

**Hao Yu** is with the Southern University of Science and Technology, Shenzhen, China. His research interests include CMOS emerging technology for data sensor, link, and accelerator. He received a BS from Fudan University, Shanghai, China, and a PhD from the Department of Electrical Engineering, University of California at Los Angeles, Los Angeles, CA, USA. He is a Senior Member of the IEEE.

Direct questions and comments about this article to Ningmei Yu, Xi'an University of Technology, Xi'an, 710048, China; e-mail: yunm@xaut.edu.cn.