An Adaptive Parallel Flow for Power Distribution Network Simulation Using Discrete Fourier Transform Xiang Hu1, Wenbo Zhao2, Peng Du2, Amirali Shayan2, Chung-Kuan Cheng2 1 ECE Dept., 2CSE Dept. University of California, San Diego 01/19/2010 Outline Motivation Adaptive simulation flow using discrete Fourier transform (DFT) Basic DFT flow Problems with the basic DFT flow Adaptive DFT flow Experimental results Error analysis Time efficiency of the adaptive flow Time efficiency of parallel processing Conclusions Page 2 Outline Motivation Adaptive simulation flow using discrete Fourier transform (DFT) Basic DFT flow Problems with the basic DFT flow Adaptive DFT flow Experimental results Error analysis Time efficiency of the adaptive flow Time efficiency of parallel processing Conclusions Page 3 PDN Simulation: Why Frequency Domain? Huge PDN netlists Time-domain simulation: serial - slow Frequency-domain simulation: parallel fast Frequency dependent parasitics Simulation results Time-domain: voltage drops, simultaneous switching noise (SSN) input dependent Frequency-domain: impedance, anti-resonance peaks input independent Page 4

Transform Operations: Why Discrete Fourier Transform Laplace Transform [Wanping 07] Input: Series of ramp functions Output: Rational expressing via vector fitting Vector fitting may introduce large errors Choice of frequency samples is case dependent Discrete Fourier Transform (DFT) Input: periodic signal Inverse DFT is straightforward: vector fitting is not needed Frequency sample points with uniform steps Page 5 Outline Motivation Adaptive simulation flow using discrete Fourier transform (DFT) Basic DFT flow Problems with the basic DFT flow Adaptive DFT flow Experimental results Error analysis Time efficiency of the adaptive flow Time efficiency of parallel processing Conclusions Page 6 Basic DFT Simulation Flow Page 7 Problem with Basic DFT Flow Wrap-around effect requires long padding zeros at the end of the input Periodicity nature of DFT Small uniform time steps are needed to cover the input frequency range T x 10 8 T -3 6 x 10 -3 7 5

output 5 4 3 Correct 4 Voltage (V) 6 Current (A) Large number of simulation points! 2T 3T 4T 3 2 2 1 1 0 0 T -1 0 0.2 0.4 0.6 0.8 Time (sec) x 10 8 DFT repetition 7 6

Current (A) Current (A) 5 4 3 x 10 -3 -7 -1 Wrap-around 6 7 output 5 4 3 2 1 1 0 0 0.5 1 1.5 2 x 10 2.5 x 10 -3 5 6 2

0 Time (sec) -8 Distorted! 4 Voltage (V) 8 -3 1 x 10 3 2 1 -1 0 0.5 Page 8 1 1.5 Time (sec) 2 2.5 x 10 -8 -1 0 0 0.2 0.4 0.6 Time (sec) 0.8

1 x 10 -7 -1 0 0.5 1 1.5 Time (sec) 2 2.5 x 10 -8 Adaptive DFT Simulation x 10 4T - -3 5 Voltage (V) 4 3 2 T 6 x 10 -3 6 5 5 4 4

3 3 Voltage (V) 6 2T 3T Voltage (V) T 2 1 1 0 0 0 0 0.2 0.4 0.6 Time (sec) Correct 0.8 -1 1 x 10 -7 0 0.2 0.4 0.6 Time (sec) Distorted 0.8

1 x 10 -3 2 1 -1 x 10 -1 0 0.5 1 1.5 2 Time (sec) -7 2.5 x 10 -8 Correct! Basic ideas of the adaptive DFT flow: cancel out the wrap-around effect by subtracting the tail from the main part of the output Main part of the output: obtained with small time step and small period; distorted by the wrap-around effect Tail of the output: low frequency oscillation; can be captured with large time steps Total number of simulation points is reduced significantly! Page 9 Adaptive DFT Flow Period[i]: the input period at each iteration Interval[i]: the simulation time step at each iteration FreqUpBd[i]: the upper bound of the input frequency range at each iteration vi(t): tentative time-domain output within the frequency range [0, FreqUpBd] at each iteration Iteration #1: obtain the main part of the output Iteration #2~k: capture the oscillations in the

tail of the output (high, middle, and low resonant frequencies) For each iteration #i, i=k, k-1, , 2, subtract the captured tail from the outputs at iteration #j, j

*
*1 0 0.5 10 Page 12 Original Input -3 5 Current (A) Impedance ( ) 4 x 10 4 10 6 Frequency (Hz) 10 8 -1 0 0.5 1 Time (sec) 1.5 2 x 10 -8 Experimental Results: Adaptive Flow Process T1 x 10 10 Iteration #1: v1(t)

-3 v1(t), t1=20ps, T1=20.48ns 8 t1=20ps Voltage (V) 6 4 2 v1 (t ) 0 0 10 Voltage (V) v (mT : (m 1)T ) 2 1 1 m 1 -2 -4 T1=20.48ns 3 x 10 0.5 -3 1 1.5 2 Time (sec) 3T1 2T1 2.5

x 10 -8 Iteration #2: v2(t) 4T1 8T1 t2 = 64t1 v2(t), t2=640ps, T2=163.84ns 5 = 640ps T2 = 8T1 0 = 163.84ns -5 0 10 x 10 1 2 3 4 5 6 7 9 x 10 -8 5 v1 (t ) v (mT : (m 1)T ) 2 1 m 1

Final output-v1(t) Tail: 0 -5 Final output: 3 Main part: Final output v1(t) Final output Voltage (V) 8 Time (sec) -3 v2 (T1 : 8T1 ) 0 Page 13 1 2 3 4 Time (sec) 5 6 7 8 9 x 10 -8 1 Experimental Results: DFT Flow vs. SPICE 10

x 10 -3 DFT flow SPICE transient simulation 8 Voltage (V) 6 4 2 Relative error: 0.25% 0 -2 -4 0 0.5 1 Time (sec) Page 14 1.5 2 x 10 -7 Error Analysis: Error Caused by Wrap-around Effect 6 x 10 -3 Output comparison 1.5 DFT flow, T=20.48ns DFT flow, T=163.84ns SPICE transient simulation 5 -4 Error relative to SPICE Difference between DFT and SPICE, T=20.48ns Difference between DFT and SPICE, T=163.84ns

1 4 Relative error: 0.12% 0.5 3 Voltage (V) Voltage (V) x 10 2 0 -0.5 1 0 -1 -1 -1.5 Relative error: 2.09% 0 0.5 1 1.5 Time (sec) 2 2.5 x 10 -8 0 0.5 1 1.5 Time (sec)

2 2.5 x 10 Theorem 1: Let be the initial value of the output voltage. Suppose for some , then the mean square error, i.e., is bounded by . Page 15 -8 Error Analysis: Error Caused by Different Interpolation Methods -3 6 x 10 Output comparison 2 DFT flow, t=20ps DFT flow, t=2.5ps SPICE transient simulation 5 x 10 -5 Error relative to SPICE Difference between DFT and SPICE, t=20ps Difference between DFT and SPICE, t=2.5ps 1.5 1 4 Relative error: 0.12% 0.5 Voltage (V) Voltage (V) 3 2 0 -0.5

*
*
1 0 -1 -1.5 0 0.2 0.4 0.6 0.8 Time (sec) x 10 Voltage (V) 1 x 10 -7 1.31 1.3 1.29 2.294 0.2 0.4 0.6 0.8 Time (sec) DFT: sinusoidal interpolation DFT flow, t=20ps DFT flow, t=2.5ps SPICE transient simulation Page 16 -2 0 SPICE: PWL interpolation -4 1.32

1.28 Relative error: 0.09% -1 2.2945 2.295 Time (sec) 2.2955 2.296 x 10 -8 1 x 10 -7 Time Complexity Analysis: Adaptive vs. Non-adaptive k Adaptive flow time complexity: O( Ti / ti ) i 1 Ti: simulation period at iteration #i, T1 T2 Tk ti: simulation time step at iteration #i, t1 t2 tk Non-adaptive flow time complexity: O(Tk / t1 ) 2.5 Non-adaptive method Adaptive method Relative error to Hspice (%) 2 1.5 1 0.5 Page 17 0 0 100 200 300

400 Time (sec) 500 600 700 Parallel Processing 16000 Test case: 3D PDN Basic DFT Adpative DFT 14000 ~ 0.17 million nodes The adaptive DFT flow has more advantage when the number of available processors is limited. Simulations between each iteration of the adaptive flow need to be processed serially Simulation time (sec) 12000 10000 8000 6000 4000 2000 0 0 50 100 150 200 Number of processors 250 Simulation time with different number of processors (sec) 1 prc 4 prcs 8 prcs 16 prcs 64 prcs

128 prcs 256 prcs Hspice 21374 NA NA NA NA NA NA Basic DFT 15976 6635 3225 1647 425 218 143 Adaptive DFT 4947 2096 904 490 180 120 115 Page 18 300 Parallel Processing: DFT Flow vs. SPICE

DFT error compared to Hspice Simulation result 8 x 10 -3 3 Adaptive DFT Hspice 7 x 10 -5 2 6 1 Voltage (V) Voltage (V) 5 4 3 2 0 -1 -2 1 -3 0 -1 0 Page 19 0.5 1 1.5 Time (sec) 2 2.5 x 10 -8

-4 0 0.5 1 1.5 Time (sec) 2 2.5 x 10 -8 Outline Motivation Adaptive simulation flow using discrete Fourier transform (DFT) Basic DFT flow Problems with the basic DFT flow Adaptive DFT flow Experimental results Error analysis Time efficiency of the adaptive flow Time efficiency of parallel processing Conclusions Page 20 Conclusions Implemented an adaptive flow for large PDN simulation using DFT Total number of simulation points is reduced significantly compared to the basic DFT flow Achieved a relative error of the order of 0.1% compared to SPICE 10x speed up with a single processor compared to SPICE. Parallel processing is incorporated to reduce the simulation time even more significantly Page 21 Page 22