SimPL: An Effective Placement Algorithm Myung-Chul Kim, Dong-Jin Lee and Igor L. Markov Dept. of EECS, University of Michigan ICCAD 2010, Myung-Chul Kim, University of Michigan 1 Global Placement: Motivation Interconnect lagging in performance while transistors continue scaling Circuit delay, power dissipation and area dominated by interconnect Routing quality highly
controlled by placement IR drop Coupling RC delay Unloaded Circuit size and complexity rapidly increasing Scalable placement algorithm is critical Simplicity, integration with other optimizations ICCAD 2010, Myung-Chul Kim, University of Michigan 2 Placement Formulation Objective: Minimize estimated wirelength
(half-perimeter wirelength) Subject to constraints: Legality: Row-based placement with no overlaps Routability: Limiting local interconnect congestion for successful routing Timing: Meeting performance target of a design ICCAD 2010, Myung-Chul Kim, University of Michigan 3 Speed Prior Work Ideal placer
Quadratic and force-directed Non-convex optimization Ideal Placer mFAR, Kraftwerk2, FastPlace3 mPL6, APlace2, NTUPlace3 Solution Quality Fast runtime without sacrificing solution quality
Simplicity, integration with other optimization ICCAD 2010, Myung-Chul Kim, University of Michigan 4 Key features of SimPL Wirelength Flat quadratic placement Primal dual optimization Closing the gap between upper and lower bounds Upper-Bound Solution by Look-ahead Legalization Final Solution Initial WL Opt.
Lower-Bound Solution by Linear System Solver Iteration Final Legal Solution 5 Common Analytical Placement Flow Placement Instance Initial WL Optimization Global Placement
Converge no yes Legalization and Detailed Placement ICCAD 2010, Myung-Chul Kim, University of Michigan 6 SimPL Flow Placement Instance
Initial WL Optimization B2B Graph Building Linear System Solver no WL Converge yes Global Placement Look-ahead
Legalization (Upper-Bound) Converge no Pseudonet Insertion yes B2B Graph Building Legalization and Detailed Placement
Linear System Solver (Lower-Bound) B2B net model[P. Spindler, et al, Kraftwerk2 - A Fast Force-Directed Quadratic Placement Approach Using an Accurate Net Model, TCAD 2008] We delegate final legalization and detailed placement to FastPlace-DP [M. Pan, et al, An Efficient and Effective Detailed Placement Algorithm, ICCAD2005] 7 SimPL: Look-ahead Legalization Purpose: Produces almost-legal placement (UpperBound) while preserving the relative cell ordering given by linear system solver (Lower-Bound) Identify target region Find overflow bin b Create a minimal wide enough bin cluster B around b Perform geometric top-down partitioning
Find cell area median (Cc) and whitespace median (CB) Assign cells (Cc) to corresponding partitions (CB) Non-linear scaling Form stripe regions Move cells across stripe regions in-order based on whitespace 8 SimPL: Look-ahead Legalization (1) Overfilled bin Bin cluster (B) Cell-area whitespace median (Cc) median (CB) B0
ICCAD 2010, Myung-Chul Kim, University of Michigan B1 9 SimPL: Look-ahead Legalization (2) Cell-area whitespace median (Cc) median (CB) B0 ICCAD 2010, Myung-Chul Kim, University of Michigan 10 SimPL: Look-ahead Legalization (2)
CB CB CB 4 7 5 8 6 Uniform cutlines Obstacle 4 7
3 1 5 8 2 Cell Ordering 6 3 1 2 Per-stripe Linear Scaling
borders ICCAD 2010, Myung-Chul Kim, University of Michigan 11 SimPL: Look-ahead Legalization (3) Example (adaptec1) Look-ahead legalization stops when target regions become small enough SimPL: Using legal locations as anchors Purpose: Gradually perturb the linear system to generate lower-bound solutions with less overlap Anchors and Pseudonets Look-ahead locations used as fixed, zero-area anchors
Anchors and original cells connected with 2-pin pseudonets Pseudonet weights grow linearly with iterations ICCAD 2010, Myung-Chul Kim, University of Michigan 13 Next illustration: Tug-of-war between low-wirelength and legalized placements ICCAD 2010, Myung-Chul Kim, University of Michigan 14 SimPL Iterations on Adaptec1 (1)
Iteration=0 (Init WL Opt.) Iteration=2 (Lower Bound) Iteration=1 (Upper Bound) Iteration=3 (Upper Bound) 15 SimPL Iterations on Adaptec1 (2) Iteration=10 (Lower Bound) Iteration=11 Iteration=11(Upper (UpperBound) Bound)
Iteration=20 Iteration=20(Lower (LowerBound) Bound) Iteration=21 (Upper Bound) 16 SimPL Iterations on Adaptec1 (3) Iteration=30 (Lower Bound) Iteration=40 (Lower Bound) Iteration=31 (Upper Bound) Iteration=41 (Upper Bound)
17 Convergence of SimPL Legal solution is formed between two bounds ICCAD 2010, Myung-Chul Kim, University of Michigan 18 Empirical Results: ISPD05 Benchmarks Experimental setup Single threaded runs on a 3.2GHz Intel core i7 Quad CPU Q660 Linux workstation HPWL is computed by GSRC Bookshelf Evaluator < 5000 lines of code in C++, including CG-based solver for sparse linear systems with Jacobi preconditioner
Improvements after ICCAD submission ICCAD 2010, Myung-Chul Kim, University of Michigan 19 Empirical Results: Scalability Study Take an existing design (ISPD 2005) and split each movable cell into two cells of smaller size Each connection to the original cell is inherited by one of two split cells, which are connected by a 2-pin net Not in ICCAD paper Parallelism in Conjugate Gradient Solver Runtime bottleneck in SimPL: Conjugate gradient linear system solver Coarse-grain row partitioning
Implemented using OpenMP3.0 compiler intrinsic SSE2 (Streaming SIMD Extensions) instructions Process 4 multiple data with a single instruction Marginal runtime improvement in SpMxV Reducing memory bandwidth demand of SpMxV CSR (Compressed Sparse Row) format Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM 2003 ICCAD 2010, Myung-Chul Kim, University of Michigan 21 On-going Research Integration with physical synthesis Look-ahead placement offers opportunity for early estimation of circuit parameters Timing look-ahead
Congestion look-ahead Power-density look-ahead Improving the speed and quality of physical synthesis Parallel look-ahead legalization Run independently in separate sub-regions ICCAD 2010, Myung-Chul Kim, University of Michigan 22 Conclusions New flat quadratic placement algorithm: SimPL Novel primal-dual approach Amenable to integration with physical synthesis Self-contained, compact implementation Fastest among available academic placers Highly competitive solution quality
ICCAD 2010, Myung-Chul Kim, University of Michigan 23 Questions and Answers Thank you! Time for Questions IO 1% Post Global Placement 46% Initial placement 5% CG solver 19%
Look-ahead legalization 18% Sparse matrix and B2B net modeling 10% Pseudo-net insertion 1% ICCAD 2010, Myung-Chul Kim, University of Michigan 24