Topics Topics in in Middle-Down Middle-Down Placement Placement John Lillis Karthik Kalpat Devang Jariwala Sung-Woo Hur UIC Background/Overview Current Target: Wire-Length Driven Cell Placement Origins: Mongrel (ICCAD 2000) Key Concepts Middle-Down flow

Relaxation Based Local Search Novel Legalization Procedures Detailed Optimization Techniques: Interleaving Extensions (works in progress) Multi-Level Strategies (both in target grid and source netlist) Placement-Based Strategies: Re-clustering, tie-breaking Two Row Interleaving Themes Constraint Relaxation (flow/qplace for global

placement) Constraint Imposition (e.g., interleaving) Search Clustering Scalability Escape From Local Optima Inducing Uniformity Wire-Length Driven Placement Given: netlist of a circuit, # rows Placement Problem: |E| min len(ei) over all legal placement P i=1

Mongrel Middle-Down Flow (2-level) Global Placement Detailed Placement Relaxation-Based Local Search Optimal Interleaving... Why Middle-Down? Better correlation with final result min-cut bets a lot on 1st cut other objectives (e.g., timing) are meaningless in strict top-down Simplification of constraints (vs. flat) Enables Search More than most Analytical

Methods Moves (later) More Global Than Traditional Search Key Concepts In Mongrel Global Placement RBLS: Relaxation Based Local Search (Network flow based) Max-Gain Based Legalization Detailed Placement Dynamic Clustering Optimal Interleaving Global Placement Given n and m, place cells on an n X m grid

Bin capacity C = Total cell area (n x m) Bin Capacity Constraint:Each bin B(i,j) has lower and upper bound to allow cells to be placed (1 - ) C Sv (1 + ) C vB(I,j) Legal Global Placement: Given , every bin satisfies bin capacity constraint and row size constraint Row Size Constraint: Every row size should be within specified tolerance Philosophy of Constraint Relaxation - Relax constraints (e.g., allow cell overlap) - Solve relaxed formulation optimally (and efficiently) - Resolve infeasibility (cell overlap) heuristically

Overview of Relaxation Based Local Search Input : M and c k P c initial placement while k > 0 Extract set of M mobile nodes Find optimal relaxed location for mobile nodes P Legalized placement Local optimization using partitioning technique Improved No k k-1

Yes P k New placement c Set of Mobile Nodes Sub-circuit Concept e1 a e g f,h e7

a c e6 e3 e5 e4 e2 b e2 d Projected sub-circuits

e3 f e3 a,f c e1 d e 5 e4 d e4 e2 e5 b

e7 e6 e g h b e1 c e6 h g e7 e Relaxed Placement Solvers Linear Program --> Network Flow Solution

Measures Half-Perimeter Combinatorial Algorithm Quadratic Placement Can Only Appx. Half-Perimeter Quadratic Objective: Measures wrong thing Can give more favorable cell distribution Current Implementation Faster than Flow

Preliminary Data: Qplace better for early in optimization (large sub-ckt size); Flow better for small sub-ckt size. Philosophy of Optimal Interleaving Target: Intra-row optimization Desirable Properties: 1. Efficiency 2. Large solution space Optimal Interleaving Window A1 B1 A2 B2

B3 A3 B4 Partitioning A1 A2 A3 B1 B2 B3 B4

Interleaving B1 B2 A1 B3 A2 B4 A3 Two Row Interleaving Idea: Enable More Flexibility by Considering Adjacent Rows Simultaneously. Cells Constrained to be in Original Rows, but Enables Cells to Move Together (less influence of anchors other rows.

Formulation: A1, B1: Subsequences of row 1 (relative order obeyed) A2, B2: Subsequences of row 2 (relative order obeyed) Simultaneously find optimal interleavings of (A1,B1) and (A2,B2) s.t. WL is min. Note: only x-wire length changes. Two Row Interleaving Uniform Case Window a R1 R2 b - All cells identical width (e.g., FPGAs) - Cells (e.g., a, b) can move together

- Solution: Dynamic Programming DP Idea for Uniform Case - For Separability, scan line must move L-to-R - Decomposition: - k: length of prefix in # cells - i1 <= k: index into subseq A1 - [ j1 = k - i1 : implicit index into B1 ] - i2 <= k: index into subseq A2 - [ j2 = k i2: implicit index into B2 ] - S[k,i1,i2] = WL of opt. Interleaving of: A1[1..i1], B1[1..j1] and A2[1..i2], B2[1..j2] Four ways to form S[k,i1,i2] k-1 S[k-1,i1-1,i2-1] S[k-1,i1-1,i2] S[k-1,i1,i2-1]

S[k-1,i1,i2] { { { { A1[i1] A2[i2] A1[i1] B2[j2] B1[j1] A2[i2] B1[j1] B2[j2] cells Uniform Case Discussion Fill in table using recurrence

Take best of four cases by incremental wire-length calculation Because parameterized by k, we ensure separability required for DP Run-Time: O(N^3) for constant degree (more complex expression if pins considered) Next: how about variable width standard cells? Standard Cell Case 80 cells 20 cells Issues: - Decomposition by k not generally possible - Need to deal with jaggedness of right boundary Dealing with Jaggedness WLOG: assume left boundary identical for both rows (minor modifications can accommodate general case).

Let w(R,i,j) be the total cell width of cells A[1..i] and B[1..j] in row R (for given window). Feasibility requirement for quadruple (i1,j1,i2,j2): | w(R1,i1,j1) w(R2,i2,j2) | < D Idea: allow a certain degree of noise with near separability). Nevertheless, near-uniformity desirable (large D implies large error; wide-variance: few or no feasible configurations). Clustering for Uniformity Clustering is performed within the chosen windows for uniformity. A method similar to the dynamic clustering method that finds the cluster edges at points of minimum density is used for this purpose.

Graph Interpretation [ i1 j1 i2 j2] 1001 0001 0010 0110 0000 n1 m1 n2 m2 0100 0101 1000 1010

Notes: Eight candidate edges; only feasible nodes considered. Discussion Feasible space much smaller than n^4. Use hash table for this sparseness Practical runtime: O(n^3) With clusters, a generalization is employed: cluster flipping (pin locations preserved). Routed WL for FPGAs CKT VPR VPR+1ROW

VPR+2ROW ALU4 100 99.7 98.2 APEX2 100 99.6 96.2 APEX4 100

99.4 98.5 BIGKEY 100 99.6 95.5 CLMA 100 99.3 99.1 DES

100 99.9 99.9 Notion of Force Vector Idea: use a cells force vector to guide decisions Primary Application: TieBreaking Current Applications: Interleaving Max-Gain Path (Dynamic Re-clustering) Details: Weighted Graph Model used (as in Qplace) Weighted Center of Gravity of Neighbors Calculated

CLUSTERING IN GLOBAL PLACEMENT Middle-Down With Clustering Cluster the net list & place Global Placement on coarse grids De-cluster this circuit, increase the grid size & place NO Is the circuit completely flattened? YES Convert to detailed placement Perform interleaving & output final result

Clustering Cycle Unclustering Clustering Placement of clusters Grid Schedule Mongrel: 2-level Newer Work: Multi-Level One Data Point (ckt: IBM11): A: 25x25 -> 40x40 -> 60x40 -> 128x40 B: 3x3 ->10x10 -> 20x20 -> 40x40 -> 128x40 Schedule B consistently better

Bottom-up clustering Pair-wise merging Selection of Node1 Criteria: size Selection of Node2 Criteria: bandwidth Note: Target Size is always maintained Reclustering (Work In Progress) Idea: Let Placer Help in Finding Candidate Pairs to Cluster Run Global Placer at Certain Granularity Partially Flatten Run Global Placer on Flattened Netlist Cells (clusters) in same bin now are good candidates

to join. Additionally: Use force vector as similarity metric. MOTIVATION: RE-CLUSTERING IN LINEAR PLACEMENT Displacement Graph Regarding displacement graph, placement means linear placement. P[ i ]: cell placed in location i P-1[ v ]: location of cell v Assume each cell has a unit size for linear placement Location: 1 2 3

4 5 6 7 8 9 Placement: a b c

d e f g h i E.g.: P[4] = d P-1[g] = 7 Displacement Graph (cont.) Suppose Pg is a relatively good reference placement and Pb a mediocre one.

Displacement of Pb for each location i w.r.t. Pg is D[ i ] = Pg-1[Pb[ i ]] - i An Example of Displacement location : 1 2 3 4 5 6 7

8 9 Pg: a b c d e f g

h i Pb: c d f g h a b i

e D[i] : 2 2 3 3 3 -5 -5 1

-4 Displacement Graph (cont.) displacement 6000 4000 2000 0 -2000 14050 14125 location Test circuit: s38584 length(reference placement) = 1247152 length(mediocre placement) = 2698928

14200 Plateaus-Density Correlation displacement density (of mediocre placement) Test circuit: s38584 120 location 290 Dynamic Clustering (Linear Placement) Wire length

600000 550000 500000 500000 450000 200 700 Time(sec) Test circuit: s15850 : Flat placement : Clustered placement 1200

A Few Data Points MCNC avql (1 cell-height row spacing): Middle-Down: Capo: Feng Shui: DRAGON: 5.86 6.207 6.291 5.25 IBM11

Middle-Down: DRAGON: 4.21 4.08 Strengths and Flexibility and Potential of Framework Timing Optimization? Modularity of Optimizers Strong results for small to medium sized circuits Near competitive results for larger circuits Weaknesses

Scalability Unclear Speed Ongoing Work: More Careful Study of Initial Clustering Speedup Techniques (esp. gain graph) Implementation of 2-row interleaving Dynamic Re-clustering Thank you !