Pointer Analysis G. Ramalingam Microsoft Research, India & K. V. Raghavan Goals Points-to Analysis: Determine the set of possible values of a pointer-variable (at different points in a program) what locations can a pointer point-to? Alias Analysis: Determine if two pointer-variables may point to the same location Compute conservative approximation A fundamental analysis, required by most other static analyses A Constant Propagation Example x = 3;

y = 4; z=x+ 5; x is always 3 here can replace x by 3 and replace x+5 by 8 and so on A Constant Propagation Example With Pointers x = 3; *p = 4; z=x+ 5; Is x always 3 here? A Constant Propagation Example With Pointers

p = &y; x = 3; *p = 4; z=x+ 5; if (?) p = &x; p= x = 3; &x; *p = 4; else z=x+ p= 5; &y; x is always 3 x is always 4 pointers affect x = 3; analyses

most program *p = 4; z=x+ x may be 3 or 4 5; (i.e., x is unknown in our lattice) A Constant Propagation Example With Pointers p = &y; x = 3; *p = 4; z=x+ 5; p always points-to y if (?) p = &x; p=

x = 3; &x; *p = 4; else z=x+ p= 5; &y; p always points-to x = 3; x *p = 4; z=x+ p may point-to x or y 5; Points-to Analysis Determine the set of targets a pointer variable could point-to (at different points in the program)

p points-to x p stores the value &x *p denotes the location x targets could be variables or locations in the heap (dynamic memory allocation) p = &x; p = new Foo(); or p = malloc (); Algorithm A (may points-to analysis) A Simple Example p= &x; q= &y; if (?) { q= p; } x=

Algorithm A (may points-to analysis) A Simple Example p= &x; q= &y; if (?) { q= p; } x= p q x y z

null null null null null x null null null null

x y null null null x y null null null x

x null null null x {x,y} null null null x

{x,y} a null null x {x,y} a b null x {x,y}

a b {a,b} Algorithm A (may points-to analysis) A Simple Example x = &a; y = &b; if (?) { p= &x; } else { p= &y; } *x = &c;

How should we handle this statement? (Try it!) Weak update x: a y: b x: a y: b x: {a,c} y: {b,c} Strong update p:

{x,y} p: {x,y} p: {x,y} a: null a: c a: c Questions When is it correct to use a strong update? A weak update? Is this points-to analysis precise? We must formally define what we want to compute before we can answer many such questions Points-To Analysis: An Informal Definition Let u denote a program-point

Define IdealMayPT (u) to be { (p,x) | p points-to x in some state at u in some run } Algorithm should compute a set MayPT(u) that overapproximates above set Static Program Analysis A static program analysis computes approximate information about the runtime behavior of a given program 1. The set of valid programs is defined by the programming language syntax 2. The runtime behavior of a given program is defined by the programming language semantics 3. The analysis problem defines what information is desired 4. The analysis algorithm determines what approximation to make Programming Language: Syntax A program consists of a set of variables Var

a directed graph (V,E,entry) with a distinguished entry vertex, with every edge labelled by a primitive statement A primitive statement is of the form x = null Omitted (for now) x=y Dynamic memory x = *y allocation x = &y; Pointer arithmetic *x = y Structures and fields Procedures skip (where x and y are variables in Var) Example Program x = &a;

y = &b; if (?) { p= &x; } else { p= &y; } Vars = {x,y,p,a,b,c} 1 x = &a 2 y = &b p = &x 3

p = &y 4 5 skip skip 6 *x = &c 7 *p = &c 8 Programming Language: Operational Semantics Operational semantics == an

interpreter (defined mathematically) State DataState ::= Var -> (Var U {null}) PC ::= V (the vertex set of the CFG) ProgramState ::= PC x DataState Initial-state: (entry, \x. null) Example States Vars = {x,y,p,a,b,c} 1 x = &a 2 y = &b p = &x 3

p = &y 4 5 skip skip 6 *x = &c 7 *p = &c 8 Initial data-state x: N, y:N, p:N, a:N, b:N,

c:N Initial program-state x: N, y:N, p:N, a:N, b:N, <1, c:N > Next program-state x: a, y:N, p:N, a:N, b:N, <2, c:N > Programming Language: Operational Semantics Meaning of primitive statements CS[stmt] : DataState -> 2DataState

CS[ CS[ CS[ CS[ x x x x = = = =

null ] s = {s[x null]} &y ] s = {s[x y]} y ] s = {s[x s(y)]} *y ] s = CS[*x = y ] s = = Programming Language: Operational Semantics Meaning of primitive statements CS[stmt] : DataState -> 2DataState CS[ x = null ] s = {s[x null]}

CS[ x = &y ] s = {s[x y]} CS[ x = y ] s = {s[x s(y)]} CS[ x = *y ] s = {s[x s(s(y))]}, if s(y) is not null = {}, otherwise CS[*x = y ] s = Programming Language: Operational Semantics Meaning of primitive statements CS[stmt] : DataState -> 2DataState CS[ x = null ] s = {s[x null]} CS[ x = &y ] s = {s[x y]}

CS[ x = y ] s = {s[x s(y)]} CS[ x = *y ] s = {s[x s(s(y))]}, if s(y) is not null = {}, otherwise CS[*x = y ] s = {s[s(x) s(y)]}, if s(x) is not null = {}, otherwise Programming Language: Operational Semantics Meaning of program a transition relation on program-states ProgramState X ProgramState state1 state2 means that the execution of some edge in the program can transform state1 into state2 Defining (u,s) (v,s) iff the program contains a control-flow edge u->v labelled with a statement stmt such that CS[stmt]s = s

Programming Language: Operational Semantics A sequence of states s1s2 sn is said to be an execution (of the program) iff s1 is the Initial-State si si+1 for 1 <= I < n A state s is said to be a reachable state iff there exists some execution s1s2 sn is such that sn = s. Define RS(u) = { s | (u,s) is reachable } Programming Language: Operational Semantics A sequence of states s1s2 This sn isis said the to collecting be an execution (of the program) iff

s1 is the Initial-State si si+1 for 1 <= I < n semantics at point u A state s is said to be a reachable state iff there exists some execution s1s2 sn is such that sn = s. Define RS(u) = { s | (u,s) is reachable } Ideal Points-To Analysis: Formal Definition Let u denote a vertex in the CFG Define IdealMayPT (u) to be \p. { x | exists s in RS(u). s(p) == x } May-Point-To Analysis: Problem statement Compute MayPT: V -> 2Var such that for every vertex u

MayPT(u) IdealMayPT(u) (where Var = Var U {null}) May-Point-To Algorithms Compute MayPT: V -> 2Vars such that MayPT(u) IdealMayPT(u) An algorithm is said to be correct if the solution MayPT it computes satisfies "uV. MayPT(u) IdealMayPT(u) An algorithm is said to be precise if the solution MayPT it computes satisfies "uV. MayPT(u) = IdealMayPT(u) An algorithm that computes a solution MayPT1 is said to be more precise than one that computes a solution MayPT2 if "uV. MayPT1(u) MayPT2(u) Algorithm A: A Formal Definition The Data Flow Analysis Recipe Define semi-lattice of abstract-values AbsDataState ::=

(Var -> (2Var {})) f1 f2 = \x. (f1 (x) f2 (x)) Define initial abstract-value InitialAbsState = \x. {null} Define transformers for primitive statements AS[stmt] : AbsDataState -> AbsDataState Algorithm A: A Formal Definition The Data Flow Analysis Recipe Let st(v,u) denote stmt on edge v->u x(v) x(w) v w

st(v,u) st(w,u) u x(u) Compute the least-fixed-point of the following dataflow equations x(entry) = InitialAbsState x(u) = v->u AS(st(v,u)) x(v) Algorithm A: The Transformers Abstract transformers for primitive statements AS[stmt] : AbsDataState -> AbsDataState

AS[ AS[ AS[ AS[ x x x x = = = = y ] s = s[x s(y)] null ] s = s[x {null}] &y ] s = s[x {y}]

*y ] s = s[x s*(s(y)-{null})], if s(y) is not = {null} = bot, otherwise where s*({v1,,vn}) = s(v1) s(vn), Algorithm A AS[ *x = y ] s = bot s[z s(y)] s[z1 s(z1) s(y)] [z2 s(z2) s(y)] [zk s(zk) s(y)] if s(x) = {null} if s(x)-{null} = {z} if s(x)-{null} = {z1, , zk} (where k > 1) After fix-point solution is obtained, AbsDataState(u) is emitted as MayPT(u), for

each program point u An alternative algorithm: must points-to analysis AbsDataState is modified, as follows: Each var is mapped to {} or to a singleton set join is point-wise intersection Let MustPT(u) be fix-point at u Guarantee: (MustPT(u)) MayPT(u) IdealMayPT(u) where (S) = S, if S is a singleton set = Var, if S = {} Must points-to analysis algorithm AS transfer functions same as in Algorithm A for x = y, x = null, and x = &y AS[ x = *y ] s

= bot, if s(y) = {null} = s[x {}], if s(y) = {} = s[x s(z)], if s(y) = {z} Must points-to analysis algorithm AS[ *x = y ] s = bot, if s(x) = {null} = s[z s(y)] if s(x) = {z} = \v. {}, otherwise This analysis is less precise than the maypoints-to analysis (Algorithm A), but is more efficient