Overview of the Phase Problem

[Figure: Protein → Crystal → Data → Phases → Structure]

John Rose, ACA Summer School 2006. Reorganized by Andy Howard, Biology 555, Spring 2008.

Remember:
- We can measure reflection intensities.
- We can calculate structure-factor amplitudes from the intensities.
- We can calculate structure factors from atomic positions.
- We need phase information to generate the image.

14 Feb 2008 Biology 555 Crystallographic Phasing I p. 1 of 42

What is the Phase Problem?

[Figure: the X-ray diffraction experiment maps x, y, z (real space) to F_hkl (reciprocal space); all phase information is lost.]

In the X-ray diffraction experiment, photons are reflected from the crystal lattice planes in different directions, giving rise to the diffraction pattern. Using a variety of detectors (film, image plates, CCD area detectors) we can estimate intensities, but we lose any information about the relative phases of different reflections.

Phases

Let's define a phase $\phi_j$ associated with a specific plane $[hkl]$ for an individual atom $j$:

$$\phi_j = 2\pi(hx_j + ky_j + lz_j)$$

For an atom at $x_j = 0.40$, $y_j = 0.05$, $z_j = 0.10$ and the plane $[213]$:

$$\phi_j = 2\pi(2 \cdot 0.40 + 1 \cdot 0.05 + 3 \cdot 0.10) = 2\pi(1.15)$$

If we examine a two-dimensional case such as $k = 0$, then $\phi_j = 2\pi(hx_j + lz_j)$. Thus for $[201]$:

$$\phi_j = 2\pi(2 \cdot 0.40 + 0 \cdot 0.05 + 1 \cdot 0.10) = 2\pi(0.90)$$

Now, to understand what this means:

[Figure: the (201) planes in the unit cell, viewed down b, with successive planes marked by phase 0°, 360°, 720°, 1080°, and the atom at (0.4, y, 0.1).]

$$\phi = 2\pi[2(0.40) + 1(0.10)] = 2\pi(0.90)$$

That is, the atom lies nine-tenths of the way from the plane at phase 0 to the next plane, at $2\pi$ (360°).
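The two worked phases above can be checked with a short Python sketch. The helper name `atom_phase` is hypothetical (not from the course materials). It returns the phase both as the number multiplying $2\pi$ ("cycles") and in degrees reduced to the range 0 to 360, because only the fractional part of the cycle count matters.

```python
import math

def atom_phase(hkl, xyz):
    """Phase of one atom for reflection hkl: phi_j = 2*pi*(h*x + k*y + l*z).

    Returns (cycles, degrees): cycles is the factor multiplying 2*pi,
    degrees is the same phase reduced to the range [0, 360).
    """
    h, k, l = hkl
    x, y, z = xyz
    cycles = h * x + k * y + l * z
    degrees = math.degrees(2 * math.pi * cycles) % 360.0
    return cycles, degrees

atom = (0.40, 0.05, 0.10)
for hkl in [(2, 1, 3), (2, 0, 1)]:
    cycles, degrees = atom_phase(hkl, atom)
    print(hkl, round(cycles, 2), round(degrees, 1))
```

For $[213]$ the result is 1.15 cycles, which is the same phase as 0.15 cycles (54°). For $[201]$ it is 0.90 cycles (324°).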

In General for Any Atom (x, y, z)

[Figure: atom j at (x, y, z) relative to the family of planes hkl, with successive planes spaced $d_{hkl}$ apart.]

Remember: we express any position in the cell in fractional coordinates, $\mathbf{p}_j = x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c}$, and any diffraction direction as a sum of integral multiples of the reciprocal axes, $\mathbf{S}_{hkl} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$.

Diffraction vector for a Bragg spot

We set up the diffraction vector $\mathbf{S}_{hkl}$ associated with a specific diffraction direction $hkl$:

$$\mathbf{S}_{hkl} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$$

The magnitude of this diffraction vector is the reciprocal of the Bragg-law plane spacing $d_{hkl}$:

$$|\mathbf{S}_{hkl}| = 1/d_{hkl}$$

Phase angle for a spot

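The phase angle $\phi_j$ associated with our atom is $2\pi$ times the scalar product of the diffraction vector $\mathbf{S}_{hkl}$ with the displacement vector $\mathbf{p}_j$. Equivalently, it is $2\pi$ times the number of plane spacings $d_{hkl}$ that $\mathbf{p}_j$ spans along the plane normal:

$$\phi_j = 2\pi\,\mathbf{S}_{hkl} \cdot \mathbf{p}_j$$

That displacement vector is related to the real-space coordinates of atom $j$:

$$\mathbf{p}_j = x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c}$$

where $(x_j, y_j, z_j)$ are the fractional coordinates of the atom within the unit cell. Thus

$$\phi_j = 2\pi(h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*) \cdot (x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c})$$

Real-space and reciprocal space

The real-space and reciprocal-space unit cell vectors $(\mathbf{a}, \mathbf{b}, \mathbf{c})$ and $(\mathbf{a}^*, \mathbf{b}^*, \mathbf{c}^*)$ are duals of one another. That is, they obey:

$$\mathbf{a}\cdot\mathbf{a}^* = 1,\quad \mathbf{a}\cdot\mathbf{b}^* = 0,\quad \mathbf{a}\cdot\mathbf{c}^* = 0$$
$$\mathbf{b}\cdot\mathbf{a}^* = 0,\quad \mathbf{b}\cdot\mathbf{b}^* = 1,\quad \mathbf{b}\cdot\mathbf{c}^* = 0$$
$$\mathbf{c}\cdot\mathbf{a}^* = 0,\quad \mathbf{c}\cdot\mathbf{b}^* = 0,\quad \mathbf{c}\cdot\mathbf{c}^* = 1$$

This holds even when the unit cell isn't all full of 90° angles!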
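The duality can be checked numerically for a deliberately non-orthogonal cell. This is a NumPy sketch with made-up illustrative cell constants. The Cartesian convention (a along x, b in the xy plane) is an assumption; any orientation would work. The sketch builds a*, b*, c* as a matrix inverse and confirms the dot products above. It also anticipates the result derived next: the vector form of $\phi_j$ equals $2\pi(hx_j + ky_j + lz_j)$.

```python
import numpy as np

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Real-space axes a, b, c as the ROWS of a 3x3 array (lengths in Angstrom, angles in degrees).

    Convention assumed here: a along x, b in the xy plane, c completing the cell.
    """
    al, be, ga = np.radians([alpha, beta, gamma])
    va = np.array([a, 0.0, 0.0])
    vb = np.array([b * np.cos(ga), b * np.sin(ga), 0.0])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    vc = np.array([cx, cy, np.sqrt(c**2 - cx**2 - cy**2)])
    return np.vstack([va, vb, vc])

# A triclinic example cell (illustrative numbers only).
real = cell_vectors(40.0, 55.0, 62.0, 80.0, 95.0, 110.0)

# Rows of `recip` are a*, b*, c*: the inverse of the matrix whose COLUMNS are a, b, c.
recip = np.linalg.inv(real.T)

# recip @ real.T holds every dot product a*.a, a*.b, ...; duality makes it the identity.
print(np.round(recip @ real.T, 12))

hkl = np.array([2, 1, 3])
xyz = np.array([0.40, 0.05, 0.10])
S = hkl @ recip          # S_hkl = h a* + k b* + l c*
p = xyz @ real           # p_j   = x a + y b + z c
phi_vector = 2 * np.pi * S @ p
phi_fractional = 2 * np.pi * hkl @ xyz
print(phi_vector, phi_fractional)        # equal, despite the non-90-degree angles
print(1.0 / np.linalg.norm(S))           # d_hkl in Angstrom, since |S_hkl| = 1/d_hkl
```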

Matrix formulation of this duality

Construct the 3×3 reciprocal-space unit cell matrix $\mathbf{A}$, whose rows are $\mathbf{a}^*, \mathbf{b}^*, \mathbf{c}^*$. Construct also the 3×3 real-space unit cell matrix $\mathbf{R}$, whose columns are $\mathbf{a}, \mathbf{b}, \mathbf{c}$, for a specific orientation of the sample. Then $\mathbf{A}$ and $\mathbf{R}$ obey the simple relationship

$$\mathbf{A} = \mathbf{R}^{-1}, \quad \text{i.e.} \quad \mathbf{A}\mathbf{R} = \mathbf{I}$$

where $\mathbf{I}$ is the 3×3 identity matrix.

How to use this in getting phases

$$\phi_j = 2\pi(h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*) \cdot (x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c})$$

Using those dual relationships (e.g. $\mathbf{a}^*\cdot\mathbf{a} = 1$, $\mathbf{b}^*\cdot\mathbf{c} = 0$), every cross term drops out and we get

$$\phi_j = 2\pi(hx_j + ky_j + lz_j)$$

Note that this is true even if our unit cell angles aren't 90°!

Why Do We Need the Phase?

[Figure: the structure factor and the electron density are related by a Fourier transform and an inverse Fourier transform.]

To reconstruct the molecular image (electron density) from its diffraction pattern, we must know both the intensity and the phase of each of the thousands of measured reflections. The phase can take any value from 0 to $2\pi$.

Importance of Phases

[Figure: images reconstructed from Hauptman amplitudes with Hauptman phases, Karle amplitudes with Karle phases, Hauptman amplitudes with Karle phases, and Karle amplitudes with Hauptman phases.]

Phases dominate the image! Phase estimates need to be accurate.

Understanding the Phase Problem

The phase problem can best be understood from a simple mathematical construct. The structure factors $F_{hkl}$ are treated in diffraction theory as complex quantities, i.e., they consist of a real part $A_{hkl}$ and an imaginary part $B_{hkl}$. If the phases $\phi_{hkl}$ were available, the values of $A_{hkl}$ and $B_{hkl}$ could be calculated from very simple trigonometry:

$$A_{hkl} = |F_{hkl}|\cos\phi_{hkl}, \qquad B_{hkl} = |F_{hkl}|\sin\phi_{hkl}$$

This leads to the relationship

$$A_{hkl}^2 + B_{hkl}^2 = |F_{hkl}|^2 = I_{hkl}$$

Argand Diagram

The above relationships are often illustrated using an Argand diagram. From the Argand diagram, it is obvious that $A_{hkl}$ and $B_{hkl}$ may be either positive or negative, depending on the value of the phase angle $\phi_{hkl}$.

[Figure 3. An Argand diagram of the structure factor $F_{hkl}$ with phase $\phi_{hkl}$, plotted on real and imaginary axes. The real ($A_{hkl}$) and imaginary ($B_{hkl}$) components are also shown.]

Note: the units of $A_{hkl}$, $B_{hkl}$ and $F_{hkl}$ are electrons.

$$F_{hkl} = A_{hkl} + iB_{hkl}, \qquad \tan\phi_{hkl} = \frac{B_{hkl}}{A_{hkl}}$$
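$A_{hkl}$ and $B_{hkl}$ can each be positive or negative. Recovering the phase from them therefore needs the two-argument arctangent rather than a bare $\tan^{-1}(B/A)$, which cannot tell opposite quadrants apart. Here is a minimal Python sketch; the helper names are hypothetical.

```python
import math

def components(F_abs, phi):
    """Real and imaginary parts: A = |F| cos(phi), B = |F| sin(phi)."""
    return F_abs * math.cos(phi), F_abs * math.sin(phi)

def amplitude_and_phase(A, B):
    """|F| and phi (radians, in [0, 2*pi)) from A and B.

    math.atan2 uses the signs of both A and B, so the quadrant is kept;
    math.atan(B / A) would put A = -3, B = -4 in the same place as A = 3, B = 4.
    """
    return math.hypot(A, B), math.atan2(B, A) % (2 * math.pi)

A, B = -3.0, -4.0                      # both components negative: third quadrant
F_abs, phi = amplitude_and_phase(A, B)
print(F_abs, math.degrees(phi))        # |F| = 5, phase between 180 and 270 degrees
print(A**2 + B**2, F_abs**2)           # A^2 + B^2 = |F|^2
```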

The Structure Factor

$$F_{hkl} = \sum_{j=1}^{N} f_j\, e^{2\pi i(hx_j + ky_j + lz_j)}$$

Here $f_j$ is the atomic scattering factor.

[Figure: atomic scattering factors $f_0$ plotted against $\sin\theta/\lambda$.]

The scattering factor for each atom type in the structure is evaluated at the correct $\sin\theta/\lambda$. That value is the scattering ability of that atom for the reflection.

Remember that $\sin\theta/\lambda = 1/(2d_{hkl})$. We now have an atomic scattering factor with magnitude $f_0$ and direction $\phi_j$.

The Structure Factor: Sum of All Individual Atom Contributions

[Figure: Argand diagram in which the individual atomic contributions $f_j$ add head to tail to give the resultant $F_{hkl}$, with real component $A_{hkl}$ and imaginary component $B_{hkl}$.]
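The head-to-tail sum of individual atomic contributions is just complex addition. The sketch below assumes each $f_j$ has already been evaluated at the reflection's $\sin\theta/\lambda$, so each atom carries a single number. The atom list is a hypothetical example. The sketch also illustrates two facts that follow from the formula. First, a pair of identical atoms at $\mathbf{r}$ and $-\mathbf{r}$ contributes nothing to $B_{hkl}$. Second, $F_{\bar{h}\bar{k}\bar{l}}$ is the complex conjugate of $F_{hkl}$.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """F_hkl = sum_j f_j * exp(2*pi*i*(h x_j + k y_j + l z_j)).

    atoms: list of (f_j, (x_j, y_j, z_j)), with f_j already evaluated at this
    reflection's sin(theta)/lambda (a simplifying assumption for the sketch).
    """
    h, k, l = hkl
    F = 0j
    for f, (x, y, z) in atoms:
        phi_j = 2 * math.pi * (h * x + k * y + l * z)
        F += f * cmath.exp(1j * phi_j)   # one arrow of length f_j at angle phi_j
    return F

# Hypothetical atoms: scattering powers and fractional coordinates are illustrative.
atoms = [(8.0, (0.40, 0.05, 0.10)),
         (6.0, (0.15, 0.30, 0.25)),
         (6.0, (-0.15, -0.30, -0.25))]   # a centrosymmetric pair

F = structure_factor((2, 1, 3), atoms)
print("A =", F.real, "B =", F.imag, "|F| =", abs(F),
      "phase (deg) =", math.degrees(cmath.phase(F)) % 360)

pair = structure_factor((2, 1, 3), atoms[1:])
print("pair alone, B =", pair.imag)      # zero up to rounding
```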

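$$\phi_j = 2\pi(hx_j + ky_j + lz_j)$$

$$F_{hkl} = \sum_{j=1}^{N} f_j\, e^{2\pi i(hx_j + ky_j + lz_j)} = \sum_{j=1}^{N} f_j\, e^{i\phi_j}$$

Electron Density

Remember that the electron density (the image of the molecule) is the Fourier transform of the structure factor $F_{hkl}$. Thus

$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} F_{hkl}\, e^{-2\pi i(hx + ky + lz)} = \frac{1}{V}\sum_{hkl} F_{hkl}\, e^{-i\alpha}, \qquad \alpha = 2\pi(hx + ky + lz)$$

Here $V$ is the volume of the unit cell. We use $e^{i\alpha} = \cos\alpha + i\sin\alpha$ and $F_{hkl} = A_{hkl} + iB_{hkl}$. The real part of each term is then $A_{hkl}\cos\alpha + B_{hkl}\sin\alpha$. The imaginary parts cancel between each reflection $hkl$ and its Friedel mate $\bar{h}\bar{k}\bar{l}$, because $F_{\bar{h}\bar{k}\bar{l}} = F_{hkl}^*$ in the absence of anomalous scattering. So

$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl}\left(A_{hkl}\cos\alpha + B_{hkl}\sin\alpha\right)$$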

$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl}\Big[A_{hkl}\cos 2\pi(hx + ky + lz) + B_{hkl}\sin 2\pi(hx + ky + lz)\Big]$$

How to calculate ρ(x,y,z)

In practice, the electron density for one three-dimensional unit cell is calculated by starting at $(x, y, z) = (0, 0, 0)$ and stepping incrementally along each axis. At each point in space we sum the terms in the equation above over all $hkl$, as limited by the resolution of the data.

Solving the Phase Problem

Small molecules:
- Direct Methods
- Patterson Methods
- Molecular Replacement

Macromolecules:
- Multiple Isomorphous Replacement (MIR)
- Multi-wavelength Anomalous Dispersion (MAD)
- Single Isomorphous Replacement (SIR)
- Single-wavelength Anomalous Scattering (SAS)
- Molecular Replacement
- Direct Methods (special cases)

SMALL MOLECULES: The use of Direct Methods has essentially solved the phase problem for well-diffracting small-molecule crystals.

MACROMOLECULES: Today, anomalous scattering techniques such as MAD or SAS are the most common techniques used for de novo structure determination of macromolecules. Both techniques require the presence of one or more anomalous scatterers in the crystal.
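Whichever of these routes supplies the phases, the last step is the Fourier synthesis described under "How to calculate ρ(x,y,z)". The sketch below is a one-dimensional toy version of that grid-stepping procedure. It assumes point atoms of equal scattering power in a 1D cell of unit length (hypothetical positions). It computes $F_h$ for $-h_{max} \le h \le h_{max}$, with $h_{max}$ standing in for the resolution limit. It then steps $x$ across the cell, summing $A_h\cos 2\pi hx + B_h\sin 2\pi hx$. Function names are hypothetical.

```python
import cmath
import math

def structure_factors_1d(xs, hmax, f=1.0):
    """F_h for equal point atoms at fractional positions xs in a 1D cell, |h| <= hmax."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x in xs)
            for h in range(-hmax, hmax + 1)}

def density_1d(F, npts, volume=1.0):
    """rho(x) = (1/V) * sum_h [A_h cos(2 pi h x) + B_h sin(2 pi h x)].

    Steps x = 0, 1/npts, 2/npts, ... across the cell, summing over every h at each point.
    """
    rho = []
    for i in range(npts):
        x = i / npts
        total = sum(Fh.real * math.cos(2 * math.pi * h * x) +
                    Fh.imag * math.sin(2 * math.pi * h * x)
                    for h, Fh in F.items())
        rho.append(total / volume)
    return rho

atoms = [0.20, 0.55]                       # hypothetical atom positions
F = structure_factors_1d(atoms, hmax=10)   # hmax plays the role of the resolution limit
rho = density_1d(F, npts=100)
peaks = sorted(range(len(rho)), key=rho.__getitem__)[-2:]
print(sorted(i / 100 for i in peaks))      # grid points with the highest density
```

Raising `hmax` (higher resolution) narrows the peaks around the atom positions.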

Direct methods

Karle, Hauptman, David Sayre, and others determined algebraic relationships among the phase angles of groups of reflections. The simplest are triplet relationships. Consider three reflections $\mathbf{h}_1 = (h_1, k_1, l_1)$, $\mathbf{h}_2 = (h_2, k_2, l_2)$, $\mathbf{h}_3 = (h_3, k_3, l_3)$. They showed that if $\mathbf{h}_3 = -\mathbf{h}_1 - \mathbf{h}_2$, then

$$\phi_1 + \phi_2 + \phi_3 \approx 0 \pmod{2\pi}$$

Thus if $\phi_1$ and $\phi_2$ are known, we can estimate $\phi_3 \approx -\phi_1 - \phi_2$.

[Photo: David Sayre]
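Here is a toy check of the triplet relationship. It assumes equal point atoms in a one-dimensional cell, which is our simplification; real direct methods work with normalized structure factors in three dimensions. For a single atom the relationship is exact, because the three phases add up to $2\pi(h_1 + h_2 + h_3)x = 0$. With several atoms it holds only approximately, and best when all three reflections are strong. The helper names and the random atom positions are hypothetical.

```python
import cmath
import math
import random

def F(h, xs):
    """Structure factor of reflection h for equal point atoms at xs (1D toy model)."""
    return sum(cmath.exp(2j * math.pi * h * x) for x in xs)

def triplet_residual(h1, h2, xs):
    """phi(h1) + phi(h2) + phi(h3) with h3 = -h1 - h2, wrapped into [-pi, pi]."""
    s = sum(cmath.phase(F(h, xs)) for h in (h1, h2, -h1 - h2))
    return math.atan2(math.sin(s), math.cos(s))

def predict_phi3(phi1, phi2):
    """Triplet estimate phi3 ~ -phi1 - phi2, reduced to [0, 2*pi)."""
    return (-phi1 - phi2) % (2 * math.pi)

def triplet_strength(pair, xs):
    """|F(h1)| * |F(h2)| * |F(h1 + h2)|; |F(-h)| = |F(h)| for this real structure."""
    h1, h2 = pair
    return abs(F(h1, xs)) * abs(F(h2, xs)) * abs(F(h1 + h2, xs))

# A single atom: the triplet sum vanishes to rounding error.
print(triplet_residual(3, 5, [0.37]))

# Eight atoms at random (hypothetical) positions: strong vs weak triplets.
random.seed(0)
xs = [random.random() for _ in range(8)]
pairs = sorted(((h1, h2) for h1 in range(1, 30) for h2 in range(1, 30)),
               key=lambda p: triplet_strength(p, xs))
for label, group in (("weakest", pairs[:50]), ("strongest", pairs[-50:])):
    mean_abs = sum(abs(triplet_residual(h1, h2, xs)) for h1, h2 in group) / len(group)
    print(label, "50 triplets: mean |residual| =", round(mean_abs, 2), "rad")
```

The strongest group should show a much smaller mean residual than the weakest group, though the exact numbers depend on the random positions.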

When do triplet relations hold?

Note the approximately-zero value in the relationship $\phi_1 + \phi_2 + \phi_3 \approx 0$. The stronger the Bragg reflections are, the closer this condition is to being exact:
- For very strong Bragg reflections, the sum will be very close to zero.
- For weaker ones, it may differ significantly from zero.

Phase probabilities

This notion of relationships among phases obliges us to think of phases probabilistically rather than deterministically. This is a key to the direct-methods approach and has a huge influence on how we think about phase determination.

I'm introducing all of this mostly to get you accustomed to the notion of phase probability distributions!

Any phase has a value between 0 and $2\pi$ (or 0° and 360°, if we're using degrees). Suppose we know it's close to $2\pi \cdot 0.42$:
- If it's $2\pi(0.42 \pm 0.01)$, it has a sharp phase probability distribution.
- If it's $2\pi(0.42 \pm 0.32)$, it has a much broader phase probability distribution.
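The difference between "sharp" and "broad" can be made concrete by tabulating a distribution on a grid. The text does not specify a functional form. This sketch assumes a von Mises shape $\exp[\kappa\cos(\phi-\mu)]$, a standard circular analogue of the normal distribution, with $\kappa \approx 1/\sigma^2$. It normalizes numerically so that the total probability is 1. It then asks how much probability lies within $2\pi(0.42 \pm 0.05)$. Function names are hypothetical.

```python
import math

def phase_distribution(mu_cycles, spread_cycles, n=3600):
    """Normalized P(phi) on a grid over [0, 2*pi), von Mises shape (an assumption).

    mu_cycles and spread_cycles are in units of 2*pi, as in 2*pi*(0.42 +/- 0.01).
    """
    mu = 2 * math.pi * mu_cycles
    kappa = 1.0 / (2 * math.pi * spread_cycles) ** 2   # kappa ~ 1/sigma^2
    dphi = 2 * math.pi / n
    grid = [i * dphi for i in range(n)]
    # Subtracting 1 in the exponent keeps exp() from overflowing for large kappa.
    raw = [math.exp(kappa * (math.cos(phi - mu) - 1.0)) for phi in grid]
    norm = sum(raw) * dphi                 # numerical integral over 0..2*pi
    return grid, [r / norm for r in raw]

def probability_near(grid, P, mu_cycles, half_width_cycles):
    """Probability that phi lies within mu +/- half_width (wrapping around 2*pi)."""
    mu = 2 * math.pi * mu_cycles
    w = 2 * math.pi * half_width_cycles
    dphi = grid[1] - grid[0]
    return dphi * sum(p for phi, p in zip(grid, P)
                      if abs(math.remainder(phi - mu, 2 * math.pi)) <= w)

for spread in (0.01, 0.32):
    grid, P = phase_distribution(0.42, spread)
    total = sum(P) * (grid[1] - grid[0])
    print(spread, round(total, 6), round(probability_near(grid, P, 0.42, 0.05), 3))
```

Both distributions integrate to 1. The sharp one concentrates almost all of its probability near $2\pi \cdot 0.42$, while the broad one spreads it over the whole circle.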

Plots of phase probability P(φ)

The integral of the probability over 0 to $2\pi$ must be 1, since every phase has to have some value.

[Figure: a sharp and a broad phase probability distribution $P(\phi)$, plotted for $\phi$ from 0 to $2\pi$.]

How can we use this?

Obviously, if we don't know $\phi_1$ and $\phi_2$, we can't use this to calculate $\phi_3$, even if the intensities of all three reflections are large. But we could guess what $\phi_1$ and $\phi_2$ are and use them to compute $\phi_3$. Then we guess $\phi_4$ and use the triplet relationship to compute $\phi_5$ and $\phi_6$, where $\mathbf{h}_5 = -\mathbf{h}_1 - \mathbf{h}_4$ and $\mathbf{h}_6 = -\mathbf{h}_2 - \mathbf{h}_4$. This assumes that reflections 5 and 6 are strong, too!

Can we make this work?

We start with guessed phases for 10 to 100 strong reflections and use the triplet relationships to determine the phases of another 1000 reflections. Any particular calculated phase can be determined by several different triplet relationships. If those relationships are self-consistent, the initial 10 to 100 guesses are correct; if they aren't, the guess was wrong! In the latter case, we try a different set of guesses for our 10 to 100 starting phases and keep going.
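Here is a toy version of "propagate, then check self-consistency," again with equal point atoms in one dimension (our simplification). Combining $\phi(\mathbf{h}_1) + \phi(\mathbf{h}_2) + \phi(\mathbf{h}_3) \approx 0$ with $\phi(-\mathbf{h}) = -\phi(\mathbf{h})$ (Friedel's law, no anomalous scattering) gives $\phi(\mathbf{h}) \approx \phi(\mathbf{k}) + \phi(\mathbf{h}-\mathbf{k})$ for every known pair. The sketch averages these estimates as unit vectors and reports how well they agree. This is an unweighted simplification of the tangent formula, which weights each term by the strengths of the reflections involved. Names and the single-atom test structure are hypothetical.

```python
import cmath
import math

def phases_1d(xs, hmax):
    """'True' phases of F_h for equal point atoms at xs, used here as an answer key."""
    return {h: cmath.phase(sum(cmath.exp(2j * math.pi * h * x) for x in xs))
            for h in range(-hmax, hmax + 1) if h != 0}

def estimate_phase(h, known):
    """Combine every triplet estimate phi(h) ~ phi(k) + phi(h - k) from known phases.

    Returns (phase in [0, 2*pi), consistency), where consistency is the length of
    the mean unit vector: 1.0 if all estimates agree, near 0 if they conflict.
    """
    votes = [cmath.exp(1j * (known[k] + known[h - k]))
             for k in known if (h - k) in known]
    if not votes:
        return None, 0.0
    mean = sum(votes) / len(votes)
    return cmath.phase(mean) % (2 * math.pi), abs(mean)

truth = phases_1d([0.37], hmax=8)                 # one hypothetical atom
known = {h: truth[h] for h in (-4, -3, -2, -1, 1, 2, 3, 4)}

good, c_good = estimate_phase(5, known)           # correct starting phases
bad_start = dict(known)
bad_start[1] += 2.0                               # one wrong guess (radians)...
bad_start[-1] -= 2.0                              # ...kept consistent with Friedel's law
bad, c_bad = estimate_phase(5, bad_start)
print(round(c_good, 3), round(c_bad, 3))          # agreement drops with the wrong guess
```

With correct starting phases, every triplet estimate for $h = 5$ agrees. One wrong starting phase makes half of them disagree, which the consistency measure exposes.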

This actually works, provided:
- The data are correctly measured.
- The data are strong enough that we can pick 1000 strong reflections to use in this process.
- The data extend to high enough resolution that atomicity (separable atoms) is really found.

There are ways to do direct methods without assuming atomicity, but they're more complicated.

Is this relevant to macromolecules?

Not directly:
- Atomicity is rarely present.
- There are systematic errors in the data.

Indirectly, yes, because direct methods can be used in conjunction with other methods for locating heavy atoms in the SIR, MIR, and SAS methods. They also help introduce the notion of phase probability distributions (sneaky!).

SIR and SAS Methods

1. Need a heavy atom (lots of electrons) or an anomalous scatterer (large anomalous scattering signal) in the crystal.
   - SIR: heavy atoms are usually soaked in.
   - SAS: anomalous scatterers are usually engineered in as selenomethionine labels. They can also be soaked in.
2. SIR: collect a native and a derivative data set (2 sets total). SAS: collect one highly redundant data set and keep anomalous pairs separate during processing. For SAS, we may want to choose a scatterer or wavelength that enhances the anomalous signal.
3. Must find the heavy atoms or anomalous scatterers, using Patterson analysis or direct methods.
4. Must resolve the bimodal ambiguity, using solvent flattening or a similar technique.

What's the bimodal ambiguity?

As we'll show next time, a single isomorphous derivative or anomalous scatterer enables us to measure each phase apart from an ambiguity. That is, for each phase we get two answers (e.g. $2\pi \cdot 0.12$ and $2\pi \cdot 0.55$), and we can't pick one out. A second scatterer will resolve that.

Phase probabilities with no error

A single derivative with no error gives a phase probability like this:

[Figure: $P(\phi)$ plotted from 0 to $2\pi$.]

2 derivatives, no error

[Figure: $P(\phi)$ from 0 to $2\pi$ for two derivatives, marking the correct phase, the wrong estimate derived from derivative 1, and the wrong estimate derived from derivative 2. The two distributions overlap at the correct answer, not at the wrong answers.]

Errors spread this out

- Each phase estimate is not really that sharp.
- Lack of isomorphism (see below) makes each distribution spread out.
- The joint probability distribution from 2 or more experiments is the product of the probability distributions of the individual experiments.

Realistic probability distributions

[Figure: broadened $P(\phi)$ curves from 0 to $2\pi$; the joint probability distribution is the product of the individual ones.]

[Figure: "Joint probability distribution": phase probability (0 to 0.35) versus $\phi/2\pi$ (0 to 1). It shows the normalized $P_1(\phi)$ for the first derivative, the normalized $P_2(\phi)$ for the second derivative, and their normalized product $P_1(\phi)P_2(\phi)$.]
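The product rule can be tried on idealized bimodal distributions. In this sketch each derivative is modeled (our assumption) as two equal von Mises peaks. Derivative 1 has peaks at $2\pi \cdot 0.12$ and $2\pi \cdot 0.55$, the example pair quoted earlier. Derivative 2 has a peak at $2\pi \cdot 0.12$ and a hypothetical wrong alternative at $2\pi \cdot 0.80$. Taken alone, each derivative cannot choose between its two peaks; the normalized product peaks only at the shared value. Lowering `kappa` mimics the error broadening described above.

```python
import math

def bimodal(phi, peaks_cycles, kappa):
    """Unnormalized P(phi): equal von Mises peaks at the given phases (units of 2*pi)."""
    return sum(math.exp(kappa * (math.cos(phi - 2 * math.pi * p) - 1.0))
               for p in peaks_cycles)

def normalized(values, dphi):
    """Scale tabulated values so their integral over 0..2*pi is 1."""
    total = sum(values) * dphi
    return [v / total for v in values]

n = 1000
dphi = 2 * math.pi / n
grid = [i * dphi for i in range(n)]
kappa = 20.0                                  # larger kappa = sharper peaks (less error)

p1 = normalized([bimodal(phi, (0.12, 0.55), kappa) for phi in grid], dphi)
p2 = normalized([bimodal(phi, (0.12, 0.80), kappa) for phi in grid], dphi)
joint = normalized([a * b for a, b in zip(p1, p2)], dphi)   # product of the experiments

best = max(range(n), key=joint.__getitem__)
print("joint peak at phi/2pi =", best / n)
print("derivative 1 alone, P(0.12) vs P(0.55):", round(p1[120], 4), round(p1[550], 4))
```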

Heavy Atom Derivatives

Heavy-atom derivatives MUST be isomorphous. They are generally prepared by soaking crystals in dilute (2 to 20 mM) solutions of heavy-atom salts (see Table II below for some examples). Crystal cracking is generally a good indication that the heavy atom is interacting with the crystal lattice. It suggests that a good derivative can be obtained by soaking the crystal in a more dilute solution.

Is the derivative worth using?

Once derivative data have been collected, the merging R factor ($R_{merge}$) between the native and derivative data sets can be used to check for heavy-atom incorporation and isomorphism. $R_{merge}$ values for isomorphous derivatives range from 0.05 to 0.15. Values below 0.05 indicate that there is little heavy-atom incorporation; values above 0.15 indicate a lack of isomorphism between the two crystals.

What is isomorphism?

Isomorphism for derivatives means that the structure of the derivatized macromolecule is identical to the structure of the underivatized molecule except at the site where the derivative compound has been introduced.

What is lack of isomorphism?

A derivative may be nonisomorphous if:
- It alters the unit cell lengths or angles significantly (>0.2%?).
- It rotates or translates the entire macromolecule within the unit cell.
- It significantly alters the conformation of a large segment (>8 amino acids or 4 nucleotides?) of the macromolecule.
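Two of these checks are easy to script. The first function computes a fractional amplitude difference between native and derivative data, $\sum |F_{PH} - F_P| / \sum F_P$. The text calls this statistic a merging R factor; using this amplitude-based form, with both data sets already on a common scale, is our assumption about how it is computed. The second function flags cell parameters that change by more than the tentative 0.2% quoted above. Function names and the example numbers are hypothetical.

```python
def fractional_difference(native, derivative):
    """sum |F_PH - F_P| / sum F_P over reflections present in both data sets.

    native, derivative: dicts mapping (h, k, l) -> amplitude on a common scale.
    """
    common = native.keys() & derivative.keys()
    num = sum(abs(derivative[hkl] - native[hkl]) for hkl in common)
    den = sum(native[hkl] for hkl in common)
    return num / den

def assess(r, low=0.05, high=0.15):
    """Interpret the statistic with the 0.05 / 0.15 bounds given in the text."""
    if r < low:
        return "little heavy-atom incorporation"
    if r > high:
        return "probably non-isomorphous"
    return "typical of a useful isomorphous derivative"

def changed_cell_parameters(native_cell, derivative_cell, tol_percent=0.2):
    """Names of cell parameters (a, b, c, alpha, beta, gamma) changing by more than tol_percent."""
    names = ("a", "b", "c", "alpha", "beta", "gamma")
    return [name for name, p, d in zip(names, native_cell, derivative_cell)
            if abs(100.0 * (d - p) / p) > tol_percent]

native = {(1, 0, 0): 100.0, (0, 1, 0): 200.0, (0, 0, 1): 300.0}
derivative = {(1, 0, 0): 110.0, (0, 1, 0): 190.0, (0, 0, 1): 330.0}
r = fractional_difference(native, derivative)
print(round(r, 3), assess(r))
print(changed_cell_parameters((60.0, 70.0, 80.0, 90.0, 90.0, 90.0),
                              (60.05, 70.3, 80.0, 90.0, 90.0, 90.0)))
```

The cell check covers only the first of the three nonisomorphism criteria. Rigid-body shifts and conformational changes need a structural comparison.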

Derivative compounds

Table II. Protein Residues and Their Affinities for Heavy Metals

Residue                     Affinity for                        Conditions
Histidine                   K2PtCl4, NaAuCl4, EtHgPO4H2         pH > 6
Tryptophan                  Hg(OAc)2, EtHgPO4H2
Glutamic, Aspartic acids    UO2(NO3)2, rare-earth cations       pH > 5
Cysteine                    Hg, Ir, Pt, Pd, Au cations          pH > 7
Methionine                  [PtCl4]2- anion

Finding the Heavy Atoms or Anomalous Scatterers

The Patterson function:
- an $|F|^2$ Fourier transform with $\phi = 0$
- a vector map ($u, v, w$ instead of $x, y, z$)
- maps all interatomic vectors
- gives $N^2$ vectors!! (where $N$ = number of atoms)

$$P(u,v,w) = \frac{1}{V}\sum_{hkl} |F_{hkl}|^2 \cos 2\pi(hu + kv + lw)$$

From Glusker, Lewis and Rossi.
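A one-dimensional sketch shows why the Patterson map needs no phases and where its peaks fall. It assumes two equal point atoms at hypothetical positions $x = 0.10$ and $0.35$ in a cell of unit length. The map computed from $|F_h|^2$ alone has its largest peak at the origin, where the $N$ self-vectors pile up. It also has peaks at $u = \pm 0.25$ (appearing at 0.25 and 0.75), the interatomic vector in both directions, rather than at the atom positions themselves. The function name is hypothetical.

```python
import cmath
import math

def patterson_1d(xs, hmax, npts, f=1.0):
    """P(u) = sum_h |F_h|^2 cos(2 pi h u) on a grid (V = 1), equal point atoms at xs."""
    F2 = {h: abs(sum(f * cmath.exp(2j * math.pi * h * x) for x in xs)) ** 2
          for h in range(-hmax, hmax + 1)}      # intensities only: no phases needed
    return [sum(I * math.cos(2 * math.pi * h * i / npts) for h, I in F2.items())
            for i in range(npts)]

xs = [0.10, 0.35]                               # hypothetical atom positions
P = patterson_1d(xs, hmax=10, npts=100)
# Local maxima that stand well above the ripple from truncating the sum at hmax
peaks = [i / 100 for i in range(100)
         if P[i] > P[i - 1] and P[i] > P[(i + 1) % 100] and P[i] > 0.25 * max(P)]
print(peaks)
```

With $N$ atoms the map holds $N^2$ vectors. $N$ of them sit at the origin and the other $N(N-1)$ are spread through the cell, which is why Patterson maps of large structures become crowded.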