# CSCI2950-C Lecture 2 - Brown University

CSCI2950-C Lecture 11
Cancer Genomics: Duplications
October 23, 2008
http://cs.brown.edu/courses/csci2950-c/

Outline
1. Cancer Genomes
2. Comparative Genomic Hybridization
3. Cancer Progression Models

DNA Microarrays

Measuring Mutations in Cancer

Comparative Genomic Hybridization (CGH)

CGH Analysis (1)

Divide the genome into segments of equal copy number.

[Figure: two panels of Log2(R/G) plotted against genomic position, on a scale from -0.5 to 0.5; segments below 0 are labeled Deletion and segments above 0 are labeled Amplification.]

HMM Model for CGH Data

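Fridlyand et al. (2004): a model for CGH data

K states correspond to copy numbers. Emissions are Gaussians: state $S_k$ emits from a Gaussian with parameters $(\mu_k, \sigma_k)$.

S1: Homozygous Deletion (copy = 0), emission $(\mu_1, \sigma_1)$
S2: Heterozygous Deletion (copy = 1), emission $(\mu_2, \sigma_2)$
S3: Normal (copy = 2), emission $(\mu_3, \sigma_3)$
S4: Duplication (copy > 2), emission $(\mu_4, \sigma_4)$

[Figure: copy number plotted against genome coordinate.]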
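To make the state model concrete, here is a minimal sketch of segmenting a short run of log2 ratios with the Viterbi algorithm under Gaussian emissions. It is not the Fridlyand et al. implementation. The function names, the single `stay_prob` shared by all states, the uniform start distribution, and the example means and standard deviations are all assumptions chosen for illustration. The means -1.0 and 0.58 are the idealized log2 ratios for one and three copies against two.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def viterbi_segment(log_ratios, means, sigmas, stay_prob=0.95):
    """Most probable state path for a Gaussian-emission HMM (needs >= 2 states).

    Assumptions for this sketch: uniform start distribution, and every state
    stays put with probability stay_prob and moves to each other state with
    equal probability.
    """
    k = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (k - 1))
    score = [-math.log(k) + gaussian_logpdf(log_ratios[0], means[s], sigmas[s])
             for s in range(k)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in log_ratios[1:]:
        new_score, pointers = [], []
        for s in range(k):
            candidates = [score[r] + (log_stay if r == s else log_move)
                          for r in range(k)]
            best_prev = max(range(k), key=lambda r: candidates[r])
            new_score.append(candidates[best_prev]
                             + gaussian_logpdf(x, means[s], sigmas[s]))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical data: normal, then a one-copy loss, then normal, then a gain.
LOG_RATIOS = [0.0, 0.05, -0.02, -0.9, -1.1, -1.0, 0.02, 0.6, 0.55, 0.62]
MEANS = [-1.0, 0.0, 0.58]    # loss, normal, gain (idealized values)
SIGMAS = [0.15, 0.15, 0.15]  # assumed noise level
PATH = viterbi_segment(LOG_RATIOS, MEANS, SIGMAS)
```

Runs of equal state index in `PATH` are the segments of equal copy number. In practice the means, variances, and transition probabilities would be estimated from the data and not fixed by hand.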

CGH Segmentation: Model Selection

How many copy number states K?

Larger K:
1. Better fit to observed data
2. More parameters to estimate

Avoid overfitting by model selection.

Let $\lambda = (A, B, \pi)$ be the parameters of the HMM.

1. Try different $k = 1, \ldots, K_{\max}$.
2. Compute $L(\lambda \mid O)$ by dynamic programming (forward-backward algorithm).
3. Calculate $\phi(k) = -\log L(\lambda \mid O) + q_k \, D(N)/N$, where N = number of probes (data points), $q_k$ = number of parameters, and $D(N) = 2$ (AIC) or $D(N) = \log(N)$ (BIC).
4. Choose $K = \arg\min_k \phi(k)$.
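The selection rule can be sketched in a few lines once a log-likelihood is available for each candidate number of states. The sketch below takes those log-likelihoods as given instead of fitting HMMs. The function names, the parameter-counting convention in `num_hmm_params`, and the example log-likelihood values are assumptions made for illustration, not values from the lecture.

```python
import math

def num_hmm_params(k):
    """Free parameters of a k-state Gaussian HMM under an assumed convention:
    k*(k-1) transition probabilities, k-1 initial probabilities, and a mean
    and standard deviation per state."""
    return k * (k - 1) + (k - 1) + 2 * k

def penalized_score(log_likelihood, k, n_probes, criterion="BIC"):
    """-log L + q_k * D(N) / N, with D(N) = 2 (AIC) or log(N) (BIC)."""
    if criterion == "AIC":
        d = 2.0
    elif criterion == "BIC":
        d = math.log(n_probes)
    else:
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    return -log_likelihood + num_hmm_params(k) * d / n_probes

def select_k(log_likelihoods, n_probes, criterion="BIC"):
    """Return the k with the smallest penalized score."""
    return min(log_likelihoods,
               key=lambda k: penalized_score(log_likelihoods[k], k,
                                             n_probes, criterion))

# Hypothetical log-likelihoods for k = 1..4 states on N = 100 probes.
LOG_LIKS = {1: -2.10, 2: -1.40, 3: -1.20, 4: -1.15}
K_AIC = select_k(LOG_LIKS, n_probes=100, criterion="AIC")
K_BIC = select_k(LOG_LIKS, n_probes=100, criterion="BIC")
```

With these made-up numbers the heavier BIC penalty selects fewer states than AIC, which is the usual direction of the difference between the two criteria.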

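Problems with the HMM

The length of the sequence emitted from a fixed state is geometrically distributed. For a run in state j containing n self-transitions:

$P(j\,j\,\cdots\,j) = P(s_{t+1} = j \mid s_t = j)^{n}$

where $s_t$ is the hidden state at position t.

For CGH this means that
1) the length of aberrant intervals, and
2) the separation between two intervals of the same copy number
will be geometrically distributed.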

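CGH Segmentation: Transitions

Let $l_X$ = length of the sequence emitted in state X, and let p be the probability of staying in X.

$P[l_X = 1] = 1 - p$
$P[l_X = 2] = p(1 - p)$
$P[l_X = k] = p^{k-1}(1 - p)$
$E[l_X] = 1/(1 - p)$

This is a geometric distribution with mean $1/(1-p)$.

[Figure: two-state transition diagram. State X stays in X with probability p and moves to Y with probability 1-p; state Y stays in Y with probability q and moves to X with probability 1-q.]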
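A short numerical check of the run-length distribution may help. The sketch below evaluates the probability of a run of length k when the self-transition probability is p, and draws run lengths by simulating the self-loop directly. The function names and the choice p = 0.9 are illustrative assumptions.

```python
import random

def run_length_pmf(k, p):
    """P[l = k] = p**(k-1) * (1 - p): k-1 self-transitions, then one exit."""
    return p ** (k - 1) * (1 - p)

def sample_run_length(p, rng):
    """Simulate the self-loop: stay with probability p until the first exit."""
    length = 1
    while rng.random() < p:
        length += 1
    return length

P_STAY = 0.9  # assumed self-transition probability
EXPECTED_MEAN = 1 / (1 - P_STAY)

# Mean computed from the probability mass function (truncated far in the tail).
PMF_MEAN = sum(k * run_length_pmf(k, P_STAY) for k in range(1, 2001))

# Mean of simulated run lengths, with a fixed seed for reproducibility.
rng = random.Random(0)
SAMPLES = [sample_run_length(P_STAY, rng) for _ in range(20000)]
SAMPLE_MEAN = sum(SAMPLES) / len(SAMPLES)
```

Both means should sit close to 1/(1-p) = 10 probes. Short runs are always the most probable ones under this model, which is the limitation the slide points to.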

CGH Analysis (2)

Identify aberrations common to multiple samples.

[Figure: Chromosome 3 of 26 lung tumor samples on a mid-density cDNA array. A common deletion is located in 3p21 and a common amplification in 3q. Rows are samples 2001T-1 through 2099T-1; the horizontal axis runs from 0 to 180. Ben-Dor et al.]

Results

Intervals

Stacks and Footprints

Results (Diskin et al.): Frequency
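As a rough illustration of what "common to multiple samples" can mean, the sketch below counts, probe by probe, the fraction of samples that carry a loss or a gain, and then reports maximal runs of probes above a threshold. This is a plain frequency count, not the method of Ben-Dor et al. or of Diskin et al., and it does no significance testing. The function names, the -1/0/+1 encoding, and the toy call matrix are assumptions.

```python
def aberration_frequency(calls, kind):
    """Fraction of samples with the given aberration at each probe.

    calls[sample][probe] is -1 (loss), 0 (normal), or +1 (gain).
    """
    target = {"loss": -1, "gain": 1}[kind]
    n_samples = len(calls)
    n_probes = len(calls[0])
    return [sum(1 for sample in calls if sample[i] == target) / n_samples
            for i in range(n_probes)]

def recurrent_intervals(freq, min_fraction):
    """Maximal runs of consecutive probes with frequency >= min_fraction,
    returned as inclusive (start, end) probe indices."""
    intervals, start = [], None
    for i, f in enumerate(freq):
        if f >= min_fraction and start is None:
            start = i
        elif f < min_fraction and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(freq) - 1))
    return intervals

# Hypothetical calls for 4 samples across 6 probes.
CALLS = [
    [0, -1, -1, 0, 1, 1],
    [0, -1, -1, 0, 0, 1],
    [0,  0, -1, 0, 1, 1],
    [0,  0,  0, 0, 1, 0],
]
LOSS_FREQ = aberration_frequency(CALLS, "loss")
GAIN_FREQ = aberration_frequency(CALLS, "gain")
LOSS_INTERVALS = recurrent_intervals(LOSS_FREQ, 0.5)  # [(1, 2)]
GAIN_INTERVALS = recurrent_intervals(GAIN_FREQ, 0.5)  # [(4, 5)]
```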

Results (Diskin et al.): Stacks

Cancer Genomes

[Figure: cancer genomes, with panels labeled Leukemia and Breast.]

Cancer: Mutation and Selection

Clonal theory of cancer: Nowell (Science 1976)

Comparative Genomics of Cancer

[Figure: mutation and selection lead from the human genome to Tumor genome, Tumor genome 2, Tumor genome 3, and Tumor genome 4.]

1) Identify recurrent aberrations
   Mitelman Database, >40,000 aberrations
2) Reconstruct temporal sequence of aberrations
   Linear model: Colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q
   Tree model: (Desper et al. 1999)
3) Find age of tumor, time of clonal expansion

Observing Cancer Progression

Obtaining longitudinal (time-course) data is difficult.

[Figure: observations at times t1, t2, t3, t4.]

Latitudinal data (multiple patients) are readily available.

[Figure: mutation and selection lead from the human genome to Tumor genome, Tumor genome 2, Tumor genome 3, and Tumor genome 4.]

Multiple Mutations

4-step model for colorectal cancer, Vogelstein et al. (1988), New Eng. J. Med.: -5q → 12p* → -17p → -18q

Inferred from latitudinal data in 172 tumor samples.

Oncogenetic Tree Models (Desper et al., JCB 1999, 2000)

Given: measurements of chromosome gain/loss events in multiple tumor samples (CGH), for example {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}

Compute: rooted tree that best explains the temporal sequence of events.

L = set of chromosome alterations observed in all samples.

Tumor samples give a probability distribution on $2^L$.

Oncogenetic tree T = (V, E, r, p, L), a rooted tree:
V = vertices
E = edges
L = set of events (leaves)
r = root
p: E → (0, 1], probability distribution

T gives a probability distribution on $2^L$.

[Figure: example oncogenetic tree with edges labeled e0, e1, e2, e3, e4.]
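How a tree induces a probability distribution on sets of events can be sketched as follows. The sketch assumes the simpler branching-style variant in which every non-root vertex is itself an event and each edge is kept independently with its probability; the observed set is whatever remains reachable from the root. This differs from the leaf-labeled definition above. The function names, the tree shape, and the edge probabilities are hypothetical and not inferred from any data.

```python
from itertools import combinations

def event_set_probability(observed, parent, edge_prob, root="root"):
    """Probability that exactly the events in `observed` occur.

    parent[event] is the parent vertex of each event; edge_prob[event] is the
    probability on the edge from that parent to the event.
    """
    observed = set(observed)
    # An event can only occur if its parent occurred (or is the root).
    for event in observed:
        if parent[event] != root and parent[event] not in observed:
            return 0.0
    prob = 1.0
    for child, par in parent.items():
        if par == root or par in observed:
            # The edge is "at risk": it is either taken or not taken.
            p = edge_prob[child]
            prob *= p if child in observed else 1.0 - p
    return prob

def total_probability(parent, edge_prob):
    """Sum over all subsets of events; should equal 1 for a valid tree."""
    events = list(parent)
    return sum(event_set_probability(subset, parent, edge_prob)
               for size in range(len(events) + 1)
               for subset in combinations(events, size))

# Hypothetical tree: root -> -8p -> +Xq, and root -> +1q.
PARENT = {"-8p": "root", "+1q": "root", "+Xq": "-8p"}
EDGE_PROB = {"-8p": 0.5, "+1q": 0.4, "+Xq": 0.3}

P_BOTH = event_set_probability({"-8p", "+Xq"}, PARENT, EDGE_PROB)
P_ORPHAN = event_set_probability({"+Xq"}, PARENT, EDGE_PROB)  # 0.0
```

Under this hypothetical tree the set {+Xq} has probability zero because +Xq cannot occur without -8p. Fitting a tree amounts to choosing a shape and edge probabilities so that the induced distribution matches the observed sample frequencies.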

Results

CGH of 117 cases of kidney cancer.

Extensions

Oncogenetic trees based on branching (Desper et al., JCB 1999)
Maximum Likelihood Estimation (von Heydebreck et al., 2004)
Mutagenetic trees: mixtures of trees (Beerenwinkel et al., JCB 2005)

Heterogeneity within a tumor

The final tumor is a clonal expansion of a single cell lineage. Can we date the time of clonal expansion?

Tsao, Tavare, et al. Genetic reconstruction of individual colorectal tumor histories, PNAS 2000.

Estimating time of clonal expansion

Microsatellite (MS) loci: CA dinucleotide repeats. In tumors with loss of mismatch repair (e.g. colorectal), MS loci change size.

For each MS locus i = 1, ..., L, measure the mean $m_i$ and variance $s_i^2$ of its size.

$S^2_{\text{allele}}$ = average of $s_1^2, \ldots, s_L^2$

$S^2_{\text{loci}}$ = variance of $m_1, \ldots, m_L$

Time to clonal expansion?

Simulation Estimates of Tumor Age

[Figure: timeline showing the intervals Y1 and Y2.]

Y1 = time to clonal expansion
Tumor age = Y1 + Y2

Branching process simulation: each cell in the population gives birth to 0, 1, or 2 daughter cells, with a ±1 change in MS size (coalescent: forward, backward, forward simulation).

Posterior estimates of Y1 and Y2 are obtained by running simulations and accepting the runs whose simulated values of $S^2_{\text{allele}}$ and $S^2_{\text{loci}}$ are close to the observed values.

Results

15 patients, 25 MS loci.

Estimate the time since clonal expansion from the observed $S^2_{\text{allele}}$ and $S^2_{\text{loci}}$.
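The accept/reject idea can be sketched with a much simpler simulator than the branching process described above. In the toy version below, each locus drifts by ±1 steps along one shared lineage for Y1 divisions, and then each sampled cell drifts independently for Y2 divisions. This star-shaped shortcut, the uniform priors on Y1 and Y2, the use of sample variances, and every function and parameter name are assumptions made for illustration; this is not the simulation of Tsao et al.

```python
import random
from statistics import mean, variance

def summary_stats(sizes_by_locus):
    """Return (S2_allele, S2_loci) from per-locus lists of MS sizes.

    S2_allele averages the within-locus variances; S2_loci is the variance of
    the per-locus means. Sample variances are an assumed convention.
    """
    locus_means = [mean(sizes) for sizes in sizes_by_locus]
    locus_vars = [variance(sizes) for sizes in sizes_by_locus]
    return mean(locus_vars), variance(locus_means)

def random_walk(n_divisions, step_prob, rng):
    """Net size change after n_divisions, each mutating with step_prob."""
    return sum(rng.choice((-1, 1)) for _ in range(n_divisions)
               if rng.random() < step_prob)

def toy_simulate(y1, y2, n_loci, n_cells, step_prob, rng):
    """Toy stand-in for the branching process (assumed star-shaped history)."""
    sizes = []
    for _ in range(n_loci):
        founder = random_walk(y1, step_prob, rng)  # shared before expansion
        sizes.append([founder + random_walk(y2, step_prob, rng)
                      for _ in range(n_cells)])
    return sizes

def rejection_sample(observed, n_runs, max_divisions, tol, rng, **sim_args):
    """Keep (y1, y2) draws whose simulated statistics are within tol of the
    observed pair (S2_allele, S2_loci)."""
    obs_allele, obs_loci = observed
    accepted = []
    for _ in range(n_runs):
        y1 = rng.randint(1, max_divisions)  # assumed uniform prior
        y2 = rng.randint(1, max_divisions)
        sim_allele, sim_loci = summary_stats(
            toy_simulate(y1, y2, rng=rng, **sim_args))
        if abs(sim_allele - obs_allele) <= tol and abs(sim_loci - obs_loci) <= tol:
            accepted.append((y1, y2))
    return accepted

# Hypothetical size offsets at three loci, three measurements each.
EXAMPLE_STATS = summary_stats([[0, 1, 2], [-1, -1, -1], [2, 3, 4]])

ACCEPTED = rejection_sample(EXAMPLE_STATS, n_runs=200, max_divisions=50,
                            tol=1.0, rng=random.Random(1),
                            n_loci=5, n_cells=5, step_prob=0.1)
```

The accepted (Y1, Y2) pairs approximate a posterior sample under the toy model. A tighter `tol` gives a better approximation at the cost of fewer accepted runs.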

Sources

Fridlyand, et al. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 2004.
Desper, et al. Distance-Based Reconstruction of Tree Models for Oncogenesis. Journal of Computational Biology, 2000.
Diskin, et al. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Research, 2006.
Tsao, Tavare, et al. Genetic reconstruction of individual colorectal tumor histories. PNAS, 2000.