# CSCI2950-C Lecture 2 - Brown University

CSCI2950-C Lecture 11
Cancer Genomics: Duplications
October 23, 2008
http://cs.brown.edu/courses/csci2950-c/

## Outline

Cancer Genomes

1. Comparative Genomic Hybridization
2. Cancer Progression Models

[Slides: DNA Microarrays; Measuring Mutations in Cancer; Comparative Genomic Hybridization (CGH)]

## CGH Analysis (1)

Divide the genome into segments of equal copy number.

[Figure: log2(R/G) versus genomic position, vertical axis from -0.5 to 0.5, with deletion and amplification segments marked.]

[Slide: HMM model for CGH data]

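## A model for CGH data

Fridlyand et al. (2004): an HMM with K states, one per copy-number class. Emissions are Gaussian, with parameters $\mu_i, \sigma_i$ for state $S_i$.

- $S_1$: homozygous deletion (copy = 0), emissions $\mu_1, \sigma_1$
- $S_2$: heterozygous deletion (copy = 1), emissions $\mu_2, \sigma_2$
- $S_3$: normal (copy = 2), emissions $\mu_3, \sigma_3$
- $S_4$: duplication (copy > 2), emissions $\mu_4, \sigma_4$

[Figure: copy number versus genome coordinate.]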
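A minimal sketch of how an HMM with Gaussian emissions could assign a copy-number state to each probe, using Viterbi decoding in log space. The state means, standard deviations, uniform initial distribution, and the single `stay` self-transition probability are illustrative assumptions, not values from Fridlyand et al.; in practice these parameters would be estimated from the data.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, means, sds, stay=0.95):
    """Most likely state path for log2 ratios `obs`.

    Assumed (hypothetical) transition model: stay in the same state with
    probability `stay`, otherwise move uniformly to one of the other states.
    """
    k = len(means)
    if k < 2:
        return [0] * len(obs)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (k - 1))

    def log_trans(i, j):
        return log_stay if i == j else log_move

    # Uniform initial distribution (assumption).
    score = [-math.log(k) + gaussian_logpdf(obs[0], means[j], sds[j]) for j in range(k)]
    back = []
    for x in obs[1:]:
        new_score, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + log_trans(i, j))
            ptr.append(best_i)
            new_score.append(
                score[best_i] + log_trans(best_i, j) + gaussian_logpdf(x, means[j], sds[j])
            )
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative profile: normal, then a deleted segment, then normal again.
log_ratios = [0.02, -0.05, 0.01, -0.62, -0.58, -0.60, 0.03, 0.00]
states = viterbi(log_ratios, means=[-0.6, 0.0, 0.5], sds=[0.1, 0.1, 0.1])
print(states)
```

Consecutive probes that share a state form one segment, so the decoded path directly yields a segmentation of the profile.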

## CGH Segmentation: Model Selection

How many copy-number states K?

Larger K means:

1. Better fit to the observed data
2. More parameters to estimate

Avoid overfitting by model selection.

Let $\lambda = (A, B, \pi)$ be the parameters of the HMM.

- Try different $k = 1, \dots, K_{\max}$.
- Compute $L(\lambda \mid O)$ by dynamic programming (forward-backward algorithm).
- Calculate $\Psi(k) = -\log L(\lambda \mid O) + q_k D(N)/N$, where
  - $N$ = number of probes (data points),
  - $q_k$ = number of parameters,
  - $D(N) = 2$ (AIC) or $D(N) = \log N$ (BIC).
- Choose $K = \arg\min_k \Psi(k)$.
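The selection rule can be sketched as follows: a forward pass computes the log-likelihood of the observations under given HMM parameters, and a penalized score, written here as -log L + q_k D(N)/N, is minimized over k. The fitting step (for example EM) is omitted, and the log-likelihoods and parameter counts in `fits` are hypothetical numbers chosen only to illustrate the comparison. All initial and transition probabilities are assumed to be strictly positive.

```python
import math


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, init, trans, means, sds):
    """log L(lambda | O) for a Gaussian-emission HMM via the forward recursion."""
    k = len(init)
    alpha = [math.log(init[j]) + log_gauss(obs[0], means[j], sds[j]) for j in range(k)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(k)])
            + log_gauss(x, means[j], sds[j])
            for j in range(k)
        ]
    return logsumexp(alpha)


def penalized_score(loglik, q_k, n, criterion="BIC"):
    """-log L + q_k * D(N) / N with D(N) = 2 (AIC) or log N (BIC)."""
    d = 2.0 if criterion == "AIC" else math.log(n)
    return -loglik + q_k * d / n


# Hypothetical fitted models: k -> (log-likelihood, number of parameters).
fits = {1: (-120.0, 2), 2: (-90.0, 7), 3: (-89.99, 14), 4: (-89.985, 23)}
n_probes = 100
best_k = min(fits, key=lambda k: penalized_score(fits[k][0], fits[k][1], n_probes, "BIC"))
print(best_k)
```

With these made-up numbers the gain in fit beyond two states is smaller than the added penalty, so the two-state model is selected.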

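## Problems with the HMM model

The length of the sequence emitted from a fixed state is geometrically distributed. Writing $s_t$ for the state at position $t$, the probability of $n$ consecutive self-transitions in state $j$ is

$$P(j\,j\,\cdots\,j) = P(s_{t+1} = j \mid s_t = j)^n.$$

For CGH this means that

1. the length of aberrant intervals, and
2. the separation between two intervals of the same copy number

will be geometrically distributed.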

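## CGH Segmentation: Transitions

Let $l_X$ be the length of a run in state $X$, where $p$ is the self-transition probability of $X$.

- $P[l_X = 1] = 1 - p$
- $P[l_X = 2] = p(1 - p)$
- $P[l_X = k] = p^{k-1}(1 - p)$
- $E[l_X] = 1/(1 - p)$

This is a geometric distribution with mean $1/(1 - p)$.

[Figure: two-state diagram. State X stays in X with probability p and moves to Y with probability 1-p; state Y stays in Y with probability q and moves to X with probability 1-q.]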
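As a quick numerical check of the geometric run-length claim, the sketch below evaluates the probability that a run in a state with self-transition probability p has length k (k-1 self-transitions followed by one exit) and confirms that the mean approaches 1/(1-p). The value p = 0.9 and the truncation point `kmax` are arbitrary choices for illustration.

```python
def run_length_pmf(k, p):
    """P[l_X = k]: k-1 self-transitions (prob p each), then one exit (prob 1-p)."""
    return p ** (k - 1) * (1 - p)


def expected_run_length(p, kmax=10000):
    """Truncated sum of k * P[l_X = k]; converges to 1 / (1 - p) for p < 1."""
    return sum(k * run_length_pmf(k, p) for k in range(1, kmax + 1))


p = 0.9
total_mass = sum(run_length_pmf(k, p) for k in range(1, 10001))
mean_length = expected_run_length(p)
print(total_mass, mean_length, 1 / (1 - p))
```

Because the mean run length is fixed by p alone, a single self-transition probability cannot represent both very short and very long aberrant intervals well, which is the limitation the slide points to.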

## CGH Analysis (2)

Identify aberrations common to multiple samples.

[Figure: chromosome 3 of 26 lung tumor samples on a mid-density cDNA array, one row per sample (2001T-1 through 2099T-1), horizontal axis from 0 to 180. A common deletion is located in 3p21 and a common amplification in 3q.]

[Slides: Ben-Dor et al.; Results; Intervals; Stacks and Footprints; Results (Diskin et al.): Frequency; Results (Diskin et al.): Stacks]

## Cancer Genomes

[Figure panels: Leukemia; Breast]

## Cancer: Mutation and Selection

Clonal theory of cancer: Nowell (Science 1976).

## Comparative Genomics of Cancer

[Figure: human genome, under mutation and selection, giving rise to tumor genomes 1 through 4.]

1. Identify recurrent aberrations. Mitelman Database: >40,000 aberrations.
2. Reconstruct the temporal sequence of aberrations.
   - Linear model: colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q
   - Tree model: Desper et al. (1999)
3. Find the age of the tumor and the time of clonal expansion.

## Observing Cancer Progression

Obtaining longitudinal (time-course) data is difficult.

[Figure: timeline with time points t1, t2, t3, t4.]

Latitudinal data (multiple patients) are readily available.

[Figure: human genome, under mutation and selection, giving rise to tumor genomes 1 through 4.]

## Multiple Mutations

Four-step model for colorectal cancer, Vogelstein et al. (1988), New Eng. J. Med.: -5q → 12p* → -17p → -18q. Inferred from latitudinal data in 172 tumor samples.

## Oncogenetic Tree models (Desper et al. JCB 1999, 2000)

Given: measurements of chromosome gain/loss events in multiple tumor samples (CGH), for example {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}.

Compute: the rooted tree that best explains the temporal sequence of events.

$L$ = set of chromosome alterations observed in all samples. The tumor samples give a probability distribution on $2^L$.

Oncogenetic tree $T = (V, E, r, p, L)$, a rooted tree:

- $V$ = vertices
- $E$ = edges
- $L$ = set of events (leaves)
- $r$ = root
- $p : E \to (0, 1]$ = probability distribution

$T$ gives a probability distribution on $2^L$.

[Figure: example tree with labeled edges.]
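To make "T gives a probability distribution on 2^L" concrete, here is a minimal sketch under one common reading of the branching-tree model: each edge is retained independently with its probability p(e), and the observed events are exactly the vertices reachable from the root through retained edges. That generative reading, the small three-event tree, and its edge probabilities are assumptions made for illustration; they are not taken from the slides.

```python
from itertools import combinations

ROOT = "root"


def event_set_probability(parent, edge_prob, events):
    """Probability that the tree generates exactly the set `events`.

    parent[v]    : parent of event v (ROOT for children of the root)
    edge_prob[v] : probability of the edge parent[v] -> v
    """
    observed = set(events)
    if not observed <= set(parent):
        return 0.0  # contains an event that is not in the tree
    prob = 1.0
    for child, par in parent.items():
        parent_present = par == ROOT or par in observed
        if child in observed:
            if not parent_present:
                return 0.0  # event cannot occur without its parent event
            prob *= edge_prob[child]
        elif parent_present:
            prob *= 1.0 - edge_prob[child]  # edge was available but not taken
    return prob


# Hypothetical tree: root -> -8p -> +1q, and root -> +Xq.
parent = {"-8p": ROOT, "+Xq": ROOT, "+1q": "-8p"}
edge_prob = {"-8p": 0.6, "+Xq": 0.3, "+1q": 0.5}

p_pair = event_set_probability(parent, edge_prob, {"-8p", "+1q"})
all_sets = [set(c) for r in range(len(parent) + 1) for c in combinations(parent, r)]
total = sum(event_set_probability(parent, edge_prob, s) for s in all_sets)
print(p_pair, total)
```

Summing over every subset of events gives 1, so the tree does define a distribution on the power set; sets that skip a required ancestor, such as {+1q} alone in this hypothetical tree, receive probability 0.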

## Results

CGH of 117 cases of kidney cancer.

## Extensions

- Oncogenetic trees based on branching (Desper et al., JCB 1999)
- Maximum likelihood estimation (von Heydebreck et al., 2004)
- Mutagenic trees: mixtures of trees (Beerenwinkel et al., JCB 2005)

## Heterogeneity within a tumor

The final tumor is a clonal expansion of a single cell lineage. Can we date the time of clonal expansion?

Tsao, Tavare, et al. Genetic reconstruction of individual colorectal tumor histories, PNAS 2000.

## Estimating time of clonal expansion

Microsatellite (MS) loci: CA dinucleotides. In tumors with loss of mismatch repair (e.g. colorectal), MS loci change size.

For each MS locus $i = 1, \dots, L$, measure the mean $m_i$ and variance $s_i^2$ of size.

- $S^2_{\text{allele}}$ = average of $s_1^2, \dots, s_L^2$
- $S^2_{\text{loci}}$ = variance of $m_1, \dots, m_L$

Time to clonal expansion?

## Simulation Estimates of Tumor Age

- $Y_1$ = time to clonal expansion
- Tumor age = $Y_1 + Y_2$

Branching process simulation: each cell in the population gives birth to 0, 1 or 2 daughter cells, with a ±1 change in MS size (coalescent: forward, backward, forward simulation).

Posterior estimates of $Y_1$ and $Y_2$ are obtained by running simulations and accepting runs whose simulated values of $S^2_{\text{allele}}$ and $S^2_{\text{loci}}$ are close to the observed values.

## Results

15 patients, 25 MS loci. Estimate the time since clonal expansion from the observed $S^2_{\text{allele}}$ and $S^2_{\text{loci}}$.
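The two summary statistics and the accept/reject step can be sketched as below. This is a minimal illustration, not the procedure of Tsao, Tavare, et al.: the use of population (rather than sample) variance is an assumption, the allele sizes are made up, and `simulate` and `sample_prior` are hypothetical caller-supplied functions standing in for the branching-process simulator and the prior on (Y1, Y2).

```python
import random
import statistics


def summary_stats(locus_sizes):
    """Return (S2_allele, S2_loci) from per-locus lists of MS allele sizes.

    S2_allele: average over loci of the within-locus variance.
    S2_loci:   variance across loci of the within-locus mean.
    Population variance is used here (an assumption).
    """
    means = [statistics.fmean(sizes) for sizes in locus_sizes]
    variances = [statistics.pvariance(sizes) for sizes in locus_sizes]
    return statistics.fmean(variances), statistics.pvariance(means)


def rejection_sample(simulate, sample_prior, observed, tol, n_runs, rng):
    """Keep parameter draws whose simulated statistics fall within `tol` of `observed`.

    simulate(params, rng) -> (S2_allele, S2_loci) from one simulated tumor
    sample_prior(rng)     -> a candidate parameter value, e.g. (Y1, Y2)
    Both callables are hypothetical placeholders supplied by the caller.
    """
    accepted = []
    for _ in range(n_runs):
        params = sample_prior(rng)
        simulated = simulate(params, rng)
        if all(abs(s - o) <= tol for s, o in zip(simulated, observed)):
            accepted.append(params)
    return accepted


# Made-up allele sizes at three loci, for illustration only.
example_loci = [[10, 12], [20, 20], [15, 17]]
s2_allele, s2_loci = summary_stats(example_loci)
print(s2_allele, s2_loci)
```

The accepted parameter values approximate the posterior given the observed statistics; a tighter tolerance gives a better approximation at the cost of fewer accepted runs.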

## Cancer: Mutation and Selection

Clonal theory of cancer: Nowell (Science 1976).

## Sources

- Fridlyand et al. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 2004.
- Desper et al. Distance-Based Reconstruction of Tree Models for Oncogenesis. Journal of Computational Biology, 2000.
- Diskin et al. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Research, 2006.
- Tsao, Tavare, et al. Genetic reconstruction of individual colorectal tumor histories. PNAS, 2000.
