# Consensus Trees, Ancestral reconstruction, Long Branch Attraction

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS

## Bayesian Inference

Anders Gorm Pedersen, Molecular Evolution Group, Center for Biological Sequence Analysis, Technical University of Denmark (DTU)

## Bayes Theorem

$$P(B \mid A) = \frac{P(A \mid B) \times P(B)}{P(A)}$$

$$P(M_k \mid D) = \frac{P(D \mid M_k) \times P(M_k)}{P(D)} = \frac{P(D \mid M_k) \times P(M_k)}{\sum_i P(D \mid M_i) \times P(M_i)}$$

Reverend Thomas Bayes (1702-1761)

- $P(D \mid M_k)$: probability of the data given model $k$ = likelihood
- $P(M_k)$: prior probability of model $k$
- $P(D)$: essentially a normalizing constant, so that the posterior probabilities sum to one
- $P(M_k \mid D)$: posterior probability of model $k$

## Bayesians vs. Frequentists

Meaning of probability:

- Frequentist: long-run frequency of an event in a repeatable experiment
- Bayesian: degree of belief, a way of quantifying uncertainty

Finding probabilistic models from empirical data:

- Frequentist: parameters are fixed constants, and we try to find good (point) estimates of their true values
- Bayesian: uncertainty about parameter values is expressed by means of a probability distribution over the possible parameter values

## Bayes theorem: example

We have three cookie jars (bowls):

- #1: P(chocolate) = 0.25
- #2: P(chocolate) = 0.50
- #3: P(chocolate) = 0.75

Bill picks a random bowl and a random cookie.

He gets a plain (non-chocolate) cookie. What is the probability that he picked bowl #1?

## Bayes theorem: example II

Investigated hypotheses (models):

- $M_1$: bowl #1 was chosen
- $M_2$: bowl #2 was chosen
- $M_3$: bowl #3 was chosen

Prior probabilities: $P(M_1) = P(M_2) = P(M_3) = 0.33$

Likelihoods:

- $P(\text{plain} \mid M_1) = 0.75$, $P(\text{chocolate} \mid M_1) = 0.25$
- $P(\text{plain} \mid M_2) = 0.50$, $P(\text{chocolate} \mid M_2) = 0.50$
- $P(\text{plain} \mid M_3) = 0.25$, $P(\text{chocolate} \mid M_3) = 0.75$

## Bayes theorem: example III

With the plain cookie as the data $D$:

$$P(M_1 \mid \text{plain}) = \frac{P(\text{plain} \mid M_1) \times P(M_1)}{\sum_i P(D \mid M_i) \times P(M_i)}$$

$$P(M_1 \mid \text{plain}) = \frac{P(\text{plain} \mid M_1) \times P(M_1)}{P(\text{plain} \mid M_1) \times P(M_1) + P(\text{plain} \mid M_2) \times P(M_2) + P(\text{plain} \mid M_3) \times P(M_3)}$$

$$P(M_1 \mid \text{plain}) = \frac{0.75 \times 0.33}{0.75 \times 0.33 + 0.5 \times 0.33 + 0.25 \times 0.33} = 0.5$$

In the same way, $P(M_2 \mid \text{plain}) = 0.33$.
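The cookie-jar calculation can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `posterior` is made up for illustration, and the priors are written as exactly 1/3 rather than the rounded 0.33 used on the slides.

```python
def posterior(priors, likelihoods):
    """Return P(M_k | D) for every model, given P(M_k) and P(D | M_k)."""
    # Numerators of Bayes theorem: P(D | M_k) x P(M_k)
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]
    # P(D): the normalizing constant, summed over all models
    evidence = sum(joint)
    return [j / evidence for j in joint]


priors = [1 / 3, 1 / 3, 1 / 3]   # each bowl is equally likely a priori
p_plain = [0.75, 0.50, 0.25]     # P(plain | M1), P(plain | M2), P(plain | M3)

post = posterior(priors, p_plain)
print([round(p, 2) for p in post])  # [0.5, 0.33, 0.17]
```

Because the priors are equal, they cancel, and the posteriors are simply the likelihoods rescaled to sum to one.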

For bowl #3, with $P(\text{plain} \mid M_3) = 0.25$, the posterior is $P(M_3 \mid \text{plain}) = 0.17$.

## MCMC: Markov chain Monte Carlo

Problem: for complicated models the parameter space is enormous, and it is not easy, or even possible, to find the posterior distribution analytically.

Solution: MCMC = Markov chain Monte Carlo.

1. Start in a random position on the probability landscape.
2. Attempt a step of random length in a random direction.
   - (a) If the move ends higher up: accept the move.
   - (b) If the move ends lower down: accept the move with probability $P(\text{accept}) = P_{\text{low}} / P_{\text{high}}$.
3. Record the current parameter values in a file after each step. If a move was rejected, the current values are recorded again.

After many, many repetitions, points will be sampled in proportion to the height of the probability landscape.
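The accept/reject rule above can be sketched as a small Metropolis sampler. Everything specific here is an assumption made for illustration: the one-dimensional toy landscape `density`, the Gaussian step, and the `step_size` and `seed` parameters. A real phylogenetic sampler moves through tree and substitution-parameter space instead.

```python
import math
import random


def density(x):
    """Unnormalized height of a toy landscape with peaks at -2 and +2 (assumed)."""
    return math.exp(-0.5 * (x + 2) ** 2) + math.exp(-0.5 * (x - 2) ** 2)


def metropolis(density, start, n_steps, step_size=1.0, seed=0):
    """Sample positions in proportion to the height of `density`."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        # Attempt a step of random length in a random direction.
        proposal = x + rng.gauss(0.0, step_size)
        ratio = density(proposal) / density(x)
        # Uphill moves are always accepted; downhill moves are accepted
        # with probability P_low / P_high.
        if ratio >= 1 or rng.random() < ratio:
            x = proposal
        # Record the current position after every step, so a rejected
        # move repeats the previous position.
        samples.append(x)
    return samples


samples = metropolis(density, start=0.0, n_steps=5000)
print(len(samples))  # 5000
```

A histogram of `samples` should approximate the shape of `density`; how closely depends on the number of steps and on the step size.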

## MCMCMC: Metropolis-coupled Markov Chain Monte Carlo

Problem: if there are multiple peaks in the probability landscape, MCMC may get stuck on one of them.

Solution: Metropolis-coupled Markov chain Monte Carlo = MCMCMC = MC³.

MC³ essential features:

- Run several Markov chains simultaneously.
- One chain is cold: this chain performs the MCMC sampling.
- The rest of the chains are heated: they move faster across valleys.
- Each turn, the cold and warm chains may swap positions (the swap probability depends on the ratio between the heights).
- More peaks will be visited.

More chains mean a better chance of visiting all important peaks, but each additional chain increases run time.
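The features above can be illustrated with a toy Metropolis-coupled sampler. This is a sketch under stated assumptions rather than how any particular program implements it: the two-peaked `log_density`, the heating scheme `beta = 1 / (1 + delta_t * i)` (chain 0 is cold and heated chains see the landscape raised to the power `beta`), and the choice to propose one swap between two randomly chosen chains per turn are all illustrative.

```python
import math
import random


def log_density(x):
    """Log height of a toy landscape with well-separated peaks at -4 and +4 (assumed)."""
    a = -0.5 * (x + 4) ** 2
    b = -0.5 * (x - 4) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(log_density, n_chains=4, n_steps=20000, step_size=1.0, delta_t=0.5, seed=1):
    """Return the positions visited by the cold chain."""
    rng = random.Random(seed)
    # beta = 1 for the cold chain; smaller beta flattens the landscape.
    betas = [1.0 / (1.0 + delta_t * i) for i in range(n_chains)]
    states = [-4.0] * n_chains  # all chains start on the left peak
    cold_samples = []
    for _ in range(n_steps):
        # Ordinary Metropolis step in every chain, on its own heated landscape.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step_size)
            log_ratio = beta * (log_density(proposal) - log_density(states[i]))
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                states[i] = proposal
        # Propose swapping the positions of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[j]) * (log_density(states[j]) - log_density(states[i]))
        if log_swap >= 0 or rng.random() < math.exp(log_swap):
            states[i], states[j] = states[j], states[i]
        # Only the cold chain is sampled.
        cold_samples.append(states[0])
    return cold_samples


cold = mc3(log_density)
share_right = sum(x > 0 for x in cold) / len(cold)
print(round(share_right, 2))
```

If the chains mix well, `share_right` should come out somewhere near one half, since the two peaks are equally high. Setting `n_chains=1` turns this into plain MCMC, which makes it easy to compare how often each variant crosses the valley.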

## MCMCMC for inference of phylogeny

Result of run:

- (a) Substitution parameters
- (b) Tree topologies

## Posterior probability distributions of substitution parameters

## Posterior Probability Distribution over Trees

- MAP (maximum a posteriori) estimate of phylogeny: the tree topology occurring most often in the MCMCMC output.
- Clade support: posterior probability of a group = frequency of the clade in the sampled trees.
- 95% credible set of trees: order the trees from highest to lowest posterior probability, then add trees, starting with the most probable, until the cumulative posterior probability is 0.95.
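The three summaries above can be computed directly from a list of sampled topologies. The sketch below makes several assumptions for illustration: each tree is represented as a set of clades, each clade as a set of taxon names, and the four topologies and their counts are invented toy data, not output from a real run. The helper names `tree` and `summarize_trees` are also hypothetical.

```python
from collections import Counter


def tree(*clades):
    """Build a hashable toy topology from its clades, e.g. tree("AB", "CD")."""
    return frozenset(frozenset(clade) for clade in clades)


def summarize_trees(sampled_trees, level=0.95):
    """Return the MAP tree, clade supports, and the credible set of trees."""
    n = len(sampled_trees)
    # Topologies ordered from most to least frequently sampled.
    ranked = Counter(sampled_trees).most_common()
    map_tree = ranked[0][0]
    # Clade support = fraction of sampled trees that contain the clade.
    clade_counts = Counter(clade for t in sampled_trees for clade in t)
    clade_support = {clade: count / n for clade, count in clade_counts.items()}
    # Add trees in order until the cumulative frequency reaches `level`.
    credible, cumulative = [], 0
    for t, count in ranked:
        credible.append(t)
        cumulative += count
        if cumulative >= level * n:
            break
    return map_tree, clade_support, credible


# Invented toy sample of 100 trees over taxa A, B, C, D.
t1 = tree("AB", "CD")
t2 = tree("AB", "ABC")
t3 = tree("AC", "BD")
t4 = tree("AC", "ABC")
sample = [t1] * 70 + [t2] * 20 + [t3] * 6 + [t4] * 4

map_tree, support, credible = summarize_trees(sample)
print(map_tree == t1)            # True
print(support[frozenset("AB")])  # 0.9
print(len(credible))             # 3
```

In this toy sample the clade (A, B) appears in `t1` and `t2`, giving a support of 0.9, and the first three topologies are needed to reach a cumulative frequency of at least 0.95.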
