Intelligent Systems and Molecular Biology

Richard H. Lathrop
Dept. of Computer Science, Univ. of California, Irvine
[email protected]
Donald Bren Hall 4224, 949-824-4021

Goal of talk: The power of information science to influence molecular science and technology

Computers are to Biology as Mathematics is to Physics.
  --- Harold Morowitz (spiritual father of BioMatrix and of the Intelligent Systems for Molecular Biology Conference)

Intelligent Systems and Molecular Biology
  Artificial Intelligence for Biology and Medicine
    Biology is data-rich and knowledge-hungry
    AI is well suited to biomedical problems
  Examples (omitted for brevity)
    Machine learning -- drug discovery
    Rule-based systems -- drug-resistant HIV
    Heuristic search -- protein structure prediction
    Constraints -- design of large synthetic genes
  Current Project
    Machine learning and p53 cancer rescue mutants

Biology has become Data Rich
  Massively Parallel Data Generation
    Genome-scale sequencing
    High-throughput drug screening
    Micro-array gene chips
    Combinatorial chemical synthesis
    Shotgun mutagenesis
    Directed protein evolution
    Two-hybrid protocols for protein interaction
  A million biomedical articles per year

Data Rich: GenBank, Genomic Sequence Data
Data Rich: PDB, Protein 3D Structure Data
Data Rich: PubMed, Biomedical Literature
Data Rich: 10-100K data points per gene chip

Characteristics of Biomedical Data
  Noise!! => need robust analysis methods
  Little or no theory => need statistics, probability
  Multiple scales, tightly linked => need cross-scale data integration
  Specialized (boutique) databases => need heterogeneous data integration

Intelligent Systems are well suited to biology and medicine
  Robust in the face of inherent complexity
  Extract trends and regularities from data
  Provide models for complex processes
  Cope with uncertainty and ambiguity
  Content-based retrieval from literature
  Ontologies for heterogeneous databases
  Machine learning and data mining
  Intelligent systems handle complexity with grace
p53 and Human Cancers
  p53 is a central tumor suppressor protein
    The guardian of the genome
    Controls many tumor suppression functions
    Monitors cellular distress
  The most-mutated gene in human cancers
  All cancers must disable the p53 apoptosis pathway.

[Figure: p53 core domain bound to DNA. Image generated with UCSF Chimera.]
  Cho, Y., Gorina, S., Jeffrey, P.D., Pavletich, N.P. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science 265, 346-355 (1994)

Consequences of p53 mutations
  ~250,000 US deaths/year
  Loss of DNA contact
  Disruption of local structure
  Denaturation of entire core domain
  Over 1/3 of all human cancers express full-length p53 with only one amino acid (a.a.) change
  Cho et al., Science 265, 346-355 (1994)

Mutations Rescue Cancerous p53
  Wild Type => Active p53
  Cancer Mutation => Inactive p53
  Cancer + Rescue Mutations => Active p53

Ultimate Goal
  Cancer Mutation (Inactive p53) + AntiCancer Drug = Active p53

Suppressor Mutations
  Several second-site mutations restore functionality to some p53 cancer mutants in vivo.
  [Figure: p53 domain map, N terminus to C terminus. Transactivation (1-42); Core domain for DNA binding (102-292); Tetramerization (324-355). Marked residues: 248, 249, 273, 175, 245, 282.]

Class Labels: Active/+ or Inactive/-

p53 Transcription Assay
  Initial: Yeast Growth Selection, Sequencing
    Human p53 consensus URA
    ACTIVE (+): will grow.
    INACTIVE (-): will not grow.
    Baroni, T.E., et al., 2004
  (S) = Strong, (W) = Weak, (N) = Negative
    Danziger, S.D., et al., 2009
  Confirm: Human 1299 Cell-based Luciferase
    First measurement: Firefly luciferase, p53 dependent
    Second measurement: Renilla luciferase, p53 independent
    Baronio, R., et al., 2010
Active Machine Learning for Biological Discovery
  Find New Cancer Rescue Mutants
  [Diagram: Knowledge, Theory, Experiment]

How Big is The Problem?
  How many mutants are there? ~10^11 (assuming up to 5 mutations in 200 residues)
  Known mutants: 31,200
  Known actives: 150
  For scale: Spiral Galaxy M101 (http://hubblesite.org/) has ~10^9 stars.
    Known mutants ~312 stars
    Known actives ~1.5 stars

Computational Active Learning
  Pick the Best (= Most Informative) Unknown Examples to Label
  [Diagram: Known examples (Example 1, Example 2, Example 3, ..., Example N) form the Training Set. Unknown examples are Example N+1, N+2, N+3, N+4, ..., Example M. Loop: Train the Classifier => Choose Examples to Label => Add New Examples To Training Set.]

Visualization of Selected Regions
  Positive Region: Predicted Active, 96-105 (Green)
  Negative Region: Predicted Inactive, 223-232 (Red)
  Expert Region: Predicted Active, 114-123 (Blue)
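
The loop on the Computational Active Learning slide (train the classifier, choose the most informative unknown examples, label them, add them to the training set) can be sketched in a few lines. This is only an illustrative sketch, not the MIP method of Danziger et al. (2009): it assumes a one-dimensional feature, a nearest-centroid classifier, and plain uncertainty sampling, and the `oracle` function is a hypothetical stand-in for the wet-lab assay. All function names and numbers below are made up for illustration.

```python
import random

def train(labeled):
    """Fit a nearest-centroid classifier: one mean per class (1 = active, 0 = inactive)."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def margin(model, x):
    """Signed margin: positive when x is closer to the active centroid."""
    pos_c, neg_c = model
    return abs(x - neg_c) - abs(x - pos_c)

def predict(model, x):
    return 1 if margin(model, x) > 0 else 0

def active_learning(labeled, pool, oracle, rounds, batch):
    """Repeat: train, pick the least certain unknowns, label them, grow the training set."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # "Most informative" is taken here to mean smallest |margin| (an assumption).
        pool.sort(key=lambda x: abs(margin(model, x)))
        chosen, pool = pool[:batch], pool[batch:]
        labeled.extend((x, oracle(x)) for x in chosen)
    return train(labeled), labeled

def oracle(x):
    """Hypothetical stand-in for the assay: the true rule is hidden from the learner."""
    return 1 if x > 0.6 else 0

rng = random.Random(0)
pool = [rng.random() for _ in range(200)]   # unknown examples
seed = [(0.05, 0), (0.95, 1)]               # initial known examples, one per class
model, labeled = active_learning(seed, pool, oracle, rounds=5, batch=4)
accuracy = sum(predict(model, x) == oracle(x) for x in pool) / len(pool)
print(len(labeled), "labeled examples; accuracy on pool:", round(accuracy, 2))
```

The point of the design is that only 20 of the 200 unknown examples are ever sent to the oracle, and they are the ones nearest the current decision boundary, which is where a new label changes the classifier most.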
[Table, partially recovered; a preceding row survives only as "7 (not significant)"]
                    Positive    Negative         Expert
  Total # Rescue    11          2 (p < 0.022)    13 (not significant)
  p-Values are two-tailed, comparing Positive to Negative and Expert regions. Danziger, et al. (2009)

  No significant differences between the MIP Positive and Expert regions.
  Both were statistically significantly better than the MIP Negative region.
  The Positive region rescued for the first time the cancer mutant P152L.
  No previous single-a.a. rescue mutants in any region.

A Long-held Goal of Anti-cancer Therapy
  Restore p53 function by a drug compound
  Restore p53 tumor suppressor pathways in tumor cells
  [Diagram: inactive p53 cancer mutant + reactivation compound => reactivated (active) p53]

A Serendipitous Discovery (With a Great Deal of Support)
  (a) Cys124 (yellow) is occluded in closed PDB structure.
  (b) Cys124 structural breathing in open MD geometry.
  (Wassman, et al., 2013)

Other Computational Support
  (c) Cys124 (yellow) is surrounded by p53 reactivation (rescue) mutations (green) (Wassman, et al., 2013)
  (d) Druggable pockets in p53 from FTMAP (orange) (Brenke, et al., 2009)
Stictic acid docked into open L1/S3 pocket of p53 variants
  (a) wt p53; (b) R175H; (c) R273H; (d) G245S. (Wassman, et al., 2013)

14 Actives in first 91 assayed
  [Figure: cell viability bar chart. Series: Saos-2 (p53-null), R175H, G245S. Treatments: Vehicle, PRIMA-1, Stictic acid, 35ZWF, 25KKL, 22LSV, 32CTM, 26RQZ, 27WT9, 33AG6, 33BAZ, 28NZ6, 27TGR, 27VFS, 32LDE.]
  Saos-2, Saos-2-p53-R175H, or Saos-2-G245S cells were plated at 10,000 per well with the different compounds. Samples were collected after 72 hours and tested for cell viability (CellTiter-Glo, Promega). Selective inhibition of R175H (red) or G245S (blue) cells versus p53-null cells (black) identifies a compound that potentially reactivates p53.
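
The selection criterion stated above (a compound is of interest when it inhibits the R175H or G245S cells but not the p53-null cells) can be written as a simple filter. The sketch below is only an illustration under stated assumptions: the viability fractions, the compound names `cmpd_A` to `cmpd_C`, and the `min_gap` threshold are all hypothetical and are not data or cutoffs from the talk.

```python
def selective_hits(viability, null_line, mutant_lines, min_gap=0.3):
    """Return {compound: [mutant lines]} where viability in the mutant line is
    at least min_gap lower than in the p53-null line (min_gap is an assumed cutoff)."""
    hits = {}
    for compound, by_line in viability.items():
        null_v = by_line[null_line]
        selective = [m for m in mutant_lines if null_v - by_line[m] >= min_gap]
        if selective:
            hits[compound] = selective
    return hits

# Hypothetical, made-up viability fractions (1.0 = same as untreated cells).
example = {
    "vehicle": {"p53null": 1.00, "R175H": 1.00, "G245S": 1.00},
    "cmpd_A":  {"p53null": 0.90, "R175H": 0.40, "G245S": 0.85},  # selective for R175H
    "cmpd_B":  {"p53null": 0.30, "R175H": 0.25, "G245S": 0.30},  # generally toxic, not selective
    "cmpd_C":  {"p53null": 0.95, "R175H": 0.50, "G245S": 0.45},  # selective for both mutants
}

hits = selective_hits(example, "p53null", ["R175H", "G245S"])
print(hits)
```

Comparing against the p53-null line is what separates a candidate reactivation compound from one that is simply toxic: `cmpd_B` kills all three lines about equally, so it is not flagged.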
Photomicrograph of cell viability (of 91 compounds assayed)
  [Figure: micrograph panels. Treatments: DMSO, 26RQZ, 27WT9, 33AG6, 33BAZ, 35ZWF. Cell lines: p53-null, R175H, G245S.]
  Compounds induced cell death in cells expressing p53 cancer mutants but not p53-null cells. Cells were cultured with vehicle (DMSO) or the compounds indicated (concentrations as above) for 24 h and micrographs were taken.

The long road to a future anti-cancer drug
  [Figure: contributors shown: Peter Kaiser, Rommie Amaro, Dick Chamberlin, Melanie Cocco, Hudel Luecke, Wes Hatfield, Chris Wassman, Roberta Baronio, Ozlem Demir, Faezeh Salehi, Edwin Vargas, Da-Wei Lin, Scott Rychnovsky, Michael Holzwarth, Geoff Tucker, Feng Qiao.]

Intelligent Systems and Molecular Biology
  Artificial Intelligence for Biology and Medicine
    Biology is data-rich and knowledge-hungry
    AI is well suited to biomedical problems
  Examples
    Machine learning -- drug discovery
    Rule-based systems -- drug-resistant HIV
    Heuristic search -- protein structure prediction
    Constraints -- design of large synthetic genes
    DNA nanotechnology and space-filling DNA tetrahedra
  Current Project
    Machine learning and p53 cancer rescue mutants

p53 Cancer Rescue Acknowledgments
  Rainer Brachmann (discovered p53 cancer rescue mutants)
  Peter Kaiser (co-PI for biology)
  Rommie Amaro (UCSD, molecular dynamics, virtual screening, docking)
  Scott Rychnovsky (current synthetic chemistry work)
  Wes Hatfield (Director, Computational Biology Research Lab)
  Hartmut (Hudel) Luecke (DSF and other structural biology work)
  Feng Qiao (protein structural biology work)
  Chris Wassman (then post-doc, now at Google; L1/S3 pocket)
  Roberta Baronio (then research scientist, now at Oxford; biology work)
  Ozlem Demir (UCSD, molecular dynamics, virtual screening & docking)
  Faezeh Salehi (then graduate student, now data science researcher)
  Other Colleagues: Linda Hall, Melanie Cocco, Pierre Baldi, Richard Chamberlin, Jonathan Chen, Ray Luo, Edwin Vargas, Geoff Tucker
  Funding: UCI Chao Cancer Center, UCI Medical Scientist Training Program, UCI Office of Research and Graduate Studies, UCI Institute for Genomics and Bioinformatics, Harvey Fellowship, US National Science Foundation, US National Institutes of Health (National Cancer Institute)