Mining Patterns from Protein Structures

EECS 800 Research Seminar: Mining Biological Data
Instructor: Luke Huan
Fall 2006
9/25/2006 Protein Structures, Mining Biological Data, KU EECS 800, Luke Huan, Fall06

Introduction

Protein: a sequence built from 20 amino acids, for example Lys Lys Gly Gly Leu Val Ala His.
It adopts a stable 3D structure that can be measured experimentally.
Figure: cartoon, space-filling, surface, and ribbon renderings of a protein structure. Atom legend: oxygen, nitrogen, carbon, sulfur.
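The sequence view of a protein can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: it assumes the example residues appear in the order shown on the slide, and the names `THREE_TO_ONE` and `to_one_letter` are hypothetical helpers introduced here. The mapping itself is the standard three-letter to one-letter code for the 20 amino acids.

```python
# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string."""
    return "".join(THREE_TO_ONE[name] for name in residues)


# Residue order assumed to match the slide's example.
example = ["Lys", "Lys", "Gly", "Gly", "Leu", "Val", "Ala", "His"]
print(to_one_letter(example))  # KKGGLVAH
```

Storing a sequence as a string over a 20-letter alphabet is what makes string-mining methods applicable to proteins; the 3D structure needs a richer representation.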

Exponential Growth of Protein Structures

Figure: Growth of Known Structures in Protein Data Bank. Horizontal axis: year (1988 to 2005). Vertical axis: number of structures (top tick 35,000). Two series: the total number of known protein structures, and the newly characterized proteins in that year.

Protein Structure Space

http://www.nigms.nih.gov/psi/

Structure Space is Described Hierarchically

From SCOP: Structural Classification of Proteins (http://scop.berkeley.edu/)
Levels, from top to bottom: Class, Fold, Superfamily, Family, Protein domains
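One way to picture the hierarchy is to give every domain a lineage with one label per level and then group domains by a prefix of that lineage. The sketch below is only an illustration under stated assumptions: `Domain`, `LEVELS`, and `group_by_level` are hypothetical names, and the labels are placeholders, not real SCOP entries.

```python
from collections import defaultdict
from dataclasses import dataclass

# Levels of the hierarchy above the individual domains.
LEVELS = ("class", "fold", "superfamily", "family")


@dataclass(frozen=True)
class Domain:
    name: str
    lineage: tuple  # one label per entry in LEVELS, top level first


def group_by_level(domains, level):
    """Group domain names by their lineage down to the given level."""
    depth = LEVELS.index(level) + 1
    groups = defaultdict(list)
    for domain in domains:
        groups[domain.lineage[:depth]].append(domain.name)
    return dict(groups)


# Placeholder labels for illustration only (not real SCOP data).
domains = [
    Domain("dom1", ("class A", "fold A1", "superfamily A1a", "family A1a-i")),
    Domain("dom2", ("class A", "fold A1", "superfamily A1a", "family A1a-ii")),
    Domain("dom3", ("class A", "fold A2", "superfamily A2a", "family A2a-i")),
    Domain("dom4", ("class B", "fold B1", "superfamily B1a", "family B1a-i")),
]

for level in LEVELS:
    print(level, len(group_by_level(domains, level)))
```

Counting the groups at each level is the same bookkeeping that produces per-class fold, superfamily, and family counts in a classification database.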

SCOP Statistics

Class                                 Folds   Superfamilies   Families
All alpha proteins                      218             376        608
All beta proteins                       144             290        560
Alpha and beta proteins (a/b)           136             222        629
Alpha and beta proteins (a+b)           279             409        717
Multi-domain proteins                    46              46         61
Membrane and cell surface proteins       47              88         99
Small proteins                           75             108        171
Total                                   945            1539       2845

25973 PDB Entries (July 2005). 70859 Domains.

Amino Acids: Building Blocks of Proteins

20 Naturally-occurring Amino Acids

Protein Secondary Structure: α Helix

Protein Secondary Structure: β strands

Top Level of Structure Space: Structure Classes

There are four major classes: α proteins, β proteins, α+β (anti-parallel strands), and α/β (parallel strands).

Protein Folds

A protein fold is the way secondary structures are organized in a 3D structure.

Popular Folds

Figure: the eight most frequent SCOP folds.

Superfamily and Family

Proteins within the same superfamily and family tend to have similar sequences and similar functions.

The Nature of Protein Structure Data

The ball-stick model is an element-based structure representation.
A structure is decomposed into a set of amino acids.
Protein geometry, topology, and attributes are defined with respect to the amino acid set:
  Geometry is the coordinates of the amino acids.
  Topology is the physico-chemical interactions of the residues.
  Attributes are the physico-chemical properties of the residues.
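The element-based representation can be sketched as a small data structure. This is an illustration under explicit assumptions, not the lecture's own definition: each residue carries one coordinate (geometry) and one example property (attribute), and topology is approximated by a distance cutoff between residues, which is a common stand-in for physico-chemical interactions. The names `Residue` and `contact_edges`, the 6.0 angstrom cutoff, and the coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    index: int
    name: str          # three-letter amino acid code
    coord: tuple       # geometry: (x, y, z) of one representative atom, in angstroms
    hydrophobic: bool  # attribute: one example physico-chemical property


def contact_edges(residues, cutoff=6.0):
    """Topology as a graph: connect residues whose distance is within the cutoff.

    The distance cutoff is an assumed proxy for residue interactions.
    """
    edges = []
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if math.dist(a.coord, b.coord) <= cutoff:
                edges.append((a.index, b.index))
    return edges


# Made-up coordinates for illustration only.
residues = [
    Residue(1, "Lys", (0.0, 0.0, 0.0), False),
    Residue(2, "His", (3.8, 0.0, 0.0), False),
    Residue(3, "Leu", (7.6, 0.0, 0.0), True),
    Residue(4, "Val", (3.8, 5.0, 0.0), True),
]

print(contact_edges(residues))  # [(1, 2), (2, 3), (2, 4)]
```

With the structure expressed as a labeled graph over residues, pattern mining on protein structures becomes a graph-mining problem over nodes (attributes), edges (topology), and coordinates (geometry).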

Grand Challenges: Proteomics

Part of the biological system in a cell at the molecular level.

Figure: signaling network diagram. Node labels: FAS-L, FAS, FADD/MORT, FLICE, ICE, CPP32, IGF1, IGF1R, IRS1, IL-3, IL-3R, mitogen, RAS, PI3-K, AKT/PKB, BAD, Bcl-XL, Bin-1, P53, P21, P16, P27, P107, pRb, E2F, C-Myc, Max, Mad, Cyclin D1, Cyclin E, Cdk2, Cdk4, Cdc25A. Outcomes shown: apoptosis, cell proliferation.
Source: http://www.ircs.upenn.edu/modeling2001/

References

Bioinformatics: Genes, Proteins, and Computers, Christine Orengo, David Jones, and Janet Thornton (eds.), BIOS Scientific Publishers, 2003. (ISBN: 1-85996-054-5)
