CS 177
Hands-on lab with databases

Quiz #1
Summary: Nucleotide and protein databases
Sequence formats
Lab exercises

Quiz #1
Homework #1

Al-Bawardy, Rasha F.
Antonio, Dion
Berro, Reem G.
Chien, Yu Fung
Dharker, Nachiket S.
Eunkyung, An
Gansberger, Kristen M.
Gupta, Madhur V.
Hand, Damon
Hua, Dong
Karim, Halima R.
Kebede, Mikael
Koyama, Kaori
Kwak, Yoon I.
Marwin, Victor M.
Mody, Manali
Moorjani, Priya G.
Qukub, Dunia
Ryan, Caitlyn E.
Williams, Bernadette
Yahan, Lin
Yawo, Akrodou
Zhou, Leming

13 13 14 11 12 13 12 12 13 11 11 9 13 5 10 14 12 14 10 12 6 14

The International Nucleotide Sequence Database Collaboration
[Figure: GenBank (NCBI, NIH; Entrez; Sequin, BankIt, ftp; submissions, updates), EMBL (EBI; SRS; submissions, updates), DDBJ (CIB, NIG; getentry; submissions, updates)]

Primary vs. Derivative Databases
[Figure: Labs, Sequencing Centers, GenBank, Curators, RefSeq, Genome Assembly, Algorithms, UniGene]

The Entrez Databases
The (ever) Expanding Entrez System
[Figure: Entrez linked to Journals, UniGene, Books, PubMed Central, SNP, PubMed, UniSTS, Nucleotide, Protein, PopSet, ProbeSet, Genome, Structure, Taxonomy, CDD, 3D Domains, OMIM]

GenBank
Search and retrieval of sequences
Entrez is a retrieval system for searching several linked databases. It provides access to: PubMed; Nucleotide; Protein; Structure; Genome; PopSet; OMIM; Taxonomy and more.

BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA.

[Figure: BLAST selections]
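An Entrez search like the ones described above can also be issued from a script. The sketch below is only an illustration and not part of the lab: it assumes NCBI's E-utilities `esearch` endpoint with the `db`, `term`, and `retmode` parameters and the database name `nucleotide`, none of which appear on the slides, and the sample XML reply is made up for the example. It builds the query URL for a search term and reads the hit count out of a reply.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: NCBI E-utilities esearch endpoint (not named on the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "nucleotide") -> str:
    """Return an esearch URL asking Entrez for records matching `term`."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmode": "xml"})
    return f"{ESEARCH_URL}?{query}"


def parse_hit_count(xml_text: str) -> int:
    """Extract the total number of hits from an esearch XML reply."""
    root = ET.fromstring(xml_text)
    count = root.findtext("Count")
    if count is None:
        raise ValueError("reply has no <Count> element")
    return int(count)


# Hypothetical reply, trimmed to the one field used here.
SAMPLE_REPLY = "<eSearchResult><Count>6</Count><IdList/></eSearchResult>"

if __name__ == "__main__":
    print(build_esearch_url("neanderthal*"))
    print(parse_hit_count(SAMPLE_REPLY))
```

Keeping URL construction and reply parsing as separate functions means both can be checked without a network connection; the actual request (for example with `urllib.request.urlopen`) is left out on purpose.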
GenBank format
Fasta format

Sequence formats
ASN.1, DNAStrider, EMBL, Fitch, GCG, GenBank/GB, IG/Stanford, MSF, NBRF, Olsen, PAUP/NEXUS, Pearson/Fasta, Phylip, PIR/CODATA, Plain/Raw, Pretty, Zuker

Convertible in ReadSeq (Web based)
http://bimas.dcrt.nih.gov/molbio/readseq/
or ForCon (stand-alone application)
http://www.hgmp.mrc.ac.uk/embnet.news/vol6_1/ForCon/forcon.html

NOTE:
- FASTA is a popular sequence format
- it is also a sequence similarity and homology search tool (similar to BLAST) used by EMBL-EBI

Lab exercises
1) How many sequences are available in GenBank for Neanderthals?
   Depends on your search strategy
2) Go to Entrez nucleotide. Find all sequences for the following terms:
   neander 1
   Neanderthals 0
   Neanderthal 1
   neanderthal 1
   neanderthal* 6
   Homo sapiens neanderthalensis 6
3) Go to Entrez taxonomy. Try to find all sequences for Neanderthals!
   6

Lab exercises
4) How many nucleotide sequences are available for the house mouse Mus musculus? Try both Entrez nucleotides and Entrez taxonomy. How do you explain the difference?
   Entrez taxonomy 5,403,701
   Entrez nucleotides 5,458,506 (Mus musculus)
   5,393,552 (house mouse)
   5,458,527 (Mus musculus OR house mouse)
5) A man is found murdered in Yellowstone National Park. A few hairs of unidentified origin are recovered from the victim's clothes. The samples arrive in the lab, and DNA is isolated and sequenced:
   CCATGCATATAAGCATGTACATAATATTATATTCTTACATAGGACATATTAACTCAATCTCATAATTCAT
   Formulate a hypothesis regarding the origin of the recovered hairs and potential links with the killing!
   Canis lupus (Gray Wolf)

The Poliovirus Problem
VOL 297, 9 August 2002
Cello, J; Paul, A.V. & Wimmer, E.: Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template
- they generated about 7.7 kilobases of single-stranded RNA genome based on the known genetic map
- DNA fragments were synthesized from purified oligonucleotides (average length: 69 bases)
- the cDNA was then transcribed into highly infectious RNA
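For a rough sense of scale, the poliovirus synthesis figures quoted above can be turned into a count of oligonucleotides. The sketch below is back-of-the-envelope arithmetic only: it assumes "about 7.7 kilobases" means 7,700 bases and that oligos of the average length sit end to end on a single strand with no overlap. Both are simplifications made for this illustration, not details taken from the paper.

```python
import math

GENOME_LENGTH = 7_700      # bases; "about 7.7 kilobases" from the slide
AVERAGE_OLIGO_LENGTH = 69  # bases; average length quoted on the slide


def oligos_needed(genome_length: int, oligo_length: int) -> int:
    """Number of end-to-end, non-overlapping oligos covering one strand."""
    if genome_length <= 0 or oligo_length <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(genome_length / oligo_length)


if __name__ == "__main__":
    print(oligos_needed(GENOME_LENGTH, AVERAGE_OLIGO_LENGTH))  # 112
```

Under these assumptions, on the order of a hundred separate mail-order oligos would be involved.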
The Poliovirus Problem
17 July 2002
Weiss, R.: Mail-Order Molecules Brew a Terrorism Debate
- mail-order oligonucleotides can be used to manufacture a deadly virus
- because they are so small, most oligos lack a fingerprint
- call for more control and/or institutional oversight

The Poliovirus Problem
Are these oligos so small that they lack a fingerprint?
- search in GenBank for nucleotide sequences of the poliovirus
- copy about 100 bp from a sequence of your choice and paste it into the search window of blastn; is the fragment identifiable as poliovirus?
- if so, do a blastn search with a 90 bp, 80 bp, 70 bp fragment
- what is the length of the shortest fragment still identifiable as poliovirus?
- is this fragment shorter than the average length of 69 bp used to synthesize the poliovirus?
- do these oligos have a fingerprint (i.e. can typical oligos with lengths of 20-50 be assigned to a particular organism)?

Homework assignment lecture #4
Explain in your own words and in simple terms the basics of the BLAST tool!
- assignment is due on 6 Oct 2003, 3:30 PM
- send your assignment as e-mail attachment to [email protected] (type your name and the term homework in the subject line)
- maximum size: 500 words
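For the poliovirus fragment-length exercise above, the shortened queries can be prepared in one step instead of trimming by hand. The sketch below cuts the first 100, 90, 80, and 70 bases from a sequence and writes each as a FASTA record (a `>` header line followed by the sequence), which can be pasted into the blastn search window. The function names, the header text, and the placeholder sequence are all hypothetical; replace the placeholder with a real poliovirus sequence retrieved from GenBank.

```python
from typing import Iterable, List


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{name}"] + lines)


def make_fragments(sequence: str, lengths: Iterable[int]) -> List[str]:
    """Return FASTA records for prefixes of `sequence` at each length."""
    records = []
    for n in lengths:
        if n > len(sequence):
            raise ValueError(f"sequence is shorter than {n} bases")
        records.append(to_fasta(f"fragment_{n}bp", sequence[:n]))
    return records


# Placeholder only: NOT a real poliovirus sequence.
PLACEHOLDER = "ACGT" * 30  # 120 bases

if __name__ == "__main__":
    for record in make_fragments(PLACEHOLDER, [100, 90, 80, 70]):
        print(record)
        print()
```

To continue below 70 bp, pass a longer list of lengths, for example `range(100, 10, -10)`.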