PathoLogic Pathway Predictor
SRI International Bioinformatics

Inference of Metabolic Pathways
  [Diagram: an annotated genomic sequence (genes/ORFs, gene products, DNA sequences) and the multi-organism pathway database MetaCyc (pathways, reactions, compounds) feed into the PathoLogic software, which produces a Pathway/Genome Database (pathways, reactions, compounds, gene products, genes, genomic map).]
  PathoLogic software integrates genome and pathway data to identify putative metabolic networks.

PathoLogic Functionality
  - Initialize schema for new PGDB
  - Transform existing genome to PGDB form
  - Infer metabolic pathways and store in PGDB
  - Infer operons and store in PGDB
  - Assemble Overview diagram
  - Assist user with manual tasks:
      Assign enzymes to reactions they catalyze
      Identify false-positive pathway predictions
      Build protein complexes from monomers
      Infer transport reactions

PathoLogic Input/Output
  Inputs:
  - File listing genetic elements (http://bioinformatics.ai.sri.com/ptools/genetic-elements.dat)
  - Files containing DNA sequence for each genetic element
  - Files containing annotation for each genetic element
  - MetaCyc database
  Output:
  - Pathway/genome database for the subject organism
  - Reports that summarize:
      Evidence contained in the input genome for the presence of reference pathways
      Reactions missing from inferred pathways

PathoLogic Analysis Phases
  - Trial parsing of input data files [few days]
  - Initialize schema of new PGDB [3 min]
  - Create DB objects for replicons, genes, proteins [5 min]
  - Assign enzymes to reactions they catalyze [10 min / 1 week]
    [Diagram: enzyme names (ferrochelatase, glutamate 1-semialdehyde 2,1-aminomutase, porphobilinogen deaminase), enzymes E1 and E2, and compounds A through G.]
  - From assigned reactions, infer what pathways are present [5 min / few days]
  - Define metabolic overview diagram [30 min]
  - Define protein complexes [few days]

genetic-elements.dat

  ID          TEST-CHROM-1
  NAME        Chromosome 1
  TYPE        :CHRSM
  CIRCULAR?   N
  ANNOT-FILE  chrom1.pf
  SEQ-FILE    chrom1.fsa
  //
  ID          TEST-CHROM-2
  NAME        Chromosome 2
  CIRCULAR?   N
  ANNOT-FILE  /mydata/chrom2.gbk
  SEQ-FILE    /mydata/chrom2.fna
  //
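To check a genetic-elements.dat file before a trial parse, the record layout above can be read with a few lines of Python. This is only a minimal sketch, not part of Pathway Tools: the function name `parse_genetic_elements` is hypothetical, and it assumes that each line holds one attribute followed by whitespace and its value, that `//` ends a record, and that lines starting with `;` are comments.

```python
from typing import Dict, List


def parse_genetic_elements(text: str) -> List[Dict[str, str]]:
    """Parse 'ATTRIBUTE value' lines into one dict per record ('//' ends a record)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):  # assumption: ';' marks a comment line
            continue
        if line == "//":
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current:  # tolerate a final record without a closing '//'
        records.append(current)
    return records


# The two records shown on the slide.
SAMPLE = """
ID          TEST-CHROM-1
NAME        Chromosome 1
TYPE        :CHRSM
CIRCULAR?   N
ANNOT-FILE  chrom1.pf
SEQ-FILE    chrom1.fsa
//
ID          TEST-CHROM-2
NAME        Chromosome 2
CIRCULAR?   N
ANNOT-FILE  /mydata/chrom2.gbk
SEQ-FILE    /mydata/chrom2.fna
//
"""

elements = parse_genetic_elements(SAMPLE)
for element in elements:
    print(element["ID"], element["ANNOT-FILE"], element["SEQ-FILE"])
```

A check like this makes it easy to confirm that every genetic element names both an annotation file and a sequence file.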

File Naming Conventions
  - One pair of sequence and annotation files for each genetic element
  - Sequence files: FASTA format, suffix .fsa or .fna
  - Annotation files:
      GenBank format: suffix .gbk
      PathoLogic format: suffix .pf

Typical Problems Using GenBank Files With PathoLogic
  - Wrong qualifier names used: read the PathoLogic documentation!
  - Extraneous information in a given qualifier
  - Check results of trial parse carefully

GenBank File Format
  Accepted feature types: CDS, tRNA, rRNA, misc_RNA
  Accepted qualifiers ([req] = required, [recm] = recommended, [opt] = optional):
    /locus_tag         Unique ID              [recm]
    /gene              Gene name              [req]
    /product                                  [req]
    /EC_number                                [recm]
    /product_comment                          [opt]
    /gene_comment                             [opt]
    /alt_name          Synonyms               [opt]
    /pseudo            Gene is a pseudogene   [opt]
  For multifunctional proteins, put each function in a separate /product line.

PathoLogic File Format
  - Each record starts with a line containing an ID attribute
  - Tab delimited
  - Each record ends with a line containing //
  - One attribute-value pair is allowed per line
  - Use multiple FUNCTION lines for multifunctional proteins
  - Lines starting with ; are comment lines
  - Valid attributes are: ID, NAME, SYNONYM, STARTBASE, ENDBASE, GENE-COMMENT, FUNCTION, PRODUCT-TYPE, EC, FUNCTION-COMMENT, DBLINK, INTRON
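The PathoLogic-format rules listed above (records start with ID, tab-delimited pairs, `//` terminator, `;` comments, repeated FUNCTION lines) can be expressed as a small validating reader. The sketch below is an illustration only, not the parser PathoLogic itself uses; the name `parse_pf` and the record `HYPO-0001` are hypothetical, and the error messages are invented for the example.

```python
from typing import Dict, List, Optional

# Attribute names taken from the slide.
VALID_ATTRIBUTES = {
    "ID", "NAME", "SYNONYM", "STARTBASE", "ENDBASE", "GENE-COMMENT",
    "FUNCTION", "PRODUCT-TYPE", "EC", "FUNCTION-COMMENT", "DBLINK", "INTRON",
}


def parse_pf(text: str) -> List[Dict[str, List[str]]]:
    """Return one dict per record; values are lists because attributes may repeat."""
    records: List[Dict[str, List[str]]] = []
    current: Optional[Dict[str, List[str]]] = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line.strip() or line.startswith(";"):  # blank or comment line
            continue
        if line.strip() == "//":  # end of record
            if current is None:
                raise ValueError(f"line {lineno}: '//' without a record")
            records.append(current)
            current = None
            continue
        attr, sep, value = line.partition("\t")
        attr = attr.strip()
        if not sep:
            raise ValueError(f"line {lineno}: expected a tab between attribute and value")
        if attr not in VALID_ATTRIBUTES:
            raise ValueError(f"line {lineno}: unknown attribute {attr!r}")
        if current is None:
            if attr != "ID":
                raise ValueError(f"line {lineno}: a record must start with ID")
            current = {}
        current.setdefault(attr, []).append(value.strip())
    if current is not None:
        raise ValueError("last record is not terminated by '//'")
    return records


# TP0734 comes from the slides; HYPO-0001 is a made-up multifunctional protein.
SAMPLE = "\n".join([
    "; comment lines start with a semicolon",
    "ID\tTP0734",
    "NAME\tdeoD",
    "STARTBASE\t799084",
    "ENDBASE\t799785",
    "FUNCTION\tpurine nucleoside phosphorylase",
    "PRODUCT-TYPE\tP",
    "//",
    "ID\tHYPO-0001",
    "FUNCTION\thypothetical function one",
    "FUNCTION\thypothetical function two",
    "//",
])

pf_records = parse_pf(SAMPLE)
print(len(pf_records), pf_records[1]["FUNCTION"])
```

Keeping every value in a list is a deliberate choice here: it lets a multifunctional protein carry several FUNCTION lines without special cases.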

Example PathoLogic-format records:

  ID            TP0734
  NAME          deoD
  STARTBASE     799084
  ENDBASE       799785
  FUNCTION      purine nucleoside phosphorylase
  DBLINK        PID:g3323039
  PRODUCT-TYPE  P
  GENE-COMMENT  similar to GP:1638807 percent identity: 57.51; identified by sequence similarity; putative
  //
  ID            TP0735
  NAME          gltA
  STARTBASE     799867
  ENDBASE       801423
  FUNCTION      glutamate synthase
  DBLINK        PID:g3323040
  PRODUCT-TYPE  P
  //

Before you start: What to do when an error occurs
  - Most Navigator errors are automatically trapped; debugging information is saved to the error.tmp file.
  - All other errors (including most PathoLogic errors) will cause the software to drop into the Lisp debugger.
      Unix: the error message will show up in the original terminal window from which you started Pathway Tools.
      Windows: the error message will show up in the Lisp console.

      The Lisp console usually starts out iconified; its icon is a blue bust of Franz Liszt.
  - Two goals when an error occurs:
      Try to continue working
      Obtain enough information for a bug report to send to the pathway-tools support team

The Lisp Debugger
  Sample error (details and number of restart actions differ for each case):

    Error: Received signal number 2 (Keyboard interrupt)
    Restart actions (select using :continue):
     0: continue computation
     1: Return to command level
     2: Pathway Tools version 10.0 top level
     3: Exit Pathway Tools version 10.0
    [1c] EC(2):

  - To generate debugging information (stack backtrace): :zoom :count :all
  - To continue from the error, find a restart that takes you to the top level (in this case, number 2): :cont 2
  - To exit Pathway Tools: :exit

How to report an error
  - Determine if the problem is reproducible, and how to reproduce it (make sure you have all the latest patches installed)
  - Send email to [email protected] containing:
      Pathway Tools version number and platform
      Description of exactly what you were doing (which command you invoked, what you typed, etc.) or instructions for how to reproduce the problem
      error.tmp file, if one was generated
      If the software breaks into the Lisp debugger, the complete error message and stack backtrace (obtained using the command :zoom :count :all, as described on the previous slide)

Using the PPP GUI to Create a Pathway/Genome Database
  Input Project Information
  - Organism -> Create New
  Next Steps
  - Trial parse: Build -> Trial Parse
  - Fix any errors in input files
  - Build pathway/genome database: Build -> Automated Build

PathoLogic Parser Output

Assign Enzymes to Reactions
  [Flowchart: a gene product name (example: UDP-glucose 4-epimerase, EC 5.1.3.2, UDP-D-glucose <-> UDP-galactose) is matched against MetaCyc. Match: yes -> assign. Match: no -> probable enzyme ("-ase")? If no -> not a metabolic enzyme. If yes -> manually search: yes -> assign; no -> can't assign.]

Enzyme Name Matcher
  - Matches on full enzyme name
  - Match is case-insensitive and removes the punctuation characters -_(){}',:
  - Also matches after removal of prefixes and suffixes such as:
      Putative, Hypothetical, etc.
      alpha|beta||catalytic|inducible chain|subunit|component
      Parenthetical gene name

  - For names that do not match, the software identifies probable metabolic enzymes as those:
      Containing "ase"
      Not containing keywords such as "sensor kinase", "topoisomerase", "protein kinase", "peptidase", etc.
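The matching rules above can be illustrated with a short Python sketch. It is a rough approximation under stated assumptions, not the PathoLogic name matcher: the function names, the regular expressions, and the one-entry lookup table are hypothetical, and only the prefixes, suffixes, and keywords named on the slides are handled.

```python
import re
from typing import Dict, Optional

PUNCTUATION = "-_(){}',:"  # characters the slide says are removed before matching
EXCLUDED_KEYWORDS = ("sensor kinase", "topoisomerase", "protein kinase", "peptidase")


def normalize(name: str) -> str:
    """Lower-case, drop the listed punctuation, and collapse whitespace."""
    cleaned = name.lower().translate(str.maketrans("", "", PUNCTUATION))
    return " ".join(cleaned.split())


def strip_affixes(name: str) -> str:
    """Remove a trailing parenthetical gene name, a leading qualifier, and a subunit suffix."""
    text = name.lower()
    text = re.sub(r"\s*\([^)]*\)\s*$", "", text)
    text = re.sub(r"^(?:putative|hypothetical)\s+", "", text)
    text = re.sub(
        r"\s+(?:(?:alpha|beta|catalytic|inducible)\s+)?(?:chain|subunit|component)$",
        "",
        text,
    )
    return text


def match_enzyme(name: str, index: Dict[str, str]) -> Optional[str]:
    """Try the full name first, then the name with prefixes and suffixes removed."""
    key = normalize(name)
    if key in index:
        return index[key]
    return index.get(normalize(strip_affixes(name)))


def is_probable_enzyme(name: str) -> bool:
    """Unmatched names containing 'ase' and none of the excluded keywords."""
    lowered = name.lower()
    return "ase" in lowered and not any(k in lowered for k in EXCLUDED_KEYWORDS)


# Hypothetical one-entry index keyed by normalized name; the EC number is the slide's example.
INDEX = {normalize("UDP-glucose 4-epimerase"): "5.1.3.2"}

print(match_enzyme("Putative UDP-glucose 4-epimerase (galE)", INDEX))
print(is_probable_enzyme("sensor kinase KdpD"), is_probable_enzyme("acetyltransferase"))
```

A plain substring test for "ase" will also accept words such as "disease", which is one reason names flagged this way still need manual review.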

  Unknown enzymes can be researched in MetaCyc, Swiss-Prot, and PubMed.

Enzyme Name to Reaction Mapping
  See also the file PTools Tutorial/PathoLogic Reports/name-matching-report.txt

Manual Polishing
  - Refine -> Assign Probable Enzymes (do this first)
  - Refine -> Rescore Pathways (redo after assigning enzymes)
  - Refine -> Create Protein Complexes (can be done at any time)
  - Refine -> Assign Modified Proteins (can be done at any time)
  - Refine -> Transport Identification Parser (can be done at any time)
  - Refine -> Pathway Hole Filler
  - Refine -> Predict Transcription Units
  - Refine -> Update Overview (do this last, and repeat after any material changes to PGDB)

Assign Probable Enzymes
  How to find reactions for probable enzymes:
  - First, verify that the enzyme name describes a specific, metabolic function
  - Search for a fragment of the name in MetaCyc; you may be able to find a match that PathoLogic missed
  - Look up the protein in Swiss-Prot or other DBs
  - Search for the gene name in a PGDB for a related organism (bear in mind that gene names are not reliable indicators of function, so check carefully)
  - Search for the function name in PubMed

Other Manual Polishing
  - Refine -> Assign Probable Enzymes
  - Refine -> Rescore Pathways
  - Refine -> Create Protein Complexes
  - Refine -> Assign Modified Proteins
  - Refine -> Transport Identification Parser
  - Refine -> Pathway Hole Filler
  - Refine -> Predict Transcription Units
  - Refine -> Run Consistency Checker
  - Refine -> Update Overview

Automated Pathway Inference
  - All pathways in MetaCyc for which there is at least one enzyme identified in the target organism are considered for possible inclusion.
  - The algorithm errs on the side of inclusivity: it is easier to manually delete a pathway from an organism than to find a pathway that should have been predicted but wasn't.

  Considerations taken into account when deciding whether or not a pathway should be inferred:
  - Is there a unique enzyme (an enzyme not involved in any other pathway)?
  - Does the organism fall in the expected taxonomic domain of the pathway?
  - Is this pathway part of a variant set, and, if so, is there more evidence for some other variant?
  - If there is no unique enzyme:
      Is there evidence for more than one enzyme?
      If a biosynthetic pathway, is there evidence for final reaction(s)?
      If a degradation pathway, is there evidence for initial reaction(s)?
      If an energy metabolism pathway, is there evidence for more than half the reactions?

Assigning Evidence Scores to Predicted Pathways
  X|Y|Z denotes the score for pathway P in organism O, where:

    X = total number of reactions in P
    Y = number of reactions in P for which there is evidence of catalyzing enzymes in O
    Z = number of Y reactions that are used in other pathways in O

Manual Pruning of Pathways
  - Use the pathway evidence report
  - Coloring scheme aids in assessing pathway evidence
  - Phase I: Prune extra variant pathways
  - Rescore pathways, re-generate pathway evidence report
  - Phase II: Prune pathways unlikely to be present
      No/few unique enzymes
      Most pathway steps present because they are used in another pathway
      Pathway very unlikely to be present in this organism
      Nonspecific enzyme name assigned to a pathway step

Caveats
  - Cannot predict pathways not present in MetaCyc
  - Evidence for short pathways is hard to interpret
  - Since many reactions occur in multiple pathways, some false positives occur

Output from PPP
  - Pathway/genome database
  - Summary pages
  - Pathway evidence page: click Summary of Organisms, then click organism name, then click Pathway Evidence, then click Save Pathway Report
  - Missing enzymes report
  - Directory tree containing sequence files, reports, etc.

Resulting Directory Structure
  ROOT/ptools-local/pgdbs/user/ORGIDcyc/
    VERSION/
      input/
        organism.dat
        organism-init.dat
        genetic-elements.dat
        annotation files
        sequence files
      reports/
        name-matching-report.txt
        trial-parse-report.txt
      kb/
        ORGIDbase.ocelot
      data/
        overview.graph
    released -> VERSION
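The X|Y|Z evidence score and the "no/few unique enzymes" pruning criterion from the slides above can be worked through in Python. This is a hedged sketch rather than the PathoLogic scoring code: the pathway names, reaction IDs, and the `min_unique` threshold are hypothetical, and "other pathways in O" is taken to mean the other candidate pathways passed in.

```python
from typing import Dict, List, Set, Tuple


def evidence_score(
    pathway: str,
    pathways: Dict[str, Set[str]],
    reactions_with_evidence: Set[str],
) -> Tuple[int, int, int]:
    """Return (X, Y, Z) for one pathway, following the definitions on the slide."""
    reactions = pathways[pathway]
    supported = reactions & reactions_with_evidence  # reactions with enzyme evidence
    others: Set[str] = set()
    for name, rxns in pathways.items():
        if name != pathway:
            others |= rxns  # reactions that belong to any other candidate pathway
    return len(reactions), len(supported), len(supported & others)


def pruning_candidates(
    pathways: Dict[str, Set[str]],
    reactions_with_evidence: Set[str],
    min_unique: int = 1,  # hypothetical threshold for "no/few unique enzymes"
) -> List[str]:
    """Flag pathways whose supported reactions are nearly all shared with other pathways."""
    flagged = []
    for name in sorted(pathways):
        _, y, z = evidence_score(name, pathways, reactions_with_evidence)
        if y - z < min_unique:
            flagged.append(name)
    return flagged


# Hypothetical candidate pathways and the reactions that have an assigned enzyme.
PATHWAYS = {
    "PWY-A": {"R1", "R2", "R3"},
    "PWY-B": {"R3", "R4"},
    "PWY-D": {"R3", "R9"},
}
EVIDENCE = {"R1", "R3", "R4"}

for name in sorted(PATHWAYS):
    x, y, z = evidence_score(name, PATHWAYS, EVIDENCE)
    print(f"{name}: {x}|{y}|{z}")
print("review for pruning:", pruning_candidates(PATHWAYS, EVIDENCE))
```

In this toy data PWY-D is supported only by a reaction it shares with other pathways, which is the situation the Phase II criteria ask a curator to examine by hand.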

Creating Protein Complexes
  [Screenshot labels: Complex, Subunits, Stoichiometries]

Manual Polishing
  - Refine -> Assign Probable Enzymes
  - Refine -> Re-run Name Matcher
  - Refine -> Create Protein Complexes
  - Refine -> Assign Modified Proteins
  - Refine -> Transport Identification Parser
  - Refine -> Pathway Hole Filler
  - Refine -> Predict Transcription Units
  - Refine -> Run Consistency Checker
  - Refine -> Update Overview

Proteins as Reaction Substrates

Nomenclature
  - WO pair = pair of genes within an operon
  - TUB pair = pair of genes at a transcription unit boundary (delineate operons)

Operation of the operon predictor
  - For each contiguous gene pair, predict whether the genes are within the same operon or at a transcription unit boundary
  - Use pairwise predictions to identify potential operons
  Example (genes A B C D E):
    AB = TUB pair
    BC = WO pair
    CD = WO pair
    DE = TUB pair
    operon = BCD

Operon predictor
  - Predicts operon gene pairs based on two features typically used for operon prediction:
      intergenic distance between genes
      genes in the same functional class
  - We use the method from Salgado et al., PNAS (2000) as a starting point.
  - Uses E. coli experimentally verified data as a training set.
  - Compute the log likelihood of two genes being a WO or TUB pair based on intergenic distance.

Operon predictor
  Additional features easily computed from a PGDB:
    1. both genes' products are enzymes in the same metabolic pathway
    2. both gene products are monomers in the same protein complex
    3. one gene product transports a substrate for a metabolic pathway in which the other gene product is involved as an enzyme
    4. a gene upstream or downstream from the gene pair (and within the same directon) is related to either one of the genes in the pair as per features 1, 2 and 3 above
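The two steps of the operon predictor described above, scoring each contiguous gene pair and then chaining WO pairs into operons, can be sketched in Python. This is a simplified illustration that uses intergenic distance only, not the PathoLogic implementation or the published Salgado et al. model: the binned frequencies, bin width, and distances are hypothetical numbers chosen to reproduce the A-E example from the slides.

```python
import math
from typing import Callable, Dict, List, Sequence


def distance_log_likelihood(
    distance_bp: int,
    wo_freq: Dict[int, float],
    tub_freq: Dict[int, float],
    bin_width: int = 10,
) -> float:
    """log(P(distance bin | WO) / P(distance bin | TUB)) from binned training frequencies."""
    b = distance_bp // bin_width
    return math.log(wo_freq[b] / tub_freq[b])  # raises KeyError for an unseen bin


def group_operons(
    genes: Sequence[str],
    is_wo_pair: Callable[[str, str], bool],
) -> List[List[str]]:
    """Chain adjacent genes: a WO pair extends the current unit, a TUB pair starts a new one."""
    if not genes:
        return []
    units = [[genes[0]]]
    for left, right in zip(genes, genes[1:]):
        if is_wo_pair(left, right):
            units[-1].append(right)
        else:
            units.append([right])
    return units


# Hypothetical training frequencies per 10-bp distance bin, and hypothetical distances.
WO_FREQ = {0: 0.50, 1: 0.30, 10: 0.01}
TUB_FREQ = {0: 0.05, 1: 0.10, 10: 0.20}
DISTANCES = {("A", "B"): 105, ("B", "C"): 4, ("C", "D"): 12, ("D", "E"): 108}


def predicted_wo(left: str, right: str) -> bool:
    """Call a pair WO when the log likelihood ratio favors the within-operon model."""
    return distance_log_likelihood(DISTANCES[(left, right)], WO_FREQ, TUB_FREQ) > 0


units = group_operons(["A", "B", "C", "D", "E"], predicted_wo)
operons = [u for u in units if len(u) > 1]
print(units, operons)
```

The additional PGDB-derived features listed above could be folded in as further terms added to the pair score; how they are weighted is not stated in the slides.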
