Webinar: Wheat Data Interoperability guidelines
Esther Dzalé Yeumo

Esther Dzalé Yeumo, Richard Fulss

Let us get to know each other

What is your area of research?
Agronomy, Breeding, Biochemistry, Bioinformatics, Functional genetics, Physiology, Population genetics, Quantitative genetics, Statistical genetics, Structural genomics, Other

What is your position?
Manager, Data manager, Researcher, Software developer, Student, Other

The Wheat Initiative

  • Created in 2011 following endorsement by the G20 Agriculture Ministries to improve food security
  • A framework to identify synergies and facilitate collaborations for wheat improvement at the international level

The Wheat Initiative members

  • Countries: Argentina, Australia, Brazil, Canada, China, France, Germany, Hungary, India, Ireland, Italy, Japan, Spain, Turkey, UK, USA
  • International organizations: CIMMYT, ICARDA
  • Private companies: Arvalis, Bayer CropScience, Florimond Desprez V&F, KWS UK, Limagrain, Monsanto Company, RAGT 2n, Saaten Union Research, Syngenta Crop Protection

The Wheat Data Interoperability WG

Aims: contribute to the improvement of wheat-related data interoperability by
  • Building a common interoperability framework (metadata, data formats and vocabularies)
  • Providing guidelines for describing, representing and linking wheat-related data

Contributors and sponsors

Contributors: Alaux Michael (INRA, France), Aubin Sophie (INRA, France), Arnaud Elizabeth (Bioversity, France), Baumann Ute (Adelaide Uni, Australia), Buche Patrice (INRA, France), Cooper Laurel (Planteome, USA), Fulss Richard (CIMMYT, Mexico), Hologne Odile (INRA, France), Laporte Marie-Angélique (Bioversity, France), Larmande Pierre (IRD, France), Letellier Thomas (INRA, France), Lucas Hélène (INRA, France), Pommier Cyril (INRA, France), Protonotarios Vassilis (Agro-Know, Greece), Quesneville Hadi (INRA, France), Shrestha Rosemary (INRA, France), Subirats Imma (FAO of the United Nations, Italy), Aravind Venkatesan (IBC, France), Whan Alex (CSIRO, Australia)

Co-chairs: Esther Dzalé Yeumo Kaboré (INRA, France), Richard Allan Fulss (CIMMYT, Mexico)

Question

In your opinion, which of the following data types will be the most important for wheat research in the next 5 years?
SNPs, Genomic annotations, Phenotypes, Genetic maps, Physical maps, Germplasm, Other

The data types covered in the guidelines

The methodology

  • Landscape of wheat-related standards and their use by the community

  • Comprehensive overview of wheat-related ontologies and vocabularies
  • Surveys
  • Workshops
  • Implementation
  • Recommendations
  • Mappings between different data formats
  • Actions to conduct in order to improve the current level of wheat-related data interoperability
  • Interoperability use cases
  • Interactive cookbook: recommendations + guidelines
  • A repository of wheat-related linked vocabularies (Bioportal)

Use case: sources of resistance to stem rust (UG99) and tolerance to drought conditions in bread wheat

Top management or breeders
  • What are the sources of resistance to stem rust (UG99) and tolerance to drought conditions in bread wheat?

Data are dispersed, heterogeneous and abundant: phenotypic data, gene sequences, passport information, low density markers.

Data scientists, bioinformaticians
  • How do I extract information from all these databases?
  • Do I have well-documented metadata to make queries?
  • How do I link the data to get smarter information?

Data manager, data provider
I want to make my data findable, reusable and linkable to other data:
  • What ontologies and metadata elements are commonly used to describe the types of data I am dealing with?
  • What data formats could I use to share my data?
  • Where could I deposit my data?
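The question of how to link data from several sources can be illustrated with a minimal Python sketch. It assumes that two sources (passport information and phenotypic observations) key their records on the same persistent accession identifier. The identifiers, field names and values below are invented for illustration and do not come from any real database.

```python
from collections import defaultdict

# Hypothetical passport records keyed by a shared accession identifier.
passport = {
    "ACC:0001": {"name": "Line A", "origin": "Mexico"},
    "ACC:0002": {"name": "Line B", "origin": "France"},
}

# Hypothetical phenotypic observations coming from a second source.
observations = [
    {"accession": "ACC:0001", "trait": "stem rust score", "value": 2},
    {"accession": "ACC:0001", "trait": "grain yield", "value": 4.1},
    {"accession": "ACC:0003", "trait": "grain yield", "value": 3.7},
]


def link_by_accession(passport, observations):
    """Attach observations to passport records that share an identifier.

    Returns (linked, unmatched): linked maps each known accession to its
    passport data plus a list of observations; unmatched lists the
    observations whose identifier is absent from the passport source.
    """
    grouped = defaultdict(list)
    unmatched = []
    for obs in observations:
        if obs["accession"] in passport:
            grouped[obs["accession"]].append(obs)
        else:
            unmatched.append(obs)
    linked = {
        acc: {**info, "observations": grouped.get(acc, [])}
        for acc, info in passport.items()
    }
    return linked, unmatched


linked, unmatched = link_by_accession(passport, observations)
```

The unmatched list is kept on purpose: records that cannot be joined are usually the first sign that two sources do not yet share identifiers.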

Use case: identification of wheat genes that control root growth

Description
  • Mapping between wheat genes and orthologs from other species (deduce function by sequence similarity)
  • Access to RNA-Seq data (genes that are not expressed in roots may be irrelevant)
  • Mapping of wheat genes and information on their function based on literature

Implied data types
  • Annotated genes (Gene Ontology, Pfam, and other functional annotation)

Implementation in AgroLD
www.agrold.or

Interoperability is one of the keys

What do we mean by interoperable data?
  • Use shared vocabularies to name things and express relations between them
  • Use common data formats to represent the data
  • Use persistent and unique identifiers to identify things and allow their linkage across domains and information systems

Benefits
  • Efficient information integration and retrieval
  • Capability to answer complex questions
  • Easier holistic understanding of a domain

The deliverables

Guidelines (http://wheatis.org/DataStandards.php)
  • Data exchange formats. Example: VCF (Variant Call Format) for sequence variation data, GFF3 for genome annotation data, etc.
  • Data description best practices: consistent use of ontologies, consistent use of external database cross references
  • Data sharing best practices: share data matrices along with relevant metadata (example: trait along with method, units and scales, or environmental metadata)
  • Useful tools and use cases that highlight data format and vocabulary issues

A portal of wheat-related ontologies and vocabularies (http://wheat.agroportal.lirmm.fr/ontologies)
  • Allows access to the ontologies and vocabularies through APIs

A prototype
  • Implementation of use cases of wheat data integration within the AgroLD (Agronomic Linked Data) tool: http://volvestre.cirad.fr:8080/agrold/
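As a hedged illustration of the VCF recommendation, the sketch below reads the eight fixed, tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) using only the Python standard library. The function name `parse_vcf_line` and the sample record are hypothetical; for real files a dedicated VCF library is likely a better choice.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                    # 1-based position
        "id": None if vid == "." else vid,  # "." means missing
        "ref": ref,
        "alt": alt.split(","),              # alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Invented example record; the chromosome name and values are illustrative only.
record = parse_vcf_line("chr1A\t12345\t.\tA\tG,T\t50\tPASS\tDP=14;DB")
```

Keeping the INFO values as strings is a deliberate simplification: their types are declared in the file header, which this sketch does not read.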

Benefits for many target users

As a data producer or manager
  • Conform more easily to well-recognized data repositories and facilitate the deposit of your data in them
  • Share common meanings of the words you use to describe your data, and make your data more machine-readable and computable
  • Help foster the development of smarter search tools and make your data more visible and discoverable

As a wheat-related information system or tool developer
  • Basing your tool or information system on the recommended data formats and vocabularies will make it easier to integrate data from various sources and deliver smarter outputs to a wider audience

As a wheat-related ontology developer
  • Share your ontologies through the WDI wheat ontologies portal and make them more visible to the community
  • Reuse or link your ontologies to existing concepts and terms in wheat-related ontologies to enrich them, make them more visible and, in some cases, save time

How you can endorse the guidelines on data formats

For legacy data
  • Please provide your data in at least one of the recommended data formats even if, for some reason, you also need to keep them in other, non-recommended formats

For future developments
  • Please consider using the recommended data formats from the beginning. Example: provide your sequence variation data in the latest VCF file format
  • Please refer to the WDI guidelines for precise recommendations on each data type

How you can endorse the data description and sharing best practices

Describe your data following the WDI recommendations and with the recommended vocabularies. Examples:
  • For genome annotation data in GFF3 format, use ontologies such as Gene Ontology and Sequence Ontology for functional annotation in column 9
  • For observation variables (including trait and environment variables), use existing variables listed in the following vocabularies and ontologies: Wheat crop ontology; INRA Wheat Ontology; Biorefinery ontology; XEO, XEML Environment Ontology
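To make the GFF3 example concrete, here is a minimal Python sketch that reads column 9 (attributes) of a GFF3 feature line and collects the values of the reserved `Ontology_term` attribute, together with the feature type from column 3. The helper name `gff3_ontology_terms` and the sample line (source, gene ID and GO accessions) are invented for illustration.

```python
from urllib.parse import unquote


def gff3_ontology_terms(line):
    """Return (feature_type, ontology_terms) for one GFF3 feature line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")

    feature_type = columns[2]  # column 3: a Sequence Ontology term or accession

    attributes = {}
    for pair in columns[8].split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # Multiple values of one tag are comma-separated; special
        # characters inside values are percent-encoded.
        attributes[tag] = [unquote(v) for v in value.split(",")]

    return feature_type, attributes.get("Ontology_term", [])


# Invented sample line; the gene ID and GO accessions are placeholders.
sample = (
    "chr1A\texample\tgene\t1000\t2000\t.\t+\t.\t"
    "ID=gene0001;Ontology_term=GO:0006950,GO:0009414"
)
feature_type, terms = gff3_ontology_terms(sample)
```

A check of this kind can be run before depositing a file, to confirm that every annotated feature carries at least one ontology cross reference.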

Question

Which of the following vocabularies are you familiar with?
 1. AGROVOC
 2. Biorefinery
 3. CAB Thesaurus (CABT)
 4. Cell Ontology (CL)
 5. Chemical Entities of Biological Interest (ChEBI)
 6. Crop Ontology (CO)
 7. Crop Research Ontology, part of Crop Ontology (CO_715)
 8. Environment Ontology (ENVO)
 9. Experimental Factor Ontology (EFO)
10. Feature Annotation Location Description Ontology (FALDO)
11. NAL Thesaurus (NALT)
12. Phenotype And Trait Ontology (PATO)
13. Plant Experimental Conditions Ontology (Plant Environment Ontology, EO, may be changing to PECO)
14. Plant Ontology (PO)
15. Plant Trait Ontology (TO)
16. Population and Community Ontology (PCO)
17. Protein Ontology (PRO)
18. Sequence Ontology (SO)
19. Variation Ontology (VariO)
20. Wheat Ontology INRA (Wheat_Ontology)
21. Wheat Anatomy and Development Ontology, part of Crop Ontology (CO_121)
22. Wheat Trait Ontology, embedded in Crop Ontology (CO_321)
23. Wheat Phenotype (phenotypes and traits in wheat)

Question

Which of the following vocabularies are you using?
 1. AGROVOC
 2. Biorefinery
 3. CAB Thesaurus (CABT)
 4. Cell Ontology (CL)
 5. Chemical Entities of Biological Interest (ChEBI)
 6. Crop Ontology (CO)
 7. Crop Research Ontology, part of Crop Ontology (CO_715)
 8. Environment Ontology (ENVO)
 9. Experimental Factor Ontology (EFO)
10. Feature Annotation Location Description Ontology (FALDO)
11. NAL Thesaurus (NALT)
12. Phenotype And Trait Ontology (PATO)
13. Plant Experimental Conditions Ontology (Plant Environment Ontology, EO, may be changing to PECO)
14. Plant Ontology (PO)
15. Plant Trait Ontology (TO)
16. Population and Community Ontology (PCO)
17. Protein Ontology (PRO)
18. Sequence Ontology (SO)
19. Variation Ontology (VariO)
20. Wheat Ontology INRA (Wheat_Ontology)
21. Wheat Anatomy and Development Ontology, part of Crop Ontology (CO_121)
22. Wheat Trait Ontology, embedded in Crop Ontology (CO_321)
23. Wheat Phenotype (phenotypes and traits in wheat)

How you can endorse the WDI wheat-related ontologies portal

  • Share your wheat-related ontologies within the WDI slice in Agroportal
  • Before developing a new ontology: make sure there is not an existing one within the WDI slice in Agroportal that covers your needs

  • When developing a new ontology: please reuse or link to existing concepts and terms in the ontologies within the WDI slice in Agroportal whenever possible
  • Whenever possible: please align your ontologies to the existing ones within the WDI slice in Agroportal and share the mapping results

Endorsements/Adopters

Laboratory: Contact
  • NIAB, www.niab.com: Professor Mario Caccamo, Head of Crop Bioinformatics
  • USDA ARS and Cold Spring Harbor Laboratory, http://cshl.edu/: Doreen Ware, Adjunct Associate Professor, Ph.D., Ohio State University
  • EMBL European Bioinformatics Institute, http://www.ebi.ac.uk/: Paul Kersey, Team Leader Non-vertebrate Genomics
  • Australian Center for Plant Functional Genomics, http://www.acpfg.com.au/: Dr Baumann, Ute, Bioinformatics Leader
  • The Genome Analysis Center, http://www.tgac.ac.uk/: Robert Davey, Data Infrastructure & Algorithms Group Leader
  • Munich Information Center for Protein Sequences (MIPS), Helmholtz Center Munich, http://www.helmholtz-muenchen.de/: Dr. Klaus Mayer, Research Director MIPS
  • INRA URGI, https://urgi.versailles.inra.fr/: Michael Alaux, Deputy leader of "Information System and data integration" team; Cyril Pommier, Deputy leader, Information System and Data integration team, Phenotype thematic leader
  • Rothamsted Research, http://www.rothamsted.ac.uk/: Christopher Rawlings, Head of Department Computational & Systems Biology, Harpenden
  • James Hutton Institute, http://www.hutton.ac.uk/: David Marshall, Information and Computational Sciences, The James Hutton Institute
  • CIMMYT Wheat program, http://www.cimmyt.org/en/: Richard Allan James, Head of Knowledge Management; Rosemary Shrestha, Data Coordinator

Acknowledgements

WDI WG members: Fulss Richard, co-chair (CIMMYT), Alaux Michael (INRA), Aubin Sophie (INRA), Arnaud Elizabeth (Bioversity), Baumann Ute (Adelaide University), Buche Patrice (INRA), Cooper Laurel (Planteome), Hologne Odile (INRA), Laporte Marie-Angélique (Bioversity), Larmande Pierre (IRD), Letellier Thomas (INRA), Mohellibi Nacer (INRA), Pommier Cyril (INRA), Protonotarios Vassilis (Agro-Know), Shrestha Rosemary (CIMMYT), Subirats Imma (FAO of the United Nations), Aravind Venkatesan (IBC), Whan Alex (CSIRO), Jonquet Clément (Lirmm, Agroportal)

And Lucas Hélène (INRA, International Wheat Initiative), Quesneville Hadi (INRA, co-chair WheatIS EWG)

Thank you!
