Data Analysis Summary - National Cancer Institute


Data Analysis Summary
Elephant in the room

General Comments
- General understanding that informatics is integral to medical sequencing and other omics in clinical settings
- About 80% of attendees were actively involved in data analysis
- Clinical practitioners were also present

- Talking about data analysis is difficult
  - We do not yet have the language to do so
  - It is complicated
  - It is not clear which details are important

Overview

- Analytical Validity
- Enhance Clinical Utility
- Data Sharing
- Messages to NCI

Analytical Validity
- The definition of a mutation, both at the level of variant calling and for established calls, remains unclear
- Reproducible analytical methods are needed in both research and clinical practice

  - Versioning of raw data and processed data
  - Versioning of data analysis pipelines
  - Versioning of auxiliary data (gene models, sequences, etc.)

Analytical Validity
- Understanding that algorithms using all the available information (tumor/normal pairs or multiple tumor samples) yield higher sensitivity and specificity
- Data archiving and sharing become important

  - Archiving tissue blocks is not enough; it is best to archive the data as well
- A confidence measure needs to be reported with each variant, since no test is 100% accurate

Analytical Validity
- Reference genome
  - A common computational reference is important for communicating findings
- The lack of adequate ground-truth datasets precludes rigorous evaluation of analyses, particularly in quantifying the false-negative rate
- Tumor heterogeneity, tissue heterogeneity, and even stochastic sampling at the sample level remain challenges in establishing analytical validity

Enhance Clinical Utility
- Large (tens of thousands of patients), well-annotated databases of normal or disease patients are lacking

- The definition of "clinically actionable" remains unclear
- Reference database of clinically actionable variants
  - Does not exist
  - Will be challenging to update and maintain
- Incorporating clinical context is difficult but probably necessary if one is to truly achieve precision medicine

Enhance Clinical Utility
- Establish methods of reporting that empower the clinician

  - Enough detail to be helpful
  - Not so much detail as to be unintelligible
  - Integrate with online databases and knowledge

Data Sharing
- Need to establish standards for data sharing in both research and clinical venues (think Myriad and BRCA1 testing)

  - Protocols, both computational and laboratory
  - Controlled vocabularies
  - Clinical data
  - The data themselves
- Consider incentivizing data sharing
  - Pay-to-play sharing
  - NCI mandate

Data Sharing
- What constitutes de-identified data?
- Patients' rights need to be respected, including both protecting AND sharing their data
- Some way is needed to feed clinical information back into the informatics pipeline
- Clinicians need actionable information, with as much interpretation against the literature and knowledgebases as possible

Messages to NCI
- There is a critical need to establish ground-truth datasets and biologics
- Fix TCGA!
- NCI should collect and maintain a knowledgebase of clinically actionable information (variants, genes, pathways)
  - Start by collecting and updating lists from large medical centers
- Enhance the PDQ database to include computable information on molecular targets under study

Messages to NCI
- More input is needed when NCI is planning bioinformatics, computational biology, and biomedical informatics
  - Granting mechanism?
  - Less top-down approach to informatics
- Establish and ENFORCE rational data-sharing mechanisms for NCI-sponsored clinical trials

  - SRA is not the answer.

[Figure labels: Gene Expression; Patient and Population Phenotype Characteristics; Gene Copy Number; Transcriptional Regulation; DNA Methylation; Chromatin Structure and Function; Sequence Variation]

Questions

