BioData World Basel 2018

adeslat 111 views 39 slides Nov 29, 2018

About This Presentation

Introduction to the Jackson Laboratory, JAX Mice, Clinical and Research Services (JMCRS), and the Clinical Services and Scientific Services at the Jackson Laboratory. Differences between long-read and short-read sequencing. FAIR Data Action Plan. Metadata needs. Data Commons and the need to capture the sample-specific gene models that are discovered.


Slide Content

NGS: How what we are measuring impacts data models and implications for data commons
Anne Deslattes Mays, PhD, Principal Computational Scientist

How to handle the disruption of new measurement technologies in our data ecosystem?

What does this mean for data science?

How do we become better data stewards?

Disclaimer (11/29/18, BioDataWorld Congress - Basel): This presentation was prepared by Anne Deslattes Mays, PhD, in her personal capacity. The opinions expressed in this presentation are the author's own and not necessarily the views and opinions of the Jackson Laboratory.

Talk Overview
1. Introduction to the Jackson Laboratory
2. What is Next Generation Sequencing data used for today?
3. How do we handle the disruptions new measurement technologies bring?
4. What is proper data stewardship for data science?
5. What does this mean for Data Commons?
6. How do we capture the context and precision of measurements?

The Jackson Laboratory (https://www.jax.org/)
To discover precise genomic solutions for disease and empower the global biomedical community in the shared quest to improve human health.

The Jackson Laboratory (https://www.jax.org/)

Recent News

JAX® Mice, Clinical and Research Services
1. >10,000 mouse strains supporting biomedical research
2. >80% of research publications citing mouse strains use JAX® Mice
3. >30,000 peer-reviewed publications cite use of JAX® Mice
4. >22,000 genetically diverse background strains cryopreserved
5. >2,500 strains successfully cryorecovered by JAX each year
6. >75 new models created with CRISPR on different genetic backgrounds
7. Every month, hundreds of publications reference JAX® Mice strains

JAX Clinical Genomics Laboratory (CGL) Offerings: Sample Types Validated for Testing
- FFPE: Formalin Fixed Paraffin Embedded tissue (solid tumors)
- Cell Free DNA
- Whole Blood
- Buccal Swab
- Saliva
Cancer; Inherited Disorders
Honey Reddi, PhD, FACMG, Clinical Lab Director

Who Do We Serve?
Clinicians; Pharma + Biotech; Academia + JAX PIs
- CLIA validated tests
- Research Assays
- Custom Assay Development

Assays for Confirmation of Variants: 48-60 samples/run; turnaround time (TAT) of ~6 days if primers/probes are available in-house

Research Assays: PDX. A suite of assays for mutational and expression analysis of patient-derived xenograft (PDX) tissue, including PDX filtering.

Clinical Knowledge Base

Scientific Services at JAX (✔️ = using NGS technologies)
1. JAX-GM Cellular Engineering ✔️
2. Microbial Genomics ✔️
3. Single Cell Biology ✔️
4. Genome Technologies ✔️
5. Center for Biometric Analysis
6. PDX Research and Development ✔️
7. Microscopy Services
8. Flow Cytometry
9. Mass Spectrometry and Protein Chemistry
10. Monoclonal Antibody Services

Gordon Bell Prize Super Computing 2018
750,000 human genome types, associated with more than a billion medical records over a 20-year period.

Oxford Nanopore Offerings

Human poly(A) transcriptome
Workman, Rachael E., et al. "Nanopore native RNA sequencing of a human poly(A) transcriptome." bioRxiv (2018): 459529.

PacBio vs Oxford Nanopore Sequencing
https://blog.genohub.com/2017/06/16/pacbio-vs-oxford-nanopore-sequencing/

PacBio Consensus Accuracy > 99%
raw PacBio reads also differ in error types (more indels than mismatches) and have a much higher abundance (∼13–15%, Table 1), though they are spread randomly across the reads (25,26). This randomness enables highly accurate consensuses (>99%) to be built up rapidly by sequencing multiple times the same molecule (CCS reads)
Simon Ardui, Adam Ameur, Joris R Vermeesch, Matthew S Hestand; Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics, Nucleic Acids Research, Volume 46, Issue 5, 16 March 2018, Pages 2159–2168, https://doi.org/10.1093/nar/gky066
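
The claim that randomly distributed errors average out over repeated passes can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model and not PacBio's actual CCS consensus algorithm: it assumes each pass miscalls a base independently with the same probability, ignores indels, and calls the consensus correct only when a strict majority of passes is correct (a pessimistic bound, since wrong calls need not agree with each other). The function names are illustrative.

```python
from math import comb


def consensus_accuracy(passes: int, error_rate: float) -> float:
    """Probability that a strict majority of independent passes is correct at one base.

    Simplified binomial model (an assumption, not the real CCS method).
    """
    correct = 1.0 - error_rate
    needed = passes // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(passes, k) * correct**k * error_rate ** (passes - k)
        for k in range(needed, passes + 1)
    )


def passes_for_target(target: float, error_rate: float, max_passes: int = 99) -> int:
    """Smallest odd number of passes whose modeled consensus accuracy reaches `target`."""
    for passes in range(1, max_passes + 1, 2):
        if consensus_accuracy(passes, error_rate) >= target:
            return passes
    raise ValueError("target accuracy not reached within max_passes")


if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9):
        print(n, round(consensus_accuracy(n, 0.15), 4))
```

Under these assumptions, a 15% per-pass error rate crosses 99% modeled accuracy at nine passes. Because the model is a lower bound that ignores indels, it shows the trend only and makes no prediction about instrument performance.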

All measurements taken on biological samples are made within the context of instrument limitations, the procedures followed in preparing samples for measurement, and the condition and context of the samples being measured.
Raw result data, quality data, metadata, and the procedures used to transform the measurement data from the instrument and/or the experimental procedures are best captured at the time of experimental design to aid in primary and secondary processing.
- Biological Sample Details Need Metadata
- Library Construction Details Need Metadata
- Instrument Details Need Metadata
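
One way to picture capturing these three kinds of metadata at design time is a small structured record that travels with the raw data. The sketch below is a minimal illustration only: every class name, field name, and value is hypothetical and does not reflect any JAX or repository schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SampleMetadata:
    sample_id: str
    tissue: str
    collected_on: str      # ISO 8601 date
    collection_site: str


@dataclass
class LibraryMetadata:
    protocol: str
    fragmented: bool       # fragmented (short read) or unfragmented (long read) library


@dataclass
class InstrumentMetadata:
    platform: str
    model: str
    chemistry_version: str


@dataclass
class MeasurementRecord:
    sample: SampleMetadata
    library: LibraryMetadata
    instrument: InstrumentMetadata
    raw_data_uri: str

    def missing_fields(self) -> List[str]:
        """Return dotted names of fields left empty, so gaps surface before sequencing."""
        missing = []
        for section, value in asdict(self).items():
            if isinstance(value, dict):
                missing += [f"{section}.{k}" for k, v in value.items() if v in ("", None)]
            elif value in ("", None):
                missing.append(section)
        return missing


# Hypothetical example record; all values are placeholders.
example = MeasurementRecord(
    sample=SampleMetadata("S-001", "liver", "2018-11-01", "site-A"),
    library=LibraryMetadata("full-length cDNA", fragmented=False),
    instrument=InstrumentMetadata("long-read sequencer", "model-X", ""),
    raw_data_uri="file:///data/S-001/reads.fastq",
)

if __name__ == "__main__":
    print(json.dumps(asdict(example), indent=2))
    print("missing:", example.missing_fields())
```

Running the completeness check when the experiment is designed, before any data are generated, is the point of the exercise: here the empty chemistry version is flagged while it is still easy to record.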

How do we handle the disruptions new measurement technologies bring?
- Long reads sequence unfragmented cDNA libraries
- Short reads are sequenced on fragmented cDNA libraries
- Capturing the full-length (5’ UTR to 3’ UTR) open reading frames at the transcript level
- Measuring the transcriptome allows us to peer into the proteome
- Validation can occur with peptides
- This sample-specific transcriptome contains alternatively spliced transcripts specific to the sample collected, altering the gene model for that sample
- We need to capture the gene model in Data Commons for future reuse
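
To make "altering the gene model for that sample" concrete, the sketch below compares full-length transcripts observed in a sample against a reference annotation by their intron chains (the ordered list of splice junctions). It is a simplified illustration under stated assumptions: transcripts are on one chromosome and one strand, coordinates are 1-based inclusive, and all transcript names and coordinates are invented.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; hypothetical coordinates
IntronChain = Tuple[Tuple[int, int], ...]


def intron_chain(exons: List[Exon]) -> IntronChain:
    """Return the introns between consecutive exons, which define the splicing pattern."""
    ordered = sorted(exons)
    return tuple((left[1] + 1, right[0] - 1) for left, right in zip(ordered, ordered[1:]))


def novel_isoforms(reference: Dict[str, List[Exon]], sample: Dict[str, List[Exon]]) -> List[str]:
    """Names of sample transcripts whose intron chain is absent from the reference."""
    known = {intron_chain(exons) for exons in reference.values()}
    return sorted(name for name, exons in sample.items() if intron_chain(exons) not in known)


# Invented reference gene model with a single three-exon transcript.
reference_model = {"REF_T1": [(100, 200), (300, 400), (500, 600)]}

# Invented long-read transcripts from one sample.
sample_transcripts = {
    "read_A": [(90, 200), (300, 400), (500, 620)],  # same splicing, different UTR ends
    "read_B": [(100, 200), (500, 600)],             # middle exon skipped
}

if __name__ == "__main__":
    print(novel_isoforms(reference_model, sample_transcripts))
```

Here `read_B` is reported because its exon-skipping pattern is not in the reference, while `read_A` matches despite different transcript ends. Transcripts flagged this way are the sample-specific additions to the gene model that would need to be stored alongside the data for later reuse.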

FAIR Data Action Plan (Preliminary Steps)
Interim recommendations and actions from the European Commission Expert Group on FAIR data
1. Define and apply FAIR appropriately
2. Develop and support a sustainable FAIR data ecosystem
3. Ensure FAIR data and certified services to support FAIR

FAIR Data Object (Core Bits)

A Typical Researcher’s Path
[Diagram. Stages: Grant Award; Genome Technologies, Imaging Services, Single Cell Services; Data Analysis; Paper Writing & Acceptance; Repeat. Analysis environment: Google Cloud Platform, Docker, ISB-CGC, TCGA, JAX Pipelines, API, Analysis Program, URL, /mnt/input, /mnt/output, RESULTS. Data tiers: TIER 1, TIER 2, TIER 3. Repositories: SRA, GEO.]

Where is the metadata and where is it captured?
[Same researcher's-path diagram as the previous slide, annotated with the following metadata questions.]
- BioProject: What was the question being asked?
- Experimental Design: What tissue is being measured? How was the library constructed? At what time points were the data collected?
- SRA: BioSample: Raw FASTQ files stored; controlled access data?
- Matrices: Junction Count by Sample
- Instrument Details: Which version of the instrument? What chemistries?
- Sample Collection Details: affects quality; when and where were the samples collected?
- Library Construction Details: fragmented or unfragmented libraries?
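
The "Junction Count by Sample" matrix named above can be sketched as follows. This is an illustration only, assuming per-sample junction counts are already available as dictionaries; the junction identifier format, sample names, and counts are all invented.

```python
from typing import Dict, List, Tuple

Junction = str  # e.g. "chr1:201-299"; hypothetical identifier format


def junction_matrix(
    counts_by_sample: Dict[str, Dict[Junction, int]]
) -> Tuple[List[str], List[Junction], List[List[int]]]:
    """Build a junction-by-sample count matrix, filling unobserved junctions with 0."""
    samples = sorted(counts_by_sample)
    junctions = sorted({j for counts in counts_by_sample.values() for j in counts})
    rows = [[counts_by_sample[s].get(j, 0) for s in samples] for j in junctions]
    return samples, junctions, rows


# Invented counts for two samples.
example_counts = {
    "sample_A": {"chr1:201-299": 12, "chr1:401-499": 10},
    "sample_B": {"chr1:201-499": 7},
}

if __name__ == "__main__":
    samples, junctions, rows = junction_matrix(example_counts)
    print("junction", *samples, sep="\t")
    for junction, row in zip(junctions, rows):
        print(junction, *row, sep="\t")
```

A matrix like this is only interpretable alongside the metadata listed above: the column names must resolve to sample records, and the junction coordinates to a stated genome build and gene model.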

NCI Cancer Research Data Commons

#datagovernancematters

Data Commons Data Management for Data Stewardship:
1. Data management plans are needed for the data produced
2. We need metadata (data about our data), including for instruments
3. We need to adhere to W3C standards and RDF, use data catalogs, and publish data
4. Ontologies should be used everywhere
5. More metadata needs to be captured
6. Data need to be FAIR for both humans and machines
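
As one possible reading of "adhere to W3C standards, RDF, data catalogs", a dataset description can be written as JSON-LD using terms from the W3C DCAT vocabulary and Dublin Core. The sketch below is illustrative, not a prescribed schema: the identifiers and URLs are placeholders, and the `REQUIRED_KEYS` checklist is a hypothetical minimum chosen for this example, not an official FAIR metric.

```python
import json
from typing import Dict, List

# Hypothetical minimal checklist for this example only.
REQUIRED_KEYS = ["@id", "dct:title", "dct:license", "dcat:distribution"]


def dataset_record(identifier: str, title: str, license_url: str,
                   download_url: str, media_type: str) -> Dict:
    """Describe one dataset as JSON-LD using DCAT and Dublin Core terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:license": {"@id": license_url},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": media_type,
            }
        ],
    }


def missing_keys(record: Dict) -> List[str]:
    """List checklist keys that are absent or empty in the record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


if __name__ == "__main__":
    record = dataset_record(
        "https://example.org/dataset/S-001",           # placeholder identifier
        "Long-read transcriptome of sample S-001",     # placeholder title
        "https://creativecommons.org/licenses/by/4.0/",
        "https://example.org/data/S-001/reads.fastq",  # placeholder URL
        "text/plain",
    )
    print(json.dumps(record, indent=2))
    print("missing:", missing_keys(record))
```

A machine-readable record with a persistent identifier, an explicit license, and a resolvable distribution covers the machine half of "FAIR for both humans and machines"; ontology terms for tissue, assay, and instrument would be added in the same way.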

THANK YOU!