Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Goldenadmin
24 slides
Jun 02, 2014
About This Presentation
GenomeBrowse, a free visualization tool for all types of sequence data, was introduced in 2012 to broad acclaim. Researchers using GenomeBrowse discovered a product far beyond the status quo with seamless navigation of sequence alignments and other genomic data using a fluid, fast, and intuitive interface that just "made sense." Recent updates to GenomeBrowse, including support for VCF files and BED files and the ability to export tables of data extracted from viewable annotation tracks, further improved the product and created new synergy with Golden Helix SNP & Variation Suite (SVS).
This webcast will demonstrate the ability of GenomeBrowse to stream sequence alignment data from the Amazon Cloud, seamlessly transitioning between whole genome views and base-pair resolution in the context of both public and custom annotation tracks. We will show how GenomeBrowse can be used in conjunction with SVS to highlight false variant calls, confirm the inheritance pattern of putative functional variants, and aid in the interpretation of a variant's impact. Examples of RNA-seq expression analysis, somatic variation in cancer, and family-based DNA-seq analysis will be included.
Size: 5.66 MB
Language: en
Added: Jun 02, 2014
Slides: 24 pages
Slide Content
RNA/DNA Sequencing and Genotyping Analysis
Bryce Christensen, PhD, Director of Services and Statistical Geneticist
Genetic Data Visualization and Analysis with Golden Helix

Questions during the presentation?
Use the Questions pane in your GoToWebinar window

SNP & Variation Suite (SVS)
- Core Features
  - Powerful Data Management
  - Rich Visualizations
  - Robust Statistics
  - Flexible
  - Easy to use
- Applications (Packages)
  - Genotype Analysis
  - DNA sequence analysis
  - CNV Analysis
  - RNA-seq differential expression
  - Family Based Association

Today’s Agenda
1. Getting started with GenomeBrowse and new features
2. RNA-Seq differential expression example using SVS and GenomeBrowse
3. Somatic mutation analysis example

Acknowledgments
- NA12878 WGS data is from the Illumina Genome Network.
- RNA-seq example data provided by EA; the same data is available to all SVS and GenomeBrowse users.
- Gastric cancer sample pair described here: Zang et al., Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat. Genet. 44, 570-574 (2012).
- A big thank you to the Golden Helix product development team!

Questions?
Use the Questions pane in your GoToWebinar window
END
Visualization Experience
- Natural zooming and navigation controls that mimic familiar panning and scrolling actions.
- Coverage and pile-up views with different modes to highlight mismatches and look for strand bias.
- Deep, stable stacking algorithms to look at all reads in a pile-up, not just the first 10 or 20.
- Easily generate exportable tables of any data in the viewable window.
- Context-sensitive information by clicking on any feature.
- A dynamic labeling system which gives optimal detail on annotation features without cluttering the view.

Data Streaming
- Cloud-based repository of public annotations including dbSNP, 1000 Genomes, NHLBI 6500 Exomes, UCSC Known Genes, Ensembl, the OMIM catalog, and much more.
- All public annotation tracks are hosted on the cloud with optimized on-demand streaming so that you don’t have to download them to start viewing data.
- Annotations are updated frequently and automatically by Golden Helix so that you can have immediate access to the most up-to-date information.
- Additional species are also available including cattle, sheep, major food crops, model organisms, etc.
Flexible Data Storage Options
- Easily manage repositories of BAM files, whether they are on local hard drives or network attached storage, with an integrated download manager that can be used to create local copies of cloud-based public or private data files.
- GenomeBrowse is tightly integrated with the EA Pipeline so that EA customers have immediate streaming access to their RNA-Seq analysis outputs, saving terabytes of data download.
- GenomeBrowse users are required to create an account, enabling secure connections to cloud-based data sources.
- Illumina BaseSpace integration coming soon.
SVS Example: Ogden Syndrome Data
- Lethal X-linked recessive disease affecting males in multiple generations of a family in Ogden, Utah.
- Sequenced 5 members of the family, including 3 carriers, to identify the causal mutation.
- Dr. Lyon graciously shared this data with GHI.

Study Design
- Five family members sequenced using X-chromosome exome capture
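An X-linked recessive study design like this one suggests a simple genotype filter. The sketch below is only an illustration, not how SVS implements it. It assumes genotypes on chrX are encoded as alternate-allele counts (0 or 1 for hemizygous males, 0 to 2 for females), and the sample names and role labels are hypothetical.

```python
def fits_x_linked_recessive(genotypes, roles):
    """Return True if a chrX variant is consistent with X-linked recessive inheritance.

    genotypes: dict of sample name -> alternate allele count (assumed encoding)
    roles:     dict of sample name -> 'affected_male', 'unaffected_male',
               or 'carrier_female' (hypothetical labels)
    """
    expected_alt_count = {
        "affected_male": 1,    # hemizygous for the alternate allele
        "unaffected_male": 0,  # hemizygous for the reference allele
        "carrier_female": 1,   # heterozygous
    }
    for sample, role in roles.items():
        if genotypes[sample] != expected_alt_count[role]:
            return False
    return True


# Hypothetical five-member family: one affected male, one unaffected male,
# and three carrier females.
roles = {
    "son": "affected_male",
    "uncle": "unaffected_male",
    "mother": "carrier_female",
    "aunt": "carrier_female",
    "grandmother": "carrier_female",
}
candidate = {"son": 1, "uncle": 0, "mother": 1, "aunt": 1, "grandmother": 1}
rejected = {"son": 1, "uncle": 1, "mother": 1, "aunt": 1, "grandmother": 1}
print(fits_x_linked_recessive(candidate, roles))
print(fits_x_linked_recessive(rejected, roles))
```

A real filter would also need to handle missing genotypes and the pseudoautosomal regions, which this sketch ignores.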
The Human Reference Sequence
- Genome Reference Consortium (GRCh37)
  - Feb 2009; previous was NCBI36, March 2006
  - 9 alt loci and 187 patches (11 patch releases)
- Supercontigs: large unplaced contigs
  - Some localized to chr level and some unknown
- Does not include a mitochondrial reference
  - UCSC hg19 includes older NCBI36 MT
  - 1000 Genomes Project using revised Cambridge Reference Sequence (rCRS)
  - Provide “g1k” reference: includes rCRS, Human herpesvirus 4 type 1, supercontigs and “decoy” sequence
- v38 genome coming this summer:
  - Incorporate all patches into the reference
  - Some allele fixes to have reference match major
Single Nucleotide Variants (i.e. SNVs or SNPs)
- Single base substitution from reference
- Note that “reference” is not always the “major” allele
- “Multi-allelic” sites have more than 2 cataloged alleles
(Image credit: Gholson Lyon, 2012)
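The distinction between the reference allele and the major allele can be checked directly from allele counts. This is a minimal sketch assuming the observed alleles at one site are given as a plain list of bases; the function name and return fields are illustrative only.

```python
from collections import Counter


def allele_summary(ref, observed_alleles):
    """Summarize one site from a list of observed alleles (one per chromosome)."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    major = max(counts, key=counts.get)
    # Count the reference allele as cataloged even if no sample carries it.
    cataloged = set(counts) | {ref}
    return {
        "major": major,
        "ref_is_major": major == ref,
        "multi_allelic": len(cataloged) > 2,
        "freqs": {allele: n / total for allele, n in counts.items()},
    }


# Hypothetical site: reference is A, but most chromosomes carry G.
site = allele_summary("A", ["G"] * 7 + ["A"] * 3)
print(site["major"], site["ref_is_major"], site["multi_allelic"])
```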
Small Insertions/Deletions
- Generally defined as being < 150bp (often much shorter)
- Frameshift insertions/deletions are an important “loss of function” class of variants
  - Although InDels divisible by three are “in-frame” when in coding region
- Hard to call consistently. Poor concordance between algorithms.
- Where to call an InDel in a homopolymer?
    GTTTAC
    GTTTTAC
    01234567
  - How do you describe the insertion? Ins of T at 5? Or ins of T at 1?
  - CGI in their v1 pipeline preferred calling insertion at end, others at beginning, now always at beginning
- MNP: can also be called differently
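The homopolymer ambiguity can be made concrete with a small normalization sketch. It assumes a 0-based coordinate convention where `pos` is the index of the reference base the new bases are inserted before, which may not match the numbering on the slide, and it assumes a non-empty inserted sequence. The function names are illustrative and not taken from any particular variant caller.

```python
def apply_insertion(ref, pos, ins):
    """Return the sequence produced by inserting `ins` before ref[pos]."""
    return ref[:pos] + ins + ref[pos:]


def left_align_insertion(ref, pos, ins):
    """Shift an insertion as far left as possible without changing the result."""
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]  # rotate the inserted bases by one
        pos -= 1
    return pos, ins


def right_align_insertion(ref, pos, ins):
    """Shift an insertion as far right as possible without changing the result."""
    while pos < len(ref) and ref[pos] == ins[0]:
        ins = ins[1:] + ins[0]
        pos += 1
    return pos, ins


def is_in_frame(indel_length):
    """InDels whose length is divisible by three keep the reading frame."""
    return indel_length % 3 == 0


ref = "GTTTAC"
print(left_align_insertion(ref, 4, "T"))   # "beginning" convention
print(right_align_insertion(ref, 1, "T"))  # "end" convention
print(apply_insertion(ref, 1, "T") == apply_insertion(ref, 4, "T"))
```

Both placements produce the same sequence, GTTTTAC, which is why two callers can report the same insertion at different positions unless they agree on a convention.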
Copy Number Variants
- Best results with WGS
- CNVs > 10kb pretty accurate. Under 10kb problematic.
- Detecting deletions
  - Can see coverage drop to near zero
  - Harder to pinpoint breakpoint
  - Possible false positives in low-mappability regions
- Amplifications
  - Can see coverage jump
  - False positives due to sample prep or sequence artifacts
- Need “baseline,” look at Log Ratio
  - Somatic detection uses normal tissues
  - Can have control population
(Figures: Venter vs Watson WGS CNV-seq; 64kbp gain in DDAH1 gene of NA12878)
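The "baseline" and log ratio idea can be sketched as follows. This is a simplified illustration, assuming read depth has already been summarized into equal-width bins for the sample and for a baseline (for example matched normal tissue). The pseudocount and the gain and loss thresholds are arbitrary assumptions, not values from the presentation or from SVS.

```python
import math


def log2_ratios(sample_cov, baseline_cov, pseudocount=0.5):
    """Per-bin log2 ratio of depth-normalized coverage, sample vs. baseline."""
    sample_total = sum(sample_cov)
    baseline_total = sum(baseline_cov)
    ratios = []
    for s, b in zip(sample_cov, baseline_cov):
        # The pseudocount avoids taking the log of zero in empty bins.
        sample_frac = (s + pseudocount) / sample_total
        baseline_frac = (b + pseudocount) / baseline_total
        ratios.append(math.log2(sample_frac / baseline_frac))
    return ratios


def call_bins(ratios, gain_threshold=0.4, loss_threshold=-0.6):
    """Label each bin using fixed thresholds (assumed values)."""
    calls = []
    for r in ratios:
        if r >= gain_threshold:
            calls.append("gain")
        elif r <= loss_threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical binned coverage: one amplified bin and one near-zero bin.
sample = [100, 100, 200, 100, 5]
baseline = [100, 100, 100, 100, 100]
calls = call_bins(log2_ratios(sample, baseline))
print(calls)
```

Real callers typically add GC and mappability correction and segment adjacent bins before calling, which this sketch leaves out.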
Structural Variants
- Looking for:
  - Balanced rearrangements
  - Inversions
  - Translocations
  - Complex
- Signals to detect SV:
  - Paired-end mappings / insert length
  - Depth of coverage
  - Split-read mapping
- Translocations can result in “fusion” genes. For example, the BCR-ABL fusion gene is central to the pathogenesis of certain leukemias.

Example 1kb Inversion (intron of APP)
Let’s take a look at this one…

About Golden Helix
- Golden Helix: Leaders in Genetic Analytics
  - Founded in 1998
  - Multi-disciplinary: computer science, bioinformatics, statistics, genetics
  - Software and analytic services
- In Everything We Do…
  - Empowerment
  - Simplicity
  - Responsiveness
  - Excellence
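The paired-end signal can be illustrated with a toy classifier for individual read pairs. This is a sketch under stated assumptions: a library whose properly paired reads map to opposite strands of the same chromosome, and a hypothetical expected insert size of 300 bp with a standard deviation of 30 bp. The `ReadPair` type and the category names are made up for illustration, and a real SV caller would require many supporting pairs plus depth and split-read evidence.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, mean_insert=300, sd_insert=30, n_sd=3):
    """Label a read pair by the kind of structural variant it might support."""
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    if pair.strand1 == pair.strand2:
        return "inversion_candidate"
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion_candidate"   # mates map farther apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion_candidate"  # mates map closer than expected
    return "concordant"


pairs = [
    ReadPair("chr21", 1000, "+", "chr21", 1300, "-"),
    ReadPair("chr21", 1000, "+", "chr21", 2300, "+"),
    ReadPair("chr9", 5000, "+", "chr22", 7000, "-"),
    ReadPair("chr21", 1000, "+", "chr21", 6000, "-"),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```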
Hundreds of Customers World-Wide
- Over 750 Published Citations
Visualization
- Genome browsers:
  - Validate variant calls
  - Look at gene annotations, problematic regions, population catalogs
  - Compare samples where no variant called
- Free genome browsers:
  - IGV: popular desktop browser by Broad
  - UCSC: web-based, extensive annotations
  - GenomeBrowse: designed to be publication ready; smooth zoom and navigation