201007131ghfjjkklllllllll14012254-152438.ppt

nimrahfarooq 12 views 32 slides Jun 06, 2024

Functional Genomics with Next-Generation Sequencing
Jen Taylor
Bioinformatics Team
CSIRO Plant Industry

CSIRO.INI Meeting July 2010 -Tutorial -Applications
Capacity and Resolution
•Next generation sequencing
•Increasing capacity leads to increased resolution
Eric Lander, Broad Institute

How Does a Genome Work?
Parts Description
•Function?
•Interconnectedness?
Comparisons
•Population-level
•Between genomes

Application domains
•Reference genome
•No reference genome: Partially sequenced or UNsequenced (“PUN” genomes)

Impact of a Reference Genome
•Reference genome: Sequence Data → Alignment to Genome → Read Density → Characterisation
•No reference genome: Sequence Data → Assembly → Contigs → Characterisation

Applications of Next Generation Sequencing
•Profiling of Variation
•Genetic variation
•Transcript variation
•Epigenetic variation
•Metagenomic variation
•Discovery
•Novel genomes
•Novel genes
•Novel transcripts
•Small / long non-coding RNA
Today:
RNA Sequencing (RNASeq)
•Coding and non-coding transcript profiling
•Dynamic and context-dependent
Epigenomics
•Genome-wide protein-DNA interactions, DNA modifications
•Heritable and reversible regulation of gene expression

RNASeq
•Qualitative – transcript diversity
•Quantitative – transcript abundance
•Impact of NGS
•Observation of transcript complexity
•Transcript discovery
•Small / long non-coding RNA
•Analytical challenges
•Transcript complexity
•Compositional properties

RNASeq
Sample: Total RNA, PolyA RNA or Small RNA
→ Library Construction
→ Sequencing
→ Base calling & QC
→ Mapping to Genome (Reference) or Assembly to Contigs (PUN)
→ Analysis:
•Digital “Counts”: Reads per kilobase per million (RPKM)
•Transcript structure
•Secondary structure
•Targets or Products
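The RPKM normalisation named on this slide can be sketched in a few lines. This is a minimal illustration, assuming the usual definition (reads mapped to a transcript, divided by transcript length in kilobases and by total mapped reads in millions); the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb transcript, 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

Dividing by length and by library size makes counts comparable between transcripts of different lengths and between runs of different depth, but it does not remove the compositional effects discussed later in the deck.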

RNASeq – Transcript Complexity
Mapping:
•Reads with multiple locations
•Conserved domains?
•Sequencing error?
•Reads spanning exons
•Gapped alignments?
•Sequencing error?
ERANGE pipeline: Mortazavi et al., Nature Methods 5(7), July 2008

RNASeq – Compositional properties
Depth of Sequence
•Sequence count ≈ Transcript Abundance
•Majority of the data can be dominated by a small number of highly abundant transcripts
•Ability to observe transcripts of smaller abundance is dependent upon sequence depth
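The dependence of detection on depth can be illustrated with a rough calculation. The sketch below assumes read counts for a transcript follow a Poisson distribution with mean equal to depth times the transcript's share of the library; that model, the function name and the example fractions are assumptions for illustration, not something stated on the slide.

```python
import math


def detection_probability(total_reads, transcript_fraction, min_reads=1):
    """P(at least `min_reads` reads) under an assumed Poisson model."""
    expected = total_reads * transcript_fraction
    # Probability of seeing fewer than `min_reads` reads.
    p_below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


# Hypothetical transcript making up one millionth of the library.
for depth in (1_000_000, 5_000_000, 20_000_000):
    print(depth, round(detection_probability(depth, 1e-6, min_reads=5), 3))
```

Under this assumption a rare transcript that is almost never seen five times at one million reads becomes reliably observable at twenty million.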

RNASeq – Compositional properties
Composition
•Sequence counts are a composition of a fixed number of total sequence reads
•Therefore they are sum-constrained and not independent
•Large variations in component numbers and sizes can produce artefacts
[Figure panels: True Reads; RPKM]
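The sum constraint is easy to see with a toy example. In the sketch below, which uses made-up abundances and a hypothetical helper `expected_counts`, only transcript t3 changes between the two samples, yet the expected counts for t1 and t2 fall because the total number of reads is fixed.

```python
def expected_counts(abundances, total_reads):
    """Share a fixed number of reads among transcripts in proportion to abundance."""
    total_abundance = sum(abundances.values())
    return {
        name: total_reads * abundance / total_abundance
        for name, abundance in abundances.items()
    }


# Hypothetical true abundances: only t3 differs between the samples.
sample_a = {"t1": 100, "t2": 100, "t3": 100}
sample_b = {"t1": 100, "t2": 100, "t3": 700}

counts_a = expected_counts(sample_a, 900_000)
counts_b = expected_counts(sample_b, 900_000)
print(counts_a["t1"], counts_b["t1"])
```

Here t1 appears to drop threefold although its true abundance is unchanged, which is the kind of artefact the slide warns about.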

RNASeq – Correspondence
•Good correspondence with:
•Expression arrays
•Tiling arrays
•qRT-PCR
•Range of up to 5 orders of magnitude
•Better detection of low-abundance transcripts
•Greater power to detect:
•Transcript sequence polymorphism
•Novel trans-splicing
•Paralogous genes
•Individual cell type expression

Reference Genome – RNASeq
Human Exome
Number of exons targeted: ~180,000 (CCDS database)
plus 700+ miRNA (Sanger v13)
300+ ncRNA

Epigenome
•Protein-DNA interactions [ChIPSeq]
•Nucleosome positioning
•Histone modification
•Transcription factor interactions
•Methylation [MethylSeq]
•Impact of NextGen
•Whole genome profiling
•Resolution
•Analytical challenges
•Systematic bias
•Unambiguous mapping
•Robust event calling
Image: ClearScience

ChIPSeq
[Diagram: MNase linker digest → Remove nucleosomes → Sequence & align]

ChIPSeq methods (Pepke et al., 2009)
•CisGenome
•ERANGE
•FindPeaks
•F-Seq
•GLITR
•MACS
•PeakSeq
•QuEST
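The tools listed here differ considerably in how they model signal and background. As a deliberately naive sketch of the shared idea (find regions where aligned reads pile up), the code below counts read start positions in fixed windows and reports windows above a threshold. It is not the algorithm of any listed tool; the function name, window size and threshold are all assumptions.

```python
from collections import Counter


def call_enriched_windows(read_starts, window_size, min_reads):
    """Return (window_start, window_end, read_count) for windows with enough reads."""
    counts = Counter(position // window_size for position in read_starts)
    return sorted(
        (window * window_size, (window + 1) * window_size, n)
        for window, n in counts.items()
        if n >= min_reads
    )


# Hypothetical read start coordinates on one chromosome.
read_starts = [5, 12, 105, 110, 118, 130, 190, 420]
print(call_enriched_windows(read_starts, window_size=100, min_reads=4))
```

Real peak callers add strand-shift modelling, control samples and statistical significance, which is where the "robust event calling" challenge arises.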

MethylSeq using Bisulfite conversion
•Cytosine → (bisulfite conversion) → Uracil → (PCR) → Thymine
•5-methylcytosine → (bisulfite conversion) → 5-methylcytosine → (PCR) → Cytosine
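The net effect of bisulfite conversion followed by PCR can be simulated on a string: every unmethylated C is read as T, while a methylated C is still read as C. The sketch below is illustrative only; the function name, the short example sequence and the choice of 0-based positions are assumptions.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is read as T; C at a methylated (0-based) position stays C.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in protected else base
        for index, base in enumerate(sequence)
    )


# Illustrative strand with a single methylated cytosine at position 1.
original = "ACGTTCTCCAGTC"
converted = bisulfite_convert(original, {1})
print(original)
print(converted)
```

Comparing the converted read with the original sequence reveals which cytosines were protected, which is the basis of methylation calling.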

Limited publications from BS-Seq
•Mammals
•Methylation predominantly occurs at CpG sites
•Several publications in human
•One publication in mouse
•Plants
•Methylation occurs at CG, CHH, CHG sites
•Two publications in Arabidopsis
H = A, C or T

Problems of mapping BS-seq reads
•Reduced sequence complexity
Cm = methylated C; C = un-methylated C
Watson >> A Cm G T T C T C C A G T C >>
Bisulfite conversion:
>> A Cm G T T T T T T A G T T >>
>> A C G T T T T T T A G T T >>

Problems of mapping BS-seq reads
•Increased search space
Watson >> A Cm G T T C T C C A G T C >>
Crick << T G Cm A A G A G G T C A G <<
Bisulfite conversion:
BSW >> A Cm G T T T T T T A G T T >>
BSC << T G Cm A A G A G G T T A G <<
PCR:
BSW >> A Cm G T T T T T T A G T T >>
BSWR << T G C A A A A A A T C A A <<
BSCR >> A C G T T C T C C A A T C >>
BSC << T G Cm A A G A G G T T A G <<
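The four templates present after bisulfite conversion and PCR can be derived mechanically from the Watson strand. The sketch below follows the slide's labels (BSW, BSC, BSWR, BSCR) and writes every strand left to right as drawn, so the complement is taken without reversal. The helper names and the 0-based methylated positions are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(sequence):
    """Base-wise complement, keeping the drawn left-to-right order (no reversal)."""
    return sequence.translate(COMPLEMENT)


def bisulfite(sequence, methylated_positions):
    """Unmethylated C becomes T; methylated (0-based) positions stay C."""
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(sequence)
    )


watson = "ACGTTCTCCAGTC"
crick = complement(watson)      # drawn under Watson, read right to left
bsw = bisulfite(watson, {1})    # methylated C of the CG on Watson
bsc = bisulfite(crick, {2})     # methylated C of the same CG on Crick
templates = {
    "BSW": bsw,
    "BSWR": complement(bsw),
    "BSC": bsc,
    "BSCR": complement(bsc),
}
for label, sequence in templates.items():
    print(label, sequence)
```

After conversion the two strands are no longer complementary to each other, so a read may originate from any of four distinct sequences instead of two. That is the increased search space.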

ELAND
•Mapping reads to genome sequences
•Mapping reads to two converted genome sequences
•Cross match for reads mapping to multiple positions in converted genomes
•Mapping results were combined to generate methylation information
•ELAND only allows 2 mismatches
Lister et al., Cell (2008)
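The converted-genome strategy can be sketched as follows: convert both the reference and the read in silico (C→T, or G→A for the other strand), find exact matches in the reduced alphabet, then go back to the unconverted sequences to call methylation. This is an illustration of the idea only, not the ELAND program or the pipeline of Lister et al.; the function names and the toy genome are hypothetical, and mismatches are not handled.

```python
def c_to_t(sequence):
    return sequence.replace("C", "T")


def g_to_a(sequence):
    return sequence.replace("G", "A")


def find_all(text, pattern):
    """All start offsets of `pattern` in `text`, including overlapping ones."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def map_read(read, genome):
    """Exact-match a read against the C->T and G->A converted genomes."""
    hits = []
    for label, convert in (("C2T", c_to_t), ("G2A", g_to_a)):
        for start in find_all(convert(genome), convert(read)):
            hits.append((label, start))
    return hits


def call_cytosines(read, genome, start):
    """For a C2T hit: a genomic C read as C is methylated, read as T is not."""
    return {
        start + i: read[i] == "C"
        for i in range(len(read))
        if genome[start + i] == "C"
    }


genome = "GGATACGTTCTCCAGTCAAT"   # toy reference
read = "ACGTTTTTTAGTT"            # bisulfite-converted read
hits = map_read(read, genome)
print(hits)
print(call_cytosines(read, genome, hits[0][1]))
```

A read that matches more than one position across the converted genomes would need the cross-matching step mentioned on the slide before it could be used.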

BSMAP
•Based on a hash table seeding algorithm
Xi and Li, BMC Bioinformatics (2009)
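Hash table seeding in general works by indexing every k-mer of the reference and using exact k-mer hits from a read to propose candidate alignment positions, which are then verified. The sketch below shows only that generic idea; it is not BSMAP's implementation (which also has to account for C/T ambiguity), and the function names, seed length and sequences are assumptions.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for position in range(len(reference) - k + 1):
        index[reference[position:position + k]].append(position)
    return index


def candidate_positions(read, index, k):
    """Read start positions implied by any exact k-mer seed hit."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for position in index.get(read[offset:offset + k], ()):
            start = position - offset
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


reference = "GGATACGTTCTCCAGTCAAT"   # toy reference
index = build_seed_index(reference, k=4)
# The read carries one mismatch near its end, but earlier seeds still hit.
print(candidate_positions("CGTTCTGC", index, k=4))
```

Because only a few seeds need to match exactly, the full read can then be compared at each candidate with mismatches allowed, which is how seeding tolerates sequencing errors.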

Re-mapping of Lister’s data using BSMAP
Raw reads: 144,704,372
Method   Uniquely mapped reads   Unique and nonclonal reads   Unique and nonclonal reads (%)
Eland    55,805,931              39,113,599                   27.03%
BSMAP    67,975,425              48,498,687                   35.52%
Lister et al., Cell (2008)

Methylation pattern throughout chromosomes
[Figure: methylation level per 50 kb (axis ticks 0.20, 0.80, 1.0) against position along Arabidopsis chromosome 3, with Watson and Crick strand tracks for the CG, CHG and CHH contexts]
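A windowed methylation level like the one plotted here could be computed from per-cytosine calls roughly as follows. The sketch assumes the level is methylated read calls divided by all read calls within each 50 kb window; the slide does not give the exact definition, so that formula, the input layout and the function name are assumptions.

```python
from collections import defaultdict


def methylation_by_window(calls, window_size=50_000):
    """Aggregate (position, methylated_reads, total_reads) calls per window.

    Returns {window_start: methylated / total} for windows with coverage.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for position, methylated_reads, total_reads in calls:
        window = position // window_size
        methylated[window] += methylated_reads
        total[window] += total_reads
    return {
        window * window_size: methylated[window] / total[window]
        for window in sorted(total)
        if total[window] > 0
    }


# Hypothetical calls for one strand and one sequence context.
calls = [(1_000, 8, 10), (20_000, 2, 10), (60_000, 0, 5), (75_000, 1, 5)]
print(methylation_by_window(calls))
```

Running this separately for each strand and for the CG, CHG and CHH contexts would give one track per panel.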

Partially / Unsequenced Genomes
Options for dealing with partial or unsequenced genomes
•Wait for or generate the genome sequence
•‘Borrow’ a reference genome from a phylogenetic neighbour
•Take a deep breath and ‘do de novo’
•De novo genome
•De novo transcriptome
[Diagram: DNA or RNA Sequence Data → Partial Sequence Database / Partial Assembly → Gene Annotation, Genetic Variation, Non-coding RNA, Transcript Variation]

Plant Genomes – Haploid Size
[Figure: circles for Human, Arabidopsis, Rice, Potato, Sugarcane, Cotton, Barley and Wheat; diameter proportional to haploid genome size]

Plant Genomes – Total Size
[Figure: Human, Cotton, Barley, Sugarcane and Wheat]

De novo RNA Seq
•Why transcriptome?
•Large genome sizes with high repeat content are difficult to assemble
•Transcriptomes are more constant in size
•Enriched for functional content
•Aims:
•Transcript discovery
•Small / long non-coding RNA profiling
•Analytical challenges
•Assembly – ABySS, Velvet, Euler-SR
•Comparisons between non-discrete, overlapping transcripts
•Annotation
•Ploidy
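The assemblers named above are built around de Bruijn graphs. As a toy sketch of that idea only (none of these tools works this simply), the code below builds a graph whose nodes are (k-1)-mers and extends a contig while the path has exactly one successor. Function names, reads and k are hypothetical, and error correction, in-degree checks and reverse complements are ignored.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while there is exactly one unvisited successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (successor,) = graph[node]
        if successor in seen:       # stop on a cycle (e.g. a repeat)
            break
        contig += successor[-1]
        seen.add(successor)
        node = successor
    return contig


# Hypothetical overlapping reads from one transcript.
reads = ["ACGTTC", "GTTCTC", "TCTCCA"]
graph = de_bruijn_graph(reads, k=4)
print(extend_contig(graph, "ACG"))
```

Branches in the graph, where a node has several successors, are where repeats, overlapping transcripts and ploidy make assembly ambiguous.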

Summary – Impacts and Challenges
•RNASeq
•Increased resolution
•Increased power for transcript complexity and variation
•Analytical challenges – transcript complexity, compositional bias
•Large gains in small and long non-coding RNA profiling
•Epigenomics
•ChIPSeq and MethylSeq
•Genome-wide with resolution
•Robust event calling is challenging
•De novo transcriptomics
•Attractive option for large, repeat-rich genomes

Acknowledgements
CSIRO PI Bioinformatics Team
Andrew Spriggs
Stuart Stephen
Emily Ying
Jose Robles
Michael James
CSIRO Biostatistics
David Lovell