Transcriptomics approaches

48 slides, Dec 07, 2021



Slide Content

ADVANCES IN TRANSCRIPTOMICS AND ITS APPROACHES
CHARUPRIYA CHAUHAN, ID-52616
DOCTORAL SEMINAR I
Setia Pramana

Overview
- Transcriptome
- Approaches for transcriptome analysis
  - Candidate gene approach
  - Hybridisation based approaches
  - Sequencing based approaches
  - Emerging sequencing approaches
- Applications of transcriptomics
- Case studies

CENTRAL DOGMA OF MOLECULAR BIOLOGY: GENOME → TRANSCRIPTOME → PROTEOME
- Genome: the complete DNA content of an organism, with all its genes and regulatory sequences.
- Transcriptome: the complete set of transcripts and their relative levels of expression in a particular cell or tissue under defined conditions at a given time.
- Proteome: the complete collection of proteins and their relative levels in each cell.

TRANSCRIPTOME
The ‘transcriptome’ is defined as the complete complement of RNA molecules generated by a cell or population of cells. The term was first proposed by Charles Auffray in 1996.

TRANSCRIPTOMICS
The study of the complete set of RNAs (transcriptome) encoded by the genome of a specific cell or organism at a specific time or under a specific set of conditions is called transcriptomics. In transcriptomics, the expression of genes by a genome is studied:
- Qualitatively (identifying which genes are expressed and which are not)
- Quantitatively (measuring the varying levels of expression of different genes)

Milestones in transcriptome analysis
YEAR  MILESTONE
1965  Sequence of the first RNA molecule determined
1977  Development of the Northern blot technique and the Sanger sequencing method
1989  Reports of RT-PCR experiments for transcriptome analysis
1991  First high-throughput EST sequencing study
1992  Introduction of Differential Display (DD) for the discovery of differentially expressed genes
1995  Reports of the microarray and Serial Analysis of Gene Expression (SAGE) methods
2001  Draft of the Human Genome completed
2005  First next-generation sequencing technology (454/Roche) introduced to the market
2006  First transcriptome sequencing studies using a next-generation technology (454/Roche)

Why is transcriptomics important?
Transcriptome profiling provides clues to:
- Expressed sequences and genes of the genome
- Gene regulation and regulatory sequences
- Gene function annotation
- Functional differences between tissue and cell types
- Identification of candidate genes for any given process or disease

Transcriptomics aims:
I. To catalogue all species of transcripts, including mRNAs, noncoding RNAs and small RNAs.
II. To determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications.
III. To quantify the changing expression levels of each transcript during development and under different conditions.

Approaches for transcriptome mining

Northern Blotting Analysis
The northern blot, or RNA blot, is a technique used in molecular biology research to study gene expression by detecting RNA (or isolated mRNA) in a sample. The northern blot technique was developed in 1977 by James Alwine, David Kemp and George Stark at Stanford University.

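Reverse transcription polymerase chain reaction (RT-PCR)
A variant of the polymerase chain reaction (PCR), commonly used in molecular biology to detect RNA expression. RNA is first reverse-transcribed into complementary DNA (cDNA), which is then amplified by PCR.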

Comparison between RT-PCR and PCR

Detection chemistries for qRT-PCR: SYBR Green dye and TaqMan probes

Fluorescence emission is measured continuously during the PCR reaction, and ΔRn (the increase in fluorescence emission after the background fluorescence signal is subtracted) is plotted against cycle number. The threshold cycle (Ct) is the cycle at which the fluorescence exceeds a chosen threshold.
Fig: PCR amplification plot (y-axis: fluorescence intensity; x-axis: cycle number)
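The threshold-cycle definition above can be made concrete with a small Python sketch. It assumes the ΔRn readings are already background-subtracted and stored in a list indexed by cycle, and it locates the crossing by linear interpolation between the two cycles that straddle the threshold. The function name `threshold_cycle` is hypothetical, and real instrument software may compute Ct differently.

```python
from typing import Optional, Sequence


def threshold_cycle(delta_rn: Sequence[float], threshold: float) -> Optional[float]:
    """Return the (fractional) cycle at which ΔRn first exceeds `threshold`.

    delta_rn[i] is the background-subtracted fluorescence read at cycle i + 1.
    The crossing point is linearly interpolated between the last reading at or
    below the threshold and the first reading above it (an assumption; other
    curve-fitting rules exist). Returns None if the curve never crosses.
    """
    for i, value in enumerate(delta_rn):
        if value > threshold:
            if i == 0:
                return 1.0  # already above the threshold at the first cycle
            prev = delta_rn[i - 1]  # reading at cycle i, still <= threshold
            # Fraction of the way from cycle i to cycle i + 1 where the line hits the threshold.
            fraction = (threshold - prev) / (value - prev)
            return i + fraction
    return None  # no amplification detected above the threshold
```

For example, with ΔRn readings `[0.01, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.2, 1.8, 2.0]` and a threshold of 0.2, the curve crosses between cycles 6 and 7, giving Ct ≈ 6.25. A more abundant starting transcript crosses the threshold in fewer cycles, so a lower Ct indicates more starting template.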

One-step vs. two-step RT-qPCR
One-step assays combine reverse transcription and PCR in a single tube and buffer, using a reverse transcriptase along with a DNA polymerase. One-step RT-qPCR uses only sequence-specific primers. In two-step assays, the reverse transcription and PCR steps are performed in separate tubes, with different optimized buffers, reaction conditions and priming strategies.

MICROARRAY
A microarray is a high-throughput, nucleic acid hybridization-based technique developed to quantify gene expression levels at the whole-genome scale. It consists of single-stranded DNA (ssDNA) probes immobilized on a surface (called a chip or a slide) in a regular pattern of spots, each spot containing millions of copies of a unique DNA probe.
Principle: base-pairing hybridization.
The technique was first used in “Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray” by Mark Schena, Patrick Brown and colleagues, published in Science (1995). Mark Schena is known as the “Father of Microarray Technology”.

Methodology
Microarray analysis can be divided into:
1. Probe production: specific sequences are immobilized on a surface and reacted with labelled cDNA targets.
2. Target (cDNA) production: mRNA is extracted from the sample, converted to cDNA and labelled with fluorescent dyes (usually Cy3 or Cy5).
3. Hybridisation: the chip is exposed to a solution containing the extracted, labelled cDNA. Complementary nucleic acid sequences pair via hydrogen bonds, and non-specifically bound sequences are washed off.
4. Scanning: the array is scanned to measure the fluorescent label. Fluorescently labelled target sequences that bind to a probe sequence generate a signal. The signal depends on the hybridization conditions (e.g. temperature) and on washing after hybridization; the total strength of the signal depends on the amount of target sample.

Types of arrays
cDNA chips: the array is prepared from cDNAs, which are amplified using PCR and then spotted by inkjet technology.
- Probes are >1000 nucleotides in length
- Two different fluorophores can be used to label the test and control samples (two-channel detection)
- Cannot identify alternative splicing events
Oligonucleotide chips: short DNA oligonucleotides are synthesised directly on the solid microarray substrate using photolithography (Affymetrix) or spotted by ink-jet printing.
- Probes are short: 25 nucleotides on Affymetrix GeneChips, and 70-80 nucleotides on spotted oligonucleotide arrays
- Use a single fluorescent label (single channel)
- Can identify alternative splicing events
- Commercial oligonucleotide chips are available (e.g. the Affymetrix, Inc. GeneChip system)
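With two-channel detection on cDNA chips, the test and control signals of each spot are commonly compared as a log2 ratio, so that a value of +1 means twofold higher in the test sample. The Python sketch below shows only that arithmetic. It assumes the test sample carries one dye (e.g. Cy5) and the control the other (e.g. Cy3), subtracts a single constant background value rather than a per-spot estimate, and uses a hypothetical function name, `log2_ratios`.

```python
import math
from typing import Dict


def log2_ratios(test: Dict[str, float], control: Dict[str, float],
                background: float = 0.0) -> Dict[str, float]:
    """Per-spot log2(test / control) for a two-channel cDNA array.

    `test` and `control` map a spot or gene ID to its raw intensity in each
    channel. A constant `background` is subtracted from both channels
    (a simplifying assumption). Spots whose corrected intensity is not
    positive in either channel are skipped as indistinguishable from background.
    """
    ratios: Dict[str, float] = {}
    for gene, t in test.items():
        c = control.get(gene)
        if c is None:
            continue  # spot missing from the control channel
        t_corr, c_corr = t - background, c - background
        if t_corr <= 0 or c_corr <= 0:
            continue  # signal at or below background
        ratios[gene] = math.log2(t_corr / c_corr)
    return ratios
```

For instance, a spot reading 2100 in the test channel and 600 in the control channel with a background of 100 gives log2(2000 / 500) = 2.0, i.e. fourfold higher expression in the test sample.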

Fig: The colours of a microarray (cDNA chip and Affymetrix GeneChip)

Applications of Microarray

ADVANTAGES
- High-throughput
- Fast
- Relatively inexpensive
LIMITATIONS
- Reliance on existing knowledge about the genome sequence
- High background levels owing to cross-hybridization
- A limited dynamic range of detection due to both background and saturation of signals
- Comparing expression levels across different experiments is often difficult and can require complicated normalization methods
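The last limitation concerns normalization between arrays. One widely used method for making intensity distributions comparable across arrays is quantile normalization; the pure-Python sketch below illustrates that single method under stated assumptions (all arrays measure the same probes in the same order, ties are broken by position). It is not taken from the original slides, and the function name is hypothetical.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same intensity distribution.

    arrays[j][i] is the intensity of probe i on array j; at least one array is
    required and all must list the same probes in the same order. Each value is
    replaced by the mean, across arrays, of the values with the same rank.
    Ties are broken by position (a simplification; some tools average tied ranks).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Each array's values sorted from smallest to largest.
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(arrays)
                 for k in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity within this array (rank 0 = smallest).
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every array contains exactly the same set of values, so differences between arrays are expressed only through which probes hold which ranks.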

SAGE - Serial Analysis of Gene Expression
SAGE was invented at the Johns Hopkins University Oncology Center (USA) by Dr. Victor Velculescu in 1995. It allows rapid and detailed analysis of overall gene expression patterns, providing quantitative and comprehensive expression profiling of a given cell population.

Principles underlying the SAGE methodology
- A short sequence tag (10-14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.
- Sequence tags can be linked together to form long serial molecules (concatemers) that can be cloned and sequenced.
- Counting the number of times a particular tag is observed gives the expression level of the corresponding transcript.
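The counting principle above can be sketched in Python. This is a deliberately simplified model, not the real SAGE protocol: it assumes each sequenced concatemer is a plain run of back-to-back tags of one fixed length (ignoring ditags, linkers and anchoring-enzyme sites), and the tag-to-gene lookup table and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_tags(concatemers: List[str], tag_length: int = 14) -> Counter:
    """Split sequenced concatemers into fixed-length tags and count each tag.

    Assumes each concatemer is a run of back-to-back tags of exactly
    `tag_length` bases; a trailing fragment shorter than that is discarded.
    """
    counts: Counter = Counter()
    for seq in concatemers:
        for start in range(0, len(seq) - tag_length + 1, tag_length):
            counts[seq[start:start + tag_length]] += 1
    return counts


def expression_by_gene(tag_counts: Counter, tag_to_gene: Dict[str, str]) -> Dict[str, int]:
    """Sum tag counts per gene using a (hypothetical) tag-to-gene table.

    Tags absent from the table are pooled as "unassigned"; these may come
    from genes that are not yet known.
    """
    expression: Dict[str, int] = {}
    for tag, n in tag_counts.items():
        gene = tag_to_gene.get(tag, "unassigned")
        expression[gene] = expression.get(gene, 0) + n
    return expression
```

The count for each tag is the expression measure: a gene whose tag appears three times as often as another's is inferred to have about three times as many transcripts in the sample.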

Advantages
- The mRNA sequence does not need to be known in advance, so previously unknown genes or variants can be discovered.
- It is more accurate, as it directly counts the number of transcripts.
Disadvantages
- The gene tag is extremely short (13 or 14 bp), so if a tag is derived from an unknown gene, it is difficult to analyze with such a short sequence.
- The type IIS restriction enzyme used for tagging does not always yield fragments of the same length.

CAGE
Cap analysis gene expression (CAGE) is a gene expression technique used in molecular biology to produce a snapshot of the 5′ ends of the messenger RNA population in a biological sample. It was first published by Hayashizaki, Carninci and coworkers in 2003.
- Aims to identify transcription initiation sites (TIS) and promoters.
- Collects 21 bp tags from the 5′ ends of cap-purified cDNA.
The method essentially uses full-length cDNAs, to whose 5′ ends linkers are attached. This is followed by cleavage of the first 20 base pairs with a class II restriction enzyme, PCR, concatemerization and cloning of the CAGE tags.
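Because each CAGE tag marks the 5′ end of a capped transcript, counting mapped tag start positions and grouping nearby ones is one simple way to locate candidate transcription initiation sites. The Python sketch below assumes the tags have already been mapped to (chromosome, 5′ position, strand) and merges start sites on the same strand that lie within a fixed gap; the gap rule, its default of 20 bases, and both function names are illustrative assumptions, not the published CAGE analysis.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A mapped CAGE tag: (chromosome, genomic position of the tag's 5' end, strand).
Tag = Tuple[str, int, str]
# A cluster: (chromosome, strand, first start site, last start site, total tags).
Cluster = Tuple[str, str, int, int, int]


def tss_counts(tags: Iterable[Tag]) -> Counter:
    """Count how many CAGE tags start at each genomic position and strand."""
    return Counter(tags)


def cluster_tss(counts: Counter, max_gap: int = 20) -> List[Cluster]:
    """Merge nearby start sites into clusters (candidate promoters).

    Start sites on the same chromosome and strand that lie within `max_gap`
    bases of the previous site are added to the current cluster.
    """
    clusters: List[Cluster] = []
    # Walk sites ordered by chromosome, then strand, then position.
    for chrom, pos, strand in sorted(counts, key=lambda t: (t[0], t[2], t[1])):
        n = counts[(chrom, pos, strand)]
        if (clusters and clusters[-1][0] == chrom and clusters[-1][1] == strand
                and pos - clusters[-1][3] <= max_gap):
            c_chrom, c_strand, c_start, _, c_total = clusters[-1]
            clusters[-1] = (c_chrom, c_strand, c_start, pos, c_total + n)
        else:
            clusters.append((chrom, strand, pos, pos, n))
    return clusters
```

The total tag count of a cluster then serves as an expression measure for the transcripts initiated there.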

ESTs
In 1983, S. D. Putney first demonstrated the use of cDNA sequencing to identify genes. In 1991, Adams and co-workers coined the term expressed sequence tag (EST). ESTs are short sequences of expressed genes, randomly selected from a cDNA library, that can be used to identify and map the genes of a particular species. ESTs are usually 200 to 500 nucleotides long and are generated by sequencing the ends of cDNA clones.
Fig: Method of construction of ESTs
Uses of ESTs
- Identifying unknown genes and mapping their positions in a genome
- Providing a simple and inexpensive path for discovering new genes
- Genome map construction
- Characterization of expressed genes

RNA-Sequencing
RNA-seq, also called whole-transcriptome shotgun sequencing, refers to the use of high-throughput sequencing technologies to reveal the presence and quantity of RNA in a biological sample at a given moment.

Direct RNA sequencing
Direct single-molecule RNA sequencing without prior cDNA conversion, based on sequencing by synthesis:
- RNA molecules are polyadenylated by E. coli poly(A) polymerase I (PAP I) and blocked with 3′-deoxyATP; the poly(A) tail is ~150 nucleotides long.
- Each RNA molecule is filled in with dTTP and polymerase and then locked in position with VT-A, VT-C or VT-G, stopping further nucleotide addition.
- Unincorporated dye-labelled nucleotides are washed away and images are taken.
- The fluorescent dye and inhibitor are cleaved off the incorporated nucleotide, making it ready for additional rounds of incorporation.
- Repeating this cycle of rinsing, imaging and cleaving provides a set of images that are aligned and used to generate the sequence of each individual RNA molecule with real-time image processing.
Advantages:
- Requires only minor RNA quantities
- No biases due to cDNA synthesis, end repair, ligation and amplification procedures
- Potentially useful for studying short RNA species

Sequencing using NGS
Next-generation sequencing (NGS), also known as massively parallel sequencing, enables a large amount of sequencing to be performed in a single assay.
- Mostly produces short reads (<400 bp)
- Read numbers vary from ~1 million to ~1 billion per run

Next-generation sequencing workflow
1. Library preparation: cDNA fragmentation and in vitro adaptor ligation
2. Clonal amplification: emulsion PCR (Roche/454, SOLiD) or bridge PCR (Illumina/Solexa)
3. Cyclic array sequencing: pyrosequencing (Roche/454), sequencing-by-ligation (SOLiD) or sequencing-by-synthesis (Illumina/Solexa)

Data analysis for mRNA-seq: key steps
1. Mapping reads to the reference genome: reads from 454 sequencers can be mapped with conventional sequence aligners, whereas Illumina or SOLiD reads need a short-read aligner.
2. Quantifying the known genes
3. Prediction of novel transcripts
4. Assembly of short reads: comparative vs. de novo
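The first two steps, mapping reads and quantifying known genes, can be illustrated with a toy Python sketch. It is far simpler than a real short-read aligner: it assumes a single reference string, allows only exact, uniquely placed matches found through a k-mer seed index, and counts a read for a gene only if the read lies entirely inside the gene's hypothetical [start, end) coordinates. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Index every k-mer of the reference by its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: Dict[str, List[int]], k: int) -> Optional[int]:
    """Return the start of a unique exact match of `read`, or None.

    Candidate positions come from the read's first k-mer (the seed) and are
    verified by comparing the whole read with the reference. Reads with no
    match, or with several matches (multi-mapping), are left unmapped.
    """
    hits = [p for p in index.get(read[:k], [])
            if reference[p:p + len(read)] == read]
    return hits[0] if len(hits) == 1 else None


def count_reads_per_gene(starts: List[int], read_len: int,
                         genes: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Count mapped reads lying entirely within each gene's [start, end) interval."""
    counts = {name: 0 for name in genes}
    for s in starts:
        for name, (g_start, g_end) in genes.items():
            if g_start <= s and s + read_len <= g_end:
                counts[name] += 1
    return counts
```

In practice the resulting per-gene counts are the raw input for quantifying known genes; reads that fall outside annotated genes are candidates for the prediction of novel transcripts.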

Next-generation sequencing (NGS) techniques
Particulars | 454 Sequencing | Illumina/Solexa | ABI SOLiD
Sequencing chemistry | Pyrosequencing | Polymerase-based sequencing-by-synthesis | Ligation-based sequencing
Amplification approach | Emulsion PCR | Bridge amplification | Emulsion PCR
Paired-end (PED) separation | 3 kb | 200-500 bp | 3 kb
Mb per run | 100 Mb | 1300 Mb | 3000 Mb
Time per PED run | <0.5 day | 4 days | 5 days
Read length (updated) | 250-400 bp | 35, 75 and 100 bp | 35 and 50 bp
Cost per run | $8,438 USD | $8,950 USD | $17,447 USD
Cost per Mb | $84.39 USD | $5.97 USD | $5.81 USD

THIRD GENERATION SEQUENCING
- Does not use PCR amplification for template preparation
- Sequences single molecules
- Faster than NGS, although cost and throughput vary by platform
- Higher error rate
- Can produce long reads averaging 5,000 bp to 15,000 bp, hence also known as long-read sequencing
The two commercially available third-generation sequencing technologies are Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing and the Oxford Nanopore Technologies sequencing platform.

PacBio SMRT technology
- Commercially introduced in 2010
- Sequences DNA using sequencing-by-synthesis, based on monitoring polymerase activity while differently labelled nucleotides are incorporated into the DNA strand
- Read lengths of up to ~100,000 bp
- Greatest throughput (~8 Gb/day)
- Error rate: 10% to 15%
- Cost: ~$500-2000/Gb ($100/run)
Limitation: the main limitation of PacBio sequencing is its cost relative to second-generation approaches, which has limited its application for analyzing large numbers of genomes.

Fig. In Single Molecule Real-Time (SMRT) sequencing, the emission spectra of fluorescently labelled nucleotides are detected as they are incorporated by the polymerase. Source: Pacific Biosciences.

Oxford Nanopore Technologies
- Released in 2014
- Sequences DNA or RNA by electronically measuring the minute disruptions to an electric current as the molecules pass through a nanopore
- 150 bases/sec/pore
- 125 Gb/day
- Read lengths of 20-100,000 bases
- 4% error rate
- Cost: $10/Gb
Fig. ONT's MinION sequencing device attached to a laptop computer.

Some mRNA- Seq Applications

Some questions that can be addressed by transcriptomic methods:
a) How much transcript is there from each gene (expression level)?
b) How does the expression level change over development (expression profile)?
c) How does expression differ among different tissues?
d) How does environment or treatment affect gene expression?
e) How much variation is there in gene expression levels within natural populations?

Molecular Markers Developed by Means of High-throughput Transcriptomics Techniques for the Breeding of Important Crops

This study revealed that PB significantly enhanced rice seedling survival by retaining higher chlorophyll content and alcohol dehydrogenase activity. Transcriptomic analysis identified 3936 differentially expressed genes (DEGs) among the GA- and PB-treated samples and the control; these genes are extensively involved in submergence and other abiotic stress responses, phytohormone biosynthesis and signaling, photosynthesis, and nutrient metabolism. The results suggested that PB enhances rice survival under submergence by maintaining photosynthetic capacity and reducing nutrient metabolism.

A comparative analysis of wheat anther transcriptomes from male-fertile wheat and SQ-1–induced male-sterile wheat was carried out using next-generation sequencing technology. In all, 42,634,12 sequence reads were generated and assembled into 82,356 high-quality unigenes with an average length of 724 bp. In total, 1,088 unigenes were significantly differentially expressed between the fertile and sterile wheat anthers, including 643 up-regulated and 445 down-regulated unigenes. This study is the first to provide a systematic overview comparing the anther transcriptomes of male-fertile wheat with those of SQ-1–induced male-sterile wheat, and it is a valuable source of data for future research on SQ-1–induced male sterility in wheat.
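Splitting differentially expressed genes into up- and down-regulated sets, as in both case studies, usually starts from a log2 fold change between two conditions. The Python sketch below shows only that step under stated assumptions: expression values are already normalized, a pseudocount of 1 avoids log(0), and a |log2 fold change| ≥ 1 cut-off is used. Published analyses also apply a statistical test with multiple-testing correction, which this sketch omits, and the function name is hypothetical; it does not reproduce the studies' actual pipelines.

```python
import math
from typing import Dict, Tuple


def classify_degs(mean_a: Dict[str, float], mean_b: Dict[str, float],
                  min_abs_log2fc: float = 1.0, pseudocount: float = 1.0
                  ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split genes into up- and down-regulated sets (condition b relative to a).

    mean_a / mean_b map gene IDs to normalized expression in each condition
    (e.g. a = fertile anthers, b = sterile anthers). Only genes present in
    both dictionaries are considered. Only a fold-change cut-off is applied.
    """
    up: Dict[str, float] = {}
    down: Dict[str, float] = {}
    for gene in mean_a.keys() & mean_b.keys():
        # Pseudocount keeps the ratio finite when a gene is unexpressed in one condition.
        lfc = math.log2((mean_b[gene] + pseudocount) / (mean_a[gene] + pseudocount))
        if lfc >= min_abs_log2fc:
            up[gene] = lfc
        elif lfc <= -min_abs_log2fc:
            down[gene] = lfc
    return up, down
```

With a threshold of 1, a gene must differ at least twofold between the conditions to be counted as up- or down-regulated; raising the threshold shrinks both sets.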

CONCLUSION

Thank You