The study of the complete set of RNAs (transcriptome) encoded by the genome of a specific cell or organism at a specific time or under a specific set of conditions is called Transcriptomics.
Transcriptomics aims:
I. To catalogue all species of transcripts, including mRNAs, noncoding RNAs and small RNAs.
II. To determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications.
III. To quantify the changing expression levels of each transcript during development and under different conditions.
Added: Dec 07, 2021
Slides: 48 pages
Slide Content
ADVANCES IN TRANSCRIPTOMICS AND ITS APPROACHES CHARUPRIYA CHAUHAN ID- 52616 DOCTORAL SEMINAR I Setia Pramana
Overview
- Transcriptome
- Approaches for transcriptome analysis
- Candidate gene approach
- Hybridisation based approaches
- Sequencing based approaches
- Emerging sequencing approaches
- Applications of transcriptomics
- Case studies
CENTRAL DOGMA OF MOLECULAR BIOLOGY
- GENOME: Complete DNA content of an organism with all its genes and regulatory sequences
- TRANSCRIPTOME: Complete set of transcripts and their relative levels of expression in a particular cell or tissue under defined conditions at a given time
- PROTEOME: Complete collection of proteins and their relative levels in each cell
TRANSCRIPTOME
The ‘transcriptome’ is defined as the complete complement of RNA molecules generated by a cell or population of cells. The term was first proposed by Charles Auffray in 1996.
TRANSCRIPTOMICS
The study of the complete set of RNAs (transcriptome) encoded by the genome of a specific cell or organism at a specific time or under a specific set of conditions is called Transcriptomics. In transcriptomics, the expression of genes by a genome is studied:
- Qualitatively (identifying which genes are expressed and which are not)
- Quantitatively (measuring varying levels of expression for different genes)
Milestones in transcriptome analysis
YEAR | MILESTONE
1965 | Sequence of the first RNA molecule determined
1977 | Development of the Northern blot technique and the Sanger sequencing method
1989 | Reports of RT-PCR experiments for transcriptome analysis
1991 | First high-throughput EST sequencing study
1992 | Introduction of Differential Display (DD) for the discovery of differentially expressed genes
1995 | Reports of the microarray and Serial Analysis of Gene Expression (SAGE) methods
2001 | Draft of the Human Genome completed
2005 | First next-generation sequencing technology (454/Roche) introduced to the market
2006 | First transcriptome sequencing studies using a next-generation technology (454/Roche)
Why is transcriptomics important?
Transcriptome profiling provides clues to:
- Expressed sequences and genes of the genome
- Gene regulation and regulatory sequences
- Gene function annotation
- Functional differences between tissue and cell types
- Identification of candidate genes for any given process or disease
Transcriptomics aims:
I. To catalogue all species of transcripts, including mRNAs, noncoding RNAs and small RNAs.
II. To determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications.
III. To quantify the changing expression levels of each transcript during development and under different conditions.
Approaches for transcriptome mining
Northern Blotting Analysis
The northern blot, or RNA blot, is a technique used in molecular biology research to study gene expression by detection of RNA (or isolated mRNA) in a sample. The northern blot technique was developed in 1977 by James Alwine, David Kemp, and George Stark at Stanford University.
Reverse transcription polymerase chain reaction (RT-PCR) A variant of polymerase chain reaction (PCR), commonly used in molecular biology to detect RNA expression.
Comparison between RT-PCR and PCR
Probes for qRT-PCR: SYBR Green, TaqMan probe
Fluorescence emission is measured continuously during the PCR reaction, and ΔRn (the increase in fluorescence emission, from which the background fluorescence signal is subtracted) is plotted against cycle number. The threshold cycle (Ct) is the cycle at which the fluorescence exceeds a chosen threshold.
Fig: PCR amplification plot (intensity of fluorescence vs. cycle number)
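The definition of Ct above can be illustrated with a minimal Python sketch. It assumes the ΔRn values are already background-subtracted and uses linear interpolation between the two cycles that bracket the threshold, which is one simple choice rather than the method of any particular instrument. The function name and the example curve are hypothetical.

```python
def threshold_cycle(delta_rn, threshold):
    """Return the fractional cycle at which delta_rn first reaches threshold.

    delta_rn[i] is the background-subtracted fluorescence after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    for i, value in enumerate(delta_rn):
        if value >= threshold:
            if i == 0:
                return 1.0  # already above threshold at the first cycle
            prev = delta_rn[i - 1]  # fluorescence at cycle i (below threshold)
            # Linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - prev) / (value - prev)
    return None


# Hypothetical amplification curve (illustrative values, not measured data).
curve = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28]
ct = threshold_cycle(curve, threshold=0.24)
print(ct)  # crossing lies midway between cycles 5 and 6
```

A lower Ct means the threshold was reached earlier, which corresponds to more starting template in the sample.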
One-step vs. Two-step RT-qPCR
One-step assays combine reverse transcription and PCR in a single tube and buffer, using a reverse transcriptase along with a DNA polymerase. One-step RT-qPCR only utilizes sequence-specific primers. In two-step assays, the reverse transcription and PCR steps are performed in separate tubes, with different optimized buffers, reaction conditions, and priming strategies.
MICROARRAY
A microarray is a nucleic acid hybridization-based, high-throughput technique developed to quantitate gene expression levels at the whole-genome scale. It consists of ssDNA probes immobilized on a surface (called a chip or a slide) in a regular pattern of spots, each spot containing millions of copies of a unique DNA probe.
Principle: base-pairing hybridization
The technique was first used in “Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray” by Patrick Brown, Mark Schena and colleagues, published in Science (1995).
Mark Schena: “Father of Microarray Technology”
Methodology
Microarray analysis can be divided into four steps:
1. Probe production: Specific sequences are immobilized on a surface and reacted with labelled cDNA targets.
2. Target (cDNA) production: mRNA is extracted from the sample, converted to cDNA and labelled using fluorescent dyes (usually Cy3 or Cy5).
3. Hybridisation: The chip is exposed to a solution containing the extracted, labelled cDNA. Complementary nucleic acid sequences pair via hydrogen bonds. Non-specifically bound sequences are washed off.
4. Scanning: The array is scanned to measure the fluorescent label. Fluorescently labelled target sequences that bind to a probe sequence generate a signal. The signal depends on the hybridization conditions (e.g. temperature), on washing after hybridization, and on the amount of target in the sample, which determines the total signal strength.
cDNA chips
- Prepared using cDNA; the cDNAs are amplified by PCR and then spotted by inkjet technology
- Probes are >1000 nucleotides in length
- Two different fluorophores can be used to label the test and control samples (two-channel detection)
- Cannot identify alternative splicing events
Oligonucleotide chips
- Short DNA oligonucleotides are synthesised directly on the solid microarray substrate using photolithography (Affymetrix) or spotted by ink-jet printing
- Probes are 70-80 nucleotides in length
- Use a single fluorescent label (single channel)
- Can identify alternative splicing events
- Commercial oligonucleotide chips are available (e.g. Affymetrix, Inc. GeneChip system)
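For two-channel detection, the test and control signals at each spot are commonly compared as a log2 ratio. The sketch below is only an illustration under stated assumptions: the spot names and intensities are made up, the assignment of Cy5 to the test sample and Cy3 to the control is an arbitrary choice, and the pseudocount of 1 is a simple guard against division by zero, not a recommended normalization.

```python
import math


def log2_ratios(test_signal, control_signal, pseudocount=1.0):
    """Return log2(test / control) for every spot present in test_signal.

    A small pseudocount is added to both channels so that spots with
    zero intensity do not cause a division by zero or log of zero.
    """
    return {
        spot: math.log2(
            (test_signal[spot] + pseudocount)
            / (control_signal[spot] + pseudocount)
        )
        for spot in test_signal
    }


# Hypothetical scanned intensities per spot (illustrative values only).
cy5_test = {"geneA": 1023.0, "geneB": 255.0, "geneC": 63.0}
cy3_control = {"geneA": 255.0, "geneB": 255.0, "geneC": 255.0}

ratios = log2_ratios(cy5_test, cy3_control)
for spot, ratio in ratios.items():
    print(spot, ratio)
```

A positive value indicates a stronger signal in the test sample, a negative value a stronger signal in the control, and zero indicates equal signal in both channels.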
The Colours of a Microarray: cDNA chip, Affymetrix gene chip
Applications of Microarray
ADVANTAGES
- High-throughput
- Fast
- Relatively inexpensive
LIMITATIONS
- Reliance upon existing knowledge about genome sequence
- High background levels owing to cross-hybridization
- A limited dynamic range of detection due to both background and saturation of signals
- Comparing expression levels across different experiments is often difficult and can require complicated normalization methods
SAGE - Serial Analysis of Gene Expression
- SAGE was invented at Johns Hopkins University, USA (Oncology Center) by Dr. Victor Velculescu in 1995.
- SAGE is an approach that allows rapid and detailed analysis of overall gene expression patterns.
- SAGE provides quantitative and comprehensive expression profiling in a given cell population.
Principle Underlying SAGE methodology
- A short sequence tag (10-14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.
- Sequence tags can be linked together to form long serial molecules that can be cloned and sequenced.
- Quantitation of the number of times a particular tag is observed provides the expression level of the corresponding transcript.
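The counting principle can be sketched in a few lines of Python. This is a simplified illustration that assumes the tags have already been extracted from the sequenced concatemers. The tag sequences, the tag-to-transcript lookup table and the function name are all hypothetical.

```python
from collections import Counter


def count_transcripts(tags, tag_to_transcript):
    """Count how often each transcript is observed via its sequence tag.

    Tags with no entry in the lookup table are grouped as "unassigned",
    which is where tags from previously unknown genes would end up.
    """
    counts = Counter()
    for tag in tags:
        counts[tag_to_transcript.get(tag, "unassigned")] += 1
    return counts


# Hypothetical 10 bp tags recovered from sequenced concatemers.
tags = ["ACGTACGTAC", "TTGCAATGCC", "ACGTACGTAC", "GGGATCCTTA", "ACGTACGTAC"]
# Hypothetical lookup table mapping known tags to transcripts.
lookup = {"ACGTACGTAC": "transcript_1", "TTGCAATGCC": "transcript_2"}

counts = count_transcripts(tags, lookup)
print(counts.most_common())
```

The count for each transcript serves as a direct, digital measure of its relative expression in the sampled cell population.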
Advantages
- The mRNA sequence does not need to be known beforehand, so genes or variants that are not yet known can be discovered.
- It is more accurate, as it involves direct counting of the number of transcripts.
Disadvantages
- The length of the gene tag is extremely short (13 or 14 bp), so if the tag is derived from an unknown gene, it is difficult to analyze with such a short sequence.
- The type IIS restriction enzyme does not always yield fragments of the same length.
CAGE
- Cap analysis gene expression (CAGE) is a gene expression technique used in molecular biology to produce a snapshot of the 5′ end of the messenger RNA population in a biological sample.
- First published by Hayashizaki, Carninci and coworkers in 2003.
- Aims to identify transcription start sites and promoters.
- Collects 21 bp from the 5′ ends of cap-purified cDNA.
- The method essentially uses full-length cDNAs, to the 5′ ends of which linkers are attached. This is followed by cleavage of the first 20 base pairs by class IIS restriction enzymes, PCR, concatemerization, and cloning of the CAGE tags.
ESTs
- In 1983, S. D. Putney demonstrated for the first time the use of cDNA for identifying genes.
- In 1991, Adams and co-workers coined the term EST.
- ESTs are short sequences obtained from randomly selected clones of a cDNA library and can be used to identify genes and map them in the genome of any particular species.
- ESTs are usually 200 to 500 nucleotides long and are generated by sequencing the ends of cDNA clones.
Figure: Method of construction of ESTs.
Uses of ESTs
- Identify unknown genes and map their positions in a genome.
- Provide a simple and inexpensive path for discovering new genes.
- Genome map construction.
- Characterization of expressed genes.
RNA-Sequencing
RNA-seq, also called whole-transcriptome shotgun sequencing, refers to the use of high-throughput sequencing technologies to reveal the presence and quantity of RNA in a biological sample at a given moment.
Direct RNA sequencing
- Direct single-molecule RNA sequencing without prior cDNA conversion
- Sequencing by synthesis
- Polyadenylation by poly(A) polymerase I (PAPI) from E. coli
- Blocking by 3′ deoxyATP
- Poly(A) tail of ~150 nucleotides
- Each RNA molecule is filled in with dTTP and polymerase and then locked in position with VT-A, VT-C and VT-G, stopping further nucleotide addition.
- Unincorporated dye-labelled nucleotides are washed away and images are taken.
- The fluorescent dye and inhibitor are cleaved off from the incorporated nucleotide, making it suitable for additional rounds of incorporation.
- Repeating this cycle of rinsing, imaging and cleaving provides a set of images that are aligned and used to generate the sequence information for each individual RNA molecule with real-time image processing.
Advantages
- Requires only minor quantities of RNA
- No biases due to cDNA synthesis, end repair, ligation and amplification procedures
- Potentially useful for studying short RNA species
Sequencing using NGS
- Next-generation sequencing is also known as massively parallel sequencing.
- It enables a large amount of sequencing to be performed in a single assay.
- Most platforms produce short reads (<400 bp).
- Read numbers vary from ~1 million to ~1 billion per run.
Next-generation sequencing - Workflow
1. Library preparation: cDNA fragmentation and in vitro adaptor ligation
2. Clonal amplification: emulsion PCR (Roche/454, SOLiD platform) or bridge PCR (Illumina/Solexa technology)
3. Cyclic array sequencing: pyrosequencing (Roche/454), sequencing-by-ligation (SOLiD platform) or sequencing-by-synthesis (Illumina/Solexa technology)
Data analysis for mRNA-seq: key steps
- Mapping reads to the reference genome
  - Reads from 454 sequencers can be mapped with conventional sequence aligners.
  - A short-read aligner is needed for Illumina or SOLiD reads.
- Quantifying the known genes
- Prediction of novel transcripts
- Assembly of short reads: comparative vs. de novo
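As an illustration of the quantification step, the sketch below turns per-gene read counts into transcripts per million (TPM). TPM is not named in the slides; it is used here as one common, length-normalized expression unit. The gene names, counts and lengths are hypothetical, and the sketch assumes reads have already been mapped and counted per gene.

```python
def tpm(read_counts, gene_lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    Each count is first divided by the gene length (reads per base),
    then the rates are rescaled so that they sum to one million.
    """
    rates = {
        gene: read_counts[gene] / gene_lengths_bp[gene]
        for gene in read_counts
    }
    total = sum(rates.values())
    if total == 0:
        # No reads mapped to any gene: every gene has zero expression.
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical mapped read counts and gene lengths (illustrative only).
counts = {"geneA": 100, "geneB": 200, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}

expression = tpm(counts, lengths)
for gene, value in expression.items():
    print(gene, round(value, 1))
```

In this example geneB has twice as many reads as geneA but is four times longer, so its length-normalized expression comes out at half that of geneA.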
Next generation sequencing (NGS) techniques
Particulars | 454 Sequencing | Illumina/Solexa | ABI SOLiD
Sequencing chemistry | Pyrosequencing | Polymerase-based sequencing-by-synthesis | Ligation-based sequencing
Amplification approach | Emulsion PCR | Bridge amplification | Emulsion PCR
Paired end (PED) separation | 3 kb | 200-500 bp | 3 kb
Mb per run | 100 Mb | 1300 Mb | 3000 Mb
Time per PED run | <0.5 day | 4 days | 5 days
Read length (update) | 250-400 bp | 35, 75 and 100 bp | 35 and 50 bp
Cost per run | $8,438 USD | $8,950 USD | $17,447 USD
Cost per Mb | $84.39 USD | $5.97 USD | $5.81 USD
THIRD GENERATION SEQUENCING
- Also known as long-read sequencing
- Does not use PCR amplification for template preparation
- Sequences single RNA molecules
- Faster, cheaper and much higher throughput than NGS
- Higher error rate
- Can produce long reads averaging between 5,000 bp and 15,000 bp
- The two commercially available third-generation DNA sequencing technologies are Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing and the Oxford Nanopore Technologies sequencing platform.
PacBio SMRT technology
- Commercially introduced in 2010
- Sequences DNA using sequencing-by-synthesis, based on monitoring polymerase activity while it incorporates differently labelled nucleotides into the DNA strand
- Read lengths of up to ~100,000 bp
- Greatest throughput (~8 Gb/day)
- Error rate: 10% to 15%
- Cost: ~$500-2000/Gb ($100/run)
Limitation: The cost of PacBio sequencing relative to second-generation approaches has limited its application for analyzing large numbers of genomes.
Fig. In Single Molecule Real-Time (SMRT) sequencing, the emission spectra of fluorescently labelled nucleotides are detected while they are being incorporated by the polymerase. Source: Pacific Biosciences.
Oxford Nanopore Technologies
- Released in 2014
- Sequences RNA by electronically measuring the minute disruptions to an electric current as RNA molecules pass through a nanopore
- 150 bases/sec/pore
- 125 Gb/day
- Reads of 20-100,000 bases
- 4% error rate
- Cost: $10/Gb
Fig. ONT's MinION sequencing device attached to a laptop computer.
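The per-pore speed and the daily throughput quoted above can be related with simple arithmetic. The sketch below assumes that Gb means 10^9 bases and that every pore sequences continuously for 24 hours, which is an idealization; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

bases_per_second_per_pore = 150   # figure quoted on the slide
daily_throughput_bases = 125e9    # 125 Gb/day, assuming 1 Gb = 1e9 bases

# Bases a single pore could read in one day of uninterrupted sequencing.
bases_per_pore_per_day = bases_per_second_per_pore * SECONDS_PER_DAY

# Number of continuously active pores needed to reach the daily figure.
pores_needed = daily_throughput_bases / bases_per_pore_per_day

print(bases_per_pore_per_day)   # 12960000 bases, about 13 Mb per pore per day
print(round(pores_needed))      # roughly 9,600 pores
```

Under these assumptions, the quoted daily throughput corresponds to several thousand pores working in parallel rather than to a single pore.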
Some mRNA-Seq Applications
Some questions that can be addressed by transcriptomic methods:
a) How much transcript is there from each gene (expression level)?
b) How does expression level change over development (expression profile)?
c) How does expression differ among different tissues?
d) How does environment/treatment affect gene expression?
e) How much variation is there in gene expression levels within natural populations?
Molecular Markers Developed by Means of High-throughput Transcriptomics Techniques for the Breeding of Important Crops
In this study, it is revealed that PB could significantly enhance rice seedling survival by retaining a higher level of chlorophyll content and alcohol dehydrogenase activity . Transcriptomic analysis identified 3936 differentially expressed genes (DEGs) among the GA- and PB-treated samples and control, which are extensively involved in the submergence and other abiotic stress responses, phytohormone biosynthesis and signaling, photosynthesis, and nutrient metabolism . The results suggested that PB enhances rice survival under submergence through maintaining the photosynthesis capacity and reducing nutrient metabolism.
Comparative analysis of wheat anther transcriptomes for male fertile wheat and SQ-1–induced male sterile wheat was carried out using next-generation sequencing technology. In all, 42,634,12 sequence reads were generated and were assembled into 82,356 high-quality unigenes with an average length of 724 bp. 1,088 unigenes were significantly differentially expressed in the fertile and sterile wheat anthers, including 643 up-regulated unigenes and 445 down-regulated unigenes. This study is the first to provide a systematic overview comparing wheat anther transcriptomes of male fertile wheat with those of SQ-1–induced male sterile wheat and is a valuable source of data for future research in SQ-1–induced wheat male sterility.