Genomics: Organization of Genome, Strategies of Genome Sequencing, Model Plant Genome Project, Functional Analysis of Genes
90 slides
Jun 18, 2019
Promila Sheoran, PhD Biotechnology, GJU S&T Hisar
Genome Organization: The word "genome," coined by the German botanist Hans Winkler in 1920, was derived by combining "gene" with the final syllable of "chromosome". If not specified, "genome" usually refers to the nuclear genome. An organism's genome is defined as the complete haploid genetic complement of a typical cell. The genetic content of the cell's organelles is not considered part of the nuclear genome. In diploid organisms, sequence variations exist between the two copies of each chromosome present in a cell. The genome is the ultimate source of information about an organism.
Genome Organization (continued): The number of genomes sequenced in their entirety is now in the thousands, and they come from organisms ranging from bacteria to mammals. The first complete genome of a free-living organism to be sequenced was that of the bacterium Haemophilus influenzae, in 1995. The first eukaryotic genome sequence, that of the yeast Saccharomyces cerevisiae, followed in 1996, and the genome sequence of the bacterium Escherichia coli became available in 1997.
Hierarchy of gene organization (in order of ascending complexity): gene (single unit of genetic function); operon (genes transcribed in a single transcript); regulon (genes controlled by the same regulator); modulon (genes modulated by the same stimulus); element (plasmid, chromosome, phage); genome.
Prokaryotic and Eukaryotic Genomes:
Prokaryotes: single-celled; no nucleus; one piece of circular DNA; no post-transcriptional modification of mRNA.
Eukaryotes: single-celled or multicellular; nucleus present; linear chromosomes; exon/intron splicing.
Prokaryotic Genome Organization: The genome of E. coli contains about 4.6 × 10^6 base pairs, most of which (close to 90%) encodes protein. Prokaryotes lack a membrane-bound nucleus. Their DNA is circular and organized into supercoiled domains. Histones are not present in bacteria.
Prokaryotic Genome Organization (continued): Prokaryotic genomes generally contain one large circular DNA molecule referred to as a "chromosome" (not a true chromosome in the eukaryotic sense); some bacteria have linear chromosomes. Many bacteria also carry small circular DNA molecules called plasmids, which can be exchanged between neighbouring cells and across bacterial species.
Plasmid: The term plasmid was introduced by the American molecular biologist Joshua Lederberg in 1952. A plasmid is separate from, and can replicate independently of, the chromosomal DNA. Plasmid size ranges from about 1 kbp to over 1,000 kbp.
Eukaryotic genome organization: Features of the nuclear genome: multiple linear chromosomes carrying roughly 5,000 to 50,000 genes; monocistronic transcription units; discontinuous coding regions (introns and exons); large amounts of non-coding DNA; transcription and translation in different compartments (nucleus and cytoplasm); a variety of RNA genes: rRNA, tRNA, snRNA (small nuclear), snoRNA (small nucleolar), microRNAs, etc.; often diploid genomes and sexual reproduction, with meiosis as the standard mechanism of recombination; multiple genomes: nuclear, mitochondrial and, in plants, plastid (chloroplast) genomes. Plastid genomes resemble prokaryotic genomes.
EUKARYOTIC GENOME: "The nucleus is the heart of the cell and serves as the main distinguishing feature of eukaryotic cells. It is an organelle submerged in its sea of turbulent cytoplasm, and it holds the genetic information encoding the past history and future prospects of the cell. The nucleus contains many thread-like coiled structures suspended in the nucleoplasm, known as chromatin." Chromatin is the complex of DNA and proteins that makes up chromosomes. The major chromatin proteins are histones, although many other chromosomal proteins also have prominent roles. The functions of chromatin are to package DNA into a smaller volume that fits in the cell, to strengthen the DNA to allow mitosis and meiosis, and to serve as a mechanism for controlling gene expression and DNA replication.
ORGANIZATION OF CHROMATIN: In resting (non-dividing) eukaryotic cells, the genome exists as a nucleoprotein complex, chromatin, dispersed in the nuclear matrix as an interwoven network of fine chromatin threads. The information stored in DNA is organized, replicated and read with the help of a variety of DNA-binding proteins. Structural proteins, the histones (packing proteins): the main structural proteins of eukaryotic chromatin; low-molecular-weight basic proteins with a high proportion of positively charged amino acids (lysine and arginine); bound to DNA along most of its length. Their positive charge lets histones bind the negatively charged DNA, and they play a crucial role in packing long DNA molecules. Functional proteins, the non-histones: associated with gene regulation and other functions of chromatin.
Hierarchy of Chromatin Organization in the Cell Nucleus: Nuclear Matrix Associated Chromatin Loops
Next-Generation Sequencing: DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. Next-generation sequencing (NGS) refers to non-Sanger-based, high-throughput DNA sequencing technologies.
ILLUMINA SEQUENCING: Step 1: sample preparation; Steps 2–6: cluster generation by bridge amplification; Steps 7–12: sequencing by synthesis.
Solid-phase (bridge) amplification can produce 100–200 million spatially separated clusters, each providing free ends to which a universal sequencing primer can be hybridized to initiate the sequencing reaction.
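In sequencing by synthesis, each cycle adds one fluorescently labelled, reversibly terminated nucleotide to every cluster, the flow cell is imaged, and the base is read from the colour of each cluster. The toy base caller below illustrates that last step under stated assumptions: the intensities are hypothetical, there is assumed to be one fluorescence channel per base, and the "purity" score (brightest channel divided by the sum of the two brightest) is a simple illustrative quality measure. It is a sketch, not Illumina's actual base-calling software.

```python
from typing import List, Sequence, Tuple

# Assumed channel order: one fluorescence channel per base (hypothetical 4-channel chemistry).
BASES = "ACGT"


def call_cycle(intensities: Sequence[float]) -> Tuple[str, float]:
    """Call one base from four channel intensities.

    Returns the base of the brightest channel and a purity ratio,
    brightest / (brightest + second brightest), which is 0.5 when two channels tie.
    """
    ranked = sorted(range(len(BASES)), key=lambda i: intensities[i], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    total = best + second
    purity = best / total if total > 0 else 0.0
    return BASES[ranked[0]], purity


def call_read(cycles: Sequence[Sequence[float]]) -> Tuple[str, List[float]]:
    """Call a whole read: one base per sequencing-by-synthesis cycle."""
    calls = [call_cycle(c) for c in cycles]
    sequence = "".join(base for base, _ in calls)
    purities = [p for _, p in calls]
    return sequence, purities


if __name__ == "__main__":
    # Hypothetical intensities for one cluster over four cycles (A, C, G, T channels).
    cycles = [
        [900, 40, 30, 20],
        [50, 30, 60, 800],
        [20, 700, 90, 10],
        [30, 20, 850, 40],
    ]
    seq, purity = call_read(cycles)
    print(seq)  # ATCG
    print([round(p, 2) for p in purity])
```

A low purity value flags cycles where two dyes are similarly bright, which is where such a caller is least reliable.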
454 Sequencing: Emulsion-based sample preparation (emPCR). Pyrosequencing: a non-electrophoretic, bioluminescence method that measures the release of inorganic pyrophosphate by converting it proportionally into visible light through a series of enzymatic reactions.
Step 1:
Step 2: Loading DNA Sample onto Beads
Step 3: Sequencing
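Because each incorporated base releases one pyrophosphate, the light emitted during a nucleotide flow is proportional to the number of bases incorporated, so a homopolymer such as TT gives about twice the signal of a single T. The sketch below builds an idealized, noise-free flowgram for a hypothetical synthesized sequence and decodes it back. The cyclic flow order TACG is an assumption for illustration; real signals are noisy, which makes long homopolymer runs hard to call exactly.

```python
from typing import List, Tuple

# Assumed cyclic nucleotide dispensation order (illustrative).
FLOW_ORDER = "TACG"


def flowgram(synthesized: str, n_flows: int, flow_order: str = FLOW_ORDER) -> List[Tuple[str, int]]:
    """Ideal flowgram: for each flow, the number of bases incorporated.

    The light signal is proportional to this count, because each
    incorporated base releases one pyrophosphate (PPi).
    """
    signals: List[Tuple[str, int]] = []
    pos = 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        count = 0
        # A homopolymer run is incorporated completely within a single flow.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals


def decode(signals: List[Tuple[str, int]]) -> str:
    """Turn an integer-valued flowgram back into a sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    flows = flowgram("TTACGGA", 8)
    print(flows)
    print(decode(flows))  # TTACGGA
```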
Sequence Assembly: Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence. The first sequence assemblers appeared in the late 1980s and early 1990s.
Why We Need Genome Assemblers: sequencing produces terabytes of data that must be processed on computing clusters; identical and nearly identical (repeated) sequences greatly increase the time and space complexity of assembly algorithms; and the fragments produced by sequencing instruments contain errors.
Basic Principles of Assembly: 1. Sequence and quality data are read and the reads are cleaned. 2. Overlaps between reads are detected; false overlaps, duplicate reads, chimeric reads and reads with self-matches are also identified. 3. The reads are grouped into a contig layout of the finished sequence. 4. A multiple sequence alignment of the reads is performed, and a consensus sequence is constructed for each contig layout. 5. Possible sites of mis-assembly are identified by combining manual inspection with quality-value validation.
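A toy version of the first steps (cleaning, overlap detection, merging reads into a contig) can be sketched in a few lines. This sketch uses a simple greedy strategy, repeatedly merging the pair with the longest suffix-prefix overlap, as a stand-in for the layout and consensus steps. It assumes error-free reads, and the read set and minimum overlap length are hypothetical; real assemblers build a full layout and compute a consensus from a multiple alignment.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def clean_reads(reads: List[str]) -> List[str]:
    """Cleaning: drop reads with ambiguous bases (N), duplicates, and reads contained in others."""
    unique = sorted({r for r in reads if "N" not in r}, key=len, reverse=True)
    kept: List[str] = []
    for r in unique:
        if not any(r in k for k in kept):
            kept.append(r)
    return kept


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the pair with the longest overlap until no overlap >= min_len remains."""
    contigs = clean_reads(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC",
             "AGACCTGCCG",  # duplicate read
             "GACCTGC",     # contained read
             "ATTNGACC"]    # read with an ambiguous base
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can be misled by repeats, which is one reason the text lists repeats among the main difficulties.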
Mapping Assembly: Mapping assembly aligns reads against an existing backbone (reference) sequence, building a sequence that is similar, but not necessarily identical, to the backbone. Compared with de novo assembly, mapping resequenced reads to a template genome is a computationally easier problem. Mapping programs use seeding techniques: seeds of fixed length typically allow no more than one or two mismatches. In addition, the capability to detect insertions and deletions is very limited, and most programs can only detect indels in subsequent alignment runs.
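The seeding idea can be illustrated with a small, ungapped mapper. This sketch uses pigeonhole seeding, one common way to guarantee that reads with up to a given number of mismatches are found: if a read with at most m mismatches is cut into m + 1 non-overlapping seeds, at least one seed must match the reference exactly. The reference, read and seed length are hypothetical, and, as the text notes for many tools, this sketch cannot detect indels.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Hash every k-mer (seed) of the reference to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read: str, ref: str, index: Dict[str, List[int]], k: int,
             max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Ungapped mapping with pigeonhole seeding.

    The first (max_mismatches + 1) * k bases of the read are split into
    max_mismatches + 1 non-overlapping seeds. Exact seed hits are extended to
    full-length alignments and kept if they have <= max_mismatches mismatches.
    Returns sorted (reference start, mismatch count) pairs.
    """
    n_seeds = max_mismatches + 1
    if len(read) < n_seeds * k:
        raise ValueError("read too short for this seed length")
    hits = set()
    for s in range(n_seeds):
        offset = s * k
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset  # where the whole read would begin on the reference
            if start < 0 or start + len(read) > len(ref):
                continue
            window = ref[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    ref = "TTGACCGATGCAAGTCCATG"          # hypothetical reference
    idx = build_index(ref, 4)
    print(map_read("CAGATGCATGTC", ref, idx, 4))  # [(4, 2)]
```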
Tools for Mapping Assembly: MAQ: designed particularly for Illumina reads; SOAP: a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences; SHRiMP: designed to handle Applied Biosystems SOLiD (colour-space) reads; SOCS: aligns SOLiD data; ELAND: Efficient Large-Scale Alignment of Nucleotide Databases; GMAP: Genomic Mapping and Alignment Program for mRNA and EST sequences.
De novo assembly: De novo assembly assembles short reads into full-length sequences without a reference. De novo assembly software must deal with sequencing errors, repeat structures and the computational complexity of processing large volumes of data.
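Short-read de novo assemblers such as Velvet and ABySS are based on de Bruijn graphs: every k-mer in the reads becomes an edge between its (k-1)-mer prefix and suffix, and the genome is spelled out by a path that uses every edge once (an Eulerian path). The sketch below shows that core idea on hypothetical error-free reads with no repeats; it ignores sequencing errors, k-mer coverage and repeat resolution, which are exactly the hard parts real assemblers handle.

```python
from collections import defaultdict
from typing import Dict, List


def build_debruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: Dict[str, List[str]]) -> List[str]:
    """Hierholzer's algorithm: a walk that uses every edge exactly once."""
    in_deg: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    nodes = sorted(set(graph) | set(in_deg))
    # Start where out-degree exceeds in-degree by one; otherwise (a cycle) anywhere.
    start = next((n for n in nodes if len(graph.get(n, [])) - in_deg.get(n, 0) == 1), nodes[0])
    remaining = {n: list(v) for n, v in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads: List[str], k: int) -> str:
    """Spell the sequence: first node, then the last base of each following node."""
    path = eulerian_path(build_debruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # ATGGCGTGCA
```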
De novo assembly tools: ABySS (Assembly By Short Sequences): designed for very short reads; ALLPATHS: de novo assembly of whole-genome shotgun microreads; Velvet: designed for short-read sequencing technologies; Edena (Exact DE Novo Assembler); MIRA2 (Mimicking Intelligent Read Assembly): able to perform true hybrid de novo assembly.
Arabidopsis thaliana genome project: Arabidopsis, the model plant: relative genetic simplicity; fast life cycle; amenability to manipulation through genetic engineering; convenience and abundance; basic similarities to crop plants.
Arabidopsis genome: about 125 Mb of sequence; about 25,500 genes; 5 chromosomes; 35% of genes are unique (single-copy).
Arabidopsis Genome Initiative (AGI): a collaboration of the U.S. Department of Energy, the U.S. Department of Agriculture, the European Union, the Government of France and the Chiba Prefectural Government in Japan, organized in August 1996 at the National Science Foundation (NSF) in Arlington, VA.
Major Highlights of the Genome Project: 1990: Arabidopsis genome project initiated; 1995: standard BAC and P1 libraries constructed; 1996: Arabidopsis Genome Initiative organized; 1997: physical maps of all chromosomes completed; 1999: chromosomes 2 and 4 sequenced; 2000: genome sequence completed.
Applications: understanding photosensitivity; creating healthier edible; manufacturing biodegradable plastics; making vegetables and fruits cheaper and hardier; improving erosion resistance; understanding how plants flower.
Rice Genome Project: The rice genome is the smallest among the major grass genomes (wheat, oat, rye, barley, corn). Size: 430 Mbp (about 3.3 × Arabidopsis); 12 chromosomes; approximately 62,435 genes. Most repetitive elements lie in intergenic regions, whereas in humans they lie mostly in introns.
IRGSP (International Rice Genome Sequencing Project): established in 1997, comprising ten members: Japan, the United States of America, China, Taiwan, Korea, India, Thailand, France, Brazil and the United Kingdom. The IRGSP adopted a clone-by-clone shotgun sequencing strategy.
Milestones: 1997: sequencing of the rice genome was initiated as an international collaboration among 10 countries; 1998: the IRGSP (International Rice Genome Sequencing Project) was launched under the coordination of the Rice Genome Project (RGP) of Japan; 2000: Monsanto Co. produced a draft sequence of BAC contigs covering 260 Mb of the rice genome, in which 95% of rice genes were identified; 2001: Syngenta produced a draft sequence with 99.8% accuracy, estimated 32,000 to 50,000 genes and identified 99% of rice genes; 2002: the IRGSP finished a high-quality draft sequence (clone-by-clone approach) with a sequence length, excluding overlaps, of 366 Mb, corresponding to ~92% of the rice genome; 2004: the IRGSP produced a high-quality, map-based sequence of the rice genome with 99.99% accuracy, covering about 95% of the genome.
Applications: Rice was the first crop plant to be sequenced, so its genome has a great impact on agriculture. It is useful for understanding the genomes of other crops in the grass family, including corn, wheat, barley, rye and sorghum, and for identifying agronomically important traits, for example genes that affect growth habit to promote yield, and photoperiod genes to extend the range of elite cultivars.
Tomato Genome Project: Tomato (Solanum lycopersicum) is an economically important crop worldwide, intensively investigated, and a model system for genetic studies in plants. Characteristics: simple diploid genetics, with 12 chromosome pairs and a genome size of about 950 Mb; short generation time; routine transformation technology; rich genetic and genomic resources.
International Tomato Genome Sequencing Project: started in 2004. Participants were Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States. The initial approach was to sequence only the euchromatic portion of the genome using a BAC-by-BAC approach. In 2009, a complementary whole-genome shotgun approach was initiated, and the genome sequence was finally published in 2012.
Applications: tomato as a reference genome sequence; understanding diversification and adaptation; exploring the role of natural diversity in the genetic improvement of crops.
Chickpea Genome Project: Chickpea is the second most widely grown legume crop after soybean. Approximately 28,269 chickpea genes were identified in a genome estimated at about 738 Mb. Nearly half (49.41%) of the chickpea genome is composed of transposable elements and unclassified repeats.
International Chickpea Genome Sequencing Consortium (ICGSC). Role of the ICGSC: 1. to ensure that data and information on chickpea are readily available to all researchers; 2. to help avoid duplication of research efforts; 3. to provide a framework for national and international collaboration; 4. to help keep chickpea research at the cutting edge of genetic research.
Applications: The sequencing should help reduce the time needed to breed new chickpea varieties, as plant breeders now have access to genes for the required traits. The availability of these genome sequences also facilitates de novo assembly of the genomes of other important but less-studied legume crops.
Poplar genome project: Poplar was the first tree to have its genome sequenced, owing to its relatively compact genome. The genome sequence was published in 2006, the third plant genome to be published. It contains a whole-genome duplication and includes ~370 megabases of sequence, 19 chromosomes and 41,377 protein-coding genes.
International Populus Genome Consortium goals: examine the suite of genetic resources in Populus currently available to the scientific community; integrate genomics with physiology and ecology in an effort to understand and manipulate tree growth, development and function; develop the ability to attain a predictive understanding of tree growth, development and complex function.
Applications: offers the opportunity to study and modify genes related to commercially important traits, and to better understand the distribution of genes across the landscape. The poplar genome project holds the promise of uncovering and understanding mechanisms uniquely associated with perennial woody plant growth, development and ecology, and can address questions about the annual cycling of nutrients, water movement up dozens of meters in height, perennial crown development and wood formation.
Functional analysis of genes: different tools: virus-induced gene silencing (VIGS); CRES-T; RNA interference.
Virus-induced gene silencing (VIGS): an effective strategy for rapid functional analysis of genes in plant tissues, and an elegant tool for the functional characterization of genes associated with abiotic stress responses. VIGS is rapid (3–4 weeks from infection to silencing); does not require the development of stable transformants; allows characterization of phenotypes that might be lethal in stable lines; and offers the potential to silence either individual or multiple members of a gene family. Example: knockdown of TaNAC1 in wheat with barley stripe mosaic virus-induced gene silencing (BSMV-VIGS) enhanced stripe rust resistance.
CRES-T: In Chimeric REpressor gene-Silencing Technology (CRES-T), a chimeric repressor produced by fusing a transcription factor to the plant-specific repression domain SRDX suppresses the target genes of that transcription factor. It is a useful tool for the functional analysis of redundant plant transcription factors and for the manipulation of plant traits.
About RNAi: RNA interference (RNAi) is a system within living cells that helps control which genes are active and how active they are. Two types of small RNA molecules, microRNA (miRNA) and small interfering RNA (siRNA), are central to RNA interference. RNAs are the direct products of genes, and these small RNAs can bind to other specific RNAs (mRNAs) and either increase or decrease their activity, for example by preventing a messenger RNA from producing a protein. RNA interference has an important role in defending cells against parasitic genes (viruses and transposons), and also in directing development and gene expression in general.
The Mechanism of RNA Interference: Long dsRNAs enter a cellular pathway commonly referred to as the RNA interference (RNAi) pathway. First, the dsRNAs are processed into 20–25 nucleotide (nt) small interfering RNAs (siRNAs) by an RNase III-like enzyme called Dicer. Then, the siRNAs assemble into endoribonuclease-containing complexes known as RNA-induced silencing complexes (RISCs), unwinding in the process. The siRNA strands subsequently guide the RISCs to complementary RNA molecules, which they cleave and destroy. Cleavage of the cognate RNA takes place near the middle of the region bound by the siRNA strand.
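The targeting and "cut near the middle" steps can be made concrete with a small sketch. It assumes the commonly described rule that the RISC cleaves the target between the nucleotides paired with guide positions 10 and 11, counted from the guide strand's 5' end, and it only considers perfectly complementary sites (seed-only or partial pairing, as with many miRNAs, is ignored). The sequences are hypothetical.

```python
# Base-pairing rules for RNA (A-U, G-C).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, both written 5'->3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def slice_target(mrna: str, guide: str):
    """Find the site complementary to an siRNA guide strand and cut it.

    Both sequences are 5'->3'. The site paired with the guide is its reverse
    complement. Guide position p (1-based from the guide's 5' end) pairs with
    mRNA index site_start + len(guide) - p, so the bond between the nucleotides
    paired with guide positions 10 and 11 lies just before index
    site_start + len(guide) - 10. Returns the two cleavage fragments, or None.
    """
    site = reverse_complement(guide)
    site_start = mrna.find(site)
    if site_start == -1:
        return None  # no perfectly complementary site: no cleavage in this model
    cut = site_start + len(guide) - 10
    return mrna[:cut], mrna[cut:]


if __name__ == "__main__":
    site = "AUGGCUAGCUUACGAUCGAUA"       # hypothetical 21-nt target site
    guide = reverse_complement(site)     # the matching siRNA guide strand
    mrna = "GGGG" + site + "CCCC"
    left, right = slice_target(mrna, guide)
    print(len(left), len(right))  # 15 14
```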
Approaches for candidate gene discovery
Traditional candidate gene approach: Position-dependent strategy: candidate genes are identified from physical linkage information within a QTL-identified chromosomal segment. Examples: positions of QTLs controlling field blast resistance in rice; isolation of the Arabidopsis ABI3 gene.
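The first filtering step of the position-dependent strategy, listing annotated genes that fall inside a QTL interval, can be sketched as below. The gene names, coordinates and QTL interval are hypothetical, and coordinates are assumed to be 1-based and inclusive; in practice the resulting list is then narrowed using expression, sequence variation or functional evidence.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def genes_in_qtl(genes: List[Gene], chrom: str, qtl_start: int, qtl_end: int) -> List[str]:
    """Positional candidates: annotated genes overlapping the QTL interval on the same chromosome."""
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start <= qtl_end and g.end >= qtl_start
    ]


if __name__ == "__main__":
    # Hypothetical annotation.
    annotation = [
        Gene("g1", "chr1", 100, 200),
        Gene("g2", "chr1", 250, 400),
        Gene("g3", "chr2", 150, 300),
        Gene("g4", "chr1", 390, 600),
    ]
    print(genes_in_qtl(annotation, "chr1", 180, 395))  # ['g1', 'g2', 'g4']
```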
Comparative genomics strategy: includes comparative functional genomics and comparative structural genomics strategies. Candidate genes may be functionally conserved or structurally homologous genes.
Function-dependent strategy: This strategy results in the functional candidate gene approach, in which putative candidate genes are those statistically detected as controlling a large component of heritable gene expression variation. Example: identification of new disease resistance genes in tobacco.
Combined strategy: combines at least two strategies. The genetical genomics approach, which originates from the function-dependent strategy, provides a powerful means of identifying candidate genes. Example: selection of candidate genes for the grape proanthocyanidin pathway.
Digital candidate gene approach (DigiCGA): a novel, web-resource-based approach to candidate gene identification. DigiCGA can be defined as an approach that objectively extracts, filters, (re)assembles or (re)analyzes all possible resources available in public web databases, mainly in accordance with the principles of biological ontology and complex statistical methods, to computationally identify potential candidate genes of specific interest. Example: a combination of RNA-seq and digital gene expression (DGE) analysis based on next-generation sequencing was shown to be a powerful method for identifying candidate genes encoding enzymes responsible for the biosynthesis of novel secondary metabolites in a non-model plant. Seven CYP450s and five UDPGs were selected as potential candidates involved in mogroside biosynthesis. The transcriptome data from this study provide an important resource for understanding the formation of the major bioactive constituents in the fruit extract of S. grosvenorii.
Deciphering the function of genes in plant secondary metabolism: To complete the metabolic map for an entire class of compounds, it is essential to identify gene-metabolite correlations in a metabolic pathway. An effective approach to predicting genes involved in the same metabolic pathway is co-expression analysis, which can be conducted on RNA-seq or microarray datasets obtained in purpose-designed experiments, or by comparing existing, publicly available data.
Example: Comparative co-expression analysis between tomato and potato, coupled with chemical profiling, revealed an array of 10 genes that participate in steroidal glycoalkaloid (SGA) biosynthesis. Systematic functional analysis then led to a revised SGA biosynthetic pathway, from cholesterol up to the tetrasaccharide moiety linked to the tomato SGA aglycone. Silencing GLYCOALKALOID METABOLISM 4 (GAME4) prevented accumulation of SGAs in potato tubers and tomato fruit, which may provide a means of removing unsafe, antinutritional substances from these widely used food crops.
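A co-expression screen of the kind described above often starts by ranking genes by how closely their expression profiles across samples track a known pathway ("bait") gene. The sketch below uses Pearson correlation as that similarity measure; this is an assumption for illustration, since studies also use other measures, and the gene names, profiles and threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles (0.0 if either profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpressed_with(bait: str, profiles: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Genes whose profiles correlate with the bait gene at or above threshold, best first."""
    ref = profiles[bait]
    scored = [(gene, pearson(ref, p)) for gene, p in profiles.items() if gene != bait]
    return sorted((gs for gs in scored if gs[1] >= threshold), key=lambda gs: gs[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values of four genes across five samples.
    profiles = {
        "bait": [1, 2, 3, 4, 5],
        "geneA": [2, 4, 6, 8, 10],
        "geneB": [5, 4, 3, 2, 1],
        "geneC": [1, 3, 2, 5, 4],
    }
    print([g for g, _ in coexpressed_with("bait", profiles, 0.75)])  # ['geneA', 'geneC']
```

Co-expressed genes are only candidates; as in the SGA example, functional analysis is still needed to confirm their roles.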
Gene Inactivation: The ability to manipulate gene expression levels has been essential to the study of gene function and biological processes. Classically, whole-body deletions of genes were generated via homologous recombination. The last few years have seen a revolution in the approaches scientists use to inactivate gene expression, such as highly efficient ribonucleic acid interference (RNAi) delivery systems, gene knockout and antisense technology.
Gene Knockout: A gene knockout (abbreviation: KO) is a genetic technique in which one of an organism's genes is made inoperative ("knocked out" of the organism). Knockout organisms, or simply knockouts, are used to learn about a gene that has been sequenced but whose function is unknown or incompletely known. Researchers draw inferences from the differences between the knockout organism and normal individuals.
KNOCKOUT MICE: A knockout mouse is a mouse in which a specific gene has been targeted and deleted or mutated so that it is inactivated. The loss of gene activity often changes the mouse's phenotype and thus provides valuable information on the function of the gene.
The researchers who developed the technology for creating knockout mice won the Nobel Prize in 2007: the Nobel Prize in Physiology or Medicine 2007 was awarded jointly to Mario R. Capecchi, Sir Martin J. Evans and Oliver Smithies "for their discoveries of principles for introducing specific gene modifications in mice by the use of embryonic stem cells".
GENERATION OF KNOCKOUT MICE BY HOMOLOGOUS RECOMBINATION: 1. Create a knockout construct. 2. Introduce the knockout construct into mouse embryonic stem (ES) cells in culture. 3. Screen the ES cells and select those whose DNA includes the new gene construct. 4. Inject the selected cells into normal mouse embryos, making "chimeras". 5. Implant the chimeric embryos into pseudopregnant females. 6. The females give birth to chimeric offspring, which are subsequently bred to verify germline transmission of the new gene, producing a mutant mouse line.
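The breeding at the end of this procedure follows simple Mendelian segregation: once germline transmission yields heterozygous (+/-) mice, intercrossing them is expected to give about one quarter homozygous knockouts (-/-). The sketch below enumerates single-locus crosses; the "+"/"-" notation is an assumption for illustration. A shortfall of -/- pups relative to the expected quarter is one sign that a knockout may be developmentally lethal.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def cross(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Expected offspring genotype frequencies at one locus.

    Each parent is written as two alleles, '+' (wild type) or '-' (knockout);
    each parent passes on one allele with equal probability.
    """
    offspring = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}


if __name__ == "__main__":
    # Heterozygote crossed to wild type, then heterozygote intercross.
    print({g: str(f) for g, f in cross("+-", "++").items()})  # {'++': '1/2', '+-': '1/2'}
    print({g: str(f) for g, f in cross("+-", "+-").items()})  # {'++': '1/4', '+-': '1/2', '--': '1/4'}
```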
Knockout construct: The gene to be knocked out is isolated from a mouse gene library. A new DNA sequence is then engineered that is very similar to the original gene and its immediate neighbouring sequence, except that it is changed enough to make the gene inoperable. Usually, the new sequence is also given a marker gene, a gene that normal mice do not have, which confers resistance to a certain toxic agent (for example, the neomycin-resistance gene neo, selected with G418) or produces an observable change (e.g. colour or fluorescence).
Knockout Mice to study genetic diseases: Knockout mice make good model systems for investigating the nature of genetic diseases, testing the efficacy of different types of treatment, and developing effective gene therapies for these often devastating diseases. For instance, CFTR-knockout mice show symptoms similar to those of humans with cystic fibrosis.
Drawbacks of knockout mice: About 15% of gene knockouts are developmentally lethal, so the embryos cannot grow into adult mice, which makes it difficult to determine the gene's function in adults. Many genes that participate in interesting pathways are essential for mouse development, viability or fertility; a traditional knockout of such a gene therefore cannot yield an established knockout mouse strain for analysis.
Antisense RNA Technology: Antisense RNA is a single-stranded RNA that is complementary to a messenger RNA (mRNA) strand transcribed within a cell. Antisense molecules are introduced into a cell, for example by expressing them from a transgene, to inhibit translation by base pairing with the sense mRNA; DNA-based antisense oligonucleotides additionally activate RNase H, which cleaves the RNA strand of the resulting RNA-DNA hybrid.
Example:
mRNA (sense), 5'→3': AUGAAACCCGUG
Antisense RNA, 3'→5': UACUUUGGGCAC
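Because the two strands pair antiparallel, the antisense RNA written 5'→3' is the reverse complement of the mRNA; reading it backwards gives the 3'→5' form that lines up under the sense strand in the example. The short sketch below derives the antisense sequence for the example mRNA and checks that the pair forms a perfect duplex; the function names are illustrative.

```python
# RNA base pairing: A-U and G-C.
PAIR = str.maketrans("ACGU", "UGCA")


def antisense(mrna_5to3: str) -> str:
    """Antisense RNA written 5'->3': the reverse complement of the sense mRNA."""
    return mrna_5to3.upper().translate(PAIR)[::-1]


def pairs_perfectly(sense_5to3: str, antisense_5to3: str) -> bool:
    """True if the two strands form a perfect antiparallel duplex."""
    if len(sense_5to3) != len(antisense_5to3):
        return False
    # Reading the antisense strand backwards aligns its 3' end with the sense 5' end.
    return all(s.translate(PAIR) == a for s, a in zip(sense_5to3, reversed(antisense_5to3)))


if __name__ == "__main__":
    mrna = "AUGAAACCCGUG"
    anti = antisense(mrna)
    print(anti)        # CACGGGUUUCAU  (5'->3')
    print(anti[::-1])  # UACUUUGGGCAC  (3'->5', as written under the mRNA)
```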
How It Differs from RNAi: The intended effect of both techniques is the same, but the processing differs. In antisense technology the mRNA is degraded by RNase H, whereas in RNAi the enzyme Dicer processes double-stranded RNA into siRNAs and the RISC then cleaves the target mRNA. siRNAs are double-stranded, roughly twice the size of a single-stranded antisense oligonucleotide.
Nature's Antisense System: The R1 plasmid of E. coli uses a hok (host killing)/sok (suppression of killing) system of post-segregational killing. The plasmid carries both the hok toxin gene and the sok gene, whose antisense RNA base-pairs with hok mRNA and inhibits translation of the Hok protein. When an E. coli cell divides, a daughter cell that does not inherit the plasmid still receives the stable hok mRNA and the short-lived sok RNA from the parent's cytoplasm; the sok RNA is degraded quickly, Hok protein is translated, and the plasmid-free cell dies. A daughter cell that inherits the R1 plasmid keeps transcribing sok from its own promoter, so sok RNA continues to block hok translation and the cell survives.
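The whole system hinges on the different stabilities of the two RNAs. The toy calculation below estimates how long a plasmid-free cell stays protected: with no plasmid, neither RNA is made any more, both decay exponentially, and protection ends when hok mRNA outnumbers sok RNA. The starting amounts and half-lives in the example are hypothetical, chosen only to illustrate the effect, and the model ignores duplex formation and translation kinetics.

```python
import math


def unblock_time(hok0: float, sok0: float, half_life_hok: float, half_life_sok: float) -> float:
    """Time until hok mRNA outnumbers sok antisense RNA in a plasmid-free cell.

    With no plasmid there is no new transcription, so both RNAs simply decay:
    hok(t) = hok0 * 2**(-t / half_life_hok) and sok(t) = sok0 * 2**(-t / half_life_sok).
    Setting hok(t) = sok(t) gives
    t = log2(sok0 / hok0) / (1 / half_life_sok - 1 / half_life_hok).
    """
    if sok0 <= hok0:
        return 0.0  # sok never in excess: hok is free immediately
    if half_life_sok >= half_life_hok:
        return math.inf  # sok at least as stable as hok: it never falls below hok
    return math.log2(sok0 / hok0) / (1 / half_life_sok - 1 / half_life_hok)


if __name__ == "__main__":
    # Hypothetical values: sok starts in 10-fold excess; half-lives in minutes.
    print(round(unblock_time(1, 10, 20, 0.5), 2))  # 1.7
```

Even a large initial excess of sok buys only a short delay when sok is much less stable than hok, which is why losing the plasmid is lethal.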
Flavr-Savr: Flavr-Savr, developed by Calgene in 1992, was the first genetically modified food approved by the FDA; it was licensed on May 17, 1994. During ripening, tomatoes produce increasing amounts of the enzyme polygalacturonase (PG), which softens the fruit and eventually causes it to rot, so ripe tomatoes do not keep for more than a few extra days. Calgene introduced a gene that produces an mRNA complementary to the PG transcript, inhibiting synthesis of the PG enzyme.
INDIAN CONTRIBUTION: In February 2010, NIPGR (National Institute of Plant Genome Research) developed, using antisense technology, a tomato that can last up to 45 days. Green tomatoes therefore need not be picked early and forcibly ripened with ethylene, and there is less worry about whether they will reach the market shelves, or go mushy in the kitchen before they are used. NIPGR scientists silenced the expression of two important genes responsible for the loss of firmness and texture during ripening.
The two silenced genes encode α-mannosidase (α-Man) and β-D-N-acetylhexosaminidase (β-Hex), both glycosyl hydrolases: enzymes that break the chemical bond holding a sugar to another sugar or to some other molecule, such as a protein.
Challenges to antisense technology: 1. One major challenge to antisense technology (and RNAi) is the difficulty of delivering it into the body. Delivery to the brain, for use in diseases such as Huntington's disease (HD), is especially challenging because the treatment must cross the blood-brain barrier. 2. The second major challenge is toxicity. Although antisense molecules are engineered to be very specific, they can still cause unintended damage; in HD, for example, they would regulate both the mutant and the normal huntingtin alleles.