JajatiKeshariNayak1
40 slides
Oct 05, 2019
About This Presentation
Genome Comparison Techniques, Advanced & Classical Approaches.
Size: 4.5 MB
Language: en
Added: Oct 05, 2019
Slides: 40 pages
Slide Content
SEMINAR ON Genomics
Genome Comparison Techniques: Advanced & Classical Approaches
Presented by Jajati Keshari Nayak, Dept. of Molecular Biology, GB Pant University, PhD 1st year, ID-55493
GENOMICS
The field of biology that attempts to understand the content, organization, function and evolution of the genetic information contained in the whole genome.
Three levels of genome research: structural, functional and comparative genomics
Structural Genomics
Structural genomics seeks to describe the structural features of genes and the 3-dimensional structure of every protein encoded by a given genome.
[Figure: gene model with EXON1, EXON2, EXON3]
• Sequence and size in bp
• Map position
• Repeats
• Motifs/domains, etc.
Functional Genomics
Use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions.
Study of the functional transcripts and protein products encoded by the genes of a genome, and their functional characterization:
• Role of the gene
• Time- and tissue-specific expression
E.g., the gene encoding the florigen hormone in plants
What is Comparative Genomics?
• Analyzing and comparing different genomes to study the gene content, function, organization and evolution of different organisms
• Not a core technique in itself, but the application of various techniques (genome sequencing, mapping, bioinformatics, etc.) with the sole objective of comparing genomes
What to compare?
• Size of the genome: total number of base pairs
• Genome organization: circular or linear, ss/ds genome, extrachromosomal elements, etc.
• Percentage of the genome that is coding
• Total number of predicted ORFs
• Average length of ORFs
• Repetitive DNA / junk DNA
• Functional assignment
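Several of these quantities can be computed directly from a sequence. The following is a minimal Python sketch, not a gene predictor: it assumes a toy ORF definition (ATG to the first in-frame stop codon, forward strand only), and the names `find_orfs` and `genome_summary` as well as the toy sequence are hypothetical. Real annotation pipelines also handle the reverse strand, introns and alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs of forward-strand ORFs in all three frames.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included). This is a simplifying assumption.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

def genome_summary(seq):
    """Summarize size, ORF count, mean ORF length and coding percentage."""
    orfs = find_orfs(seq)
    lengths = [end - start for start, end in orfs]
    coding = set()
    for start, end in orfs:
        coding.update(range(start, end))  # count overlapping bases only once
    return {
        "size_bp": len(seq),
        "n_orfs": len(orfs),
        "mean_orf_len": sum(lengths) / len(lengths) if lengths else 0.0,
        "coding_pct": 100.0 * len(coding) / len(seq) if seq else 0.0,
    }

if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # hypothetical 16-bp sequence with one ORF
    print(genome_summary(toy))
    # {'size_bp': 16, 'n_orfs': 1, 'mean_orf_len': 12.0, 'coding_pct': 75.0}
```

Running the same summary on two genomes gives a first, coarse comparison of size, ORF count and coding density.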
Basic points of consideration in comparative genomics
• All existing genomes had a common ancestor, and each organism reflects that ancestry combined with the action of evolution.
• New genes are derived from existing sequences.
• Information gained in one organism can be applied to other, even distantly related, organisms.
• The existing variation among current forms of living organisms is due to selection, speciation, divergence, gene mutations/duplications, etc.
New genes are derived from existing sequences
Purpose or Goals of Comparative Genomics
• Understanding the similarities and differences between genomes that lead to special phenotypes or diseases.
• Identifying genes and discovering their functions by studying their counterparts in other organisms.
• Revealing the evolutionary relationships between different organisms.
“Nothing in biology makes sense except in the light of evolution” Theodosius Dobzhansky (1900-1975)
• All modern biological processes evolved from related processes.
• Every modern gene evolved from other genes.
• Most genes have an ortholog in related species, and many genes have paralogs in the same species.
Patterns of Gene Evolution
[Figure panels: CASE I, gene orthologs; CASE II, gene paralogs; homologous genes]
CASE III
• Homologous genes arise from a common ancestor and show structural (sequence) similarity
• G1A & G2A are paralogs
• G1A & G1B, or G2A & G2B, are orthologs
How did it all start? The Concept of Model Organisms or Model Genomes
• A great deal of information on an organism can be extracted by examining its counterparts in simpler model organisms.
• Much of this work is conducted using model organisms.
• Model organisms offer a cost-effective way to follow the inheritance of genes through many generations in a relatively short time.
Examples:
• Mammals: Homo sapiens, mouse
• Insects: Drosophila melanogaster
• Roundworms: C. elegans
• Fungi: Saccharomyces cerevisiae
• Bacteria: Escherichia coli
• Fish: zebrafish
• Arabidopsis thaliana: model plant genome
• Rice: model cereal genome
• Medicago truncatula: model legume genome
Genomes sequenced
• 1995: first bacterial genomes sequenced (H. influenzae and M. genitalium)
• 1996: the yeast genome
• 1997: E. coli K12
• 1998: C. elegans
• 1999: full sequence of chr. 22
• 2000: D. melanogaster genome and chr. 21; A. thaliana
• 2001: human draft
• 2002: mouse, Ciona, rice, Fugu, Anopheles
• 2003 to 2005: chimpanzee, human finished, rat, chicken, Xenopus, zebrafish
Techniques of Comparative Genomics
• Use of molecular markers in plant genome analysis
• Comparative genome maps
• Studying synteny
• Whole genome sequencing
• Bioinformatics tools:
  • Homology search in public databases (BLASTn, BLASTp, etc.)
  • Sequence alignment tools (ClustalW, ClustalX, etc.)
Molecular markers in plant genome analysis
[Figure: comparative analysis of aromatic rice varieties]
Comparative maps for genome comparison
• Involves the use of molecular markers to map the genomes of two species for a common set of markers (loci)
• Used to study genome evolution (how the genome has been rearranged through time) and to make inferences about gene organization, repeated sequences, etc.
Overview of steps
• Construct a map for a species using a set of markers
• Align the map to a reference/published map of a related species
• Work out the common loci/regions
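The last two steps can be sketched in Python as a lookup of shared markers. This is only an illustration under stated assumptions: each map is a plain dictionary from marker name to (chromosome, position in cM), and the marker names, chromosome labels and the function name `common_loci` are hypothetical, not taken from any published map.

```python
from collections import defaultdict

def common_loci(map_a, map_b):
    """Group markers present in both maps by (chromosome in A, chromosome in B).

    Each map is a dict: marker -> (chromosome, position_cM).
    Rows within a group are sorted by position in map A.
    """
    shared = set(map_a) & set(map_b)
    table = defaultdict(list)
    for marker in shared:
        chrom_a, pos_a = map_a[marker]
        chrom_b, pos_b = map_b[marker]
        table[(chrom_a, chrom_b)].append((marker, pos_a, pos_b))
    for rows in table.values():
        rows.sort(key=lambda row: row[1])
    return dict(table)

if __name__ == "__main__":
    # Hypothetical marker maps for two related species
    species_a = {"m1": ("A1", 5.0), "m2": ("A1", 20.0),
                 "m3": ("A2", 12.0), "m4": ("A2", 30.0)}
    species_b = {"m1": ("B3", 8.0), "m2": ("B3", 25.0),
                 "m3": ("B1", 40.0), "m5": ("B1", 2.0)}
    for pair, rows in sorted(common_loci(species_a, species_b).items()):
        print(pair, rows)
```

Chromosome pairs that share many markers in the same order are candidates for conserved regions; markers found on only one map (m4 and m5 here) are simply ignored.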
Comparative Genome Mapping of Sorghum and Maize
[Figure panels: sorghum linkage map produced from maize RFLP genomic probes; map location of RFLP loci in maize; comparative map, maize vs. sorghum]
Gramene: a versatile tool for comparative mapping
Synteny Maps for Comparative Genomics
• Synteny: defined as the preservation of the order of genes on a chromosome
• Synteny map: a map showing syntenic regions and homologous loci from another species aligned against a map of a target organism
• Can give us information about shared ancestry and evolutionary history, or a key to functional relationships between genes
• Genes present in one species are likely to be present in closely related species
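Taking synteny as preserved gene order, a syntenic block can be sketched as a run of neighbouring genes whose orthologs are also neighbours on one chromosome of the other species. The Python below is a simplified illustration, assuming a strict definition (no gaps or missing orthologs allowed inside a block); the gene names, positions and the function name `synteny_blocks` are hypothetical.

```python
def synteny_blocks(order_a, ortholog_pos_b, min_genes=2):
    """Find runs of consecutive genes in species A whose orthologs are
    adjacent on the same chromosome in species B.

    order_a:        gene ids in their order along one chromosome of species A
    ortholog_pos_b: gene id -> (chromosome, gene index) of its ortholog in B
    """
    blocks, current = [], []
    for gene in order_a:
        if gene not in ortholog_pos_b:
            # A gene without an ortholog ends the current block
            if len(current) >= min_genes:
                blocks.append(current)
            current = []
            continue
        if current:
            prev_chrom, prev_idx = ortholog_pos_b[current[-1]]
            chrom, idx = ortholog_pos_b[gene]
            if chrom == prev_chrom and abs(idx - prev_idx) == 1:
                current.append(gene)  # order preserved (either orientation)
                continue
            if len(current) >= min_genes:
                blocks.append(current)
        current = [gene]
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks

if __name__ == "__main__":
    genes_a = ["a", "b", "c", "d", "e", "f"]
    pos_b = {"a": ("chr1", 10), "b": ("chr1", 11), "c": ("chr1", 12),
             "d": ("chr2", 5), "e": ("chr2", 4), "f": ("chr3", 1)}
    print(synteny_blocks(genes_a, pos_b))
    # [['a', 'b', 'c'], ['d', 'e']]
```

Each boundary between blocks corresponds to a breakpoint, so counting blocks gives a rough measure of how much rearrangement separates two genomes.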
Rice-maize comparative synteny map (Goff et al., 2002, Science 296: 92-100)
• Rice shows great synteny with other cereals, i.e. genes present in one cereal are very likely to be present in the same order in another
• Regions of homology between rice and maize of greater than 80%
• Virtually every part of the maize genome finds a homologue in rice
Conservation of synteny between mouse and human genomes (Mural et al., Science, 2002, 296:1661)
• Approximately 99% of mouse genes have a homolog in the human genome.
• For 96%, the homolog lies within a similar conserved syntenic interval in the human genome.
• Mouse chromosome 16 is syntenic with:
  • Chrs. 3, 8, 12, 16, 21, 22 of human
  • Chrs. 10, 11 of rat
Comparison of mouse chromosome 16 and the human genome
Q: Why more breakpoints in mouse-human than in mouse-rat?
Q: Why more conserved genes in human than in rat?
• The longer the divergence time between two species, the more rearrangement (recombination) has occurred
• About 100 million years since human-mouse divergence
• About 40 million years since rat-mouse divergence
Bioinformatics Approaches: Homology search in public databases
Case: we have a gene/protein sequence with no idea of its function
• Subject the sequence to a homology/sequence similarity search across a database (BLAST search)
• Look for genes showing significant similarity
• A putative function can be inferred
Homology in bioinformatics is inferred from sequence similarity (expressed as % similarity/identity)
BLAST: Basic Local Alignment Search Tool
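The idea of ranking database sequences by local similarity to a query can be illustrated with a small Python sketch. Note that this is not BLAST: it uses an exhaustive Smith-Waterman local alignment score, whereas BLAST is a much faster heuristic that also reports statistical significance. The scoring values, sequences, database entries and function names are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b
    (linear gap penalty, score only, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def search(query, database):
    """Score the query against every database entry; best hits first."""
    hits = [(local_alignment_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    # Hypothetical toy database: name -> sequence
    db = {"hit": "TTACGTACGTTT", "miss": "GGGGGGGG"}
    print(search("ACGTACGT", db))
    # [(16, 'hit'), (2, 'miss')]
```

A high-scoring hit with a known annotation is what suggests a putative function for the query; a real search would also need a significance measure before drawing that conclusion.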
An Example
• 5 rice varieties grown under heat stress
• Isolate protein from individual plants
• Analyze on SDS-PAGE [Figure: gel with lanes M, 1, 2, 3, 4, 5]
• Isolate the fragment
• Get it sequenced
• Subject the sequence to BLAST
• Get an idea of the role/function of the novel protein
• Correlate with phenotype and confirm the findings
www.ncbi.nlm.nih.gov
Use of sequence alignment tools in comparative genomics
• Sequence alignment is a way of arranging sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences
• [Figure: pairwise alignment and multiple sequence alignment of SEQ1, SEQ2]
• Progressive multiple alignment techniques produce a guide tree that can be used to work out evolutionary relationships
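A pairwise alignment can be sketched in a few lines of Python. The example below uses Needleman-Wunsch global alignment with a simple match/mismatch/gap scheme; this is an illustrative choice and not the method used by ClustalW or ClustalX, which rely on substitution matrices and a progressive strategy. The function names and scoring values are assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)

if __name__ == "__main__":
    seq1, seq2, s = global_align("GATTACA", "GATACA")
    print(seq1)
    print(seq2)
    print("score:", s, "identity: %.1f%%" % percent_identity(seq1, seq2))
```

The percent identity computed this way is the kind of figure quoted when two sequences are described as similar; where several alignments are equally good, the sketch returns only one of them.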
What if the target sequence shows no similarity with existing sequences? Dead end?
No! This is the beginning of other advanced comparative genomics approaches.
Advanced Techniques of Comparative Genomics
Prerequisites:
• Enough data, and tools for processing large amounts of data
• Development of new computational methods
• Advanced statistical tools
• Knowledge of algorithms
• “Informatics” techniques from applied maths, computer science and statistics, adapted to biological sequences
Prediction of functions via the ‘guilt by association’ principle
• Proverbial principle: ‘Show me your friends and I’ll tell you who you are’
• Genes of related function are associated in various ways: enzymes in a pathway, proteins in a complex
• Groups of genes with similar biochemical function tend to remain localized, e.g. the genes required for synthesis of tryptophan (trp genes) in E. coli and other prokaryotes
• Whatever a gene’s associates do, the gene probably has a similar function
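The principle can be sketched as a majority vote over the annotations of a gene's associates. The Python below is a hedged illustration only: the association list (which could come from genomic neighbourhood, co-expression or protein interaction data), the unannotated gene `geneX` and the function name `predict_function` are hypothetical, and real methods weight the evidence rather than counting votes.

```python
from collections import Counter

def predict_function(gene, associations, annotations):
    """Predict a gene's function as the most common annotation among
    its associated genes. Returns None if no associate is annotated."""
    votes = Counter(annotations[partner]
                    for partner in associations.get(gene, ())
                    if partner in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Known annotations (simplified labels)
    annotations = {
        "trpA": "tryptophan biosynthesis",
        "trpB": "tryptophan biosynthesis",
        "lacZ": "lactose catabolism",
    }
    # Hypothetical associations for an unannotated gene
    associations = {"geneX": ["trpA", "trpB", "lacZ"]}
    print(predict_function("geneX", associations, annotations))
    # tryptophan biosynthesis
```

The result is only a putative function, in line with the word "probably" in the principle above.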
In plants, evidence for association comes mainly from post-genomic studies
Some plant post-genomic resources:
• Microarray analysis
• Organellar targeting prediction
• Proteomics and phenomics databases
General Databases Useful for Comparative Genomics
• LocusLink/RefSeq: http://www.ncbi.nih.gov/LocusLink/
• PEDANT (Protein Extraction, Description and ANalysis Tool): http://pedant.gsf.de/
• MIPS (Munich Information Center for Protein Sequences), Mammalian Protein Interaction Database: http://mips.gsf.de/
• COGs (Clusters of Orthologous Groups of proteins): http://www.ncbi.nih.gov/COG/
• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.ad.jp/kegg/
• MBGD (Microbial Genome Database): http://mbgd.genome.ad.jp/
• GOLD (Genomes OnLine Database): http://www.genomesonline.org/
• TOGA (TIGR Orthologous Gene Alignment): http://www.tigr.org/tdb/toga/toga.shtml
What we have learned by comparing genomes
• Comparing genomes helps us to understand the genetic basis of diversity in organisms (both speciation and variation) and important aspects of evolutionary biology.
• It provides “first pass” information on the function of a putative gene, based on the existence of conserved protein sequences.
• Comparative genomics provides a powerful way to analyze sequence data.