Phylogenetic Analysis, Comparative Genomics, Orthologs and Paralogs
Course Title: Bioinformatics
Course Code: 21AEC62
Course Coordinator: Prof. Ramya K P
By:
K BHARGAVI - 1BY21EC063
K SOWMYA JASMINE - 1BY21EC070
MANAS UPADHYAY - 1BY21EC077
MOHAMMED DANISH ALI - 1BY21EC084
NAMITHA S - 1BY21EC091
NITISH SINGH - 1BY21EC098
PRABHU AMIT RAVINDRA - 1BY21EC105
PUNITHA RAGHAVENDRA - 1BY21EC112
SHASHANK N - 1BY21EC411
Phylogenetic Analysis
Phylogenetic analysis reconstructs how species evolve through genetic changes. Using phylogenetics, scientists can trace the path that connects a present-day organism to its ancestral origin and can predict the genetic divergence that may occur in the future. Phylogenetics has many applications in medical and biological fields, including forensic science, conservation biology, epidemiology, drug discovery and design, prediction of protein structure and function, and gene function prediction. Phylogenetic analysis is also useful in comparative genomics, which studies the relationships between the genomes of different species. In this context, one major application is gene prediction (gene finding), which means locating genes and other functional regions along a genome.
Methods in Phylogenetic Analysis
Distance-Based Methods: Distance-based methods estimate the genetic distance between every pair of sequences and use these distances to construct a phylogenetic tree. Commonly employed algorithms include Neighbor-Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA). These methods are relatively fast and can handle large datasets, but they discard site-level information when sequences are reduced to pairwise distances, and they can be sensitive to long-branch attraction artifacts when distances are poorly corrected. UPGMA additionally assumes a constant rate of evolution across lineages (a molecular clock).
Character-Based Methods: Character-based methods analyze the character states (nucleotides or amino acids) at each aligned position in the sequences. Maximum Parsimony (MP), Maximum Likelihood (ML), and Bayesian Inference (BI) are widely used character-based methods. MP seeks the tree that requires the fewest evolutionary changes. ML seeks the tree that makes the observed sequences most probable under a specified model of sequence evolution, while BI estimates the posterior probability of trees under such a model. ML and BI are computationally intensive but generally yield more accurate results when the model fits the data; MP is also prone to long-branch attraction.
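The distance-based workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a replacement for established software: the four aligned sequences and their labels (A to D) are invented toy data, the Jukes-Cantor (JC69) correction is one of several possible distance corrections, and the function names `p_distance`, `jc69_distance`, and `upgma` are hypothetical. The tree is returned as nested tuples without branch lengths.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites that differ, skipping sites with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return float("inf")  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    dist maps frozenset({x, y}) to the distance between x and y.
    Returns the tree topology as nested tuples.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy aligned sequences, invented for illustration only.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACCTAT",
    "D": "TCGAACCTGT",
}
dist = {
    frozenset((x, y)): jc69_distance(seqs[x], seqs[y])
    for x, y in combinations(seqs, 2)
}
tree = upgma(list(seqs), dist)
print(tree)
```

On this toy input, A and B are joined first because they differ at a single site, then C and D, and the two pairs are joined last. Neighbor-Joining uses the same distance matrix but a different merging criterion that does not assume a molecular clock.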
Comparative Genomics
Comparative genomics is the field of biological research that compares the genomic features of different organisms. This comparison can include the analysis of DNA sequences, gene structures, regulatory sequences, and other genomic elements. The primary goal of comparative genomics is to understand the similarities and differences in the genomes of different species, which can provide insights into evolutionary relationships, gene function, and the genetic basis of diseases.
Comparative Genomics Tools and Databases
BLAST: Basic Local Alignment Search Tool, used for comparing an input sequence against a database of sequences.
Clustal Omega: Tool for multiple sequence alignment.
UCSC Genome Browser: Allows visualization and comparison of genomic data from various species.
Ensembl: Provides genomic information and comparative genomics resources.
NCBI Genome: Central repository for genomic data, supporting comparative analysis and genomic research across diverse organisms.
Phytozome: Specializes in comparative plant genomics, providing resources for studying plant genetic diversity and evolutionary adaptations.
Applications of Phylogenetic Analysis and Comparative Genomics
Medical Research: Identifying disease genes and drug targets.
Evolutionary Biology: Understanding speciation and adaptive evolution.
Biotechnology: Engineering genes and synthetic biology.
Disease Research: Comparing human genes with those of model organisms can help identify genetic variations associated with diseases and potential targets for therapy.
Agricultural Improvements: Comparative genomics can be used to identify beneficial traits in crops and livestock, leading to improved breeding programs.
Drug Development: Insights from comparative genomics can assist in identifying new drug targets and understanding drug resistance mechanisms.
Orthologous and Paralogous Genes
Orthologous genes are genes in different species that evolved from a single gene in their common ancestor through speciation. Orthologous genes normally retain the same function over the course of evolution. An example is the plant Flu regulatory protein, which is present in both Arabidopsis (a multicellular higher plant) and Chlamydomonas (a single-celled green alga).
Paralogous genes are genes related through a gene duplication event. When paralogs are compared across species, the duplication took place in an ancestor before the species diverged, and the two copies then accumulated mutations independently, often diverging in function. An example is hemoglobin and myoglobin: both descend from a duplicated ancestral globin gene, so human hemoglobin and chimpanzee myoglobin are paralogs.
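One common heuristic for proposing ortholog pairs between two genomes is the reciprocal best hit (RBH) test: gene a and gene b are paired if b is the best-scoring match for a and a is the best-scoring match for b. The sketch below illustrates that idea only. It is a heuristic rather than a definition of orthology, the gene identifiers (`a1`, `b1`, and so on) and similarity scores are invented, and the function names are hypothetical; in practice the scores would come from a search tool such as BLAST.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores maps (query, subject) to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented similarity scores between genes of two hypothetical species.
a_vs_b = {
    ("a1", "b1"): 200, ("a1", "b2"): 150,
    ("a2", "b2"): 90, ("a2", "b1"): 80,
}
b_vs_a = {
    ("b1", "a1"): 200,
    ("b2", "a1"): 150, ("b2", "a2"): 90,
    ("b3", "a1"): 120,
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here only a1 and b1 pass the test. Gene b2 matches a1 better than a2, so the pair (a2, b2) is not reciprocal; cases like this are where duplication history needs closer inspection, for example with a gene tree.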
Role of Orthologs and Paralogs in Disease Gene Identification
Orthologs:
Functional Annotation: Predicting the function of human disease genes from their orthologs in model organisms.
Example: Identifying orthologs of disease-associated genes in model organisms such as mice or zebrafish to understand disease mechanisms.
Paralogs:
Disease Gene Expansion: Exploring paralogous genes within the human genome to widen the search for disease-associated variants.
Example: Studying paralogous gene families involved in neurological disorders to uncover novel disease genes.
Drug Target Discovery Using Evolutionary Conservation and Divergence
Evolutionary Conservation:
Target Identification: Identifying orthologous genes that are conserved across species as potential drug targets.
Example: Targeting conserved pathways in pathogens for antibiotic development.
Evolutionary Divergence:
Adaptive Evolution: Exploiting divergence in paralogous genes to target specific disease mechanisms.
Example: Developing drugs that selectively inhibit divergent paralogous genes involved in cancer progression.
These biomedical applications demonstrate how phylogenetic analysis, comparative genomics, and the study of orthologs and paralogs contribute to disease gene identification and drug discovery. By leveraging evolutionary relationships and genomic data, researchers can uncover novel therapeutic targets and develop more effective treatments for various diseases.
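Evolutionary conservation is often quantified per alignment column. One simple measure is Shannon entropy: a column in which every sequence has the same residue scores 0 bits, and the score rises as residues become more mixed. The sketch below assumes an already aligned set of equal-length sequences; the four short sequences are invented, and the function names `column_entropy` and `conservation_profile` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy of every column; lower values mean stronger conservation."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*alignment)]


# Invented toy alignment: the first two columns are fully conserved.
alignment = ["ACGT", "ACGA", "ACTT", "ACTA"]
profile = conservation_profile(alignment)
conserved_columns = [i for i, h in enumerate(profile) if h == 0]
print(profile, conserved_columns)
```

Columns with entropy near zero across orthologs are candidates for functionally constrained sites, while columns that differ between paralogs may mark positions where a selective inhibitor could distinguish one copy from another.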
Case Study 1: Analysis of Orthologous Genes in Mammals
Objective: Identify and compare orthologous genes across different mammalian species to understand evolutionary conservation and divergence.
Species: Human, mouse, and dog genomes.
Methods:
Data Collection: Genome sequencing data from databases like Ensembl and NCBI.
Sequence Alignment: Using tools like BLAST and MUSCLE to align gene sequences.
Phylogenetic Tree Construction: Maximum likelihood method to build a tree showing evolutionary relationships.
Findings:
Conserved Genes: High conservation in essential genes (e.g., housekeeping genes).
Divergent Genes: Species-specific adaptations observed in genes related to immune response and sensory perception.
Implications: Understanding of gene function and evolutionary pressures in mammals.
Case Study 2: Functional Diversification of Paralogous Genes in Plants
Objective: Investigate how gene duplication leads to functional diversification in plant genomes.
Species: Arabidopsis thaliana and rice.
Methods:
Data Collection: Whole genome sequences from plant databases.
Identification of Paralogs: Using tools like OrthoMCL to identify duplicated genes.
Functional Analysis: Gene Ontology (GO) annotation and expression profiling.
Findings:
Divergent Functions: Duplicated genes often acquire new functions, such as roles in stress response and developmental processes.
Expression Patterns: Differential expression in various tissues and developmental stages.
Implications: Insights into the role of gene duplication in plant adaptation and evolution.
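One way to put a number on differential expression between duplicated genes is to correlate their expression profiles across tissues: a correlation near 1 suggests the copies are still expressed in the same pattern, while a low or negative value suggests their expression has diverged. The sketch below is an illustration only. The tissue names and expression values are invented, not results from this case study, and the gene labels `copy_a`, `copy_b`, and `copy_c` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented expression levels in four tissues: root, leaf, flower, seed.
copy_a = [10, 12, 11, 9]
copy_b = [20, 24, 22, 18]  # same pattern as copy_a at twice the level
copy_c = [2, 1, 8, 30]     # expressed mainly in seed

r_ab = pearson(copy_a, copy_b)
r_ac = pearson(copy_a, copy_c)
print(round(r_ab, 3), round(r_ac, 3))
```

In this toy data copy_a and copy_b are perfectly correlated, while copy_a and copy_c are negatively correlated, the kind of pattern that would prompt a closer look at whether copy_c has taken on a tissue-specific role.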