BIOINFORMATICS_PRACTICAL_A_BRIEF_INTRODUCTION.pptx

ShibsekharRoy1 15 views 45 slides Jun 29, 2024


Slide Content

OMIM DATABASE

What is OMIM?

Online Mendelian Inheritance in Man (OMIM®) is a continuously updated catalog of human genes and genetic disorders and traits, with particular focus on the molecular relationship between genetic variation and phenotypic expression. OMIM is a continuation of Dr. Victor A. McKusick's Mendelian Inheritance in Man, which was published through 12 editions, the last in 1998. OMIM is based on the peer-reviewed biomedical literature, and criteria for inclusion of papers continue to evolve. In general, priority for inclusion is given to papers that provide significant insight into the gene-phenotype relationship, expand our understanding of human biology, or contribute to the characterization of a disorder. Information in each OMIM entry is cited, and the full reference is provided. OMIM is biocurated at the McKusick-Nathans Institute of Genetic Medicine, The Johns Hopkins University School of Medicine.

What numbering system is used in the OMIM database?

Each OMIM entry is given a unique six-digit number, as summarized below:

1----- (100000- ) and 2----- (200000- ): Autosomal loci or phenotypes (entries created before May 15, 1994)
3----- (300000- ): X-linked loci or phenotypes
4----- (400000- ): Y-linked loci or phenotypes
5----- (500000- ): Mitochondrial loci or phenotypes
6----- (600000- ): Autosomal loci or phenotypes (entries created after May 15, 1994)

Allelic variants (mutations) are designated by the MIM number of the entry, followed by a decimal point and a unique 4-digit variant number. For example, allelic variants in the factor IX gene (300746) are numbered 300746.0001 through 300746.0101.
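The numbering scheme above can be expressed as a small lookup. This is only a sketch based on the table in this slide: the function name `parse_mim` and the returned dictionary layout are illustrative choices, not part of OMIM.

```python
import re

# First digit of a MIM number -> category, copied from the table above.
MIM_CATEGORIES = {
    "1": "Autosomal loci or phenotypes (entries created before May 15, 1994)",
    "2": "Autosomal loci or phenotypes (entries created before May 15, 1994)",
    "3": "X-linked loci or phenotypes",
    "4": "Y-linked loci or phenotypes",
    "5": "Mitochondrial loci or phenotypes",
    "6": "Autosomal loci or phenotypes (entries created after May 15, 1994)",
}

# Six-digit entry number, optionally followed by ".NNNN" for an allelic variant.
_MIM_PATTERN = re.compile(r"(\d{6})(?:\.(\d{4}))?")


def parse_mim(identifier):
    """Split a MIM identifier into entry number, variant number and category."""
    match = _MIM_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a MIM number: {identifier!r}")
    entry, variant = match.groups()
    # Digits outside 1-6 are not covered by the table, so report "unknown".
    category = MIM_CATEGORIES.get(entry[0], "unknown")
    return {"entry": entry, "variant": variant, "category": category}


# The factor IX allelic variant used as the example in the text.
print(parse_mim("300746.0001"))
```

For a plain entry number such as "300746", the `variant` field is `None`.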

What do the symbols preceding a MIM number represent?

An asterisk (*) before an entry number indicates a gene.
A number symbol (#) before an entry number indicates that it is a descriptive entry, usually of a phenotype, and does not represent a unique locus. The reason for the use of the number symbol is given in the first paragraph of the entry. Discussion of any gene(s) related to the phenotype resides in another entry(ies) as described in the first paragraph.
A plus sign (+) before an entry number indicates that the entry contains the description of a gene of known sequence and a phenotype.
A percent sign (%) before an entry number indicates that the entry describes a confirmed mendelian phenotype or phenotypic locus for which the underlying molecular basis is not known.
No symbol before an entry number generally indicates a description of a phenotype for which the mendelian basis, although suspected, has not been clearly established or that the separateness of this phenotype from that in another entry is unclear.
A caret (^) before an entry number means the entry no longer exists because it was removed from the database or moved to another entry as indicated.
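The symbol rules above amount to a lookup on the first character of an entry label. A minimal sketch follows; the name `describe_entry` and the shortened wording of each meaning are illustrative assumptions, and the input is assumed to be a label such as "*300746".

```python
# Symbol -> short meaning, paraphrased from the descriptions above.
MIM_SYMBOLS = {
    "*": "gene",
    "#": "descriptive entry, usually a phenotype; not a unique locus",
    "+": "gene of known sequence and a phenotype",
    "%": "confirmed mendelian phenotype or locus; molecular basis not known",
    "^": "entry removed from the database or moved to another entry",
}

# Meaning used when the number carries no leading symbol.
NO_SYMBOL = "phenotype whose mendelian basis is suspected but not established"


def describe_entry(label):
    """Return (symbol, number, meaning) for a label such as '*300746'."""
    label = label.strip()
    symbol = label[0] if label and label[0] in MIM_SYMBOLS else ""
    number = label[len(symbol):].strip()
    meaning = MIM_SYMBOLS.get(symbol, NO_SYMBOL)
    return symbol, number, meaning


# The factor IX gene entry from the text, written with the gene symbol.
print(describe_entry("*300746"))
```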

How are mutations cataloged in OMIM?

Mutations are cataloged in OMIM in the Allelic Variants section of gene entries. For most genes, only selected mutations are included. Criteria for inclusion include the first mutation to be discovered, high population frequency, distinctive phenotype, historic significance, unusual mechanism of mutation, unusual pathogenetic mechanism, and distinctive inheritance (e.g., dominant with some mutations, recessive with other mutations in the same gene). Most of the allelic variants represent disease-causing mutations. A few polymorphisms are included, many of which show a positive correlation with particular common disorders.

Information that one can gather from the database:

General description
Cloning and expression
Gene structure
Mapping
Gene function
Biochemical features
Molecular genetics
Animal model study
Allelic variants (mutations)

RETRIEVAL OF SEQUENCE FROM APPROPRIATE DATABASE

NCBI Database primary list

The source databases for NCBI nucleotide and protein sequences are listed below.

Protein: SwissProt and PIR components of UniProt; Protein Research Foundation (PRF); Protein Data Bank (PDB); and translations of coding regions on sequences in Entrez Nucleotide (RefSeq, International Sequence Database Collaboration – DDBJ / EMBL / GenBank).
Nucleotide: International Sequence Database Collaboration (DDBJ / EMBL / GenBank); NCBI Reference Sequences (RefSeq); nucleotide sequences from PDB; Third Party Annotation (TPA).
GSS and EST: All records are from the International Sequence Database Collaboration – DDBJ / EMBL / GenBank.

Which of the three databases containing nucleic acid sequence (Nucleotide, EST, or GSS) should I search?

The Nucleotide, Genome Survey Sequence (GSS), and Expressed Sequence Tag (EST) databases all contain nucleic acid sequences. The data in GSS and EST are from two large bulk sequence divisions of GenBank. GSS and EST data are typically uncharacterized, short genomic (GSS) or cDNA (EST) sequences. Searching any of the three databases will provide links to results in the other two. Unless you know that you are trying to find a specific set of EST or GSS sequences, searching the Nucleotide database with general text queries will produce the most relevant results. You can always follow links to results in EST and GSS from the Nucleotide database results.

Example………….

How do I use a simple query, such as a word or a phrase?

You can use a protein name, gene name, or gene symbol directly.

Searching with a submitter or author name in the following format will produce the best results:
Smith JR (last name followed by initials, no punctuation)

Database identifiers such as accession numbers or gi numbers will directly retrieve the full sequence record:
CAA79696
NP_778203
263191547
BC043443
NM_002020

To find a match to an exact phrase, enclose it in quotation marks:
"contactin associated protein"
"duchenne muscular dystrophy"

How can I make my search more specific with Boolean operators (AND, OR, NOT)?

AND: Use the Boolean operator AND to find records that contain every one of your search terms, the intersection of search results.
contactin AND neurofascin

OR: Use the Boolean operator OR to find records that include one of several search terms, the union of search results.
contactin OR neurofascin

NOT: Use the Boolean operator NOT to exclude records matching a search term.
contactin NOT neurofascin
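The same phrase and Boolean queries can also be prepared in a script. The sketch below only builds the query string and a search URL; it does not contact NCBI. It assumes the NCBI E-utilities `esearch.fcgi` endpoint with its `db` and `term` parameters and the database name `nuccore` for Nucleotide, none of which appear in the slides. The helper names `phrase`, `combine` and `esearch_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def phrase(text):
    """Wrap a phrase in double quotes so it is searched as an exact phrase."""
    return f'"{text}"'


def combine(operator, left, right):
    """Join two search terms with AND, OR or NOT."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{left} {operator} {right}"


def esearch_url(term, db="nuccore"):
    """Build (but do not send) an ESearch URL for the given search term."""
    query = urlencode({"db": db, "term": term})
    return f"{ESEARCH_URL}?{query}"


term = combine("AND", "contactin", "neurofascin")
print(term)
print(esearch_url(term))
print(esearch_url(phrase("contactin associated protein")))
```

Pasting the printed query string (for example `contactin AND neurofascin`) into the Nucleotide search box is equivalent to the web searches shown above.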

How do I restrict my search to specific subsets of records such as those from a specific organism, molecule type or source database?

Select Species as well as specific animal

Result

You can also use the linked numbers in the Top Organisms list in the right-hand column of the search results to filter your results to records from specific organisms.
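An organism restriction can also be written directly into the query text instead of being chosen from the filters. The sketch below assumes the standard Entrez field-tag syntax `[Organism]`, which the slides do not show; the function name `restrict_to_organism` and the example organism are illustrative.

```python
def restrict_to_organism(term, organism):
    """Limit a search term to one organism with the [Organism] field tag.

    The organism name is quoted so that multi-word names stay together.
    """
    return f'({term}) AND "{organism}"[Organism]'


# Illustrative query: contactin records restricted to human.
print(restrict_to_organism("contactin", "Homo sapiens"))
```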

Molecule type

In the Nucleotide database you can use the Molecule types facet to limit results to a particular molecule type.

Source database

The Source databases facet allows you to limit results to those from a particular database.

The KEGG database has been in development by Kanehisa Laboratories since 1995, and is now a prominent reference knowledge base for integration and interpretation of large-scale molecular data sets generated by genome sequencing and other high-throughput experimental technologies.

KEGG Overview: Genomes to Biological System

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from genomic and molecular-level information. It is a computer representation of the biological system, consisting of molecular building blocks of genes and proteins (genomic information) and chemical substances (chemical information) that are integrated with the knowledge on molecular wiring diagrams of interaction, reaction and relation networks (systems information). It also contains disease and drug information (health information) as perturbations to the biological system.

KEGG Overview

KEGG is an integrated database resource consisting of the sixteen databases shown here. They are broadly categorized into systems information, genomic information, chemical information and health information, which are distinguished by color coding of web pages.

Pathway Identifiers: Each pathway map is identified by the combination of a 2-4 letter prefix code and a 5-digit number. The prefix has the following meaning:
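The identifier format itself (a 2-4 letter prefix followed by a 5-digit number) can be checked and split with a short sketch. The code does not interpret the prefix; it only separates the two parts. The function name `split_pathway_id` and the example identifiers are illustrative assumptions.

```python
import re

# 2-4 letters followed by exactly 5 digits, as described in the text.
_PATHWAY_ID = re.compile(r"([A-Za-z]{2,4})(\d{5})")


def split_pathway_id(identifier):
    """Return (prefix, number) for a pathway identifier such as 'map00010'."""
    match = _PATHWAY_ID.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a pathway identifier: {identifier!r}")
    prefix, number = match.groups()
    return prefix, number


# Example identifiers used only to illustrate the format.
for pathway_id in ("map00010", "hsa04110"):
    print(pathway_id, split_pathway_id(pathway_id))
```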

SEARCH RESULTS

CLICK ON THE THUMBNAIL IMAGE