DNA Data Bank of Japan (DDBJ)

13,405 views 75 slides Aug 09, 2019

About This Presentation

The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA sequences. It is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture, Japan. It is also a member of the International Nucleotide Sequence Database Collaboration (INSDC).


Slide Content

DNA Data Bank of Japan (DDBJ)

INTRODUCTION The DNA Data Bank of Japan (DDBJ) is a public database of nucleotide sequences established at the National Institute of Genetics (NIG). DDBJ: http://www.ddbj.nig.ac.jp

HOMEPAGE OF DDBJ

HISTORY Since 1987, the DDBJ has been collecting annotated nucleotide sequences as its traditional database service. This endeavor has been conducted in collaboration with GenBank at the National Center for Biotechnology Information (NCBI) and with the European Molecular Biology Laboratory (EMBL) at the European Bioinformatics Institute (EBI). The collaborative framework is called the International Nucleotide Sequence Database Collaboration (INSDC).

DDBJ collects and edits about 20% of the data released by these three international databases.

DDBJ began data bank activities in 1986 at NIG and remains the only nucleotide sequence data bank in Asia. Although DDBJ mainly receives its data from Japanese researchers, it can accept data from contributors from any other country.

DATA UNIT The nucleotide sequence database is a set of data units called ENTRIES. In addition to the nucleotide sequence itself, each entry contains information about the researcher who determined the sequence plus related references, organism and gene function and features.

DIVISIONS OF DDBJ ENTRIES DDBJ classifies entries into 21 divisions, as listed below:

a: TAXONOMIC DIVISIONS
1. HUM Human
2. PRI Primates (other than human)
3. ROD Rodents
4. MAM Mammals (other than primates and rodents)
5. VRT Vertebrates (other than mammals)
6. INV Invertebrates (animals other than vertebrates)
7. PLN Plants, Fungi, Plastids (eukaryotes other than animals)
8. BCT Bacteria (including both Eubacteria and Archaea)
9. VRL Viruses
10. PHG Bacteriophages

b: OTHER DIVISIONS
1. PAT Sequence Data Related to Patent Applications
2. ENV Sequences Obtained via Environmental Sampling Methods
3. SYN Synthetic Constructs; Artificially Constructed Sequences
4. EST Expressed Sequence Tags; Short Single Pass cDNA Sequences
5. TSA Transcriptome Shotgun Assemblies; Assembled mRNA Sequences
6. GSS Genome Survey Sequences; Short Single Pass Genomic Sequences
7. HTC High Throughput cDNA Sequences
8. HTG High Throughput Genomic Sequences
9. STS Sequence Tagged Sites
10. UNA Data Not Annotated
11. CON Contig / Constructed
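The division codes listed above lend themselves to a simple lookup table. The following is a minimal sketch in Python; the function name `describe_division` and the dictionary names are hypothetical and are not part of any DDBJ tool.

```python
# Division codes and descriptions, copied from the two lists above.
TAXONOMIC_DIVISIONS = {
    "HUM": "Human",
    "PRI": "Primates (other than human)",
    "ROD": "Rodents",
    "MAM": "Mammals (other than primates and rodents)",
    "VRT": "Vertebrates (other than mammals)",
    "INV": "Invertebrates (animals other than vertebrates)",
    "PLN": "Plants, Fungi, Plastids (eukaryotes other than animals)",
    "BCT": "Bacteria (including both Eubacteria and Archaea)",
    "VRL": "Viruses",
    "PHG": "Bacteriophages",
}

OTHER_DIVISIONS = {
    "PAT": "Sequence data related to patent applications",
    "ENV": "Sequences obtained via environmental sampling methods",
    "SYN": "Synthetic constructs",
    "EST": "Expressed sequence tags",
    "TSA": "Transcriptome shotgun assemblies",
    "GSS": "Genome survey sequences",
    "HTC": "High throughput cDNA sequences",
    "HTG": "High throughput genomic sequences",
    "STS": "Sequence tagged sites",
    "UNA": "Data not annotated",
    "CON": "Contig / constructed",
}


def describe_division(code: str) -> str:
    """Return the description for a three-letter division code (hypothetical helper)."""
    code = code.strip().upper()
    for table in (TAXONOMIC_DIVISIONS, OTHER_DIVISIONS):
        if code in table:
            return table[code]
    raise KeyError(f"unknown division code: {code!r}")


print(describe_division("bct"))
```

Together the two tables hold the 21 divisions mentioned on the slide.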

DATA RETRIEVAL IN DDBJ To retrieve data from DDBJ, click SEARCH AND ANALYSIS on the homepage. A new tab opens with various search options.

1- GETENTRY Retrieves DDBJ annotated/assembled data by accession number. KEYWORD = ACCESSION NUMBER: only the accession number is used for the sequence search in this method of data retrieval.

In the ID box, enter the exact accession number of the sequence you want to find. The database defaults to DDBJ/EMBL/GenBank. Any OUTPUT FORMAT can be chosen according to the user's requirements.
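Because getentry needs the exact accession number, it can help to check the shape of an accession before searching. The sketch below is only an illustration: it assumes the two common nucleotide accession shapes (one letter plus five digits, or two letters plus six digits), and other valid formats exist that it would reject. The function name `normalize_accession` is hypothetical.

```python
import re

# Assumption: only the two common nucleotide accession shapes are accepted here;
# real INSDC accessions come in additional formats not covered by this pattern.
_ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def normalize_accession(text: str) -> str:
    """Trim and upper-case an accession, raising ValueError if the shape is unexpected."""
    candidate = text.strip().upper()
    if _ACCESSION_RE.fullmatch(candidate) is None:
        raise ValueError(f"unexpected accession format: {text!r}")
    return candidate


print(normalize_accession(" ab000001 "))
```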

These formats are shown in the picture below:

Choose any format and click SEARCH. The sequence information will open in a new window.

Flatfile format of DDBJ

Total nt seq FASTA format

CDS amino acid seq FASTA format

CDS nt seq FASTA format

INSD-XML_v1.4 format
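Several of the output formats above are FASTA. As a rough illustration of how such output can be read by a program, here is a minimal parser sketch. It assumes the conventional FASTA layout (a header line starting with ">" followed by one or more sequence lines); the sample record is made up for the example and is not real DDBJ output.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up record, for illustration only.
sample = """>example_1 hypothetical sequence
ACGTAC
GGTT
>example_2 another hypothetical sequence
TTGACA
"""
print(parse_fasta(sample))
```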

The same steps are repeated to search for a PROTEIN SEQUENCE in getentry. Choose one of the PROTEIN DATABASES (UniProt, PDB, DAD or PATENT), select the OUTPUT FORMAT and click SEARCH.

2- ARSA DDBJ annotated/assembled data retrieval by accession numbers and keywords. ACCESSION NUMBERS and KEYWORDS can both be used for this method of sequence search.

Enter the keywords or accession number in the search bar. The more keywords you enter, the narrower the search. A LIST OF ENTRIES matching the search is shown below the search bar.
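Why more keywords narrow the search can be seen in a small sketch. It assumes that keywords are combined with AND, so every keyword must match; it is not how ARSA is implemented, and the entries and the `search_entries` function are hypothetical.

```python
from typing import Dict, List


def search_entries(entries: List[Dict[str, str]], keywords: List[str]) -> List[Dict[str, str]]:
    """Keep entries whose description contains every keyword (case-insensitive AND)."""
    wanted = [k.lower() for k in keywords]
    return [
        entry
        for entry in entries
        if all(k in entry["description"].lower() for k in wanted)
    ]


# Made-up entries, for illustration only.
entries = [
    {"id": "entry1", "description": "Homo sapiens mRNA for insulin"},
    {"id": "entry2", "description": "Mus musculus mRNA for insulin"},
    {"id": "entry3", "description": "Homo sapiens gene for hemoglobin"},
]

print(len(search_entries(entries, ["insulin"])))
print(len(search_entries(entries, ["insulin", "homo sapiens"])))
```

Each added keyword can only keep or shrink the result list, never grow it.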

Flatfile, XML and FASTA formats are provided, with flatfile as the default. Click any sequence to view it, or select more than one sequence to view or download several at once.

3- TX SEARCH Taxonomy database search of DDBJ. Type in the organism name and click SEARCH. The complete lineage of the organism is returned.

4- BLAST A BLAST page is also provided on the search and analysis page.

The BLAST OUTPUT is formatted differently from that of the BLAST on the NCBI homepage; the results, however, are the same. The SIGNIFICANT ALIGNMENTS table comes first on DDBJ, whereas it comes second on NCBI BLAST. The HITS come second on DDBJ, with the IDENTIFICATION LINE written below each hit along with its SCORE.

Lastly, the alignment of the query with each hit is given, along with several details.

5- CLUSTALW For Multiple alignment and phylogenetic tree-making, ClustalW is also provided on DDBJ.

The ClustalW output is almost the same as the standard output, with one small difference: on DDBJ, ClustalW gives only the “*” identifier, i.e. only fully conserved positions are marked with a symbol.
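What the “*” marker means can be sketched in a few lines: a column of the alignment gets “*” only when every sequence has the same residue there. The code below is an illustration of that rule, not ClustalW itself. The function name `conservation_line` is hypothetical, and treating gap columns as not conserved is an assumption.

```python
from typing import List


def conservation_line(aligned: List[str]) -> str:
    """Return a string with '*' under fully conserved columns and ' ' elsewhere."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        # Assumption: a column containing gaps ('-') is never marked as conserved.
        conserved = first != "-" and all(residue == first for residue in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Made-up alignment, for illustration only.
alignment = ["ACGT-A", "ACCT-A", "ACGTTA"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```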

6- GGGenome An ultrafast sequence search: type in any sequence and it reports which organism and chromosome number the sequence belongs to, together with the exact base pair position of the sequence.

Type or paste the query sequence in the search bar, choose the organism and hit SEARCH.
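The idea of reporting the exact position of a query can be illustrated with a naive scan. This sketch is not how GGGenome works internally (a fast search service would rely on a prebuilt index rather than a linear scan). The function names and the toy reference sequence are hypothetical, and searching the reverse-complement strand as well is an assumption made for the example.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact(query: str, reference: str) -> List[Tuple[str, int]]:
    """Return (strand, 1-based start on the forward strand) for each exact match."""
    if not query:
        raise ValueError("query must not be empty")
    reference = reference.upper()
    hits = []
    for strand, pattern in (("+", query.upper()), ("-", reverse_complement(query))):
        start = reference.find(pattern)
        while start != -1:
            hits.append((strand, start + 1))
            start = reference.find(pattern, start + 1)
    return hits


# Toy reference sequence, for illustration only.
print(find_exact("ACG", "AACGTTTGCA"))
```

A query that equals its own reverse complement is reported once per strand at the same position.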

7- GENDOO Functional profiling of gene and disease features for omics analysis. Gendoo provides keywords, including diseases, drugs and biological phenomena, related to genes and diseases of interest.

Type in the disease/gene, or their IDs, to get all the relevant genes/diseases associated with them.

DDBJ STATISTICS DDBJ statistics give information about the releases and their records on DDBJ. They also provide useful facts about DDBJ's total data volume, its contribution to INSDC, the proportion of each division and much more.

Some important statistics provided by DDBJ are:

Latest Release Information

Data Category Distribution At Each Archive

Organism Ranking By Bases

Journal Ranking By Counts In Flat File

DATA SUBMISSION If you wish to make your sequence public through DDBJ, and the sequence is acceptable to DDBJ, you can submit it even if you have no plan to publish a research paper related to the sequence. Once released, nucleotide sequences submitted to INSDC, including DDBJ, are available to everyone.

(A) Nucleotide Sequence Submission System DDBJ generally recommends using the Nucleotide Sequence Submission System.

(B) Mass Submission System (MSS) DDBJ recommends the use of MSS if:
1. The submission consists of a large number of sequences (entries), greater than 1024;
2. The submission involves long (greater than 500 kb) nucleotide sequences which result in a complex submission containing many features (greater than 30 in an entry), as in the case of genome data; or
3. The submission cannot be handled by the Nucleotide Sequence Submission System.
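The numeric criteria above can be written as a small decision helper. This is a sketch only: the function name `recommend_mss` is hypothetical, the second criterion is read here as requiring both a long sequence and many features (an interpretation of the slide's wording), and the third criterion (submissions the web system cannot handle) is not modelled.

```python
def recommend_mss(num_entries: int, longest_sequence_bp: int, max_features_per_entry: int) -> bool:
    """Return True if the numeric criteria on the slide point to MSS (hypothetical helper)."""
    many_entries = num_entries > 1024
    # Assumption: "long" and "many features" must both hold for the second criterion.
    long_and_complex = longest_sequence_bp > 500_000 and max_features_per_entry > 30
    return many_entries or long_and_complex


print(recommend_mss(num_entries=12, longest_sequence_bp=1_500, max_features_per_entry=3))
print(recommend_mss(num_entries=5_000, longest_sequence_bp=1_500, max_features_per_entry=3))
print(recommend_mss(num_entries=1, longest_sequence_bp=4_600_000, max_features_per_entry=4_000))
```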

Sequence Data Transition

Assignment and Notification of Accession Number We notify the Contact Person whose e-mail address is entered in the "Contact person E-mail address" field of the accession number (a unique number assigned by the International Nucleotide Sequence Database Collaboration). This notification is normally sent within five business days after receipt of the data.

Submitter In principle, the submitter of an entry is the person who has responsibility for the submitted data in the entry. Only the submitter can update his/her entry. The submitter is also responsible for replying to inquiries from DDBJ or DDBJ users about his/her data.

Contact Person The "Contact person" is the person responsible for the descriptions in the entry and acts as the representative who corresponds with DDBJ and its users. In principle, the "Contact person" must be one of the submitters. Because the "Contact person" is the one whom DDBJ and its users will contact about the entry, do not block e-mails from DDBJ. If you wish to contact the submitter(s) of an entry of interest, please contact DDBJ through the inquiry form, briefly stating your reasons, and we will forward the message to the submitter(s).

Right of Entry Update Only the submitters of an entry can update and modify it. After modifying the data, the submitter can also specify either immediate release or hold until publication. However, in principle, once an entry has been opened to the public, it cannot be returned to hold status.

GROWTH IN DDBJ DATA When DDBJ first released its nucleotide sequence database in July 1987, it consisted of only 66 entries and 108,970 base pairs. In recent years, the INSDC databases have been increasing at an annual rate of 130-150%. Between June 2014 and May 2015, the DDBJ periodical release increased by 11,879,389 entries and 31,427,753,923 base pairs.
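The figures above allow a quick back-of-the-envelope comparison of average entry length. The sketch below uses only the numbers quoted on this slide; the variable names are arbitrary.

```python
# Figures quoted on the slide.
first_entries, first_bases = 66, 108_970  # first release, July 1987
added_entries, added_bases = 11_879_389, 31_427_753_923  # June 2014 to May 2015

# Average number of base pairs per entry in each data set.
mean_len_1987 = first_bases / first_entries
mean_len_added = added_bases / added_entries

print(f"Mean entry length in the first release: {mean_len_1987:.0f} bp")
print(f"Mean entry length of entries added 2014-2015: {mean_len_added:.0f} bp")
print(f"Entries added in one year vs. first release: {added_entries / first_entries:.0f}x")
```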

NIG SUPERCOMPUTER The NIG supercomputer serves as a sequence analysis platform. The DDBJ Center operates the NIG supercomputer, which specializes in the analysis of large-scale sequence data. The NIG supercomputer offers computational infrastructure for the construction of DDBJ databases and analysis services, and provides researchers with a large-scale data analysis and supercomputing environment.

The NIG supercomputer is currently composed of two computer systems: (i) the Phase 1 system, which was introduced in 2012, and (ii) the Phase 2 system, which went into production in 2014.