3- introduction(SEQU ANAL of PCR products 9 9 12 (2).ppt

MohamedHasan816582 9 views 42 slides Mar 02, 2025

SEQUENCE ANALYSIS OF
PCR PRODUCTS
By
Amal Mahmoud

PCR product
18

DNA Sequencing
Using heat, separate the DNA into
single strands. The primer binds to the
intended location and polymerase
starts extending the primer.

DNA Sequencing

DNA Sequencing
To find out fragment sizes,
use gel electrophoresis:
-positions and spacing show
relative sizes
-fragments are terminated by a
specific known nucleotide

DNA Sequencing
In reality the gels look like this.
Using gels, researchers then read the
sequence from bottom to top.
An automated DNA sequencer does
this for large-scale readings. (3-4
meters long!)
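
Reading a gel from bottom to top can be sketched in a few lines of Python. This is only an illustration, assuming each lane holds the lengths of the fragments terminated by one known nucleotide. The function name `read_gel` and the lane data are hypothetical, not taken from the slides.

```python
def read_gel(lanes):
    """Reconstruct a sequence from chain-termination fragment lengths.

    lanes maps each terminating nucleotide (A, C, G, T) to the lengths
    of the fragments seen in that lane. Shorter fragments migrate
    further, so reading bottom to top means reading the lengths in
    ascending order.
    """
    bands = []
    for base, lengths in lanes.items():
        for length in lengths:
            bands.append((length, base))
    bands.sort()  # shortest fragment (bottom of the gel) first
    return "".join(base for _, base in bands)


# Hypothetical lanes for a 6-nucleotide read
lanes = {"A": [1, 4], "C": [3], "G": [2, 6], "T": [5]}
sequence = read_gel(lanes)
print(sequence)
```

Each band contributes one letter, so the read length equals the number of distinct fragment sizes that can be resolved on the gel.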

DNA Sequencing
Example output – Fragment of one file (usually spans
600-700 nucleotides)
Sequencer plots the fragments

Bioinformatics
The design, construction and use of software tools to
generate, store, annotate, access and analyse data and
information relating to Molecular Biology
OR
Biologists doing “stuff” with computers?

Nucleotide Sequence Databases
•NCBI (National Center for Biotechnology
Information)
•EMBL (European Molecular Biology Laboratory)
•DDBJ (DNA DataBank of Japan)

Protein Sequence Database
•SWISS-PROT
•TrEMBL

Sequence submission
•Data mainly direct submissions from the
authors.
•Submissions through the Internet:
–Web forms.
–Email.
•Sequences shared/exchanged between the 3
centers on a daily basis:
–The sequence content of the banks is identical.

GenBank Flat File
Header
•Title
•Taxonomy
•Citation
Features (AA seq)
DNA Sequence
LOCUS AF115338 591 bp DNA linear BCT 19-AUG-1999
DEFINITION Pseudomonas fluorescens ECF sigma factor SigX (sigX) gene, complete
cds.
ACCESSION AF115338
VERSION AF115338.1 GI:4959391
KEYWORDS .
SOURCE Pseudomonas fluorescens.
ORGANISM Pseudomonas fluorescens
Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae;
Pseudomonas.
REFERENCE 1 (bases 1 to 591)
AUTHORS Brinkman,F.S., Schoofs,G., Hancock,R.E. and De Mot,R.
TITLE Influence of a putative ECF sigma factor on expression of the major
outer membrane protein, OprF, in Pseudomonas aeruginosa and
Pseudomonas fluorescens
JOURNAL J. Bacteriol. 181 (16), 4746-4754 (1999)
MEDLINE 99369842
PUBMED 10438740
REFERENCE 2 (bases 1 to 591)
AUTHORS De Mot,R.
TITLE Direct Submission
JOURNAL Submitted (04-DEC-1998) F.A. Janssens Laboratory of Genetics,
Applied Plant Sciences, K. Mercierlaan 92, Heverlee B-3001, Belgium
FEATURES Location/Qualifiers
source 1..591
/organism="Pseudomonas fluorescens"
/strain="M114"
/db_xref="taxon:294"
gene 1..591
/gene="sigX"
CDS 1..591
/gene="sigX"
/codon_start=1
/transl_table=11
/product="ECF sigma factor SigX"
/protein_id="AAD34329.1"
/db_xref="GI:4959392"
/translation="MNKAQTLSTRYDPRELSDEELVARSHTELFHVTRAYEELMRRYQ
RTLFNVCARYLGNDRDADDVCQEVMLKVLYGLKNLEGKSKFKTWLYSITYNECITQYR
KERRKRRLMDALSLDPLEEASEEKALQPEEKGGLDRWLVYVNPIDRGILVLRFVAELE
FQEIADIMHMGLSATKMRYKRALDKLREKFAGETET"
BASE COUNT 157 a 133 c 170 g 131 t
ORIGIN
1 atgaataaag cccaaacgct atccacgcgc tacgaccccc gcgagctctc tgatgaggag
61 ttggtcgcgc gctcgcatac cgagcttttt cacgtaacgc gcgcctatga agaactgatg
121 cggcgttacc agcgaacatt atttaacgtt tgtgcgagat atcttgggaa cgatcgcgac
181 gcagacgatg tctgtcagga agtcatgttg aaggtgctgt atggcctgaa gaacctcgag
241 gggaaatcga agttcaaaac gtggctctac agcatcacgt acaacgaatg tattacgcag
301 tatcggaagg aacggcgaaa gcgtcgcttg atggacgcat tgagtcttga ccccctcgag
361 gaagcgtccg aagaaaaggc gcttcaaccc gaggagaagg gcgggcttga tcgctggctg
421 gtgtatgtga acccgattga ccgtggaatt ctggtgcttc gatttgtcgc agagctggaa
481 tttcaggaga tcgcagacat catgcacatg ggtttgagtg cgacaaaaat gcgttacaaa
541 cgtgctctag ataaattgcg tgagaaattt gcaggcgaga ctgaaactta g
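
The ORIGIN block stores the sequence in numbered lines of space-separated 10-base groups. A minimal Python sketch, assuming that layout, shows how such lines could be joined back into one sequence and how a BASE COUNT-style tally could be recomputed. The helper names `origin_to_sequence` and `base_count` are hypothetical.

```python
def origin_to_sequence(origin_lines):
    """Join GenBank ORIGIN lines into one uppercase sequence string."""
    chunks = []
    for line in origin_lines:
        parts = line.split()
        # The first field is the position of the first base on the line.
        if parts and parts[0].isdigit():
            parts = parts[1:]
        chunks.extend(parts)
    return "".join(chunks).upper()


def base_count(sequence):
    """Count each nucleotide, in the spirit of the BASE COUNT line."""
    return {base: sequence.count(base) for base in "ACGT"}


# Last ORIGIN line of the record above: it starts at base 541
last_line = ["541 cgtgctctag ataaattgcg tgagaaattt gcaggcgaga ctgaaactta g"]
tail = origin_to_sequence(last_line)
print(len(tail), tail[-3:])
```

The last line holds 51 bases starting at position 541, which is consistent with the 591 bp length given in the LOCUS line.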

•Major research areas
•1- Sequence analysis
•2- Computational evolutionary biology
•3- Gene prediction
•4- Epitope prediction
•5- Prediction of protein structure

Multiple sequence alignment

Nucleotide sequence
alignments
Amino acid sequence
alignments

Sequence identity of the partial CP region of
PVY Egyptian isolates
Identity % of nucleotide
sequences

Computational evolutionary
biology
Phylogenetic analysis

Phylogenetic tree
EF016294.1 (NTN-UK)
AJ390289.1(NTN-UK)
AJ585342.1 (NTN-Slovenia)
AJ390290.1(NTN-UK)
AJ890347.1 (NTN-Germany)
DQ925437.1 (isolate VN/P2-Vietnam)
AJ890344.1 (NTN-Poland)
AJ390293.1 (NTN-Slovenia)
FJ204165.1 (NTN-USA)
FJ204164.1 (NTN-USA)
AY884982.1 (NTN-USA)
FJ204166.1 (NTN-USA)
AJ535662.1 (NTN-Hungary)
AJ390288.1 (NTN-UK)
AJ890345.1 (NTN-Germany)
AJ390300.1 (NTN-Hungary)
PVY-Egypt Medhat
EF026075.1 (NTN-USA)
AJ609240.1 (N-8-Greece)
AF264151.1 (Kr-Korea)
AJ890343.1(NTN-Poland)
AJ890342.1 (N-Poland)
AJ889866.1 (NTN-Poland)
EF558545.1 (N(W)-Poland)
D12539.1 (O)
AJ585196.1 (O-UK)
AJ889868.1 (Wilga-Germany)
AJ889867.1 (Wilga-Germany)
EF026074.1 (O-USA)
AJ890350.1(Wilga-Germany)
AM113988.1(Wilga-Germany)
AY745492.1 (N:O-Canada)
AY745491.1 (N:O-Canada)
EF026076.1 (N:O-USA)
DQ157178.1 (N:O)
AY884985.1 (N:O-USA)
DQ157179.1 (N:O)
[Figure: phylogenetic tree of the isolates listed above, with bootstrap values at the nodes and a scale bar of 0.002. Labelled clusters: Group I, Group II, Necrotic Group, Ordinary Group. The Egyptian PVY sample is marked.]

RNA Secondary structure
•RNA secondary structures of the 3′ untranslated regions (UTRs) of Potato virus Y (PVY) isolates NTN (A) and Egyptian isolate (B), predicted with the mfold version 3.2 program (M. Zuker, 2003) with the temperature parameter set to default.
[Figure: predicted structures, panel A (NTN) and panel B (PVY Egypt), with the interior loop and multi-loop labelled]

T→C substitution in the Egyptian isolate
Multi-loop
Change may lead to

Epitope prediction

Partial Kolaskar and Tongaonkar antigenicity sites for
the G protein

•Applying four different models to the consensus
sequences of PVY yielded 3 conserved
epitope regions:
61 ERHTTEDVSPSMHTL 75
62 RHTTEDVSPSMHTLL 76
63 HTTEDVSPSMHTLLG 77

Epitope prediction
Alignment of the partial BEFV glycoprotein
amino acid sequences

Exercise

FASTA Format
•simple format used by almost all programs
•>header line with a [return] at end
•Sequence (no specific requirements for line
length, characters, etc)
>URO1 uro1.seq Length: 2018 November 9, 2000 11:50 Type: N Check: 3854 ..
CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA
ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT
GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC
CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG
TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA
GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT
CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA
TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG
GAGTCACCAAAACCTGGGACAGGCTCATGCTCCAGGACAATTGCTGTGGCGTAAATGGTC
CATCAGACTGGCAAAAATACACATCTGCCTTCCGGACTGAGAATAATGATGCTGACTATC
CCTGGCCTCGTCAATGCTGTGTTATGAACAATCTTAAAGAACCTCTCAACCTGGAGGCTT
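
Because the format is just a `>` header line followed by sequence lines, a FASTA reader fits in a few lines of Python. The sketch below is a minimal illustration, not a replacement for an established parser. The function name `parse_fasta` is hypothetical and the example record is shortened.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may have any length
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>URO1 uro1.seq
CGCAGAAAGAGGAGGCGCTTGCC
TTCAGCTTGTGGGAAATCC
"""
records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

Sequence lines are simply concatenated, which reflects the point above that the format sets no requirement on line length.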

BLAST Searches GenBank
[BLAST= Basic Local Alignment Search Tool]
The NCBI BLAST web server lets you compare
your query sequence to various sections of
GenBank:
–nr = non-redundant (main sections)
–month = new sequences from the past few weeks
–ESTs
–human, Drosophila, yeast, or E. coli genomes
–proteins (by automatic translation)
•This is a VERY fast and powerful computer.

BLAST
•Uses word matching like FASTA
•Similarity matching of words (3 aa’s, 11 bases)
–does not require identical words.
•If no words are similar, then no alignment
–won’t find matches for very short sequences
•Does not handle gaps well
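
The word-matching step can be illustrated with a short Python sketch. It is a deliberate simplification: it looks only for identical words, whereas BLAST also accepts similar words, and it stops before any extension or scoring. The function names are hypothetical, and the small word size in the example is chosen only to keep the sequences short.

```python
def word_positions(sequence, word_size):
    """Index every word (substring of length word_size) by its start positions."""
    index = {}
    for start in range(len(sequence) - word_size + 1):
        word = sequence[start:start + word_size]
        index.setdefault(word, []).append(start)
    return index


def shared_words(query, subject, word_size=11):
    """Return (query_pos, subject_pos) pairs where an identical word occurs."""
    subject_index = word_positions(subject, word_size)
    hits = []
    for q_start in range(len(query) - word_size + 1):
        word = query[q_start:q_start + word_size]
        for s_start in subject_index.get(word, []):
            hits.append((q_start, s_start))
    return hits


hits = shared_words("GATTACA", "TTGATTAC", word_size=4)
print(hits)

# A query shorter than the word size contains no words, so it seeds nothing.
print(shared_words("ACG", "ACGACG", word_size=11))
```

The second call shows why very short sequences produce no matches: without at least one full word there is nothing to seed an alignment.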

Why use BLAST?
•To discover functional, structural and evolutionary
similarities
•Because “similarity” may be an indicator of
“homology” and thus provide some insight into
function or gene identification.
•Applications include
– identifying orthologs and paralogs
– discovering new genes or proteins
– exploring protein structure and function

Lecture 3.1 31

Running NCBI BLAST

Searching on the web: BLAST at
NCBI
Very fast computer dedicated to
running BLAST searches
Many databases that are always up
to date
Nice simple web interface
But you still need knowledge about
BLAST to use it properly
http://blast.ncbi.nlm.nih.gov/Blast.cgi

BLAST Output: Alignments
>gi|730028|sp|P40692|MLH1_HUMAN DNA mismatch repair protein Mlh1 1)
Length = 756
Score = 233 bits (593), Expect = 8e-62
Identities = 117/131 (89%), Positives = 117/131 (89%)
Query: 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL 60
IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL
Sbjct: 276 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL 335
Query: 61 GSNSSRMYFTQTLLPGLAGPSGEMVKXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDA 120
GSNSSRMYFTQTLLPGLAGPSGEMVK DKVYAHQMVRTDSREQKLDA
Sbjct: 336 GSNSSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDA 395
Query: 121 FLQPLSKPLSS 131
FLQPLSKPLSS
Sbjct: 396 FLQPLSKPLSS 406
low complexity sequence filtered

A pairwise alignment consists of a series of paired bases,
one base from each sequence. There are three types of
pairs:
(1) matches = the same nucleotide appears in both
sequences.
(2) mismatches = different nucleotides are found in the
two sequences.
(3) gaps = a base in one sequence and a null base in the
other.
GCGGCCCATCAGGTACTTGGTG -G
GCGT TCCATC - - CTGGTTGGTGTG
Match Gap Mismatch
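
The three pair types can be counted column by column. The Python sketch below assumes two already-aligned strings of equal length that use `-` for the null base. The function names and the example alignment are hypothetical.

```python
def classify_columns(seq_a, seq_b, gap="-"):
    """Count matches, mismatches and gaps in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def percent_identity(seq_a, seq_b):
    """Identical columns as a percentage of all alignment columns."""
    counts = classify_columns(seq_a, seq_b)
    return 100.0 * counts["match"] / len(seq_a)


counts = classify_columns("GCGGCC-ATC", "GCGTCCCATC")
print(counts, percent_identity("GCGGCC-ATC", "GCGTCCCATC"))
```

Here identity is taken over all alignment columns, gaps included. Other conventions divide by the shorter sequence or by the ungapped columns, so the denominator should be stated whenever a percentage is reported.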

DNA vs. Protein searches
•DNA is composed of 4 characters: A, G, C, T. On
average, at least 25% of the residues of any 2
unrelated aligned sequences are expected to be
identical.
•Protein sequence is composed of 20 characters (aa),
so the sensitivity of the comparison is improved.
Convergence of proteins is accepted to be rare,
meaning that high similarity between 2 proteins
almost always means homology.
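
The 25% figure follows from the alphabet size. As a rough sketch, assuming residues are drawn independently and with equal frequencies (real sequences are not uniform), the chance that two aligned residues are identical is the sum of the squared residue frequencies. The function name is hypothetical.

```python
def expected_chance_identity(frequencies):
    """Probability that two independently drawn residues are identical."""
    return sum(p * p for p in frequencies.values())


# Uniform composition is an assumption made only for illustration.
dna = {base: 1 / 4 for base in "ACGT"}
protein = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

print(round(expected_chance_identity(dna), 3))      # 4 letters
print(round(expected_chance_identity(protein), 3))  # 20 letters
```

With 4 letters the background identity is 1/4, while with 20 letters it drops to 1/20, which is why a given level of identity is far more informative between proteins than between DNA sequences.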

DNA vs. Protein searches
•What should we use to search for similarity, the
nucleotide or the protein sequences?
•If we have a nucleotide sequence, should we search
the DNA databases only? Or should we translate it to
protein and search protein databases?
Note that by translating into an aa
sequence, we’ll presumably lose information, since
the genetic code is degenerate, meaning that two or
more codons can be translated to the same amino
acid.

Nucleotide, amino-acid sequences
-GGAGCCATATTAGATAGA-
-GGAGCAATTTTTGATAGA-
Gly Ala Ile Leu Asp Arg
Gly Ala Ile Phe Asp Arg
• 3 different DNA positions but
only one different amino acid
position:
2 of the nucleotide substitutions
are therefore synonymous and
one is non-synonymous.
DNA yields more phylogenetic information than proteins. The
nucleotide sequences of a pair of homologous genes have a
higher information content than the amino acid sequences of
the corresponding proteins, because mutations that result in
synonymous changes alter the DNA sequence but do not affect the
amino acid sequence. (Amino-acid sequences are, however, more
efficiently aligned.)
DNA vs. Protein searches
•What about very different DNA sequences that
code for similar protein sequences? We certainly
do not want to miss those.
•Conclusion: We should use proteins for database
similarity searches when possible.

DNA vs. Protein searches
•The reasons for this conclusion are:
–When comparing DNA sequences, we get significantly
more random matches than we get with proteins.
–The DNA databases are much larger, and grow faster
than Protein databases. Bigger database means more
random hits!
– For DNA we usually use identity matrices, for protein
more sensitive matrices like PAM and BLOSUM, which
allow for better search results.
– Conservation in evolution: protein sequences change
more slowly than the DNA that encodes them.

An Overview of BLAST
Input query: amino acid sequence
•blastp: compares against a protein sequence database
•tblastn: compares against a translated nucleotide sequence database
Input query: DNA sequence
•blastn: compares against a nucleotide sequence database
•blastx: compares against a protein sequence database
•tblastx: compares against a translated nucleotide sequence database
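
The choice of program depends only on the query type and the database type, so it can be written as a lookup. The Python sketch below is an illustration of that decision, not part of any BLAST distribution. The function name and the `translate_both` flag are hypothetical.

```python
BLAST_PROGRAMS = {
    # (query type, database type): program
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",   # database is translated
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",    # query is translated
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program name for a query/database combination."""
    if translate_both:
        # tblastx translates both a nucleotide query and a nucleotide database.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError("types must be 'protein' or 'nucleotide'") from None


print(choose_blast_program("nucleotide", "protein"))
print(choose_blast_program("nucleotide", "nucleotide", translate_both=True))
```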

THANK
YOU