BIF-20306
Introduction to Bioinformatics
Week 1 – Recap and review
Rens Holmer
Self-test
§66 / 140 people took the self-test
§Class average 77%
§Note that the self-test is not at exam level; it is checking
whether you know the basic concepts (like in part A of
the exam)
§A test exam will be provided later in the course
What does the term "genetic code" refer
to?
§The linear sequence of nucleotides in a gene
§The substitution of uracil (U) for thymine (T) in an RNA
molecule
§The set of genes encoded in an organism's DNA
§The correspondence between the 4-letter nucleotide
alphabet of DNA and the 20-letter amino acid alphabet
of proteins
The standard
genetic code
http://de.genetica.wikia.com/wiki/Genetischer_Code
Provided in the exam
Which transcript is produced from a piece
of DNA with the following coding strand:
CATTGCCAGT?
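The relationship between the coding strand and the transcript can be checked with a minimal Python sketch. It assumes the coding strand is written 5'->3'; the function names `transcribe` and `template_strand` are made up for this illustration.

```python
# Minimal sketch: the transcript has the same sequence as the coding strand,
# with U in place of T. The template strand is its reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_strand: str) -> str:
    """Return the mRNA (5'->3') for a coding strand written 5'->3'."""
    return coding_strand.upper().replace("T", "U")


def template_strand(coding_strand: str) -> str:
    """Return the template strand (5'->3'), i.e. the reverse complement."""
    return coding_strand.upper().translate(COMPLEMENT)[::-1]


coding = "CATTGCCAGT"
print(transcribe(coding))       # CAUUGCCAGU
print(template_strand(coding))  # ACTGGCAATG
```

RNA polymerase reads the template strand, which is why the transcript matches the coding strand apart from U replacing T.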
What provides the information necessary to
specify the three-dimensional shape of a
protein?
§The protein’s peptide bonds
§The protein’s interaction with other polypeptides
§The protein’s amino acid sequence
§The protein’s interaction with molecular
chaperones
Amino acid properties
We stick to this
classification, even
though in reality it is not
black and white.
Amino acid properties
Many amino acids have
multiple properties.
E.g. tyrosine has both
hydrophobic and
hydrophilic features
Some amino acids are
§Glycine: very small
§Cysteine: forms disulfide bridges (covalent bonds!)
§Proline: the side chain connects back to the backbone (phi is fixed!)
Which hydrogen bonds have been found to
stabilize a polypeptide’s folded shape?
§Hydrogen bonds between side chain atoms (A)
§Hydrogen bonds between backbone atoms (B)
§Hydrogen bonds between backbone atoms and side
chain atoms (C)
§All of the above
§A and B, but not C
Protein secondary structure
Stabilized by hydrogen bonds between backbone atoms
§Alpha helix
§Beta sheet (anti-parallel or parallel)
To which of the following databases can
you as a user not directly submit novel
information?
§GenBank
§Pfam
§ENA
§UniProt
Self-test incorrectly only
allowed one answer!
ORFfinder
Nice for prokaryotes, less useful
for eukaryotes
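A rough Python sketch of what an ORF scan does may help to see why this works better for prokaryotes. This is not the algorithm NCBI ORFfinder actually uses. It assumes the forward strand only, ATG as the only start codon, and the standard stop codons; the function `find_orfs` and the test sequence are made up for this illustration.

```python
# Minimal sketch of an ORF scan: in each of the three forward reading frames,
# report every stretch that runs from an ATG to the first in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1):
    """Return a list of (start, end, orf) with 1-based inclusive coordinates."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # 0-based index of the first ATG seen in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS and start is not None:
                # Number of codons before the stop codon
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, seq[start:i + 3]))
                start = None
    return orfs


print(find_orfs("CCATGAAATGACC"))  # [(3, 11, 'ATGAAATGA')]
```

A scan like this only finds uninterrupted ATG-to-stop stretches in the genomic sequence. In eukaryotes, introns interrupt the coding sequence, so the ORFs found in genomic DNA often do not correspond to the real protein.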
GFF file
A line in a GFF file describes the following feature:
Chr1 TAIR10 CDS 10 24 . + 0 Parent=AT1G65484.1,AT1G65484.1-Protein;
What is the sequence of this CDS given this Chr1 sequence?
AGAAGAATAATGGGTTTGAAAATGTCAAGCAATGCACTTC
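The question can be checked with a short Python sketch. The key point is that GFF coordinates are 1-based and inclusive, while Python slices are 0-based and end-exclusive. The helper names `parse_gff_line` and `extract` are made up for this illustration, and the line is split on whitespace because the slide shows spaces where a real GFF file has tabs.

```python
# Minimal sketch: pull the sequence of one GFF feature out of a chromosome.
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Feature(NamedTuple):
    seqid: str
    ftype: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def parse_gff_line(line: str) -> Feature:
    """Parse the nine GFF columns; only the ones needed here are kept."""
    seqid, _source, ftype, start, end, _score, strand, _phase, _attrs = line.split(None, 8)
    return Feature(seqid, ftype, int(start), int(end), strand)


def extract(seq: str, feature: Feature) -> str:
    """Return the feature sequence, reverse-complemented for the minus strand."""
    sub = seq[feature.start - 1:feature.end]  # convert to a 0-based slice
    if feature.strand == "-":
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


chr1 = "AGAAGAATAATGGGTTTGAAAATGTCAAGCAATGCACTTC"
line = "Chr1 TAIR10 CDS 10 24 . + 0 Parent=AT1G65484.1,AT1G65484.1-Protein;"
print(extract(chr1, parse_gff_line(line)))  # ATGGGTTTGAAAATG
```

The CDS covers positions 10 to 24, so it is 24 - 10 + 1 = 15 nucleotides long (five codons).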
Project Preparation Exercise
§Sequence analyses, evolution and structure of
Auxin Response Factors (ARFs), key regulators of plant development
Auxin Response Factor (ARF)
§Auxin-dependent
transcription factor:
binds DNA,
regulates expression
of certain genes
based on auxin level
Auxin-dependent regulation
§Auxin promotes degradation of Aux/IAA proteins that
prevent ARFs from regulating target genes
●In other words:
auxin up ->
Aux/IAA down ->
ARF works ->
targets up
From sequence to understanding
>Unknown_sequence
MNSSGVEQGVVIAESEPPRGNRSRAFACAILAL
SDVQLEILMGILNIYSLVGSGAAGRTSDWLGRR
VGRFVAGIGVGYAMMIAPVYTAEVAPASSRGFL
HLGWRFMLGVGAVPSVFLAIGVLAMPESPRWLV
PLEEMETLFGSYTANKKNNSMSKDNQ
Given a sequence:
§What is already known?
§How did it get here?
§What does the product look like?
§What is its role in the cell?
Genes, proteins and databases
Sequence similarity
Evolution
Structure and function
-Omics
UniProt (1)
UniProt (2)
UniProt (3)
InterPro
UniProt (4)
UniProt (5)
TAIR JBrowse
Exam(ple)
§Given the following DNA sequence and annotation, predict
the effect of the following mutations:
§Position 3 A -> G
§Position 16 G -> A
>CHR1
CGATGGTACGTCCAGGGAGCTACTAACG
1   5    1    1    2    2  2
         0    5    0    5  8
##gff-version 3
CHR1 . mRNA 3 26 . + . ID=mRNA1
CHR1 . CDS 3 9 . + . ID=CDS1
CHR1 . CDS 16 26 . + . ID=CDS2
Exam(ple): gene structure
>CHR1
CGATGGTACGTCCAGGGAGCTACTAACG
1   5    1    1    2    2  2
         0    5    0    5  8
##gff-version 3
CHR1 . mRNA 3 26 . + . ID=mRNA1
CHR1 . CDS 3 9 . + . ID=CDS1
CHR1 . CDS 16 26 . + . ID=CDS2
mRNA: ATG GTA CGG AGC TAC TAA
Amino acid: M V R S Y *
Met Val Arg Ser Tyr *
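The gene structure above can be reproduced with a minimal Python sketch: join the CDS segments (1-based, inclusive, plus strand) and translate the result with the standard genetic code. The function names `spliced_cds` and `translate` are made up for this illustration, and the compact codon table is built from the standard code in TCAG order.

```python
# Minimal sketch: splice the CDS segments together and translate them.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons in TCAG order mapped to one-letter amino acids
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def spliced_cds(genome: str, cds_ranges) -> str:
    """Join CDS segments given as (start, end), 1-based inclusive, plus strand."""
    return "".join(genome[start - 1:end] for start, end in sorted(cds_ranges))


def translate(cds: str) -> str:
    """Translate complete codons; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


chr1 = "CGATGGTACGTCCAGGGAGCTACTAACG"
cds = spliced_cds(chr1, [(3, 9), (16, 26)])
print(cds)             # ATGGTACGGAGCTACTAA
print(translate(cds))  # MVRSY*
```

Note that the third codon (CGG) spans the intron: its first base is position 9 and its last two bases are positions 16 and 17.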
Mutation 1: position 3 A -> G
>CHR1
CGATGGTACGTCCAGGGAGCTACTAACG
1   5    1    1    2    2  2
         0    5    0    5  8
##gff-version 3
CHR1 . mRNA 3 26 . + . ID=mRNA1
CHR1 . CDS 3 9 . + . ID=CDS1
CHR1 . CDS 16 26 . + . ID=CDS2
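mRNA: GTG GTA CGG AGC TAC TAA
Amino acid: V V R S Y *
Val Val Arg Ser Tyr *
Effect: the start codon ATG becomes GTG (Met -> Val), so the annotated start codon is lost and translation is unlikely to start here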
Mutation 2: position 16 G -> A
>CHR1
CGATGGTACGTCCAGGGAGCTACTAACG
1   5    1    1    2    2  2
         0    5    0    5  8
##gff-version 3
CHR1 . mRNA 3 26 . + . ID=mRNA1
CHR1 . CDS 3 9 . + . ID=CDS1
CHR1 . CDS 16 26 . + . ID=CDS2
mRNA: ATG GTA CAG AGC TAC TAA
Amino acid: M V Q S Y *
Met Val Gln Ser Tyr *
Effect: codon 3 changes from CGG to CAG, a missense mutation (Arg -> Gln)
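Both mutations can be checked with a minimal Python sketch that applies a point mutation to the chromosome, splices the CDS segments, and translates the result. The helper names (`mutate`, `protein`) are made up for this illustration. The translation is literal: it does not model whether a ribosome would actually start at a mutated start codon, so the start codon is checked separately.

```python
# Minimal sketch: predict the effect of point mutations on the example gene.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

CHR1 = "CGATGGTACGTCCAGGGAGCTACTAACG"
CDS_RANGES = [(3, 9), (16, 26)]  # 1-based, inclusive, plus strand


def mutate(genome: str, position: int, ref: str, alt: str) -> str:
    """Replace the base at a 1-based position, checking the reference base."""
    if genome[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return genome[:position - 1] + alt + genome[position:]


def spliced_cds(genome: str) -> str:
    return "".join(genome[start - 1:end] for start, end in CDS_RANGES)


def protein(genome: str) -> str:
    cds = spliced_cds(genome)
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


for position, ref, alt in [(3, "A", "G"), (16, "G", "A")]:
    mutant = mutate(CHR1, position, ref, alt)
    has_start = spliced_cds(mutant).startswith("ATG")
    print(position, protein(CHR1), "->", protein(mutant), "start codon intact:", has_start)
# 3 MVRSY* -> VVRSY* start codon intact: False
# 16 MVRSY* -> MVQSY* start codon intact: True
```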
Week 2 – Sneak Preview
Alignments
Sequence search
Primer design
Genes, proteins and databases
Sequence similarity
Evolution
Structure and function
-Omics
We are drowning in data…
Finding the fish in the sea…
Sequence alignments are a crucial step in
sequence comparisons
Alignment is the task of locating equivalent regions of two or
more sequences to maximize their similarity
MSKMLAGSN--VERMILV
||:. |||.  :||||:|
MSRV-AGSDLVIERMIMV
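As a small illustration of how an alignment like the one above is summarised, the following Python sketch counts identical columns, mismatches and gap columns for two already-aligned sequences. It does not compute the alignment itself. The function name `alignment_stats` is made up for this illustration.

```python
# Minimal sketch: summarise a pairwise alignment that is already given.
def alignment_stats(a: str, b: str):
    """Return (identities, mismatches, gap_columns) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(x == y and x != "-" for x, y in zip(a, b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(a, b))
    mismatches = len(a) - identities - gaps
    return identities, mismatches, gaps


top = "MSKMLAGSN--VERMILV"
bottom = "MSRV-AGSDLVIERMIMV"
identities, mismatches, gaps = alignment_stats(top, bottom)
print(identities, mismatches, gaps)          # 10 5 3
print(round(100 * identities / len(top), 1))  # 55.6 (percent identity over 18 columns)
```

The 10 identities correspond to the `|` characters in the match line. The `:` and `.` characters mark mismatches between similar amino acids, which this simple count treats as plain mismatches.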