Human genome project

ruchibioinfo 3,466 views 62 slides Sep 17, 2010


HUMAN GENOME PROJECT
MS.RUCHI YADAV
LECTURER
AMITY INSTITUTE OF
BIOTECHNOLOGY
AMITY UNIVERSITY
LUCKNOW(UP)

HUMAN GENOME PROJECT
GENOME SEQUENCING
GENOME ASSEMBLY
GENOME ANNOTATION

Human Genome Project Background
The idea of sequencing the entire human genome
was first proposed in discussions at scientific
meetings organized by the US Department of
Energy and others from 1984 to 1986.
A 1988 US National Research Council committee
recommended a broader programme, to include:
The creation of genetic, physical and sequence
maps of the human genome;
Parallel efforts in key model organisms such as
bacteria, yeast, worms, flies and mice;
Development of technology in support of these
objectives;
Research into the ethical, legal and social issues
raised by human genome research.

HGP BACKGROUND (continued)
The Human Genome Organization (HUGO) and the
International Human Genome Sequencing Consortium
(IHGSC) provided forums for international
coordination of genomic research.
In the US, the NIH arm of the project was constituted as
what is now the National Human Genome Research
Institute (NHGRI).
The collaboration was coordinated through periodic
international meetings (referred to as ‘Bermuda
meetings’).
Work was shared flexibly among the centres, with
some groups focusing on particular chromosomes and
others contributing in a genome-wide fashion.
A guiding principle was rapid and unrestricted data
release: the centres adopted a policy that all genomic
sequence data should be made publicly available without
restriction within 24 hours of assembly (the Bermuda
Principles).

Human Genome Project
Begun formally in 1990, the U.S. Human Genome
Project was a 13-year effort coordinated by the U.S.
Department of Energy and the National Institutes of
Health. The project originally was planned to last 15
years, but rapid technological advances accelerated the
completion date to 2003.
Project goals were to:
Identify all the approximately 20,000-25,000 genes in
human DNA,
Determine the sequences of the 3 billion chemical base
pairs that make up human DNA,
Store this information in databases,
Improve tools for data analysis,
Transfer related technologies to the private sector, and
Address the ethical, legal, and social issues (ELSI) that
may arise from the project.

Milestones:
June 2000: Completion of a working draft of
the entire human genome
February 2001: Analyses of the working
draft are published
April 2003: HGP sequencing is completed
and Project is declared finished two years
ahead of schedule

Timeline of large-scale genomic analyses.

HUMAN GENOME
The human genome contains 3 billion chemical
nucleotide bases (A, C, T, and G).
The average gene consists of 3000 bases, but sizes
vary greatly, with the largest known human gene
being dystrophin at 2.4 million bases.
The total number of genes was estimated at around
30,000 in the draft analyses (later revised to
20,000-25,000), much lower than previous estimates
of 80,000 to 140,000.
 Almost all (99.9%) nucleotide bases are exactly
the same in all people.
 The functions are unknown for over 50% of
discovered genes.
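The figures on this slide can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the rounded numbers quoted above (3 billion bases, 3000 bases per average gene, about 30,000 genes, 99.9% identity); the function names are illustrative, and the results are rough estimates rather than measured values.

```python
# Rounded figures taken from the slide; all results are rough estimates.
GENOME_BASES = 3_000_000_000   # ~3 billion nucleotide bases
AVG_GENE_BASES = 3_000         # average gene size quoted on the slide
GENE_COUNT = 30_000            # draft-era gene count estimate
SHARED_FRACTION = 0.999        # 99.9% of bases identical between people


def differing_bases(genome=GENOME_BASES, shared=SHARED_FRACTION):
    """Approximate number of bases that differ between two people."""
    return round(genome * (1 - shared))


def genic_fraction(genes=GENE_COUNT, avg=AVG_GENE_BASES, genome=GENOME_BASES):
    """Fraction of the genome spanned by genes, using the slide's averages."""
    return genes * avg / genome


if __name__ == "__main__":
    print(differing_bases())            # about three million bases
    print(f"{genic_fraction():.1%}")    # a few percent of the genome
```

Under these assumptions, 0.1% of 3 billion bases is roughly 3 million differing positions, and genes at the quoted average size would span only a few percent of the genome.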

HUMAN GENOME PROJECT
PUBLIC AND
PRIVATE SECTOR

Two Different Groups Worked to Obtain
the DNA Sequence of the Human Genome
The US HGP is a multinational consortium
established by government research agencies
and funded publicly.
Celera Genomics is a private company whose
former CEO, J. Craig Venter, ran an independent
sequencing project; Francis Collins led the public HGP.
Differences arose regarding who should receive
the credit for this scientific milestone.
On June 26, 2000, the HGP and Celera Genomics
held a joint press conference to announce that
TOGETHER they had completed ~97% of the
human genome.

PUBLISHED
The International Human Genome Sequencing
Consortium published their results in Nature,
409 (6822): 860-921, 2001.
“Initial Sequencing and Analysis of the
Human Genome”
Celera Genomics published their results in
Science, Vol 291(5507): 1304-1351, 2001.
“The Sequence of the Human Genome”

HGP SEQUENCING STRATEGIES
LARGE SCALE SEQUENCING TECHNOLOGY

Genome Glossary

HGP SEQUENCING STRATEGIES
The HGP project had three stages:
Genetic (or linkage) mapping
Physical mapping
DNA sequencing

Three-Stage Approach to Genome Sequencing

Strategic Issues
There are two approaches for sequencing
large repeat-rich genomes.
The first is a whole-genome shotgun sequencing
approach, as has been used for the repeat-poor
genomes of viruses, bacteria and flies,
using linking information and computational
analysis.
The second is the ‘hierarchical shotgun
sequencing’ approach, also referred to as
`map-based', `BAC-based' or `clone-by-clone'.

‘HIERARCHICAL SHOTGUN SEQUENCING’
`MAP-BASED', `BAC-BASED' OR
`CLONE-BY-CLONE'
Technology for large-scale sequencing
US HGP

Hierarchical shotgun sequencing

Clone-by-clone or hierarchical sequencing strategy
Advantages:
Ability to fill gaps and re-sequence the
uncertain regions.
Ability to distribute the clones to other labs.
Ability to check the produced sequence by
restriction enzymes.
Disadvantages:
Expensive and time-consuming for
construction of the physical map.
Experienced personnel are required.
HIERARCHICAL ASSEMBLY OF SEQUENCE
CONTIG SCAFFOLD

Assembly of the draft genome sequence
The key steps in assembling individual sequenced clones into the draft genome
sequence.

Levels of clone and sequence coverage.
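Coverage can be made concrete with the standard Lander-Waterman estimate: with N reads of length L over a target of size G, the mean coverage is c = N·L/G, and the expected fraction of bases left unsequenced is roughly e^(-c). The sketch below is illustrative only; the clone size, read length and read count in the example are assumptions, not figures from the presentation.

```python
import math


def coverage(num_reads, read_length, target_size):
    """Mean depth of coverage c = N * L / G."""
    return num_reads * read_length / target_size


def expected_uncovered_fraction(c):
    """Lander-Waterman estimate of the fraction of bases with no read: e^(-c)."""
    return math.exp(-c)


if __name__ == "__main__":
    # Hypothetical example: a 150 kb clone sequenced with 1200 reads of 500 bp.
    c = coverage(1200, 500, 150_000)
    print(c, expected_uncovered_fraction(c))
```

The estimate assumes reads land uniformly at random, which is why higher coverage is needed in practice to close the remaining gaps.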

WHOLE-GENOME SHOTGUN
Developed by J. Craig Venter

Whole-Genome Shotgun Approach to Genome
Sequencing
The whole-genome shotgun approach was
developed by J. Craig Venter in 1992.
This approach skips genetic and physical
mapping and sequences random DNA
fragments directly.
Powerful computer programs are used to
order fragments into a continuous
sequence.
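To illustrate what "ordering fragments into a continuous sequence" means, here is a minimal greedy overlap-merge sketch. It is a toy under strong assumptions (error-free reads, a single strand, no repeats) and is not the assembler used by any genome project; the function names `overlap` and `greedy_assemble` are invented for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the result stays in several contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
    print(greedy_assemble(reads))
```

Real genomes break the no-repeats assumption, which is one reason repeat-rich genomes are harder to assemble from random fragments alone.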

Whole-Genome Shotgun Sequencing

Shotgun Sequencing Strategy
Advantage:
No physical map construction,
Less risk of recombinant clones,
Cost effective and fast.
Ideal for small genome sequencing
Disadvantage:
Difficult to fill gaps and to
re-track all the sequenced plasmids,
Data less useful for positional cloning.

Whole-Genome Assembly

Hierarchical vs. Shotgun Sequencing

Assembly of a mapped scaffold

Generating the draft genome sequence
Generating a draft sequence of the human
genome involved three steps:
Selecting the BAC clones to be sequenced,
Sequencing them, and
Assembling the individual sequenced clones
into an overall draft genome sequence.

Assembly of the draft genome sequence
This process involved three steps:
Filtering,
Layout and
Merging.
The entire data set was filtered uniformly
to eliminate contamination from nonhuman
sequences and other artefacts that had not
already been removed by the individual
centres.
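The filtering step can be pictured as a screen against known contaminant sequences. The sketch below discards any read that shares a k-mer with a contaminant; this simple k-mer screen is a stand-in chosen for illustration, not the filtering method the project actually used, and the sequences and the `k=8` default are assumptions.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_contaminants(reads, contaminants, k=8):
    """Keep only reads that share no k-mer with any contaminant sequence."""
    bad = set()
    for c in contaminants:
        bad |= kmers(c.upper(), k)
    return [r for r in reads if not (kmers(r.upper(), k) & bad)]


if __name__ == "__main__":
    # Hypothetical contaminant (for example, a cloning-vector fragment).
    vector = ["GGATCCTCTAGAGTCGAC"]
    reads = ["ACGTACGTTTGACCA", "AAAATCTAGAGTCGACAAA"]
    print(filter_contaminants(reads, vector))
```

A smaller `k` screens more aggressively but risks discarding genuine human sequence that matches the contaminant by chance.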

Assembly of the draft genome sequence
The sequenced clones were then associated
with specific clones on the physical map to
produce a `layout'.
The fingerprint clone contigs were then
mapped to chromosomal locations, using
sequence matches to mapped STSs from
four human radiation hybrid maps,
one YAC map and two genetic maps, together
with data from FISH.
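The layout idea, placing contigs on chromosomes through matches to mapped STS markers, can be sketched as a lookup followed by a sort. Everything named here (`place_contigs`, the marker names, the positions) is hypothetical, and the majority-vote rule is an assumption made for the example rather than the project's actual procedure.

```python
from collections import Counter
from statistics import mean


def place_contigs(contig_sts, sts_map):
    """Assign each contig to a chromosome and approximate position.

    contig_sts: contig name -> list of STS marker names found in it
    sts_map:    STS marker name -> (chromosome, map position)
    Returns (sorted list of (chromosome, position, contig), unplaced contigs).
    """
    placed, unplaced = [], []
    for contig, markers in contig_sts.items():
        hits = [sts_map[m] for m in markers if m in sts_map]
        if not hits:
            unplaced.append(contig)
            continue
        # Majority vote on chromosome, then average the agreeing positions.
        chrom = Counter(c for c, _ in hits).most_common(1)[0][0]
        pos = mean(p for c, p in hits if c == chrom)
        placed.append((chrom, pos, contig))
    placed.sort()
    return placed, unplaced


if __name__ == "__main__":
    sts_map = {"sts1": ("chr1", 10.0), "sts2": ("chr1", 12.0),
               "sts3": ("chr1", 2.0), "sts4": ("chr2", 5.0)}
    contig_sts = {"ctgA": ["sts1", "sts2"], "ctgB": ["sts3"],
                  "ctgC": ["sts4"], "ctgD": ["stsX"]}
    print(place_contigs(contig_sts, sts_map))
```

Contigs with no mapped marker stay unplaced, which mirrors why independent evidence such as FISH is useful alongside STS maps.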

The human genome assembly and annotation process
•BUILD CYCLE
•DATA FREEZE
•RELEASE

The human genome assembly and annotation
process: INPUTS

Genome Annotation
Feature Annotation
◦Clone Features
◦STS Features
◦SNP Features
◦Gene, mRNA (transcript)
◦misc_RNA (pseudogenes and non-coding
transcripts)
◦Protein Features
◦Repeat Features

Genome Annotation
Products
◦Sequence Data
◦Resource Support (dbSNP, Entrez Gene, Map
Viewer, UniSTS)
Data Access
◦BLAST
◦Entrez Retrieval (accession number, gene
symbol, or protein name)
◦FTP (genomes FTP site)

Links from Map Viewer objects to other
NCBI resources

UCSC put the human genome
sequence on the web July 7, 2000
UCSC put the human genome sequence
on CD in October 2000, with varying
results

HGP ON WEB
Genome Browsers were developed and are maintained
by the University of California at Santa Cruz (UCSC)
and by the EnsEMBL project of the European Bioinformatics
Institute and the Sanger Centre.
Additional browsers have been created;
URLs are listed at www.nhgri.nih.gov/genome_hub.
These web-based computer tools allow users to view
an annotated display of the draft genome sequence,
with the ability to scroll along the chromosomes and
zoom in or out to different scales.
In addition to using the Genome Browsers, one can
download from these sites the entire draft genome
sequence together with the annotations in a computer-
readable format.

UCSC GENOME BROWSER

Broad genomic landscape
The distribution of GC content,
CpG islands
Recombination rates,
Repeat content and
Gene content of the human genome.

Long-range variation in GC content
GC-rich and GC-poor regions may have
different biological properties:
Gene density,
Composition of repeat sequences,
Correspondence with cytogenetic bands,
Recombination rate.
CpG islands are of particular interest
because they are associated with the
5′ ends of genes.
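GC content and CpG enrichment are both simple counting statistics. The sketch below computes GC content per window and the commonly used observed/expected CpG ratio, (number of CG dinucleotides × length) / (number of C × number of G). The function names and window sizes are illustrative assumptions; real CpG-island finders add length and threshold criteria not shown here.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def windowed_gc(seq, window, step=None):
    """GC content in consecutive windows (non-overlapping unless step is given)."""
    step = step or window
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    print(windowed_gc("GGCCATAT", 4))
    print(cpg_obs_exp("CGCGCGCG"))
```

A ratio well below 1 reflects the general depletion of CpG dinucleotides in the genome, while windows with a high ratio and high GC content are candidate CpG islands.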

Repeat content of the human genome

INTERSPERSED REPEATS

Gene content of the human genome
RNA genes and
protein-coding genes in the human genome.
Noncoding RNAs

There are several major classes of ncRNA:
Transfer RNAs (tRNAs)
Ribosomal RNAs (rRNAs)
Small nucleolar RNAs (snoRNAs) are required for rRNA
processing and base modification in the nucleolus.
Small nuclear RNAs (snRNAs) are critical components
of spliceosomes, the large ribonucleoprotein (RNP)
complexes that splice introns out of pre-mRNAs in the
nucleus.
ncRNAs do not have translated ORFs, are often small
and are not polyadenylated.
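Because ncRNAs lack translated ORFs, any gene search that relies on open reading frames will miss them. The sketch below shows what a bare-bones ORF scan looks like: it reports ATG-to-stop stretches in the three forward frames. It is a toy for illustration (forward strand only, no introns, no statistical model) and is far simpler than real ab initio gene predictors; `find_orfs` and `min_codons` are names invented here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs for ATG...stop ORFs on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts the
    codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAACCCGGGTAACC"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

A sequence encoding a small ncRNA would typically return no ORF of meaningful length from a scan like this, which is why ncRNA genes need other detection methods.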

Software tools for ab initio gene prediction

Distribution of the homologues of the predicted human proteins.

Conserved segments in the human and mouse genome.

* Each colour corresponds to a particular mouse chromosome.

DISEASE GENES

DRUG TARGETS

Research challenges in genetics: what we still don't know, even with
the full human DNA sequence in hand.
Gene number, exact locations, and functions
Gene regulation
DNA sequence organization
Chromosomal structure and organization
Noncoding DNA types, amount, distribution, information content, and functions
Coordination of gene expression, protein synthesis, and post-translational events
Interaction of proteins in complex molecular machines
Predicted vs. experimentally determined gene function
Evolutionary conservation among organisms
Protein conservation (structure and function)
Proteomes in organisms
Correlation of SNPs with health and disease
Disease-susceptibility prediction based on gene sequence variation
Genes involved in complex traits and multigene diseases
Complex systems biology, including microbial consortia useful for
environmental restoration
Developmental genetics, genomics

“The more we learn about the human genome,
the more there is to explore”
“We shall not cease from exploration. And the end of all
our exploring will be to arrive where we started, and
know the place for the first time.” T. S. Eliot