Ensembl Plants: Visualising, mining and analysing crop genomics data

danbolser 2,124 views 36 slides May 23, 2014

About This Presentation

Ensembl Plants is a genome-centric platform for visualisation and analysis of plant genomics data. It hosts assembly, sequence, expression, variation and comparative datasets for a growing number of plant species (currently 26) covering a range of economically important crops, including brassica, to...


Slide Content

Ensembl Plants:
Visualising, mining and analysing crop
genomics data
Dan Bolser
Ensembl Plants project leader
EMBL-EBI

http://plants.ensembl.org
#EnsemblGenomes

Overview
Background:

●Ensembl Plants
●History
●Data

●Recent updates
●Wheat
●Barley

Visualising, mining and
analysing data:

●The Ensembl
genome browser

●BioMart

●Tools for processing
your own data

EBI
Ensembl is developed
jointly by the EBI and
the Wellcome Trust
Sanger Institute

Ensembl Plants uses Ensembl technology
Ensembl:

●A platform for genome browsing, annotation and analysis
developed jointly by the EBI and Wellcome Trust Sanger Institute.

●Has modules for handling:
●Genomic data, Variations, Comparative genomics, Gene prediction, ...

●Multiple points of access to data:
●Browser-based application, Perl and REST APIs, direct access
(MySQL), BioMart data mining tool, DAS (client and server), FTP.

●Upload your own data and compare it to the reference sequence and annotation.

Ensembl was originally developed for vertebrate genomes, subsequently
extended to non-vertebrate species:
●Ensembl Genomes → Ensembl Plants
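
As a rough illustration of the REST access point listed above, the Python sketch below builds a gene lookup request. The base URL `https://rest.ensembl.org`, the use of the `lookup/symbol` endpoint, and the species and gene symbol are assumptions for illustration and do not come from the slides; check the current REST documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the Ensembl REST service (not given in the slides).
REST_BASE = "https://rest.ensembl.org"


def lookup_symbol_url(species, symbol, base=REST_BASE):
    """Build the URL for a gene lookup by symbol (lookup/symbol/:species/:symbol)."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species, safe=""),
        urllib.parse.quote(symbol, safe=""),
    )
    return base + path + "?content-type=application/json"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Species name and gene symbol are hypothetical placeholders.
    print(lookup_symbol_url("arabidopsis_thaliana", "PHYB"))
```

Keeping URL construction separate from the network call makes the request easy to inspect before anything is sent; `fetch_json` is only needed when the result is actually wanted.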

Currently 33 genomes in
Ensembl Plants
http://plants.ensembl.org

Dicots in
Ensembl Plants
(10)
Brassicales
Fabales
Malpighiales
Rosales
Solanales
Vitales

Monocots in
Ensembl Plants
(12+5)
Poales
Zingiberales

'Others' (5)

Types of data in Ensembl (Ensembl Plants)
●Genomic sequence
●Gene, transcript, and protein annotations
●External references and ontology terms
●Mapped sequences: cDNAs, proteins,
probes, BACs, repeats, markers, ...

●Variation data:
●sequence variants
●structural variants

●Comparative data:
●gene trees, orthologues, paralogues
●whole genome alignments and synteny

Recent data updates

Wheat data in Ensembl Plants
●The chromosome survey sequence
from the International Wheat Genome
Sequencing Consortium.
●Version 2.1 of the IWGSC gene models called
on the chromosome survey sequence.

●Repeats
●Repbase
●The Triticeae Repeat Sequence Database
(TREP)

●Alignments
●RNA-seq from various studies in ENA
●ESTs and UniGene clusters
●5x 454 reads (Brenchley et al.)
●Triticum turgidum cDNA assemblies

Wheat data in Ensembl Plants
●Whole genome alignments
●Between wheat(s) and:
●Rice
●Brachypodium
●Within wheat
●A vs. B
●A vs. D
●B vs. D

●Gene trees
●Aegilops tauschii
●Triticum urartu
●and other, more
distant relatives

Whole genome alignments (WGA) between wheat, rice and Brachypodium

WGA within wheat A, B and D sub-genomes

Gene trees

Gene trees

Walk through ‘demo’ for
Ensembl Plants

Search

Variant Effect Predictor (VEP)


●Predicts functional consequences of known and
unknown variants
●For substitutions, insertions, deletions and structural
variants
●Web interface (for up to 750 variants), standalone Perl
script, Perl API and REST API
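
The variant types listed above can be written in the VEP's whitespace-separated default input format (chromosome, start, end, allele string, strand). The Python sketch below is one possible way to produce such lines; the helper names and the convention that an insertion is placed after base `pos` are assumptions of this sketch, and the coordinates in the usage lines are hypothetical.

```python
MAX_WEB_VARIANTS = 750  # web interface limit quoted on the slide


def vep_line(chrom, pos, ref, alt, strand="+"):
    """Format one variant as a line of VEP default input.

    ref == "" means an insertion of `alt` after base `pos` (sketch convention).
    alt == "" means a deletion of `ref` starting at base `pos`.
    """
    if not ref and not alt:
        raise ValueError("ref and alt cannot both be empty")
    if not ref:
        # Insertions are encoded with start = end + 1.
        start, end = pos + 1, pos
    else:
        # Substitutions and deletions span the reference allele.
        start, end = pos, pos + len(ref) - 1
    allele = "{}/{}".format(ref or "-", alt or "-")
    return " ".join([str(chrom), str(start), str(end), allele, strand])


def web_batches(lines, size=MAX_WEB_VARIANTS):
    """Split lines into chunks small enough for the web interface."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


if __name__ == "__main__":
    print(vep_line("1", 881906, "", "C"))    # insertion
    print(vep_line("5", 140532, "T", "C"))   # substitution
    print(vep_line("12", 1017956, "T", ""))  # deletion
```

Larger sets than one web batch are probably better handled with the standalone script or one of the APIs than by submitting many batches by hand.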

Visualise your own data
Upload data:
●Data saved on server
●5 MB limit
●Large file formats?

Attach remote files:
●URL-based
●HTTP or FTP
●No size limit
Upload formats:
●BED genes / features
●Gbrowse genes / features
●GFF/GTF genes / features
●PSL sequence alignments
●WIG continuous-valued data
●BedGraph continuous-valued data
●TrackHub collections of tracks

Attach formats:
●BigBed genes / features
●BAM sequence alignments
●BigWig continuous-valued data
●VCF variants
User added tracks:
●Can be saved or shared
●Only trivial security, do not use for sensitive data!
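
The choice between uploading and attaching can be sketched as a small lookup on file type and size. In the Python sketch below the file extensions, the byte value used for the 5 MB limit, and the function name are assumptions; Gbrowse and TrackHub inputs are left out because they do not map onto a single file extension.

```python
# "5 MB limit" from the slide; the exact byte count is an assumption.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024

# Assumed file extensions for the formats listed on the slide.
UPLOAD_SUFFIXES = (".bed", ".gff", ".gtf", ".psl", ".wig", ".bedgraph")
ATTACH_SUFFIXES = (".bb", ".bigbed", ".bam", ".bw", ".bigwig", ".vcf", ".vcf.gz")


def choose_method(filename, size_bytes):
    """Return 'upload', 'attach', 'too_large' or 'unsupported' for a file."""
    name = filename.lower()
    if name.endswith(ATTACH_SUFFIXES):
        # Attached files stay on a remote HTTP or FTP server; no size limit.
        return "attach"
    if name.endswith(UPLOAD_SUFFIXES):
        # Uploaded files are stored on the server and are size limited.
        if size_bytes <= UPLOAD_LIMIT_BYTES:
            return "upload"
        return "too_large"
    return "unsupported"


if __name__ == "__main__":
    for name, size in [("genes.bed", 20000), ("reads.bam", 3 * 10**9)]:
        print(name, choose_method(name, size))
```

A `too_large` result suggests converting the data to one of the attachable formats (for example WIG to BigWig) and hosting it at a URL.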

The barley Gene-ome

●Step 1 – Dataset
●Choose your dataset
and species

●Step 2 – Filters
●Limit your dataset

●Step 3 – Attributes
●Specify what
information you want
to output

●Step 4 – Results
●Preview and output
your results
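
The four steps above map onto the parts of a BioMart XML query, which can also be built programmatically. The Python sketch below assembles such a query; the virtual schema, dataset, filter and attribute names are hypothetical placeholders that would need to be checked against the live BioMart configuration.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes,
                        virtual_schema="plants_mart"):
    """Assemble a BioMart XML query string.

    dataset    -- Step 1: dataset name (species-specific)
    filters    -- Step 2: dict of filter name -> value
    attributes -- Step 3: list of attribute names to output
    The default virtual schema name is an assumption.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": virtual_schema,
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


if __name__ == "__main__":
    # All names below are hypothetical examples.
    print(build_biomart_query(
        "hvulgare_eg_gene",
        {"chromosome_name": "1H"},
        ["ensembl_gene_id", "description"],
    ))
```

Step 4 (results) would then correspond to submitting this XML to a BioMart service and reading back the tab-separated output, which this sketch does not attempt.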

Blast and
BioMart...

[email protected] 10/01/2014

Funding (Ensembl Plants)
•Ensembl Genomes is funded by:
•EMBL
•EU (INFRAVEC, Microme, transPLANT, AllBio)
•BBSRC (PhytoPath, wheat, barley and midge sequencing,
UK-US collaboration, RNAcentral)
•Wellcome Trust (PomBase)
•NIH/NIAID (VectorBase)
•NSF (Gramene collaboration)
•Bill and Melinda Gates Foundation (wheat rust)

People (Ensembl Plants)

•James Allen, Irina Armean, Dan Bolser, Mikkel
Christensen, Paul Davies, Christoph Grabmueller, Kevin
Howe, Malcolm Hinsley, Jay Humphrey, Arnaud
Kerhornou, Paul Kersey, Julia Khobdova, Eugene
Kulesha, Nick Langridge, Dan Lawson, Mark McDowall,
Uma Maheswari, Gareth Maslen, Michael Nuhn, Chuang
Kee Ong, Michael Paulini, Helder Pedro, Anton Petrov,
Dan Staines, Mary Ann Tuli, Brandon Walts, Gary
Williams

•If you have a question that is not answered here,
please contact our HelpDesk:
[email protected]