Alex Hardisty, XLDB-Europe, Edinburgh, 8-10th June 2011
doing things, their own data resources and their own tools. Not only that, but they have their own different
vocabularies and conceptual underpinnings. Interoperability is a problem demanding a determined
ontological and thesaurus solution like that used in the medical domain: the Unified Medical Language
System (UMLS) (www.nlm.nih.gov/research/umls).
The interconnections between different biodiversity ideas and concepts, data sources, and the outputs from
data processing, manipulation and modelling are intricate. As well as the traditional sources mentioned
above, genomic data (for example, sequence data, DNA barcodes and phylogenies) are becoming
increasingly important. Biodiversity science also demands environmental data (climate, soil, ocean
temperature, etc.) and, for particular types of studies, economic and census data.
Apart from the well-known and often large sources (GBIF, EBI, environmental data, census data), there are
numerous small datasets in the hands of individual researchers. If computerised at all, these small datasets
are often held in spreadsheets with no identifiable common structure. There are probably thousands of
them, along with multiple tools for processing them. The biodiversity science community is highly
fragmented, and all these small personal, group and departmental datasets need to be published and to
become discoverable and usable.
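
As a minimal sketch of what giving such spreadsheets a common structure could involve, the Python
fragment below renames locally chosen column headers to a shared set of terms and reports any headers
it cannot map. Everything in it is an assumption made for illustration: the synonym table, the function
name harmonise and the two sample sheets are hypothetical, and the target names are borrowed from the
Darwin Core vocabulary only as an example. The text above does not prescribe a vocabulary.

```python
import csv
import io

# Hypothetical synonym table: local column header -> shared term.
# The target names follow Darwin Core terms purely for illustration.
COLUMN_SYNONYMS = {
    "species": "scientificName",
    "taxon": "scientificName",
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "long": "decimalLongitude",
    "longitude": "decimalLongitude",
    "date": "eventDate",
    "collected": "eventDate",
}


def harmonise(csv_text):
    """Return (rows, unmapped): rows keyed by shared terms, plus unknown headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows, unmapped = [], set()
    for record in reader:
        out = {}
        for header, value in record.items():
            if header is None:
                # Extra cells beyond the header row carry no column name; skip them.
                continue
            key = header.strip().lower()
            if key in COLUMN_SYNONYMS:
                out[COLUMN_SYNONYMS[key]] = (value or "").strip()
            else:
                unmapped.add(header)
        rows.append(out)
    return rows, sorted(unmapped)


# Two made-up spreadsheets describing the same kind of observation.
sheet_a = "Species,Lat,Lon,Notes\nParus major,55.95,-3.19,garden\n"
sheet_b = "taxon,latitude,longitude,collected\nParus major,51.48,-3.18,2011-06-08\n"

rows_a, unmapped_a = harmonise(sheet_a)
rows_b, unmapped_b = harmonise(sheet_b)
```

After harmonisation both sheets expose the same field names, so they can be searched or merged
together, and the list of unmapped headers shows where the synonym table would need to grow. A real
solution would rest on a curated ontology or thesaurus rather than a hand-written dictionary.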
LifeWatch aims to support upwards of 25,000 users, primarily from the academic, research and
policymaking communities, but also from the student education sector and the general public
(citizen science).
The LifeWatch strategy of “Thinking globally, acting locally” addresses these challenges of heterogeneity
and scale. At the global level, it devises and promotes pan-European top-down strategies that foster
collaboration and interoperability; at the local level, it assists and encourages ‘islands’ of
compliant infrastructure to emerge and fuse.
ENVRI: Common Operations of the ESFRI Environmental Research Infrastructures
What is ENVRI?
ENVRI is a soon-to-be-funded EC FP7 project that brings together many of the main ESFRI research
infrastructures from the environmental sciences domain. The ENVRI project will contribute to the
construction of these research infrastructures by sharing experiences and technologies and by solving
crucial common technology issues and challenges together. Through cooperation in this project the ESFRI
ENV infrastructures, together with ICT partners, are seeking to increase the interoperability of their data
and facilities, and thereby the use and effectiveness of their infrastructures. The central goal of the ENVRI
project is to implement harmonised solutions and draw up guidelines for the common needs of the
environmental ESFRI projects, with a special focus on issues such as architectures, metadata frameworks,
data discovery in scattered repositories, visualization and data curation.
ENVRI recognises scientific data services as part of a horizontal set of foundational services that include
communications, distributed computing, and storage. It recognises that data providers, as well as data
users, are users of data services and that there are common requirements irrespective of domain-specific
communities. Community-specific services sit on top of data services and interact with them.
The key to improved interoperability is finding common solutions to common problems that can be
adopted by each research infrastructure as it progresses through its construction phase. Fundamental
common solutions include: