Advancing the International Plant Names Index (IPNI)

Oct 18, 2011

About This Presentation

The "names and taxa" information space is often thought of as being composed of three layers:
Taxonomic concepts
Code governed nomenclatural acts
Name occurrences
In many circumstances the distinction of these layers is blurred, leading to confusion and inefficiencies in information manage...


Slide Content

Advancing the International Plant Names Index (IPNI)
Nicky Nicolson, Alan Paton, Jim Croft, James Macklin,
Paul Morris, Greg Whitbread, Kanchi Gandhi

Advancing IPNI
•Current - where IPNI is now
•Issues
•Future - where we’d like to go and how to get there

What data?
•What data types:
–ICBN governed nomenclatural acts
–Standardised author list
–Publications
•Which groups:
–Vascular plants
•Which ranks:
–Family and below

How is data entered?
•Data entry:
–From literature scanning, journals received by library at Kew, Harvard, Canberra (2 years - 95%)
–User reports of missing nomenclatural acts, usually accompanied by a link to digitised literature page (BHL)
•How many?
–About 7400 names entered in average year
–About 6100 nomenclatural acts published / year
–… of these about 2800 are tax. novs.

How is data managed?
•Full audit history on core objects – names / authors / publications.
•Average 300,000 edits on name records / year
•Standardisation effort ongoing:
–Epithet
–Author citation
–Publication title
–Collation
–Year

Standardisation – author and title
[Chart: Author and Title standardization. Y-axis: 30% to 90%. X-axis: quarterly, Mar-06 to Jun-11. Series: standardized author citations; standardized publication title.]

Standardisation – epithet updates
[Chart: epithet updates. Y-axis: 0 to 5000. X-axis: every second month, 2006-01 to 2011-05.]

Standardisation of epithets
•Why important
–Main search criterion
–Improving epithets enables other improvements
in dataset e.g.:
•basionym linkage
•de-duplication
–Errors propagate

Rhus keamcyi was an OCR error for Rhus kearneyi, but the incorrect value persists in datasets derived from IPNI.

Statistics
•Dataset can be used for trends analysis:
–Publication rates
–Combination rates
–Author collaborations
•Audit history used to determine changes in
data-set over time
http://www.ipni.org/stats.html

As well as the data…
•IPNI editors respond to user queries about the
data, dealing with c. 50 cases / month
•Includes an expert service re interpretation of
ICBN
•Can provide worked examples illustrating
particular articles of the code

Why should anyone care?
•c. 55,000 searches / day
BUT
•dataset is not being used to full advantage
•inputs not being handled efficiently:
–limited to partnership
–missing out on community input
•expertise is hidden

Future
•Increase efficiency of input
–provision of core data
–annotating and linking existing data
–solving nomenclatural problems
•Increase output
–usage of IPNI data
–benefit from on-going curation effort
–benefit from nomenclatural expertise

Data in - contributor services
•Pre-publication data entry
•Batch submission of datasets
•Annotation
•Addition of links within dataset
•Facilitate interpretation of nomenclatural
issues
•Accreditation – credit for helping improve the
data

Pre-publication data entry
•Workflow currently being trialled
–Author or publisher submits data to IPNI once
article has been accepted for publication
–Generated record suppressed until publication
effective under the code
–But this is not yet automated!
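
The suppression step in this workflow could be sketched as below. This is only an illustration under stated assumptions: the `PendingName` class, its fields, and `mark_published` are hypothetical names invented for the example and do not describe IPNI's actual schema or software. The name "Aus bus" and the dates are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PendingName:
    """Hypothetical pre-publication record (not IPNI's real data model)."""
    full_name: str
    accepted_on: date                    # date the article was accepted
    effective_on: Optional[date] = None  # unknown until publication is effective

    def is_visible(self, today: date) -> bool:
        # Suppressed until an effective publication date is recorded and reached.
        return self.effective_on is not None and self.effective_on <= today


def mark_published(record: PendingName, effective_on: date) -> None:
    # Stands in for the step that is currently done by hand.
    record.effective_on = effective_on


# Placeholder name and dates, for illustration only.
record = PendingName("Aus bus", accepted_on=date(2011, 1, 10))
print(record.is_visible(date(2011, 2, 1)))   # False: not yet effectively published
mark_published(record, date(2011, 3, 1))
print(record.is_visible(date(2011, 3, 1)))   # True
```

Keeping the effective date empty until it is known means a record can never become visible by accident simply because time has passed.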

Electronic Publication Example - PhytoKeys
A nomenclator of Pacific oceanic island Phyllanthus
(Phyllanthaceae), including Glochidion
Warren L. Wagner, David H. Lorence
•5. Phyllanthus atalotrichus (A.C. Sm.) W.L. Wagner & Lorence, comb. nov.
urn:lsid:ipni.org:names:77112693-1
PhytoKeys 4: 67–94 (2011)
doi: 10.3897/phytokeys.4.1581
www.phytokeys.com
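
The identifier on this slide is an LSID (Life Science Identifier), which has the general form `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing `:<revision>`. A minimal sketch of splitting one into its parts follows; the `Lsid` type and `parse_lsid` function are hypothetical helpers written for this example, not part of any IPNI tool.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its colon-separated components."""
    parts = text.strip().split(":")
    # Expect "urn", "lsid", authority, namespace, object, and optionally a revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2], parts[3], parts[4]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


lsid = parse_lsid("urn:lsid:ipni.org:names:77112693-1")
print(lsid.authority, lsid.namespace, lsid.object_id)  # ipni.org names 77112693-1
```

Note that the object part is kept as an opaque string; the sketch makes no assumption about what the characters after the hyphen mean.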

Pre-publication issues
•Name squatting – mitigated by only entering
names which are in papers accepted for
publication
•Curation of record throughout publication
process
•Electronic and effective publication – before
this the record will not be visible
•IPNI editors provide visible expert service re
validity of name

Where IPNI data are placed
Any name occurrence: e.g. specimens, reports, literature citation
concepts
Standard form of name

Data out - links
•To concept layer:
–embed IPNI identifiers
–storage of factual concepts / links to concept layer
•To name occurrence layer:
–seed lexical reconciliation projects (e.g. GNI)
•To allied information:
–literature
–types

Links to concept layer
Embed IPNI identifiers in externally held names lists
•IPNI holds curated name data, labelled with persistent
identifiers.
•Need a tool to seed IPNI identifiers into datasets (in
prototype)
•Can devolve curation of name elements in other systems to IPNI
Benefit from on-going curation:
•300,000 edits per year
Report on changes in name list since date
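
The two ideas on this slide, seeding identifiers into an external list and reporting changes since a date, could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Edit` record, the functions, the simple whitespace and case normalisation, and the audit entry with its `example:1` identifier are hypothetical and do not describe the prototype tool. Only the Phyllanthus name and its identifier are taken from the PhytoKeys example.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional, Tuple


class Edit(NamedTuple):
    """Hypothetical audit-history entry for one name record."""
    ipni_id: str
    changed_on: date
    field: str
    old: str
    new: str


def normalise(name: str) -> str:
    # Collapse whitespace and ignore case so trivial differences still match.
    return " ".join(name.split()).casefold()


def seed_identifiers(names: List[str],
                     index: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Attach an identifier to each external name, or None when unmatched."""
    lookup = {normalise(name): ident for name, ident in index.items()}
    return [(name, lookup.get(normalise(name))) for name in names]


def changes_since(seeded: List[Tuple[str, Optional[str]]],
                  audit: List[Edit], since: date) -> List[Edit]:
    """Return audit entries on or after `since` for identifiers held in the list."""
    held = {ident for _, ident in seeded if ident is not None}
    return [e for e in audit if e.ipni_id in held and e.changed_on >= since]


index = {
    "Phyllanthus atalotrichus": "urn:lsid:ipni.org:names:77112693-1",
    "Rhus kearneyi": "example:1",  # made-up identifier for illustration
}
audit = [Edit("example:1", date(2011, 5, 2), "epithet", "keamcyi", "kearneyi")]

seeded = seed_identifiers(["phyllanthus  atalotrichus", "Rhus kearneyi", "Aus bus"], index)
for name, ident in seeded:
    print(name, "->", ident)
print(changes_since(seeded, audit, date(2011, 1, 1)))
```

Once a list carries identifiers, the change report needs no further name matching: it is a filter over the audit history by identifier and date.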

Links to the Concept Layer
Example The Plant List

Link to name occurrence layer
•IPNI’s version history can be used to seed lexical
reconciliation projects (GNI), e.g.:
–Plectranthus macrophylius -> Plectranthus macrophyllus
•These editorialised translations are of higher value than programmatically derived operations of the same edit distance, e.g.:
–Plectranthus microphyllus -> Plectranthus macrophyllus
•Standardisation tools and techniques opened up for
use in allied projects
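
The point about edit distance can be made concrete with a small sketch. Both pairs above differ by a single character substitution, so a distance measure alone cannot tell the correction of a misspelling apart from a change between two distinct epithets. The code uses the standard Levenshtein distance; the `CURATED` table and `reconcile` function are hypothetical stand-ins for corrections drawn from an editorial version history.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical table of editor-approved corrections (old spelling -> current).
CURATED: Dict[str, str] = {
    "Plectranthus macrophylius": "Plectranthus macrophyllus",
}


def reconcile(name: str) -> str:
    # Only apply a change that an editor actually made; never guess from distance.
    return CURATED.get(name, name)


target = "Plectranthus macrophyllus"
print(edit_distance("Plectranthus macrophylius", target))  # 1
print(edit_distance("Plectranthus microphyllus", target))  # 1
print(reconcile("Plectranthus macrophylius"))  # Plectranthus macrophyllus
print(reconcile("Plectranthus microphyllus"))  # Plectranthus microphyllus (unchanged)
```

A purely distance-based matcher with a threshold of 1 would merge both inputs into the target; the curated mapping changes only the one that was a recorded correction.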

Conclusion
•Facilitate electronic publication - pilot registration
•Foster larger community to support the data
and automate workflows
•Stronger links between:
–the people who produce names
–the places where they are published
–the downstream users
•Technical redevelopment