FAIRSpectra - Enabling the FAIRification of Analytical Science

AlexHendersonManchester, 34 slides, May 08, 2024

About This Presentation

Presentation at the BioFAIR Roadshow in Manchester.
23 April 2024
https://biofair.uk/
https://fairspectra.net


Slide Content

FAIRSpectra
Enabling the FAIRification
of Analytical Science
Alex Henderson
University of Manchester
Office for Open Research
https://fairspectra.net
https://alexhenderson.info

Thanks…
•For financial support
•University of Manchester’s Office for Open Research
•SurfaceSpectra Ltd.
•For in-kind support (free exhibition space)
•UK Surface Analysis Users Forum (UKSAF)
•SIMS Europe
•SpringSciX 2024
•101st IUVSTA Workshop (The International Union for Vacuum Science, Technique and Applications)
•Zulip (free upgrade)

What is Analytical Science?
•Assessing the physical, chemical, or biological nature of ‘things’
•Anything called a ‘test’ was developed using analytical science
•Instrument development
•Method development
•Once formalised, used in assays
•‘Assay’
•Where analytical science becomes analytical ‘engineering’
•Same data and metadata requirements, but these vary less often

What is Analytical Science?
•Instrumentation-based chemical analysis
•mass spectrometry (MALDI, SIMS, DESI)
•UV-vis / infrared / Raman spectroscopies
•NMR
•X-ray diffraction
•…
•Hyphenated techniques (e.g. LC-MS)
•Variants of each technique have their own requirements
•Need to consider combination of techniques → data fusion
•Concentrating on imaging modalities

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Secondary ion mass spectrum of bovine colostrum lactoferrin on mica. SurfaceSpectra Ltd.
Plot: bovine colostrum lactoferrin (positive ion), intensity (0 to 26,000) versus m/z (20 to 200). The Static SIMS Library, SurfaceSpectra Ltd.

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Typical infrared spectrum of tissue. Courtesy Peter Gardner @ Manchester

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
X-ray photoelectron spectra. Range of carbon functionalities in polymers. SurfaceSpectra Ltd.

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Secondary ion mass spectrometry spectra from a range of urinary-tract infection bacteria. John Fletcher @ Manchester, now Gothenburg

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Co-located infrared and Raman spectroscopy spectra of neuroglioma cells. Photothermal Inc.

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Secondary ion mass spectrometry image of arsenic, sulfur and phosphate abundance in rice. Katie Moore @ Manchester

What is hyperspectral imaging?
•Operates more like a camera, with multiple image elements
•128 × 128 pixel detector, liquid nitrogen cooled
•Mosaic these ‘pictures’ to cover large areas
Courtesy of Peter Gardner @ Manchester
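The mosaicking step comes down to simple tiling arithmetic. The following is a minimal Python sketch, assuming square 128 × 128 tiles placed edge to edge with no overlap and a region measured in detector pixels; real instruments may overlap tiles to help stitching, and the names `tile_grid` and `locate` are hypothetical.

```python
import math

# Assumption: square detector tiles of 128 x 128 pixels (from the slide),
# placed edge to edge with no overlap. Real instruments may overlap tiles.
TILE = 128


def tile_grid(width_px: int, height_px: int, tile: int = TILE) -> list[tuple[int, int]]:
    """Return the (x, y) offset of each tile's top-left corner, row by row.

    The last column and row of tiles may overhang the requested region.
    """
    cols = math.ceil(width_px / tile)
    rows = math.ceil(height_px / tile)
    return [(c * tile, r * tile) for r in range(rows) for c in range(cols)]


def locate(x: int, y: int, width_px: int, tile: int = TILE) -> tuple[int, int, int]:
    """Map a mosaic pixel (x, y) to (tile index, local x, local y)."""
    cols = math.ceil(width_px / tile)
    index = (y // tile) * cols + (x // tile)
    return index, x % tile, y % tile


if __name__ == "__main__":
    offsets = tile_grid(300, 200)
    print(len(offsets))           # 6 (3 columns x 2 rows)
    print(offsets[:4])            # [(0, 0), (128, 0), (256, 0), (0, 128)]
    print(locate(260, 130, 300))  # (5, 4, 2)
```

Here a 300 × 200 pixel region needs three tiles across and two down, and pixel (260, 130) falls 4 pixels right of and 2 pixels below the corner of tile 5.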

Modalities
Infrared spectroscopy hyperspectral image of kidney tissue. False coloured k-means classification of spectra. Caryn Hughes @ Manchester (unpublished)
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Infrared spectroscopy hyperspectral image of prostate cancer tissue. False coloured Random Forests classification of spectra. Peter Gardner @ Manchester
Panels: H&E (optical) and FTIR
Tissue classes: Epithelium, Smooth Muscle, Lymphocytes, Blood, Concretion, Fibrous Stroma, ECM

What is 3D hyperspectral imaging?
2D SIMS: 3 dimensions (2 physical + 1 chemical)
3D SIMS: 4 dimensions (3 physical + 1 chemical)
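One way to picture the extra dimension is as the shape of a dense array, with the physical axes first and the spectral (chemical) axis last. The sketch below is a hypothetical Python container, not a FAIRSpectra format; the pixel count, channel count, number of depth layers and 4-byte values in the usage are illustrative assumptions, chosen to show how quickly uncompressed size grows when a depth axis is added.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class HyperspectralCube:
    """Hypothetical description of a dense hyperspectral data set.

    Physical (spatial) axes come first; the last axis is spectral (chemical).
    """
    physical: tuple[int, ...]  # (x, y) for 2D SIMS, (x, y, z) for 3D SIMS
    channels: int              # number of spectral channels per pixel or voxel

    @property
    def shape(self) -> tuple[int, ...]:
        return (*self.physical, self.channels)

    @property
    def ndim(self) -> int:
        return len(self.shape)

    def nbytes(self, bytes_per_value: int = 4) -> int:
        """Uncompressed size, assuming every value is stored densely."""
        return prod(self.shape) * bytes_per_value


if __name__ == "__main__":
    # Illustrative sizes only: 256 x 256 pixels, 1,000 channels, 100 layers.
    image_2d = HyperspectralCube(physical=(256, 256), channels=1000)
    depth_3d = HyperspectralCube(physical=(256, 256, 100), channels=1000)
    print(image_2d.ndim, image_2d.nbytes())  # 3 262144000
    print(depth_3d.ndim, depth_3d.nbytes())  # 4 26214400000
```

Under these assumptions the 2D cube is about 262 MB and the 3D depth profile about 26.2 GB. Instrument data may well be stored sparsely or compressed, so these figures are upper bounds for a fully dense layout.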

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Secondary ion mass spectrometry 3D image depth profile, topography corrected, false coloured: green = lipid (DPPC), red = nucleic acid (adenine). John Fletcher & Alex Henderson @ Manchester

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
Secondary ion mass spectrometry 3D depth profile image showing cholesterol distribution in surface of frog oocyte. John Fletcher @ Manchester, now Gothenburg

Modalities
•Single spectrum
•Collections of spectra
•Spectral maps
•Multispectral images
•Hyperspectral images
•3D images
X-ray photoelectron spectroscopy image of nitrogen and specific carbon functionality in caffeine tablet. Alex Henderson @ Manchester with Kratos Analytical.

What are the issues?
Academia
Funders require ‘data’ to be deposited in (open) repositories
But…
•Not all analytical science areas have a dedicated repository
•Instruments store data in proprietary file formats
•Metadata spread across multiple ontologies, or no terms available
•Not all experiments are part of a large assay → varied metadata
•Many software packages not compatible with open formats
Researchers are willing to share, but don’t know how

What are the issues?
Commercial activity needs to be considered
•Barriers
•FAIR often confused with Open
•In-house processes considered good enough
•Worry about certain metadata usage giving secrets away
•Benefits
•Easier to share data in-house, between labs and (overseas) sites
•FAIR practices lead to better records retention
•Acquisitions and mergers become more straightforward
•Third-party (open source) software becomes easily accessible
•Incoming staff already familiar with systems
Instrument vendor buy-in vital for born-digital data and metadata

Moving forward
https://xkcd.com/2116/

What is FAIRSpectra?
Community driven initiative
Focus on hyperspectral imaging techniques
•Metadata requirements
•File formats for hyperspectral imaging
•No standards exist right now
•Software tools to support these
•Education and training
•Raising awareness

What is FAIRSpectra?
https://fairspectra.net https://fairspectra.zulipchat.com https://github.com/FAIRSpectra

Which metadata are required?
Sample
•Sampling method
•Storage conditions
•Chemical modifications
•Physical state
•Pre-treatment
•…
Experiment
•Experiment plan
•Substrate material
•Mounting method
•Region analysed
•Instrument params
•…
Analysis
•Artifact removal
•Pre-processing
•Algorithm choice
•Hyperparameters
•Validation method
•…
Downstream reporting
Upstream sample provenance
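The three stages map naturally onto a nested record. The following is a hypothetical Python sketch: the field names are taken from the slide's bullet points, but the class names, the example values and the `missing()` completeness check are assumptions for illustration, not an agreed FAIRSpectra schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SampleMetadata:
    sampling_method: str
    storage_conditions: str
    chemical_modifications: list[str] = field(default_factory=list)
    physical_state: str = ""
    pre_treatment: list[str] = field(default_factory=list)


@dataclass
class ExperimentMetadata:
    experiment_plan: str
    substrate_material: str
    mounting_method: str
    region_analysed: str
    instrument_params: dict[str, str] = field(default_factory=dict)


@dataclass
class AnalysisMetadata:
    artifact_removal: list[str]
    pre_processing: list[str]
    algorithm: str
    hyperparameters: dict[str, str]
    validation_method: str


@dataclass
class Record:
    """Upstream sample provenance -> experiment -> downstream analysis."""
    sample: SampleMetadata
    experiment: ExperimentMetadata
    analysis: AnalysisMetadata

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def missing(self) -> list[str]:
        """Return 'stage.field' names whose value is empty.

        Note: an empty list cannot tell 'none applied' from 'not recorded'.
        """
        gaps = []
        for stage, values in asdict(self).items():
            for name, value in values.items():
                if value in ("", [], {}):
                    gaps.append(f"{stage}.{name}")
        return gaps


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    record = Record(
        sample=SampleMetadata("cryosection", "-80 C"),
        experiment=ExperimentMetadata(
            "single region", "silicon wafer", "", "500 x 500 um",
            {"primary_ion": "Bi3+"},
        ),
        analysis=AnalysisMetadata([], ["normalisation"], "k-means", {"k": "6"}, ""),
    )
    print(record.missing())
```

As the `missing()` docstring notes, an empty field is ambiguous between "nothing was done" and "nobody wrote it down", which is one argument for explicit controlled terms such as "none".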

Where do we start?
Sample
•Sampling method
•Storage conditions
•Chemical modifications
•Physical state
•Pre-treatment
•…
Experiment
•Experiment plan
•Substrate material
•Mounting method
•Region analysed
•Instrument params
•…
Analysis
•Artifact removal
•Pre-processing
•Algorithm choice
•Hyperparameters
•Validation method
•…
Downstream reporting
Upstream sample provenance
Born-digital metadata in data files
Limited/common options

Where do we start?
Sample
•Sampling method
•Storage conditions
•Chemical modifications
•Physical state
•Pre-treatment
•…
Experiment
•Experiment plan
•Substrate material
•Mounting method
•Region analysed
•Instrument params
•…
Analysis
•Artifact removal
•Pre-processing
•Algorithm choice
•Hyperparameters
•Validation method
•…
Downstream reporting
Upstream sample provenance
Many workflows have common steps
Default hyperparameters

Where do we start?
Sample
•Sampling method
•Storage conditions
•Chemical modifications
•Physical state
•Pre-treatment
•…
Experiment
•Experiment plan
•Substrate material
•Mounting method
•Region analysed
•Instrument params
•…
Analysis
•Artifact removal
•Pre-processing
•Algorithm choice
•Hyperparameters
•Validation method
•…
Downstream reporting
Upstream sample provenance
Samples are so varied that this is very difficult

Where do we start?
Sample
•Sampling method
•Storage conditions
•Chemical modifications
•Physical state
•Pre-treatment
•…
Experiment
•Experiment plan
•Substrate material
•Mounting method
•Region analysed
•Instrument params
•…
Analysis
•Artifact removal
•Pre-processing
•Algorithm choice
•Hyperparameters
•Validation method
•…
Downstream reporting
Upstream sample provenance
Workflows also include repeated steps with poorly defined break points

Where are we now – 6 months in?
Exhibition booths at 3 conferences/workshops
•More planned: another 3 before Christmas
Survey
•Positives
•Everyone wanted to see something done, but was unsure how
•Barriers
•People have difficulty sharing
•Poor documentation
•Proprietary file formats: loss of information, round-tripping
•Raw data versus processed data
•File size
•Gazumping / IP & prior art / confidentiality
•Time consuming

Where are we now – 6 months in?
Data file formats
•Looking to re-purpose strategies from astronomy, climate science, and microscopy
•Some file converters ready for testing
•Suitable test files are an issue: everyone’s a critic
Two instrument vendors interested in getting involved
•Need to be careful not to go too fast
•Only one shot at changing instrument software
Discussions with journals just beginning
•Need minimum reporting requirements
•Need to convince referees this is important

Where are we now – 6 months in?
Metadata
•Trying to identify suitable ontologies for instrumentation and non-life-science areas
•How to combine data and metadata
Looking at LEGO-like semantic metadata bricks
•Start REALLY small
•Can be aggregated to produce SOPs
•Aggregations can be version controlled
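Small bricks that aggregate into a versioned SOP resemble content addressing in version-control systems such as git. As a hypothetical sketch (the `Brick` fields, the example values and the SHA-256 scheme are assumptions, not a FAIRSpectra design), each brick is hashed from its content and an SOP's version ID is a hash over its bricks' hashes in order, so any edit yields a new, distinct version.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """A hypothetical minimal metadata 'brick': one term and its value."""
    term: str
    value: str

    def digest(self) -> str:
        # Canonical JSON so the same content always hashes the same way.
        payload = json.dumps({"term": self.term, "value": self.value}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sop_version(bricks: list[Brick]) -> str:
    """Content-derived version ID for an ordered aggregation of bricks.

    Each brick digest is a fixed 64 hex characters, so concatenating them
    is unambiguous. Changing any brick, or their order, changes the ID.
    """
    combined = hashlib.sha256()
    for brick in bricks:
        combined.update(brick.digest().encode("ascii"))
    return combined.hexdigest()


if __name__ == "__main__":
    rinse = Brick("rinse", "deionised water, 10 s")  # hypothetical values
    dry = Brick("dry", "nitrogen stream")
    v1 = sop_version([rinse, dry])
    v2 = sop_version([rinse, Brick("dry", "air dry")])
    print(v1 == sop_version([rinse, dry]))  # True: same content, same version
    print(v1 == v2)                         # False: one brick changed
```

Deriving the version from content rather than assigning it by hand means two labs holding identical aggregations arrive at the same ID independently.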

BioFAIR - FAIRSpectra synergies:
Standardisation

BioFAIR - FAIRSpectra synergies:
Tooling
•Metadata
•Identify suitable ontologies for instrumentation
•How to capture metadata without onerous workload
•PIDs
•Aggregations of LEGO bricks form a graph
•Can be version controlled
•Need a version-controlled PID graph
•Produces citable, machine-actionable SOPs
•Bringing data and metadata together
•How to combine data and metadata
•Must not create barriers to data analysis using proprietary software
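If each aggregation of bricks is a graph node with a persistent identifier, a specific SOP version can be expanded into exactly the brick versions it cites. The sketch below assumes a hypothetical `kind:name@version` PID scheme (real PIDs such as DOIs look different) and stores the graph as a plain adjacency mapping; `resolve` walks it depth-first, returns the leaf bricks in order and rejects cycles.

```python
# Hypothetical PID scheme "<kind>:<name>@<version>"; real PIDs (e.g. DOIs) differ.
# Each PID maps to the PIDs it aggregates; bricks have no children.
Graph = dict[str, list[str]]


def resolve(pid: str, graph: Graph) -> list[str]:
    """Expand a PID into the brick PIDs it cites, depth-first and in order.

    Raises ValueError if the aggregation graph contains a cycle.
    """
    leaves: list[str] = []
    on_path: set[str] = set()

    def walk(node: str) -> None:
        if node in on_path:
            raise ValueError(f"cycle detected at {node}")
        children = graph.get(node, [])
        if not children:
            leaves.append(node)
            return
        on_path.add(node)
        for child in children:
            walk(child)
        on_path.remove(node)

    walk(pid)
    return leaves


if __name__ == "__main__":
    graph: Graph = {
        "sop:tissue-prep@1": ["brick:section@1", "brick:mount@1"],
        "sop:tissue-prep@2": ["brick:section@1", "brick:mount@2"],
        "sop:ir-imaging@1": ["sop:tissue-prep@2", "brick:acquire@1"],
    }
    print(resolve("sop:ir-imaging@1", graph))
    # ['brick:section@1', 'brick:mount@2', 'brick:acquire@1']
    print(resolve("sop:tissue-prep@1", graph))
    # ['brick:section@1', 'brick:mount@1']
```

Because a change creates a new node (here `sop:tissue-prep@2`) rather than editing `sop:tissue-prep@1`, anything that cited the older version still resolves to the same bricks, which is what keeps the SOP citable.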

Summary
Analytical Science has large overlap with Life Science
The researchers are willing, but their resources are weak
•Few solutions currently exist
•Metadata terms missing
•Proprietary file formats are a barrier
•Instrument vendor buy-in required
•Lack of awareness persists
But…
•Some low-hanging fruit
•Opportunity to make an impact
•Even Closed FAIR data can still benefit industry
There’s lots to do, but FAIRSpectra is just getting started!
https://fairspectra.net