Scientific Datasets and Machine Learning Benchmarks

SamuelJackson100 11 views 9 slides Jul 17, 2024

About This Presentation

Presented progress of SCD benchmarking efforts and acted as a panel member for a wider discussion of ML benchmarking for science

Slide Content

Scientific Datasets and Machine Learning Benchmarks
Sam Jackson
Rutherford Appleton Labs, STFC

Facilities at RAL
Rutherford Appleton Laboratory
Harwell Campus, near Oxford
•The rate of scientific data production is exploding
•Traditional data processing cannot keep up
•Facilities are looking for new software/hardware solutions to keep pace

Why Scientific Benchmarks?
•Scientific data is often large, complex, and challenging to work with
•General solutions are not necessarily optimal
•Provide realistic, specific & focussed test cases based on real experimental data
•To motivate exploration of new ideas & models
•To inform hardware & software choices
•Focus on end-to-end benchmarking rather than microbenchmarks
•Examples: Three archetypal problems
•Inverse problems
•Self-supervised denoising
•Multi-modal image segmentation

SLSTR Benchmark
•9-channel input, each channel stored as a separate NetCDF file, at 2 resolutions
•Images are generally converted to patches
•Standard U-Net architecture
•Output is a binary mask
•Sea Surface Temperature estimates
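
The patch conversion step mentioned above can be sketched as follows. This is a minimal illustration, not the benchmark's own code. It assumes the 9 channels have already been loaded from their NetCDF files and resampled to a common grid, and it uses a hypothetical patch size of 256 with non-overlapping tiles; the function name `extract_patches` and the synthetic scene are made up for the example.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split an (H, W, C) array into non-overlapping square patches.

    Edge rows/columns that do not fill a whole patch are dropped.
    Returns an array of shape (N, patch_size, patch_size, C).
    """
    height, width, channels = image.shape
    rows = height // patch_size
    cols = width // patch_size
    # Crop so that both spatial dimensions are a whole number of patches.
    cropped = image[: rows * patch_size, : cols * patch_size]
    # Expose the patch grid as separate axes: (rows, patch, cols, patch, C).
    patches = cropped.reshape(rows, patch_size, cols, patch_size, channels)
    # Reorder to (rows, cols, patch, patch, C), then flatten the grid.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size, patch_size, channels)


# Synthetic stand-in for one 9-channel scene (assumption: real data would
# be read from the per-channel NetCDF files instead).
scene = np.random.default_rng(0).random((600, 500, 9), dtype=np.float32)
patches = extract_patches(scene, patch_size=256)
print(patches.shape)
```

Dropping incomplete edge tiles keeps every patch the same shape, which is convenient for batching into a U-Net; padding or overlapping tiles are equally plausible choices.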

SLSTR Benchmark
•Parts in orange have already been implemented.
•Parts in blue are not yet measured, but can be.
•Timings depend on the system (e.g. transfer time to PEARL).
•Both parts incur a time penalty from unzipping.
Pipeline stages: Image Extraction, Training, Inference, SST Validation
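
One way to time each stage of such a pipeline end to end is sketched below. It is an assumption-laden illustration rather than the benchmark suite's implementation: the `StageTimer` class is hypothetical, and `time.sleep` stands in for the real extraction, training, inference and validation work.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named pipeline stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raises an exception.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.durations.values())


timer = StageTimer()
stage_names = ("image_extraction", "training", "inference", "sst_validation")
for name in stage_names:
    with timer.stage(name):
        time.sleep(0.01)  # placeholder for the real stage

for name, seconds in timer.durations.items():
    print(f"{name}: {seconds:.3f} s")
print(f"end-to-end: {timer.total():.3f} s")
```

Wrapping data transfer and unzipping in their own stages would make those system-dependent costs visible separately from the compute stages.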

Benchmarking Code
•Code base started at: https://gitlab.stfc.ac.uk/sciml/sciml-benchmarks
•Pip installable
•Docker & Singularity support

Benchmarking Metrics
•Currently Tracking
•Model performance: loss, accuracy, etc.
•Time: duration, images/s
•Host information: CPU cores and utilization, memory (RAM) utilization, etc.
•GPU information: number, utilization, power draw
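
A small sketch of how a few of these metrics might be gathered in one report follows. It uses only the Python standard library, so it covers duration, images/s and CPU core count. Utilization, memory and GPU power draw would need extra tooling (for example psutil or nvidia-smi) and are not shown. The names `run_benchmark` and `fake_step` are hypothetical and do not come from the benchmark code base.

```python
import os
import platform
import time


def run_benchmark(step_fn, num_batches, batch_size):
    """Time `num_batches` calls of `step_fn` and report basic metrics."""
    start = time.perf_counter()
    losses = [step_fn(i) for i in range(num_batches)]
    duration = time.perf_counter() - start
    return {
        "final_loss": losses[-1],
        "duration_s": duration,
        "images_per_s": num_batches * batch_size / duration,
        "cpu_cores": os.cpu_count(),
        "platform": platform.platform(),
    }


def fake_step(i):
    """Stand-in for one training step; returns a made-up decreasing loss."""
    time.sleep(0.001)
    return 1.0 / (i + 1)


report = run_benchmark(fake_step, num_batches=20, batch_size=32)
for key, value in report.items():
    print(f"{key}: {value}")
```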

Summary
•Scientific machine learning datasets are challenging in their scale & complexity
•Providing representative datasets & models can:
•Motivate new solutions
•Inform & train non-ML, non-HPC experts
•Aid understanding of performance of new hardware to inform facility choices
•Aid fair comparisons between models/methods/hardware/software
