Linac Coherent Light Source (LCLS) Data Transfer Requirements

insideHPC, 25 slides, Feb 25, 2018

About This Presentation

In this deck from the Stanford HPC Conference, Les Cottrell from the SLAC National Accelerator Laboratory at Stanford University presents: Linac Coherent Light Source (LCLS) Data Transfer Requirements.

"Funded by the U.S. Department of Energy (DOE) the LCLS is the world’s first har...


Slide Content

Dr. Les Cottrell, SLAC
<[email protected]>
Linac Coherent Light Source (LCLS) Data Transfer Requirements
HPC talk, Stanford, Feb 2018

LCLS-II, a major (~$1B) upgrade to LCLS, is currently underway. Online in 2020.
[Video]

Basic instrument layout: optical laser pump and X-ray laser probe
[Diagram: basement and sub-basement levels, showing the optical laser, the X-rays, and the X-ray hutch]

Example experiment #1
‘Molecular Movie’ Captures Ultrafast Chemistry in Motion
Scientific Achievement: Time-resolved observation of an evolving chemical reaction triggered by light.
Method: LCLS X-ray pulses were delivered at different time intervals, measuring the structural changes on an X-ray area detector.
Significance and Impact: Results pave the way for a wide range of X-ray studies examining gas-phase chemistry and the structural dynamics associated with the chemical reactions they undergo.
M. P. Minitti, J. M. Budarz, et al., Phys. Rev. Lett. 114, 255501 (2015) (cover article)

Next example: catalysis (another related example)
[Catalytic converter diagram: polluting gases (HC, CO, NOx) pass over a porous substrate coated with precious metals; the exhaust gases react with the precious metals, leaving H2O and CO2]
50% of the world’s gasoline goes through fluid catalytic cracking.

Example experiment #2: Catalytic converter – transient dynamics resolved at the atomic scale
• Surface catalysis of CO oxidation to CO2
• Sub-picosecond transient states, monitored via the appearance of new electronic states in the O K-edge X-ray absorption spectrum.
H. Öström et al., Science, 2015

Data Analytics for high repetition rate Free Electron Lasers
FEL data challenge:
● Ultrafast X-ray pulses from LCLS are used like flashes from a high-speed strobe light, producing stop-action movies of atoms and molecules
● Both data processing and scientific interpretation demand intensive computational analysis
LCLS-II represents SLAC’s largest data challenge: LCLS-II will increase data throughput by orders of magnitude by 2026, creating an exceptional scientific computing challenge.

LCLS-II experiments will present challenging computing requirements, in addition to the capacity increase:
1. Fast feedback is essential (seconds / minute timescale) to reduce the time to complete the experiment, improve data quality, and increase the success rate
2. 24/7 availability
3. Short burst jobs, needing very short startup time
   Very disruptive for computers that typically host simulations that run for days
4. Storage represents a significant fraction of the overall system, both in cost and complexity:
   1. 1 Pbyte/day fast local buffer, 5-10 Pbytes storage for local processing, 10-year long-term offline tape storage
   2. 60-300 Teraflops in 2020, growing to 1 Pflop in 2022+
5. Throughput between storage and processing is critical (a rough bandwidth estimate is sketched after this list)
   Currently most LCLS jobs are I/O limited
6. Speed and flexibility of the development cycle is critical
   Wide variety of experiments, with rapid turnaround, and the need to tune data analysis during experiments
These aspects are also instrumental for other SLAC facilities (e.g. CryoEM, UED, SSRL, FACET-2)
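As a rough back-of-envelope illustration of what these volumes imply for sustained throughput (a sketch using the figures above, not a calculation taken from the deck):

# Sustained rate implied by draining a "1 Pbyte/day" fast local buffer in 24 hours.

PB = 1e15                 # bytes (decimal petabyte)
SECONDS_PER_DAY = 86_400

def sustained_gbps(bytes_per_day: float) -> float:
    """Average rate (Gbps) needed to move bytes_per_day in one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9

print(f"1 PB/day ≈ {sustained_gbps(PB):.0f} Gbps sustained")   # ≈ 93 Gbps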

LCLS-II data flow: from data production, to online reduction, real-time analysis, and offline interpretation
Currently seeking user input on acceptable solutions for data reduction & analysis
[Pipeline diagram: megapixel detector → X-ray diffraction image → intensity map from multiple pulses → interpretation of system structure / dynamics]

Critical Requirement for Offsite Resources for LCLS Computing
[Pictured: MIRA at Argonne, TITAN at Oak Ridge, CORI at NERSC]
● Several experiments require access to leading-edge computers for detailed data analysis. This has its own challenges, in particular:
  • Need to transfer huge amounts of compressed data from SLAC to supercomputers at other sites
  • Providing near real-time results/feedback to the experiment
  • Has to be reliable, production quality
The requirements will need a long-term agreement between BES and ASCR.

Requirement
• Today: 20 Gbps from SLAC to NERSC => 70 Gbps in 2019
  • Experiments increase efficiency & networking
• 2020: LCLS-II online, data rate 120 Hz => 1 MHz
  • LCLS-II starts taking data at the increased data rate
• 2024: 1 Tbps
  • Imaging detectors get faster
[Chart annotation: growth roughly tracks Moore’s law]
(The quoted rates are converted to daily volumes in the sketch below.)
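For scale (a sketch; the rates are from the slide, the conversion is ours), the quoted SLAC-to-NERSC rates correspond to the following daily volumes at full utilization:

# Convert the WAN-rate milestones into the volume they could move per day
# at 100% utilization (real transfers will achieve less).

SECONDS_PER_DAY = 86_400
milestones_gbps = {"today": 20, "2019": 70, "2024": 1000}   # from the slide

for label, gbps in milestones_gbps.items():
    tb_per_day = gbps * 1e9 / 8 * SECONDS_PER_DAY / 1e12    # terabytes/day
    print(f"{label:>5}: {gbps:5d} Gbps ≈ {tb_per_day:6.0f} TB/day")
# today:    20 Gbps ≈    216 TB/day
#  2019:    70 Gbps ≈    756 TB/day
#  2024:  1000 Gbps ≈  10800 TB/day (about 10.8 PB/day)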

Offsite Data Transfer: Needs and Plans
LCLS-II needs are compatible with SLAC and NERSC plans
[Chart: projected LCLS-I and LCLS-II needs plotted against SLAC plans, NERSC plans, and the ESnet6 upgrade]

Data Transfer testing
• Zettar/zx:
  • Provides an HPC data transfer solution (i.e. SW + transfer-system reference design):
    - state-of-the-art, efficient, scalable high-speed data transfer
  • Over carefully selected demonstration hardware
• Today, using existing equipment:
  • DTNs at NERSC & SLAC
  • With Lustre file systems at both ends
  • Today a 100 Gbps link
  • Using the widely used bbcp and xrootd tools
  • Currently ~55 Gbps, exploring limitations
  • Upgrading the border to 2×100 Gbps is in progress

Data transfer performance. Test bed: two clusters at SLAC with a 5000-mile link
• Space efficient: 6U per cluster
• Energy efficient
• < 80 Gbps shared link

NG demonstration
[Cluster diagram: storage servers (> 25 TBytes in 8 SSDs) connected to Data Transfer Nodes (DTNs) over IP over InfiniBand (IPoIB, 4× 56 Gbps); the DTNs connect via 4× 2× 25 GbE into a 2×100G LAG, with n(2)×100GbE links toward the other cluster or the high-speed Internet]

Memory to memory between clusters with 2×100 Gbps
• No storage involved, just DTN-to-DTN mem-to-mem
• Extended locally to 200 Gbps
• Here repeated 3 times
• Note the uniformity of the 8× 25 Gbps interfaces
• Can simply use TCP, no need for exotic proprietary protocols
• The network is not a problem
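For illustration only (the deck does not say which tool produced these numbers; in practice a standard benchmark such as iperf3 would typically be used), a minimal Python sketch of the idea behind such a test: several parallel TCP streams sending dummy data from memory, DTN to DTN, with no storage in the path. The host, port and stream count below are placeholders:

import socket
import threading
import time

HOST, PORT = "dtn-remote.example.org", 5201   # hypothetical receiving DTN
STREAMS = 8                                   # e.g. one stream per 25 GbE interface
DURATION = 10                                 # seconds
BUF = b"\0" * (4 * 1024 * 1024)               # 4 MiB of dummy data per send

sent = [0] * STREAMS

def stream(i: int) -> None:
    """Push dummy data over one TCP connection for DURATION seconds."""
    with socket.create_connection((HOST, PORT)) as s:
        deadline = time.time() + DURATION
        while time.time() < deadline:
            s.sendall(BUF)
            sent[i] += len(BUF)

threads = [threading.Thread(target=stream, args=(i,)) for i in range(STREAMS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print(f"{STREAMS} streams: {sum(sent) * 8 / elapsed / 1e9:.1f} Gbps aggregate")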

Storage
• On the other hand, file-to-file transfers are at the mercy of the back-end storage performance.
• Even with generous compute power and network bandwidth available, the best designed and implemented data transfer software cannot create any magic with a slow storage back end.
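A back-of-envelope way to see this (a toy model with illustrative numbers, not measurements from the deck): in a streaming file-to-file transfer the end-to-end rate is roughly capped by the slowest stage, so extra network headroom cannot make up for a slow storage back end:

# Toy bottleneck model: a streaming file-to-file transfer runs at roughly
# the rate of its slowest stage. The numbers below are illustrative only.

stages_gbps = {
    "source read (parallel FS)": 160,
    "WAN (2 x 100 GbE)": 200,
    "destination write (parallel FS)": 80,   # writes are much slower than reads
}

bottleneck = min(stages_gbps, key=stages_gbps.get)
print(f"end-to-end ≈ {stages_gbps[bottleneck]} Gbps, limited by: {bottleneck}")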

XFS READ performance of 8 SSDs in a file server, measured by the Unix fio utility
[Plots vs. time: read throughput (up to ~20 GBps), SSD busy (up to ~800%), and queue size (thousands of entries)]
Data size = 5 × 200 GiB files, similar to typical LCLS large file sizes
Note: reading keeps the SSDs busy; uniformity across drives and plenty of objects in the queue yield close to the raw throughput available.
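The exact fio job used is not given in the deck; as a hedged illustration, a job along the following lines (driven from Python here, with a hypothetical mount point) measures sequential read throughput for a comparable 5 × 200 GiB workload using standard fio options:

import json
import subprocess

# Hypothetical XFS mount point backed by the SSDs under test.
cmd = [
    "fio",
    "--name=seqread",
    "--directory=/xfs/ssd_pool",
    "--rw=read",               # change to --rw=write for the write test
    "--bs=1M",
    "--size=200g",             # one 200 GiB file per job
    "--numjobs=5",             # 5 files, as in the slide
    "--ioengine=libaio",
    "--direct=1",              # bypass the page cache
    "--iodepth=32",            # keep plenty of I/O queued
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
report = json.loads(result.stdout)
bw_kib_s = report["jobs"][0]["read"]["bw"]   # fio reports bandwidth in KiB/s
print(f"aggregate read ≈ {bw_kib_s / 1024 / 1024:.1f} GiB/s")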

XFS + parallel file system WRITE performance for 16 SSDs in 2 file servers
[Plots vs. time: SSD busy, SSD write throughput (~10 GBps), and queue size of pending writes (~50)]
Write is much slower than read.
The file system layers can’t keep the queue full (a factor of ~1000 fewer items queued than for reads).

Conclusion
• The network is fine: it can drive 200 Gbps, with no need for proprietary protocols
• Insufficient IOPS for write: < 50% of raw capability
  - Today limited to 80-90 Gbps file transfer
• Work with local vendors
  • State-of-the-art components fail; need fast replacements
  • Worst case: waited 2 months for parts
• Use the fastest SSDs for write
  • We used Intel DC P3700 NVMe 1.6 TB drives
  • The biggest is also the fastest, but also the most expensive
  • 1.6 TB at $1677 vs 2.0 TB at $2655: 20% improvement for a 60% cost increase (checked in the sketch below)
• The parallel file system is the bottleneck
  • Needs enhancing for modern hardware & OS
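A quick check of the price side of that comparison (prices from the slide; the "20% improvement" is the slide's performance figure and is not computed here):

# Price comparison of the two drive options quoted on the slide.
small = {"capacity_tb": 1.6, "price_usd": 1677}
large = {"capacity_tb": 2.0, "price_usd": 2655}

cost_increase = large["price_usd"] / small["price_usd"] - 1
print(f"price increase: {cost_increase:.0%}")              # ≈ 58%, i.e. the quoted ~60%
print(f"$/TB: {small['price_usd'] / small['capacity_tb']:.0f} vs "
      f"{large['price_usd'] / large['capacity_tb']:.0f}")  # ≈ 1048 vs 1328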

Demonstration: a PetaByte in < 1.5 days at ~70 Gbps on a < 80 Gbps shared link
[Chart: during the demonstration the transfer made up about 1/3 of ESnet's traffic]
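Checking the headline number (a sketch; it assumes a perfectly sustained ~70 Gbps):

# Time to move one petabyte at a sustained 70 Gbps.
rate_bps = 70e9
seconds_per_day = 86_400

for label, nbytes in (("1 PB (10^15 bytes)", 1e15), ("1 PiB (2^50 bytes)", 2**50)):
    days = nbytes * 8 / rate_bps / seconds_per_day
    print(f"{label}: ≈ {days:.2f} days")
# 1 PB (10^15 bytes): ≈ 1.32 days
# 1 PiB (2^50 bytes): ≈ 1.49 days   -> both consistent with "< 1.5 days"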

Impact on all ESnet traffic
[Chart: total ESnet traffic over roughly a day (0-200G), broken down into OSCARS, LHCONE, and Other]
When running, the data transfer contributes ~1/3 of total ESnet traffic.

Summary
What is special:
• Scalable: add more NICs, more DTNs, more storage servers, links as needed/available…
• Power and space efficient; low cost
• HA: tolerant to loss of components
• Storage-tiering friendly
• Reference designs
• Easy to use, production-ready software
Proposed Future PetaByte Club
A member of the Petabyte Club MUST be an organization that is capable of using a shared production point-to-point WAN link to attain a production data transfer rate >= 150 PiB-mile/hour
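To illustrate the proposed metric (a sketch; the ~70 Gbps rate and the 5000-mile path are the figures quoted earlier in the deck, and the 150 PiB-mile/hour threshold is the slide's definition):

# Distance-weighted transfer rate in PiB-mile/hour.
PIB = 2**50  # bytes

def pib_mile_per_hour(rate_gbps: float, distance_miles: float) -> float:
    bytes_per_hour = rate_gbps * 1e9 / 8 * 3600
    return bytes_per_hour / PIB * distance_miles

print(f"{pib_mile_per_hour(70, 5000):.0f} PiB-mile/hour")   # ≈ 140, close to the bar
print(f"{pib_mile_per_hour(80, 5000):.0f} PiB-mile/hour")   # ≈ 160, above the 150 threshold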

Future
Upgrade the SLAC border to 200 Gbps (2×100 Gbps) to ESnet – Spring 2018
Then on to SLAC-to-NERSC:
• Very special environment, hard to modify
• The focus has been different from what we need:
  • vector computing and communication-intensive problems at the expense of faster CPUs
    - but LCLS is embarrassingly parallel
• Not easy to use for cluster computing such as xrootd or zx
  • Unless Cray supports porting the application – unlikely
• Using the Burst Buffer (BB) is not easy from an application
  • The BB will probably limit data transfer performance
  • Normally used from batch, so a challenge for near-realtime
• Can bypass the NERSC DTNs and go directly to the Cori nodes, which have direct access to the BB
  • However, the DTNs provide common ways to customize and are more agile
  • Harder to do for Cori nodes
• Complex supercomputer; components tend to be behind the very latest technology curve
Looking to exceed a PB/day by SC2018

Questions