SUSTAINABLE HIGH-SPEED DATA TRANSFER TECHNIQUES IN SHARED NETWORK ENVIRONMENTS

hasibjamil · Jul 22, 2024

About This Presentation

Data throughput optimization over shared networks, with a focus on energy efficiency.


Slide Content

Slide 1
SUSTAINABLE HIGH-SPEED DATA TRANSFER TECHNIQUES IN SHARED NETWORK ENVIRONMENTS
Oral Qualifying Exam (OQE)
Candidate: Jamil Hasibul (50426549)
Research advisor: Dr. Tevfik Kosar
Committee members: Dr. Jinjun Xiong and Dr. Lukasz Ziarek

Slide 2
Table of contents
• Identification of the research problem
• Overview of the research goal
• Proposed approach
• Preliminary results & list of publications
• Review of existing literature

Slides 3–4
Identification of research problem
Massive Distributed Data
• Everyday services, encompassing both scientific and commercial applications, are increasingly data-intensive.
• In 2024, global Internet traffic volume is projected to exceed 256 EB per month.
• Data sources are distributed, and data needs to be moved to be processed.
• Many DOE ESnet facilities are already interconnected at 100 Gb/s, and future deployments will likely support 400 Gb/s followed by 1 Tb/s.
• Legacy data transfer solutions often struggle to achieve high performance, generate excessive overhead, or result in unfair resource allocation.
(Illustration: the AWS Snowmobile data transfer service uses a semi-truck to move 100 PB per Snowmobile.)
We need a high-performance, fair, and robust data transfer solution -- Objective 1

Slides 5–6
Identification of research problem
Energy Footprint for Data Movement
• The Internet accounts for more than 10% of overall energy consumption in many countries and costs the global economy $50+ billion per year.
• Telecommunication networks' energy consumption exceeds 350 terawatt-hours.
• An extensive portion of the existing research on energy-efficient networking focuses on reducing energy consumption in the core networking infrastructure (e.g., switches, hubs, middleware, and routers).
Figure taken from Kosar, Tevfik (2020): GreenDataFlow: Minimizing Energy Footprint of Global Data Movement - Talk. figshare. Presentation. https://doi.org/10.6084/m9.figshare.11803893.v3
We need energy-efficient data transfer solutions -- Objective 2

Slides 7–10
Network Dynamics in Shared Environments
(Figure: achieved throughput for different pipelining values, with concurrency fixed at 9 and parallelism at 32, under three different levels of background traffic.)

Slide 11
Research Questions
1. How can we maximize data transfer throughput within an energy budget?
2. How can we minimize energy consumption of the data transfer application within a throughput performance boundary?
3. How can we dynamically adjust the transfer parameters under changing network conditions to maintain the desired, stable performance?
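These questions can be read as constrained optimization problems over the tunable parameter vector \theta (pipelining, parallelism, concurrency, number of cores, CPU frequency), with T(\theta) the achieved throughput and E(\theta) the end-system energy; the notation, budget, and bound names here are ours, not the slides':

\max_{\theta}\; T(\theta) \quad \text{s.t.} \quad E(\theta) \le E_{\text{budget}} \qquad (\text{RQ 1})
\min_{\theta}\; E(\theta) \quad \text{s.t.} \quad T(\theta) \ge T_{\text{min}} \qquad (\text{RQ 2})

RQ 3 then asks how to keep the chosen \theta near-optimal as network conditions change during the transfer.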

Slides 12–16
Research Goal
• A complete application-level solution built on the existing kernel and network stack.
• Tuning cross-layer (application and kernel) parameters:
o Pipelining (application-level)
o Parallelism (application-level)
o Concurrency (application-level)
o Number of CPU cores (kernel-level)
o CPU frequency (kernel-level)
• Effective utilization of the existing network infrastructure to maximize throughput.
• Minimizing energy consumption in end systems during data movement through application- and kernel-level tuning.
In short: cross-layer parameter tuning for energy-efficient throughput optimization (a minimal tuning sketch follows below).
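The cross-layer knobs listed above could be exercised from an application-level tool roughly as follows. This is a hedged sketch under assumptions: set_cpu_frequency and run_transfer_chunk are hypothetical helpers, the cpufreq sysfs write requires root privileges, and the actual tuning logic of the proposed solution is not reproduced here.

import concurrent.futures
from pathlib import Path

def set_cpu_frequency(core: int, khz: int) -> None:
    """Illustrative kernel-level knob: cap one core's frequency via cpufreq sysfs."""
    p = Path(f"/sys/devices/system/cpu/cpu{core}/cpufreq/scaling_max_freq")
    p.write_text(str(khz))  # requires root; real deployments may use a governor instead

def run_transfer_chunk(files, parallelism: int, pipelining: int) -> None:
    """Placeholder for one transfer channel using `parallelism` TCP streams
    and `pipelining` outstanding requests per stream."""
    ...

def start_transfer(files, concurrency: int, parallelism: int,
                   pipelining: int, cores: int, freq_khz: int) -> None:
    # Kernel-level parameters: number of active cores and their frequency.
    for core in range(cores):
        set_cpu_frequency(core, freq_khz)
    # Application-level parameters: `concurrency` parallel channels, each with
    # `parallelism` streams and `pipelining` pipelined requests.
    chunks = [files[i::concurrency] for i in range(concurrency)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for chunk in chunks:
            pool.submit(run_transfer_chunk, chunk, parallelism, pipelining)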

Slides 18–31
Approach 1 (Offline Analysis)
• Introduces an ensemble method to reduce "uncertainty" during offline analysis.
• Combines offline knowledge discovery with online adaptive tuning.
• Outperforms state-of-the-art solutions (117% higher throughput and 19% less energy than its closest competitor).
Workflow (built up across these slides as an animated diagram):
• The Historical Log Analysis Server periodically runs the offline analysis (Algorithm 2: optimal parameter discovery algorithm): it builds a decision search tree to cluster and group logs, traverses the search tree, constructs energy and throughput surfaces from the matched node's logs, finds the optimal parameters based on the SLA, and stores the results as (key, value) pairs.
• The user submits an SLA to the Dynamic Tuning Module, which queries the analysis server for parameters and starts the transfer from source to destination.
• During the transfer, feedback from the source drives updated queries and updated parameters, and the transfer continues with the updated parameters until it finishes.
• New transfer logs are appended to the historical logs for the next round of periodic analysis.
A small sketch of the resulting (key, value) parameter store is given below.
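The offline analysis stores its discovered optima as (key, value) pairs that the dynamic tuning module queries at transfer time. A minimal sketch of such a store, assuming a key built from coarse transfer-context features plus the SLA; the key fields, SLA labels, and example values are illustrative assumptions, not the actual schema.

from dataclasses import dataclass

@dataclass(frozen=True)
class TransferKey:
    """Illustrative key: coarse features describing a transfer context."""
    file_size_bucket: str   # e.g. "small", "medium", "large"
    rtt_bucket_ms: int      # e.g. 10, 50, 100
    bandwidth_gbps: int
    sla: str                # e.g. "max_throughput" or "min_energy"

@dataclass
class OptimalParams:
    pipelining: int
    parallelism: int
    concurrency: int
    cores: int
    cpu_freq_khz: int

# Populated by the periodic offline analysis; the entry below is made up.
param_store: dict[TransferKey, OptimalParams] = {
    TransferKey("large", 50, 10, "max_throughput"): OptimalParams(8, 32, 9, 8, 2_400_000),
}

def query_params(key: TransferKey) -> OptimalParams | None:
    """Dynamic tuning module: return stored optima for this context, if any."""
    return param_store.get(key)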

Slides 32–36
Results: High-BDP Network (Chameleon)
• Throughput (Mbps) results: 2x–2.5x improvement.
• Energy consumption (Joules) results: 36% less energy.

Slide 37
Approach 2 (Online Optimization)
A DRL agent (Proximal Policy Optimization) runs at the source and controls the transfer from source to destination.
• State (s_t): only attributes observable on the source system at time t.
• Action (a_t): add streams, remove streams, or do nothing to the total number of TCP streams.
• Reward (r_t): calculated from selected state variables using a utility function U:
  r_t = x if U_t − U_{t−1} > ε; y if U_t − U_{t−1} < −ε; 0 otherwise.
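A minimal sketch of that piecewise reward, assuming U_t is a scalar utility computed once per monitoring interval; the constants x, y, and epsilon below are placeholders, not the values used in the work.

def reward(u_now: float, u_prev: float,
           eps: float = 0.01, x: float = 1.0, y: float = -1.0) -> float:
    """Piecewise reward on the change in utility between monitoring intervals."""
    delta = u_now - u_prev
    if delta > eps:
        return x      # utility improved: positive reward
    if delta < -eps:
        return y      # utility dropped: negative reward
    return 0.0        # no meaningful change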

Slide 38
Under the hood
State (computed once per monitoring interval (MI), over a history of the last k MIs: t−k, …, t−2, t−1, t):
• RTT ratio: the ratio of the current MI's mean RTT to the minimum observed mean RTT of any MI in the connection's history
• RTT gradient
• Packet loss rate
• Average throughput
• History length
Action: at every monitoring interval, the number of TCP streams is adjusted by one of {+5, +1, 0, −1, −5} relative to the current number of streams.
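A hedged sketch of the per-MI state vector and the stream-count adjustment. Only the feature set and the {+5, +1, 0, −1, −5} action set follow the slide; the gradient definition, clamping bounds, and the assumption that raw per-interval statistics arrive from some monitoring hook are illustrative.

ACTIONS = [+5, +1, 0, -1, -5]   # change applied to the current number of TCP streams

class StateBuilder:
    def __init__(self, history_len: int):
        self.history_len = history_len
        self.min_mean_rtt = float("inf")
        self.prev_mean_rtt = None

    def build(self, mean_rtt: float, loss_rate: float, avg_throughput: float) -> list[float]:
        """One state vector per monitoring interval."""
        self.min_mean_rtt = min(self.min_mean_rtt, mean_rtt)
        rtt_ratio = mean_rtt / self.min_mean_rtt                    # current vs. best MI so far
        rtt_gradient = 0.0 if self.prev_mean_rtt is None else mean_rtt - self.prev_mean_rtt
        self.prev_mean_rtt = mean_rtt
        return [rtt_ratio, rtt_gradient, loss_rate, avg_throughput, float(self.history_len)]

def apply_action(num_streams: int, action_index: int, max_streams: int = 64) -> int:
    """Adjust the total number of TCP streams, clamped to a sane range."""
    return max(1, min(max_streams, num_streams + ACTIONS[action_index]))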

Slide 39
Results (Finding the Optimal Point)
We compare the performance of our solution with two other online optimization algorithms: Gradient Descent (GD) and Bayesian Optimizer (BO).

Slide 40
Evaluation Results (Fairness)
• Dynamic throughput while multiple transfers share the same network resource.
• The instantaneous Jain fairness index is depicted for all three algorithms when all three transfers for each algorithm occur simultaneously on the shared network.
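For reference, the Jain fairness index for n concurrent transfers with throughputs x_1, ..., x_n is the standard (sum x_i)^2 / (n · sum x_i^2), with 1.0 meaning a perfectly fair allocation. A direct computation (only the helper name is ours):

def jain_fairness_index(throughputs: list[float]) -> float:
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2)."""
    n = len(throughputs)
    total = sum(throughputs)
    sq_sum = sum(x * x for x in throughputs)
    return (total * total) / (n * sq_sum) if sq_sum > 0 else 0.0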

Slide 41
Evaluation Results (Energy)
Reward-function variants compared (sketched below):
• RL_Fair: measures the balance of resource allocation across nodes; ensures equitable bandwidth distribution to prevent network congestion.
• RL_Energy: calculates the energy consumption of data transmission; aims to minimize the power used while maintaining network performance.
• RL_Throughput: assesses the data rate successfully delivered over the network; focuses on maximizing the speed and efficiency of data transfer.
• RL_EnergyEfficiency: balances throughput with energy usage to optimize for both performance and sustainability; encourages solutions that provide high data rates at lower energy costs.
Testbeds: TACC to UC (Chameleon) and TACC to Utah (interCloud).
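A hedged sketch of how these four reward variants might be expressed as per-interval utility functions; the functional forms and normalizations are illustrative assumptions, not the reward definitions used in the experiments (jain_fairness_index is the helper sketched under Slide 40).

def rl_throughput(throughput: float, link_capacity: float) -> float:
    return throughput / link_capacity                      # reward raw delivered rate

def rl_energy(energy_joules: float, max_energy: float) -> float:
    return 1.0 - energy_joules / max_energy                # reward lower energy use

def rl_fair(per_flow_throughputs: list[float]) -> float:
    return jain_fairness_index(per_flow_throughputs)       # reward balanced allocation

def rl_energy_efficiency(throughput: float, energy_joules: float) -> float:
    return throughput / max(energy_joules, 1e-9)           # data moved per Joule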

Slide 42
Future plan
• Combining offline and online optimization
• Multi-parameter optimization
• Parallel training in a multi-objective framework
• Agent training for generalization across different networks

Slide 43
List of publications
[1] 2024 IEEE TPDS: "Learning based Techniques for Sustainable High-Speed Data Transfer Over Shared Network," H. Jamil, E. Rodrigues, J. Glodverg, T. Kosar (in progress) -- Multi-parameter optimization with different SLA-based rewards, focusing on generalizing across different networks.
[2] 2023 IEEE GLOBECOM: "Learning to Maximize Network Bandwidth Utilization with Deep Reinforcement Learning," H. Jamil, E. Rodrigues, J. Glodverg, T. Kosar (published) -- Online optimization of a single parameter, focusing mainly on fairness along with throughput.
[3] SC 2023 - INDIS: "Throughput Optimization with a NUMA-Aware Runtime System for Efficient Scientific Data Streaming," H. Jamil, J. Chung, T. Bicer, T. Kosar, R. Kettimuthu (published) -- Data streaming with a focus on maximal utilization of modern processor architectures to ensure higher throughput.
[4] 2022 IEEE ICCCN: "Energy-Efficient Data Transfer Optimization via Decision-Tree Based Uncertainty Reduction," H. Jamil, L. Rodolph, J. Glodverg, T. Kosar (published) -- Offline cross-layer optimization that considers both end-system energy and throughput based on historical data logs.

Slide 44
Study and analysis of existing literature
[1] Singh, R., Agarwal, S., Calder, M. and Bahl, P., 2021. Cost-effective cloud edge traffic engineering with CASCARA. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21).
[2] Tseng, S.H., Agarwal, S., Agarwal, R., Ballani, H. and Tang, A., 2021. CodedBulk: Inter-Datacenter Bulk Transfers using Network Coding. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21).
[3] Jain, P., Kumar, S., Wooders, S., Patil, S.G., Gonzalez, J.E. and Stoica, I., 2023. Skyplane: Optimizing transfer cost and throughput using Cloud-Aware overlays. In NSDI 23.
[4] El-Zahr, S., Gunning, P. and Zilberman, N., 2023. Exploring the Benefits of Carbon-Aware Routing. Proceedings of the ACM on Networking (CoNEXT 2023).
[5] Qureshi, Z., Mailthody, V.S., Gelado, I., Min, S., Masood, A., Park, J., Xiong, J., Newburn, C.J., Vainbrand, D., Chung, I.H. and Garland, M., 2023. GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. ASPLOS 2023.
[6] Pope, R., Douglas, S., Chowdhery, A., Devlin, J., Bradbury, J., Heek, J., Xiao, K., Agrawal, S. and Dean, J., 2023. Efficiently scaling transformer inference. Proceedings of Machine Learning and Systems, 5.

Slide 45
Cost-effective Cloud Edge Traffic Engineering with CASCARA
Introduction & Scope
Background: CASCARA addresses the challenge of optimizing inter-domain bandwidth costs while ensuring minimal impact on client latency.
Importance: The authors claim to save 11–50% in bandwidth costs per cloud PoP without exceeding a 3 ms increase in client latency.
Methodology and Insights
Approach: Utilizes latency-equivalent peer links at the cloud edge and a novel traffic engineering framework containing an offline-analysis-inspired heuristic for online optimization.
Key Insights: Significant cost-savings potential across traffic patterns and peering rates, showing robustness to dynamic traffic demands.
Results and Analysis
Results: The authors claim cost savings within 10% of the optimal, with incremental deployment potential.
Analysis: The authors claim generalizability to larger cloud providers but do not show how their heuristic generalizes across different cloud providers and their peer ISPs.
Limitations: The evaluation focuses on North American peer ISPs; how CASCARA would perform with global peers and dynamic traffic, especially with a heuristic derived from the data of only one cloud provider, is not investigated.

Slide 46
CodedBulk: Inter-Datacenter Bulk Transfers using Network Coding
Introduction & Scope
Background: CodedBulk uses network coding for optimal throughput in bulk data transfers, addressing the asymmetric-link problem and enabling high utilization of available link bandwidth without an infrastructure overhaul.
Importance: The authors claim a 1.2–2.5× throughput enhancement over state-of-the-art non-coded methods in real-world inter-datacenter networks.
Methodology
• Network coding: transforms data packets at intermediate nodes, maximizing bandwidth utilization and throughput.
• Hop-by-hop flow control: a custom flow-control mechanism ensures efficient and deadlock-free operation under dynamic network conditions, addressing the asymmetric-link problem.
Analysis
• Centralized control requirement: assumes complete visibility and control over the network topology, which is challenging in networks managed by multiple entities.
• Intermediate computation: requires routers to perform data computations, diverging from their traditional forwarding-focused design.
• End-to-end principle violation: moves away from the simplicity of the end-to-end principle, complicating network management and security.
• Modest performance gains: offers a 1.2–2.5× throughput improvement, modest against the backdrop of the added infrastructure complexity.
• Evaluation throughput vs. high-speed networks: evaluated at a maximum of 800 Mbps, not addressing scalability in networks operating at 100 Gbps or 400 Gbps.
• Unexplained throughput discrepancy: theoretical support for 31 Gbps on a single machine, but the evaluation is limited to 800 Mbps without explanation.

Slide 47
Skyplane: Cloud-Aware Overlays for Optimized Data Transfer
Introduction & Scope
• Background: addresses the need for fast, cost-effective bulk data transfers across multi-region and multi-cloud environments.
• Importance: optimizes data transfers between cloud object stores by utilizing cloud-aware network overlays to balance transfer cost against throughput, addressing the challenges of traditional network overlays.
Methodology
• Skyplane employs mixed-integer linear programming to identify optimal paths and resource allocations for data transfers, considering user constraints.
• It introduces cloud-aware overlays with parallel TCP connections and multiple VMs that navigate between direct and indirect paths to optimize for cost and speed.
Analysis and Takeaways
• Skyplane significantly outperforms existing cloud transfer tools, achieving up to 4.6× faster transfers within the same cloud and 5.0× across different clouds.
• Offers a flexible approach that allows users to balance cost and performance according to their specific needs.
• Demonstrates the efficacy of cloud-aware overlays in leveraging cloud elasticity and pricing differences to optimize data transfer rates and costs.
• The system does not consider how varying object sizes affect throughput, potentially leading to non-optimal data transfer performance.
• The lack of details on the optimizer's runtime complicates its application in environments with fluctuating egress prices and bandwidth.

Slide 48
Exploring the Benefits of Carbon-Aware Routing
Introduction & Scope
• Investigates carbon-aware networks to reduce scope 2 carbon emissions in wired networks.
• Proposes a new carbon-aware TE algorithm (CATE), utilizing real network topologies and carbon intensity data.
• Aims for environmental sustainability in ICT by minimizing carbon emissions without altering existing router hardware.
Methodology
• Integrates energy- and carbon-related metrics into link costs for routing decisions.
• The CATE algorithm optimizes network carbon emissions by considering traffic patterns, carbon intensity, and dynamic power consumption.
• Utilizes simulations with realistic WAN topologies and traffic patterns for evaluation.
Analysis and Takeaways
• No "silver bullet" for significant carbon reduction; however, certain strategies promise minor improvements without hardware changes.
• Demonstrates potential carbon savings by altering traffic flows, especially during low carbon-intensity periods.
• Limited by static router power models and assumptions; does not significantly address diverse and dynamic real-world network conditions. The impact on overall network performance, including latency and packet loss, under the varied metrics was not comprehensively assessed.

Slide 49
GPU-Initiated On-Demand High-Throughput Storage Access with BaM
Introduction & Scope
• Addresses the bottleneck of CPU-initiated storage access in GPU applications with data-dependent access patterns.
• Introduces BaM (Big accelerator Memory), optimizing for applications like graph analytics, where datasets range from GBs to TBs, far exceeding GPU memory capacities.
Methodology
• BaM features a fine-grained software cache, enabling GPU threads to make on-demand storage requests, coalescing redundant accesses, and maximizing storage and interconnect utilization.
• Employs a user-level GPU library with concurrent submission/completion queues, reducing CPU-GPU synchronization overheads and I/O traffic amplification.
Analysis and Takeaways
• Demonstrates end-to-end speedups of up to 1.49× for graph analytics benchmarks, reducing hardware costs by up to 21.7× compared to host-memory solutions.
• While BaM addresses GPU storage access latency and increases throughput, the solution's real-world scalability, handling of varied SSD technologies, and robustness to system crashes warrant further investigation.

Slide 50
Efficiently Scaling Transformer Inference
Introduction & Scope
• Examines generative inference for large Transformer models with very high parameter counts, focusing on deep models with long sequences and tight latency targets.
• Key challenges include managing large memory footprints and balancing parallelization with latency targets for interactive applications like chatbots.
Methodology
• Develops an abstract partitioning framework to maximize model-parallel scaling, tailoring strategies to specific model sizes, sequence lengths, and hardware setups.
• Employs a suite of low-level optimizations, including weight quantization and efficient collective operations, to maximize throughput and minimize latency.
Analysis and Takeaways
• Achieves significant efficiency improvements, reducing per-token generation latency to 29 ms and achieving 76% Model FLOPS Utilization (MFU) for 500B+ parameter models.
• Introduces multiquery attention, allowing up to 32× larger context lengths, and shows potential for wide applicability in diverse computing environments.
• Addresses the complexity of model partitioning strategies, providing an engineering framework to navigate the trade-offs between latency, throughput, and hardware scalability.

Slide 51
Thank you and Questions

Slide 52
Appendix 1

Slides 53–59
Clustering Logs with Decision Search Tree
Decision Search Tree Construction (built up across these slides as an animated diagram: root node N0 is split into children N1 and N2, which are split further into N3, N4, and N5, and so on).
What attribute to cut, and into how many pieces?
• Use the User Diversity Index and Standard Dimension to rank attributes, and choose the highest-ranked attribute to cut the current node.
• The construction yields a band of trees (a small sketch of one split step follows below).
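A hedged sketch of one level of that decision search tree construction. The slides do not define the User Diversity Index or Standard Dimension, so rank_attribute below is a placeholder; only the "pick the highest-ranked attribute and cut the current node into groups" structure follows the slides, and the log fields are illustrative.

from collections import defaultdict

def rank_attribute(logs: list[dict], attr: str) -> float:
    """Placeholder ranking; the real work ranks attributes by a User Diversity
    Index and Standard Dimension (definitions not given in these slides)."""
    return len({log[attr] for log in logs})   # illustrative: prefer more distinct values

def split_node(logs: list[dict], attributes: list[str]) -> dict:
    """Cut the current node on the highest-ranked attribute, one child per value."""
    best_attr = max(attributes, key=lambda a: rank_attribute(logs, a))
    children = defaultdict(list)
    for log in logs:
        children[log[best_attr]].append(log)
    return {"attribute": best_attr, "children": dict(children)}

# Example: split one node of historical transfer logs (fields are illustrative).
logs = [
    {"rtt_ms": 10, "file_size": "large", "bandwidth_gbps": 10},
    {"rtt_ms": 50, "file_size": "small", "bandwidth_gbps": 10},
]
node = split_node(logs, ["rtt_ms", "file_size", "bandwidth_gbps"])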

Slide 60
CASCARA

Slide 61
CodedBulk

Slide 62
Skyplane

Slide 63
CATE

Slide 64
BaM

Slide 65
Transformer