Mathematical Approaches and Algorithms for Data Stream Analysis by Arthur Tabatchnic
DevClub_lv
Sep 03, 2024
About This Presentation
Data streams can be challenging to analyze on the fly. Hardware constraints, large volumes of data, and rapid changes in patterns can all lead to difficult problems. In this talk, I will provide an overview of the different types of analyses that can be performed on data streams and the theory behind some popular algorithms. We will look at three of the most useful groups of tasks: classification, change diagnosis, and distributed data mining.
Arthur is a full-stack developer at Visma Labs, working on a large payroll system.
Size: 1.47 MB
Language: en
Added: Sep 03, 2024
Slides: 31 pages
Slide Content
Mathematical Approaches and Algorithms for Data Stream Analysis
Topics
1. Classification
2. Change diagnosis
3. Distributed data mining
1. Classification
Problems: High speed; Unbounded requirements; Drift; Accuracy vs. efficiency; Distributed processing
Very Fast Decision Trees
Pros: Very fast; Memory usage*
Cons: Sensitive to drift
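Very Fast Decision Trees get their speed from the Hoeffding bound: a leaf is split as soon as enough examples have been seen to be confident that the best attribute really beats the runner-up. A minimal sketch of that test follows; the function names and the example numbers are illustrative assumptions, not taken from the slides.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability at least 1 - delta, the true mean
    lies within epsilon of the mean observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical leaf: the same gain gap becomes decisive only once
# enough examples have arrived.
for n in (100, 1000):
    print(n, should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=n))
```

Because the bound shrinks only as more examples arrive and assumes they come from one fixed distribution, the same mechanism explains why the basic algorithm copes poorly with drift.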
CluStream/On Demand Classification
Pros: All of them?; Distribution-friendly
Cons: None of them?
CluStream/On Demand Classification
Continued work: DenStream; ClusCTA
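One plausible reason CluStream suits distributed settings is that its micro-clusters are stored as additive summary statistics, so summaries built at different sites or times can be merged by simple addition. The sketch below assumes the usual count, linear-sum and squared-sum features; the class and method names are hypothetical.

```python
import math


class MicroCluster:
    """Additive summary of a group of points: count, per-dimension linear
    and squared sums, plus the same sums for the timestamps."""

    def __init__(self, dim: int):
        self.n = 0
        self.cf1x = [0.0] * dim  # sum of values per dimension
        self.cf2x = [0.0] * dim  # sum of squared values per dimension
        self.cf1t = 0.0          # sum of timestamps
        self.cf2t = 0.0          # sum of squared timestamps

    def add(self, x, t: float) -> None:
        """Absorb one point x that arrived at time t."""
        self.n += 1
        for i, v in enumerate(x):
            self.cf1x[i] += v
            self.cf2x[i] += v * v
        self.cf1t += t
        self.cf2t += t * t

    def merge(self, other: "MicroCluster") -> None:
        """Merging two summaries is element-wise addition."""
        self.n += other.n
        self.cf1x = [a + b for a, b in zip(self.cf1x, other.cf1x)]
        self.cf2x = [a + b for a, b in zip(self.cf2x, other.cf2x)]
        self.cf1t += other.cf1t
        self.cf2t += other.cf2t

    def centroid(self):
        return [s / self.n for s in self.cf1x]

    def rms_deviation(self) -> float:
        """Root of the summed per-dimension variances around the centroid."""
        var = sum(s2 / self.n - (s1 / self.n) ** 2
                  for s1, s2 in zip(self.cf1x, self.cf2x))
        return math.sqrt(max(var, 0.0))
```

No raw points are kept, which is also what keeps memory bounded on an unbounded stream.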
ANNCAD (Adaptive Nearest Neighbor Classification Algorithm for Data Streams)
Pros: Good with slow drift
Cons: Bad for real-time; Hungry
Ensemble Based Classification (Wang)
Pros: Good with drift
Cons: Bad for real-time; Hungry
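The idea behind ensemble-based stream classification can be sketched as follows: classifiers trained on past chunks are re-weighted by how well they do on the most recent chunk, so models that no longer fit the current concept lose influence. This is a simplified illustration that uses plain accuracy as the weight, which is an assumption made here rather than the exact weighting scheme of the original method; all names are hypothetical.

```python
from collections import defaultdict


def chunk_accuracy(model, chunk) -> float:
    """Fraction of (x, y) pairs in the latest chunk the model gets right."""
    correct = sum(1 for x, y in chunk if model(x) == y)
    return correct / len(chunk)


def weighted_vote(models, weights, x):
    """Each model votes for its predicted label with its current weight."""
    scores = defaultdict(float)
    for model, w in zip(models, weights):
        scores[model(x)] += w
    return max(scores, key=scores.get)


# Hypothetical example: the concept drifted from "x > 0" to "x > 5".
old_model = lambda x: int(x > 0)
new_model = lambda x: int(x > 5)
latest_chunk = [(1, 0), (3, 0), (6, 1), (8, 1)]

models = [old_model, new_model]
weights = [chunk_accuracy(m, latest_chunk) for m in models]
print(weights, weighted_vote(models, weights, 3))
```

Re-scoring every model on every chunk is also where the real-time and resource costs come from.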
2. Change diagnosis
Velocity density
Goals: Find significant changes (dissolution, coagulation, shift); Keep the model up-to-date
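A rough way to picture velocity density is as the rate at which the estimated data density at a location changes over time: positive values suggest coagulation (data gathering there) and negative values suggest dissolution. The sketch below is a simplified one-dimensional stand-in that compares kernel density estimates over two adjacent time windows; it is not the exact formulation from the literature, and the function names, window scheme and bandwidth are assumptions.

```python
import math


def kde(points, x: float, h: float) -> float:
    """Gaussian kernel density estimate at x with spatial bandwidth h."""
    if not points:
        return 0.0
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)


def velocity_density(stream, x: float, now: float,
                     window: float, h: float) -> float:
    """Change in estimated density at x per unit time between the older
    window (now - 2*window, now - window] and the recent window
    (now - window, now]. stream is a list of (timestamp, value) pairs."""
    recent = [v for t, v in stream if now - window < t <= now]
    older = [v for t, v in stream if now - 2 * window < t <= now - window]
    return (kde(recent, x, h) - kde(older, x, h)) / window


# Hypothetical stream: the data mass moves from around 0 to around 5.
stream = [(t, 0.0) for t in range(1, 11)] + [(t, 5.0) for t in range(11, 21)]
print(velocity_density(stream, 5.0, now=20, window=10, h=1.0))  # positive
print(velocity_density(stream, 0.0, now=20, window=10, h=1.0))  # negative
```

A shift then shows up as a dissolving region paired with a nearby coagulating one.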
Spatial velocity
3. Distributed data mining
Problems: Many independent sites of observation; Data stream per site; Locally non-obvious variable relationships to other sites; Performance
Bayesian Network Learning
Steps: (1) Learn local BNs; (2) Identify key observations; (3) Combine observations and learn a non-local BN; (4) Communicate updated probabilities back to local sites; (5) Obtain a collective BN
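The "identify key observations" step can be illustrated with a small sketch. It assumes that a key observation is one the local model finds unlikely, on the reasoning that such an observation may be explained by variables held at other sites; that criterion, the empirical-frequency "model", and all names are assumptions made for illustration, with a full Bayesian network learner left out.

```python
from collections import Counter


def fit_local_model(observations):
    """Stand-in for a local BN: the empirical joint distribution over the
    variables observed at one site."""
    counts = Counter(observations)
    total = len(observations)
    return {obs: c / total for obs, c in counts.items()}


def key_observations(observations, model, threshold: float):
    """Indices of observations whose local likelihood is below threshold.
    Only these would be sent to the central site."""
    return [i for i, obs in enumerate(observations)
            if model.get(obs, 0.0) < threshold]


# Hypothetical site data: nine common observations and one rare one.
site_data = [("a", 0)] * 9 + [("b", 1)]
local_model = fit_local_model(site_data)
print(key_observations(site_data, local_model, threshold=0.2))
```

Sending only the selected indices instead of the whole local stream is what keeps communication between sites small.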
BNs
Pros: Handle missing data; Transparent; Cause & effect; Causal intervention; Resource-friendly; Pretty easy
Cons: Assumptions, assumptions, assumptions