Introduction to Data Science

Nishant83346, 13 slides, Jun 10, 2024


INTRODUCTION TO DATA SCIENCE AND MACHINE LEARNING

Agenda

Data: The 3 V's

Volume (Scale)
- Data volume: a 44x increase from 2009 to 2020, from 0.8 zettabytes (ZB) to 35 ZB
- Data volume is increasing exponentially: the amount of collected/generated data grows exponentially
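To make "exponential" concrete, the two figures quoted above can be turned into a compound annual growth rate and a doubling time. This is a minimal sketch that assumes steady compound growth over the 11 years, which is a simplification; only the 0.8 ZB and 35 ZB values come from the slide.

```python
import math

# Figures quoted on the slide: 0.8 ZB in 2009 and 35 ZB in 2020.
start_zb = 0.8
end_zb = 35.0
years = 2020 - 2009  # 11 years

# Overall multiple (the slide rounds this to 44x).
growth_factor = end_zb / start_zb
# Compound annual growth rate, assuming the same growth every year.
annual_rate = growth_factor ** (1 / years) - 1
# Years needed for the data volume to double at that rate.
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"growth factor: {growth_factor:.2f}x")
print(f"annual growth rate: {annual_rate:.1%}")
print(f"doubling time: {doubling_years:.1f} years")
```

Under that assumption the data volume grows by roughly 41% a year, doubling about every two years.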

Variety (Complexity)
- Relational data (tables/transactions/legacy data)
- Text data (web)
- Semi-structured data (XML)
- Graph data: social networks, Semantic Web (RDF), …
- Streaming data: you can only scan the data once
- A single application can be generating/collecting many types of data
- Big public data (online, weather, finance, etc.)
To extract knowledge, all these types of data need to be linked together.
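Linking different kinds of data usually comes down to finding a shared key. The sketch below joins three of the types listed above (relational rows, semi-structured XML, and a graph stored as an adjacency list) on a customer id. All records, field names, and the choice of customer id as the key are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational data: (customer_id, product, amount) transaction rows.
transactions = [
    (1, "laptop", 1200.0),
    (2, "phone", 800.0),
    (1, "mouse", 25.0),
]

# Hypothetical semi-structured data: customer profiles exported as XML.
# Customer 2 has no <city> element, which is typical of semi-structured data.
profiles_xml = """
<customers>
  <customer id="1"><name>Asha</name><city>Pune</city></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""

# Hypothetical graph data: who follows whom on a social network (adjacency list).
follows = {1: [2], 2: []}

# Parse the XML; findtext returns None when a child element is missing.
root = ET.fromstring(profiles_xml.strip())
profiles = {}
for node in root.findall("customer"):
    cid = int(node.get("id"))
    profiles[cid] = {
        "name": node.findtext("name"),
        "city": node.findtext("city"),
    }

# Link all three sources on the shared customer id.
linked = {}
for cid, profile in profiles.items():
    spend = sum(amount for c, _, amount in transactions if c == cid)
    linked[cid] = {**profile, "total_spend": spend, "follows": follows.get(cid, [])}

for cid, record in sorted(linked.items()):
    print(cid, record)
```

In real systems the hard part is usually the key itself: different sources rarely share a clean identifier, so record linkage often needs matching rules or fuzzy matching.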

Velocity (Speed)
- Data is being generated fast and needs to be processed fast (online data analytics)
- Late decisions mean missed opportunities
- Examples:
  - E-promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
  - Healthcare monitoring: sensors monitor your activities and body, and any abnormal measurement requires an immediate reaction
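The healthcare-monitoring example depends on reacting to each reading as it arrives, without storing the whole history. A minimal sketch of that idea uses Welford's one-pass algorithm for a running mean and standard deviation, then flags readings more than three standard deviations from the mean. The z-score rule, the threshold of 3, the warm-up of 5 readings, and the heart-rate values are illustrative assumptions, not clinical guidance and not taken from the slides.

```python
import math


class RunningStats:
    """Welford's one-pass algorithm for a running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; returns 0.0 with fewer than two readings.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def monitor(readings, threshold=3.0, warmup=5):
    """Yield (index, value) for readings more than `threshold` standard
    deviations from the running mean of the readings seen before them."""
    stats = RunningStats()
    for i, x in enumerate(readings):
        if stats.n >= warmup:
            sd = stats.std()
            if sd > 0 and abs(x - stats.mean) > threshold * sd:
                yield i, x
        stats.update(x)  # each reading is seen exactly once


# Hypothetical heart-rate stream (beats per minute).
heart_rate = [72, 74, 71, 73, 75, 72, 74, 140, 73, 72]
alerts = list(monitor(heart_rate))
print(alerts)
```

This flags the reading of 140 at index 7. Note that the flagged reading is still folded into the running statistics; a real monitor might exclude alerts from the baseline or use a sliding window so one spike does not mask the next.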

AI, Machine Learning, and Deep Learning
Image from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/.
- AI: getting machines to do what humans are good at
- Machine learning: feeding an algorithm data so it learns to predict something
- Deep learning: a type of machine learning based on multi-layer neural networks
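"Feeding an algorithm data to learn and predict something" can be seen in the simplest learner there is: ordinary least-squares linear regression. The sketch below is one illustrative algorithm in plain Python, not the method the slide has in mind; the study-hours dataset is hypothetical. Deep learning follows the same learn-then-predict pattern but replaces the straight line with a multi-layer neural network.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned parameters to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]

model = fit_line(hours, scores)  # "learn" from the data
print(model)
print(predict(model, 6))         # "predict" an unseen case
```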

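Data Science: Solving Problems with Data
- Understanding of the underlying assumptions
- Algorithms and numerical techniques to derive insights
[Figure: Data science Venn diagram with three circles: Hacking Skills (computer science, data engineering and wrangling, coding), Math and Statistics Knowledge, and Substantive Experience (domain knowledge, business acumen, experience, value to the business). Hacking Skills overlapping Math and Statistics is labeled Machine Learning; Math and Statistics overlapping Substantive Experience is Traditional Research; Hacking Skills overlapping Substantive Experience is the Danger Zone!; the center of all three is Data Science.]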

12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day

A Single View to the Customer
[Figure: the customer at the center, connected to Social Media, Gaming, Entertainment, Banking, Finance, Purchase, and Our Known History.]

Real-time/Fast Data
- Social media and networks (all of us are generating data)
- Scientific instruments (collecting all sorts of data)
- Mobile devices (tracking all objects all the time)
- Sensor technology and networks (measuring all kinds of data)
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
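One common way to summarize data "in a scalable fashion" is to compute a small partial summary for each chunk of data and then merge the partials, so the chunks can be processed on different workers or machines. The sketch below assumes numeric readings already split into non-empty chunks; the `Summary` type, its fields, and the readings are illustrative and not from the slides.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self):
        return self.total / self.count


def summarize(chunk):
    """Summarize one non-empty chunk of readings."""
    return Summary(len(chunk), sum(chunk), min(chunk), max(chunk))


def merge(a, b):
    """Combine two partial summaries. The merge is associative and
    commutative (up to floating-point rounding in the total), so
    partials can be combined in any order."""
    return Summary(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


# Hypothetical sensor readings, already split into chunks (e.g. one per file).
chunks = [[3.0, 7.5, 1.2], [9.8, 4.4], [6.1, 2.0, 5.5, 8.3]]

partials = [summarize(c) for c in chunks]  # each could run on a separate worker
overall = reduce(merge, partials)
print(overall, overall.mean)
```

Only the tiny `Summary` objects travel between workers, not the raw data, which is what lets this pattern scale. Statistics such as the median do not merge this simply and need approximate sketches instead.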

The Data Science Process: Getting from Raw Data to Outcomes
Created by Joe Blitzstein and Hanspeter Pfister for the Harvard data science course.
Formal framework: CRISP-DM (Cross-Industry Standard Process for Data Mining)
[Figure: The Data Science Workflow]
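CRISP-DM breaks a project into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The toy walk-through below maps each phase onto a few lines of Python, going from raw text to an evaluated prediction. The sales data, the function names, and the naive mean baseline are all hypothetical; this is an illustration of the phases, not an implementation of any CRISP-DM tooling.

```python
# 1. Business understanding (hypothetical goal): predict daily sales to plan stock.

RAW = """day,sales
1,100
2,
3,120
4,130
5,abc
6,150
"""


def understand(raw):
    """2. Data understanding: parse the raw text and count problem rows."""
    rows = [line.split(",") for line in raw.strip().splitlines()[1:]]
    bad = [r for r in rows if not r[1].strip().isdigit()]
    return rows, len(bad)


def prepare(rows):
    """3. Data preparation: keep only rows with a valid numeric sales value."""
    return [(int(d), int(s)) for d, s in rows if s.strip().isdigit()]


def model(data):
    """4. Modeling: a naive baseline that always predicts the mean training sales."""
    mean = sum(s for _, s in data) / len(data)
    return lambda day: mean


def evaluate(predict, data):
    """5. Evaluation: mean absolute error on held-out rows."""
    return sum(abs(predict(d) - s) for d, s in data) / len(data)


rows, n_bad = understand(RAW)
clean = prepare(rows)
train, test = clean[:-1], clean[-1:]  # hold out the last day
predict = model(train)
mae = evaluate(predict, test)
# 6. Deployment: in practice `predict` would be wrapped in a service or report.
print(n_bad, clean, mae)
```

CRISP-DM treats these phases as a cycle rather than a straight line: a poor evaluation result typically sends the project back to data preparation or even business understanding.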

Thanks