Introduction to data science

HibaAkroush 112 views 29 slides Dec 25, 2017

About This Presentation

a) What data is.
b) Types of data.
c) The difference between data science, big data and data analytics.
d) The relationship between data and artificial intelligence.


Slide Content

Data vs information vs knowledge
Data is raw and unprocessed. When we receive data and add some value to it,
it becomes information.
When we add our experience to information in a given context, we get
knowledge.

For data to be useful it should be:
Relevant
Timely
Accurate
Available

Types of knowledge in companies
1- available: procedures and guidelines that already exist in the company

2- hidden: specific knowledge that some people in the company hold because of
their experience

Artificial intelligence
Artificial general intelligence is a form of machine intelligence similar or
equal to human intelligence: it would use language, learn and make decisions.

To achieve this, scientists would need to understand how the human brain works
and reproduce its logical thinking and ability to learn.

What we have today is called weak (narrow) artificial intelligence.
Example: AlphaGo, a Go-playing program developed by DeepMind, a company that
Google acquired in 2014.

AI vs machine learning
Machine learning is a type of artificial intelligence in which an algorithm
learns from given data. We give the system example inputs and their outputs, and
it learns a model that links the two. Once the model is trained, we can feed it
new inputs and it produces the corresponding outputs.
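
As a rough illustration of learning a link between inputs and outputs, here is a minimal sketch of a one-nearest-neighbour classifier in plain Python. The function name `fit_nearest_neighbor` and the fruit data are hypothetical, invented for this example; real projects would normally use a library such as scikit-learn.

```python
def fit_nearest_neighbor(inputs, outputs):
    """'Training' here simply stores the labelled examples."""
    examples = list(zip(inputs, outputs))

    def predict(x):
        # Return the output of the stored example whose input is closest to x.
        _, nearest_output = min(examples, key=lambda pair: abs(pair[0] - x))
        return nearest_output

    return predict


# Hypothetical training data: fruit mass in grams -> fruit name.
masses = [140, 150, 160, 1400, 1500, 1600]
labels = ["apple", "apple", "apple", "melon", "melon", "melon"]

model = fit_nearest_neighbor(masses, labels)
print(model(155))   # a light fruit
print(model(1450))  # a heavy fruit
```

The model is never told a rule such as "heavier than 800 g means melon"; it only sees example pairs and reuses them to answer for new inputs.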

Types of machine learning
Supervised: both input and output are provided

Unsupervised: only input is provided; the algorithm looks for structure on its own

Reinforcement: the machine learns from the feedback it gets from its environment
and decides on actions. It is typical for automated systems that have to make
decisions without human intervention. Example: a self-driving car at a yellow
traffic light must decide whether to accelerate or slow down.
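
To make the unsupervised case more concrete (inputs only, no labelled outputs), the sketch below splits one-dimensional measurements into two groups with a tiny k-means style loop. The function `two_means_1d` and the height values are assumptions made for illustration, and the code assumes the data contains at least two distinct values.

```python
def two_means_1d(values, iterations=10):
    """Split 1-D values into two clusters using a simple k-means loop."""
    low, high = min(values), max(values)  # initial cluster centres
    group_low, group_high = [], []
    for _ in range(iterations):
        # Assign each value to the nearer centre.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


# Hypothetical heights in centimetres, with no labels attached.
heights = [150, 152, 155, 180, 183, 185]
shorter, taller = two_means_1d(heights)
print(shorter, taller)
```

No one tells the algorithm which heights belong together; the grouping emerges from the data alone.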

Neural network and deep learning
A neural network is a type of machine learning model loosely inspired by the way
the human brain learns.

Deep learning uses neural networks with many layers, built from blocks whose
adjustable parameters are tuned during training.
TensorFlow is a library used to create computational graphs for deep learning,
and scikit-learn is a general-purpose machine learning library. The suggested
language for both is Python. After a computational graph is created we do
training (which is an expensive phase), then inference and evaluation.
To overcome the computational expense of the training phase, Cloud TPUs or
other customized hardware can be used.

The training phase also involves optimization. One algorithm commonly used for
this is gradient descent.
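
The idea behind gradient descent can be sketched without any deep learning library. The example below is an illustration only: it fits a straight line y = w*x + b by repeatedly stepping against the gradient of the mean squared error. The function name `fit_line`, the learning rate, the step count and the data are all assumptions chosen for this sketch.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Training a deep network follows the same pattern, only with many more parameters, which is why this phase is computationally expensive.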

If the model contains the logic for both training and evaluation, it is called
an estimator in TensorFlow's terminology.

Pre-trained models already exist and avoid the computational expense of
training. One of them is Inception v3, which is used for image recognition.

Data science
Data science is the practice of exploring data and analysing its patterns to mine
insights that help the company decide how to act.

Data product: data is used as input and processed by an algorithm to generate
results. Examples are recommendation engines such as those of Google or
Amazon.
This differs from the definition above, where data science is used to inform
business decisions.

Data science requisites
1- mathematics: the ability to analyse data and solve problems is essential.
Classical statistics, linear algebra and Bayesian statistics are important.
2- hacking: the ability to create technological solutions and algorithms
3- business: the value of data science lies in using business knowledge and
expertise to give meaning to the analysis outcomes and insights

The difference between a data scientist and analyst
A data scientist works at a more fundamental, raw level, dealing with algorithms,
while an analyst takes insights and turns them into business decisions.

Data munging
Turning messy, unstructured data into a useful and meaningful data set.
It needs cleverness, pattern recognition and hacking skills.
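
A small, hedged example of what munging can look like in practice: the sketch below cleans a few messy comma-separated records into uniform dictionaries. The records, the field layout (name, age, city) and the function `clean_record` are hypothetical, made up for this illustration.

```python
# Hypothetical messy input: inconsistent spacing, casing and a missing age.
raw_records = [
    " Alice , 34 ,amman",
    "BOB,  ,Irbid",
    "carol,29, ZARQA ",
    "",
]


def clean_record(line):
    """Return a tidy dict for one record, or None if the line is unusable."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3 or not parts[0]:
        return None  # wrong number of fields or no name
    name, age_text, city = parts
    age = int(age_text) if age_text.isdigit() else None  # keep missing ages as None
    return {"name": name.title(), "age": age, "city": city.title()}


cleaned = [r for r in (clean_record(line) for line in raw_records) if r is not None]
for record in cleaned:
    print(record)
```

Real data sets usually need many more rules than this, which is where the pattern recognition mentioned above comes in.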

Types of data
1- quantitative: things that can be measured objectively, like length, width,
mass, etc.

2- qualitative: characteristics that are described rather than measured, and
are often subjective, like color and taste

Quantitative data
Quantitative data can be classified into continuous and discrete.

Discrete data are counts that can only be represented as integers, like the
number of people.

Continuous data can take any value within a range, including decimals, for
example mass or height.

Continuous data
It can be used in statistical hypothesis testing.
This type is the most precise; it is summarized with the mean, median and
standard deviation.
Sometimes continuous and discrete measurements are analysed together to check
whether they are related. For example, using regression we can check whether a
continuous measurement is correlated with a discrete measurement.
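
The summary statistics and the regression check described above can be sketched with Python's standard `statistics` module. All numbers below are hypothetical, and `least_squares` is a helper written for this example, not a library function.

```python
import statistics

# Hypothetical continuous data: heights in centimetres.
heights_cm = [150.5, 160.0, 165.5, 170.0, 180.0]
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
stdev_height = statistics.stdev(heights_cm)  # sample standard deviation
print(mean_height, median_height, round(stdev_height, 2))


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line through (xs, ys)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: discrete household size vs continuous water use (m^3).
people = [1, 2, 3, 4, 5]
water_m3 = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = least_squares(people, water_m3)
print(round(slope, 2), round(intercept, 2))
```

A slope clearly different from zero suggests the continuous measurement changes with the discrete one; a formal test would also look at its statistical significance.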

Qualitative data
There are three main kinds:
1- binary: can take only one of two values, such as true or false

2- nominal: the characteristic cannot be ranked or ordered, like color or
gender. It is summarized with frequencies or percentages and analysed with
the chi-square test.

3- ordinal: categories like small, medium and XX-large. They have a rank, but
the gaps between categories may not be equal.
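
For nominal data, the frequency, percentage and chi-square ideas above can be sketched as follows. The color sample is hypothetical, and the code assumes a goodness-of-fit test against the hypothesis that all categories are equally likely. It computes only the chi-square statistic; turning it into a p-value would require the chi-square distribution, which is not part of this sketch.

```python
from collections import Counter

# Hypothetical nominal observations.
colors = ["red", "blue", "red", "green", "red", "blue",
          "red", "green", "red", "blue", "red", "blue"]

counts = Counter(colors)
total = len(colors)

# Frequency and percentage for each category.
for color, count in counts.items():
    print(color, count, round(100 * count / total, 1))

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where every category is expected to appear equally often.
expected = total / len(counts)
chi_square = sum((count - expected) ** 2 / expected for count in counts.values())
print(chi_square)
```

The larger the statistic, the further the observed frequencies are from what the equal-likelihood assumption predicts.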

Best way to graphically represent each type
1- nominal: pie charts, column charts

2- ordinal: bar chart

3- interval: bar chart or histogram,
box plots or line charts

Difference between data analytics, data science and
big data analytics
What is big data?
It is an amount of data too large to be stored on a typical computer or
processed using conventional techniques.

This data can be high volume, high velocity or unstructured (high variety);
therefore its processing is costly and needs innovative solutions.

What is data analytics?
Using models, theories or algorithms to process data.
The result depends largely on the researcher's knowledge and experience.

Application of data science
1- recommender system:
This system filters the suggested items for the user according to the user's
preferences, previous searches or characteristics shared with other users.

2- internet search:
Used by search engines to refine search results

3- digital advertisement
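
The recommender system from item 1 can be illustrated with a very small sketch. It is an assumption-laden example, not how any particular company does it: the function `recommend` suggests items bought by users whose purchase history overlaps with the target user's, and the purchase data is hypothetical.

```python
def recommend(target_user, purchases, top_n=2):
    """Suggest items owned by users who share purchases with target_user."""
    target_items = purchases[target_user]
    scores = {}
    for user, items in purchases.items():
        if user == target_user:
            continue
        overlap = len(target_items & items)  # shared purchases = similarity
        if overlap == 0:
            continue
        # Credit every item the target does not have yet.
        for item in items - target_items:
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical purchase histories.
purchases = {
    "user_a": {"laptop", "mouse"},
    "user_b": {"laptop", "mouse", "keyboard"},
    "user_c": {"laptop", "monitor"},
    "user_d": {"novel", "bookmark"},
}

suggestions = recommend("user_a", purchases)
print(suggestions)
```

Items from `user_d` never appear because that user shares nothing with `user_a`, which mirrors the "common characteristics" filtering described above.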

Applications of big data
1- retail
2- communication
3- financial services

Application of data analytics
1- management of energy: to optimize usage of energy
2- healthcare
3- gaming
4- travel

Bullet points
Data scientists:
●Work with unstructured data
●Python is essential
●R is very important
●Learn how to write complex queries using SQL
●They typically earn more than the other two roles

Bullet points

Big data professional
●Math and statistics
●Business and analytical skills
●Creativity
●Technology and computer science skills
●The amount of data is huge
●NoSQL databases like MongoDB
●Programming languages like Java

Bullet points

Data analyst
●Statistics and math
●Programming skills
●Intuition
●Be able to convert raw data to something meaningful

Resources 1:
http://searchdatamanagement.techtarget.com/feature/Defining-data-information-and-knowledge

https://hackernoon.com/understanding-understanding-an-intro-to-artificial-intelligence-be76c5ec4d2e

https://datajobs.com/what-is-data-science

Resources 2:
http://blog.minitab.com/blog/understanding-statistics/understanding-qualitative-quantitative-attribute-discrete-and-continuous-data-types

https://www.google.com/url?hl=en&q=http://www.digitalvidya.com/blog/data-analytics-vs-big-data-vs-data-science-difference/&source=gmail&ust=1514309653432000&usg=AFQjCNEyWvBIA9EIBXWpLfdgJ3J8e8Wifg