a) What is data?
b) Types of data
c) Difference between data science, big data and data analytics
d) Relationship between data and artificial intelligence
Added: Dec 25, 2017
Slides: 29 pages
Data vs information vs knowledge
Data is raw and unprocessed. When we process data
and add some value to it, it becomes information.
When we apply our experience to information in a
given context, we get knowledge.
For data to be useful it should be:
Relevant
Timely
Accurate
Available
Types of knowledge in companies
1- available: procedures and guidelines that already exist in the company
2- hidden: specific knowledge that some people in the company hold because of
their experience
Artificial intelligence
Artificial general intelligence: a form of intelligence similar or equal to human
intelligence, able to use language, learn and make decisions.
To achieve that, scientists would need to study the human brain and mimic
how it works, including its logical thinking and its ability to learn.
What we have today is called weak (narrow) artificial intelligence.
Example: AlphaGo, created by DeepMind, a company owned by Google.
AI vs machine learning
Machine learning is a type of artificial intelligence in which an algorithm
learns from given data. We give the system example inputs and outputs, and it
builds a model that links the two. Once the model is trained, we can feed it
new input and get an output.
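The idea of learning a link between inputs and outputs can be sketched with a very small example. This is only an illustration, not the method used by any particular library: the data, fit_line and predict are hypothetical names, and the "model" is just a straight line fitted by ordinary least squares.

```python
# Hypothetical training data: inputs and the outputs we want to reproduce.
inputs = [1.0, 2.0, 3.0, 4.0]
outputs = [3.0, 5.0, 7.0, 9.0]  # assumed to follow y = 2x + 1


def fit_line(xs, ys):
    """Learn a slope and intercept from input/output pairs (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned model to produce an output for a new input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(inputs, outputs)   # "training"
print(predict(model, 5.0))          # output for an input not seen in training
```

The training step happens once; afterwards the learned model can be reused for any new input.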
Types of machine learning
Supervised: both input and output are provided
Unsupervised: only input is provided
Reinforcement: the machine learns from the feedback of its environment
and decides on actions. It is typical for automated systems that have to make decisions
without human interference. Example: a self-driving car at a yellow traffic light: should the
car accelerate or slow down?
Neural network and deep learning
A neural network is a type of machine learning model loosely inspired by the way the human brain learns.
Deep learning uses neural networks built from many layers of adjustable blocks (weights) that are tuned during training.
TensorFlow is a library used to create computational graphs for
deep learning, and scikit-learn is a library for general machine learning. The suggested language
for both is Python. After a computational graph is created we do training (which is an
expensive phase), then inference and evaluation.
To overcome the computational expense of the training phase, Cloud TPUs or
other customized hardware can be used.
The training phase also involves optimization. One algorithm that can be used
for this is gradient descent.
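Gradient descent can be sketched in a few lines. The example below is a minimal illustration on a made-up one-parameter loss, (w - 3)^2, not a real training loop. The function names, the learning rate and the number of steps are assumptions chosen for the example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def loss_gradient(w):
    """Gradient of the hypothetical loss (w - 3)^2, which is smallest at w = 3."""
    return 2.0 * (w - 3.0)


w_min = gradient_descent(loss_gradient, start=0.0)
print(w_min)  # ends up very close to 3
```

In real training the single parameter w is replaced by all the weights of the model, and the gradient is computed from the training data.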
If the model contains logic for both training and evaluation, it is called an
estimator.
Pre-trained models already exist to avoid the computational
expense of training. One of them is Inception v3, which is used for image
recognition.
Data science
The act of exploring data and analysing its patterns to mine out insights that help
the company know how to act.
Data product: data is used as input and processed by an algorithm to generate
results, for example the recommendation engines of Google,
Amazon, etc.
A data product differs from the above because the above uses data science to make
business decisions.
Data science requisites
1- mathematics: the ability to analyse data and solve problems is essential.
Classical statistics, linear algebra and Bayesian statistics are important
2- hacking: the ability to create technological solutions and algorithms
3- business: the value of data science lies in using knowledge and expertise to
give the analysis outcomes and insights a meaning
The difference between a data scientist and an analyst
A data scientist works at a more fundamental, raw level, dealing with algorithms,
while an analyst takes insights and turns them into business decisions.
Data munging
Turning messy, unstructured data into a useful and meaningful data set.
It needs cleverness, pattern recognition and hacking skills.
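A small sketch of data munging, assuming a made-up list of messy "name, age" text rows. The rows, the clean_rows function and the cleaning rules are hypothetical and only illustrate the kind of work involved.

```python
# Hypothetical messy input: stray spaces, inconsistent case, bad or empty rows.
raw_rows = ["  Alice , 30", "BOB,unknown", "", "carol, 27 "]


def clean_rows(rows):
    """Turn messy 'name, age' strings into consistent records."""
    cleaned = []
    for row in rows:
        parts = [part.strip() for part in row.split(",")]
        if len(parts) != 2 or not parts[0]:
            continue  # skip empty or malformed rows
        name, age_text = parts
        # Keep the age only when it is a plain number, otherwise mark it missing.
        age = int(age_text) if age_text.isdigit() else None
        cleaned.append({"name": name.title(), "age": age})
    return cleaned


people = clean_rows(raw_rows)
print(people)
```

Real data usually needs many more rules than this, which is why pattern recognition and hacking skills matter.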
Types of data
1- quantitative: things that can be measured objectively, like length, width, mass,
etc.
2- qualitative: characteristics that are subjective, like color and taste
Quantitative data
Quantitative data can be classified into continuous and discrete.
Discrete data are counts that can be represented by integers only, like the number
of people.
Continuous data can take any value within a range, including decimals, for example mass or
height.
Continuous data
It can be used in statistical hypothesis testing.
This type is the most precise. It is summarized using the mean, median and standard deviation.
Sometimes continuous and discrete measurements are analysed together to check whether
they are correlated. For example, using regression we can check whether a
continuous measurement is correlated with a discrete measurement.
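The summary statistics and the correlation check described above can be sketched with Python's standard library. The data is made up for illustration (hours studied as the continuous measurement, number of correct answers as the discrete one), and pearson is a hypothetical helper name.

```python
import math
import statistics

# Hypothetical data: a continuous measurement and a discrete count.
hours_studied = [1.0, 2.0, 3.0, 4.0]   # continuous
correct_answers = [2, 3, 5, 6]         # discrete

mean_hours = statistics.mean(hours_studied)
median_hours = statistics.median(hours_studied)
std_hours = statistics.stdev(hours_studied)  # sample standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(hours_studied, correct_answers)
print(mean_hours, median_hours, std_hours, r)
```

A value of r close to 1 or -1 suggests a strong linear relationship, and a value close to 0 suggests little or none.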
Qualitative data
There are three main kinds:
1- binary: can be either true or false only
2- nominal: the characteristic cannot be ranked or ordered, like color or
gender. It is summarized using frequency or percentage
and analysed with the chi-square test
3- ordinal: categories like small, medium and xx-large. They have a rank, but
the gaps between categories may not be equal
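The chi-square analysis mentioned for nominal data can be sketched as follows. The table of counts is invented for illustration (two groups by two answer categories), and chi_square_statistic is a hypothetical helper, not a library function.

```python
# Hypothetical counts: rows are two groups, columns are two nominal categories.
observed = [
    [30, 10],
    [20, 40],
]


def chi_square_statistic(table):
    """Chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Count expected if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


chi2 = chi_square_statistic(observed)
print(chi2)
```

For a 2 x 2 table (one degree of freedom), a statistic above about 3.84 is usually read as significant at the 5% level.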
Best way to graphically represent each type
1- nominal: pie charts, column charts
2- ordinal: bar chart
3- interval: bar chart, histogram, box plot or line chart
Difference between data analytics, data science and big data analytics
What is big data?
It is an amount of data so large that it cannot be stored on an ordinary computer
or processed using conventional techniques.
This data can be high volume, high velocity or unstructured (high variety). Therefore
processing it is costly and needs innovative solutions.
What is data analytics?
Using models, theories or algorithms to process data.
The result reached depends largely on the researcher's knowledge and experience.
Applications of data science
1- recommender system:
This system filters the items suggested to the user according to the user's
preferences, previous searches or characteristics shared with other users.
2- internet search:
Used by search engines to refine search results
3- digital advertisement
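The recommender system in item 1 can be sketched in a simplified form. The users, items, and the jaccard and recommend functions below are all hypothetical. The sketch suggests items liked by the most similar other user, which is only one of many possible approaches.

```python
# Hypothetical preference data: the items each user has liked.
likes = {
    "ana": {"laptop", "mouse", "keyboard"},
    "ben": {"laptop", "mouse", "monitor"},
    "cleo": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Similarity of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, all_likes):
    """Suggest items liked by the most similar other user."""
    own = all_likes[user]
    others = [name for name in all_likes if name != user]
    nearest = max(others, key=lambda name: jaccard(own, all_likes[name]))
    # Only suggest items the user does not already have.
    return sorted(all_likes[nearest] - own)


print(recommend("ana", likes))
```

Here the common characteristics are the shared liked items. A production system would combine many more signals, such as previous searches.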
Applications of big data
1- retail
2- communication
3- financial services
Applications of data analytics
1- energy management: to optimize energy usage
2- healthcare
3- gaming
4- travel
Bullet points
Data scientists:
●Work with unstructured data
●Python is essential
●R is very important
●Should learn to write complex SQL queries
●Typically earn more than the other two roles
Big data professional
●Math and statistics
●Business and analytical skills
●Creativity
●Technology and computer science skills
●The amount of data is huge
●NoSQL databases like MongoDB
●Programming languages like Java
Data analyst
●Statistics and math
●Programming skills
●Intuition
●Be able to convert raw data to something meaningful