DATA MINING - CHARACTERISTICS and APPLICATION

126 views 14 slides Jun 28, 2024

About This Presentation

This presentation covers the basics of data mining and machine learning. It first covers data: what data is, its quality, its types, and its attributes. It then turns to data mining. Basic information is given on two of the major parts of data mining, Data M...


Slide Content

DATA MINING & MACHINE LEARNING
DAFFODIL INTERNATIONAL UNIVERSITY
Md. Anisur Rahman

Contents
1) Data Mining & Machine Learning
2) Data
3) Exploring Data - Visualization
4) Data Mining and ML Techniques
5) Applications
6) Summary

DATA MINING
Data mining is considered the process of extracting useful information from a
vast amount of data. It’s used to discover new, accurate, and useful patterns in the
data, looking for meaning and relevant information.
MACHINE LEARNING
Machine learning is the process of developing algorithms that improve through
experience derived from data. It’s the design, study, and development
of algorithms that permit machines to learn without human intervention.
Both data mining and machine learning fall under the aegis of Data
Science, which makes sense since they both use data. Both processes are
used for solving complex problems, so consequently, many people
(erroneously) use the two terms interchangeably.

DATA
A data set is a collection of data objects and their attributes.
Object
A collection of attributes describes an object.
- Also known as record, point, case, sample, entity, or instance
Attribute
A property or characteristic of an object.
- Examples: eye color of a person, temperature
- Also known as variable, field, characteristic, or feature
TYPES OF ATTRIBUTES
Nominal
- zip codes, employee ID numbers, eye color, sex: {male, female}
Ordinal
- hardness of minerals, {good, better, best}, grades, street numbers
Interval
- calendar dates, temperature in Celsius or Fahrenheit
Ratio
- temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
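
The four attribute types differ in which operations are meaningful on their values. The following is a minimal sketch of that idea, assuming small made-up sample lists; the variable names and values are illustrative and do not come from the slides.

```python
from statistics import mode

# Hypothetical sample values, one list per attribute type.
eye_color = ["brown", "blue", "brown", "green"]        # nominal
rating = ["good", "best", "better", "good", "best"]    # ordinal
temp_celsius = [10.0, 20.0, 30.0]                      # interval
mass_kg = [2.0, 4.0, 8.0]                              # ratio

# Nominal: only equality is meaningful, so the mode is a valid summary.
most_common_color = mode(eye_color)

# Ordinal: order is meaningful, so values can be ranked and a median taken.
rank = {"good": 0, "better": 1, "best": 2}
ordered = sorted(rating, key=rank.get)
median_rating = ordered[len(ordered) // 2]

# Interval: differences are meaningful, but ratios are not
# (20 degrees Celsius is not "twice as hot" as 10 degrees Celsius).
temp_difference = temp_celsius[2] - temp_celsius[0]

# Ratio: a true zero exists, so ratios are meaningful.
mass_ratio = mass_kg[2] / mass_kg[0]

print(most_common_color, median_rating, temp_difference, mass_ratio)
```

Each type supports the operations of the types listed before it, which is why a ratio attribute such as mass can also be ranked or differenced.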

IMPORTANT CHARACTERISTICS OF STRUCTURED DATA
1) Dimensionality
Dimensionality is the number of columns in a data set, which are also called the
attributes of the data. Adding too many dimensions can make the data very difficult
to analyze, because the data objects become so different from one another that it is
hard to group them in a meaningful way.
2) Sparsity
Data sparsity describes how much data we have for a particular dimension or entity of
the model. Data is considered sparse when certain expected values in a data set are missing,
which is common in large-scale data analysis.
3) Resolution
Data resolution is the number of units or digits to which a measured or calculated value is
expressed and used. Patterns depend on the scale; for example, weather patterns look different
when rainfall is measured over hours than when it is measured over months.
4) Distribution
Data distributions are used often in statistics. A distribution describes how often each value
of an attribute occurs, and it is usually displayed graphically. There are several types of data
distributions; the most familiar are the symmetrical and the skewed distributions.
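
The four characteristics above can be checked on a small data set. This is a rough sketch using only the Python standard library; the rows, attribute names, and the mean-versus-median test for skew are assumptions chosen for illustration, not part of the slides.

```python
from statistics import mean, median

# Hypothetical data set: each row is a data object, each key an attribute.
# None marks a missing value.
rows = [
    {"age": 25, "income": 30000, "city": "Dhaka"},
    {"age": 31, "income": None, "city": "Dhaka"},
    {"age": 47, "income": 32000, "city": None},
    {"age": 52, "income": 150000, "city": "Khulna"},
]

# Dimensionality: the number of attributes (columns).
dimensionality = len(rows[0])

# Sparsity: the fraction of cells that have no value.
cells = [value for row in rows for value in row.values()]
sparsity = sum(value is None for value in cells) / len(cells)

# Resolution: the same ages expressed at a coarser scale (decades).
age_decades = [(row["age"] // 10) * 10 for row in rows]

# Distribution: a mean well above the median hints at a right-skewed attribute.
incomes = [row["income"] for row in rows if row["income"] is not None]
right_skewed = mean(incomes) > median(incomes)

print(dimensionality, round(sparsity, 3), age_decades, right_skewed)
```

Comparing the mean with the median is only a quick heuristic for skew; a histogram of the attribute gives a fuller picture of its distribution.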

Record
• Data Matrix
• Document Data
• Transaction Data
Graph
• World Wide Web
• Molecular Structures
Ordered
• Spatial Data
• Temporal Data
• Genetic Sequence, etc.

DATA QUALITY
Noise and Outliers
• Noise refers to the modification of original values.
• Outliers are data objects with characteristics that are considerably different from most of the other data
objects in the data set.
Missing Values
• Information is not collected.
• Attributes may not be applicable to all cases.
• Missing values can be handled by eliminating the affected data objects or by filling in the gaps with a
statistical estimate.
Duplicate Data
• A data set may include data objects that are duplicates, or almost duplicates, of one another.
• This is a major issue when merging data from heterogeneous sources.
• Data cleaning can resolve duplicated data.
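
The data quality issues above can be illustrated with a short cleaning pass. This is a sketch under stated assumptions: the records are made up, the median is used as the statistical fill value, and outliers are flagged with a median-absolute-deviation rule whose threshold of 5 is an arbitrary choice for this example.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "temp": 21.0},
    {"id": 2, "temp": None},
    {"id": 3, "temp": 22.0},
    {"id": 3, "temp": 22.0},   # exact duplicate of the previous record
    {"id": 4, "temp": 23.0},
    {"id": 5, "temp": 95.0},   # likely outlier
]

# Duplicate data: keep the first occurrence of each identical record.
seen = set()
unique = []
for record in records:
    key = tuple(sorted(record.items()))
    if key not in seen:
        seen.add(key)
        unique.append(record)

# Missing values: fill with the median of the observed values
# (eliminating the record is the other option).
observed = [r["temp"] for r in unique if r["temp"] is not None]
fill = median(observed)
filled = [{**r, "temp": fill if r["temp"] is None else r["temp"]} for r in unique]

# Outliers: flag values far from the median, measured in units of the
# median absolute deviation (MAD). The factor 5 is an assumed threshold.
values = [r["temp"] for r in filled]
center = median(values)
mad = median(abs(v - center) for v in values)
outliers = [v for v in values if abs(v - center) > 5 * mad]

print(len(unique), fill, outliers)
```

Exact-match deduplication only catches identical records; near duplicates from heterogeneous sources usually need a similarity rule defined for the specific data.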

DATA PREPROCESSING

DATA VISUALIZATION
Data visualization is the graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data. Data visualization tools and technologies are essential to analyze massive amounts of
information and make data-driven decisions.
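
Even a very simple chart makes an outlier easy to see. The sketch below builds a text histogram with the standard library only; the rainfall readings, the function name `text_histogram`, and the bin width are hypothetical, and in practice a plotting library would normally be used instead.

```python
from collections import Counter

# Hypothetical daily rainfall readings in millimetres.
rainfall_mm = [2, 3, 3, 4, 4, 4, 5, 5, 6, 48]


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value in that bin."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    lines = []
    for start in sorted(counts):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} | {'#' * counts[start]}")
    return lines


for line in text_histogram(rainfall_mm, bin_width=5):
    print(line)
```

Most readings fall in the two lowest bins, while the single reading of 48 appears alone in a distant bin, which is the kind of pattern a table of raw numbers tends to hide.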

TECHNIQUES

APPLICATIONS
Market Basket Analysis
Education
Manufacturing Engineering
Research Analysis
Fraud Detection

APPLICATIONS
Market Basket Analysis
Digital Media & Entertainment
Manufacturing & Automobile
E-Commerce & CRM
Healthcare

THANK YOU