This slide deck covers the basics of Data Mining and Machine Learning. It first discusses data itself: what data is, its quality, and its types and attributes. It then turns to data mining and gives basic information on two of its major parts, Data Mining and Data Preprocessing. Next it discusses data mining techniques and their applications. Finally, the deck gives an overview of how data mining works from start to end.
Added: Jun 28, 2024
Slides: 14 pages
Slide Content
DATA MINING
&
MACHINE LEARNING
DAFFODIL INTERNATIONAL UNIVERSITY
Md. Anisur Rahman
Contents
1) Data Mining & Machine Learning
2) Data
3) Exploring Data - Visualization
4) Data Mining and ML Techniques
5) Applications
6) Summary
DATA MINING
Data mining is considered the process of extracting useful information from a
vast amount of data. It’s used to discover new, accurate, and useful patterns in the
data, looking for meaning and relevant information.
MACHINE LEARNING
Machine learning is the process of developing algorithms that improve
through experience derived from data. It’s the design, study, and development
of algorithms that permit machines to learn without explicit human intervention.
Both data mining and machine learning fall under the aegis of Data
Science, which makes sense since they both use data. Both processes are
used for solving complex problems, so consequently, many people
(erroneously) use the two terms interchangeably.
DATA
Collection of data objects and their attributes.
Object: a collection of attributes describes an object.
- Also known as record, point, case, sample, entity, or instance
Attribute: a property or characteristic of an object.
- Examples: eye color of a person, temperature
- Also known as variable, field, characteristic, or feature
TYPES OF ATTRIBUTES
Nominal: zip codes, employee ID numbers, eye color, sex: {male, female}
Ordinal: hardness of minerals, {good, better, best}, grades, street numbers
Interval: calendar dates, temperature in Celsius or Fahrenheit
Ratio: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
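The four attribute types form a hierarchy: each level supports the operations of the levels before it plus one more. The sketch below illustrates this with a small lookup. The names `AttributeType`, `OPERATIONS`, and `permitted_operations` are made up for illustration and do not come from the slides.

```python
from enum import IntEnum


class AttributeType(IntEnum):
    """Attribute types, ordered from least to most expressive."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The operation that each level adds on top of the previous levels.
OPERATIONS = {
    AttributeType.NOMINAL: "distinctness (=, !=)",
    AttributeType.ORDINAL: "order (<, >)",
    AttributeType.INTERVAL: "differences (+, -)",
    AttributeType.RATIO: "ratios (*, /)",
}


def permitted_operations(attr_type):
    """Return every operation that is meaningful for the given type."""
    return [OPERATIONS[t] for t in AttributeType if t <= attr_type]


# Examples taken from the slide, mapped to their attribute type.
EXAMPLES = {
    "eye color": AttributeType.NOMINAL,
    "grades": AttributeType.ORDINAL,
    "temperature in Celsius": AttributeType.INTERVAL,
    "temperature in Kelvin": AttributeType.RATIO,
}

for name, attr_type in EXAMPLES.items():
    print(f"{name}: {', '.join(permitted_operations(attr_type))}")
```

For instance, a difference of 10 degrees Celsius is meaningful, but "twice as hot" only makes sense on the Kelvin scale, which has a true zero.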
IMPORTANT CHARACTERISTICS OF STRUCTURED DATA
1) Dimensionality
Dimensionality is the number of columns in a dataset, which are also called the
attributes of the data. Adding too many dimensions can make the data very
difficult to analyze, because the data objects become so dissimilar that it is hard
to group them in a meaningful way.
2) Sparsity
Data sparsity is the term used for how much data we have for a particular dimension/entity of
the model. Data is considered sparse when certain expected values in a dataset are missing,
which is a common phenomenon in large-scale data analysis.
3) Resolution
Data resolution is the number of units or digits to which a measured or calculated value is
expressed and used. Patterns depend on the scale; think of weather patterns, such as rainfall
measured over different time periods.
4) Distribution
Data distributions are used often in statistics. A distribution describes how the values in a
dataset are spread and is usually displayed graphically. There are several types of data
distributions; the most familiar are the symmetrical and skewed distributions.
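Three of these characteristics can be measured directly on a small table. The sketch below is a rough illustration only: the table and its column names are invented, sparsity is taken to be the fraction of missing cells, and the mean-versus-median comparison is a simple heuristic for skew rather than a formal statistic.

```python
from statistics import mean, median

# Hypothetical toy table: rows are data objects, columns are attributes.
# None marks a missing value.
columns = ["age", "income", "rainfall_mm"]
table = [
    [25, 30000, 12.4],
    [31, None, 0.0],
    [47, 52000, None],
    [52, 250000, 3.1],
]


def dimensionality(table):
    """Number of attributes (columns) per data object."""
    return len(table[0])


def sparsity(table):
    """Fraction of cells in the table whose value is missing."""
    cells = [value for row in table for value in row]
    return sum(value is None for value in cells) / len(cells)


def skew_direction(values):
    """Rough heuristic: compare the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "symmetric"


incomes = [row[1] for row in table if row[1] is not None]
print("dimensionality:", dimensionality(table))
print("sparsity:", round(sparsity(table), 3))
print("income distribution:", skew_direction(incomes))
```

One very large income pulls the mean well above the median, which is why the income column is reported as right-skewed.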
Record
•Data Matrix
•Document Data
•Transaction Data
Graph
•World Wide Web
•Molecular Structures
Ordered
•Spatial Data
•Temporal Data
•Genetic Sequence etc.
DATA QUALITY
Noise and Outliers
•Noise refers to modification of original values
•Outliers are data objects with characteristics that are considerably different than most of the other data
objects in the data set.
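One common way to flag outliers is the interquartile range (IQR) rule. The slides do not name a specific method, so the sketch below is an assumption: the function name `find_outliers` and the conventional factor of 1.5 were chosen here for illustration.

```python
from statistics import quantiles


def find_outliers(values, factor=1.5):
    """Return values lying more than factor * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(readings))
```

A flagged value is not automatically an error. It may be noise, or it may be the most interesting object in the data set, as in fraud detection.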
Missing Values
•Information is not collected
•Attributes may not be applicable to all cases
•We can handle missing values by eliminating the affected data objects or attributes, or by filling them in with a statistical estimate such as the mean or median
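The two ways of handling missing values can be sketched as follows. This is a minimal illustration under assumptions: missing entries are represented by `None`, and the mean is used as the statistical fill value. The function names are made up for this example.

```python
from statistics import mean


def drop_missing(values):
    """Eliminate entries whose value is missing."""
    return [v for v in values if v is not None]


def fill_with_mean(values):
    """Replace each missing entry with the mean of the observed values."""
    observed_mean = mean(drop_missing(values))
    return [observed_mean if v is None else v for v in values]


# Hypothetical temperature readings with two missing measurements.
temperatures = [20.0, None, 22.0, 24.0, None]
print(drop_missing(temperatures))
print(fill_with_mean(temperatures))
```

Dropping keeps only measured values but shrinks the data set, while filling keeps every data object at the cost of introducing estimated values.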
Duplicate Data
•Data set may include data objects that are duplicates, or almost duplicates of one another.
•Major issue when merging data from heterogeneous sources.
•Data cleaning can resolve the problem of duplicate data.
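A simple cleaning step for duplicates and near duplicates is to normalize each record before comparing it with the records already seen. The sketch below assumes records are (name, email) pairs merged from two sources. The record layout and the normalization rules are illustrative assumptions, not something the slides specify.

```python
def normalize(record):
    """Build a comparison key that ignores case and extra whitespace."""
    name, email = record
    return (" ".join(name.lower().split()), email.strip().lower())


def deduplicate(records):
    """Keep the first occurrence of each record, compared by normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records merged from two heterogeneous sources.
merged = [
    ("Ada Lovelace", "ada@example.com"),
    ("ada  lovelace ", "ADA@example.com"),  # near duplicate of the first
    ("Alan Turing", "alan@example.com"),
]
print(deduplicate(merged))
```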
DATA PREPROCESSING
DATA VISUALIZATION
Data visualization is the graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data. Data visualization tools and technologies are essential to analyze massive amounts of
information and make data-driven decisions.
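Even a plain-text histogram can reveal trends and outliers at a glance. The sketch below uses only the standard library and assumes integer values. In practice a plotting library would normally be used, and the function name `text_histogram` is made up for this example.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value in it."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * counts[start]}")
    return lines


# Hypothetical integer measurements; the last value is an outlier.
values = [1, 2, 11, 12, 13, 95]
for line in text_histogram(values, bin_width=10):
    print(line)
```

The isolated bar far from the others makes the outlier visible without computing any statistic.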
TECHNIQUES
Market Basket Analysis
Education
Manufacturing Engineering
Research Analysis
Fraud Detection
APPLICATIONS
Market Basket Analysis
Digital Media & Entertainment
Manufacturing & Automobile
E-Commerce & CRM
Healthcare