Md. Sanzidul Islam
M.Sc. in CSE, DIU
ID: 211-25-953
There is a huge amount of data available in the
information industry. This data is of no use until it is
converted into useful information. It is necessary to
analyze this huge amount of data and extract useful
information from it.
Extraction of information is not the only process;
data mining also involves other processes such as Data
Cleaning, Data Integration, Data Transformation, Data
Mining, Pattern Evaluation and Data Presentation.
Once all these processes are over, we are able to use
this information in many applications such as fraud
detection, market analysis, science exploration, etc.
What is Data Mining?
Why Data Mining?
What is the KDD Process?
On What Kind of Data?
Data Mining Techniques
Data Mining Query Language
Applications of Data Mining
Extraction of interesting
patterns or knowledge
from a huge amount of data
(Knowledge Discovery
from Data)
One of the steps of the KDD
process
The explosive growth of data: from
terabytes to petabytes
We are drowning in data, but starving for
knowledge!
Fraud detection and detection of unusual
patterns
Data cleaning: to remove noise and inconsistent data
Data integration: where multiple data sources may be combined
Data selection: related data
Data transformation: unified format
Data mining: extract patterns
Pattern evaluation: to identify the truly interesting patterns representing knowledge
Knowledge presentation: present the mined knowledge to the user
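The steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the two data sources, the field names, and the choice of "count how often each item occurs" as the mining step are assumptions made for the example, not part of the KDD process itself.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "item": "Computer"},
    {"customer": "c1", "item": "software"},
    {"customer": "c2", "item": None},
]
store_b = [
    {"customer": "c3", "item": "COMPUTER"},
    {"customer": "c3", "item": "Software"},
]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields related to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: bring item names into one unified format."""
    return [{**r, "item": r["item"].lower()} for r in records]


def mine(records):
    """Data mining: here, simply count how often each item occurs."""
    return Counter(r["item"] for r in records)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {item: n for item, n in patterns.items() if n >= min_count}


def present(patterns):
    """Knowledge presentation: format the result for the user."""
    return [f"{item}: {n}" for item, n in sorted(patterns.items())]


combined = integrate(clean(store_a), clean(store_b))
relevant = transform(select(combined, ["item"]))
report = present(evaluate(mine(relevant), min_count=2))
print(report)
```

Each function maps to one step, so a different mining technique can replace `mine` without touching the rest of the chain.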
Relational databases: a collection of tables
Data warehouses: data from different sources
Transactional databases: consist of a file where each record represents a transaction
Advanced data and applications: multimedia, spatial data and the WWW
Classification is the process of predicting the class
of a new item.
The goal is therefore to classify the new item, that is,
to identify the class to which it belongs.
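As a minimal sketch of classifying a new item, the example below uses a nearest-neighbor rule: the new item receives the class of the most similar known item. The nearest-neighbor rule is just one possible classifier chosen for illustration, and the training data and class names are made up.

```python
import math

# Hypothetical labeled items: (height_cm, weight_kg) -> class label.
training = [
    ((150, 45), "small"),
    ((155, 50), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
]


def classify(item, training):
    """Return the class of the training item closest to `item`."""
    _, nearest_label = min(
        training, key=lambda pair: math.dist(pair[0], item)
    )
    return nearest_label


label = classify((178, 80), training)
print(label)
```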
Group data into clusters
Similar data is grouped in the same cluster
Dissimilar data is grouped in different clusters
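A small sketch of clustering, assuming one-dimensional data and the k-means method (one of many clustering algorithms, picked here only for illustration). The values and the starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center
    to the mean of its group; repeat a fixed number of times."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centers


values = [1, 2, 3, 20, 21, 22]
clusters, centers = kmeans_1d(values, centers=[1, 22])
print(clusters, centers)
```

Similar values end up in the same cluster, while the two dissimilar groups are kept apart.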
“Regression deals with the
prediction of a value, rather
than a class.”
Regression is a data mining
function that predicts a number.
For example, a regression
model could be used to predict
children's height, given their
age, weight, and other factors.
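The height example can be sketched with a simple least-squares line. To keep it short, this sketch assumes a single input (age) rather than age, weight, and other factors, and the numbers are invented for illustration rather than real growth data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: age in years, height in cm.
ages = [2, 4, 6, 8]
heights = [88, 101, 114, 127]

slope, intercept = fit_line(ages, heights)
predicted_height = intercept + slope * 10  # predicted height at age 10
print(slope, intercept, predicted_height)
```

Unlike a classifier, the model returns a number (a height) rather than a class label.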
“An association algorithm creates
rules that describe how often events
have occurred together.”
Example: When a customer buys a
computer, then 90% of the time
they will also buy software.
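The 90% figure in such a rule is usually called the confidence of the rule. A minimal sketch of how it could be computed is shown below; the transaction list is hypothetical and was constructed so that the rule "computer, then software" comes out at 90%.

```python
# Hypothetical transactions: each one is the set of items bought together.
transactions = (
    [{"computer", "software"}] * 9   # computer bought with software
    + [{"computer"}] * 1             # computer bought alone
    + [{"printer"}] * 10             # unrelated purchases
)


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also
    contain `consequent`."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)


rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(rule_confidence)
```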
A Data Mining Query Language (DMQL) can support interactive
data mining.
It adopts an SQL-like syntax.
Hence, it can be easily integrated with relational query
languages.
Market Basket Analysis
Market basket analysis is a modeling technique based upon the theory that
if you buy a certain group of items you are more likely to buy another
group of items.
This information may help the retailer to know the buyer’s needs, and
the retailer can improve the store’s layout accordingly.
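One simple way to look for items that are bought together is to count how often each pair of items appears in the same basket. The sketch below assumes a handful of made-up baskets and an arbitrary threshold of two baskets; real analyses typically use larger data and more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

# Count every pair of items that occurs in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs seen together in at least two baskets.
frequent_pairs = sorted(p for p, n in pair_counts.items() if n >= 2)
print(frequent_pairs)
```

A retailer could use such pairs as a hint for which products to place near each other.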
Bioinformatics
Mining biological data helps to extract useful knowledge from massive
datasets gathered in biology and in other related life sciences areas.
Applications of data mining to bioinformatics include
gene finding, protein function inference, disease diagnosis, and disease
treatment.
Education
Data mining can be used by an institution to make accurate
decisions and also to predict students' results.
The learning patterns of the students can be captured and used to
develop techniques to teach them.
Customer Relationship Management (CRM)
To maintain a proper relationship with a customer, a business
needs to collect data and analyze the information.
With data mining technologies, the collected data can be
used for analysis.