introDM.ppt introduced the data mining system and

MohsinAli469958, 7 slides, Aug 28, 2024

Introduction to Data Mining

Why Mine Data? Commercial Viewpoint
•Lots of data is being collected and warehoused
–Web data, e-commerce
–Purchases at department/grocery stores
–Bank/credit card transactions
•Twice as much information was created in 2002 as in 1999 (~30% growth rate per year)
•Other growth-rate estimates are even higher
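The "~30%" figure can be checked by solving (1 + r)^3 = 2 for the compound annual growth rate r over the three years from 1999 to 2002. A minimal Python sketch, assuming growth compounds once per year:

```python
def annual_growth_rate(growth_factor: float, years: float) -> float:
    """Return r such that (1 + r) ** years == growth_factor (compound annual growth)."""
    return growth_factor ** (1.0 / years) - 1.0


# Twice as much information in 2002 as in 1999: growth factor 2 over 3 years.
rate = annual_growth_rate(2.0, 2002 - 1999)
print(f"{rate:.1%}")  # 26.0%
```

Doubling in three years works out to roughly 26% per year, consistent with the rounded ~30% on the slide.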

Largest Databases in the World
•Largest database in the world: the World Data Centre for Climate (WDCC), operated by the Max Planck Institute and the German Climate Computing Centre
–220 terabytes of data on climate research and climatic trends
–110 terabytes of climate simulation data
–6 petabytes of additional information stored on tapes
•AT&T
–323 terabytes of information
–1.9 trillion phone call records
•Google
–91 million searches per day
–After a year's worth of searches, this figure amounts to roughly 33 billion database entries (91 million × 365)
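A quick arithmetic check of the Google figure, assuming one database entry per search (the slide does not say how entries are counted):

```python
SEARCHES_PER_DAY = 91_000_000  # daily figure quoted on the slide
DAYS_PER_YEAR = 365

# Assumption: each search contributes exactly one database entry.
searches_per_year = SEARCHES_PER_DAY * DAYS_PER_YEAR
print(f"{searches_per_year:,}")                   # 33,215,000,000
print(f"{searches_per_year / 1e9:.1f} billion")  # 33.2 billion
```

Under that assumption the yearly total is about 33.2 billion entries.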

Why Mine Data? Scientific Viewpoint
•Data is collected and stored at enormous speeds (GB/hour), e.g.:
–remote sensors on a satellite
–telescopes scanning the skies
–scientific simulations generating terabytes of data
•Very little of this data will ever be looked at by a human
•Knowledge discovery is NEEDED to make sense of the data and put it to use
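To give a sense of scale for "GB/hour", here is a minimal sketch assuming a hypothetical instrument that produces a steady 1 GB per hour (the slide gives no specific rate), using decimal units and a 365-day year:

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def yearly_volume_tb(gb_per_hour: float) -> float:
    """Volume collected in one year at a constant rate, in decimal terabytes (1 TB = 1000 GB)."""
    return gb_per_hour * HOURS_PER_YEAR / 1000


# Hypothetical instrument streaming a steady 1 GB per hour.
print(yearly_volume_tb(1.0))  # 8.76
```

At that rate a single source accumulates close to 9 TB a year; a handful of such sources quickly reaches the terabyte scale mentioned above.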

Data Mining
•Data mining is the process of automatically discovering useful information in large data repositories.
•Human analysts may take weeks to discover useful information.
•Much of the data is never analyzed at all.
[Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts, 1995 to 1999]
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”

What is (not) Data Mining?
•What is Data Mining?
–Certain names are more prevalent in certain locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
–Discover groups of similar documents on the Web
•What is not Data Mining?
–Look up a phone number in a phone directory
–Query a Web search engine for information about “Amazon”
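The contrast between looking something up and discovering a pattern can be sketched in Python. The records, the two-character prefix rule, and the thresholds below are hypothetical choices made for illustration; "lift" (a prefix's share within a region divided by its share overall) is one simple measure of "more prevalent", not a method taken from these slides.

```python
from collections import Counter

# Hypothetical toy data for illustration only: (surname, region) pairs.
RECORDS = [
    ("O'Brien", "Boston"), ("O'Reilly", "Boston"), ("O'Rourke", "Boston"),
    ("Smith", "Boston"), ("Garcia", "Austin"), ("Smith", "Austin"),
    ("Nguyen", "Austin"), ("O'Brien", "Austin"), ("Lee", "Austin"),
]


def lookup(records, surname):
    """Not data mining: answer a question we already know how to ask."""
    return [region for name, region in records if name == surname]


def overrepresented_prefixes(records, prefix_len=2, min_lift=1.5, min_count=2):
    """Mining sketch: report (prefix, region, lift) triples where a surname
    prefix is at least `min_lift` times more common in a region than in the
    data overall, and occurs at least `min_count` times in that region."""
    total = len(records)
    prefix_counts = Counter(name[:prefix_len] for name, _ in records)
    region_counts = Counter(region for _, region in records)
    pair_counts = Counter((name[:prefix_len], region) for name, region in records)

    patterns = []
    for (prefix, region), count in pair_counts.items():
        if count < min_count:
            continue  # too few occurrences to call it a pattern
        share_in_region = count / region_counts[region]
        share_overall = prefix_counts[prefix] / total
        lift = share_in_region / share_overall
        if lift >= min_lift:
            patterns.append((prefix, region, lift))
    return sorted(patterns)


print(lookup(RECORDS, "O'Brien"))  # ['Boston', 'Austin']
for prefix, region, lift in overrepresented_prefixes(RECORDS):
    print(f"{prefix!r} in {region}: lift {lift:.4f}")
```

The lookup answers a question posed in advance. The second function is never told about "O’" or Boston; it surfaces that pattern from the counts alone, which is the distinction the slide draws.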

Origins of Data Mining
•Draws ideas from machine learning/AI, statistics, and database systems
[Figure: diagram relating Data Mining to Machine Learning, Statistics, and Database systems]