dewashishpradhan010
45 slides
Oct 19, 2024
About This Presentation
Text and Web Mining
Slide Content
Decision Support and Business Intelligence Systems
(9th Ed., Prentice Hall)
Chapter 7: Text and Web Mining
[Figure: process diagram; box labels in flow order]
- Statements Transcribed for Processing
- Statements Labeled as Truthful or Deceptive By Law Enforcement
- Cues Extracted & Selected
- Text Processing Software Identified Cues in Statements
- Text Processing Software Generated Quantified Cues
- Classification Models Trained and Tested on Quantified Cues
The three-step text mining process
Task 1. Establish the Corpus: Collect & Organize the Domain Specific Unstructured Data
- The inputs to the process include a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
- The output of Task 1 is a collection of documents in some digitized format for computer processing.
Task 2. Create the Term-Document Matrix: Introduce Structure to the Corpus
- The output of Task 2 is a flat file called the term-document matrix, where the cells are populated with the term frequencies.
Task 3. Extract Knowledge: Discover Novel Patterns from the T-D Matrix
- The output of Task 3 is a number of problem-specific classification, association, and clustering models and visualizations.
Feedback
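To make Task 2 concrete, here is a minimal sketch of building a term-document matrix whose cells hold raw term frequencies. The function names `tokenize` and `term_document_matrix`, the lowercase letters-only tokenizer, and the sample documents are assumptions made for illustration; the slides do not prescribe a particular tokenizer or software package.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (an illustrative choice)."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (terms, matrix): one row per document, one column per term.

    Each cell holds the raw frequency of the term in that document.
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Vocabulary: every distinct term in the corpus, sorted for stable columns.
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for term in terms] for c in counts]
    return terms, matrix


if __name__ == "__main__":
    corpus = [
        "Text mining finds patterns in text",
        "Web mining studies Web pages",
    ]
    terms, matrix = term_document_matrix(corpus)
    print(terms)
    for row in matrix:
        print(row)
```

Raw counts are the simplest way to populate the cells; a real project would usually also remove stop words and stem terms before counting.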
[Figure: nine bar-chart panels, CLUSTER: 1 through CLUSTER: 9; each panel has the categories ISR, JMIS, MISQ; vertical axis label ending in "of Articles", ticks 0 to 100]
Web Mining
Web Structure Mining
- Source: the uniform resource locator (URL) links contained in the Web pages
Web Content Mining
- Source: unstructured textual content of the Web pages (usually in HTML format)
Web Usage Mining
- Source: the detailed description of a Web site's visits (sequence of clicks by sessions)
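As an illustration of the data source for Web structure mining, the following sketch pulls the URL links out of HTML pages and collects them into a simple link graph. It uses only the Python standard library; the class name `LinkExtractor`, the helpers `extract_links` and `build_link_graph`, and the example URLs are assumptions for illustration, not part of the slides.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" to absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs linked from one HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def build_link_graph(pages):
    """Map each page URL to the list of URLs it links to.

    `pages` is a dict of {url: html_text} that has already been collected.
    """
    return {url: extract_links(html, url) for url, html in pages.items()}


if __name__ == "__main__":
    pages = {
        "http://example.com/index.html":
            '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>',
    }
    print(build_link_graph(pages))
```

The resulting graph is the raw input that link-analysis methods work on; the page text itself, which is what Web content mining uses, is ignored here.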
User / Customer
Website
Weblogs
Pre-Process Data
- Collecting
- Merging
- Cleaning
- Structuring
  - Identify users
  - Identify sessions
  - Identify page views
  - Identify visits
Extract Knowledge
- Usage patterns
- User profiles
- Page profiles
- Visit profiles
- Customer value
How to better the data
How to improve the Web site
How to increase the customer value
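The "Identify users" and "Identify sessions" steps can be sketched as follows. This is a minimal example under stated assumptions: the weblog has already been cleaned into (user, timestamp in seconds, page) records, and a new session starts after 30 minutes of inactivity. The record layout, the timeout value, and the function name `sessionize` are illustrative choices, not something the slides specify.

```python
from collections import defaultdict

# Assumed inactivity threshold: a gap longer than this starts a new session.
SESSION_TIMEOUT = 30 * 60  # seconds


def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group click records into per-user sessions.

    `clicks` is an iterable of (user, timestamp_seconds, page) tuples.
    Returns {user: [[page, page, ...], ...]}, one inner list per session.
    """
    by_user = defaultdict(list)
    for user, ts, page in clicks:
        by_user[user].append((ts, page))

    sessions = {}
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by time
        user_sessions = []
        last_ts = None
        for ts, page in events:
            # Start a new session on the first click or after a long gap.
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])
            user_sessions[-1].append(page)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions


if __name__ == "__main__":
    log = [
        ("u1", 0, "/home"),
        ("u1", 60, "/products"),
        ("u1", 4000, "/home"),
        ("u2", 10, "/home"),
    ]
    print(sessionize(log))
```

Each inner list is one click sequence; usage patterns and visit profiles would be derived from these sequences in the knowledge-extraction step.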