Introduction to Web Mining and Spatial Data Mining
33 slides
May 18, 2021
About This Presentation
Data Warehousing and Mining is a subject offered at Gujarat Technological University in the Information Technology branch.
This topic is from Chapter 8, Advanced Topics.
Size: 13.15 MB
Language: en
Added: May 18, 2021
Slides: 33 pages
Slide Content
Gujarat Technological University
Introduction to Web Mining and Spatial Data Mining
Active Learning Assignment of Data Warehousing and Mining (3161610)
Prepared by: Aarsh Dhokai, Dharmam Savani
Guided by: Prof. Ravi Patel
A. D. Patel Institute of Technology
What is Data Mining? Data mining is the process of extracting and discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.
What is Web Mining? Web mining is the application of data mining techniques to automatically discover and extract information from Web documents and services. Its main purpose is to discover useful information from the World Wide Web and its usage patterns.
Data mining vs. web mining
Definition
  Data mining: the process that attempts to discover patterns and hidden knowledge in large data sets in any system.
  Web mining: the application of data mining techniques to automatically discover and extract information from web documents.
Application
  Data mining: very useful for finding patterns in large batches of data.
  Web mining: very useful for a particular website or e-service.
Performed by
  Data mining: data scientists and data engineers.
  Web mining: data scientists along with data analysts.
Access
  Data mining: accesses data privately.
  Web mining: accesses data publicly.
Structure
  Data mining: gets information from an explicit structure.
  Web mining: gets information from structured, unstructured, and semi-structured web pages.
Problem type
  Data mining: clustering, classification, regression, prediction, optimization, and control.
  Web mining: web content mining, web structure mining, web usage mining.
Tools
  Data mining: includes tools such as machine learning algorithms.
  Web mining: special tools include Scrapy, PageRank, and Apache logs.
Skills
  Data mining: approaches for data cleansing, machine learning algorithms, statistics and probability.
  Web mining: application-level knowledge and data engineering, with mathematical modules such as statistics and probability.
Why Web mining? Web mining is the application of data mining techniques to discover patterns, structures, and knowledge from the Web. The World Wide Web is a fertile source for data mining: it serves as a huge, widely distributed, global information center for news, advertisements, consumer information, financial management, education, government, and e-commerce.
Types of web mining
Web Content Mining
Web content mining is the process of extracting useful information from the content of web documents. Web content consists of several types of data: text, image, audio, video, or structured records such as lists and tables. Web content mining has been studied extensively by researchers, search engines, and other web service companies. Because it can build links across multiple web pages about the same individual, it has the potential to inappropriately disclose personal information.
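As a rough illustration of extracting text and structured records (here, list items) from a web document, the following sketch uses Python's standard `html.parser` module. The class name `ContentExtractor`, the helper `extract_content`, and the sample page are hypothetical and chosen only for this example; real pages would need more robust handling.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Separates free text from list items in an HTML document (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # text found outside <li> elements
        self.list_items = []   # text found inside <li> elements
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        cleaned = data.strip()
        if not cleaned:
            return  # skip whitespace-only fragments
        if self._in_li:
            self.list_items.append(cleaned)
        else:
            self.text_parts.append(cleaned)


def extract_content(html):
    """Return the text and list items found in an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"text": parser.text_parts, "list_items": parser.list_items}


# Hypothetical sample page, not taken from the presentation.
SAMPLE_PAGE = (
    "<html><body><h1>Laptops</h1><p>Our top picks.</p>"
    "<ul><li>Model A</li><li>Model B</li></ul></body></html>"
)
result = extract_content(SAMPLE_PAGE)
print(result)
```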
Web Content Mining Web content mining is done to:-
Web Structure Mining
Web structure mining uses graph theory to analyze the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds:
Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects a web page to a different location.
Mining the document structure: analysis of the tree-like structure of pages to describe HTML or XML tag usage.
Web structure mining terminology:
Web graph: directed graph representing the web.
Node: a web page in the graph.
Edge: a hyperlink.
In-degree: number of links pointing to a particular node.
Out-degree: number of links originating from a particular node.
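The terminology above can be sketched in a few lines of Python. This is a minimal example that assumes the web graph is given as a list of (source page, target page) hyperlink pairs; the function name `degree_counts` and the page names are hypothetical.

```python
from collections import defaultdict


def degree_counts(links):
    """Compute in-degree and out-degree for every node of a directed web graph.

    `links` is an iterable of (source, target) pairs, one per hyperlink.
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for source, target in links:
        out_degree[source] += 1
        in_degree[target] += 1
        # Touch the opposite counters so every node appears in both results.
        in_degree[source] += 0
        out_degree[target] += 0
    return dict(in_degree), dict(out_degree)


# Hypothetical four-link site used only for illustration.
LINKS = [
    ("home", "about"),
    ("home", "products"),
    ("products", "home"),
    ("about", "products"),
]
in_deg, out_deg = degree_counts(LINKS)
print("in-degree:", in_deg)
print("out-degree:", out_deg)
```

Pages with a high in-degree are linked to by many others, which is the intuition that link-analysis methods build on.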
Web Structure Mining Web structure mining is done to :-
Web Usage Mining
Web usage mining is the process of extracting useful information from server logs of user activity. The usage data is classified into three kinds:
Web Server Data: user logs collected by the web server, including IP address, page reference, and access time.
Application Server Data: the ability to track various kinds of business events and log them in application server logs.
Application Level Data: new kinds of events can be defined, and logging them generates histories of those events.
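To make the web server data concrete, here is a small sketch that pulls the IP address, access time, and page reference out of log lines and counts hits per page. It assumes the logs follow the Apache Common Log Format; the regular expression, the function names, and the sample lines are illustrative assumptions, not something given in the slides.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "method page protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def page_hits(lines):
    """Count how many requests each page received, skipping malformed lines."""
    hits = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            hits[record["page"]] += 1
    return hits


# Hypothetical log lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [18/May/2021:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '192.0.2.2 - - [18/May/2021:10:00:05 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '192.0.2.1 - - [18/May/2021:10:01:00 +0000] "GET /products.html HTTP/1.1" 200 2048',
    "malformed line",
]
hits = page_hits(SAMPLE_LOG)
first = parse_line(SAMPLE_LOG[0])
print(hits.most_common())
```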
Web Usage Mining Web usage mining is done to :-
Tools for web mining
Applications of web mining
in business
Security and Crime Investigation
Government agencies are using this technology to classify threats and fight terrorism. The predictive capability of mining applications can benefit society by identifying criminal activities. Terrorist groups use the Web as their infrastructure for various purposes. Web usage mining aims to track down online access to abnormal content, which may include terrorist-generated sites, by analyzing the content of information accessed by Web users.
Search engines
Web mining helps improve the power of web search engines, such as Google and Yahoo, by classifying web documents and identifying web pages. The use of data mining in a web search engine helps in analyzing content while delivering results that are relevant to users. As a result, digital marketers who focus on creating valuable content for users are sure to benefit from the impact of data mining on SEO.
Advantages of web mining
Challenges in web mining
Spatial data mining
What is Spatial Data? Spatial data is any data with a direct or indirect reference to a specific location or geographical area. Spatial data is often referred to as geospatial data or geographic information.
Introduction to Spatial Data Mining
Spatial Data Mining Tasks
Classification: finds a set of rules that determine the class of an object according to its attributes, e.g., "Classify remotely sensed images based on spectrum and GIS data."
Association Rules: find (spatially related) rules from the database. Association rules describe patterns that occur often in the database. An association rule has the form A → B (s%, c%), where s is the support of the rule (the probability that A and B hold together in all the possible cases) and c is the confidence (the conditional probability that B is true given A), e.g., "Rain(x, pour) ⇒ Landslide(x, happen), support is 76%, and confidence is 51%."
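The support and confidence of a rule A → B can be computed directly from their definitions. The sketch below assumes each record is the set of conditions observed at one location; the function name and the five toy records are hypothetical and are not data from the presentation.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = fraction of all records containing both A and B
    confidence = fraction of records containing A that also contain B
    """
    total = len(records)
    with_a = sum(1 for r in records if antecedent in r)
    with_both = sum(1 for r in records if antecedent in r and consequent in r)
    support = with_both / total if total else 0.0
    confidence = with_both / with_a if with_a else 0.0
    return support, confidence


# Hypothetical observations at five locations (illustrative only).
RECORDS = [
    {"heavy_rain", "landslide"},
    {"heavy_rain", "landslide"},
    {"heavy_rain"},
    {"landslide"},
    set(),
]
support, confidence = support_and_confidence(RECORDS, "heavy_rain", "landslide")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data the rule holds at 2 of 5 locations (support 0.40) and at 2 of the 3 locations with heavy rain (confidence about 0.67).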
Spatial Data Mining Tasks
Clustering: groups the objects in a database into clusters such that objects in one cluster are similar and objects from different clusters are dissimilar, e.g., we can find clusters of cities with a similar level of unemployment, or we can cluster pixels into similarity classes based on spectral characteristics.
Trend Detection: finds trends in a database. A trend is a temporal pattern in time series data. A spatial trend is defined as a pattern of change of a non-spatial attribute in the neighborhood of a spatial object, e.g., Google Maps traffic detection.
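As one possible way to cluster cities by unemployment level, the sketch below runs a simple one-dimensional k-means. The slides do not name a specific clustering algorithm, so k-means is an assumption here, as are the city names, the rates, and the fixed starting centroids (fixed so the result is reproducible).

```python
def cluster_cities(rates, centroids, iterations=10):
    """Group cities by unemployment rate with a basic 1-D k-means.

    `rates` maps city name -> unemployment rate (percent).
    `centroids` is the list of starting cluster centers.
    Returns (final_centroids, groups), where groups[i] lists the cities
    assigned to centroid i in the last assignment step.
    """
    centroids = list(centroids)
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each city in the cluster with the nearest center.
        groups = [[] for _ in centroids]
        for city, rate in rates.items():
            nearest = min(range(len(centroids)), key=lambda i: abs(rate - centroids[i]))
            groups[nearest].append(city)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centroids = [
            sum(rates[c] for c in group) / len(group) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids, groups


# Hypothetical unemployment rates, for illustration only.
RATES = {"City A": 3.0, "City B": 3.5, "City C": 4.0, "City D": 9.0, "City E": 10.0}
centers, groups = cluster_cities(RATES, centroids=[3.0, 10.0])
print(centers, groups)
```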
Spatial Data Mining Tasks
Characteristic Rules: describe a common character of one kind, or several kinds, of spatial entity. They are a kind of tested knowledge that summarizes similar features of objects in a target class, e.g., "Characterize similar ground objects in a large set of remote sensing images."
Discriminant Rules: describe differences between two parts of a database, e.g., compare land prices at the urban boundary with land prices in the urban center.
Spatial Database
A spatial database is similar to a plain relational database, but in addition to storing data on qualitative and quantitative attributes, it stores data about physical location and feature geometry type. Every record in a spatial database is stored with numeric coordinates that represent where that record occurs on a map, and each feature is represented by exactly one of these three geometry types:
Point
Line
Polygon
A spatial database stores a large amount of space-related data: maps, remote sensing, medical imaging, VLSI chip layout.
Spatial Database
Whether you want to calculate the distance between two places on a map or determine the area of a particular piece of land, you can use spatial database querying to quickly and easily make automated spatial calculations on entire sets of records at one time. You can also use spatial databases to perform almost all the same types of calculations on, and manipulations of, attribute data that you can in a plain relational database system.
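The two calculations mentioned above, distance between two places and area of a piece of land, can be sketched outside a database as follows. This is not how any particular spatial database implements them: it assumes a spherical Earth with a mean radius of 6371 km for distance (the haversine formula) and flat, planar coordinates for area (the shoelace formula). The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polygon_area(vertices):
    """Area of a simple polygon from planar (x, y) vertices, via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A 4 x 3 rectangular plot in planar units.
plot_area = polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)])
# Two points on the equator, half a world apart.
half_equator = haversine_km(0.0, 0.0, 0.0, 180.0)
print(plot_area, round(half_equator, 1))
```

A spatial database would typically expose such operations as built-in query functions and apply them to whole sets of records at once.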
Spatial Classification
Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river).
Examples: classifying medium-size families according to income, region, and infant mortality rates; mining data for volcanoes on Venus.
Methods employed include decision-tree classification, Naïve Bayesian classifier with boosting, neural networks, etc.
Spatial Trend Analysis
Applications of spatial data mining
Domain: Spatial data mining application
Public Safety: Discovery of hotspot patterns from crime event maps
Epidemiology: Detection of disease outbreaks
Neuroscience: Discovering patterns of human brain activity from neuroimages
Climate Science: Finding positive or negative correlations between temperatures of distant places
Business: Market allocation to maximize stores' profits
Other Applications
Spatial data mining is used in:
Space technology: ISRO GPS system
Security: the National Crime Records Bureau uses spatial data to track down criminals
GIS, geo-marketing, remote sensing, image database exploration, medical imaging, navigation
Challenges in spatial data mining
Complexity of spatial data types and access methods.
Large amounts of data.
Requires huge data storage facilities.