UNIT - 5: Data Warehousing and Data Mining

1,268 views 51 slides Feb 27, 2024

About This Presentation

UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects – Spatial Data Mining – Multimedia Data Mining – Text Mining – Mining the World Wide Web.


Slide Content

DATA WAREHOUSING AND DATA MINING UNIT – 5 Prepared by Mr. P. Nandakumar Assistant Professor, Department of IT, SVCET

Mining Object, Spatial, Multimedia, Text, and Web Data
In general terms, "mining" is the process of extraction. In computer science, data mining is also referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. Semi-structured and unstructured data, including spatial, multimedia, text, and web data, require different mining methodologies than structured data.
Mining Multimedia Data: Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages. Multimedia data mining looks for interesting patterns in multimedia databases. It involves processing digital data and performing tasks such as image processing, image classification, video and audio data mining, and pattern recognition.

Mining Object, Spatial, Multimedia, Text, and Web Data
Multimedia data mining has become an active research area because data from social media platforms such as Twitter and Facebook can be analyzed with it to derive interesting trends and patterns.
Mining Web Data: Web mining discovers patterns and knowledge from the Web. Web content mining analyzes data from websites, including web pages and the multimedia data, such as images, embedded in them. Web mining is used to understand the content of web pages, the unique users of a website, unique hypertext links, web page relevance and ranking, web page content summaries, the time users spend on a particular website, and user search patterns. It can also be used to compare search engines and study the search algorithms they use, which helps improve search efficiency and helps users find a suitable search engine.

Mining Object, Spatial, Multimedia, Text, and Web Data
Mining Text Data: Text mining draws on data mining, machine learning, natural language processing (NLP), and statistics. Most of the information in daily life is stored as text: news articles, technical papers, books, email messages, and blogs. Text mining retrieves high-quality information from text through tasks such as sentiment analysis, document summarization, text categorization, and text clustering. Machine learning models and NLP techniques are applied to find hidden patterns and trends, using means such as statistical pattern learning and statistical language modeling. Before mining, the text is preprocessed: stemming and lemmatization normalize the words, and the normalized text is then converted into data vectors.

Mining Object, Spatial, Multimedia, Text, and Web Data
Mining Spatiotemporal Data: Spatiotemporal data is data related to both space and time. Spatiotemporal data mining retrieves interesting patterns and knowledge from such data. It can help estimate the value of land, determine the age of rocks and precious stones, and predict weather patterns. Practical applications include GPS in mobile phones, timers, Internet-based map services, weather services, satellites, RFID, and sensors.

Mining Object, Spatial, Multimedia, Text, and Web Data
Mining Data Streams: Stream data changes dynamically, is noisy and inconsistent, and contains multidimensional features of different data types. For this reason it is often stored in NoSQL database systems. NoSQL refers to non-SQL or non-relational databases, which provide mechanisms for storing and retrieving data that are modeled in ways other than the tabular relations used in relational databases. Such databases typically do not rely on fixed relational tables and are commonly used for big data and real-time web applications. The volume of stream data is very high, which is the main challenge for mining it effectively. Mining data streams involves tasks such as clustering, outlier analysis, and the online detection of rare events.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
Multidimensional analysis is the analysis of dimension objects organized in meaningful hierarchies. It lets users view data from different viewpoints and so spot trends or exceptions, which is especially important in business. A hierarchy is an ordered series of related levels within a dimension, such as day, month, and year. In statistics, econometrics, and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. The multidimensional data model is composed of logical cubes, measures, dimensions, hierarchies, levels, and attributes. The model is simple because it defines objects that represent real-world business entities.
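The idea of viewing one measure from several viewpoints can be sketched in a few lines. This is a minimal illustration, not a real OLAP engine: the fact table, the dimension names (region, city, quarter), and the `roll_up` helper are all invented for the example.

```python
from collections import defaultdict

# Hypothetical fact table: (region, city, quarter, sales). All values are invented.
FACTS = [
    ("South", "Chennai", "Q1", 120),
    ("South", "Chennai", "Q2", 150),
    ("South", "Tirupati", "Q1", 80),
    ("North", "Delhi", "Q1", 200),
    ("North", "Delhi", "Q2", 170),
]

DIMENSIONS = ("region", "city", "quarter")


def roll_up(facts, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last column is the measure
    return dict(totals)


# Two viewpoints on the same data: by region, and by region and quarter.
by_region = roll_up(FACTS, ["region"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
print(by_region)
print(by_region_quarter)
```

Dropping the city dimension corresponds to moving up the region/city hierarchy; keeping more dimensions gives a finer-grained view of the same measure.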

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
Complex data types require advanced data mining techniques. Examples include sequence data, which covers time series, symbolic sequences, and biological sequences. These types need additional preprocessing steps before mining.
1. Time-Series Data Mining: Time-series data is a long series of numerical or textual values measured at equal time intervals, such as per minute, per hour, or per day. Time-series data mining is performed on data from stock markets, scientific experiments, and medicine. Because an exact match for a given query is rarely present in the data, a similarity search is used instead: it finds the data sequences that are similar to the query sequence. Within similarity search, subsequence matching finds the subsequences that are similar to a given query sequence.
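Subsequence matching can be illustrated with a brute-force sliding window. The sketch below assumes Euclidean distance as the similarity measure and uses invented price data; practical systems usually normalize the windows and use indexing to avoid scanning every position.

```python
import math


def best_subsequence_match(series, query):
    """Return (start_index, distance) of the window of `series` closest to
    `query` under Euclidean distance."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than the series")
    best_start, best_dist = 0, math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - q) ** 2 for w, q in zip(window, query)))
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


# Invented daily prices and a short query shape to look for.
prices = [10.0, 10.5, 11.0, 13.0, 12.0, 11.5, 14.0, 15.0]
pattern = [11.0, 13.1, 12.0]
start, dist = best_subsequence_match(prices, pattern)
print(start, round(dist, 3))
```

No window equals the query exactly, yet the window starting at index 2 is returned as the nearest one, which is the point of similarity search.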

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
2. Sequential Pattern Mining in Symbolic Sequences: Symbolic sequences are long sequences of nominal data whose behavior changes dynamically over time. Examples include online customer shopping sequences and sequences of events in experiments. Mining symbolic sequences is called sequential pattern mining. A sequential pattern is a subsequence that occurs frequently in a set of sequences, so the mining task is to find the frequent subsequences in that set. Many scalable algorithms have been developed to find frequent subsequences, and there are also algorithms for mining multidimensional and multilevel sequential patterns.
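The notion of a frequent subsequence can be made concrete with a brute-force counter. This is only a sketch on invented shopping sessions: it enumerates every candidate of a fixed length, which is exponential, whereas scalable algorithms such as GSP or PrefixSpan prune the candidate space.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, with gaps allowed."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so order is respected.
    return all(item in it for item in pattern)


def frequent_sequences(sequences, length, min_support):
    """Count every candidate pattern of the given length and keep the frequent ones."""
    items = sorted({item for seq in sequences for item in seq})
    result = {}
    for candidate in product(items, repeat=length):
        support = sum(is_subsequence(candidate, seq) for seq in sequences)
        if support >= min_support:
            result[candidate] = support
    return result


# Invented customer sessions.
sessions = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone", "charger"],
    ["laptop", "mouse"],
]
patterns = frequent_sequences(sessions, length=2, min_support=3)
print(patterns)
```

Here support is the number of sessions that contain the pattern, so "phone, then charger" is frequent because it appears in three of the four sessions.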

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
3. Data Mining of Biological Sequences: Biological sequences are long sequences of nucleotides or amino acids, and mining them is needed, for example, to find features of human DNA. Biological sequence analysis, which compares and aligns sequences, is the first step. Two species are considered similar when their nucleotide (DNA, RNA) and protein sequences are close. During mining, the degree of similarity between nucleotide sequences is measured; the similarity obtained by sequence alignment is essential for determining the homology between two sequences. Two or more input sequences may be aligned by identifying similar sequences with long common subsequences. Amino acid (protein) sequences are compared and aligned in the same way.
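One common way to measure similarity by alignment is the Needleman-Wunsch global alignment score, sketched below. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; real analyses use substitution matrices and tuned gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for globally aligning strings a and b.

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


identical = global_alignment_score("GATTACA", "GATTACA")
one_change = global_alignment_score("GATTACA", "GATTTCA")
print(identical, one_change)
```

A higher score means the two sequences align with fewer mismatches and gaps, which is the similarity signal used when judging homology.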

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
4. Graph Pattern Mining: Graph pattern mining can be done with Apriori-based and pattern-growth-based approaches. We can mine the frequent subgraphs of a set of graphs, as well as the set of closed graphs. A graph g is closed if no proper supergraph of g has the same support count as g. Graph pattern mining can target different kinds of patterns, such as frequent graphs, coherent graphs, and dense graphs. Mining efficiency can be improved by applying user constraints to the graph patterns. Graph patterns are of two types: in homogeneous graphs the nodes and links are of the same type and have similar features, while in heterogeneous graphs the nodes and links are of different types.
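The first level of an Apriori-style graph miner counts the support of single-edge patterns. The sketch below shows only that step, on three invented graphs with labeled nodes; it assumes undirected edges without self-loops and does not grow patterns beyond one edge.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return each undirected labeled edge that occurs in at least
    `min_support` graphs, with its support count."""
    support = Counter()
    for edges in graphs:
        # frozenset treats (a, b) and (b, a) as the same edge; the set
        # comprehension counts an edge at most once per graph.
        support.update({frozenset(edge) for edge in edges})
    return {tuple(sorted(e)): s for e, s in support.items() if s >= min_support}


# Invented graph database: each graph is a list of labeled edges.
graph_db = [
    [("C", "O"), ("C", "H"), ("O", "H")],
    [("O", "C"), ("C", "N")],
    [("C", "O"), ("N", "C"), ("C", "H")],
]
edges = frequent_edges(graph_db, min_support=2)
print(edges)
```

A full miner would join or extend these frequent edges into larger candidate subgraphs and recount support at each level.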

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
5. Statistical Modeling of Networks: A network is a collection of nodes, each representing a data object, linked by edges that represent relationships between the objects. If all nodes and links are of the same type, the network is homogeneous, as in a friend network or a web page network. If the nodes and links are of different types, the network is heterogeneous, as in a health-care network that links doctors, nurses, patients, and diseases. Graph pattern mining can be applied to a network to derive knowledge and useful patterns from it.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
6. Mining Spatial Data: Spatial data is geospace-related data stored in large data repositories. It is represented in "vector" format and in geo-referenced multimedia format. A spatial database is constructed from large geographic data warehouses by integrating geographical data from multiple sources. We can construct spatial data cubes that contain spatial dimensions and measures, and perform OLAP operations on them for spatial data analysis. Spatial data mining is performed on spatial data warehouses, spatial databases, and other geospatial data repositories, and it discovers knowledge about geographic areas. It involves operations such as spatial clustering, spatial classification, spatial modeling, and outlier detection in spatial data.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
7. Mining Cyber-Physical System Data: Cyber-physical system data can be mined by constructing a graph or network of the data. A cyber-physical system (CPS) is a heterogeneous network of a large number of interconnected nodes that store, for example, patient or medical information. The links in the CPS network represent the relationships between the nodes. Cyber-physical systems store dynamic, inconsistent, and interdependent data that contains spatiotemporal information. Mining cyber-physical data treats the current situation as a query against a large information database and involves real-time calculation and analysis so that the CPS can respond promptly. CPS analysis requires rare-event detection and anomaly analysis in cyber-physical data streams and networks, and processing cyber-physical data involves integrating stream data with real-time automated control processes.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
8. Mining Multimedia Data: Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages. Multimedia data mining looks for interesting patterns in multimedia databases. It involves processing digital data and performing tasks such as image processing, image classification, video and audio data mining, and pattern recognition. It has become an active research area because data from social media platforms such as Twitter and Facebook can be analyzed with it to derive interesting trends and patterns.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
9. Mining Web Data: Web mining discovers patterns and knowledge from the Web. Web content mining analyzes data from websites, including web pages and the multimedia data, such as images, embedded in them. Web mining is used to understand the content of web pages, the unique users of a website, unique hypertext links, web page relevance and ranking, web page content summaries, the time users spend on a particular website, and user search patterns. It can also be used to compare search engines and study the search algorithms they use, which helps improve search efficiency and helps users find a suitable search engine.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
10. Mining Text Data: Text mining draws on data mining, machine learning, natural language processing (NLP), and statistics. Most of the information in daily life is stored as text: news articles, technical papers, books, email messages, and blogs. Text mining retrieves high-quality information from text through tasks such as sentiment analysis, document summarization, text categorization, and text clustering. Machine learning models and NLP techniques are applied to find hidden patterns and trends, using means such as statistical pattern learning and statistical language modeling. Before mining, the text is preprocessed: stemming and lemmatization normalize the words, and the normalized text is then converted into data vectors.
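Converting text into data vectors can be sketched with a bag-of-words representation. This is a minimal illustration on three invented documents that are assumed to be already tokenized and normalized; the helper names are hypothetical, and weighting schemes such as TF-IDF are left out.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary term occurs in the token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented, already-tokenized documents.
docs = [
    "data mining finds patterns".split(),
    "text mining finds patterns in text".split(),
    "weather is sunny".split(),
]
vocabulary = sorted({t for d in docs for t in d})
vectors = [bag_of_words(d, vocabulary) for d in docs]

print(vocabulary)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Once documents are vectors, tasks such as clustering and categorization can compare them numerically, as the cosine similarity does here.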

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
11. Mining Spatiotemporal Data: Spatiotemporal data is data related to both space and time. Spatiotemporal data mining retrieves interesting patterns and knowledge from such data. It can help estimate the value of land, determine the age of rocks and precious stones, and predict weather patterns. Practical applications include GPS in mobile phones, timers, Internet-based map services, weather services, satellites, RFID, and sensors.

Multidimensional Analysis and Descriptive Mining of Complex Data Objects
12. Mining Data Streams: Stream data changes dynamically, is noisy and inconsistent, and contains multidimensional features of different data types. For this reason it is often stored in NoSQL database systems. The volume of stream data is very high, which is the main challenge for mining it effectively. Mining data streams involves tasks such as clustering, outlier analysis, and the online detection of rare events.
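Online detection of rare events has to work in one pass, because a high-volume stream cannot be stored and rescanned. The sketch below is one simple way to do this, assuming a z-score rule on top of Welford's running mean and variance; the class name, threshold, warm-up length, and sensor readings are all illustrative choices.

```python
import math


class StreamingOutlierDetector:
    """Flags values far from the running mean, using constant memory."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # number of standard deviations
        self.warmup = warmup        # values to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Return True if x is an outlier relative to earlier values, then absorb x."""
        is_outlier = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford's one-pass update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = StreamingOutlierDetector()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 25.0, 10.1]  # invented sensor values
flags = [detector.update(r) for r in readings]
print(flags)
```

Note that this sketch absorbs flagged values into the statistics, which widens the accepted range afterwards; a production detector might exclude them or use a sliding window.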

Spatial Data Mining
A spatial database stores a large amount of space-related data, including maps, preprocessed remote sensing or medical imaging data, and VLSI chip layout data. Spatial databases have several features that distinguish them from relational databases. They carry topological and/or distance information, are usually organized by sophisticated multidimensional spatial indexing structures that are accessed by spatial data access methods, and often require spatial reasoning, geometric computation, and spatial knowledge representation techniques. Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases. Such mining requires integrating data mining with spatial database technologies. It can be used for understanding spatial data, discovering spatial relationships and relationships between spatial and nonspatial data, constructing spatial knowledge bases, reorganizing spatial databases, and optimizing spatial queries.

Spatial Data Mining
Spatial data mining is expected to have broad applications in geographic information systems, marketing, remote sensing, image database exploration, medical imaging, navigation, traffic control, environmental studies, and many other areas where spatial data are used. A central challenge is the development of efficient spatial data mining techniques, given the large volume of spatial data and the complexity of spatial data types and spatial access methods. Statistical spatial data analysis has been a popular approach to analyzing spatial data and exploring geographic information. The term geostatistics is often associated with continuous geographic space, whereas the term spatial statistics is often associated with discrete space. A statistical model that handles nonspatial data usually assumes statistical independence among different portions of the data.

Spatial Data Mining
No such independence holds among spatially distributed data, because spatial objects are interrelated, or more precisely spatially co-located: the closer two objects are, the more likely they are to share similar properties. For example, natural resources, climate, temperature, and economic conditions are likely to be similar in regions that are geographically close. This close interdependency across nearby space leads to the notion of spatial autocorrelation, on which successful spatial statistical modeling methods have been built. Spatial data mining will further develop spatial statistical analysis methods and extend them to large amounts of spatial data, with more emphasis on efficiency, scalability, integration with database and data warehouse systems, improved user interaction, and the discovery of new kinds of knowledge.
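Spatial autocorrelation is commonly quantified with Moran's I, which the slides do not name; it is used here as one standard example of such a measure. The sketch below assumes four regions arranged in a line, binary adjacency weights, and invented attribute values.

```python
def morans_i(values, weights):
    """Moran's I for `values` under a spatial weight matrix `weights`
    (weights[i][j] > 0 when regions i and j are neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_weight) * (num / den)


# Four regions in a row: each region neighbors the one next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], chain)       # neighbors have similar values
alternating = morans_i([1, 4, 1, 4], chain)  # neighbors have opposite values
print(smooth, alternating)
```

A positive value indicates that nearby regions tend to have similar values, and a negative value indicates that neighbors tend to differ.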

Multimedia Data Mining
Multimedia mining is a subfield of data mining that finds interesting information and implicit knowledge in multimedia databases. Mining in multimedia is also referred to as automatic annotation or annotation mining. It typically involves two or more data types, such as text and video, or text, video, and audio. Multimedia data mining is an interdisciplinary field that integrates image processing and understanding, computer vision, data mining, and pattern recognition. It discovers interesting patterns in multimedia databases that store and manage large collections of multimedia objects, including image data, video data, audio data, sequence data, and hypertext data containing text, text markup, and linkages. Issues in multimedia data mining include content-based retrieval and similarity search, generalization, and multidimensional analysis. Multimedia data cubes contain additional dimensions and measures for multimedia information.
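Content-based retrieval and similarity search can be sketched with color histograms, one simple image feature. The example below is an assumption-laden toy: images are represented as short lists of invented RGB pixels, colors are quantized into a few coarse buckets, and similarity is measured by histogram intersection.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Quantize 0-255 RGB pixels into bins**3 buckets and return the
    fraction of pixels that fall in each occupied bucket."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: c / total for bucket, c in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two color distributions are identical."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in h1)


# Invented "images", each just a handful of pixels.
sunset = color_histogram([(250, 120, 30)] * 3 + [(20, 20, 60)])
beach = color_histogram([(245, 125, 20)] * 2 + [(30, 140, 220)] * 2)
night = color_histogram([(10, 10, 50)] * 4)

ranking = sorted(
    [("beach", histogram_intersection(sunset, beach)),
     ("night", histogram_intersection(sunset, night))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)
```

Ranking stored images by this score against a query image is the basic shape of a similarity search over media feature data.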

Multimedia Data Mining
A multimedia database management system is the framework that manages different types of multimedia data that are stored, delivered, and used in different ways. There are three classes of multimedia databases: static, dynamic, and dimensional media. The content of a multimedia database management system is as follows:
Media data: The actual data representing an object.
Media format data: Information about the format of the media data after it goes through the acquisition, processing, and encoding phases, such as sampling rate, resolution, and encoding scheme.
Media keyword data: Keyword descriptions relating to the generation of the data, also known as content descriptive data. Example: date, time, and place of recording.
Media feature data: Content-dependent data such as the distribution of colours, kinds of texture, and different shapes present in the data.

Multimedia Data Mining
Types of multimedia applications based on data management characteristics are:
Repository applications: A large amount of multimedia data and metadata (media format data, media keyword data, media feature data) is stored for retrieval purposes. Examples: repositories of satellite images, engineering drawings, and radiology scans.
Presentation applications: These involve delivering multimedia data subject to temporal constraints. Optimal viewing or listening requires the DBMS to deliver data at a certain rate, offering quality of service above a certain threshold. Here data is processed as it is delivered. Examples: annotation of video and audio data, real-time editing analysis.
Collaborative work using multimedia information: This involves executing a complex task by merging drawings and change notifications. Example: an intelligent healthcare network.

Multimedia Data Mining
Challenges with Multimedia Databases:
Modelling: Work in this area can improve both database and information retrieval techniques; documents constitute a specialized area and deserve special consideration.
Design: The conceptual, logical, and physical design of multimedia databases has not yet been fully addressed, because performance and tuning issues at each level are far more complex. The data comes in a variety of formats, such as JPEG, GIF, PNG, and MPEG, which are not easy to convert from one to another.
Storage: Storing a multimedia database on a standard disk raises problems of representation, compression, mapping to device hierarchies, archiving, and buffering during input-output operations. In a DBMS, a BLOB (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved.
Performance: Physical limitations dominate applications involving video playback or audio-video synchronization. Parallel processing may alleviate some problems, but such techniques are not yet fully developed. In addition, a multimedia database consumes a lot of processing time and bandwidth.
Queries and retrieval: For multimedia data such as images, video, and audio, accessing data through queries opens up issues such as efficient query formulation, query execution, and optimization, which still need work.

Multimedia Data Mining
Where is a Multimedia Database Applied? A multimedia database is applied in the following areas:
Documents and record management: Industries and businesses keep detailed records and various documents. Example: insurance claim records.
Knowledge dissemination: A multimedia database is a very effective tool for knowledge dissemination because it can provide several kinds of resources. Example: electronic books.
Education and training: Computer-aided learning materials can be designed using multimedia sources, which are now very popular sources of learning. Example: digital libraries.
Marketing, advertising, retailing, entertainment, and travel: Example: a virtual tour of cities.
Real-time control and monitoring: With active database technology, multimedia presentation of information can effectively monitor and control complex tasks. Example: manufacturing operation control.

Multimedia Data Mining
Categories of Multimedia Data Mining: Multimedia mining refers to analyzing a large amount of multimedia information to extract patterns based on their statistical relationships. It is classified into two broad categories: static media and dynamic media. Static media contains text (digital libraries, SMS and MMS messages) and images (photos and medical images). Dynamic media contains audio (music and MP3 sounds) and video (movies).
[Figure: Categories of multimedia data mining]

Multimedia Data Mining
[Figure: Application of Multimedia Mining]

Multimedia Data Mining
[Figure: Process of Multimedia Data Mining]

Multimedia Data Mining
[Figure: Architecture for Multimedia Data Mining]

Text Mining in Data Mining
Text mining is the part of data mining that deals specifically with unstructured text data. It uses natural language processing (NLP) techniques to extract useful information and insights from large amounts of unstructured text. Text mining can be used as a preprocessing step for data mining or as a standalone process for specific tasks. It transforms unstructured text into structured data that can be used for data mining tasks such as classification, clustering, and association rule mining. This allows organizations to gain insights from a wide range of data sources, such as customer feedback, social media posts, and news articles.
What is the common usage of Text Mining? Text mining is widely used in fields such as natural language processing, information retrieval, and social media analysis. It has become an essential tool for organizations to extract insights from unstructured text data and make data-driven decisions.

Text Mining in Data Mining
Conventional Process of Text Mining:
Unstructured information is gathered from various sources available in different document formats, for example plain text, web pages, and PDF files.
Pre-processing and data cleansing tasks are performed to detect and eliminate inconsistencies in the data. Data cleansing makes sure the genuine text is captured; it removes stop words, applies stemming (the process of identifying the root of a word), and indexes the data.
Processing and controlling tasks are applied to review and further clean the data set.
Pattern analysis is implemented in a Management Information System.
The information processed in the above steps is used to extract important and relevant data for effective and timely decision-making and trend analysis.
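The cleansing step (stop-word removal, stemming, indexing) can be sketched as follows. Everything here is a simplified assumption: the stop-word list is tiny, `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's, and the documents are invented.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "for"}


def crude_stem(word):
    """Toy suffix stripper (an assumption, not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Inverted index: stemmed term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Mining the web for patterns",
    2: "Pattern mining in text documents",
    3: "The miners are mining",
}
index = build_index(docs)
print(preprocess(docs[1]))
print(sorted(index["pattern"]))
```

After stemming, "patterns" and "Pattern" map to the same index entry, which is why the index finds both documents for that term.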

Text Mining in Data Mining
[Figure: Conventional Process of Text Mining]

Text Mining in Data Mining
Procedures for Analyzing Text Mining:
Text Summarization: To automatically extract part of a text's content that reflects its whole content.
Text Categorization: To assign the text to one of the categories predefined by users.
Text Clustering: To segment texts into several clusters, depending on their substantial relevance.

Text Mining in Data Mining
Text Mining Techniques:
Information Retrieval: The available documents and text data are processed into a structured form so that pattern recognition and analytical processes can be applied. It is the process of extracting relevant and associated patterns according to a given set of words or text documents.
Information Extraction: The process of extracting meaningful words from documents.
Feature Extraction: New features are developed from existing ones, either by parsing an existing feature or by combining two or more features through some mathematical operation.
Feature Selection: The dimensionality of the dataset, which is a common issue with text data, is reduced by selecting a subset of features from the whole dataset.
Natural Language Processing: NLP includes tasks that are accomplished using machine learning and deep learning methodologies. It concerns the automatic processing and analysis of unstructured text information.
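Feature selection for text can be as simple as filtering terms by document frequency. The sketch below assumes that heuristic, with illustrative thresholds (`min_df`, `max_df_ratio`) and invented documents; other selection criteria, such as chi-square or information gain, are not shown.

```python
def select_features(tokenized_docs, min_df=2, max_df_ratio=0.9):
    """Keep terms that appear in at least `min_df` documents but in no
    more than `max_df_ratio` of all documents."""
    n_docs = len(tokenized_docs)
    df = {}
    for doc in tokenized_docs:
        for term in set(doc):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    return sorted(
        t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio
    )


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
selected = select_features(corpus)
print(selected)
```

Terms that occur in every document carry little discriminating power, and terms that occur only once are too rare to generalize, so both ends are dropped and the vector dimensionality shrinks.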

Text Mining in Data Mining
Application Areas of Text Mining:
Digital Library
Academic and Research Field
Life Science
Social Media
Business Intelligence

Text Mining in Data Mining
Issues in Text Mining: Numerous issues arise during the text mining process:
The efficiency and effectiveness of decision-making can be affected.
Uncertainty can arise at an intermediate stage of text mining. In the pre-processing stage, different rules and guidelines are defined to normalize the text, which makes the text mining process efficient. Before pattern analysis is applied to a document, the unstructured data has to be converted into an intermediate structure.
The original message or meaning can sometimes be changed by this alteration.
Many algorithms and techniques have to support multi-language text, which can create ambiguity in the meaning of the text and lead to false-positive results.
The use of synonyms, polysemy, and antonyms in document text causes problems for text mining tools that treat such words as occurring in the same context. It is difficult to categorize such text and words.

Text Mining in Data Mining
Advantages of Text Mining:
Large Amounts of Data: Text mining allows organizations to extract insights from large amounts of unstructured text data, such as customer feedback, social media posts, and news articles.
Variety of Applications: Text mining has a wide range of applications, including sentiment analysis, named entity recognition, and topic modeling, which makes it a versatile tool.
Improved Decision Making: The insights extracted from unstructured text can be used to make data-driven decisions.
Cost-effective: Text mining can be a cost-effective way to extract insights from unstructured text data, as it reduces the need for manual data entry.
Broader benefits: Cost reductions, productivity increases, the creation of new services, and new business models are some of the larger economic advantages reported.

Text Mining in Data Mining
Disadvantages of Text Mining:
Complexity: Text mining can be a complex process that requires advanced skills in natural language processing and machine learning.
Quality of Data: The quality of text data can vary, which can affect the accuracy of the insights extracted.
High Computational Cost: Text mining requires substantial computational resources, which smaller organizations may find difficult to afford.
Limited to Text Data: Text mining extracts insights only from unstructured text data and cannot be used with other data types.
Noise in text mining results: Text mining of documents may produce mistakes. It is possible to find false links or to miss real ones. In most situations, if the noise (error rate) is sufficiently low, the benefits of automation outweigh the chance of a larger error than a human reader would make.
Lack of transparency: Text mining is frequently viewed as a mysterious process in which large corpora of text documents go in and new information comes out. It is in fact opaque when researchers lack the technical know-how to understand how it operates, or when they lack access to corpora or text mining tools.

Data Mining- World Wide Web Over the last few years, the World Wide Web has become a significant source of information and, at the same time, a popular platform for business. Web mining can be defined as the use of data mining techniques and algorithms to extract useful information directly from the web, such as Web documents and services, hyperlinks, Web content, and server logs. The World Wide Web contains a large amount of data and is therefore a rich source for data mining. The objective of Web mining is to look for patterns in Web data by collecting and examining that data in order to gain insights.

Data Mining- World Wide Web What is Web Mining? Web mining can broadly be seen as the application of adapted data mining techniques to the web, whereas data mining is defined as the application of algorithms to discover patterns in mostly structured data as part of a knowledge discovery process. A distinctive property of web mining is that it deals with several different types of data. The web has multiple aspects that lead to different approaches to mining: web pages consist of text, web pages are linked via hyperlinks, and user activity can be monitored via web server logs. These three aspects correspond to web content mining, web structure mining, and web usage mining.


Data Mining- World Wide Web 1. Web Content Mining: Web content mining is used to extract useful data, information, and knowledge from web page content. In web content mining, each web page is treated as an individual document. The analyst can take advantage of the semi-structured nature of web pages, because HTML carries information about logical structure as well as layout. The primary task of content mining is data extraction, where structured data is extracted from unstructured or semi-structured web pages. The objective is to make it possible to aggregate data across many web sites by using the extracted structured data. Web content mining can also be used to distinguish topics on the web. For example, when a user searches for a specific topic in a search engine, the user gets a list of suggestions.
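
The data extraction task described above can be sketched with Python's standard `html.parser` module. This is a minimal illustration under simplifying assumptions: the page is a made-up sample, it contains a single well-formed table whose first row is the header, and the names `TableParser` and `extract_records` are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (headings, whitespace) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Turn the table rows of a page into a list of dictionaries."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Invented sample page: layout markup wrapped around structured content.
SAMPLE_PAGE = """
<html><body>
<h1>Laptops</h1>
<table>
<tr><th>name</th><th>price</th></tr>
<tr><td>Model A</td><td>499</td></tr>
<tr><td>Model B</td><td>899</td></tr>
</table>
</body></html>
"""

records = extract_records(SAMPLE_PAGE)
print(records)
```

Records extracted this way from several sites share one shape, which is what makes aggregation across sites possible. Real pages are rarely this regular, so production extractors need site-specific rules or learned wrappers.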

Data Mining- World Wide Web 2. Web Structure Mining: Web structure mining is used to discover the link structure of hyperlinks. It identifies the relationships between web pages that are connected by hyperlinks or by a direct link network. In web structure mining, the web is treated as a directed graph in which the web pages are the vertices and the hyperlinks are the edges. The best-known application is the Google search engine, whose ranking of results was originally based primarily on the PageRank algorithm. PageRank treats a page as highly relevant when it is frequently linked to by other highly relevant pages. Structure and content mining methodologies are usually combined. For example, web structure mining can help organizations examine the link network between two commercial sites.
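
The idea behind PageRank can be sketched as a simple power iteration over a directed graph. This is a textbook-style simplification, not the search engine's production algorithm. The damping factor of 0.85, the iteration count, and the four-page toy web are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small base share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: vertices are pages, edges are hyperlinks.
TOY_WEB = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home", "products"],
}

ranks = pagerank(TOY_WEB)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph "home" ends up with the highest score because every other page links to it, while "blog" receives only the base share because nothing links to it.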

Data Mining- World Wide Web 3. Web Usage Mining: Web usage mining is used to extract useful data, information, and knowledge from web log records, and it helps to recognize user access patterns for web pages. When mining the usage of web resources, the analyst works with records of the requests made by visitors to a website, which are usually collected as web server logs. While the content and structure of a collection of web pages reflect the intentions of the authors of the pages, the individual requests show how users actually use these pages. Web usage mining may therefore reveal relationships that were not intended by the creator of the pages.
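
A first step in web usage mining is turning raw server log lines into records that can be counted. The sketch below assumes logs in the widely used Common Log Format. The sample lines, the regular expression, and the function names are invented for illustration, and the host address is used as a rough stand-in for a visitor.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Invented sample log.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Mar/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.1 - - [01/Mar/2024:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 2310
10.0.0.2 - - [01/Mar/2024:10:01:15 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.2 - - [01/Mar/2024:10:01:40 +0000] "GET /missing.html HTTP/1.1" 404 210
10.0.0.1 - - [01/Mar/2024:10:02:30 +0000] "GET /cart.html HTTP/1.1" 200 980
"""


def parse_log(text):
    """Return one dictionary per log line that matches the assumed format."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


def page_hits(records):
    """Count successful requests per page."""
    return Counter(r["path"] for r in records if r["status"] == "200")


def transitions(records):
    """Count page-to-page moves made by the same host, in log order."""
    last_page = {}
    counts = Counter()
    for r in records:
        if r["status"] != "200":
            continue
        previous = last_page.get(r["host"])
        if previous is not None:
            counts[(previous, r["path"])] += 1
        last_page[r["host"]] = r["path"]
    return counts


records = parse_log(SAMPLE_LOG)
hits = page_hits(records)
moves = transitions(records)
print(hits.most_common())
print(moves.most_common())
```

The transition counts show paths that visitors actually take, which may differ from the navigation the page authors had in mind.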

Data Mining- World Wide Web Some of the methods used to identify and analyze web usage patterns are given below: I. Session and visitor analysis: Session analysis is carried out on preprocessed data, which includes visitor records, days, times, sessions, and so on. This data can be used to analyze visitor behavior. The analysis produces a report that lists frequently visited web pages and common entry and exit pages. II. OLAP (Online Analytical Processing): OLAP performs multidimensional analysis of complex data. It can be applied to different parts of the log data for a specific period, and OLAP tools can be used to derive important business intelligence metrics.
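
Both methods can be sketched on a small, already preprocessed request list. The sketch rests on stated assumptions: the requests are invented, visitors are already identified, a gap of more than 30 minutes starts a new session (a common convention, not something the slides specify), and the OLAP part is reduced to a plain count over two dimensions followed by a roll-up.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

# Invented preprocessed requests: (visitor, timestamp, page).
REQUESTS = [
    ("alice", "2024-03-01 09:00", "/index.html"),
    ("alice", "2024-03-01 09:05", "/products.html"),
    ("alice", "2024-03-01 09:07", "/cart.html"),
    ("bob", "2024-03-01 09:10", "/index.html"),
    ("bob", "2024-03-01 09:12", "/about.html"),
    ("alice", "2024-03-01 14:00", "/index.html"),
    ("alice", "2024-03-01 14:03", "/cart.html"),
]


def build_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group each visitor's requests into sessions split by long inactivity."""
    by_visitor = defaultdict(list)
    for visitor, stamp, page in requests:
        by_visitor[visitor].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M"), page))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > timeout:
                # Gap too long: close the running session and start a new one.
                sessions.append((visitor, [page for _, page in current]))
                current = []
            current.append(hit)
        sessions.append((visitor, [page for _, page in current]))
    return sessions


# I. Session analysis: common entry and exit pages.
sessions = build_sessions(REQUESTS)
entry_pages = Counter(pages[0] for _, pages in sessions)
exit_pages = Counter(pages[-1] for _, pages in sessions)

# II. OLAP-style view: count hits per (hour, page) cell, then roll up to hour.
cube = Counter((stamp[11:13], page) for _, stamp, page in REQUESTS)
hits_by_hour = Counter()
for (hour, _page), count in cube.items():
    hits_by_hour[hour] += count

print(entry_pages.most_common(), exit_pages.most_common())
print(sorted(hits_by_hour.items()))
```

A real OLAP tool offers the same kind of roll-up and drill-down over many more dimensions (date, referrer, region, device) without hand-written loops.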

Data Mining- World Wide Web Challenges in Web Mining: The web poses great challenges for resource and knowledge discovery, based on the following observations: The complexity of web pages: Web pages do not have a unifying structure and are far more complex than traditional text documents. The web can be seen as a huge digital library, but its enormous number of documents is not arranged in any particular order. The web is a dynamic data source: Data on the internet is updated rapidly, for example news, weather, shopping, financial news, and sports. Diversity of user communities: The user community on the web is expanding rapidly, and its members have different interests, backgrounds, and usage purposes. There are over a hundred million workstations connected to the internet, and the number is still increasing rapidly.

Data Mining- World Wide Web Relevancy of data: A particular person is generally interested in only a small portion of the web, while the rest of the web contains data that is unfamiliar or uninteresting to that user and may lead to unwanted results. The web is too broad: The size of the web is tremendous and rapidly increasing. It appears that the web is too huge for data warehousing and data mining. Applications of Web Mining: Web mining has many applications because the web is used in so many ways. Some of them are: marketing and conversion tools; analysis of website and application performance; audience behavior analysis; advertising and campaign performance analysis; and testing and analysis of a site.