An Emerging Step: Data Warehousing to Pattern Warehousing

545 views 19 slides Feb 22, 2017

About This Presentation

This presentation traces the progression from the advent of data mining, through data warehousing, to pattern warehousing. It identifies the present gaps and suggests directions for future work and research aimed at making pattern extraction easier.


Slide Content

Madhav Institute of Technology & Science, Gwalior (M.P.), Department of CSE/IT. A synopsis report on: Model for Optimal Pattern Extraction from the Diabetes Mellitus Pattern Warehouse (DMPW) Using PSO. By: Harshita S. Jain

Contents : Introduction; Recent approach (flowchart, result analysis); Gaps in the current know; Proposed methodology (new approach, analysis); Particle swarm optimization; Future exploration; References

Introduction (Arrival of Data Mining) : In the 1990s, the term “data mining” appeared in the database community. Retail companies and the financial community use data mining to analyze data and recognize trends in order to grow their customer base and to predict fluctuations in interest rates, stock prices, customer demand, and so on. The application domain of data mining continues to expand. Data mining is the process of extracting information from large amounts of data stored in huge repositories.

Introduction : Today’s world produces an enormous amount of data on a regular basis from various sources. Data in such volumes does not in itself constitute knowledge: it cannot be exploited directly by human beings, and no useful information can be deduced simply by observing it. More elaborate techniques are therefore required to extract the hidden knowledge and make the data valuable to end users [4]. Data mining was developed to help extract knowledge from raw data, using algorithms that discover statistical properties of the original data. Data mining produces results such as association rules, clusters, decision trees, and other structures that describe properties of the raw data. The common characteristic of all these techniques is that large portions of the available data are abstracted and represented by a small number of knowledge-carrying representatives, which we call patterns (Tiwari & Thakur, 2012). Patterns represent huge quantities of heterogeneous data in a compact and semantically rich way.
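To make the idea of a pattern as a compact representative more concrete, the following minimal sketch computes the two usual measures of an association rule, support and confidence, over a toy transaction list. The transactions, the item names, and the helper `rule_stats` are hypothetical illustrations and do not come from the presentation.

```python
from typing import FrozenSet, List, Tuple


def rule_stats(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    both = antecedent | consequent
    # Transactions that contain the antecedent, and those that contain the whole rule.
    count_antecedent = sum(1 for t in transactions if antecedent <= t)
    count_both = sum(1 for t in transactions if both <= t)
    support = count_both / n if n else 0.0
    confidence = count_both / count_antecedent if count_antecedent else 0.0
    return support, confidence


# Hypothetical toy data: four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

support, confidence = rule_stats(
    transactions, frozenset({"bread"}), frozenset({"milk"})
)
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

The single rule and its two numbers stand in for the four raw transactions, which is the sense in which a pattern abstracts the underlying data.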

Data warehouse : Data warehouses are used to consolidate data located in disparate databases. A data warehouse stores large quantities of data by specific categories so that it can be more easily retrieved, interpreted, and sorted by users. Warehouses enable executives and managers to work with vast stores of transactional or other data, to respond faster to markets, and to make more informed business decisions. It has been predicted that every business will have a data warehouse within ten years. But merely storing data in a data warehouse does a company little good. Companies will want to learn more from that data to improve their knowledge of customers and markets. The company benefits when meaningful trends and patterns are extracted from the data.

Issues related to the data warehouse : A single data warehouse is very large, so managing it becomes a tedious task. For analysis purposes, business analysts demand consolidated information. Given the exponential day-by-day growth of data and the cost of storing it, the data warehouse is not the best solution to the problem. The desired patterns are volatile in a data warehouse, so even a small analysis requires the whole data mining process to be performed again to obtain results.

Advent of the pattern warehouse : As data warehouses grow with the massive increase in data, business analysts no longer need huge volumes of analytical data; they are interested only in the relevant patterns hidden within the repositories. This led to the concept of the pattern warehouse [1]. Fig. : Process of knowledge discovery from databases

Pattern warehouse & pattern mining : A pattern warehouse is a repository that stores the relevant patterns, which represent the relationships that exist between data elements. Pattern mining is performed on the patterns stored in the pattern warehouse to generate analytical outcomes. With pattern mining, the analyst has to deal with only a small amount of information [7].
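One way to picture this is a small in-memory store of pattern records that can be filtered without going back to the raw data. The sketch below is only an illustration under assumed names: the `Pattern` and `PatternWarehouse` classes, their fields, and the sample patterns and measure values are all hypothetical and are not the DMPW design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pattern:
    """A stored pattern: its type, a readable body, and quality measures."""
    kind: str                      # e.g. "association_rule", "cluster"
    description: str               # human-readable form of the pattern
    measures: Dict[str, float] = field(default_factory=dict)


class PatternWarehouse:
    """Hypothetical repository that holds patterns instead of raw records."""

    def __init__(self) -> None:
        self._patterns: List[Pattern] = []

    def store(self, pattern: Pattern) -> None:
        self._patterns.append(pattern)

    def mine(self, predicate: Callable[[Pattern], bool]) -> List[Pattern]:
        """Pattern mining here is simply a query over the stored patterns."""
        return [p for p in self._patterns if predicate(p)]


# Hypothetical sample patterns with made-up measure values.
warehouse = PatternWarehouse()
warehouse.store(Pattern("association_rule", "high_bmi -> high_glucose",
                        {"support": 0.30, "confidence": 0.85}))
warehouse.store(Pattern("association_rule", "age_over_50 -> high_glucose",
                        {"support": 0.20, "confidence": 0.60}))
warehouse.store(Pattern("cluster", "group of low-risk patients",
                        {"size": 120.0}))

strong_rules = warehouse.mine(
    lambda p: p.kind == "association_rule"
    and p.measures.get("confidence", 0.0) >= 0.8
)
print([p.description for p in strong_rules])
```

The query touches three small records rather than the underlying patient data, which reflects the claim that the analyst deals with only a small amount of information.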

Recent approach : The recent approach consists of an evolutionary algorithm (a genetic algorithm) that runs on the optimization engine and generates optimal patterns from the pattern warehouse [7]. The workflow to obtain optimal patterns is : Pattern warehouse → Optimization engine → Repository for optimal patterns

Flowchart of the existing approach [7]

Result analysis

Taking a step ahead of the existing approach. Limitations of using a genetic algorithm : There is no guarantee of reaching the global optimum with respect to false frequent patterns. A constant optimization response time cannot be assured. It cannot be used for dynamic problems. Its domain of applicability is limited.

Proposed methodology : We propose an algorithm that runs on the optimization engine to generate optimal patterns from the pattern warehouse. The proposed algorithm uses particle swarm optimization. The algorithm is described step by step, followed by a flowchart and the execution of the whole process.

Particle swarm optimization : Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking and fish schooling. The system is initialized with a population of random solutions and searches for optima by updating generations. It uses a number of agents (particles) that form a swarm moving around the search space looking for the best solution. Each particle is treated as a point in an N-dimensional space and adjusts its “flying” according to its own flying experience as well as the flying experience of other particles.
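The description above can be sketched in a few lines of Python. This is a minimal global-best PSO for a generic continuous objective, not the algorithm proposed in this presentation. The function name `pso_minimize`, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the sphere test function are all assumptions chosen for illustration (the inertia-weight form is a commonly used later variant of the 1995 formulation).

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 30,
    n_iters: int = 200,
    w: float = 0.7,    # inertia weight (assumed value)
    c1: float = 1.5,   # pull toward the particle's own best (assumed value)
    c2: float = 1.5,   # pull toward the swarm's best (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds

    # Random initial population; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position ("own flying experience").
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]

    # The swarm shares the best position found so far ("experience of others").
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Move, then clamp the particle to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


def sphere(x: Sequence[float]) -> float:
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

To apply such a routine to pattern extraction, the objective would have to score a candidate pattern or pattern set; how that fitness is defined is left open here because the slides do not specify it.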

Flowchart depicting the proposed PSO algorithm

Theoretical analysis / expected outcome. Basis of choice : On reviewing various research papers that compare nature-inspired algorithms, PSO was found to be more effective and versatile. Comparison between the genetic algorithm (GA) and particle swarm optimization (PSO) : GA was designed basically for discrete optimization, where bits 0 and 1 are used to encode discrete design variables, whereas PSO was designed for continuous problems and can use any value to encode design variables. Although PSO was designed for continuous problems, it was later modified to handle discrete and binary optimization problems as well. GA solves problems where there is no predetermined shape, size, or complexity, whereas in PSO the source and destination need to be defined uniquely and clearly. Unlike GA, PSO has no evolution operators such as crossover and mutation.
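For reference, a commonly used form of the PSO update shows why no crossover or mutation operator is needed: each particle is moved by a velocity term alone. The notation below is an assumption made for illustration and is not taken from the slides: $x_{i,d}$ and $v_{i,d}$ are the position and velocity of particle $i$ in dimension $d$, $p_{i,d}$ is its personal best, $g_d$ is the swarm's best, $w$ is an inertia weight, $c_1, c_2$ are acceleration coefficients, and $r_1, r_2$ are uniform random numbers in $[0, 1]$. The last line sketches the binary variant, in which the velocity is passed through a sigmoid and interpreted as the probability that a bit is set.

```latex
\begin{align}
v_{i,d}^{t+1} &= w\, v_{i,d}^{t}
                + c_1 r_1 \left(p_{i,d}^{t} - x_{i,d}^{t}\right)
                + c_2 r_2 \left(g_{d}^{t} - x_{i,d}^{t}\right) \\
x_{i,d}^{t+1} &= x_{i,d}^{t} + v_{i,d}^{t+1} \\
\text{binary variant:}\quad
x_{i,d}^{t+1} &=
\begin{cases}
1 & \text{if } u < \dfrac{1}{1 + e^{-v_{i,d}^{t+1}}},\quad u \sim U(0,1) \\
0 & \text{otherwise}
\end{cases}
\end{align}
```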

Future exploration motives : To address the architectural aspects of the pattern warehouse and to make pattern retrieval more efficient and scalable.

References :
[1] Agarwal, V. and Tiwari, A., “From Data Warehouse to Pattern Warehouse: A Progressive Step”, International Journal of Engineering Research, 2016, Vol. 5, No. 4, pp. 249-252.
[2] Han, J. and Kamber, M., “Data Mining: Concepts and Techniques”, Second Edition, Morgan Kaufmann Publishers, San Francisco, Elsevier, 2006.
[3] Tiwari, A., Gupta, R. K. and Agrawal, D. P., “A Survey on Frequent Pattern Mining: Current Status and Challenging Issues”, Information Technology Journal, 9(7):1278-1293, 2010.
[4] Terrovitis, M. and Vassiliadis, P., “Architecture for Pattern Base Management Systems”, Department of Electrical and Computer Engineering, National Technical University of Athens, 2003.
[5] Tiwari, V. and Thakur, R. S., “P2MS: A Phase-Wise Pattern Management System for Pattern Warehouse”, International Journal of Data Mining, Modeling and Management, 2014.
[6] Dunham, M. H., “Data Mining: Introductory and Advanced Topics”, Pearson Education, 2006.
[7] Agarwal, V. and Tiwari, A., “A Novel Optimal Pattern Mining Algorithm Using Genetic Algorithm”, International Journal of Computer Applications, June 2016, Vol. 144, No. 4.
[8] Wikipedia and various other relevant websites.

Thank you….!