DATA MINING 05 Angela Mary Binoy 06 Annies Minu SathiyaSeelan
Index What is Data Mining? Architecture. KDD process.
What is Data Mining? Data mining refers to extracting, or “mining”, knowledge from large amounts of data. The field brings together techniques from machine learning, pattern recognition, statistics, databases and visualization to address the problem of extracting information from large databases. Data mining is applied in market analysis and management, for example in customer relationship management, cross-selling and market segmentation.
ARCHITECTURE OF DATA MINING
The architecture of a typical data mining system may have the following major components:
1) Database, Data Warehouse, World Wide Web: One or more databases, data warehouses, spreadsheets or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
2) Database or Data Warehouse Server: Responsible for fetching the relevant data, based on the user’s data mining request.
3) Knowledge Base: The domain knowledge used to guide the search and to help find interesting, hidden patterns in the data.
- Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
- Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included.
- Other examples are constraints, thresholds and metadata.
4) Data Mining Engine: Essential to the data mining system. Ideally it consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis and evolution analysis.
5) Pattern Evaluation Module: Integrated with the mining module, it focuses the search on only the interesting patterns.
6) Graphical User Interface: Communicates between users and the data mining system, allowing users to interact with the system by specifying a data mining query or task and by performing exploratory data mining based on intermediate data mining results.
- This component also allows the user to browse database or data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.
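The interplay between the knowledge base, the mining engine and the pattern evaluation module can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular system: the `KNOWLEDGE_BASE` dictionary, the item names, the threshold values and the choice of support and confidence as interestingness measures are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical knowledge base: a concept hierarchy plus interestingness thresholds.
KNOWLEDGE_BASE = {
    "hierarchy": {"milk": "dairy", "cheese": "dairy", "bread": "bakery", "bagel": "bakery"},
    "min_support": 0.5,
    "min_confidence": 0.7,
}


def generalize(transaction, hierarchy):
    """Replace each item by its higher-level concept, if the hierarchy has one."""
    return frozenset(hierarchy.get(item, item) for item in transaction)


def mine_pairs(transactions):
    """Mining engine (toy version): count single items and co-occurring pairs."""
    item_counts, pair_counts = Counter(), Counter()
    for t in transactions:
        item_counts.update(t)
        pair_counts.update(combinations(sorted(t), 2))
    return item_counts, pair_counts


def evaluate(item_counts, pair_counts, n, kb):
    """Pattern evaluation: keep rules lhs -> rhs that meet both thresholds."""
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < kb["min_support"]:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= kb["min_confidence"]:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up transactions standing in for data fetched by the database server.
transactions = [
    {"milk", "bread"},
    {"cheese", "bagel"},
    {"milk", "eggs"},
    {"bread", "cheese", "eggs"},
]
generalized = [generalize(t, KNOWLEDGE_BASE["hierarchy"]) for t in transactions]
items, pairs = mine_pairs(generalized)
rules = evaluate(items, pairs, len(generalized), KNOWLEDGE_BASE)

for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Keeping the thresholds in the knowledge base rather than in the mining code mirrors the architecture above: the evaluation step can be tuned without touching the engine.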
Knowledge Discovery in Databases (KDD)
The unifying goal of the KDD process is to extract knowledge from data in the context of large databases. It consists of an iterative sequence of the following steps:
1) Data Cleaning: Remove noise and inconsistent data.
2) Data Integration: Combine multiple data sources.
3) Data Selection: Retrieve the data relevant to the analysis task from the database.
4) Data Transformation: Transform the data into forms appropriate for mining, for instance by performing summary or aggregation operations.
5) Data Mining: An essential step in which intelligent methods are applied to extract data patterns.
6) Pattern Evaluation: Identify the truly interesting patterns representing knowledge, based on some interestingness measures.
7) Knowledge Presentation: Use visualization and knowledge representation techniques to present the mined knowledge to the user.
Steps 1 to 4 are different forms of data preprocessing, in which the data are prepared for mining.
- The data mining step may interact with the user or a knowledge base.
- The interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base.
- Data mining is only one step in the process, but an essential one, because it uncovers hidden patterns for evaluation.
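The seven KDD steps can be illustrated with a toy pipeline in plain Python. This is a minimal sketch under stated assumptions: the two data sources, the field names, the cleaning rule and the mean-based split used as the "mining" method are all hypothetical and chosen only to keep the example short.

```python
from collections import defaultdict
from statistics import mean

# Two hypothetical data sources; all values are made up for illustration.
store_sales = [
    {"customer": "c1", "amount": 120.0, "region": "north"},
    {"customer": "c2", "amount": None, "region": "south"},  # missing value
    {"customer": "c3", "amount": 15.0, "region": "north"},
]
web_sales = [
    {"customer": "c1", "amount": 80.0, "region": "north"},
    {"customer": "c4", "amount": -5.0, "region": "south"},  # inconsistent value
    {"customer": "c3", "amount": 25.0, "region": "north"},
]


def clean(records):
    """Step 1: drop records with a missing or negative amount."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]


def integrate(*sources):
    """Step 2: combine multiple data sources into one list."""
    return [r for source in sources for r in source]


def select(records, region):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r["region"] == region]


def transform(records):
    """Step 4: aggregate the total amount per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def mine(totals):
    """Step 5: a deliberately simple method, splitting customers around the mean."""
    threshold = mean(totals.values())
    return {c: ("high" if t > threshold else "low") for c, t in totals.items()}


def evaluate(segments, wanted="high"):
    """Step 6: keep only the patterns considered interesting for this task."""
    return [c for c, label in segments.items() if label == wanted]


def present(customers):
    """Step 7: present the result to the user in readable form."""
    return "High-spending customers: " + ", ".join(sorted(customers))


data = integrate(clean(store_sales), clean(web_sales))
totals = transform(select(data, "north"))
segments = mine(totals)
report = present(evaluate(segments))
print(report)
```

Only `mine` corresponds to the data mining step; every function before it is preprocessing, and the two after it handle evaluation and presentation.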
How does KDD differ from Data Mining? KDD and data mining are not the same thing. KDD is the overall process of discovering useful knowledge from data, whereas data mining is only one step in the KDD process. KDD is the nontrivial process of identifying valid, potentially useful and ultimately understandable patterns in data, while data mining is the application of specific algorithms for extracting patterns from data.