MIS and Business Functions, TPS/DSS/ESS, MIS and Business Processes, Impact of MIS on Business, Using Information Systems for Competitive Advantage, Managing Information System Resources




Slide Content

COMPUTER APPLICATION IN MANAGEMENT, Session 4. Shivani Tiwari

Data Management for Decision Making

Introduction
An organization is nothing but an information processing system. The ability to provide relevant, accurate and timely information is critical to the success of any organization. Any successful organization should have an integrated database from which to create information. The characteristics of an organization-wide database:
- Sharable
- Consistent
- Reduced redundancy
- Standardized

Types of Databases
Transaction databases:
- Used to enter raw data and transactions from original sources
- Created by OLTP (online transaction processing) systems
- Must be standardized, sharable, consistent and with reduced redundancy across the organization
Operational databases:
- Built from transaction databases
- Large databases that support all the applications for day-to-day transaction and reporting processes
- Not designed to store historic data or to support ad-hoc queries

Data warehouses:
- Designed for strategic decision support and built from operational databases
- Contain vast amounts of data
- Smaller, local data warehouses are called data marts
Data warehouses are necessary in organizations where:
- A high volume of data processing is required
- A cross-functional flow of information is required
- A single, centralized data source is a necessity
- Increased quality and consistency of the organization's data is a must

Unstructured databases:
Business data exists in various unstructured formats:
- Text documents
- GIS data in the form of maps and locations
- Chemical data in the form of protein structures, molecule structures, etc.
- Software engineering data in the form of program statements
- Multimedia data in the form of audio, video, images, etc.
The WWW is a universal repository ranging from strictly structured data to completely unstructured pages.

Data Warehousing
A data warehouse is a single, centralized, enterprise-wide repository that combines data from all legacy systems and, in principle, gives all users access to appropriate information. The basic concept of a data warehouse is to provide a single version of the truth for a company's decision making and forecasting. A data warehouse is an information system that contains historical and cumulative data from single or multiple sources, and it simplifies an organization's reporting and analysis processes.

Characteristics of a good data warehouse: Time dependent
A data warehouse is governed by some specific rules.
- A data warehouse contains information collected over a period of time, i.e. historic information.
- There is a connection between the information stored in the warehouse and the time when it was entered: every record contains an element of time, explicitly or implicitly.
- Once data is inserted into the warehouse, it cannot be updated or changed.

Characteristics of a good data warehouse: Non-volatile
- A data warehouse is non-volatile: previous data is not erased or overwritten when new data is entered. Data is read-only and periodically refreshed.
- This makes it possible to analyze historical data and understand what happened and when.
- It does not require transaction processing, recovery or concurrency control mechanisms.
- Only two types of data operation are performed in a data warehouse: data loading (insertion) and data access (retrieval), as the sketch below illustrates.
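A minimal sketch, assuming a hypothetical append-only store (not from the slides), of the non-volatile contract: the table exposes loading and retrieval but no update or delete, and every record carries an explicit load timestamp, which is also the time-dependent property described above.

```python
from datetime import datetime, timezone

class WarehouseTable:
    """Append-only store: records are loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, record: dict) -> None:
        # Data loading (insertion): stamp each record with its load time,
        # so every row carries an explicit element of time.
        self._rows.append({**record, "loaded_at": datetime.now(timezone.utc)})

    def query(self, predicate) -> list:
        # Data access (retrieval): a read-only view of the history.
        return [row for row in self._rows if predicate(row)]

# New facts are appended; existing history is never overwritten.
sales = WarehouseTable()
sales.load({"product": "P1", "units": 10})
sales.load({"product": "P1", "units": 7})
print(sales.query(lambda r: r["product"] == "P1"))
```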

Characteristics of a good data warehouse: Subject oriented
- A data warehouse is subject oriented in the sense that it offers information about a business function rather than about the company's operational transactions. The functions can be sales, marketing, distribution, etc.
- A data warehouse never focuses on day-to-day operations; it emphasizes the modeling and analysis of data for decision making.
- It provides a simple and concise view of a specific subject by excluding data that is not helpful to the decision process.

Characteristics of a good data warehouse: Integrated
- Data in a data warehouse comes from various operations and sources across the organization and must be made standardized and consistent.
- A data warehouse is developed by integrating data from varied sources such as mainframes, relational databases, flat files, etc.; it must keep consistent naming conventions, formats and coding.
- Consistency in naming conventions, attribute measures, encoding structures, etc. has to be ensured; this integration enables effective analysis of the data.
- After the transformation and cleaning process, all of this data is stored in a common format in the data warehouse.

Components of a data warehouse
There are five main components of a data warehouse architecture:
1. Database
2. ETL tools
3. Metadata
4. Query tools
5. Data marts

Components of a data warehouse: Database
- The central database is the foundation of the data warehousing environment and is implemented on RDBMS technology.
- It is constrained by the fact that a traditional RDBMS is optimized for transactional processing, not for data warehousing: for instance, ad-hoc queries, multi-table joins and aggregates are resource intensive and slow down performance.
- New index structures are used to bypass relational table scans and improve speed.
- Multidimensional databases (MDDBs) can be used to overcome the limitations placed by relational data warehouse models.

Components of a data warehouse: ETL (Extract, Transform and Load) tools
Data sourcing, transformation and migration tools perform all the conversions, summarizations and changes needed to transform data into a unified format in the data warehouse. These are also called Extract, Transform and Load (ETL) tools. Their functionality includes (see the sketch below):
- Anonymizing data as per regulatory stipulations
- Eliminating unwanted data in operational databases from loading into the data warehouse
- Searching and replacing common names and definitions for data arriving from different sources
- Calculating summaries and derived data
- Populating missing data with defaults
- De-normalizing repeated data arriving from multiple data sources
ETL tools have to deal with the challenges of database and data heterogeneity.
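A minimal pandas sketch of a few of the transformations listed above; the column names, default values and hashing scheme are assumptions for illustration, not part of the slides. It renames columns to a common convention, fills missing values with defaults, anonymizes a regulated field and derives a summary column.

```python
import hashlib

import pandas as pd

# Raw extract from one source system (hypothetical column names).
raw = pd.DataFrame({
    "CUST_NM": ["Alice", "Bob", None],
    "amt": [120.0, None, 75.5],
    "qty": [2, 1, 3],
})

# Standardize naming conventions across sources.
df = raw.rename(columns={"CUST_NM": "customer_name", "amt": "amount"})

# Populate missing data with defaults.
df["customer_name"] = df["customer_name"].fillna("UNKNOWN")
df["amount"] = df["amount"].fillna(0.0)

# Anonymize a regulated field with a one-way hash.
df["customer_id"] = df["customer_name"].map(
    lambda name: hashlib.sha256(name.encode()).hexdigest()[:10]
)
df = df.drop(columns=["customer_name"])

# Calculate derived data (a unit-price summary column).
df["unit_price"] = df["amount"] / df["qty"]

print(df)  # the load step would append this unified frame to the warehouse
```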

Components of a data warehouse: Metadata
- Metadata provides the details necessary for data legibility, use and administration. It contains data about data, activity and knowledge.
- In other words, metadata is data about data that defines the data warehouse. It is used for building, maintaining and managing the data warehouse; it is like an encyclopedia of the data warehouse and sets its framework.
- The goal of metadata is to corral, catalogue, integrate, guide and support the various transformation and loading processes, schema layouts, system tables, partition settings, indices, view definitions, etc.
Metadata can be classified into the following categories (a toy example of each follows):
- Technical metadata: information about the warehouse used by data warehouse designers and administrators.
- Business metadata: detail that gives end users an easy way to understand the information stored in the data warehouse.
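A toy sketch of the two categories for a single warehouse table; all names and values are invented for illustration.

```python
# Technical metadata: used by warehouse designers and administrators.
technical_meta = {
    "table": "fact_sales",
    "source_systems": ["orders_oltp", "pos_flat_files"],
    "load_schedule": "daily at 02:00 UTC",
    "partitioned_by": "sale_date",
    "indices": ["idx_sale_date", "idx_product_id"],
}

# Business metadata: gives end users an easy way to understand the same table.
business_meta = {
    "table": "fact_sales",
    "description": "One row per product sold, net of returns.",
    "owner": "Sales Analytics team",
    "refresh": "Reloaded nightly; figures are final after 48 hours.",
}
```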

Components of a data warehouse: Query tools
One of the primary objectives of data warehousing is to provide information to the business for strategic decisions. Query tools allow users to interact with the data warehouse system and fall into four categories:
1. Query and reporting tools: reporting tools (report writers and production reporting) and managed query tools (SQL-based).
2. Application development tools: used to develop custom reports.
3. Data mining tools: data mining is the process of discovering meaningful new correlations, patterns and trends by mining large amounts of data; data mining tools automate this process.
4. OLAP tools: based on the concept of a multidimensional database, they allow users to analyze the data through elaborate and complex multidimensional views (see the query sketch below).
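A minimal sketch of the kind of managed SQL query a reporting or OLAP tool issues; the table and column names are invented. It aggregates a fact table along two dimensions, a simple multidimensional view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [("North", "Q1", 100.0), ("North", "Q2", 150.0),
     ("South", "Q1", 80.0), ("South", "Q2", 120.0)],
)

# A two-dimensional rollup: total sales by region and quarter.
for row in conn.execute(
    "SELECT region, quarter, SUM(amount) "
    "FROM fact_sales GROUP BY region, quarter"
):
    print(row)
```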

Components of a data warehouse: Data marts
- A data mart contains data from the data warehouse tailored to support the specific analytical requirements of a given business unit or function; it is a subsidiary of the data warehouse.
- A data mart is a partition of the data created for a specific group of users or functions.
- Data marts can be created in the same database as the data warehouse or in a physically separate database, as the sketch below shows.
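A sketch of carving a data mart out of a warehouse table; the names are hypothetical. The mart holds only the partition one group of users needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [("North", "P1", 100.0), ("South", "P1", 80.0), ("North", "P2", 60.0)],
)

# The data mart is a partition of the warehouse for one business unit:
# here, everything the Northern sales team analyzes.
conn.execute(
    "CREATE TABLE mart_sales_north AS "
    "SELECT product, amount FROM fact_sales WHERE region = 'North'"
)
print(conn.execute("SELECT * FROM mart_sales_north").fetchall())
```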

DATA MINING

Data Mining
Data mining is one of the most innovative and most used concepts in database management. Data mining, also known as the knowledge discovery process, analyzes data from different perspectives and encapsulates it into useful information. Just as mining (for minerals) is the process or industry of obtaining coal or other minerals from a mine, data mining obtains specific information by analyzing huge volumes of organizational/business data. In data mining, large amounts of data are inspected, and facts are discovered and brought to the attention of the person doing the mining. The process uses various tools to predict the behavior of huge data sets, which is then used to take decisions. While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two: data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Data mining is an efficient way of finding useful facts about data.

Data Mining: Definitions (different formulations)
- Type 1: Data mining is the process used for the extraction of hidden predictive data from huge databases.
- Type 2: Data mining is the process of discovering patterns in very large data sets using methods such as machine learning, statistics and database systems.
- Type 3: Data mining is a process used to extract usable data from a larger set of raw data by analyzing data patterns in large batches of data using one or more software tools.

- Type 4: Data mining is the automated extraction of hidden data from a large database.
- Type 5: Data mining refers to the process of extracting valid and previously unknown information from a large database to make crucial business decisions.

Features of Data Mining
- Automatic pattern prediction based on trend and behaviour analysis
- Prediction based on likely outcomes
- Creation of decision-oriented information
- Focus on large data sets and databases for analysis
- Clustering based on finding and visually documenting groups of facts not previously known

Data Mining Techniques
Data mining involves effective data collection and warehousing as well as computer processing. To segment the data and evaluate the probability of future events, data mining uses sophisticated mathematical algorithms. Data mining is also known as Knowledge Discovery in Data (KDD). The main data mining techniques are:
- Decision trees
- Sequential patterns
- Clustering
- Prediction
- Association
- Classification

Decision Trees
- A tree-shaped structure that represents a set of decisions; these decisions generate rules for the classification of the data set.
- A decision tree is a predictive model that can be viewed as a tree structure: each branch of the tree is a classification question, and the leaves are partitions of the data set with their classification.
- It makes predictions on the basis of a series of decisions, dividing the data at each branch point without losing any of it: the total number of records in a given parent node equals the sum of the records contained in its children.
- Because of their structure and their ability to easily generate rules, decision trees are the favoured technique for building understandable models; their clarity also allows more complex profit and ROI models to be added easily on top of the predictive model. A minimal example follows.
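A minimal scikit-learn sketch of the technique; the toy records and feature names are invented. It trains a decision tree and prints the human-readable rules the tree generates.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data: [age, income] -> whether the customer bought.
X = [[25, 30_000], [45, 80_000], [35, 60_000], [50, 90_000], [22, 25_000]]
y = ["no", "yes", "yes", "yes", "no"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Each branch is a classification question; the leaves hold the classes.
print(export_text(tree, feature_names=["age", "income"]))

# The model predicts by walking a new record down the series of decisions.
print(tree.predict([[30, 50_000]]))
```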

Sequential Patterns
A sequential pattern function analyses a collection of related records and detects frequently occurring patterns in these records over time. Sequential pattern mining functions are quite powerful and can be used to detect the set of records associated with some pattern. For example, this function could discover a rule stating that 70% of the time, when Stock X increased its value by a maximum of 10% over a 5-day trading period and Stock Y increased its value by 10%-20% during the same period, the value of Stock Z also increased by 17%-20% in the subsequent week. A toy sketch of the underlying pattern-counting idea follows.
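A toy sketch of the idea, not a full sequential-pattern miner; the event data is invented. It counts how often one event is followed by another within time-ordered records and reports pairs above a support threshold.

```python
from collections import Counter

# Time-ordered event sequences, one per customer (hypothetical data).
sequences = [
    ["browse", "add_to_cart", "buy"],
    ["browse", "buy"],
    ["browse", "add_to_cart", "buy"],
    ["browse", "add_to_cart"],
]

# Count ordered pairs: event a occurring before event b in a sequence.
pair_counts = Counter()
for seq in sequences:
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            pair_counts[(a, b)] += 1

# Report patterns occurring in at least half of the sequences.
min_support = len(sequences) / 2
for (a, b), count in pair_counts.items():
    if count >= min_support:
        print(f"{a} -> {b}: seen {count} times")
```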

Clustering
- Clustering is the method by which similar records are grouped together, usually to give the end user a top-level or bird's-eye view of what is going on in the database. Demographic data such as income, age, occupation, housing, religion and caste, taken from census reports, is usually clustered. Clustering is one of the oldest techniques used in data mining.
- To predict a value in a record, records with similar predictor values in the historical database are examined, and the prediction value from the record nearest to the unclassified record is used. This is known as the nearest-neighbour technique.
- The input to a clustering operator is a collection of untagged records: no classes are known at the time the clustering operator is applied. The goal of a clustering function is to produce a reasonable segmentation of the set of input records according to some criterion, as in the sketch below.
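A minimal scikit-learn sketch with invented demographic values: untagged records go in with no known classes, and the clustering operator proposes a segmentation.

```python
from sklearn.cluster import KMeans

# Untagged records: [age, income]; no classes are known up front.
records = [[25, 30_000], [27, 32_000], [52, 85_000],
           [49, 90_000], [23, 28_000], [55, 88_000]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(records)

# Each record is assigned to a segment; the centers summarize each group,
# giving the bird's-eye view the slide describes.
print(labels)
print(kmeans.cluster_centers_)
```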

Prediction
This method discovers the relationship between independent and dependent instances. For example, in the area of sales, to predict future profit, sales act as the independent instance and profit as the dependent one; based on historical data of sales and profit, the associated profit is predicted, as the sketch below shows.
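A minimal scikit-learn sketch of the slide's own sales-to-profit example; the numbers are invented. It fits the historical relationship and predicts the profit associated with a future sales figure.

```python
from sklearn.linear_model import LinearRegression

# Historical data: sales (independent) and profit (dependent).
sales = [[100.0], [150.0], [200.0], [250.0]]
profit = [20.0, 32.0, 41.0, 53.0]

model = LinearRegression()
model.fit(sales, profit)

# Predict the profit associated with a future sales figure.
print(model.predict([[300.0]]))  # about 63.5 on this toy data
```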

Data Mining Process
The Knowledge Discovery in Databases (KDD) process is commonly defined by the stages:
1. Selection
2. Pre-processing
3. Transformation
4. Data mining
5. Interpretation/evaluation
However, many variations on this theme exist. For example, the Cross Industry Standard Process for Data Mining (CRISP-DM) defines six phases of the knowledge discovery process:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
There is also a simplified process: (1) pre-processing, (2) data mining, (3) results validation. CRISP-DM is the leading methodology, used by the majority of data miners.

Data Mining Process: Pre-processing
The simplified process comprises (1) pre-processing, (2) data mining and (3) results validation.
Pre-processing: Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for the data is a data mart or data warehouse. Pre-processing is essential for analyzing multivariate data sets before mining. The target set is then cleaned: data cleaning removes observations containing noise and those with missing data, as in the sketch below.
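A minimal pandas sketch of the cleaning step; the columns and the noise rule are invented. It drops observations with missing data and filters out a row containing an implausible (noisy) value.

```python
import pandas as pd

# Target data set assembled from a warehouse (hypothetical extract).
target = pd.DataFrame({
    "age": [25, None, 41, 230, 36],      # 230 is an obviously noisy value
    "income": [30_000, 45_000, None, 52_000, 61_000],
})

# Remove observations with missing data.
clean = target.dropna()

# Remove observations containing noise (an implausible age).
clean = clean[clean["age"].between(0, 120)]

print(clean)  # only complete, plausible rows remain
```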

Data Mining Process: Data mining
Data mining itself involves six common classes of tasks:
- Anomaly detection (outlier/change/deviation detection): the identification of unusual data records that might be interesting, or of data errors that require further investigation.
- Association rule learning (dependency modeling): searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits; using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis (see the sketch below).
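A toy market-basket sketch with invented transactions; a real association-rule miner would also compute confidence and work over far larger data. It counts which product pairs are bought together and reports their support.

```python
from collections import Counter
from itertools import combinations

# Each transaction is the set of products in one basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count co-occurring product pairs across all baskets.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of baskets containing both items.
for pair, count in pair_counts.most_common(3):
    print(pair, "support =", count / len(transactions))
```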

Data Mining Process: Data mining (six common tasks, continued)
- Clustering: the task of discovering groups and structures in the data that are in some way or another “similar”, without using known structures in the data.
- Classification: the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as “legitimate” or as “spam”.
- Regression: attempts to find a function that models the data with the least error.
- Summarization: providing a more compact representation of the data set, including visualization and report generation.

Data Mining Process: Results validation
The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms also occur in the wider data set. Not all patterns found by the algorithms are necessarily valid: it is common for algorithms to find patterns in the training set that are not present in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set, and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish “spam” from “legitimate” e-mails would be trained on a training set of sample e-mails; once trained, the learned patterns would be applied to a test set of e-mails on which it had not been trained, and the accuracy of the patterns measured by how many e-mails they classify correctly. A number of statistical methods may be used to evaluate the algorithm, as in the sketch below. If the learned patterns do not meet the desired standards, the pre-processing and data mining steps must be re-evaluated and changed; if they do, the final step is to interpret the learned patterns and turn them into knowledge.
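A minimal scikit-learn sketch of the validation step described above; the features are invented stand-ins for real e-mail features. The model is trained on one part of the data, then its learned patterns are applied to a held-out test set and accuracy is measured there, which exposes overfitting.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical e-mail features: [number_of_links, percent_capital_letters].
X = [[0, 2], [7, 45], [1, 5], [9, 60], [0, 3], [8, 50], [2, 4], [6, 40]]
y = ["legitimate", "spam", "legitimate", "spam",
     "legitimate", "spam", "legitimate", "spam"]

# Hold out a test set on which the algorithm is NOT trained.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Apply the learned patterns to unseen e-mails and compare the
# output to the desired output to measure accuracy.
predictions = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, predictions))
```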