Data – Data is unorganized raw facts that need processing, without which they are seemingly random and useless to humans. Data can be a number, symbol, character, word, video, etc. Data does not depend on information.
Data (unorganized): 10th marks scored – 70% 60% Varsha Nitin
Information (organized): 10th marks scored – Varsha 70%, Nitin 60%
Data Warehouse
Understanding DW: 1) Why was it developed or invented? 2) Operational data vs. strategic information? 3) Which data does it hold? 4) What is the need for it? 5) What type of data is stored in a data warehouse?
What are OD and SI? Operational data is the day-to-day transaction data. Strategic information tells us which products sell more and which do not, and how to grow the business. This information helps in decision-making for the business.
Introduction – A Data Warehouse consists of data from multiple heterogeneous data sources and is used for analytical reporting and decision-making. A Data Warehouse is a central place where data from different data sources is stored. The term Data Warehouse was first coined by Bill Inmon in 1990. A Data Warehouse is always kept separate from an operational database. The data in a DW system is loaded from operational transaction systems like Sales, Marketing, HR, etc. It may pass through an operational data store or other transformations before it is loaded into the DW system for information processing. It is a big collection of data from many smaller databases – a godown (storehouse) of data.
Data Warehouse Introduction-
A Data Warehouse is used for reporting and analyzing information and stores both historical and current data. The data in the DW system is used for analytical reporting, which is later used by business analysts, sales managers, or knowledge workers for decision-making. The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
Why a DW is separated from operational databases – Operational databases support concurrent processing of multiple transactions. Concurrency control and recovery mechanisms are required for operational databases to ensure the robustness (strength) and consistency of the database. An operational database query allows read and modify operations, while an OLAP query needs only read-only access to the stored data. An operational database maintains current data; a data warehouse, on the other hand, maintains historical data. Note − A data warehouse does not require transaction processing, recovery, and concurrency controls, because it is physically stored separately from the operational database.
Understanding a data warehouse – A data warehouse is a database which is kept separate from the organization's operational database. There is no frequent updating done in a data warehouse. It possesses consolidated (joint, aggregated) historical data, which helps the organization analyze its business. A data warehouse helps executives organize, understand, and use their data to make strategic decisions. A data warehouse system helps in consolidated historical data analysis.
Conclusion – The point to be noted here is that the data used for analysis or reporting is heterogeneous data collected from varied/multiple sources. The main purpose of data warehousing is to combine such data for analysis and reporting. The data is then used for strategic planning and decision-making in organizations.
Features of a Data Warehouse It is a central data repository where data is stored from one or more heterogeneous data sources. A DW system stores both current and historical data. Normally a DW system stores 5-10 years of historical data. A DW system is always kept separate from an operational transaction system.
Characteristics of a Data Warehouse 1. Subject Oriented − In a DW system, the data is categorized and stored by business subject rather than by application (like equity plans, shares, loans, etc.). Data is divided into subjects like Customer, Product, Sale, Revenue, Supplier, etc. Customer-related data is stored under the Customer subject, sale data under the Sale subject, and so on, with data stored in a proper format under meaningful names. 2. Integrated − Data from multiple data sources, like relational databases, flat files, operational transaction systems, etc., are integrated in a Data Warehouse. One of the key characteristics of a data warehouse is that it contains integrated data: the data is collected from various sources, such as transactional systems, and then cleaned, transformed, and consolidated into a single, unified view. This allows for easy access and analysis of the data, as well as the ability to track data over time.
3. Non Volatile − Data in a data warehouse is non-volatile. This means that once data is loaded into the DW system, it is not altered: we cannot edit, update, or change it. The data in the warehouse is never updated or deleted, only added to. This is important because it preserves historical data, making it possible to track trends and patterns over time. 4. Time Variant − A DW system contains historical data, as compared to a transactional system, which contains only current data. In a data warehouse you can see data for 3 months, 6 months, 1 year, 5 years, etc.
A data warehouse is also time-variant, which means that the data is stored with a time dimension. This allows for easy access to data for specific time periods, such as last quarter or last year. This makes it possible to track trends and patterns over time.
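The non-volatile and time-variant properties above can be sketched in a few lines of Python (an illustrative sketch only; the table layout, field names, and figures are all invented for the example):

```python
from datetime import date

# Non-volatile: the fact store is append-only. Rows are never
# updated or deleted once loaded.
sales_facts = []

def load_sale(sale_date, product, amount):
    """Append a new fact row; existing rows are never modified."""
    sales_facts.append({"date": sale_date, "product": product, "amount": amount})

load_sale(date(2023, 1, 15), "laptop", 55000)
load_sale(date(2023, 4, 10), "laptop", 60000)
load_sale(date(2024, 1, 20), "laptop", 52000)

# Time-variant: every row carries a time dimension, so we can ask
# about a specific period, e.g. total laptop sales in 2023 only.
total_2023 = sum(r["amount"] for r in sales_facts
                 if r["date"].year == 2023 and r["product"] == "laptop")
```

Because history is preserved, the same query can be re-run for any period (last quarter, last year, five years back) without the older rows having been overwritten.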
OLTP Systems – OLTP means Online Transaction Processing. We perform the transaction online; this system manages transaction-oriented applications. For example: online payments (Paytm, Google Pay, PhonePe), online shopping (Amazon), online banking, online airline ticket booking, sending a text message, education systems, adding a book to a shopping cart, point of sale.
OLTP is used in every organization to record their transactional data. OLTP system is an online database changing system. Therefore, it supports database queries such as insert, update, and delete information from the database. This system covers all day-to-day operations such as purchasing, payroll, accounting, etc. of an organization.
Data Warehouse vs. Operational Database – An operational system contains the current data of an organization, while a data warehouse normally contains the historical data. An operational database query allows read and modify operations (INSERT, DELETE, and UPDATE), while an OLAP query needs only read-only access to stored data (SELECT statement).
OLTP VS OLAP
Difference between OLTP and OLAP (DW) – OLTP (Database): It is a customer-oriented process. It holds current data. It uses a normalized database model for design. The OLTP database is accessed via simple queries. It is used for short transactions. OLTP requires mechanisms such as concurrency and recovery control. It handles a large number of small transactions on a day-to-day basis. Its size ranges from a few gigabytes to hundreds of gigabytes. OLAP (DW): It is a market-oriented process. It holds current and historical data. It uses the star and snowflake schema models for designing the database. The DW has read-only access via complex queries. It does not support data access for day-to-day operations. It handles large volumes of data across multiple databases – more than terabytes of data.
OLTP vs OLAP – OLTP: Can have fast response times. Examples: Oracle, MySQL, SQL Server, DB2 (RDBMS). It is used for operational and transaction processing. Operations: INSERT, DELETE, and UPDATE commands. Short and fast updates. Number of users: thousands of users allowed at a time. Single transactions. OLAP (DW): Can have longer response times. Examples: Tableau, Power BI, and SAP. A DW is used for information processing and analysis. Data warehouses can be expensive to build, and strong technical knowledge and experience are required. Operation: SELECT command. Updates are scheduled and done periodically. Number of users: only a few users allowed at a time. Aggregated transactions.
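The contrast between OLTP writes and OLAP read-only aggregation can be illustrated with Python's built-in sqlite3 module (the table and its rows are invented for this sketch; in practice the OLTP system and the warehouse would be separate databases):

```python
import sqlite3

# In-memory database standing in for both systems in this toy example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, product TEXT, amount REAL)")

# OLTP style: short, fast write transactions on individual rows
# (INSERT / UPDATE / DELETE on current data).
cur.execute("INSERT INTO orders VALUES (1, 'book', 250.0)")
cur.execute("INSERT INTO orders VALUES (2, 'book', 300.0)")
cur.execute("UPDATE orders SET amount = 275.0 WHERE id = 1")
conn.commit()

# OLAP style: a read-only aggregate over many rows (SELECT only).
cur.execute("SELECT product, SUM(amount) FROM orders GROUP BY product")
product, total = cur.fetchone()
```

The OLTP side touches one row at a time and must commit quickly; the OLAP side never modifies data and scans many rows to produce a summary.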
Advantages of DW Faster and more efficient data analysis – With a data warehouse, data can be processed and analyzed much more quickly than with traditional methods. This means that companies can get the information they need faster, which can help them make better decisions. Better decision-making – By using a data warehouse, companies can gather and analyze more data than they could before. This allows them to make more informed decisions based on a deeper understanding of their business. Improved data quality – Data warehouses are designed to ensure that data is consistent and accurate. This means that companies can trust the information they are using to make decisions. Increased data accessibility – Because data warehouses are organized in a way that makes it easy to access information, companies can quickly find the data they need. This can save time and improve productivity. Cost savings – By using a data warehouse, companies can save money on storage and processing costs. This is because data warehouses are designed to be more efficient than traditional storage methods.
Applications – Banking: With the help of a data warehouse, bankers can manage all their available resources more effectively, as well as analyze consumer data, market trends, government regulations and reports, and, more importantly, support financial decision-making. Finance: analysis of customer expenses enables better strategies to maximize profits. Education: institutions require a DW to have a comprehensive (integrated) view of their students, faculty and staff, resource data, etc. Universities use warehouses for extracting information used in proposals for research grants, understanding their student demographics, and human resource management. The entire financial department of most universities depends on data warehouses, including the financial aid department.
Applications 2 – 4. Government: The central government utilizes warehouses for research and compliance (approval), whereas state governments use them for services related to human resources, like recruitment, and accounting, like payroll management. The government also uses data warehouses to maintain and analyze tax records and health policy records. 5. Healthcare: One of the most important sectors utilizing data warehouses is healthcare. All financial, clinical, and employee records are fed into warehouses, which helps to strategize and predict outcomes, track and analyze service feedback, generate patient reports, and share data with tie-in insurance companies, medical aid services, etc.
Applications 3 – 6. Manufacturing and Distribution Industry: These industries use warehouses for product shipment records, records of product portfolios (stock), identifying profitable product lines, and analyzing previous data and customer feedback to evaluate weaker product lines and eliminate them. 7. Retailers: Retailers are the mediators between wholesalers and customers. They have to maintain the records of both parties to ensure their existence in the market. They also analyze sales to determine fast-selling and slow-selling product lines. 8. Services Sector: In the service sector, a DW is used for maintaining customer details, financial records, revenue patterns, customer profiling, resource management, and human resources. 9. Insurance: In the insurance sector, a DW is used to maintain existing customer records and analyze them to boost decision-making toward positive outcomes.
Note – A DW improves the decision-making process of a business and boosts organizational performance. Staging Area: The data from the data source layer does not directly enter the storage area; it needs to be cleaned and processed first. So data enters the staging area, a temporary transformation repository. If your data warehouse design allows data transformation within the targeted database, which could be your DWH or a data mart, then this staging area is absent. A staging area makes data cleansing and consolidation from operational data sources easier, especially for corporate data warehouses that consolidate all of an organization's important data. OR Since the data extracted from external sources does not follow a particular format, it needs to be validated before loading into the data warehouse. For this purpose, it is recommended to use an ETL tool.
Summary data Summary data is generated by the warehouse manager. It updates as new data loads into the warehouse. This component can include lightly or highly summarized data. Its main role is to speed up query performance.
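A rough Python sketch of the idea: detail fact rows are pre-aggregated into a small summary table so that common queries read the summary instead of scanning every detail row (all names and figures here are invented):

```python
# Detail-level fact rows, as loaded into the warehouse.
detail_rows = [
    {"region": "north", "amount": 100},
    {"region": "north", "amount": 150},
    {"region": "south", "amount": 200},
]

# The warehouse manager maintains lightly summarized data: totals per
# region, refreshed whenever new detail rows are loaded.
summary = {}
for row in detail_rows:
    summary[row["region"]] = summary.get(row["region"], 0) + row["amount"]

# A query for regional totals now hits the small summary table,
# not the (potentially huge) detail table.
north_total = summary["north"]
```

The trade-off is extra storage and refresh work in exchange for much faster responses to frequently asked aggregate queries.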
Types of DW – Enterprise (company-wide) Data Warehouse (EDW): It is a centralized warehouse. It provides decision-support services across the enterprise. It offers a unified (integrated) approach to organizing and representing data. It also provides the ability to classify data according to subject and to give access according to those divisions.
Operational Data Store – An ODS is a data store required when neither the DW nor the OLTP system supports an organization's reporting needs. It can sit between the data sources and the enterprise data warehouse. An ODS is refreshed in real time, hence it is widely used for routine activities like storing records of employees. OR This type of data warehouse refreshes in real time. It is often preferred for routine activities like storing employee records. It is required when data warehouse systems do not support the reporting needs of the business.
An operational data store is a staging environment for storing and preparing data for operational and analytical uses. It’s referred to as a staging area because it’s not usually the final destination for an organization’s data. ODS is a repository containing a snapshot of the most current view of consolidated data from multiple transactional systems, making it ideal for performing operational reporting
Data Mart – A data mart is a simple form of data warehouse focused on a single subject or line of business. With a data mart, teams can access data and gain insights faster, because they don't have to spend time searching within a more complex data warehouse or manually aggregating data from different sources. It is a data store designed for a particular department of an organization; in other words, a data mart is a subset of the DW that is usually oriented to a specific purpose. Reasons for creating a Data Mart – easy access to frequently used data; improved end-user response time; easy creation of the data mart; lower cost; less load on the data warehouse.
Data marts were invented because of the server overload problems that come with one big data warehouse, i.e., to put less load on the data warehouse.
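The idea of a data mart as a subject-oriented subset of the warehouse can be sketched as follows (a hypothetical "sales" department; all rows and field names are invented):

```python
# The full warehouse holds rows for many subjects and departments.
warehouse = [
    {"dept": "sales",   "product": "tv",    "amount": 500, "hr_code": "X1"},
    {"dept": "sales",   "product": "radio", "amount": 120, "hr_code": "X2"},
    {"dept": "payroll", "employee": "A",    "salary": 900},
]

# Build the sales data mart: keep only sales rows and only the
# fields that department actually needs.
sales_mart = [{"product": r["product"], "amount": r["amount"]}
              for r in warehouse if r["dept"] == "sales"]
```

Queries from the sales team now touch the small mart rather than the whole warehouse, which is exactly the response-time and load benefit listed above.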
Data Mart -
Need for a data mart – A data mart helps to enhance user response time due to a reduction in the volume of data. It provides easy access to frequently requested data. Data marts are simpler to implement compared to a corporate data warehouse, and the cost of implementing a data mart is certainly lower than implementing a full data warehouse. Compared to a data warehouse, a data mart is agile: in case of a change in the model, a data mart can be rebuilt quicker due to its smaller size.
Types of data mart 1) Dependent Data Mart – A dependent data mart is created by extracting data from a central repository, the data warehouse. First the data warehouse is created by extracting data (through an ETL tool) from external sources, and then the data mart is created from the data warehouse. A dependent data mart belongs to the top-down approach of data warehouse architecture. This model of data mart is used by big organizations. 2) Independent Data Mart – An independent data mart is created directly from external sources instead of from the data warehouse. First a data mart is created by extracting data from external sources, and then a data warehouse is created from the data present in the data marts. An independent data mart belongs to the bottom-up approach of data warehouse architecture. This model of data mart is used by small organizations and is comparatively cost-effective.
3. Hybrid Data Mart – This type of data mart is created by extracting data either from an operational source or from the data warehouse: one path accesses data directly from external sources, while the other path follows the dependent data mart model.
Advantages of DM – Data marts contain a subset of organization-wide data; this data is valuable to a specific group of people in an organization. It is a cost-effective alternative to a data warehouse, which can be expensive to build. A data mart allows faster access to data. A data mart is easy to use, as it is specifically designed for the needs of its users; thus a data mart can accelerate business processes. Data marts need less implementation time compared to data warehouse systems; it is faster to implement a data mart because you only need to concentrate on a subset of the data. It contains historical data, which enables the analyst to determine data trends.
Disadvantages – Many times enterprises create too many disparate and unrelated data marts without much benefit; these can become a big hurdle (barrier) to maintain. A data mart cannot provide company-wide (large-scale) data analysis, as its data set is limited.
Top Down and Bottom up Development Methodology
1.Top down development methodology-
Top Down Development Methodology – Data warehouse design approaches are a very important aspect of building a data warehouse; selecting the right design can save a lot of project time and cost. There are two different data warehouse design approaches normally followed when designing a data warehouse solution, and based on the requirements of your project you can choose the one that suits your particular scenario. These methodologies are a result of research from Bill Inmon and Ralph Kimball.
Top down development methodology 1. Bill Inmon – Top-down Data Warehouse Design Approach – Bill Inmon is sometimes referred to as the "father of data warehousing"; his design methodology is based on a top-down approach. In the top-down approach, the data warehouse is designed first and then data marts are built on top of the data warehouse. Below are the steps involved in the top-down approach: Data is extracted from the various source systems. The extracts are loaded and validated in the staging area. Validation is required to make sure the extracted data is accurate and correct.
At this step, you apply various aggregation and summarization techniques to the extracted data and load it into the data warehouse. On top of the DW, we create data marts as per need.
Advantages of Top Down Methodology – This model is considered the strongest model for accommodating business changes; that is why big organizations prefer to follow this approach. Creating data marts from the data warehouse is easy.
Disadvantages – The cost and time taken in designing and maintaining it are very high.
Ralph Kimball – Bottom-up Data Warehouse Design Approach – As per this method, data marts are first created to provide reporting and analytics capability for specific business processes; later, the enterprise data warehouse is created from these data marts. Basically, the Kimball model reverses the Inmon model: data marts are directly loaded with data from the source systems, and then the ETL process is used to load the data warehouse.
2.Bottom up Development Methodology-
First, the data is extracted from external sources (same as in the top-down approach). Then the data goes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability; each addresses a single business area. These data marts are then integrated into a data warehouse. This approach is given by Kimball: data marts are created first and provide a thin view for analysis, and the data warehouse is created after the complete data marts have been built.
Advantages – As the data marts are created first, reports are generated quickly. More data marts can be accommodated, and in this way the data warehouse can be extended. Also, the cost and time taken in designing this model are comparatively low.
Disadvantages – This model is not as strong as the top-down approach, as the dimensional view of the data marts is not as consistent as it is in that approach.
Tools for Data Warehouse Development – Amazon Redshift, Microsoft Azure, Snowflake, BigQuery, Apache Hive, etc.
Data Warehouse Life Cycle
Data Warehouse Development Life Cycle
Requirement Specification: This is the first step in the development of the data warehouse and is done by business analysts. In this step, business analysts prepare the business requirement specification document. More than 50% of requirements are collected from the client side, and it takes 3–4 months to collect all of them. After the requirements are gathered, the data modeler starts identifying the dimensions, facts, and combinations based on the requirements. We can say that this is an overall blueprint of the data warehouse. This phase is mostly about determining the business needs and placing them in the data warehouse.
Data Modelling: This is the second step in the development of the data warehouse. Data modelling is the process of visualizing data distribution and designing databases by fulfilling the requirements, to transform the data into a format that can be stored in the data warehouse. For example, whenever we build a house, we put everything in the correct position as specified in the blueprint; that is what data modeling is for data warehouses. Data modeling helps to organize data, creates connections between data sets, and is useful for establishing data compliance and security in line with data warehousing goals. It is the most complex phase of data warehouse development, and there are many data modeling techniques that businesses use for warehouse design. There are three data models for data warehouses: Star Schema, Snowflake Schema, and Galaxy Schema.
ETL Design and Development: This is the third step in the development of the data warehouse. An ETL (Extract, Transform, Load) tool may extract data from various source systems and store it in a data lake; an ETL process can then extract the data from the lake, transform it, and load it into the data warehouse for reporting. This is where ETL tools like SAS Data Management, IBM Information Server, Hive, etc. come into the picture. A good ETL process can help construct a simple yet functional data warehouse that is valuable throughout every layer of the organization.
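A minimal extract-transform-load sketch in plain Python (not modeled on any specific ETL product; the raw source rows and cleansing rules are invented for illustration):

```python
def extract():
    # Pretend these rows came from heterogeneous sources
    # (CSV exports, OLTP dumps, flat files ...).
    return [" Varsha ,70", "Nitin,60", ",", "Amit,not_a_number"]

def transform(raw_rows):
    # Cleanse and validate: strip whitespace, drop empty or invalid
    # rows, and cast the marks column to an integer.
    clean = []
    for row in raw_rows:
        parts = [p.strip() for p in row.split(",")]
        if len(parts) == 2 and parts[0] and parts[1].isdigit():
            clean.append({"name": parts[0], "marks": int(parts[1])})
    return clean

def load(rows, warehouse):
    # Non-volatile target: rows are only ever appended.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

Real ETL tools add scheduling, logging, error handling, and incremental loads on top of this same extract → transform → load skeleton.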
OLAP Cubes: This is the fourth step in the development of the data warehouse. An OLAP cube, also known as a multidimensional cube or hypercube, is a data structure that allows fast analysis of data according to the multiple dimensions that define a business problem. A data warehouse extracts information from multiple data sources and formats like text files, Excel sheets, multimedia files, etc. The extracted data is cleaned, transformed, and loaded into an OLAP server (or OLAP cube), where the information is pre-processed in advance for further analysis.
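A toy version of such a cube in Python: the measure is pre-aggregated along every combination of two dimensions, including an "ALL" roll-up level, so that slice-and-dice queries become simple lookups (dimension values and figures are invented):

```python
from itertools import product as cross

# Fact rows: (product, year, sales amount).
facts = [
    ("tv",    2023, 500), ("tv",    2024, 700),
    ("radio", 2023, 120), ("radio", 2024, 180),
]

products = {"tv", "radio"}
years = {2023, 2024}

# Pre-compute every cell of the 2-D cube, including "ALL" roll-ups.
cube = {}
for p, y in cross(products | {"ALL"}, years | {"ALL"}):
    cube[(p, y)] = sum(amt for prod, yr, amt in facts
                       if p in (prod, "ALL") and y in (yr, "ALL"))

# Slices of the cube answer questions instantly, with no rescanning:
tv_all_years = cube[("tv", "ALL")]   # tv sales across all years
all_2023     = cube[("ALL", 2023)]   # all products in 2023
grand_total  = cube[("ALL", "ALL")]  # everything
```

A real OLAP server does the same pre-aggregation over many dimensions and hierarchies, which is why multidimensional queries feel instantaneous.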
UI Development: This is the fifth step in the development of the data warehouse. So far, the processes discussed have taken place at the backend. A user interface is needed for the user to interact with the computer system, in particular via input devices and software, to access the data warehouse directly for analysis and report generation. The main aim of a UI is to enable a user to effectively manage the device or machine they are interacting with. There are plenty of tools in the market that help with UI development; BI tools like Tableau or Power BI are great choices.
Maintenance: This is the sixth step in the development of the data warehouse. In this phase, we can update or make changes to the schema and to the data warehouse's application domain or requirements. Data warehouse maintenance systems must also provide means to keep track of schema modifications. At the schema level, we can perform operations to insert and change dimensions and categories; changes include, for example, adding or deleting user-defined attributes.
Test and Deployment: This is often the final step in the data warehouse development cycle. Businesses and organizations test data warehouses to ensure that the required business problems are implemented successfully. Warehouse testing involves the scrutiny (close examination) of enormous (very big) volumes of data. The data to be compared comes from heterogeneous data sources like relational databases, flat files, operational data, etc. The overall data warehouse project testing phases include: data completeness, data transformation, data loading by means of ETL tools, data integrity, etc. After testing, the data warehouse is deployed so that users can immediately access the data and perform analysis. Basically, in this phase, the data warehouse is turned on and users can take advantage of it. At the time of deployment, most of its functions are implemented. Data warehouses can be deployed in an organization's own data center or on the cloud.
Kimball Lifecycle Diagram 1 – The Kimball Lifecycle methodology was conceived (thought up) during the mid-1980s by members of the Kimball Group and other colleagues at Metaphor Computer Systems, a pioneering decision-support company. Since then, it has been successfully utilized by thousands of data warehouse and business intelligence (DW/BI) project teams across virtually every industry, application area, business function, and technical platform. Originally referred to as the Business Dimensional Lifecycle approach, this name captured the method's core principles: Focus on adding business value (need) across the enterprise. Dimensionally structure the data that is delivered to the business. Iteratively develop the DW/BI environment in manageable lifecycle increments rather than attempting a galactic Big Bang approach.
When the approach was first published in the 1990s, most alternative approaches weren’t emphasizing(applying) these Kimball principles. However, since then, they’ve been broadly adopted and have become mainstream industry best practices. The Kimball Lifecycle approach is illustrated in the following diagram. It provides an overall roadmap depicting the sequence of high-level tasks required for successful DW/BI projects.
Kimball Lifecycle Diagram As per Kimball Lifecycle, we start building a data warehouse with an understanding of business requirements and determining how best to add value to the organization. The organization must agree on what the value of this data is before deciding to build a data warehouse to hold it. Once the requirements are gathered, the implementation phase begins with design steps across three different tracks – technology, data, and BI applications. Once we are done with this implementation, the Lifecycle comes back together to deploy the query tools, reports, and applications to the user community. The incremental approach of the Lifecycle helps to deliver business value in a short span of time and at the same time helps to build an enterprise-wide information resource in the long term.
1. Program/Project Planning and Management: The first box on the roadmap focuses on getting the program/project launched, including scoping, justification, and staffing. Throughout the Lifecycle, ongoing program and project management tasks keep activities on track.
2. Business Requirements Definition – The success of the project depends on a solid understanding of the business requirements. Understanding the key factors driving the business is crucial for successfully translating the business requirements into design considerations. There are many methods and tools to ensure requirement gathering is done well; we will not dwell on them further here. What follows the business requirement definition is three concurrent tracks focusing on: Technology, Data, and Business intelligence applications.
or Thoroughly understand the business requirements and define the scope of the data warehouse project. Create a detailed list of data dimensions, facts, and key performance indicators (KPIs) to drive the design.
3 . Technical Architecture Design The objective here is to finalize the overall architectural framework and vision. We do this based on considering business requirements, the current technical environment, and the planned strategic technical directions of the organization. Based on the technical architecture, we do: Evaluation and selection of Products that will deliver needed capabilities Hardware platform Database management system Extract-transformation-load (ETL) tools Data access query tools Reporting tools must be evaluated Installation of selected products/components/tools Testing of installed products to ensure appropriate end-to-end integration within the data warehouse environment.
3. Data track – Dimensional modelling – The data track primarily deals with the design of the dimensional model. Dimensional modeling is a vast subject area comprising many methods, suggestions, and best practices. Here, a detailed data analysis of a single business process is performed to identify the fact table granularity, the associated dimensions and attributes, and the numeric facts. The primary constructs of a dimensional model are fact tables and dimension tables. Two important design methods in dimensional modeling are the star schema and the snowflake schema. A star schema contains a central fact table directly connected to dimension tables. The snowflake schema is a variation of the star schema used in a data warehouse; it is more complex than the star schema because the tables that describe the dimensions are normalized.
Or Dimensional modeling- Design the dimensional model based on Kimball's star schema approach to optimize query performance and facilitate analysis. Identify the core dimensions, hierarchies, and relationships to represent the business data accurately.
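A minimal star schema sketched in Python: a central fact table references the dimension tables by surrogate key, and a typical query joins across them (table contents, key values, and column names are invented for the example):

```python
# Dimension tables, keyed by surrogate keys.
dim_product = {1: {"name": "laptop", "category": "electronics"},
               2: {"name": "desk",   "category": "furniture"}}
dim_date    = {10: {"year": 2024, "quarter": "Q1"}}

# Central fact table: foreign keys into the dimensions plus a
# numeric measure (the "grain" here is one sale per row).
fact_sales = [
    {"product_id": 1, "date_id": 10, "amount": 55000},
    {"product_id": 2, "date_id": 10, "amount": 8000},
]

# A typical star join: total Q1-2024 sales for the electronics category.
electronics_q1 = sum(
    f["amount"] for f in fact_sales
    if dim_product[f["product_id"]]["category"] == "electronics"
    and dim_date[f["date_id"]]["quarter"] == "Q1")
```

In a snowflake schema the `category` attribute would be normalized out into its own table referenced from `dim_product`, trading an extra join for less redundancy.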
4. Physical design We start with a logical data model definition and then proceed to the physical design of the model. In physical design, we define the data structures. Some key activities include: setting up the database environment setting up appropriate security preliminary performance tuning strategies, from indexing to partitioning and aggregations. OLAP databases are also designed during this process.
5. ETL design and development This is by far the most important stage of the lifecycle and it takes 70% of the effort to achieve. In this stage, we perform extraction, transformation, and loading (ETL) of source data into the target models. Raw data is extracted from the operational source systems and is transformed into meaningful information for the business. Data quality conditions are continuously monitored. Kimball calls ETL a “data warehouse back room”. ETL system strives to deliver high throughput, as well as high-quality output.
6. BI Application track BI applications deliver business value from the DW/BI solution, rather than just delivering the data. The goal is to deliver capabilities that are accepted by the business to support and enhance their decision-making. First, we start to identify the candidate BI applications and appropriate navigation interfaces to address the users’ needs and needed capabilities. This will give the desired specification of the tool required. Later, we configure the business metadata and tool infrastructure. This is followed by the construction and validation of the specified analytic and operational BI applications and the navigational portal.
7. Deployment – Deployment should be deferred (held back) until all the pieces, such as training, documentation, and validated data, are ready for production release. It is also critical that deployment be well orchestrated (neatly set up) and adequately (satisfactorily) planned. Deployment should ensure that the results of the technology, data, and BI application tracks are tested and fit together properly. While deploying the solution, appropriate education and a support infrastructure have to be in place.
8. Maintenance Maintenance begins once the system is deployed into production. Maintenance work ensures ongoing support, education, and communication with business users. Also, technical operational tasks that are necessary to keep the system performing optimally are conducted as needed. Usage monitoring, performance tuning, index maintenance, and system backup are done periodically by technical experts.
9. Growth process The organization has reason to be happy if the data warehouse system grows; DW growth is considered a sign of success. New requests may arrive that require attention. The key here is to build upon the foundation that has already been established.
Or Deployment, Maintenance, and Growth: The three lifecycle tracks converge at deployment, bringing together the technology, data, and BI applications. The deployed iteration enters a maintenance phase, while growth is addressed by the arrow back to project planning for the next iteration of the DW/BI system.
Tools for data warehouse development 1. Amazon Redshift Amazon Redshift is a fully managed, cloud-based data warehouse designed to store and analyze large-scale data using SQL queries, together with Business Intelligence (BI) tools like Tableau, Microsoft Power BI, etc. It is a simple, cost-effective tool and a critical part of Amazon Web Services, one of the most popular cloud computing platforms. It was developed by Amazon. Data analysis can be done in a matter of seconds, which makes it ideal for high-speed data analysis.
2. Microsoft Azure: Azure is a public cloud computing platform launched by Microsoft. It is used for building, testing, deploying, and managing applications and services through Microsoft-managed data centers. It offers infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS), and consists of over 200 different products and cloud services, such as Data Analytics, Virtual Compute, Storage, Virtual Networks, Internet Traffic Manager, Websites, Media Services, Mobile Services, Integration, etc.
3. Google BigQuery BigQuery is a cost-effective data warehousing tool with built-in machine learning capabilities that allows scalable analysis of data. It is a Platform as a Service that makes it easy to query big datasets using super-fast SQL queries. Google announced BigQuery in 2010 and made it available to users in 2011. It supports automatic data transfer and full access to the stored database. Data scientists who run machine learning or data mining operations may find it the perfect solution, since they deal with large datasets. Features: Cloud Dataflow, Spark, and Hadoop (data tools for large-scale processing) make it easy to read and write data into BigQuery. Data in billions of rows can be analyzed for insights using BigQuery’s SQL syntax.
4. Snowflake Snowflake is a cloud-based data warehouse tool that provides a faster, easier-to-use, and more flexible framework than other data warehouses. Since Snowflake runs completely in the cloud, it offers a complete SaaS (Software as a Service) architecture. Snowflake simplifies data processing by letting users work (data blending, analysis, and transformations) with varied forms of data, both structured and semi-structured, using a single language, SQL.
5. Teradata Teradata DWH (Data Warehouse) is a simple, cost-effective relational database management system offered by the Teradata organization. It is considered one of the best data warehousing tools for viewing and managing large amounts of data. It also optimizes database performance through smart in-memory processing at no additional cost. With its ability to consume, analyze, and manage data, it meets all integration and ETL (Extract, Transform, and Load) requirements. Business users can use it with only basic training and query knowledge.
6. SAP (Systems, Applications, and Products) The SAP Data Warehouse is a data management platform aimed at mapping all business processes in an organization. For reporting and analytics purposes, data from various SAP applications is extracted, consolidated, and made available in a unified format in the data warehouse. Both enterprise IT and line-of-business users can utilize SAP’s data management features to gain valuable insights from data. As a leading provider of business information management solutions, it is one of the best data warehouse tools. It offers open and scalable solutions with data security and governance capabilities.
7. PostgreSQL PostgreSQL is a renowned open-source database management system known for its reliability, robustness, and performance. It is used as primary data storage or as a data warehouse for many applications, including mobile, web, geospatial, and analytics applications. PostgreSQL allows the data warehouse to analyze, transform, model, and deliver data from a database server, making it both flexible and intelligent. In short, it helps developers create applications and managers build fault-tolerant environments for their data and protect data integrity. It also helps with managing your data, regardless of how large or small the dataset may be.
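The kind of warehouse-style analytical query a PostgreSQL server would run can be sketched as follows; Python's built-in sqlite3 module stands in for a PostgreSQL connection here, and the orders table and its columns are hypothetical:

```python
import sqlite3

# In-memory database standing in for a PostgreSQL warehouse server
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# Analytical query: total sales per region, largest first
cur.execute(
    "SELECT region, SUM(amount) FROM orders "
    "GROUP BY region ORDER BY SUM(amount) DESC"
)
rows = cur.fetchall()
print(rows)  # → [('east', 150.0), ('west', 75.0)]
```

Against a real PostgreSQL server, the same SQL would be issued through a driver such as psycopg2 instead of sqlite3; the GROUP BY/aggregate pattern is identical.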