Data analytics unit 1 aktu updated syllabus new

yogendra2210162 661 views 136 slides Oct 16, 2024

Slide Content

Data Analytics (BCS-052) Unit 1 Introduction to Data Analytics

Syllabus Introduction to Data Analytics: Sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics. Data Analytics Lifecycle: Need, key roles for successful analytic projects, various phases of data analytics lifecycle – discovery, data preparation, model planning, model building, communicating results, operationalization.

Data Data is a collection of information that can be used to answer questions and solve business challenges. Data can be organized in the form of charts, tables, or graphs.  It can be made up of facts, numbers, names, figures, or descriptions of things. 

Analytics Analytics is the process of using math and machine learning to find patterns in data sets and gain insights.  Data analytics is a broad field that includes analytics, as well as other processes like collecting and storing data.

Data Analytics Data analytics is the process of analyzing raw data to find patterns, draw conclusions, and make informed decisions. It involves the collection, transformation, and organization of data to draw conclusions, make predictions, and drive informed decision making. It is a broad field that uses tools, technologies, and processes to transform data into actionable insights.

Sources of Data Data are collected from two kinds of sources: primary sources and secondary sources.

Sources of Data (Contd…) 1. Primary Sources of Data Data that is raw, original, and extracted directly from official sources is known as primary data. Data collected for the first time by an individual, a group of individuals, an institution, or an organisation constitutes a primary source of data. This type of data is collected directly through techniques such as questionnaires, interviews, and surveys.

Sources of Data (Contd…) 1.1. Interview Method Data is collected by interviewing the target audience; the person conducting the interview is called the interviewer, and the person who answers is the interviewee. Basic business- or product-related questions are asked and noted down in the form of notes, audio, or video, and this data is stored for processing. Interviews can be both structured and unstructured, such as personal interviews or formal interviews conducted by telephone, face to face, email, etc.

Sources of Data (Contd…) 1.2. Survey Method The survey method is a research process in which a list of relevant questions is asked and the answers are noted down in the form of text, audio, or video. Surveys can be conducted both online and offline, for example through website forms and email, and the answers are then stored for analysis. Examples are online surveys and social media polls.

Sources of Data (Contd…) 1.3. Observation Method The observation method is a method of data collection in which the researcher observes the behavior and practices of the target audience using some data collection tool and stores the observed data in the form of text, audio, video, or other raw formats. In this method, the data is collected directly by posing a few questions to the participants. For example, observing a group of customers and their behavior towards the products.

Sources of Data (Contd…) 1.4. Experimental Method The experimental method is the process of collecting data by performing experiments, research, and investigation. The most frequently used experimental designs are CRD (completely randomized design), RBD (randomized block design), LSD (Latin square design), and FD (factorial design).

Sources of Data (Contd…) 2. Secondary Sources of Data Secondary data is data which has already been collected and is reused for some valid purpose. This type of data is previously recorded from primary data and has two types of sources: internal and external. Secondary sources of data consist of published and unpublished records, which include government publications, documents, and reports.

Sources of Data (Contd…) 2.1. Internal Source These types of data can easily be found within the organization, such as market records, sales records, transactions, customer data, accounting resources, etc. The cost and time required to obtain internal data are low.

Sources of Data (Contd…) 2.2. External Source Data which cannot be found within the organization and can be gained only through external third-party resources is external source data. The cost and time required are higher because external sources contain a huge amount of data. Examples of external sources are government publications, news publications, the Registrar General of India, the Planning Commission, syndicate services, and other non-governmental publications.

Nature of Data The nature of data can be understood on the basis of the class to which it belongs. By nature, data are either quantitative or qualitative. Qualitative data is non-numerical data, such as words and sentences, and mostly focuses on the behavior and actions of a group. Quantitative data is in numerical form and can be calculated using different scientific tools and sampling methods.
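As an illustration of the two classes, here is a minimal Python sketch (the record and the `classify_field` helper are invented for this example) that labels each field of a record as quantitative or qualitative:

```python
def classify_field(value):
    """Return 'quantitative' for numeric values, 'qualitative' otherwise."""
    return "quantitative" if isinstance(value, (int, float)) else "qualitative"

# An invented survey record mixing numerical and non-numerical fields.
record = {"age": 34, "income": 52000.0, "occupation": "teacher",
          "feedback": "very satisfied"}

nature = {field: classify_field(value) for field, value in record.items()}
print(nature)
# {'age': 'quantitative', 'income': 'quantitative',
#  'occupation': 'qualitative', 'feedback': 'qualitative'}
```

The quantitative fields can go straight into calculations, while the qualitative ones would first need coding or text analysis.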

Nature of Data (Contd…) With reference to the types of data, their nature is as follows: 1. Numerical Data 2. Descriptive Data 3. Graphic and Symbolic Data 4. Enumerative Data 5. Descriptive Data (in social sciences)

Nature of Data (Contd…) 1. Numerical Data: All data in science are derived by measurement and stated in numerical values; most are numerical in nature. 2. Descriptive Data: Science is not known for descriptive data. However, qualitative data in the sciences are expressed in terms of definitive statements concerning objects, and these may be viewed as descriptive data. Here, the nature of the data is descriptive. 3. Graphic and Symbolic Data: Graphic and symbolic data are modes of presentation. They enable users to grasp data by visual perception. The nature of data in these cases is graphic.

Nature of Data (Contd…) 4. Enumerative Data: Most data in social sciences are enumerative in nature. However, they are refined with the help of statistical techniques to make them more meaningful. They are known as statistical data. 5. Descriptive Data: All qualitative data in social sciences can be descriptive in nature. These can be in the form of definitive statements. However, if necessary, numerical values can be assigned to descriptive statements, which may be reduced to numerical data.

Classification of Data Data classification is the process of organising data into categories that make it easy to retrieve, sort and store for future use. The classification of data makes it easy for the user to retrieve it. Data classification is important for data security and for fulfilling different types of business or personal objectives.

Purpose of Data Classification Systematic classification of data helps organisations to manipulate, track and analyse individual pieces of data. Data professionals have a specific goal when categorising data; the goal affects the approach they take and the classification levels they use.

Why is Data Classification Important? Data classification is used to categorise structured data, but it is especially important for getting the most out of unstructured data. Data categorisation also helps to identify duplicate copies of data. Eliminating redundant data contributes to efficient use of storage and strengthens data security measures.

Types of Data Classification Three types of data classification: Structured Data Semi-structured Data Unstructured Data

Structured Data Data that has a pre-defined structure and is well organised, and which can also be categorised as quantitative data, is defined as structured data. Because of the pre-defined structure, the data can be organised into tables of rows and columns, just as in spreadsheets. When the data has relations or cannot be stored in spreadsheets due to its large size, structured data is stored in relational database tables.

Characteristics of Structured Data Data conforms to a data model and has an easily identifiable structure. Data is stored in the form of rows and columns. Data is well organised, so the definition, format and meaning of the data are explicitly known. Data resides in fixed fields within a record or file. Similar entities are grouped together to form relations or classes. Entities in the same group have the same attributes. Data elements are addressable, so they are efficient to analyse and process.

Sources of Structured Data SQL Databases Spreadsheets such as Excel OLTP System Online forms Sensors such as GPS Network and Web server logs Medical devices
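The row-and-column character of structured data can be sketched with Python's built-in sqlite3 module; the `sales` table and its values below are invented purely for illustration:

```python
import sqlite3

# Structured data: a fixed schema of typed columns, stored as rows in a
# relational table (an in-memory SQLite database keeps the example self-contained).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales (product, amount) VALUES (?, ?)",
                 [("pen", 12.5), ("notebook", 40.0), ("pen", 7.5)])

# Because the structure is pre-defined, querying and aggregating are straightforward.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 60.0
```

This directly shows the advantages listed below: the schema makes storage, access, and operations like summing a column trivial.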

Advantages of Structured Data Structured data has a well-defined structure that helps in easy storage and access of data. Data mining is easy, i.e., knowledge can be easily extracted from the data. Operations such as updating and deleting are easy due to the well-structured form of the data. Business Intelligence operations such as data warehousing can be easily undertaken. It is easily scalable as data volumes grow. Ensuring data security is easy.

Unstructured Data Unstructured data is typically categorized as qualitative rather than quantitative. It doesn't have a pre-defined structure or specific format. Data in this category includes audio, video, images, and the contents of text files, which have different properties and cannot be stored in relational database tables. These are therefore stored in their raw format, and analysis is done by applying image processing, Natural Language Processing, and machine learning.
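As a tiny illustration of analysing unstructured text, word frequencies can be extracted from free text with standard-library Python; the review text here is invented, and counting words is only the very first step of a Natural Language Processing pipeline:

```python
from collections import Counter
import re

# Unstructured data has no schema, so analysis must start from the raw content.
review = "Great phone, great battery. The camera is good but the screen is just okay."

# Lowercase and extract alphabetic tokens, then count them.
words = re.findall(r"[a-z]+", review.lower())
freq = Counter(words)
print(freq.most_common(3))
```

Unlike the SQL query over a structured table, nothing here is addressable by field; structure has to be imposed on the raw text before any insight can be drawn.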

Characteristics of Unstructured Data Data neither conforms to a data model nor has any structure. Data cannot be stored in the form of rows and columns as in databases. Data does not follow any semantics or rules. Data lacks a particular format or sequence. Data has no easily identifiable structure. Due to the lack of identifiable structure, it cannot be used by computer programs easily.

Sources of Unstructured Data Web pages Images (JPEG, GIF, PNG, etc) Videos Reports Word documents Surveys

Advantages of Unstructured Data It supports data which lacks a proper format or sequence. The data is not constrained by a fixed schema. It is very flexible due to the absence of a schema. Data is portable. It is very scalable. It can deal easily with heterogeneous sources. These types of data have a variety of business intelligence and analytics applications.

Disadvantages of Unstructured Data It is difficult to store and manage unstructured data due to the lack of schema and structure. Ensuring data security is a difficult task. Indexing the data is difficult and error-prone due to its unclear structure and lack of pre-defined attributes, so search results are not very accurate.

Semi-Structured Data Semi-structured data contains elements of both structured and unstructured data. Its schema is not fixed as in structured data, but with the help of metadata (which enables users to define some partial structure or hierarchy) it can be organized to some extent, so it is not as unorganized as unstructured data. Metadata includes tags and other markers, just as in JSON, XML, or CSV, which separate the elements and enforce the hierarchy; the size of each element can vary, and order is not important.
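The JSON case can be sketched with Python's standard json module; both records below are invented for illustration, and deliberately carry different fields to show the "partial structure" idea:

```python
import json

# Semi-structured data: tags give partial structure, but fields can vary per
# record and their order does not matter.
raw = '''[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phone": "555-0101", "tags": ["vip"]}
]'''
records = json.loads(raw)

# Each record is self-describing, so a missing field is handled gracefully
# instead of breaking a fixed schema.
emails = [r.get("email", "unknown") for r in records]
print(emails)  # ['asha@example.com', 'unknown']
```

A relational table would force both records into the same columns; here the second record simply carries extra tags and no email.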

Characteristics of Semi-structured Data Data does not conform to a data model but has some structure. Data cannot be stored in the form of rows and columns as in databases. Similar entities are grouped together and organised in a hierarchy. Entities in the same group may or may not have the same attributes or properties. The size and type of the same attribute in a group may differ. Due to the lack of a well-defined structure, it cannot be used by computer programs easily.

Sources of Semi-structured Data Emails XML and other markup languages TCP/IP packets Zipped files Integration of data from different sources Web pages

Advantages of Semi-structured Data The data is not constrained by a fixed schema. It is flexible, i.e., the schema can be easily changed. Data is portable. It is possible to view structured data as semi-structured data. It supports users who cannot express their needs in SQL. It can deal easily with heterogeneous sources.

Disadvantages of Semi-structured Data The lack of a fixed schema makes it difficult to store the data. Interpreting the relationship between data is difficult as there is no separation of the schema and the data. Queries are less efficient as compared to structured data.

Difference Between Structured, Semi-structured, and Unstructured Data
Data Structure: Structured data has a predefined organization. Semi-structured data has organizational properties, but different from predefined structured data. Unstructured data has no predefined organization.
Technology Used: Structured data works based on relational database tables. Semi-structured data works based on the Resource Description Framework (RDF) or XML. Unstructured data works based on binary data and characters.
Flexibility: Structured data depends heavily on the schema, so there is less flexibility. Semi-structured data is less flexible than unstructured data but far more flexible than structured data. In unstructured data the schema is totally absent, so it is the most flexible of all.
Transaction Management: Structured data has mature transaction management and various concurrency techniques. Semi-structured data adapts transactions from the DBMS and is not mature. Unstructured data has no transaction or concurrency management.
Version Management: Structured data can be versioned over tables, rows, and tuples. Semi-structured data can be versioned over tuples or graphs. Unstructured data can be versioned only as a whole.
Scalability: Scaling a structured database schema is very difficult, so it offers lower scalability. Scaling semi-structured data is comparatively more feasible. Unstructured data is the most scalable.
Query Performance: Structured queries make complex joins possible. Semi-structured queries over various (anonymous) nodes are possible. Unstructured data allows only textual queries.

Characteristics of Data Data has several quality characteristics: 1. Accuracy: The data must conform to actual, real-world scenarios and reflect real-world objects and events. Analysts should use verifiable sources to confirm accuracy, which is determined by how close the values are to the verified, correct information. 2. Completeness: Completeness measures the data's ability to deliver all the mandatory values successfully.

Characteristics of Data (Contd…) 3. Consistency: Data consistency describes the data’s uniformity as it moves across applications and networks and when it comes from multiple sources. Consistency also means that the same datasets stored in different locations should be the same and not conflict. Note that consistent data can still be wrong. 4. Timeliness: Timely data is information that is readily available whenever it’s needed. This dimension also covers keeping the data current; data should undergo real-time updates to ensure that it is always available and accessible.

Characteristics of Data (Contd…) 5. Uniqueness: Uniqueness means that no duplicate or redundant information overlaps across the datasets; no record in the dataset exists multiple times. Analysts use data cleansing and deduplication to help address a low uniqueness score. 6. Validity: Data must be collected according to the organization's defined business rules and parameters. The information should also conform to the correct, accepted formats, and all dataset values should fall within the proper range.
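Three of these quality dimensions (completeness, uniqueness, and validity) can be checked with a short Python sketch; the dataset and the accepted age range are invented for the example:

```python
# A minimal sketch of data-quality checks on a small invented dataset.
records = [
    {"id": 1, "age": 29},
    {"id": 2, "age": None},    # incomplete: a mandatory value is missing
    {"id": 2, "age": 41},      # not unique: duplicate id
    {"id": 3, "age": 250},     # invalid: outside the accepted range 0-120
]

complete = [r for r in records if r["age"] is not None]   # completeness
ids = [r["id"] for r in records]
duplicates = len(ids) - len(set(ids))                     # uniqueness
valid = [r for r in complete if 0 <= r["age"] <= 120]     # validity

print(len(complete), duplicates, len(valid))  # 3 1 2
```

Checks like these are what data-cleansing and deduplication tools automate at scale.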

Introduction to Big Data Platform Big Data Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool; rather, it has become a complete subject, which involves various tools, techniques and frameworks. Big data is the technical term used in reference to vast quantities of heterogeneous datasets. Examples of big data include cell phone details, social media content, health records, transactional data, web searches, financial documents, and weather information.

Introduction to Big Data Platform (Contd…) Data which is very large in size is called big data. Normally we work on data of size MB (Word documents, Excel files) or at most GB (movies, code), but data on the scale of petabytes is called big data. The size of such data can range from several terabytes (a terabyte is about 1 trillion bytes) to petabytes and even exabytes. Big data is the concept of gathering useful insights from such voluminous amounts of structured, semi-structured and unstructured data that can be used for effective decision-making in the business environment.

Sources of Big Data These data come from many sources: Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data on a day-to-day basis, as they have billions of users worldwide. E-commerce sites: Sites like Amazon and Flipkart generate huge amounts of logs from which users' buying trends can be traced. Weather stations: Weather stations and satellites give huge amounts of data, which are stored and manipulated to forecast the weather.

Sources of Big Data (Contd…) Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly, and for this they store the data of their millions of users. Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.

Applications of Big Data Banking and Securities Communications, Media and Entertainment Healthcare Providers Education Manufacturing and Natural Resources Government Insurance Retail and Wholesale trade Transportation Energy and Utilities

Uses of Big Data Location Tracking Fraud Detection and Handling Advertising Entertainment and Media

Real World Big Data Examples Discovering consumer shopping habits Personalised marketing Fuel optimisation tools for the transportation industry Monitoring health conditions through data from wearables Live road mapping for autonomous vehicles Streamlined media streaming Predictive inventory ordering

Issues with Big Data There are three issues with big data: Low-quality and inaccurate data: Low-quality or inaccurate data may lead to inaccurate results or predictions, which wastes the time and effort of the individuals involved. Processing large datasets: No traditional data management tool or software can directly process such large datasets, because their size is usually in terabytes, which is difficult to process.

Issues with Big Data (Contd…) Integrating data from a variety of sources: Data comes from various types of sources like social media, different websites, captured images/videos, customer logs, reports created by individuals, newspapers, e-mails, etc. Collecting and integrating data of so many different types is a challenging task.

Big Data Characteristics Volume (huge amount of data) Variety (different formats of data from various sources) Veracity (inconsistencies and uncertainty in data) Value (extract useful data) Velocity (high speed of accumulation of data)

Big Data Characteristics (Contd…) 1. Volume Volume refers to the vast amount of data generated from many sources daily, such as business processes, social media platforms, networks, human interactions, and many more. Facebook, for example, can generate approximately a billion messages a day, record about 4.5 billion clicks of the "Like" button, and receive more than 350 million new posts each day. Big data technologies can handle such large amounts of data.

Big Data Characteristics (Contd…) 2. Variety Big data can be structured, unstructured, or semi-structured, collected from different sources. In the past, data was collected only from databases and spreadsheets, but these days data comes in an array of forms: PDFs, emails, audio, social media posts, photos, videos, etc.

Big Data Characteristics (Contd…) 3. Veracity Veracity means how reliable the data is. There are many ways to filter or translate the data, and veracity concerns being able to handle and manage data efficiently. 4. Value Value is an essential characteristic of big data. It is not raw data that matters, but the valuable and reliable data that we store, process, and analyze.

Big Data Characteristics (Contd…) 5. Velocity Velocity plays an important role compared to the others. Velocity refers to the speed at which data is created in real time. Big data velocity deals with the speed at which data flows from sources like application logs, business processes, networks, social media sites, sensors, mobile devices, etc.

What is a big data platform? A big data platform is an integrated computing solution that combines numerous software systems, tools, and hardware for big data management. Big data Platform workflow is divided into the following stages: 1. Data Collection 2. Data Storage 3. Data Processing 4. Data Analytics

What is a big data platform? (Contd…) 5. Data Management and Warehousing 6. Data Catalog and Metadata Management 7. Data Observability 8. Data Intelligence

Characteristics of a Big Data Platform 1. Ability to accommodate new applications and tools depending on evolving business needs. 2. Support for several data formats. 3. Ability to accommodate large volumes of streaming data. 4. A wide variety of conversion tools to transform data to different preferred formats. 5. Capacity to accommodate data at any speed. 6. The ability for quick deployment. 7. Tools for data analysis and reporting requirements.

Different Types of Big Data Platforms and Tools Hadoop Delta Lake Migration Platform Data Catalog and Data Observability Platform Data Ingestion and Integration Platform Big Data and IoT Analytics Platform Data Discovery and Management Platform Cloud ETL Data Transformation Platform

Need/Importance of Big Data Reduction in cost Reduction in time New product development and optimized offerings Smarter decision making

Challenges of Big Data Rapid data growth: Data grows at such a high velocity that looking for insights in it becomes a problem, and there is no 100% efficient way to filter out the relevant data. Storage: Generating such a massive amount of data requires storage space, and organizations face challenges in handling such extensive data without suitable tools and technologies.

Challenges of Big Data (Contd…) Unreliable data: It cannot be guaranteed that the big data collected and analyzed are totally (100%) accurate. Data security: Firms and organizations storing such massive data (of users) can be a target of cybercriminals, and there is a risk of data getting stolen. Hence, encrypting such data is also a challenge for firms and organizations.

Data analytics Data analytics is the process of examining datasets to find trends and draw conclusions about the information they contain. Data analytics technologies and techniques are widely used in commercial industries to enable organisations to make more-informed business decisions. Scientists and researchers also use analytics tools to verify or disprove scientific models, theories and hypotheses.

Need of Data Analytics Data analytics is important for many reasons, including: Informed decision-making: Data analytics helps businesses make better decisions by providing a holistic view of their performance and identifying opportunities for improvement. Improved customer experience: Data analytics can help businesses understand their customers' preferences and needs, which can lead to personalized experiences and better customer outcomes. Fraud detection and security: Data analytics can help businesses identify suspicious activity and minimize risk. Healthcare: Data analytics can help healthcare professionals make evidence-based decisions about patient care, disease diagnosis, and treatment optimization.

Analytic Scalability Analytic scalability is the ability to use data to understand and solve a large variety of problems. And because problems come in many forms, analytics must be flexible enough to address problems in different ways. This might include the use of statistical tools and forecasting.

Evolution of Analytic Scalability The amount of data organizations process continues to increase. For instance, it is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is the equivalent of about 20 million filing cabinets' worth of text. An exabyte is 1,000 times that amount, or one billion gigabytes.

Data Analytic Process The collection, transformation, and organization of data to draw conclusions, make predictions for the future, and make informed data-driven decisions is called data analysis. A professional who handles data analysis is called a data analyst.

Data Analytic Process (Contd…) Six steps of data analytic process: Define the Problem or Research Question Collect Data Data Cleaning Analyzing the Data Data Visualization Presenting Data

Data Analytic Process (Contd…) 1. Define the Problem or Research Question • The data analyst is given a problem/business task. • The analyst has to understand the task and the stakeholder's expectations for the solution. • A stakeholder is a person who has invested their money and resources in a project. • Questions to ask yourself in the Ask phase are: 1. What are the problems being mentioned by my stakeholders? (The analyst must find the root cause of the problem to fully understand it.) 2. What are their expectations for the solutions? (The analyst must be able to ask different questions to find the right solution to the problem.)

Data Analytic Process (Contd…) 2. Collect Data The data has to be collected from various sources, internal or external. • Internal data is the data available in the organization that you work for, while external data is the data available in sources outside your organization. • Data collected by an individual from their own resources is called first-party data. • Data that is collected and sold is called second-party data. • Data collected from outside sources is called third-party data. • Common sources from which data is collected are interviews, surveys, and questionnaires. • The collected data can be stored in a spreadsheet or SQL database. • Common tools for storing the data are MS Excel or Google Sheets in the case of spreadsheets, and databases such as Oracle or Microsoft SQL Server.

Data Analytic Process (Contd…) 3. Data Cleaning (Clean and Process Data) Clean data means data that is free from misspellings and redundancies. SQL and Excel provide different functions to clean the data, and well-formatted data helps in finding trends and solutions. The most important part of the Process phase is to check whether your data is biased or not. Bias is an act of favoring a particular group/community while ignoring the rest. Bias must be avoided, as it can distort the overall data analysis; the data analyst must make sure to include every group while the data is being collected.
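The cleaning step above (fixing misspellings, removing redundancies) can be sketched in plain Python; the city names and the correction map are invented for illustration:

```python
# A minimal data-cleaning sketch: fix misspellings with a lookup table,
# then drop redundant (duplicate) rows.
rows = [
    {"city": "Lucknow"}, {"city": "Lucknw"},   # misspelling
    {"city": "Kanpur"},  {"city": "Kanpur"},   # duplicate
]
corrections = {"Lucknw": "Lucknow"}

cleaned, seen = [], set()
for row in rows:
    city = corrections.get(row["city"], row["city"])  # fix spelling
    if city not in seen:                              # drop redundancy
        seen.add(city)
        cleaned.append({"city": city})

print([r["city"] for r in cleaned])  # ['Lucknow', 'Kanpur']
```

In practice the same two operations are done with Excel functions, SQL `DISTINCT`, or dedicated tools like OpenRefine.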

Data Analytic Process (Contd…) 4. Analyzing the Data • The cleaned data is used for analyzing and identifying trends. • Calculations are performed and data is combined for better results. • The tools used for performing calculations are Excel or SQL. • Using Excel, we can create pivot tables and perform calculations, while SQL can create temporary tables to perform calculations. • Programming languages such as R and Python are another way of solving data analysis problems.
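The pivot-table idea mentioned above can be sketched in plain Python by grouping records by a key and summing a value column; the sales records are invented for the example:

```python
from collections import defaultdict

# Group sales records by region and sum the amounts -- the core of what an
# Excel pivot table or a SQL GROUP BY does.
sales = [
    ("North", 100), ("South", 80), ("North", 50), ("East", 70),
]
pivot = defaultdict(int)
for region, amount in sales:
    pivot[region] += amount

print(dict(pivot))  # {'North': 150, 'South': 80, 'East': 70}
```

The equivalent SQL would be `SELECT region, SUM(amount) FROM sales GROUP BY region`.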

Data Analytic Process (Contd…) 5. Data Visualization • The transformed data now has to be made into a visual (chart, graph). • The reason for making data visualizations is that there may be people, mostly stakeholders, who are non-technical. • Visualizations are made for a simple understanding of complex data. Tableau and Looker are two popular tools used for data visualizations. • Tableau is a simple drag-and-drop tool that helps in creating visualizations. • Looker is a data visualization tool that directly connects to the database and creates visualizations.

Data Analytic Process (Contd…) 6. Presenting the Data Presenting the data involves transforming raw information into a format that is easily comprehensible and meaningful for various stakeholders. This process encompasses the creation of visual representations, such as charts, graphs, and tables, to effectively communicate patterns, trends, and insights gleaned from the data analysis. The goal is to facilitate a clear understanding of complex information, making it accessible to both technical and non-technical audiences.

Types/Levels/Methods of Data Analytics 1. Descriptive Analytics Descriptive analytics looks at past performance and understands it by mining historical data to find the cause of success or failure in the past. Almost all management reporting, such as sales, marketing, operations, and finance, uses this type of analysis. Descriptive analytics is used when the organisation has a large dataset on past or historical events. "What happened?" or "What was the trend?"

Types/Levels/Methods of Data Analytics (Contd…) 2. Diagnostic Analytics The process of using data to understand the underlying reasons behind past events, trends and outcomes, answering: "Why did this happen?" 3. Predictive Analytics The process of applying statistical and machine learning techniques to historical data to make predictions, answering: "What might happen in the future?"

Types/Levels/Methods of Data Analytics (Contd…) 4. Prescriptive Analytics It is the process of using data to recommend actions in response to a given forecast, in order to optimize desired outcomes. “What should we do?”
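A toy illustration of the difference between descriptive and predictive analytics, using a hypothetical monthly sales history and a naive linear-trend forecast:

```python
import statistics

# Hypothetical monthly sales history
history = [100, 110, 120, 130]

# Descriptive: "What happened?" - summarize the past
average = statistics.mean(history)  # 115

# Predictive (naive linear trend): "What might happen next month?"
slope = (history[-1] - history[0]) / (len(history) - 1)  # 10 per month
forecast = history[-1] + slope  # 140
print(average, forecast)
```

Real predictive analytics would use statistical or machine learning models rather than a straight-line extrapolation, but the question each level answers is the same.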

Data Analytic Tools Data analytics offers many types of tools: Tableau Public OpenRefine KNIME RapidMiner Google Fusion Tables NodeXL Wolfram Alpha Google Search Operators Solver Dataiku DSS

Data Analytic Tools (Contd…) 1. Tableau Public Tableau, one of the top 10 Data Analytics tools, is a simple tool that offers data visualization. With Tableau’s visuals, you can investigate a hypothesis, explore the data, and cross-check your insights.

Data Analytic Tools (Contd…) Uses of Tableau Public 1. You can publish interactive data visualizations to the web for free. 2. No programming skills are required. 3. Visualizations published to Tableau Public can be embedded into blogs and web pages and shared through email or social media. The shared content can be made available for download. Limitations of Tableau Public Data size limitation

Data Analytic Tools (Contd…) 2. OpenRefine Formerly known as Google Refine, OpenRefine is data-cleaning software that helps you clean up data for analysis. It operates on rows of data with cells under columns, much like relational database tables. Uses of OpenRefine Cleaning messy data Transformation of data
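A rough Python analogue of OpenRefine-style trim, case, and deduplicate transforms, using pandas on hypothetical data:

```python
import pandas as pd

# Messy hypothetical data: inconsistent casing, stray whitespace, duplicates
df = pd.DataFrame({"city": [" Delhi", "delhi", "Mumbai ", "Mumbai "]})

# Cleaning steps comparable to OpenRefine's trim/case transforms and clustering
df["city"] = df["city"].str.strip().str.title()
df = df.drop_duplicates().reset_index(drop=True)
print(df["city"].tolist())  # ['Delhi', 'Mumbai']
```

OpenRefine performs the same operations interactively through its GUI and records them as a repeatable transformation history.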

Data Analytic Tools (Contd…) Limitations of OpenRefine OpenRefine does not work very well with big data. 3. KNIME KNIME, ranked among the top Data Analytics tools, helps you manipulate, analyze, and model data through visual programming. It integrates various components for data mining and machine learning via its modular data pipelining concept.

Data Analytic Tools (Contd…) Uses of KNIME Rather than writing blocks of code, you just have to drag and drop connection points between activities. This data analysis tool supports programming languages; in fact, it can be extended to run text mining, Python, and R. Limitation of KNIME Poor data visualization

Data Analytic Tools (Contd…) 4. RapidMiner RapidMiner provides machine learning procedures and data mining, including data visualization, processing, statistical modeling, deployment, evaluation, and predictive analytics. RapidMiner, counted among the top 10 Data Analytics tools, is written in Java and is fast gaining acceptance. Uses of RapidMiner It provides an integrated environment for business analytics, predictive analysis, text mining, data mining, and machine learning. Along with commercial and business applications, RapidMiner is also used for application development, training, education, and research.

Data Analytic Tools (Contd…) Limitations of RapidMiner RapidMiner has size constraints with respect to the number of rows. RapidMiner also needs more hardware resources. 5. Google Fusion Tables An incredible tool for data analysis, mapping, and large-dataset visualization, Google Fusion Tables can be added to the business analytics tools list. Ranked among the top 10 Data Analytics tools, Google Fusion Tables is fast gaining popularity.

Data Analytic Tools (Contd…) Uses of Google Fusion Tables 1. Visualize bigger table data online. 2. Filter and summarize across hundreds of thousands of rows. Limitations of Google Fusion Tables Only the first 100,000 rows of data in a table are included in query results or mapped. The total size of the data sent in one API call cannot be more than 1MB.

Data Analytic Tools (Contd…) 6. NodeXL NodeXL is a free and open-source network analysis and visualization software. Ranked among the top 10 Data Analytics tools, it is one of the best statistical tools for data analysis which includes advanced network metrics, access to social media network data importers, and automation. Uses of NodeXL This is one of the best data analysis tools in Excel that helps in: 1. Data Import 2. Graph Visualization 3. Graph Analysis 4. Data Representation

Data Analytic Tools (Contd…) Limitations of NodeXL 1. Multiple seeding terms are required for a particular problem. 2. Data extractions need to be run at slightly different times. 7. Wolfram Alpha Wolfram Alpha, one of the top 10 Data Analytics tools, is a computational knowledge engine founded by Stephen Wolfram. With Wolfram Alpha, you get answers to factual queries directly, computed from externally sourced data, instead of a list of documents or web pages.

Data Analytic Tools (Contd…) Uses of Wolfram Alpha Provides detailed responses to technical searches and solves calculus problems. Helps business users with information charts and graphs, and with creating topic overviews and high-level pricing history. Limitations of Wolfram Alpha Wolfram Alpha can only deal with publicly known numbers and facts, not with viewpoints. It limits the computation time for each query.

Data Analytic Tools (Contd…) 8. Google Search Operators It is a powerful resource that helps you filter Google results instantly to get the most relevant and useful information. Uses of Google Search Operators Fast filtering of Google results. Google is a powerful data analysis tool that can help you discover new information or conduct market research.

Data Analytic Tools (Contd…) 9. Solver The Solver Add-in is a Microsoft Office Excel add-in program that is available when you install Microsoft Excel or Office. Ranked among the best-known Data Analytics tools, Solver is a linear programming and optimization tool in Excel. It is an advanced optimization tool that helps in quick problem solving.

Data Analytic Tools (Contd…) Uses of Solver It uses a variety of methods, from linear programming and nonlinear optimization to genetic algorithms, to find solutions. Limitations of Solver Poor scaling is one of the areas where Excel Solver lacks; it can affect solution time and quality.
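A toy version of the optimization problems Solver handles, sketched in plain Python with a brute-force search standing in for Solver's simplex and GRG methods (the profit function and the constraint are hypothetical):

```python
# Maximize profit 3x + 2y subject to x + y <= 10, with x and y
# non-negative integers: a brute-force search over the feasible region
best = max(
    (3 * x + 2 * y, x, y)
    for x in range(11)
    for y in range(11)
    if x + y <= 10
)
profit, x, y = best
print(profit, x, y)  # 30 10 0
```

Solver reaches the same kind of answer on much larger problems without enumerating every candidate, which is exactly why its scaling behaviour matters.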

Data Analytic Tools (Contd…) 10. Dataiku DSS Ranked among the top 10 Data Analytics tools, Dataiku is a collaborative data science software platform that helps teams build, prototype, explore, and deliver their own data products more efficiently. Uses of Dataiku DSS It provides an interactive visual interface. This data analytics tool lets you draft data preparation and models in seconds.

Data Analytic Tools (Contd…) Limitation of Dataiku DSS Limited visualization capabilities UI hurdles: Reloading of code/datasets Inability to easily compile entire code into a single document/notebook

Analysis vs Reporting Analysis involves interpreting data, whereas reporting involves presenting factual, accurate data. Analysis answers why something is happening based on the data, whereas reporting tells what’s happening. Analysis delivers recommendations, but reporting is more about organizing and summarizing data.

Analysis vs Reporting (Contd…) • Analytics is the method of examining and analyzing summarized data to make business decisions, while reporting is an action that puts together all the needed information and data in an organized way. • Questioning the data, understanding it, investigating it, and presenting it to the end users are all part of analytics, while identifying business events, gathering the required information, and organizing, summarizing, and presenting existing data are all part of reporting. • The purpose of analytics is to draw conclusions based on data; the purpose of reporting is to organize the data into meaningful information. • Analytics is used by data analysts, scientists, and business people to make effective decisions; reporting is provided to the appropriate business leaders to perform effectively and efficiently within a firm.

Modern Data Analytic Tools 1. Apache Hadoop 2. KNIME 3. Open Refine 4. Orange 5. Splunk 6. Talend 7. Power BI 8. Tableau 9. RapidMiner 10. R-programming 11. Data wrapper

Modern Data Analytic Tools (Contd…) Apache Hadoop Apache Hadoop is a Java-based free software framework for Big Data analytics. It helps in the effective storage of huge amounts of data in a storage arrangement known as a cluster. Hadoop’s storage system, the Hadoop Distributed File System (HDFS), splits large volumes of data and distributes them across the many nodes in a cluster.

Modern Data Analytic Tools (Contd…) 2. KNIME KNIME Analytics Platform is open-source software for data science. It is one of the leading open solutions for data-driven innovation. This tool helps in discovering the potential hidden in huge volumes of data; it can also mine for fresh insights or predict new outcomes.

Modern Data Analytic Tools (Contd…) 3. Open Refine OpenRefine is one of the most efficient tools for working on messy and large volumes of data. Its capabilities include cleansing data and transforming it from one format to another. It helps to explore large datasets easily.

Modern Data Analytic Tools (Contd…) 4. Orange Orange is a famous data visualisation tool that helps with data analysis for beginners as well as experts. It provides a clean, open-source platform. 5. Splunk It is a platform used to search, analyse, and visualise machine-generated data gathered from applications, websites, etc. Splunk has evolved products in various fields such as IT, Security, and Analytics.

Modern Data Analytic Tools (Contd…) 6. Talend It is one of the most powerful data integration tools available in the market and is developed in the Eclipse graphical development environment. This tool lets you easily manage all the steps involved in the process and aims to deliver compliant, accessible and clean data for everyone.

Modern Data Analytic Tools (Contd…) 7. Power BI It is a Microsoft product used for business analytics. It provides interactive visualizations with self-service business intelligence capabilities, where end users can create dashboards and reports by themselves, without having to depend on anybody.

Modern Data Analytic Tools (Contd…) 8. Tableau It is a market-leading Business Intelligence tool used to analyse and visualise data in an easy format. Tableau allows you to work on live data sets and spend more time on data analysis.

Modern Data Analytic Tools (Contd…) 9. RapidMiner RapidMiner operates using visual programming and is capable of manipulating, analysing, and modeling data. RapidMiner makes data science teams more productive through an open-source platform for all their jobs, such as machine learning, data preparation, and model deployment.

Modern Data Analytic Tools (Contd…) 10. R-programming R is a free open-source software programming language and a software environment for statistical computing and graphics. It is used by data miners for developing statistical software and data analysis. It has become a highly popular tool for Big Data in recent years.

Modern Data Analytic Tools (Contd…) 11. Data wrapper It is an online data visualisation tool for making interactive charts. It uses data files in CSV, PDF, or Excel format. Data wrapper generates visualisations in the form of bar charts, line charts, maps, etc.

Applications of Data Analytics Data analytics finds applications across various industries and sectors, transforming the way organizations operate and make decisions. Here are some examples of how data analytics is applied in different domains: 1. Healthcare 2. Finance 3. E-commerce 4. Cyber security

Applications of Data Analytics (Contd…) 5. Supply Chain Management 6. Banking 7. Logistics 8. Retail 9. Manufacturing 10. Internet Searching 11. Risk Management

Applications of Data Analytics (Contd…) 1. Healthcare Data analytics is transforming the healthcare industry by enabling better patient care, disease prevention, and resource optimization. For example, hospitals can analyze patient data to identify high-risk individuals and provide personalized treatment plans. Data analytics can also help detect disease outbreaks, monitor the effectiveness of treatments, and improve healthcare operations.

Applications of Data Analytics (Contd…) 2. Finance In the financial sector, data analytics plays a crucial role in fraud detection, risk assessment, and investment strategies. Banks and financial institutions analyze large volumes of data to identify suspicious transactions and optimize investment portfolios. Data analytics also enables personalized financial advice and the development of creative financial products and services.

Applications of Data Analytics (Contd…) 3. E-commerce E-commerce platforms utilize data analytics to understand customer behavior, personalize shopping experiences, and optimize marketing campaigns. By analyzing customer preferences, purchase history, and browsing patterns, e-commerce companies can offer personalized product recommendations, target specific customer segments, and improve customer satisfaction and retention.

Applications of Data Analytics (Contd…) 4. Cyber security Data analytics plays a vital role in cyber security by detecting and preventing cyber threats and attacks. Security systems analyze network traffic, user behavior, and system logs to identify anomalies and potential security breaches.
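The anomaly-detection idea described here can be sketched with a simple statistical baseline; the traffic numbers below are hypothetical, and real security systems use far richer models:

```python
import statistics

# Hypothetical requests-per-minute from a server log; the last value is a spike
traffic = [50, 52, 48, 51, 49, 50, 300]

# Baseline from the normal-looking history
mean = statistics.mean(traffic[:-1])
stdev = statistics.stdev(traffic[:-1])

# Flag values more than 3 standard deviations from the baseline
anomalies = [v for v in traffic if abs(v - mean) > 3 * stdev]
print(anomalies)  # [300]
```

This is the core pattern: model normal behaviour from historical data, then flag observations that deviate too far from it.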

Applications of Data Analytics (Contd…) 5. Supply Chain Management Data analytics improves supply chain management by optimizing inventory levels, reducing costs, and enhancing overall operational efficiency. Organizations can identify bottlenecks, forecast demand, and improve logistics and distribution processes by analyzing supply chain data. Data analytics also enables better supplier management and enhances transparency throughout the supply chain.

Applications of Data Analytics (Contd…) 6. Banking Banks use data analytics to gain insights into customer behavior, manage risks, and personalize financial services. Banks can tailor their offerings, identify potential fraud, and assess credit worthiness by analyzing transaction data, customer demographics, and credit histories. Data analytics also helps banks to improve regulatory compliance.

Applications of Data Analytics (Contd…) 7. Logistics In the logistics industry, data analytics plays a crucial role in optimizing transportation routes, managing fleet operations, and improving overall supply chain efficiency. Logistics companies can minimize costs, reduce delivery times, and enhance customer satisfaction by analyzing data on routes, delivery times, and vehicle performance. Data analytics also enables better demand forecasting and inventory management.

Applications of Data Analytics (Contd…) 8. Retail Data analytics transforms the retail industry by providing insights into customer preferences, optimizing pricing strategies, and improving inventory management. Retailers analyze sales data, customer feedback, and market trends to identify popular products, personalize offers, and forecast demand. Data analytics also helps retailers enhance their marketing efforts, improve customer loyalty, and optimize store layouts.

Applications of Data Analytics (Contd…) 9. Manufacturing Data analytics is transforming the manufacturing sector by enabling predictive maintenance, optimizing production processes, and improving product quality. Manufacturers can predict equipment failures, minimize downtime, and ensure efficient operations by analyzing sensor data, machine performance, and historical maintenance records. Data analytics also enables real-time monitoring of production lines, leading to higher productivity and cost savings.

Applications of Data Analytics (Contd…) 10. Internet Searching Data analytics powers internet search engines, enabling users to find relevant information quickly and accurately. Search engines analyze vast amounts of data, including web pages, user queries, and click-through rates, to deliver the most relevant search results. Data analytics algorithms continuously learn and adapt to user behavior, providing accurate and personalized search results.

Applications of Data Analytics (Contd…) 11. Risk Management Data analytics plays a crucial role in risk management across various industries, including insurance, finance, and project management. Organizations can assess risks, develop strategies, and make informed decisions by analyzing historical data, market trends, and external factors. Data analytics helps organizations identify potential risks and quantify their impact.

Key Role of Data Analytics Project Certain key roles are required for a data science team to function fully and execute analytics projects successfully. The key roles are: Business User Project Sponsor Project Manager Business Intelligence Analyst Database Administrator Data Engineer Data Scientist

Key Role of Data Analytics Project (Contd…) Business User The business user is the one who understands the main domain area of the project and basically benefits from the results. This user advises and consults the team working on the project about the value of the results obtained and how the outputs will be used in operations. A business manager, line manager, or deep subject matter expert in the project domain fulfills this role.

Key Role of Data Analytics Project (Contd…) 2. Project Sponsor The Project Sponsor is the one responsible for initiating the project. The Project Sponsor provides the actual requirements for the project and presents the basic business issue. He generally provides the funds and measures the degree of value from the final output of the team working on the project. This person sets the prime concern and shapes the desired output.

Key Role of Data Analytics Project (Contd…) 3. Project Manager This person ensures that key milestones and the purpose of the project are met on time and at the expected quality. 4. Business Intelligence Analyst The Business Intelligence Analyst provides business domain expertise based on a detailed and deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting point of view. This person generally creates reports and knows about the data feeds and sources.

Key Role of Data Analytics Project (Contd…) 5. Database Administrator (DBA) The DBA facilitates and arranges the database environment to support the analytics needs of the team working on a project. His responsibilities may include providing access to key databases or tables and making sure that the appropriate security levels are in place for the data repositories.

Key Role of Data Analytics Project (Contd…) 6. Data Engineer The data engineer has deep technical skills to assist with SQL queries for data management and data extraction, and provides support for data intake into the analytic sandbox. The data engineer works jointly with the data scientist to help shape data in the correct ways for analysis.

Key Role of Data Analytics Project (Contd…) 7. Data Scientist The data scientist provides subject matter expertise for analytical techniques and data modelling, applying the correct analytical techniques to a given business issue, and ensures that the overall analytical objectives are met. Data scientists design and apply analytical methods to the data available to the project.

Data Analytics Lifecycle In today’s digital-first world, data is important. It undergoes various stages throughout its life: creation, testing, processing, consumption, and reuse. The Data Analytics Lifecycle maps out these stages for professionals working on data analytics projects. It primarily has 6 phases. Phase 1: Data Discovery Phase 2: Data Preparation Phase 3: Model Planning Phase 4: Model Building Phase 5: Communication and Publication of Results Phase 6: Operationalize/Measuring of Effectiveness

Data Analytics Lifecycle (Contd…)

Data Analytics Lifecycle (Contd…) Phase 1: Data Discovery The data science team learns about and investigates the problem. It creates context and gains understanding, and learns about the data sources that are needed and accessible to the project. The team produces an initial hypothesis, which can later be confirmed with evidence.

Data Analytics Lifecycle (Contd…) Phase 2: Data Preparation This phase covers pre-processing, analysing, and preparing data before analysis and modelling. It requires an analytic sandbox: the team extracts, loads, and transforms data to bring information into the sandbox. Data preparation tasks can be repeated and are not performed in a predetermined sequence. Some of the tools commonly used for this process include Hadoop, OpenRefine, etc.
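The load-and-transform work of this phase can be sketched with the Python standard library; the CSV feed below is hypothetical:

```python
import csv
import io

# Hypothetical raw CSV feed being loaded into the analytic sandbox
raw = "id,amount\n1,100\n2,\n3,250\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Transform step: drop records with missing amounts, cast strings to integers
clean = [{"id": int(r["id"]), "amount": int(r["amount"])}
         for r in rows if r["amount"]]
print(clean)  # [{'id': 1, 'amount': 100}, {'id': 3, 'amount': 250}]
```

At scale the same extract-load-transform pattern is what tools like Hadoop and OpenRefine carry out across much larger data sets.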

Data Analytics Lifecycle (Contd…) Phase 3: Model Planning The team studies the data to discover the connections between variables and then selects the most significant variables as well as the most effective models. In this phase, the data science team plans the data sets that will be used for training, testing, and production purposes; the models built in the next phase are based on the work completed here. Some of the tools commonly used for this stage are MATLAB and STATISTICA.

Data Analytics Lifecycle (Contd…) Phase 4: Model Building The team creates datasets for training, testing, and production use. The team also evaluates whether its current tools are sufficient to run the models or whether a more robust environment is required. Commercial tools: MATLAB, STATISTICA.
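The dataset splitting done in this phase can be sketched in plain Python; the data and the 80/20 ratio here are hypothetical:

```python
import random

# Hypothetical dataset of 100 records
data = list(range(100))

# Shuffle so the split is random but reproducible
random.seed(42)
random.shuffle(data)

# 80/20 split into training and test sets
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]
print(len(train), len(test))  # 80 20
```

Keeping the test set separate from training data is what lets the team judge a model's quality honestly in the later evaluation phase.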

Data Analytics Lifecycle (Contd…) Phase 5: Communicating Results After executing the model, team members evaluate its outcomes to establish criteria for the success or failure of the model. The team considers how best to present findings and outcomes to the various team members and other stakeholders, taking into account caveats and assumptions. The team should determine the most important findings, quantify their value to the business, and create a narrative to present and summarize the findings for all stakeholders.

Data Analytics Lifecycle (Contd…) Phase 6: Operationalize The team delivers the benefits of the project to a wider audience. It sets up a pilot project to deploy the work in a controlled manner before expanding it to the entire enterprise of users. This approach lets the team gain insight into the performance and constraints of the model in a small-scale production setting, and then make necessary adjustments before full deployment. The team produces the final reports, presentations, and code. Open-source or free tools such as WEKA, SQL, and MADlib can be used.

Need of Data Analytics Life Cycle The Data Analytics Lifecycle outlines how data is created, gathered, processed, used, and analyzed to meet corporate objectives. It provides a structured method of handling data so that it may be transformed into knowledge that can be applied to achieve organizational and project objectives. The process offers the guidance and techniques needed to extract information from the data and move forward to achieve corporate objectives.

Need of Data Analytics Life Cycle (Contd…) Data analysts use the circular nature of the lifecycle to move forward or backward through the analytics process. They can choose whether to continue with their current research or conduct a fresh analysis in light of recently acquired insights. Their progress is guided by the Data Analytics Lifecycle.