Big Data, Metrics and Data classification.pptx


Data Analytics Unit-2

Big Data, Metrics and Data classification

Big data refers to large and diverse collections of structured, semi-structured, and unstructured data that grow exponentially over time. Through thorough analysis, these large datasets often bring very interesting information and knowledge to the surface, which can be used to optimize business processes and fuel innovation.

Big data examples:
- Tracking consumer behavior and shopping habits to deliver hyper-personalized retail product recommendations tailored to individual customers
- Monitoring payment patterns and analyzing them against historical customer activity to detect fraud in real time
- Combining data and information from every stage of an order's shipment journey with hyperlocal traffic insights to help fleet operators optimize last-mile delivery
- Using AI-powered technologies like natural language processing to analyze unstructured medical data (such as research reports, clinical notes, and lab results) to gain new insights for improved treatment development and enhanced patient care
- Using image data from cameras and sensors, as well as GPS data, to detect potholes and improve road maintenance in cities
- Analyzing public datasets of satellite imagery and geospatial data to visualize, monitor, measure, and predict the social and environmental impacts of supply chain operations

Primary benefits:
- Streamlined processes: with insights, your KPIs can turn green
- Increased productivity: employees can accomplish much more work in significantly less time
- Higher customer satisfaction: segmentation enables better understanding of your customers
- Proactive operations: shift your organization from reactive to proactive with predictive models
- Enhanced innovation: develop and deliver new products and services faster
- Data-driven decisions: use hard data to guide decisions, complementing intuition

The Vs of big data:
- Volume: the enormous amount of data produced continuously from a variety of sources and devices.
- Velocity: the speed at which data is generated. Data arriving in real time or near real time must be processed, accessed, and analyzed at the same rate to have any meaningful impact.
- Variety: data is heterogeneous, meaning it can come from many different sources and can be structured, unstructured, or semi-structured. This spans traditional structured data (such as data in spreadsheets or relational databases); unstructured text, images, audio, and video files; and semi-structured formats, like sensor data, that can't be organized in a fixed data schema.

Additional Vs: veracity, variability, and value.
- Veracity: big data can be messy, noisy, and error-prone, which makes it difficult to control the quality and accuracy of the data. Large datasets can be unwieldy and confusing, while smaller datasets could present an incomplete picture. The higher the veracity of the data, the more trustworthy it is.
- Variability: data is constantly changing, and inconsistency arises over time, not only in context and interpretation but also in data collection methods, which shift based on the information companies want to capture and analyze.
- Value: the business value of the data, i.e., its ability to help drive decision-making.

Types of big data. Big data is broadly classified into three main categories:
- Structured data: highly organized data that fits neatly into traditional database formats, like relational databases.
- Semi-structured data: data with some organizational properties, but not conforming to a strict relational model, like JSON or XML.
- Unstructured data: data lacking a predefined format or structure, such as text documents, images, or videos.

Examples of data sources:
- Documents: emails, quotes, contracts, and text files
- Photos: captured using smartphones, cameras, or specialized equipment
- Videos: recorded with smartphones, video cameras, or advanced systems
- Sound clips: audio recordings captured through devices like microphones or smartphones
- Sensor or machine data: generated by devices, machinery, or other automated systems
- RFID tags: data from wristbands or chips embedded in products
- Social media messages: content created and shared on platforms
- Log files: generated by computers, websites, and other systems

Data reliability is the consistency and dependability of data: the same data, when collected or measured repeatedly under the same conditions, should produce similar results. Reliability is the backbone of data quality.

Different phases of analytics: descriptive analytics, predictive analytics, and prescriptive analytics.

Descriptive analytics is a branch of data analytics that focuses on summarizing and interpreting historical data to gain insights and understand patterns, trends, and relationships within the data. It involves using various statistical and visualization techniques to describe and present data meaningfully.


The objective of descriptive analytics is to provide a clear and concise understanding of what has happened in the past, answering questions such as "What happened?", "When did it happen?", and "How did it happen?". Descriptive analytics mines historical data to unveil actionable information for decision-making and anomaly detection.

Data collection: gather relevant data from various sources, such as databases, spreadsheets, surveys, or other structured or unstructured data repositories. Example: an e-commerce company that wants to analyze customer purchasing behavior would collect data such as customer IDs, purchase dates, products purchased, quantities, prices, and customer demographics.

Cleaning and preparation: identify and resolve issues such as missing values, inconsistencies, duplicates, and outliers, and transform the data into a consistent format. Data cleaning ensures the data is high quality, reliable, and ready for further analysis. Example: identify missing values in the price column or duplicate records, as in the sketch below.
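
A minimal pandas sketch of this cleaning step; the DataFrame and its column names (customer_id, purchase_date, price) are illustrative assumptions based on the e-commerce example above:

    import pandas as pd

    # Illustrative order data; values and column names are invented for the example.
    orders = pd.DataFrame({
        "customer_id": [101, 101, 102, 103],
        "purchase_date": ["2024-01-05", "2024-01-05", "2024-02-10", None],
        "price": [19.99, 19.99, None, 45.00],
    })

    orders = orders.drop_duplicates()                    # resolve duplicate records
    print(orders["price"].isna().sum())                  # count missing prices
    orders["price"] = orders["price"].fillna(orders["price"].median())  # impute gaps
    orders["purchase_date"] = pd.to_datetime(orders["purchase_date"])   # consistent format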

Exploration: examine the data to understand its characteristics better and identify initial patterns or trends, using techniques such as summary statistics, data visualization, and exploratory data analysis. Summary statistics (measures such as mean, median, mode, and standard deviation) provide an overview of the data's central tendencies and dispersion. Data visualization techniques such as charts, graphs, and histograms help visualize the distribution of and relationships within the data, making it easy to identify patterns or anomalies. A short sketch follows.
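
A sketch of the exploration step, reusing the orders DataFrame from the cleaning sketch above; matplotlib is an assumed dependency for the histogram:

    import matplotlib.pyplot as plt

    # Central tendency and dispersion of order prices.
    print(orders["price"].describe())   # count, mean, std, min, quartiles, max
    print(orders["price"].mode())       # most frequent value(s)

    # Distribution of prices; histograms make skew and outliers easy to spot.
    orders["price"].plot.hist(bins=20)
    plt.xlabel("Order price")
    plt.show()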

Segmentation: divide the dataset into meaningful subsets based on specific criteria. This enables more focused analysis and helps uncover insights specific to each segment. For example, segmenting customer data by age group can provide insights into each customer segment's preferences and buying behavior, as sketched below.
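
A minimal pandas sketch of age-group segmentation; the ages, order values, and bin edges are all invented for illustration:

    import pandas as pd

    customers = pd.DataFrame({
        "age": [22, 35, 41, 58, 67],
        "order_value": [30.0, 55.0, 48.0, 62.0, 40.0],
    })

    # Bin customers into age groups, then compare average spend per segment.
    customers["age_group"] = pd.cut(customers["age"],
                                    bins=[0, 25, 45, 65, 120],
                                    labels=["<=25", "26-45", "46-65", "65+"])
    print(customers.groupby("age_group", observed=True)["order_value"].mean())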

Summary and key performance indicators: calculate summary measures such as averages, totals, percentages, or ratios relevant to the subject being analyzed. Key performance indicators (KPIs) are specific metrics that help evaluate the performance of a business process, product, or service. For the e-commerce data, useful KPIs include average order value, conversion rate, and customer retention rate, computed in the sketch below.
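
A sketch of the three KPIs named above; every figure is an invented assumption:

    # All inputs below are invented for illustration.
    total_revenue = 125_000.00   # sum of order prices over the period
    num_orders = 2_500
    num_visitors = 80_000        # site visits over the same period
    returning_customers, total_customers = 1_800, 3_000

    average_order_value = total_revenue / num_orders    # revenue per order
    conversion_rate = num_orders / num_visitors         # orders per site visit
    retention_rate = returning_customers / total_customers  # share of repeat customers

    print(f"AOV: {average_order_value:.2f}")
    print(f"Conversion rate: {conversion_rate:.2%}")
    print(f"Retention rate: {retention_rate:.2%}")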

Historical trend analysis: examine how variables or metrics have changed over time, revealing patterns, seasonality, or long-term trends. For example, analyzing sales data over several years can reveal sales peaks during certain seasons or identify declining trends in specific product categories. Historical trend analysis helps identify patterns that enhance decision-making, forecast future performance, and highlight areas for improvement. A sketch follows.
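
A minimal pandas sketch of aggregating order-level sales into a monthly trend; the dates and prices are invented:

    import pandas as pd

    orders = pd.DataFrame({
        "purchase_date": pd.to_datetime(["2023-01-15", "2023-02-03", "2023-12-21",
                                         "2024-01-09", "2024-12-18"]),
        "price": [20.0, 35.0, 80.0, 25.0, 90.0],
    })

    # Aggregate to monthly totals; repeated December peaks suggest seasonality.
    monthly = orders.set_index("purchase_date")["price"].resample("ME").sum()
    print(monthly)   # "ME" = month-end frequency (use "M" on older pandas versions)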

Data reporting and visualization: reports summarize the analysis and findings, including summary statistics, visualizations, and narrative descriptions. Reporting and visualization help stakeholders interpret and act upon the insights derived from the data. In the e-commerce example, this could mean visualizations such as a line chart showing sales trends over time or a pie chart illustrating sales distribution across different product categories.

Continuous monitoring and iteration: ongoing assessment, evaluation, and adaptation of strategies based on changing data insights. For example, monitor sales data, update the analysis periodically, and track changes in purchasing behavior, market trends, or customer preferences.

Examples of descriptive analytics: sales performance analysis, customer segmentation, website analytics, and financial analysis.

Prescriptive analytics: a type of data analytics that attempts to answer the question "What do we need to do to achieve this?". It involves the use of technology to help businesses make better decisions through the analysis of raw data.

Prescriptive analytics uses machine learning to help businesses decide a course of action based on a computer program's predictions. It works alongside predictive analytics, which uses data to determine near-term outcomes. When used effectively, prescriptive analytics can help organizations make decisions based on facts and probability-weighted projections instead of conclusions based on instinct. It isn't foolproof, though; it is only as effective as its inputs.

Prescriptive analytics involves the use of data, statistical algorithms, and machine learning techniques to determine the best course of action for a given situation. It goes beyond predicting outcomes by recommending actions that can optimize results.

Advantages of prescriptive analysis:
- Optimized decision-making: the primary advantage of prescriptive analysis is its ability to guide decision-makers toward optimal actions. By evaluating multiple decision options and considering various constraints, organizations can make choices that maximize desired outcomes and align with strategic goals.
- Enhanced strategic planning: organizations can use prescriptive analysis to refine and improve their strategic plans. By considering multiple scenarios and assessing the potential impact of different decisions, businesses can develop more robust strategies that are adaptable to changing circumstances.
- Resource optimization: whether it's allocating budgets, managing inventory, or scheduling workforce resources, prescriptive analysis enables organizations to optimize their resource utilization. This leads to cost savings and ensures that resources are deployed where they are most needed.
- Risk mitigation: prescriptive analysis helps organizations identify and mitigate risks by evaluating the potential consequences of different decisions. By understanding the impact of uncertainties, businesses can proactively develop strategies to minimize risks and enhance resilience.
- Faster and informed decision-making: prescriptive analysis empowers decision-makers with timely and informed recommendations. This leads to faster decision-making processes, as executives can rely on data-driven insights rather than spending prolonged periods analyzing and debating potential courses of action.

Challenges in prescriptive analysis:
- Data quality and availability: effective prescriptive analysis relies heavily on high-quality, relevant, and up-to-date data. If the data used for analysis is inaccurate, incomplete, or outdated, it can lead to unreliable recommendations and suboptimal decision-making.
- Changing business conditions: prescriptive models use historical data and assumptions about future conditions. Rapid changes in the business environment, such as market fluctuations, regulatory changes, or unexpected events, can challenge the accuracy and relevance of these models.
- Uncertainty and assumptions: prescriptive models are built on assumptions about future events and conditions. Dealing with uncertainties and ensuring that models account for a range of possible scenarios can be challenging, especially in unpredictable environments.

Key components of prescriptive analytics:
- Data collection and preparation: prescriptive analysis begins with the collection of relevant and high-quality data. This data is then cleaned, organized in a standardized format, and prepared for analysis, ensuring accuracy in the insights derived.
- Data modeling: before prescribing actions, it's crucial to predict possible outcomes. Data modeling, often involving machine learning algorithms, establishes a foundation by forecasting various scenarios based on historical data.
- Optimization: algorithms are employed to evaluate multiple decision options and identify the one that maximizes or minimizes a defined objective. This step involves fine-tuning strategies for efficiency and effectiveness (see the sketch after this list).
- Simulation: to enhance decision-making, prescriptive analysis often includes simulation models. These allow organizations to test different scenarios and understand the potential impact of various decisions before implementing them in the real world.
- Actionable recommendations: prescriptive analysis culminates in actionable recommendations that empower decision-makers to confidently choose the most advantageous course of action.
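
A toy sketch of the optimization component using SciPy's linear-programming solver; the two-channel budget setup and every coefficient are illustrative assumptions, not from the slides:

    from scipy.optimize import linprog

    # Prescriptive question: how should a 100-unit budget be split across two
    # channels to maximize expected return? linprog minimizes, so the expected
    # returns are negated in the objective.
    c = [-0.08, -0.05]            # assumed return per unit spent on each channel
    A_ub = [[1, 1]]               # x1 + x2 <= 100 (total budget)
    b_ub = [100]
    bounds = [(0, 70), (0, 70)]   # assumed per-channel spending caps

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print(res.x, -res.fun)        # recommended allocation and its expected return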

How prescriptive analytics works: prescriptive analytics works with another type of data analytics, predictive analytics, which involves the use of statistics and modeling to determine future performance based on current and historical data. Using predictive analytics' estimate of what is likely to happen, prescriptive analytics recommends what future course to take.

Advantages: prescriptive analytics can help prevent fraud, limit risk, increase efficiency, meet business goals, and create more loyal customers. It helps organizations make decisions based on highly analyzed facts, and it can simulate various outcomes and show the probability of each, helping organizations better understand their level of risk and uncertainty.

Disadvantages: prescriptive analytics is only effective if organizations know what questions to ask and how to react to the answers. It is also only suitable for short-term solutions, which means businesses shouldn't use it to make long-term decisions.

Examples of prescriptive analytics:
- Evaluating whether a local fire department should require residents to evacuate a particular area when a wildfire is burning nearby
- Predicting whether an article on a particular topic will be popular with readers, based on data about searches and social shares for related topics
- Adjusting a worker training program in real time based on how the worker is responding to each lesson

Prescriptive analytics for hospitals and clinics: analyze which hospital patients have the highest risk of readmission so that healthcare providers can do more, for example via patient education.
Prescriptive analytics for airlines: automatically adjust ticket prices and availability based on numerous factors, including customer demand, weather, and fuel prices.

Prescriptive analytics in banking:
- Create models for customer relationship management
- Improve ways to cross-sell and upsell products and services
- Recognize weaknesses that may result in losses, such as gaps in anti-money laundering (AML) controls
- Develop key security and regulatory initiatives, like compliance reporting

Prescriptive analytics in marketing: marketers can use prescriptive analytics to stay ahead of consumer trends.

Text analytics is the process of transforming unstructured text documents into usable, structured data. It works by breaking sentences and phrases apart into their components and then evaluating each part's role and meaning using complex software rules and machine learning algorithms.

Text analytics is the foundation of numerous natural language processing (NLP) features, including named entity recognition, categorization, and sentiment analysis. In broad terms, these NLP features aim to answer four questions: Who is talking? What are they talking about? What are they saying about those subjects? How do they feel?

- Text mining describes the general act of gathering useful information from text documents.
- Text analytics refers to the actual computational processes of breaking down unstructured text documents, such as tweets, articles, reviews, and comments, so they can be analyzed further.
- Natural language processing (NLP) is how a computer understands the underlying meaning of those text documents: who's talking, what they're talking about, and how they feel about those subjects.

How does text analytics work? Text analytics starts by breaking down each sentence and phrase into its basic parts, including parts of speech, tokens, and chunks.

There are seven computational steps: language identification, tokenization, sentence breaking, part-of-speech tagging, chunking, syntax parsing, and sentence chaining.

Language identification determines what language the text is written in. Spanish? Russian? Arabic? Chinese? Each language has its own unique rules of grammar, so the identified language determines the whole process for every other text analytics function. A sketch follows.
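
A minimal sketch using the third-party langdetect package (an assumed library choice; install with pip install langdetect):

    # detect() returns an ISO 639-1 language code for the input text.
    from langdetect import detect

    print(detect("The weather is lovely today."))   # -> 'en'
    print(detect("El clima es agradable hoy."))     # -> 'es'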

Tokenization is the process of breaking a sentence or phrase apart into its component pieces. Tokens are usually words or numbers, but they can also be:
- Punctuation (exclamation points amplify sentiment)
- Hyperlinks (https://…)
- Possessive markers (apostrophes)

Tokenization is language-specific, so it’s important to know which language you’re analyzing. Most alphabetic languages use whitespace and punctuation to denote tokens within a phrase or sentence. Logographic (character-based) languages such as Chinese, however, use other systems.
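
A minimal English tokenization sketch with NLTK (an assumed toolkit choice; the slides do not name one):

    # Assumes: pip install nltk, plus nltk.download("punkt")
    # (or "punkt_tab" on recent NLTK versions).
    from nltk.tokenize import word_tokenize

    print(word_tokenize("The product wasn't cheap, but it's great!"))
    # Punctuation becomes separate tokens, and contractions split apart
    # (e.g. "wasn't" -> "was", "n't"), which matters for sentiment analysis.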

Sentence breaking: small text documents, such as tweets, usually contain a single sentence, but longer documents require sentence breaking to separate each unique statement. In some documents, each sentence is separated by a punctuation mark; however, some sentences contain punctuation marks that don't mark the end of the statement (like the period in "Dr."), as the sketch below illustrates.
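
A short sketch using NLTK's pretrained sentence tokenizer (same assumed toolkit and downloads as above):

    # The pretrained punkt model knows common abbreviations, so the period
    # in "Dr." does not end the first sentence.
    from nltk.tokenize import sent_tokenize

    text = "Dr. Smith arrived late. The patient had already left."
    print(sent_tokenize(text))
    # -> ['Dr. Smith arrived late.', 'The patient had already left.']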

Part-of-speech tagging (PoS tagging) is the process of determining the part of speech of every token in a document. Given a text document, the tagger figures out whether a token represents a proper noun or a common noun, or whether it's a verb, an adjective, or something else entirely. Accurate part-of-speech tagging is critical for reliable sentiment analysis: by identifying adjective-noun combinations, a sentiment analysis system gains its first clue that it's looking at a sentiment-bearing phrase.
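
A minimal PoS tagging sketch with NLTK (assumed toolkit; the example sentence is invented):

    # Assumes nltk.download("averaged_perceptron_tagger")
    # (or "averaged_perceptron_tagger_eng" on recent NLTK versions).
    from nltk import pos_tag, word_tokenize

    print(pos_tag(word_tokenize("The quick delivery impressed loyal customers")))
    # Tags follow the Penn Treebank scheme: DT determiner, JJ adjective,
    # NN/NNS noun, VB* verb. Adjective-noun pairs such as ("quick", JJ)
    # followed by ("delivery", NN) hint at sentiment-bearing phrases.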

Chunking refers to a range of sentence-breaking systems that splinter a sentence into its component phrases (noun phrases, verb phrases, and so on). Chunking differs from part-of-speech tagging: PoS tagging assigns parts of speech to tokens, while chunking assigns PoS-tagged tokens to phrases.

For example, take the sentence: "The tall man is going to quickly walk under the ladder." PoS tagging will identify man and ladder as nouns and walk as a verb. Chunking will return:
- [the tall man]: noun phrase (NP)
- [is going to quickly walk]: verb phrase (VP)
- [under the ladder]: prepositional phrase (PP)
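
A sketch of chunking this sentence with an NLTK regular-expression chunker; the grammar below is an illustrative assumption, not a general-purpose one:

    from nltk import RegexpParser, pos_tag, word_tokenize

    grammar = r"""
      NP: {<DT>?<JJ>*<NN.*>+}   # noun phrase: optional determiner, adjectives, nouns
      PP: {<IN><NP>}            # prepositional phrase: preposition + noun phrase
    """
    tagged = pos_tag(word_tokenize("The tall man is going to quickly walk under the ladder."))
    RegexpParser(grammar).parse(tagged).pprint()
    # The output tree groups tokens into chunks such as (NP The/DT tall/JJ man/NN).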

Syntax parsing is the analysis of how a sentence is formed; it is a critical preparatory step in sentiment analysis and other natural language processing features. The same words can carry multiple meanings depending on how the sentence is structured:
- Apple was doing poorly until Steve Jobs …
- Because Apple was doing poorly, Steve Jobs …
- Apple was doing poorly because Steve Jobs …
In the first sentence, Apple is negative, whereas Steve Jobs is positive. In the second, Apple is still negative, but Steve Jobs is now neutral. In the final example, both Apple and Steve Jobs are negative.
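
A minimal dependency-parsing sketch with spaCy (an assumed toolkit; requires pip install spacy and python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    for token in nlp("Apple was doing poorly until Steve Jobs returned."):
        print(token.text, token.dep_, token.head.text)
    # Each token's dependency label and head show how the sentence is formed,
    # e.g. which clause "poorly" modifies: the structure sentiment relies on.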

Sentence chaining uses a technique called lexical chaining to connect individual sentences based on their association with a larger topic. Take the sentences:
- "I prefer a hatchback for city driving."
- "My neighbor just bought a new SUV."
- "Audi recently launched a new sedan."
- "SUVs are popular for their spaciousness."
- "Hatchbacks are known for their fuel efficiency."

Even if these sentences appear scattered throughout a document, sentence chaining can reveal connections:
- "Hatchback" and "SUV" are both types of cars, creating a link between sentences 1 and 2.
- "SUV" and "sedan" are also types of cars, linking sentences 2 and 3.
- The qualities "spaciousness" and "fuel efficiency" relate to the overall topic of vehicle types and can help link sentences 4 and 5, and potentially connect them to the previous sentences about specific car models.
A toy sketch of this idea follows.
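
A toy illustration of the chaining idea, not a production lexical-chaining algorithm: sentences are linked when they mention terms that map to the same topic. The hand-built term-to-topic table is an assumption standing in for a lexical resource such as WordNet:

    topic_of = {"hatchback": "vehicle", "hatchbacks": "vehicle", "suv": "vehicle",
                "suvs": "vehicle", "sedan": "vehicle", "spaciousness": "vehicle",
                "fuel": "vehicle"}

    sentences = [
        "I prefer a hatchback for city driving.",
        "My neighbor just bought a new SUV.",
        "Audi recently launched a new sedan.",
        "SUVs are popular for their spaciousness.",
        "Hatchbacks are known for their fuel efficiency.",
    ]

    # Group sentence numbers by the topic of the terms they contain.
    chains = {}
    for i, sentence in enumerate(sentences, start=1):
        for word in sentence.lower().strip(".").split():
            topic = topic_of.get(word)
            if topic:
                chains.setdefault(topic, set()).add(i)
    print(chains)   # -> {'vehicle': {1, 2, 3, 4, 5}}: all five sentences form one chain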

Basic applications of text mining: Voice of Customer, Social Media Monitoring, and Voice of Employee.