Big data refers to large and diverse collections of structured, unstructured, and semi-structured data that grow exponentially over time. Through thorough analysis, big data often brings very interesting information and knowledge to the surface, which organizations can use to optimize business processes and fuel innovation.
Big data examples
- Tracking consumer behavior and shopping habits to deliver hyper-personalized retail product recommendations tailored to individual customers
- Monitoring payment patterns and analyzing them against historical customer activity to detect fraud in real time
- Combining data and information from every stage of an order's shipment journey with hyperlocal traffic insights to help fleet operators optimize last-mile delivery
- Using AI-powered technologies like natural language processing to analyze unstructured medical data (such as research reports, clinical notes, and lab results) to gain new insights for improved treatment development and enhanced patient care
- Using image data from cameras and sensors, as well as GPS data, to detect potholes and improve road maintenance in cities
- Analyzing public satellite imagery and geospatial datasets to visualize, monitor, measure, and predict the social and environmental impacts of supply chain operations
Primary benefits
- Streamlined processes: insights can help turn your KPIs green
- Increased productivity: employees can accomplish much more work in significantly less time
- Higher customer satisfaction: segmentation enables better understanding of your customers
- Proactive operations: predictive models shift your organization from reactive to proactive
- Enhanced innovation: develop and deliver new products and services faster
- Data-driven decisions: use hard data to guide decisions, complementing intuition
The Vs of big data
Volume: the enormous amount of data produced from a variety of sources and devices on a continuous basis.
Velocity: the speed at which data is generated. Data that arrives in real time or near real time must be processed, accessed, and analyzed at the same rate to have any meaningful impact.
Variety: data is heterogeneous, meaning it can come from many different sources and can be structured, unstructured, or semi-structured:
- Traditional structured data, such as data in spreadsheets or relational databases
- Unstructured data, such as text, images, audio, and video files
- Semi-structured formats, such as sensor data that can't be organized in a fixed data schema
Additional Vs: veracity, variability, and value.
Veracity: big data can be messy, noisy, and error-prone, which makes it difficult to control the quality and accuracy of the data. Large datasets can be unwieldy and confusing, while smaller datasets could present an incomplete picture. The higher the veracity of the data, the more trustworthy it is.
Variability: data is constantly changing and can be inconsistent over time, both in its context and interpretation and in the collection methods companies use, depending on the information they want to capture and analyze.
Value: the business value of the data, that is, its ability to help drive decision-making.
Types of Big Data
Big data is broadly classified into three main categories:
- Structured data: highly organized data that fits neatly into traditional database formats, like relational databases.
- Semi-structured data: data with some organizational properties, but not conforming to a strict relational model, like JSON or XML.
- Unstructured data: data lacking a predefined format or structure, such as text documents, images, or videos.
Examples of big data sources:
- Documents: emails, quotes, contracts, and text files
- Photos: captured using smartphones, cameras, or specialized equipment
- Videos: recorded with smartphones, video cameras, or advanced systems
- Sound clips: audio recordings captured through devices like microphones or smartphones
- Sensor or machine data: generated by devices, machinery, or other automated systems
- RFID tags: data from wristbands or chips embedded in products
- Social media messages: content created and shared on platforms
- Log files: generated by computers, websites, and other systems
Data Reliability
The consistency and dependability of data: the same data, when collected or measured repeatedly under the same conditions, should produce similar results. Reliability is the backbone of data quality.
Different Phases of Analytics
Descriptive Analytics, Predictive Analytics, and Prescriptive Analytics
Descriptive analytics is a branch of data analytics that focuses on summarizing and interpreting historical data to gain insights and understand patterns, trends, and relationships within the data. It involves using various statistical and visualization techniques to describe and present data meaningfully.
The objective of descriptive analytics is to provide a clear and concise understanding of what has happened in the past, answering questions such as "What happened?", "When did it happen?", and "How did it happen?". Descriptive analytics mines historical data to unveil actionable information for decision-making and anomaly detection purposes.
Data collection
The first step is to gather relevant data from various sources: databases, spreadsheets, surveys, or other structured or unstructured data repositories. For example, an e-commerce company that wants to analyze customer purchasing behavior might collect customer IDs, purchase dates, products purchased, quantities, prices, and customer demographics.
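As a minimal sketch of this step, the snippet below assembles a small purchase dataset with pandas; all column names and values are hypothetical, chosen to mirror the e-commerce example above.

```python
import pandas as pd

# Hypothetical e-commerce purchase records, standing in for data
# pulled from a database, spreadsheet, or survey export.
orders = pd.DataFrame({
    "customer_id": [101, 102, 101, 103, 104],
    "purchase_date": ["2024-01-05", "2024-01-07", "2024-02-11",
                      "2024-02-14", "2024-03-02"],
    "product": ["laptop", "phone", "mouse", "phone", "tablet"],
    "quantity": [1, 2, 3, 1, 1],
    "price": [1200.0, 650.0, 25.0, 650.0, 400.0],
})
orders["purchase_date"] = pd.to_datetime(orders["purchase_date"])
print(orders.head())
```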
Cleaning and preparation
This step involves identifying and resolving issues such as missing values, inconsistencies, duplicates, and outliers, and transforming the data into a consistent format. Data cleaning ensures the data is high quality, reliable, and ready for further analysis. In the e-commerce example, you might identify missing values in the price column or duplicate records.
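A minimal cleaning sketch in pandas, assuming a hypothetical orders table like the one above; dropping duplicates and imputing missing prices are just two of many possible cleaning rules.

```python
import pandas as pd

# Hypothetical raw data with two common quality issues:
# a duplicate record and a missing price.
orders = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "product": ["laptop", "phone", "phone", "tablet"],
    "price": [1200.0, 650.0, 650.0, None],
})

orders = orders.drop_duplicates()  # remove exact duplicate records
orders["price"] = orders["price"].fillna(orders["price"].median())  # impute missing prices
print(orders)
```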
Exploration
The data is explored to understand its characteristics better and to identify initial patterns or trends, using techniques such as summary statistics, data visualization, and exploratory data analysis. Summary statistics, measures such as the mean, median, mode, and standard deviation, provide an overview of the data's central tendencies and dispersion. Data visualization techniques such as charts, graphs, and histograms help visualize the distribution of and relationships within the data, making it easy to identify patterns or anomalies.
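A small exploration sketch using pandas (the values are hypothetical):

```python
import pandas as pd

orders = pd.DataFrame({
    "quantity": [1, 2, 3, 1, 1],
    "price": [1200.0, 650.0, 25.0, 650.0, 400.0],
})

# Central tendency and dispersion in one call:
# count, mean, std, min, quartiles, max for each numeric column.
print(orders.describe())
print(orders["price"].median())  # median price
print(orders["price"].mode())    # most frequent price

# A quick histogram of the price distribution (requires matplotlib).
orders["price"].hist(bins=5)
```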
Segmentation
Segmentation divides the dataset into meaningful subsets based on specific criteria, enabling more focused analysis and helping uncover insights specific to each segment. For example, segmenting customer data by age group can provide insights into each segment's preferences and buying behavior.
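A segmentation sketch with pandas; the age bands and spend figures are hypothetical:

```python
import pandas as pd

customers = pd.DataFrame({
    "age": [19, 27, 34, 45, 62],
    "total_spend": [150.0, 900.0, 1300.0, 650.0, 400.0],
})

# cut() assigns each customer to one of the illustrative age bands.
customers["age_group"] = pd.cut(
    customers["age"],
    bins=[0, 25, 40, 60, 120],
    labels=["<25", "25-39", "40-59", "60+"],
)

# Average spend per segment.
print(customers.groupby("age_group", observed=True)["total_spend"].mean())
```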
Summary and key performance indicators
This step involves calculating summary measures, such as averages, totals, percentages, or ratios, relevant to the subject being analyzed. Key performance indicators (KPIs) are specific metrics that help evaluate the performance of a business process, product, or service. For the e-commerce data, you might calculate KPIs such as average order value, conversion rate, or customer retention rate.
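The three KPIs named above reduce to simple ratios. A sketch with hypothetical figures (the retention formula shown is a deliberately simplified version):

```python
# Hypothetical figures for one reporting period.
revenue = 125_000.0   # total revenue
num_orders = 500      # orders placed
visitors = 20_000     # site visitors
returning = 180       # customers who purchased again
customers = 450       # customers at the start of the period

average_order_value = revenue / num_orders   # 250.00
conversion_rate = num_orders / visitors      # 2.5%
retention_rate = returning / customers       # 40% (simplified definition)

print(f"AOV: {average_order_value:.2f}")
print(f"Conversion: {conversion_rate:.1%}")
print(f"Retention: {retention_rate:.1%}")
```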
Historical trend analysis
Trend analysis examines how variables or metrics have changed over time, revealing patterns, seasonality, or long-term trends. For example, analyzing sales data over several years can reveal sales peaks during certain seasons or declining trends in specific product categories. Historical trend analysis helps identify patterns that can enhance decision-making, forecast future performance, and identify areas for improvement.
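A trend-analysis sketch with pandas, using synthetic daily sales; resampling to monthly totals is one simple way to expose seasonality:

```python
import numpy as np
import pandas as pd

# Synthetic daily sales with a yearly seasonal swing.
dates = pd.date_range("2022-01-01", periods=730, freq="D")
sales = pd.Series(
    100 + 30 * np.sin(2 * np.pi * dates.dayofyear / 365),
    index=dates, name="sales",
)

monthly = sales.resample("ME").sum()  # month-end totals ("M" on older pandas)
print(monthly.head(12))

# Year-over-year change per month highlights long-term trends.
print(monthly.pct_change(12).dropna().head())
```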
Data reporting and visualization
Reports summarize the analysis and findings, including summary statistics, visualizations, and narrative descriptions. Reporting and visualization help stakeholders interpret and act upon the insights derived from the data. In the e-commerce example, visualizations might include line charts showing sales trends over time and a pie chart illustrating the distribution of sales across product categories.
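A reporting sketch with matplotlib, producing the pie chart described above from hypothetical category totals:

```python
import matplotlib.pyplot as plt

# Hypothetical sales totals per product category.
categories = ["electronics", "clothing", "home", "books"]
sales = [45_000, 30_000, 15_000, 10_000]

fig, ax = plt.subplots()
ax.pie(sales, labels=categories, autopct="%1.0f%%")
ax.set_title("Sales by product category")
fig.savefig("sales_by_category.png")  # image to embed in the report
```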
Continuous monitoring and iteration
Descriptive analytics requires ongoing assessment, evaluation, and adaptation of strategies based on changing data insights. The e-commerce company would monitor sales data, update the analysis periodically, and track changes in purchasing behavior, market trends, or customer preferences.
Prescriptive Analytics
Prescriptive analytics is a type of data analytics that attempts to answer the question "What do we need to do to achieve this?" It uses technology, particularly machine learning, to help businesses make better decisions by analyzing raw data and recommending a course of action based on a program's predictions. Prescriptive analytics works alongside predictive analytics, which uses data to determine near-term outcomes. When used effectively, it can help organizations make decisions based on facts and probability-weighted projections instead of conclusions based on instinct. Prescriptive analytics isn't foolproof, however; it's only as effective as its inputs.
Prescriptive analytics involves the use of data, statistical algorithms, and machine learning techniques to determine the best course of action for a given situation. It goes beyond predicting outcomes by recommending actions that can optimize results.
Advantages of prescriptive analysis
- Optimized decision-making: the primary advantage of prescriptive analysis is its ability to guide decision-makers toward optimal actions. By evaluating multiple decision options and considering various constraints, organizations can make choices that maximize desired outcomes and align with strategic goals.
- Enhanced strategic planning: organizations can use prescriptive analysis to refine and improve their strategic plans. By considering multiple scenarios and assessing the potential impact of different decisions, businesses can develop more robust strategies that are adaptable to changing circumstances.
- Resource optimization: whether it's allocating budgets, managing inventory, or scheduling workforce resources, prescriptive analysis enables organizations to optimize their resource utilization. This leads to cost savings and ensures that resources are deployed where they are most needed.
- Risk mitigation: prescriptive analysis helps organizations identify and mitigate risks by evaluating the potential consequences of different decisions. By understanding the impact of uncertainties, businesses can proactively develop strategies to minimize risks and enhance resilience.
- Faster and informed decision-making: prescriptive analysis empowers decision-makers with timely and informed recommendations. This leads to faster decision-making, as executives can rely on data-driven insights rather than spending prolonged periods analyzing and debating potential courses of action.
Challenges in prescriptive analysis
- Data quality and availability: effective prescriptive analysis relies heavily on high-quality, relevant, and up-to-date data. If the data used for analysis is inaccurate, incomplete, or outdated, it can lead to unreliable recommendations and suboptimal decision-making.
- Changing business conditions: prescriptive models use historical data and assumptions about future conditions. Rapid changes in the business environment, such as market fluctuations, regulatory changes, or unexpected events, can challenge the accuracy and relevance of these models.
- Uncertainty and assumptions: prescriptive models are built on assumptions about future events and conditions. Dealing with uncertainties and ensuring that models account for a range of possible scenarios can be challenging, especially in unpredictable environments.
Key Components of Prescriptive Analytics
- Data collection and preparation: prescriptive analysis begins with the collection of relevant, high-quality data. This data is then cleaned, organized in a standardized format, and prepared for analysis, ensuring accuracy in the insights derived.
- Data modeling: before prescribing actions, it's crucial to predict possible outcomes. Data modeling, often involving machine learning algorithms, establishes a foundation by forecasting various scenarios based on historical data.
- Optimization: algorithms are employed to evaluate multiple decision options and identify the one that maximizes or minimizes a defined objective. This step involves fine-tuning strategies for efficiency and effectiveness (see the sketch after this list).
- Simulation: to enhance decision-making, prescriptive analysis often includes simulation models. These models allow organizations to test different scenarios and understand the potential impact of various decisions before implementing them in the real world.
- Actionable recommendations: prescriptive analysis culminates in providing actionable recommendations, which empower decision-makers to confidently choose the most advantageous course of action.
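To make the optimization component concrete, here is a minimal linear-programming sketch with scipy.optimize.linprog; the products, profits, and capacity figures are all hypothetical.

```python
from scipy.optimize import linprog

# Hypothetical decision: how many units of products A and B to make
# to maximize profit (40 and 30 per unit) within 100 machine hours,
# where A takes 2 hours per unit and B takes 1.
c = [-40, -30]        # linprog minimizes, so negate the profits
A_ub = [[2, 1]]       # machine hours used per unit of A and B
b_ub = [100]          # machine hours available

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Recommended quantities:", result.x)   # [0, 100]
print("Maximum profit:", -result.fun)        # 3000
```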
How Prescriptive Analytics Works
Prescriptive analytics works with another type of data analytics: predictive analytics, which involves the use of statistics and modeling to determine future performance based on current and historical data. Building on predictive analytics' estimate of what is likely to happen, prescriptive analytics recommends what course to take.
Advantages
Prescriptive analytics can help prevent fraud, limit risk, increase efficiency, meet business goals, and create more loyal customers. It helps organizations make decisions based on thoroughly analyzed facts. It can also simulate various outcomes and show the probability of each, helping organizations better understand their level of risk and uncertainty.
Disadvantages
Prescriptive analytics is only effective if organizations know what questions to ask and how to react to the answers. It is also only suitable for short-term solutions, so businesses shouldn't use prescriptive analytics to make long-term decisions.
Examples of Prescriptive Analytics
- Evaluating whether a local fire department should require residents to evacuate a particular area when a wildfire is burning nearby
- Predicting whether an article on a particular topic will be popular with readers, based on data about searches and social shares for related topics
- Adjusting a worker training program in real time based on how the worker is responding to each lesson
Prescriptive Analytics for Hospitals and Clinics
Analyze which hospital patients have the highest risk of readmission so that healthcare providers can do more, for example through patient education.
Prescriptive Analytics for Airlines
Automatically adjust ticket prices and availability based on numerous factors, including customer demand, weather, and fuel prices.
Prescriptive Analytics in Banking
- Create models for customer relationship management
- Improve ways to cross-sell and upsell products and services
- Recognize weaknesses that may result in losses, such as gaps in anti-money laundering (AML) controls
- Develop key security and regulatory initiatives, such as compliance reporting
Prescriptive Analytics in Marketing Marketers can use prescriptive analytics to stay ahead of consumer trends.
Text Analytics
Text analytics is the process of transforming unstructured text documents into usable, structured data. It works by breaking sentences and phrases apart into their components and then evaluating each part's role and meaning using complex software rules and machine learning algorithms.
Text analytics is the foundation of numerous natural language processing (NLP) features, including named entity recognition, categorization, and sentiment analysis. In broad terms, these NLP features aim to answer four questions: Who is talking? What are they talking about? What are they saying about those subjects? How do they feel?
Text mining describes the general act of gathering useful information from text documents. Text analytics refers to the actual computational processes of breaking down unstructured text documents, such as tweets, articles, reviews and comments, so they can be analyzed further. Natural language processing (NLP) is how a computer understands the underlying meaning of those text documents: who’s talking, what they’re talking about, and how they feel about those subjects.
How does text analytics work?
Text analytics starts by breaking down each sentence and phrase into its basic parts. Each of these components, including parts of speech, tokens, and chunks, is a building block for the analysis steps that follow.
Text analytics involves seven computational steps:
1. Language identification
2. Tokenization
3. Sentence breaking
4. Part of speech tagging
5. Chunking
6. Syntax parsing
7. Sentence chaining
Language identification
The first step is identifying what language the text is written in. Spanish? Russian? Arabic? Chinese? Each language has its own unique rules of grammar, so language identification determines how the whole process proceeds for every other text analytics function.
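A minimal sketch using the third-party langdetect package (an assumption; any language-identification library would serve the same role):

```python
# pip install langdetect  (third-party package)
from langdetect import detect

print(detect("Big data often brings valuable insights."))  # 'en'
print(detect("Los datos masivos aportan mucho valor."))    # 'es'
```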
Tokenization
Tokenization is the process of breaking a sentence or phrase apart into its component pieces. Tokens are usually words or numbers, but they can also be:
- Punctuation (exclamation points amplify sentiment)
- Hyperlinks (https://…)
- Possessive markers (apostrophes)
Tokenization is language-specific, so it’s important to know which language you’re analyzing. Most alphabetic languages use whitespace and punctuation to denote tokens within a phrase or sentence. Logographic (character-based) languages such as Chinese, however, use other systems.
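A tokenization sketch using NLTK (assumes the package is installed; the tokenizer model is downloaded on first use):

```python
import nltk
nltk.download("punkt", quiet=True)  # "punkt_tab" on newer NLTK releases
from nltk.tokenize import word_tokenize

tokens = word_tokenize("Dr. Smith's results were great!")
print(tokens)
# ['Dr.', 'Smith', "'s", 'results', 'were', 'great', '!']
# Punctuation and the possessive marker become tokens of their own.
```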
Sentence breaking
Small text documents, such as tweets, usually contain a single sentence, but longer documents require sentence breaking to separate each unique statement. In some documents, each sentence is separated by a punctuation mark; but some sentences contain punctuation marks that don't mark the end of the statement (like the period in "Dr.").
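A sentence-breaking sketch with NLTK's trained sentence tokenizer, which knows that the period in "Dr." doesn't end a sentence:

```python
import nltk
nltk.download("punkt", quiet=True)  # "punkt_tab" on newer NLTK releases
from nltk.tokenize import sent_tokenize

text = "Dr. Smith reviewed the results. They looked promising!"
print(sent_tokenize(text))
# ['Dr. Smith reviewed the results.', 'They looked promising!']
```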
Part of speech tagging
Part of speech tagging (PoS tagging) is the process of determining the part of speech of every token in a document. When shown a text document, the tagger figures out whether a given token represents a proper noun or a common noun, or whether it's a verb, an adjective, or something else entirely. Accurate part of speech tagging is critical for reliable sentiment analysis: by identifying adjective-noun combinations, a sentiment analysis system gains its first clue that it's looking at a sentiment-bearing phrase.
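A PoS tagging sketch with NLTK's default tagger, applied to the example sentence used in the next step:

```python
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)  # "_eng" suffix on newer NLTK
from nltk import pos_tag, word_tokenize

tokens = word_tokenize("The tall man is going to quickly walk under the ladder.")
print(pos_tag(tokens))
# [('The', 'DT'), ('tall', 'JJ'), ('man', 'NN'), ('is', 'VBZ'), ...]
```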
Chunking
Chunking refers to a range of sentence-breaking systems that splinter a sentence into its component phrases (noun phrases, verb phrases, and so on). Chunking in text analytics is different from part of speech tagging:
- PoS tagging means assigning parts of speech to tokens
- Chunking means assigning PoS-tagged tokens to phrases
For example, take the sentence "The tall man is going to quickly walk under the ladder."
PoS tagging will identify "man" and "ladder" as nouns and "walk" as a verb.
Chunking will return:
- [the tall man] - noun phrase (NP)
- [is going to quickly walk] - verb phrase (VP)
- [under the ladder] - prepositional phrase (PP)
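A minimal noun-phrase chunking sketch with NLTK's RegexpParser; the grammar is a simple illustrative pattern (optional determiner, any adjectives, one or more nouns), not a full phrase-structure grammar:

```python
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
from nltk import RegexpParser, pos_tag, word_tokenize

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"  # illustrative noun-phrase pattern
chunker = RegexpParser(grammar)

tagged = pos_tag(word_tokenize("The tall man walked under the ladder."))
print(chunker.parse(tagged))
# (S (NP The/DT tall/JJ man/NN) walked/VBD under/IN (NP the/DT ladder/NN) ./.)
```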
Syntax parsing
Syntax parsing is the analysis of how a sentence is formed, and it is a critical preparatory step in sentiment analysis and other natural language processing features. The same sentence can have multiple meanings depending on how it's structured:
- "Apple was doing poorly until Steve Jobs …"
- "Because Apple was doing poorly, Steve Jobs …"
- "Apple was doing poorly because Steve Jobs …"
In the first sentence, Apple is negative, whereas Steve Jobs is positive. In the second, Apple is still negative, but Steve Jobs is now neutral. In the final example, both Apple and Steve Jobs are negative.
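A small syntax (dependency) parsing sketch using spaCy; it assumes the en_core_web_sm model has been downloaded:

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple was doing poorly because Steve Jobs left.")

# Each token's dependency label and head show how the sentence is formed.
for token in doc:
    print(f"{token.text:10} {token.dep_:10} -> {token.head.text}")
```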
Sentence chaining
Sentence chaining uses a technique called lexical chaining to connect individual sentences based on their association with a larger topic. Take the sentences:
- "I prefer a hatchback for city driving."
- "My neighbor just bought a new SUV."
- "Audi recently launched a new sedan."
- "SUVs are popular for their spaciousness."
- "Hatchbacks are known for their fuel efficiency."
Even if these sentences appear scattered throughout a document, sentence chaining can reveal connections:
- "Hatchback" and "SUV" are both types of cars, creating a link between sentences 1 and 2.
- "SUV" and "sedan" are also types of cars, linking sentences 2 and 3.
- The qualities "spaciousness" and "fuel efficiency" relate to the overall topic of "vehicle types" and can help link sentences 4 and 5, and potentially connect them to the previous sentences about specific car models.
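A toy approximation of sentence chaining, linking sentences that mention terms from a hand-built topic lexicon; a real lexical-chaining implementation would use a lexical database such as WordNet, so everything below is illustrative:

```python
# Hand-built lexicon for the hypothetical "vehicle types" topic.
TOPIC_LEXICON = {"hatchback", "suv", "sedan", "spaciousness", "fuel"}

sentences = [
    "I prefer a hatchback for city driving.",
    "My neighbor just bought a new SUV.",
    "Audi recently launched a new sedan.",
    "SUVs are popular for their spaciousness.",
    "Hatchbacks are known for their fuel efficiency.",
]

chain = []
for i, sentence in enumerate(sentences, start=1):
    # Lowercase, strip punctuation, and crudely de-pluralize each word.
    words = {w.strip(".,!?").lower().rstrip("s") for w in sentence.split()}
    if words & {term.rstrip("s") for term in TOPIC_LEXICON}:
        chain.append(i)

print("Sentences linked to the 'vehicle types' topic:", chain)  # [1, 2, 3, 4, 5]
```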
Basic applications of text mining
- Voice of Customer
- Social Media Monitoring
- Voice of Employee