Big Data (2).pptx: a big data ppt covering its core definitions

trivedidivya002, 42 slides, Oct 08, 2025

About This Presentation

This presentation introduces Big Data: its definition, sources, characteristics, applications, and challenges.


Slide Content

Big Data Compiled by Dr. Deebha Mumtaz

Evaluation Scheme
1. Midterm: 26 marks
2. Internal Assessment (Quiz + Attendance + Class Evaluation): 24 marks
3. End Term: 50 marks
Total: 100 marks

Syllabus

Topics
1. Introduction to Big Data: Definition, Sources, Characteristics, Applications, Challenges

Big Data Introduction

What is Data?
Data are a collection of discrete or continuous values that convey information, describing quantity, quality, facts, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally.
In computer science, data is a collection of facts or values that can be processed by a computer. Data can be numbers, words, measurements, observations, or descriptions.

Information Continuum: a model describing how raw data evolves into wisdom.

Data: Raw, unprocessed facts and figures. Has limited value on its own.
Information: Structured, contextualized data. Answers basic questions like "who," "what," "where," and "when."
Knowledge: Analysis and interpretation of information to uncover patterns and relationships, providing understanding of "how" things work.
Insight: Deep, meaningful understanding derived from knowledge, revealing actionable strategies or recognizing trends.
Wisdom: The ability to apply knowledge and insights to make sound judgments and decisions in complex situations, often considering long-term implications.

Information Continuum Scenario: A Retail Company Analyzing Customer Behavior

Data: Raw transaction data: "Customer X bought shoes for $50 on Feb 15, 2025."
Information: Processed data: "Customers buying shoes also often buy accessories like socks or bags."
Knowledge: Recognizing patterns: "Customers aged 25-35 tend to buy shoes and accessories together during winter sales."
Insight: Actionable strategy: "Offering bundled discounts for shoes and accessories could increase sales for customers aged 25-35."
Wisdom: Long-term perspective: "Bundle promotions can increase sales but should be balanced with maintaining the brand's premium image for long-term customer loyalty."
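The data-to-information step of the retail scenario can be sketched in a few lines of Python. The baskets and item names below are hypothetical; the point is that counting item co-occurrences turns raw transactions (data) into a contextualized pattern (information).

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction data (the "Data" stage): each record is one basket.
baskets = [
    {"shoes", "socks"},
    {"shoes", "bag"},
    {"shoes", "socks", "bag"},
    {"shirt"},
]

# "Information" stage: count how often item pairs are bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair suggests "shoes are often bought with accessories".
top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)
```

The knowledge and insight stages would then segment such patterns by customer age and season, which requires richer records than this sketch assumes.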

Big Data
Data that is too large or too complex to be managed using traditional data processing, analysis, and storage techniques. Data at the petabyte scale (10^15 bytes) and beyond is typically called Big Data. It is stated that almost 90% of today's data has been generated in the past 3 years.
[Figure: non-linear growth of global digital information-storage capacity and the waning of analog storage]

Big Data
The term "big data" has been in use since the 1990s, with some crediting John Mashey with popularizing it. It usually refers to datasets that are too large and complex for ordinary data processing systems to manage efficiently. These datasets can be derived from a variety of sources, including social media, sensors, internet activity, and mobile devices, and the data can be structured, semi-structured, or unstructured.

Big Data analysis enables deeper understanding, better decisions, and strategic planning for the development of an organization.

Gartner defines Big Data as "high-volume, high-velocity and/or high-variety information that demands cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."

Big Data Analysis refers to the process of examining big data to uncover hidden patterns, correlations, trends, and insights that can be used for decision-making and strategic planning.

Sources of Big Data

Characteristics of Big Data

Volume

Volume: Scale of Data

Volume 90% of today’s data has been created in just the last 2 years About 402.74 million terabytes of data are created each day. In fact, we can say that we have already entered the exabyte era.
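A quick unit-conversion check (using decimal SI units) shows why the slide's daily-volume figure puts us in the exabyte era:

```python
# Back-of-the-envelope check of the daily data-volume figure (decimal SI units).
TB = 10**12  # 1 terabyte in bytes
EB = 10**18  # 1 exabyte in bytes

daily_bytes = 402.74e6 * TB        # 402.74 million terabytes per day
daily_exabytes = daily_bytes / EB
print(round(daily_exabytes, 2))    # roughly 400 exabytes per day
```

Since a million terabytes is exactly one exabyte, 402.74 million TB/day is about 402.74 EB/day.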

2. Velocity

2. Velocity
The speed at which data is generated and processed to meet the demands and challenges of growth and development. Big data is often available in real time. Two kinds of velocity relate to big data: the frequency of generation and the frequency of handling, recording, and publishing.
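A minimal sketch of handling velocity on the processing side: a sliding-window counter that tracks how many events arrived in the last 60 seconds, a common building block in stream processing. The class name and timestamps are illustrative, not from any particular framework.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first

    def record(self, timestamp):
        self.events.append(timestamp)
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def rate(self):
        return len(self.events)

counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:  # simulated event timestamps in seconds
    counter.record(t)
print(counter.rate())  # events within the last 60 s of the final timestamp
```

Real streaming systems (Kafka, Spark Streaming, Flink) apply the same windowing idea at much larger scale and with fault tolerance.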

3. Variety
Big Data includes different types of data: structured, semi-structured, and unstructured. This diversity requires advanced tools for data integration, storage, and analysis.

Feature            | Structured Data              | Semi-structured Data            | Unstructured Data
Format             | Predefined, organized        | Some organization, tags/markers | No predefined format
Storage Management | Less                         | Significant                     | Large
Schema             | Rigid, conforms to a model   | Flexible, no strict model       | No schema
Storage            | Relational databases         | NoSQL databases, file systems   | Data lakes, object storage
Analysis           | Easy with standard tools     | Requires specialized tools      | Complex, needs advanced techniques
Examples           | Spreadsheets, SQL databases  | JSON, XML, email                | Text, images, videos
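The structured vs. semi-structured distinction in the table can be made concrete with a small sketch: JSON records (semi-structured, flexible shape) are flattened into a rigid relational table, where a field missing from one record simply becomes NULL. The record contents and table schema here are hypothetical.

```python
import json
import sqlite3

# Semi-structured input: JSON records with tags but a flexible shape.
raw = '[{"name": "Ada", "age": 36}, {"name": "Grace", "skills": ["COBOL"]}]'
records = json.loads(raw)

# Flatten into a structured (rigid-schema) relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(r.get("name"), r.get("age")) for r in records],
)
rows = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()
print(rows)  # the record without an "age" field becomes NULL (None)
```

Unstructured data (free text, images) has no such field structure at all, which is why it needs specialized storage and analysis techniques.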

4. Veracity
The truthfulness or reliability of the data, referring to data quality and data value. Ensuring data quality, addressing data discrepancies, and dealing with data ambiguity are all major issues in Big Data analytics. Big data must not only be large in size but also reliable in order to achieve value from its analysis.
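A minimal sketch of a veracity check: validating and cleansing raw records before analysis. The field names and rejection rules are hypothetical; real pipelines would log rejected records and apply far richer rules.

```python
def cleanse(records):
    """Split records into clean and rejected based on a simple amount check."""
    clean, rejected = [], []
    for r in records:
        amount = r.get("amount")
        # Coerce numeric strings like "50.0"; reject missing or non-numeric values.
        try:
            amount = float(amount)
        except (TypeError, ValueError):
            rejected.append(r)
            continue
        if amount < 0:  # negative amounts are treated as data errors here
            rejected.append(r)
            continue
        clean.append({**r, "amount": amount})
    return clean, rejected

clean, rejected = cleanse([
    {"id": 1, "amount": "50.0"},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -5},
])
print(len(clean), len(rejected))  # 1 2
```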

5. Value
The ability to convert large volumes of data into useful insights. Data in itself is of no use or importance; it needs to be converted into something valuable from which information can be extracted. Hence, Value is often considered the most important of the 5 V's.

Why is Big Data Important?

Applications of Big Data
1. Banking/Finance: Fraud Detection, Risk Management, Algorithmic Trading
2. Communication: Targeted Communication, Improved Communication Effectiveness, Crisis Communication
3. Healthcare: Personalized Medicine, Predictive Analytics, Drug Discovery, Epidemiology

Applications of Big Data
4. Education: Personalized Learning, Student Performance Analysis, Curriculum Development
5. Manufacturing: Predictive Maintenance, Quality Control, Supply Chain Optimization
6. Energy: Energy Efficiency, Renewable Energy Forecasting

Applications of Big Data
7. Government: Policy Making, Public Services, National Security
8. Retail: Inventory Management, Personalized Shopping Experience, Supply Chain Optimization
9. Transportation: Traffic Management, Route Optimization, Autonomous Vehicles

Challenges in Big Data Analysis

Challenges
1. Data Volume: Managing and Storing Massive Amounts of Data
Businesses spent $21.5 billion on computing and storage infrastructure in the first quarter of 2023 alone. Finding space to store big data's rapidly increasing volumes at its rising velocity with conventional means is challenging, slow, and expensive.
Solution: Adopting scalable cloud storage solutions, such as Amazon S3, Google Cloud Storage, or Microsoft Azure, can help manage large volumes of data.

Challenges
2. Data Variety: Integrating Diverse Data Types
Big Data encompasses a wide variety of data types, including structured data (e.g., databases), semi-structured data (e.g., XML, JSON), and unstructured data (e.g., text, images, videos). This diversity can make it difficult to integrate, analyze, and extract meaningful insights.
Solution: Data integration platforms and tools like Apache NiFi, Talend, or Informatica help consolidate disparate data sources into a unified data model.

Challenges
3. Data Veracity: Ensuring Data Quality and Accuracy
For Big Data, ensuring the quality, accuracy, and reliability of data (also referred to as data veracity) becomes increasingly difficult. Inaccurate or low-quality data can lead to misleading insights and poor decision-making.
Solution: Implement robust data quality standards, perform regular data audits, and employ data cleansing techniques. Tools such as Trifacta, Talend Data Quality, and Apache Griffin can help automate data quality management processes.

Challenges
4. Data Security and Privacy: Protecting Sensitive Information
Cybercriminals are more likely to target businesses that store sensitive information, and each data breach can cost time, money, and reputation. Privacy laws like the European Union's General Data Protection Regulation (GDPR), India's Digital Personal Data Protection (DPDP) Act, 2023, and the Aadhaar Act, 2016 make collecting vast amounts of data while upholding user privacy standards difficult.
Solution: Adopt comprehensive data protection strategies such as encryption, access controls, and regular security audits. Organizations should stay informed about evolving data privacy regulations and ensure compliance by adopting privacy-by-design principles in their data management processes.
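One small, concrete example of privacy-by-design is pseudonymization: replacing direct identifiers with keyed hashes before data leaves the secure zone. The key, field names, and record shape below are hypothetical; a real deployment would keep the key in a secrets vault and rotate it.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice, store and rotate this in a secrets vault.
SECRET_KEY = b"rotate-me-and-store-in-a-vault"

def pseudonymize(identifier: str) -> str:
    # A keyed HMAC rather than a bare hash, so identifiers cannot be
    # brute-forced without also possessing the key.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"customer_id": "C-1042", "fare": 50.0}
safe_record = {**record, "customer_id": pseudonymize(record["customer_id"])}
print(safe_record["customer_id"])  # 64 hex chars; original ID not recoverable
```

The same customer always maps to the same pseudonym, so analytics (e.g., repeat-purchase counts) still work without exposing real identities.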

Challenges
5. Data Integration: Combining Data from Multiple Sources
Compiling multiple file types from various sources into a single point of access can be difficult with conventional tools.
Solution: Data integration platforms like Apache Camel, MuleSoft, and IBM DataStage can help with integrating data from multiple sources. Data virtualization tools let you access and view information from across sources without moving it, which increases visibility despite big data's volume and velocity.
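At its core, data integration means mapping differently shaped sources onto one unified schema, which the platforms above automate at scale. A toy sketch with hypothetical schemas, merging a CSV source and a JSON source:

```python
import csv
import io
import json

# Two hypothetical sources with different field names for the same concepts.
csv_source = "customer,amount\nX,50\nY,30\n"
json_source = '[{"cust": "Z", "total": 20}]'

unified = []
for row in csv.DictReader(io.StringIO(csv_source)):
    unified.append({"customer": row["customer"], "amount": float(row["amount"])})
for obj in json.loads(json_source):
    # Map the second source's field names onto the unified schema.
    unified.append({"customer": obj["cust"], "amount": float(obj["total"])})

print(unified)  # all records now share one schema
```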

Challenges
6. Data Analytics: Extracting Valuable Insights
Analyzing large, diverse datasets is complex. Traditional analytical tools may struggle to scale, and a shortage of skilled data scientists can further hinder the ability to extract meaningful insights.
Solution: Platforms like Apache Spark, Hadoop, or Google BigQuery are designed for large-scale data processing and analysis. Training employees can help bridge the skills gap and empower teams to analyze Big Data effectively.
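The map-reduce idea underlying Hadoop and (conceptually) Spark can be illustrated with the classic word count, expressed as separate map, shuffle, and reduce phases. This is a single-machine toy, not the distributed implementation:

```python
from collections import defaultdict

documents = ["big data is big", "data is valuable"]

# Map: emit a (word, 1) pair for every word occurrence.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group emitted values by key (in a cluster, this moves data
# between machines so each key lands on one reducer).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(vals) for word, vals in grouped.items()}
print(counts["big"], counts["data"])  # 2 2
```

Because each phase operates on independent (key, value) pairs, the same program parallelizes across thousands of machines, which is the scaling property traditional tools lack.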

Careers in Big Data
Big Data Analyst: responsible for analyzing data, identifying relationships, and building models. Skills: Python, R, Java, Ruby, MATLAB, Pig, SQL, Hadoop, Hive, MapReduce, ML, NLP.
Big Data Engineer / Developer / Architect: responsible for building and maintaining data pipelines, writing code to implement data processing logic, and designing the overall data infrastructure. Skills: Java/C/C++, Hadoop, Hive, Pig, ML.
Big Data Admin / Hadoop Administrator: administers and manages Hadoop clusters and all other resources in the Hadoop ecosystem. Skills: Linux and shell scripting, the Hadoop ecosystem, and the ability to troubleshoot complex issues.
Top companies hiring big data professionals: Google, IBM, Salesforce, Oracle, EMC, Splunk, GE, Apple, Adobe, Qualcomm.
In the US, the average salary for a big data engineer is around $134,277, according to Built In.

Big Data Analytics
Big data analytics is the process of examining large and diverse datasets to uncover hidden patterns, correlations, market trends, customer preferences, and other useful business information.

Big Data Analytics Scenario: A Ride-Sharing Company

1. Descriptive Analytics (What happened?)
The company analyzes historical trip data to understand:
- What was the average fare per trip?
- Which areas have the highest demand for rides?
- What's the average driver rating?
- How many complaints were received about surge pricing?
This information is often displayed in dashboards and reports to give a general overview of the business.

2. Diagnostic Analytics (Why did it happen?)
The company notices a sudden drop in ridership in a specific area. They investigate:
- Was there a sudden increase in surge pricing?
- Was there a public transportation strike that ended?
- Did a competitor offer a discount in that area?
- Was there negative social media sentiment about the company in that area?
By analyzing the data, they identify the root cause of the ridership drop (e.g., a competitor's discount).

Big Data Analytics Scenario: A Ride-Sharing Company (continued)

3. Predictive Analytics (What might happen?)
The company wants to predict future demand for rides:
- They use machine learning models on historical data (time of day, day of week, weather, events) to predict demand in different areas.
- They predict which drivers are likely to be available at certain times and locations.
- They forecast the impact of weather conditions on ride demand.
This allows them to proactively position drivers and adjust surge pricing.

4. Prescriptive Analytics (How can we make it happen?)
The company wants to optimize its operations:
- Based on predicted demand, they use optimization algorithms to recommend optimal driver positioning.
- They use simulation to test different pricing strategies and determine the best way to maximize revenue while maintaining customer satisfaction.
- They develop personalized recommendations for riders based on their past behavior.
This allows them to proactively manage the fleet, optimize pricing, and improve customer experience.
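The descriptive-analytics questions in the ride-sharing scenario (average fare, highest-demand area) reduce to simple aggregations. A sketch over hypothetical trip records:

```python
from collections import Counter
from statistics import mean

# Hypothetical trip records; real data would have millions of rows.
trips = [
    {"area": "Downtown", "fare": 12.5},
    {"area": "Downtown", "fare": 9.0},
    {"area": "Airport", "fare": 30.0},
]

# "What was the average fare per trip?"
avg_fare = mean(t["fare"] for t in trips)

# "Which area has the highest demand for rides?"
top_area, rides = Counter(t["area"] for t in trips).most_common(1)[0]

print(round(avg_fare, 2), top_area, rides)
```

Predictive and prescriptive analytics build on top of exactly these aggregates, adding forecasting models and optimization over the resulting numbers.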