UNIT-3 Processing Data Visualization

AadityaRathi4 · 17 slides · Jul 26, 2024


UNIT-3 Processing Data Visualization

TOPICS
- Processing Big Data
- Integrating disparate data stores
- Mapping data to the programming framework
- Connecting and extracting data from storage
- Transforming data for processing
- Subdividing data in preparation for Hadoop MapReduce

Processing Big Data

Data processing means converting data from one format into another. Raw data becomes informative and useful only when it is processed and well presented. A data processing system is also referred to as an information system.

Types of Data Processing
- Manual Data Processing
- Mechanical Data Processing
- Electronic Data Processing
- Batch Data Processing
- Real-time Data Processing
- Online Data Processing
- Automatic Data Processing
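As a minimal sketch of batch data processing in the sense above (converting data from one format to another), the following Python snippet converts CSV records into JSON using only the standard library. The sample data is invented for illustration.

```python
import csv
import io
import json

# Hypothetical raw input in CSV format.
raw = "name,score\nAsha,91\nRavi,84\n"

# Parse the CSV rows into dictionaries, then serialize them as JSON.
rows = list(csv.DictReader(io.StringIO(raw)))
as_json = json.dumps(rows)
```

A real batch pipeline would read from files or a data store instead of an in-memory string, but the format-conversion step is the same idea.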


Integrating disparate data stores

Integrating disparate data stores means combining and connecting data from different sources or storage systems to provide a unified, coherent view of the information. This process is crucial for organizations whose data is scattered across various databases, file systems, or platforms.

Here are the key aspects of integrating disparate data stores:
- Data Discovery
- Data Mapping
- Data Extraction
- Data Transformation
- Data Loading
- Data Synchronization
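To make the idea of a "unified view" concrete, here is a small Python sketch that merges records from two hypothetical stores (a CRM and a billing system, both invented for this example) keyed by the same customer id:

```python
# Hypothetical records from two disparate stores, keyed by customer id.
crm = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
billing = {1: {"balance": 120.0}, 2: {"balance": 0.0}}

def unified_view(*stores):
    """Combine records that share a key across stores into one record each."""
    merged = {}
    for store in stores:
        for key, record in store.items():
            merged.setdefault(key, {}).update(record)
    return merged

view = unified_view(crm, billing)
```

In practice the stores would be separate databases or file systems, and steps such as data mapping and transformation would run before the records can be merged on a common key like this.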

Hadoop MapReduce: Mapping data to the programming framework

Map and Reduce

Map: As the name suggests, the Map function maps the input data into key-value pairs. The input to Map is itself a key-value pair, where the key is typically an identifier (such as a record offset or address) and the value is the actual data. The Map() function is executed on each input key-value pair and generates intermediate key-value pairs, which serve as input for the Reducer (the Reduce() function).

Reduce: The intermediate key-value pairs produced by the mappers are shuffled, sorted, and sent to the Reduce() function. The Reducer aggregates or groups the data by key, according to the reducer algorithm written by the developer.
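The Map and Reduce steps above can be sketched in plain Python with the classic word-count example. This is a single-process simulation of the model, not Hadoop itself: `map_fn` emits a (word, 1) pair per word, an in-memory shuffle groups values by key, and `reduce_fn` sums each group.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for each word in the line."""
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: aggregate all values for one key into a single total."""
    yield word, sum(counts)

def run_job(lines):
    # Shuffle: group intermediate values by key, as the framework would.
    shuffled = defaultdict(list)
    for i, line in enumerate(lines):
        for key, value in map_fn(i, line):
            shuffled[key].append(value)
    # Sort keys, then reduce each group.
    result = {}
    for key in sorted(shuffled):
        for out_key, out_value in reduce_fn(key, shuffled[key]):
            result[out_key] = out_value
    return result

counts = run_job(["to be or not to be"])
```

In Hadoop the shuffle and sort are handled by the framework across machines; here they are a dictionary and a `sorted()` call, but the mapper and reducer contracts are the same.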

What is data mapping?

Data mapping is the process of matching fields from one data source to the corresponding fields in another, so that data can be migrated, integrated, or transformed accurately.

The data mapping process in 5 steps:
1. Identify all data fields that must be mapped.
2. Standardize naming conventions across sources.
3. Create data transformation rules and schema logic.
4. Test your logic.
5. Complete the migration, integration, or transformation.
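Steps 2 and 3 above can be sketched as a field map plus per-field transformation rules. The field names and rules below are hypothetical examples, not a standard schema:

```python
# Step 2: standardized names for fields that differ across sources (assumed names).
FIELD_MAP = {"cust_nm": "customer_name", "dob": "date_of_birth"}

# Step 3: transformation rules keyed by target field.
RULES = {"customer_name": str.title}

def map_record(source):
    """Rename each source field per FIELD_MAP and apply its rule, if any."""
    target = {}
    for src_field, value in source.items():
        tgt_field = FIELD_MAP.get(src_field, src_field)
        rule = RULES.get(tgt_field, lambda v: v)
        target[tgt_field] = rule(value)
    return target
```

Step 4 ("test your logic") then amounts to running known records through `map_record` and checking the output against the target schema before committing to the full migration.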

Connecting and extracting data from storage

Data Extraction Software
- Web Scraping Tools
- OCR (Optical Character Recognition)
- PDF Extraction
- Screen Scraping
- Text Analytics
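As a minimal sketch of the web/screen-scraping idea, the following Python snippet uses the standard library's `html.parser` to pull values out of HTML markup. The page snippet and the `class="price"` convention are invented for illustration; real scraping tools handle messier markup, pagination, and network access.

```python
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collect the text inside <span class="price"> tags."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

# Hypothetical fetched page content.
page = '<ul><li><span class="price">19.99</span></li><li><span class="price">4.50</span></li></ul>'
parser = PriceExtractor()
parser.feed(page)
```

OCR and PDF extraction follow the same pattern at a higher level: locate the region of interest, then pull the structured value out of an unstructured source.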

Data Transformation in Data Mining

The goal of data transformation is to prepare data for data mining so that useful insights and knowledge can be extracted from it.

Phases of MapReduce: How Hadoop MapReduce Works

A MapReduce job executes through the following phases, in order: Input Files, InputFormat, Input Splits, RecordReader, Mapper, Combiner, Partitioner, Shuffling and Sorting, Reducer, RecordWriter, and OutputFormat.
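The phases listed above can be traced in a single-process Python sketch of word count. This simulates the data flow only, with assumptions noted in the comments: splits are slices of lines, the combiner pre-aggregates per split, and the partitioner assigns keys to reducer buckets.

```python
from collections import defaultdict

def input_splits(text, n=2):
    """Input split: divide the input lines into roughly n chunks."""
    lines = text.splitlines()  # record reader: one record per line (assumption)
    size = max(1, len(lines) // n)
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def mapper(line):
    """Mapper: emit (word, 1) for each word."""
    for word in line.split():
        yield word, 1

def combiner(pairs):
    """Combiner: locally pre-aggregate counts within one split."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return local.items()

def partition(key, num_reducers):
    """Partitioner: decide which reducer handles this key."""
    return hash(key) % num_reducers

def run(text, num_reducers=2):
    # One intermediate bucket per reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in input_splits(text):
        mapped = [kv for line in split for kv in mapper(line)]
        for key, value in combiner(mapped):
            buckets[partition(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:           # shuffle: values already grouped by key
        for key in sorted(bucket):   # sort phase within each reducer
            result[key] = sum(bucket[key])  # reducer + record writer
    return result
```

In real Hadoop, each phase is a pluggable class (InputFormat, Partitioner, and so on) running across many machines; this sketch only mirrors the order in which data passes through them.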

THANK YOU!!