UNIT-3: Processing, Data Visualization

TOPICS: Processing Big Data; Integrating disparate data stores; Mapping data to the programming framework; Connecting and extracting data from storage; Transforming data for processing; Subdividing data in preparation for Hadoop MapReduce.
Processing Big Data
Data processing means converting data from one form into another, more useful form. Raw data becomes informative and useful only when it is processed and well presented. A data processing system is also referred to as an information system.
Types of Data Processing
- Manual Data Processing
- Mechanical Data Processing
- Electronic Data Processing
- Batch Data Processing
- Real-time Data Processing
- Online Data Processing
- Automatic Data Processing
Integrating Disparate Data Stores
Integrating disparate data stores involves combining and connecting data from different sources or storage systems to provide a unified and coherent view of the information. This process is crucial for organizations that have data scattered across various databases, file systems, or platforms.
The key aspects of integrating disparate data stores are:
- Data Discovery
- Data Mapping
- Data Extraction
- Data Transformation
- Data Loading
- Data Synchronization
A minimal extract-and-merge sketch is shown after this list.
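As a rough illustration (not a prescribed implementation), the sketch below unifies customer records from two hypothetical stores, a CSV export and a legacy system stubbed as an in-memory map, into a single view keyed by customer ID. The file name and field layout are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

// Sketch: unify customer e-mail addresses from two disparate stores.
// "customers.csv" and its id,email layout are hypothetical.
public class DisparateStoreMerge {
    public static void main(String[] args) throws Exception {
        Map<String, String> unified = new HashMap<>();

        // Store 1: a CSV export (one "id,email" record per line).
        try (BufferedReader csv = new BufferedReader(new FileReader("customers.csv"))) {
            String line;
            while ((line = csv.readLine()) != null) {
                String[] parts = line.split(",");
                unified.put(parts[0].trim(), parts[1].trim());
            }
        }

        // Store 2: a legacy system, stubbed here as an in-memory map.
        Map<String, String> legacyStore = Map.of("1001", "old@example.com", "2002", "two@example.com");
        // Synchronization rule: legacy values fill gaps but never overwrite newer CSV data.
        legacyStore.forEach(unified::putIfAbsent);

        unified.forEach((id, email) -> System.out.println(id + " -> " + email));
    }
}
```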
Mapping Data to the Programming Framework: Hadoop MapReduce
Map and Reduce
Map: As the name suggests, the Map function's main job is to map the input data into key-value pairs. The input to the map is itself a key-value pair, where the key may be an identifier such as a record address or ID and the value is the actual data it holds. The Map() function is executed on each of these input key-value pairs and generates intermediate key-value pairs, which serve as input for the Reducer (the Reduce() function).
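For concreteness, here is a minimal word-count style Mapper written against the standard Hadoop MapReduce API; the input is assumed to be plain text read line by line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map(): turns each input line into intermediate (word, 1) pairs.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate key-value pair for the Reducer
        }
    }
}
```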
Reduce: The intermediate key-value pairs produced by the mappers are shuffled, sorted, and sent to the Reduce() function. The Reducer then aggregates or groups the data by key, according to the reduce logic written by the developer.
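The matching Reducer for the word-count sketch aggregates the shuffled and sorted values for each key:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce(): receives (word, [1, 1, ...]) after shuffle and sort, emits (word, total).
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        total.set(sum);
        context.write(word, total);
    }
}
```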
What is data mapping?
Data mapping is the process of matching fields from one data source to the corresponding fields in a target source or schema, so that data can be migrated, integrated, or transformed correctly.
The data mapping process in 5 steps:
1. Identify all data fields that must be mapped.
2. Standardize naming conventions across sources.
3. Create data transformation rules and schema logic.
4. Test your logic.
5. Complete the migration, integration, or transformation.
A small field-mapping sketch is shown after this list.
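To illustrate steps 2 and 3, the sketch below maps hypothetical source field names to standardized target names and applies a simple transformation rule; all field names and values are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a field-mapping table plus simple transformation rules.
// Source and target field names are hypothetical.
public class DataMappingExample {
    public static void main(String[] args) {
        // Step 2: standardize naming conventions across sources.
        Map<String, String> fieldMap = new HashMap<>();
        fieldMap.put("cust_nm", "customerName");
        fieldMap.put("cust_phone", "phoneNumber");

        // A source record as it arrives from the legacy system.
        Map<String, String> sourceRecord = Map.of("cust_nm", "asha rao", "cust_phone", "98450 12345");

        // Step 3: apply mapping and transformation rules (rename fields, normalize values).
        Map<String, String> targetRecord = new HashMap<>();
        sourceRecord.forEach((srcField, value) -> {
            String targetField = fieldMap.getOrDefault(srcField, srcField);
            String transformed = targetField.equals("phoneNumber")
                    ? value.replaceAll("\\s+", "")   // rule: strip spaces from phone numbers
                    : value.toUpperCase();           // rule: upper-case name fields
            targetRecord.put(targetField, transformed);
        });

        // Step 4: inspect the output while testing the logic.
        System.out.println(targetRecord);
    }
}
```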
Connecting and extracting data from storage
Common categories of data extraction software include:
- Web Scraping Tools
- OCR (Optical Character Recognition)
- PDF Extraction
- Screen Scraping
- Text Analytics
A short sketch of extracting rows from a relational store is shown after this list.
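As a minimal sketch of connecting to storage and extracting data, the example below reads rows from a relational store over plain JDBC; the connection URL, credentials, table, and columns are placeholders, not a specific required setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: connect to a relational store and extract rows with plain JDBC.
// URL, credentials, and table/column names are placeholders.
public class JdbcExtractExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/salesdb";   // hypothetical database
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rows = stmt.executeQuery("SELECT id, amount FROM orders")) {
            while (rows.next()) {
                // Emit each extracted record as a simple CSV line.
                System.out.println(rows.getInt("id") + "," + rows.getDouble("amount"));
            }
        }
    }
}
```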
Data Transformation in Data Mining
The goal of data transformation is to prepare the data for data mining so that useful insights and knowledge can be extracted from it. Common techniques include normalization, aggregation, discretization, and attribute construction.
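As a small example of one such transformation, the sketch below applies min-max normalization to scale numeric attribute values into the range [0, 1]; the sample values are made up.

```java
// Sketch: min-max normalization, a common data transformation before mining.
public class MinMaxNormalization {
    public static void main(String[] args) {
        double[] values = {200, 300, 400, 600, 1000};   // sample attribute values
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // v' = (v - min) / (max - min) maps every value into [0, 1].
        for (double v : values) {
            System.out.printf("%.1f -> %.3f%n", v, (v - min) / (max - min));
        }
    }
}
```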
Phases of MapReduce – How Hadoop MapReduce Works
A MapReduce job execution passes through the following phases: Input Files, InputFormat, Input Splits, RecordReader, Mapper, Combiner, Partitioner, Shuffling and Sorting, Reducer, RecordWriter, and OutputFormat.
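To show how these phases are wired together in practice, here is a minimal driver sketch that configures a word-count job using the Mapper and Reducer shown earlier; input and output paths are taken from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: wires InputFormat -> Mapper -> Combiner -> Reducer -> OutputFormat for one job.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenizerMapper.class);   // map phase
        job.setCombinerClass(IntSumReducer.class);   // optional local aggregation before shuffle
        job.setReducerClass(IntSumReducer.class);    // reduce phase (after shuffle and sort)

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input files / splits
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // RecordWriter target

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```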