Apache Oozie

771 views 23 slides Jun 05, 2022

About This Presentation

internet of things
apache oozie


Slide Content

Internet of Things N. Nagajothi, I M.Sc., IT

Apache Oozie
Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack. It is a server-based workflow scheduling system for managing Hadoop jobs. It supports:

Three types of jobs
Oozie workflow jobs: a sequence of actions to be executed.
Oozie coordinator jobs: workflow jobs triggered by time and data availability.
Oozie bundles: a package of multiple coordinator and workflow jobs.

Users can create Directed Acyclic Graphs (DAGs) of workflows, whose actions can run in parallel or sequentially in Hadoop. Oozie consists of two parts:
Workflow engine: stores and runs workflows composed of Hadoop jobs.
Coordinator engine: runs workflow jobs based on predefined schedules and the availability of data.
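The coordinator engine's two conditions (a schedule and data availability) can be illustrated with a minimal Python sketch. This is not Oozie code: the function names `due_runs` and `ready_to_run`, and the idea of passing input paths as plain strings, are assumptions made only for illustration.

```python
from datetime import datetime, timedelta


def due_runs(start, end, frequency, now):
    """Return the scheduled (nominal) times in [start, end) that are not in the future."""
    runs = []
    t = start
    while t < end and t <= now:
        runs.append(t)
        t += frequency
    return runs


def ready_to_run(nominal_time, now, required_inputs, available_inputs):
    """A run may start only when its time has come and every input exists."""
    time_ok = nominal_time <= now
    data_ok = set(required_inputs) <= set(available_inputs)
    return time_ok and data_ok
```

With an hourly frequency, a run scheduled for 02:00 whose input has not yet arrived would be due but not ready; it becomes ready once the input appears in `available_inputs`.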

Oozie is scalable and can manage the timely execution of thousands of workflows in a Hadoop cluster. Oozie is also very flexible: one can easily start, stop, suspend and rerun jobs. Oozie makes it very easy to rerun failed workflows.

How it works
An Oozie workflow consists of action nodes and control nodes.
An action node represents a workflow task, such as moving files into HDFS, running a MapReduce, Pig or Hive job, importing data using Sqoop, or running a shell script or a program written in Java.
A control node controls the workflow execution between actions by allowing constructs like conditional logic, wherein different branches may be followed depending on the result of an earlier action node.

Types of node
Start node: designates the start of the workflow job.
End node: signals the end of the job.
Error node: designates the occurrence of an error and the corresponding error message to be printed.
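How action and control nodes interact can be shown with a toy interpreter in Python. This is only a sketch of the idea, not how Oozie is implemented: the dictionary layout, the `run_workflow` function and the node names are all hypothetical. The error node is modelled here as a node of type `kill`, which is the name Oozie's workflow definitions use for it.

```python
def run_workflow(nodes, actions):
    """Follow transitions from the start node until an end or kill node is reached.

    nodes maps a node name to a dict describing it (hypothetical layout);
    actions maps an action node name to a callable returning True on success.
    """
    name = nodes["start"]["to"]
    visited = []
    while True:
        node = nodes[name]
        visited.append(name)
        kind = node["type"]
        if kind == "action":
            # An action node transitions to "ok" on success, "error" on failure.
            succeeded = actions[name]()
            name = node["ok"] if succeeded else node["error"]
        elif kind == "decision":
            # Conditional logic: the first case whose predicate holds is taken.
            name = next(
                (to for predicate, to in node["cases"] if predicate()),
                node["default"],
            )
        elif kind == "end":
            return "SUCCEEDED", visited
        elif kind == "kill":
            return "KILLED: " + node["message"], visited
        else:
            raise ValueError(f"unknown node type: {kind}")


# Hypothetical two-action workflow: load data, then count it.
example_nodes = {
    "start": {"to": "load"},
    "load": {"type": "action", "ok": "count", "error": "fail"},
    "count": {"type": "action", "ok": "end", "error": "fail"},
    "end": {"type": "end"},
    "fail": {"type": "kill", "message": "action failed"},
}
```

Calling `run_workflow(example_nodes, {"load": lambda: True, "count": lambda: True})` walks `load`, `count` and `end`; if either action returns False, the walk ends at the `fail` node with its message.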

Features of Oozie
Using its web service APIs, one can control jobs from anywhere.
Oozie has a provision to send email notifications upon completion of jobs.
Oozie has a provision to execute jobs that are scheduled to run periodically.
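As an illustration of the web service APIs, the sketch below queries the server's system status over HTTP from Python. It assumes the `v1/admin/status` endpoint of the Oozie Web Services API, which returns a JSON object with a `systemMode` field; the helper names and any base URL passed in are assumptions, and the exact API version available should be checked against the installed Oozie release.

```python
import json
from urllib.request import urlopen


def status_url(base_url):
    """Build the admin status URL from the Oozie base URL (assumed v1 endpoint)."""
    return base_url.rstrip("/") + "/v1/admin/status"


def parse_system_mode(body):
    """Extract the systemMode field from the JSON response body."""
    return json.loads(body)["systemMode"]


def oozie_system_mode(base_url, timeout=10):
    """Ask a running Oozie server for its system mode (requires network access)."""
    with urlopen(status_url(base_url), timeout=timeout) as resp:
        return parse_system_mode(resp.read().decode("utf-8"))
```

For a server reachable at a hypothetical address such as `http://localhost:11000/oozie`, `oozie_system_mode("http://localhost:11000/oozie")` would be expected to return the mode the server reports, for example `NORMAL`.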

1. Add a group and users.

2. After setting up Hadoop, install the packages required for setting up Oozie.

3. Next, download and build Oozie using the following commands.

4. Download ExtJS to the ‘libext’ directory; this is required for the Oozie web console.

5. Prepare the Oozie WAR file.

6. Next, create the sharelib on HDFS.

7. Next, create the Oozie DB.

8. Finally, use the command to start the Oozie server.

9. The status of Oozie can be checked from the command line or the web console.

10. To set up the Oozie client, copy the client tar file to the “oozie client” directory and add the path to the .bashrc file.

Oozie workflow for IoT data analysis
Assume that the data received from a machine has the following structure.

The goal of the analysis is to find the counts of each status/error code and produce an output with a structure

The Oozie workflow comprises a Hadoop streaming MapReduce job action and an email action that notifies of the success or failure of the job. The map program parses the status/error code from each line in the input and emits key-value pairs, where the key is the status/error code and the value is 1. The reduce program receives the key-value pairs emitted by the map program, aggregated by key. For each key, the reduce program calculates the count and emits key-value pairs where the key is the status/error code and the value is the count.
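The map and reduce programs described above could look roughly like the following Python sketch. The record structure from the slide is not reproduced in this text, so the sketch assumes, purely for illustration, comma-separated lines whose last field is the status/error code; `parse_code`, `mapper` and `reducer` are hypothetical names.

```python
from itertools import groupby


def parse_code(line):
    # ASSUMPTION: comma-separated record with the status/error code last.
    return line.strip().split(",")[-1]


def mapper(lines):
    """Emit 'code<TAB>1' for every non-empty input line."""
    for line in lines:
        if line.strip():
            yield f"{parse_code(line)}\t1"


def reducer(lines):
    """Sum the values per key; the input must already be sorted by key,
    which is what Hadoop streaming provides between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for code, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{code}\t{sum(int(value) for _, value in group)}"
```

To use these with Hadoop streaming, each function would be wrapped in a small script that feeds it standard input and prints every result line. Locally, `reducer(sorted(mapper(lines)))` mimics the sort that happens between the two phases.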

Thank you