Apache Oozie

Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work and is integrated with the Hadoop stack. It is a server-based workflow scheduling system for managing Hadoop jobs. It supports three types of jobs:

Oozie workflow jobs: a sequence of actions to be executed.
Oozie bundle jobs: a package of multiple coordinator and workflow jobs.
Oozie coordinator jobs: workflow jobs triggered by time and data availability.
Users can create Directed Acyclic Graphs (DAGs) of workflows, which can be run in parallel and sequentially in Hadoop. Oozie consists of two parts:

Workflow engine: stores and runs workflows composed of Hadoop jobs.
Coordinator engine: runs workflow jobs based on predefined schedules and the availability of data.
Oozie is scalable and can manage the timely execution of thousands of workflows in a Hadoop cluster. Oozie is also very flexible: one can easily start, stop, suspend, and rerun jobs, and it makes it very easy to rerun failed workflows.
How Oozie works

An Oozie workflow consists of action nodes and control nodes. An action node represents a workflow task, such as moving files into HDFS, running a MapReduce, Pig, or Hive job, importing data using Sqoop, or running a shell script or a program written in Java. A control node controls the workflow execution between actions by allowing constructs such as conditional logic, wherein different branches may be followed depending on the result of an earlier action node.
Types of control nodes:

Start node: designates the start of the workflow job.
End node: signals the end of the job.
Error node: designates the occurrence of an error and the corresponding error message to be printed.
Features of Oozie:

Using its Web Service APIs, one can control jobs from anywhere.
Oozie can send email notifications upon completion of jobs.
Oozie has provisions to execute jobs which are scheduled to run periodically.
Setting up Oozie

1. Add a group and a user for Hadoop:
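A typical command sequence on Ubuntu (the group and user names, hadoop and hduser, are illustrative):

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser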
2. After setting up Hadoop, install the packages required for building Oozie:
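The Oozie build requires Maven; the package list below assumes a Debian/Ubuntu system:

sudo apt-get update
sudo apt-get install maven zip unzip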
3. Next, download and build Oozie using the following commands:
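A sketch assuming Oozie 4.3.1 downloaded from the Apache archive; substitute the version being installed:

wget https://archive.apache.org/dist/oozie/4.3.1/oozie-4.3.1.tar.gz
tar -xzf oozie-4.3.1.tar.gz
cd oozie-4.3.1
bin/mkdistro.sh -DskipTests

The built distribution is placed under distro/target/, and the remaining steps are run from the extracted distribution directory.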
4. Download ExtJS 2.2 to the 'libext' directory; this is required for the Oozie web console.
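Assuming ext-2.2.zip has already been downloaded to the home directory:

mkdir -p libext
cp ~/ext-2.2.zip libext/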
5. Prepare the Oozie WAR file:
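From the Oozie distribution directory:

bin/oozie-setup.sh prepare-war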
6. Next, create the sharelib on HDFS:
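The NameNode URI below assumes a single-node HDFS listening on port 9000:

bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000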
7. Next, create the Oozie DB:
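By default this initializes Oozie's embedded Derby database:

bin/ooziedb.sh create -sqlfile oozie.sql -run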
8. Finally, use the following command to start the Oozie server:
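The server is started from the distribution directory with:

bin/oozied.sh start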
9. The status of Oozie can be checked from the command line or from the web console:
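From the command line (the URL assumes the default Oozie port, 11000):

bin/oozie admin -oozie http://localhost:11000/oozie -status

A status of NORMAL indicates the server is running; the web console is available at http://localhost:11000/oozie.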
10. To set up the Oozie client, copy the client tar file to the client machine, extract it, and add its bin directory to the PATH in the .bashrc file.
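A sketch assuming Oozie 4.3.1 with the client extracted into the home directory (the paths are illustrative):

tar -xzf oozie-client-4.3.1.tar.gz
echo 'export PATH=$PATH:$HOME/oozie-client-4.3.1/bin' >> ~/.bashrc
source ~/.bashrc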
Oozie workflow for IoT data analysis

Assume that the data received from a machine has the following structure:
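For illustration, assume comma-separated lines with a timestamp, a machine ID, and the status/error code as the last field (this format is an assumption):

2015-01-01 10:01:15,machine-1,OK
2015-01-01 10:01:18,machine-1,ERR-03
2015-01-01 10:01:21,machine-2,OK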
The goal of the analysis is to find the count of each status/error code and produce output with the following structure:
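Continuing the illustrative format above, each output line would hold a status/error code and its count, separated by a tab:

OK	2
ERR-03	1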
The Oozie workflow comprises a Hadoop streaming MapReduce job action and an email action that notifies the success or failure of the job. The map program parses the status/error code from each line of the input and emits key-value pairs, where the key is the status/error code and the value is 1. The reduce program receives the key-value pairs emitted by the map program, aggregated by key. For each key, the reduce program calculates the count and emits a key-value pair where the key is the status/error code and the value is the count.
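A minimal sketch of the mapper and reducer for the Hadoop streaming action, written in Python; the assumption that the status/error code is the last comma-separated field follows the illustrative format above.

mapper.py:

#!/usr/bin/env python
# Emit a (status/error code, 1) pair for each input line.
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    # Assumption: the status/error code is the last comma-separated field.
    code = line.split(',')[-1].strip()
    print('%s\t%d' % (code, 1))

reducer.py:

#!/usr/bin/env python
# Sum the counts for each code; Hadoop streaming delivers the
# mapper output to the reducer sorted by key.
import sys

current_code, count = None, 0
for line in sys.stdin:
    code, value = line.strip().split('\t')
    if code == current_code:
        count += int(value)
    else:
        if current_code is not None:
            print('%s\t%d' % (current_code, count))
        current_code, count = code, int(value)
if current_code is not None:
    print('%s\t%d' % (current_code, count))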