Apache Pig

abhicno, 7 slides, Mar 25, 2016

About This Presentation

What is Pig? Why use Pig? How do you use Pig? Is it similar to Hive?
How do you write scripts in Pig?


Slide Content

Pig
Prepared by: Abhishek Gautam

Pig
Pig is part of the Hadoop ecosystem and is used to process large datasets. It is used to automate ETL for unstructured data. It is used by both data analysts and developers. Its language, called Pig Latin, is procedural. Pig relations are not persistent across sessions: a relation exists only within the session that defines it unless you save it with STORE.

Execution Modes in Pig

Local Mode: In local mode, Pig reads all input from the local file system and writes its output back to the local file system. This mode is suitable only for small datasets and for trying out Pig. To start Pig in local mode, use the following command:

pig -x local
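A minimal local-mode sketch, assuming `pig` is installed and on the PATH. The file names `students.txt` and `local_demo.pig` and the output directory `local_out` are hypothetical. Both the input and the output live on the local file system:

```shell
#!/bin/sh
# Hypothetical sample input: tab-separated id, name, age.
printf '1\tAlice\t21\n2\tBob\t19\n' > students.txt

# Hypothetical two-line script: read the local file and write it back out.
cat > local_demo.pig <<'EOF'
students = LOAD 'students.txt' USING PigStorage('\t');
STORE students INTO 'local_out' USING PigStorage(',');
EOF

# STORE fails if the output directory already exists, so clear it first.
rm -rf local_out

if command -v pig >/dev/null 2>&1; then
  # Local mode: the relative paths above resolve against the local disk.
  pig -x local local_demo.pig
  ls local_out
else
  echo "pig is not on PATH; wrote local_demo.pig for a later run"
fi
```

Passing a script file runs Pig in batch mode; `pig -x local` on its own opens the interactive Grunt shell instead.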

MapReduce Mode: In this mode, Apache Pig reads its input from HDFS paths and, after processing the data, writes its output files to HDFS. Pig translates queries into MapReduce jobs and runs them on a Hadoop cluster. This is the default execution mode. To start Pig in MapReduce mode, use the following command:

pig -x mapreduce
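Because MapReduce mode reads from and writes to HDFS, the input usually has to be copied into HDFS first. The sketch below assumes `hdfs` and `pig` are on the PATH of a configured Hadoop client; the HDFS directory `/tmp/pig_demo` and the file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical local sample file and HDFS working directory.
printf '1\tAlice\t21\n2\tBob\t19\n' > students.txt
HDFS_DIR=/tmp/pig_demo

# Unquoted heredoc so the shell fills in $HDFS_DIR; paths refer to HDFS.
cat > mr_demo.pig <<EOF
students = LOAD '$HDFS_DIR/students.txt' USING PigStorage('\t');
STORE students INTO '$HDFS_DIR/out' USING PigStorage(',');
EOF

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
  # Stage the input into HDFS; -f overwrites an existing copy.
  hdfs dfs -mkdir -p "$HDFS_DIR"
  hdfs dfs -put -f students.txt "$HDFS_DIR/"
  # STORE refuses to overwrite, so remove any previous output directory.
  hdfs dfs -rm -r -f "$HDFS_DIR/out"
  # MapReduce is the default mode, so "-x mapreduce" could be omitted.
  pig -x mapreduce mr_demo.pig
  # Quoted glob: HDFS expands part-* itself.
  hdfs dfs -cat "$HDFS_DIR/out/part-*"
else
  echo "hdfs/pig not found; wrote mr_demo.pig for a later run"
fi
```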

Tez Mode: In Tez mode, Pig runs a script as a DAG of tasks rather than as a chain of separate MapReduce jobs, which makes it more flexible and generally faster than MapReduce mode, although it still has some performance issues of its own. To run Pig in Tez mode, you need access to a Hadoop cluster and an HDFS installation. To start Pig in Tez mode, use the following command:

pig -x tez
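Since the three modes differ only in the value passed to `-x`, one script can be launched in any of them. The helper `run_pig` below is a hypothetical sketch, not part of Pig; it restricts itself to the three modes described above and assumes `pig` is on the PATH:

```shell
#!/bin/sh
# Hypothetical helper: run_pig MODE SCRIPT
run_pig() {
  mode=$1
  script=$2
  # Accept only the modes covered here: local, mapreduce, tez.
  case "$mode" in
    local|mapreduce|tez) ;;
    *)
      echo "unknown mode: $mode (expected local, mapreduce or tez)" >&2
      return 2
      ;;
  esac
  if ! command -v pig >/dev/null 2>&1; then
    echo "pig is not on PATH" >&2
    return 127
  fi
  # The same script runs unchanged; only the execution engine differs.
  pig -x "$mode" "$script"
}
```

For example, `run_pig tez my_script.pig` would run a hypothetical `my_script.pig` on Tez, while `run_pig local my_script.pig` tries it on a small local sample first.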

Pig Scripts
Use the LOAD operator to load data into a relation. Use the STORE operator to store a relation in another location. Use the DUMP operator to display results on your terminal screen. Use the DESCRIBE operator to review the schema of a relation.
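The four operators can be combined in one short script. This sketch writes the Pig Latin to a file and runs it in local mode if `pig` is available; the file names, the output directory `adults_out`, and the schema are hypothetical, and FILTER is added only to give DUMP and STORE something to work on:

```shell
#!/bin/sh
# Hypothetical sample data: tab-separated id, name, age.
printf '1\tAlice\t21\n2\tBob\t19\n3\tCarol\t23\n' > students.txt

cat > students.pig <<'EOF'
-- LOAD: read the file into a relation and declare a schema.
students = LOAD 'students.txt' USING PigStorage('\t')
           AS (id:int, name:chararray, age:int);
-- DESCRIBE: print the schema of the relation.
DESCRIBE students;
-- FILTER: keep only students aged 20 or over.
adults = FILTER students BY age >= 20;
-- DUMP: print the relation to the terminal.
DUMP adults;
-- STORE: write the relation to a new location as comma-separated text.
STORE adults INTO 'adults_out' USING PigStorage(',');
EOF

# STORE fails if the output directory already exists.
rm -rf adults_out
if command -v pig >/dev/null 2>&1; then
  pig -x local students.pig
else
  echo "pig is not on PATH; wrote students.pig"
fi
```

DUMP is convenient while developing, but for large results STORE is the usual choice, since DUMP sends every tuple to the terminal.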

Note: When loading data, if no column names are specified (no schema), the columns can be referred to by position, starting from $0: $0, $1, $2….
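A minimal sketch of positional references, assuming `pig` is on the PATH and using a hypothetical tab-separated `students.txt`. LOAD has no AS clause, so the fields are unnamed (and typed as bytearray) and must be addressed as `$0`, `$1`, `$2`:

```shell
#!/bin/sh
# Hypothetical sample data: id, name, age (tab-separated).
printf '1\tAlice\t21\n2\tBob\t19\n' > students.txt

# Quoted heredoc so the shell leaves $1 and $2 alone for Pig.
cat > positional.pig <<'EOF'
-- No AS clause: the fields have no names, only positions.
raw = LOAD 'students.txt' USING PigStorage('\t');
-- $0 = id, $1 = name, $2 = age; cast age from bytearray to int.
names_and_ages = FOREACH raw GENERATE $1, (int)$2;
DUMP names_and_ages;
EOF

if command -v pig >/dev/null 2>&1; then
  pig -x local positional.pig
else
  echo "pig is not on PATH; wrote positional.pig"
fi
```

Positional references still work after a schema has been declared, but named fields are easier to read once a script grows beyond a few lines.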