What is Pig? Why Pig? How to use Pig. Is it similar to Hive?
How to write scripts in Pig?
Added: Mar 25, 2016
Slides: 7 pages
Slide Content
Pig
Prepared by: Abhishek Gautam
Pig
It is a part of the Hadoop ecosystem and is used to process large datasets.
It is used to automate ETL for unstructured data.
Its language, called Pig Latin, is procedural: a script describes the processing step by step.
It is used by both data analysts and developers.
Pig relations are non-persistent across sessions.
Execution Modes In Pig
Local Mode: In local mode, Pig reads all input data from the local file system and writes its output back to the local file system. This mode is suitable only for small datasets and for trying out Pig. To start Pig in local mode, use the following command:
pig -x local
MapReduce Mode: In this mode, Apache Pig reads input from HDFS paths only and, after processing the data, writes its output files to HDFS. Pig translates Pig Latin statements into MapReduce jobs and runs them on a Hadoop cluster. This is the default execution mode. To start Pig in MapReduce mode, use the following command:
pig -x mapreduce
Tez Mode: Tez mode is more flexible and faster than MapReduce mode because it avoids some of MapReduce's performance issues. To run Pig in Tez mode, you need access to a Hadoop cluster and an HDFS installation. To start Pig in Tez mode, use the following command:
pig -x tez
Pig Scripts
Use the LOAD operator to load data into a relation (Pig's equivalent of a table).
Use the STORE operator to write a relation's data to another location.
Use the DUMP operator to display results on your terminal screen.
Use the DESCRIBE operator to review the schema of a relation.
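The four operators above can be combined into one short script. This is only a minimal sketch: the input file students.txt, its comma-separated layout, the field names, and the output path are all assumptions made for illustration.

```pig
-- Hypothetical input: students.txt with comma-separated lines such as "asha,21,3.6"
-- LOAD reads the file into a relation and assigns a schema
students = LOAD 'students.txt' USING PigStorage(',')
           AS (name:chararray, age:int, gpa:float);

-- DESCRIBE prints the schema of the relation
DESCRIBE students;

-- Keep only the rows we are interested in (illustrative condition)
adults = FILTER students BY age >= 18;

-- DUMP prints the contents of the relation to the terminal
DUMP adults;

-- STORE writes the relation to another location (hypothetical output path)
STORE adults INTO 'output/adults' USING PigStorage(',');
```

Saved as, say, students.pig, the script could be tried in local mode with pig -x local students.pig. Since relations are not persistent across sessions, STORE is the step that keeps the result.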
Note
If no column names are specified when loading data into a relation, the columns can be referenced by position, starting from $0: $0, $1, $2, …
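As a small illustration of positional references, the sketch below loads a file without declaring a schema and then picks fields by position. The file name and its comma-separated layout are assumptions, not part of the original slides.

```pig
-- Hypothetical input file, loaded without an AS clause, so the fields have no names
raw = LOAD 'students.txt' USING PigStorage(',');

-- $0 is the first field and $1 the second; here they are given names on the way out
name_age = FOREACH raw GENERATE $0 AS name, $1 AS age;

DUMP name_age;
```

Positional references also work when a schema is declared, but named fields are usually easier to read in longer scripts.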