Why Parquet
- Columnar storage format
- Can store nested data
- Efficient in file size and query performance
- A nested field can be read independently of other fields
- Many data processing frameworks understand the Parquet format (Hive, Spark, Pig, MapReduce, etc.)
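To make the columnar idea concrete, here is a minimal sketch in plain Python. It is a toy model only: the sample records and the helper `to_columns` are made up for illustration, and real Parquet adds encodings, compression, and pages on top of this layout.

```python
# Toy illustration only: this is not how Parquet encodes bytes on disk.
rows = [
    {"name": "alice", "age": 30, "city": "Paris"},
    {"name": "bob", "age": 25, "city": "Lyon"},
    {"name": "carol", "age": 41, "city": "Nice"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs only "age" reads a single list and never
# touches the values stored for "name" or "city".
average_age = sum(columns["age"]) / len(columns["age"])
print(average_age)
```

Keeping each field's values together is also what tends to help file size: values of the same type and similar range sit next to each other and compress well.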
Data Model
Primitive types: boolean, int32, int64, int96, float, double, binary, fixed_len_byte_array
Logical types: UTF8, ENUM, DECIMAL(precision, scale), DATE, LIST, MAP
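The annotated types in this list are stored on top of the primitive ones. As a rough sketch of the idea, the Python below shows how a DECIMAL(precision, scale) value can be kept as an unscaled integer and a DATE as a count of days since 1970-01-01, which is how the Parquet format specification describes them. The function names are hypothetical helpers and not part of any Parquet library.

```python
from datetime import date
from decimal import Decimal

EPOCH = date(1970, 1, 1)


def decimal_to_unscaled(value: Decimal, scale: int) -> int:
    """Return the integer that represents value at the given scale."""
    shifted = value.scaleb(scale)  # same as multiplying by 10**scale
    if shifted != shifted.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(shifted)


def unscaled_to_decimal(unscaled: int, scale: int) -> Decimal:
    """Inverse of decimal_to_unscaled."""
    return Decimal(unscaled).scaleb(-scale)


def date_to_days(day: date) -> int:
    """Return the number of days between day and the Unix epoch."""
    return (day - EPOCH).days


price = decimal_to_unscaled(Decimal("12.34"), scale=2)
days = date_to_days(date(1970, 1, 31))
```

With scale 2, the value 12.34 is stored as the integer 1234, which fits comfortably in an int32 column.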
Record
A Parquet record is described by a message:
message Name { (required | optional | repeated) type name; }
Example fields: required int32 age; required binary nom (UTF8);
Avro field definition, for comparison: {"name": "right", "type": "string", "order": "descending"}
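The message syntax from this slide can be spelled out in full with a short sketch. The helper `render_message` and the message name `Person` are hypothetical and chosen only for illustration. The two fields are the ones shown on the slide.

```python
REPETITIONS = {"required", "optional", "repeated"}


def render_message(name, fields):
    """Format (repetition, type, field name, annotation) tuples as a schema."""
    lines = [f"message {name} {{"]
    for repetition, type_name, field_name, annotation in fields:
        if repetition not in REPETITIONS:
            raise ValueError(f"unknown repetition: {repetition}")
        suffix = f" ({annotation})" if annotation else ""
        lines.append(f"  {repetition} {type_name} {field_name}{suffix};")
    lines.append("}")
    return "\n".join(lines)


schema_text = render_message(
    "Person",  # hypothetical message name
    [
        ("required", "int32", "age", None),
        ("required", "binary", "nom", "UTF8"),
    ],
)
print(schema_text)
```

Each field line carries a repetition, a primitive type, a name, and optionally a logical type annotation in parentheses.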
Parquet File Format
File layout: Header | Block | .. | Block | Footer
Header: magic number
Footer: magic number, schema, encoding method, block position, ..
Block (128MB): columns, each split into pages
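A reader can locate the footer by working backwards from the end of the file. The sketch below assumes the layout given in the Parquet format specification: the 4-byte magic number `PAR1` at both ends, and a 4-byte little-endian footer length just before the trailing magic. The footer length field is not shown on the slide. The byte strings are placeholders, not real encoded data, and both function names are hypothetical.

```python
import struct

MAGIC = b"PAR1"


def build_fake_file(block_bytes: bytes, footer_bytes: bytes) -> bytes:
    """Assemble header magic, blocks, footer, footer length, trailing magic."""
    footer_len = struct.pack("<I", len(footer_bytes))
    return MAGIC + block_bytes + footer_bytes + footer_len + MAGIC


def split_file(data: bytes):
    """Return (block_bytes, footer_bytes) after checking both magic numbers."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: magic number missing")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")
    return data[4:footer_start], data[footer_start:-8]


# Placeholder payloads standing in for real blocks and real metadata.
fake = build_fake_file(b"<blocks>", b"<schema+positions>")
blocks, footer = split_file(fake)
```

Because the schema and block positions live in the footer, a writer can stream blocks out first and record where they ended up once all the data has been written.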
Configuration
parquet.block.size (int)
parquet.page.size (int)
parquet.dictionary.page.size (int)
parquet.enable.dictionary (boolean, true)
parquet.compression (String: SNAPPY, UNCOMPRESSED, LZO...)
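These properties are usually supplied through the Hadoop job configuration. As a hedged sketch, the Python below renders them as a Hadoop-style `<configuration>` XML fragment. The 128 MB block size comes from the file format slide. The 1 MB page sizes and the SNAPPY codec are illustrative assumptions, not values given in the deck, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

MB = 1024 * 1024

settings = {
    "parquet.block.size": 128 * MB,          # block size from the slides
    "parquet.page.size": 1 * MB,             # assumed value, for illustration
    "parquet.dictionary.page.size": 1 * MB,  # assumed value, for illustration
    "parquet.enable.dictionary": True,
    "parquet.compression": "SNAPPY",         # one of the listed codecs
}


def to_hadoop_value(value):
    """Hadoop expects lowercase true/false and plain decimal integers."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_hadoop_xml(props):
    """Render a dict of properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = to_hadoop_value(value)
    return ET.tostring(root, encoding="unicode")


xml_text = to_hadoop_xml(settings)
```

Sizes are given in bytes, so the 128 MB block size is written as 134217728.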