Apache Parquet

megrhihaikel 560 views 11 slides Jun 03, 2017


About This Presentation

https://www.youtube.com/watch?v=1Kur7SitXRA&index=5&list=PLAV_dWz2GNAiDnEVI9ynfVr3WYt3moS3c

Size: 261.29 KB

Language: en

Added: Jun 03, 2017

Slides: 11 pages

Slide Content

Apache Parquet https://parquet.apache.org/

Why Parquet
- Columnar storage format
- Can store nested data
- Efficient in file size and query performance
- A nested field can be read independently of other fields
- Many data processing frameworks understand the Parquet format (Hive, Spark, Pig, MapReduce, etc.)

Data Model
- Primitive types: boolean, int32, int64, int96, float, double, binary, fixed_len_byte_array
- Logical types: UTF8, ENUM, DECIMAL(precision, scale), DATE, LIST, MAP
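To illustrate how the annotated types in this list relate to the primitive ones, here is a minimal sketch of a lookup table. The names `LOGICAL_TO_PHYSICAL`, `PHYSICAL_TYPES`, and `physical_types_for` are hypothetical and not part of any Parquet library; the mapping follows the Parquet format's logical type definitions and is not exhaustive.

```python
# Hypothetical lookup: logical type -> primitive (physical) types it may annotate.
LOGICAL_TO_PHYSICAL = {
    "UTF8": ["binary"],
    "ENUM": ["binary"],
    "DECIMAL": ["int32", "int64", "fixed_len_byte_array", "binary"],
    "DATE": ["int32"],   # stored as days since the Unix epoch
    "LIST": ["group"],   # nested structure rather than a primitive
    "MAP": ["group"],    # nested structure rather than a primitive
}

PHYSICAL_TYPES = {
    "boolean", "int32", "int64", "int96",
    "float", "double", "binary", "fixed_len_byte_array",
}


def physical_types_for(type_name):
    """Return the primitive types that can back the given type name."""
    if type_name in PHYSICAL_TYPES:
        return [type_name]
    return LOGICAL_TO_PHYSICAL[type_name.upper()]


print(physical_types_for("DATE"))
print(physical_types_for("DECIMAL"))
```

A string column, for example, is a `binary` column carrying the UTF8 annotation, which is why the record example on the next slide writes `binary nom (UTF8)`.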

Record

Parquet field syntax: (required | optional | repeated) type name;

Parquet:
    message Nom {
      required int32 age;
      required binary nom (UTF8);
    }

Avro:
    {"name": "right", "type": "string", "order": "descending"}
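The Parquet message syntax on this slide can be sketched as a small text renderer. This is only an illustration of the schema notation: `render_message` is a hypothetical helper, not a Parquet API, and it does no validation beyond the repetition keyword.

```python
def render_message(name, fields):
    """Render a Parquet-style message schema as text.

    fields: iterable of (repetition, type_name, field_name, annotation),
    where annotation is a logical type such as "UTF8" or None.
    """
    allowed = {"required", "optional", "repeated"}
    lines = ["message %s {" % name]
    for repetition, type_name, field_name, annotation in fields:
        if repetition not in allowed:
            raise ValueError("unknown repetition: %r" % repetition)
        suffix = " (%s)" % annotation if annotation else ""
        lines.append("  %s %s %s%s;" % (repetition, type_name, field_name, suffix))
    lines.append("}")
    return "\n".join(lines)


# The two fields shown on the slide.
schema = render_message("Nom", [
    ("required", "int32", "age", None),
    ("required", "binary", "nom", "UTF8"),
])
print(schema)
```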

Parquet File Format
- Header: magic number (identifies the file as Parquet)
- Block .. Block (128MB): each block holds columns, and each column holds pages
- Footer: magic number, schema, encoding method, block position, ..
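The header and footer markers can be checked with a few lines of code. The sketch below assumes the layout defined by the Parquet format: the 4-byte magic `PAR1` at the start and end of the file, with a 4-byte little-endian footer length just before the trailing magic. The function name `footer_length` is hypothetical, and the sample bytes are a synthetic stand-in that mimics only this outer layout, not a readable Parquet file.

```python
import struct

MAGIC = b"PAR1"


def footer_length(data):
    """Return the footer metadata length recorded in Parquet file bytes.

    Raises ValueError when the magic numbers or the length look wrong.
    """
    # Smallest possible layout: magic + footer length + magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic number")
    # The 4 bytes before the trailing magic hold the footer length.
    (length,) = struct.unpack("<I", data[-8:-4])
    if length > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return length


# Synthetic stand-in: header magic, fake blocks, fake footer, length, magic.
fake_footer = b"\x00" * 10
fake_file = (
    MAGIC + b"column-data" + fake_footer
    + struct.pack("<I", len(fake_footer)) + MAGIC
)
print(footer_length(fake_file))
```

Because the footer sits at the end, a reader can seek to the last 8 bytes, learn the footer size, and load the schema and block positions before touching any column data.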

Encoding & Compression
- Run-length encoding: True, True, True, False is stored as (3 true, 1 false)
- Dictionary encoding (indexing)
- Plain encoding
- Compression algorithms: Snappy, gzip, ...
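The two encodings named on this slide can be sketched in a few lines. This is a simplified illustration of the ideas only; Parquet's actual on-disk encodings (for instance its hybrid of run-length encoding and bit-packing) are more involved. The function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def dictionary_encode(values):
    """Replace each value by its index in a dictionary of distinct values."""
    dictionary = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


# The slide's example: three True values followed by one False.
runs = run_length_encode([True, True, True, False])
print(runs)  # [(True, 3), (False, 1)]

dictionary, indices = dictionary_encode(["gzip", "snappy", "gzip", "gzip"])
print(dictionary, indices)  # ['gzip', 'snappy'] [0, 1, 0, 0]
```

Both techniques pay off in a columnar layout because values of one column sit next to each other, so repeats and small sets of distinct values are common.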

Configuration
- parquet.block.size (int)
- parquet.page.size (int)
- parquet.dictionary.page.size (int)
- parquet.enable.dictionary (boolean, default true)
- parquet.compression (String: SNAPPY, UNCOMPRESSED, LZO, ...)
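As a rough sketch of how these properties might be collected and passed to a Hadoop-style job, the snippet below renders them as `-Dkey=value` arguments. The values are illustrative assumptions (128MB matches the block size shown on the file format slide; the page sizes are example values), and `to_hadoop_args` is a hypothetical helper. Whether a given tool picks up such arguments depends on how the job is launched.

```python
# Illustrative values only; tune them for the actual workload.
settings = {
    "parquet.block.size": 128 * 1024 * 1024,       # bytes
    "parquet.page.size": 1024 * 1024,              # bytes
    "parquet.dictionary.page.size": 1024 * 1024,   # bytes
    "parquet.enable.dictionary": True,
    "parquet.compression": "SNAPPY",
}


def to_hadoop_args(settings):
    """Render settings as -Dkey=value strings, with lowercase booleans."""
    args = []
    for key, value in settings.items():
        # bool is checked first because it is a subclass of int in Python.
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        args.append("-D%s=%s" % (key, text))
    return args


for arg in to_hadoop_args(settings):
    print(arg)
```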
