Praktisi Mengajar - Workflow Analysis.pptx

SoniAdiyatma1 13 views 21 slides May 09, 2024
Slide 1
Slide 1 of 21
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21

About This Presentation

Materi Praktisi Mengajar


Slide Content

Praktisi Mengajar: Workflow Analysis Study: Basic Database

Workflow Analysis Business Understanding

Business Understanding Business Understanding Why Needs Business Understanding? GOALS Memilih variabel kunci yang menjadi target model yang kita bangun serta metrik-metriknya adalah indikator kesuksesan proyek kita Mengidentifikasi sumber data relevan yang aksesnya tersedia / perlu didapatkan aksesnya

Intro Step Suggest Recommendation Rekomendasi yang diberikan berdasarkan hasil analisis Get Insight Mengambil kesimpulan dari informasi yang didapatkan Define Business Problem Memahami permasalahan yang terjadi Problem Discovery Memilih metrics yang dapat menyelesaikan permasalahan

Review Case Brazilian E-Commerce Olist Business Problem : Bagaimana kondisi e-commerce olist? Problem Discovery : GMV: Gross Merchant Value Total Order Total Buyers Retention Rate, etc Insight? Recommendation?

Review Case Design Data Mart Table Order order_id order_date customer_id order_status Table Order Items order_id product_id price freight_value Table Customer customer_id customer_city Metrics Transaction order_id total_product total_price t otal_logistic _cost Transform Data Mart Transaction order_id order_date customer_id order_status customer_city total_product total_price total_logistic_cost Join Join Why Data Mart? Gather Information Efficient Support to explore data

Review Case Query Data Mart WITH metrics_transaction AS ( SELECT order_id, COUNT ( DISTINCT product_id) total_product, SUM (price) total_product_price, SUM (freight_value) total_logistic_cost FROM ` database - 385606. Latihan1.order_items` -- custom with your dataset GROUP BY 1 ) SELECT a. * , c.customer_city, c.customer_unique_id, COALESCE (b.total_product, ) total_product, COALESCE (b.total_product_price, ) total_product_price, COALESCE (total_logistic_cost, ) total_logistic_cost FROM ` database - 385606. Latihan1. order ` a -- custom with your dataset LEFT JOIN metrics_transaction b ON a.order_id = b.order_id LEFT JOIN ( SELECT DISTINCT customer_id, customer_unique_id customer_city FROM ` database - 385606. Latihan1.cust`) c -- custom with your dataset ON a.customer_id = c.customer_id CTE Sub Query

Review Case Create Data Mart View Table : Table yang dibentuk menggunakan query dan membutuhkan waktu dalam running query untuk menampilkan result Fact Table: Table yang dibentuk menggunakan query dan hasil akan ditampilkan berupa tabel fisik

Review Case View Table Click Save > Save view Setup Project and Dataset then fill name of table Save 1 2 3

Review Case Fact Table Add query with: create or replace table loyal-surfer- 321414 .Testing.data_mart as Run

Review Case Hashing P roses menghasilkan fixed-size output , dari variable-sized input yang dilakukan melalui penggunaan rumus matematika yang dikenal sebagai hash function. Setiap aset kripto menggunakan berbagai algoritma hashing yang berbeda untuk membuat berbagai jenis kode hash – algoritma ini bertugas untuk menghasilkan alfanumerik acak Contoh: reference

Review Case Step 1 - Check Definition (Not Yet Metadata) SELECT order_status, MAX ( order_purchase_timestamp ) purchase_date, MAX ( order_approved_at ) approve_date, MAX ( order_delivered_carrier_date ) pickup_date_by_courir, MAX ( order_delivered_customer_date ) delivered_date FROM `loyal-surfer-321414.Testing.data_mart_purchase` -- custom with your dataset GROUP BY 1 Metadata: data that provides information about other data but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification.

Review Case Step 2 - Check Basic Statistics SELECT min ( partition_columns ) min_date , max ( partition_columns ) max_date , count ( distinct order_id ) total_order , count ( distinct customer_unique_id ) total_buyers , round ( avg ( total_product ) , 2 ) avg_products , round ( avg ( total_product_price ) , 2 ) avg_gmv , approx_quantiles ( total_product_price, 100 )[ offset ( 50 )] median_gmv , STDDEV ( total_product_price ) std_gmv , round ( avg ( total_logistic_cost ) , 2 ) avg_logistic_cost , approx_quantiles ( total_logistic_cost, 100 )[ offset ( 50 )] median_logistic_cost , STDDEV ( total_logistic_cost ) std_logistic_cost FROM `loyal-surfer-321414.Testing.data_mart_purchase` where order_status = 'delivered' statistics values min_date 2016-09-15 max_date 2018-08-29 total_order 96478 total_buyers 93358 avg_products 1.04 avg_gmv 137.04 median_gmv 86.57 std_gmv 209.05 avg_logistic_cost 22.79 median_logistic_cost 17.17 std_logistic_cost 21.56

Review Case Step 2 - Check Basic Statistics SELECT FORMAT_DATETIME ( "%Y-%m-01" , partition_columns ) month_key , sum ( total_product_price ) total_gmv FROM `loyal-surfer-321414.Testing.data_mart_purchase` where order_status = 'delivered' group by 1 order by 1 month_key total_gmv 2016-09-01 135 2016-10-01 40325 2016-12-01 11 2017-01-01 111798 2017-02-01 234223 2017-03-01 359199 2017-04-01 340670 2017-05-01 489338 2017-06-01 421923 2017-07-01 481605 2017-08-01 554700 2017-09-01 607400 2017-10-01 648248 2017-11-01 987765 2017-12-01 726033

Review Case Hands On Step 3 - Deep Dive with our assumptions What Should we do?

Review Case Hands On Step 3 - Deep Dive with our assumptions WITH summary_boxplot AS ( SELECT APPROX_QUANTILES ( total_product_price, 4 )[ OFFSET ( 2 )] q2_gmv, APPROX_QUANTILES ( total_product_price, 4 )[ OFFSET ( 3 )] - APPROX_QUANTILES ( total_product_price, 4 )[ OFFSET ( 1 )] iqr FROM `loyal-surfer-321414.Testing.data_mart_purchase_remake` WHERE order_status = 'delivered' ) , summary_batas AS ( SELECT q2_gmv + 1.5 * iqr batas_atas, q2_gmv - 1.5 * iqr batas_bawah FROM summary_boxplot ) , fact_tagging_anomaly AS ( SELECT * , CASE WHEN total_product_price <= ( SELECT batas_atas FROM summary_batas ) AND total_product_price >= ( SELECT batas_bawah FROM summary_batas ) THEN else 1 END outlier_tagging FROM `loyal-surfer-321414.Testing.data_mart_purchase_remake` ) SELECT outlier_tagging, COUNT ( DISTINCT order_id ) total_orders FROM fact_tagging_anomaly WHERE order_status = 'delivered' GROUP BY 1

Review Case Hands On Step 3 - Deep Dive with our assumptions

Review Case Hands On Step 3 - Bonus Retention Rate with Cohort Analysis Query Cohort Analysis reference

Review Case Hands On Step 4 - Summary … … … …

Review Case Hands On Coba Lengkapi slide 17 untuk bikin 3 kelompok berdasarkan pendekatan boxplot (didalam whisker pembelanjaan normal, pembelanjaan outlier dan pembelanjaan extreme) Bikin summary estimasi interval untuk rata-rata gmv dan kasih tagging mana transaksi yang diluar LCL dan UCL Bandingkan taggingan menggunakan boxplot dan pendekatan distribusi normal, simpulkan hasilnya Lengkapi slide 20 untuk summary yang bisa diambil dari data olist

Praktisi Mengajar: Workflow Analysis Study: Basic Database
Tags