Business Understanding Business Understanding Why Needs Business Understanding? GOALS Memilih variabel kunci yang menjadi target model yang kita bangun serta metrik-metriknya adalah indikator kesuksesan proyek kita Mengidentifikasi sumber data relevan yang aksesnya tersedia / perlu didapatkan aksesnya
Intro Step Suggest Recommendation Rekomendasi yang diberikan berdasarkan hasil analisis Get Insight Mengambil kesimpulan dari informasi yang didapatkan Define Business Problem Memahami permasalahan yang terjadi Problem Discovery Memilih metrics yang dapat menyelesaikan permasalahan
Review Case Brazilian E-Commerce Olist Business Problem : Bagaimana kondisi e-commerce olist? Problem Discovery : GMV: Gross Merchant Value Total Order Total Buyers Retention Rate, etc Insight? Recommendation?
Review Case Design Data Mart Table Order order_id order_date customer_id order_status Table Order Items order_id product_id price freight_value Table Customer customer_id customer_city Metrics Transaction order_id total_product total_price t otal_logistic _cost Transform Data Mart Transaction order_id order_date customer_id order_status customer_city total_product total_price total_logistic_cost Join Join Why Data Mart? Gather Information Efficient Support to explore data
Review Case Query Data Mart WITH metrics_transaction AS ( SELECT order_id, COUNT ( DISTINCT product_id) total_product, SUM (price) total_product_price, SUM (freight_value) total_logistic_cost FROM ` database - 385606. Latihan1.order_items` -- custom with your dataset GROUP BY 1 ) SELECT a. * , c.customer_city, c.customer_unique_id, COALESCE (b.total_product, ) total_product, COALESCE (b.total_product_price, ) total_product_price, COALESCE (total_logistic_cost, ) total_logistic_cost FROM ` database - 385606. Latihan1. order ` a -- custom with your dataset LEFT JOIN metrics_transaction b ON a.order_id = b.order_id LEFT JOIN ( SELECT DISTINCT customer_id, customer_unique_id customer_city FROM ` database - 385606. Latihan1.cust`) c -- custom with your dataset ON a.customer_id = c.customer_id CTE Sub Query
Review Case Create Data Mart View Table : Table yang dibentuk menggunakan query dan membutuhkan waktu dalam running query untuk menampilkan result Fact Table: Table yang dibentuk menggunakan query dan hasil akan ditampilkan berupa tabel fisik
Review Case View Table Click Save > Save view Setup Project and Dataset then fill name of table Save 1 2 3
Review Case Fact Table Add query with: create or replace table loyal-surfer- 321414 .Testing.data_mart as Run
Review Case Hashing P roses menghasilkan fixed-size output , dari variable-sized input yang dilakukan melalui penggunaan rumus matematika yang dikenal sebagai hash function. Setiap aset kripto menggunakan berbagai algoritma hashing yang berbeda untuk membuat berbagai jenis kode hash – algoritma ini bertugas untuk menghasilkan alfanumerik acak Contoh: reference
Review Case Step 1 - Check Definition (Not Yet Metadata) SELECT order_status, MAX ( order_purchase_timestamp ) purchase_date, MAX ( order_approved_at ) approve_date, MAX ( order_delivered_carrier_date ) pickup_date_by_courir, MAX ( order_delivered_customer_date ) delivered_date FROM `loyal-surfer-321414.Testing.data_mart_purchase` -- custom with your dataset GROUP BY 1 Metadata: data that provides information about other data but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification.
Review Case Step 2 - Check Basic Statistics SELECT FORMAT_DATETIME ( "%Y-%m-01" , partition_columns ) month_key , sum ( total_product_price ) total_gmv FROM `loyal-surfer-321414.Testing.data_mart_purchase` where order_status = 'delivered' group by 1 order by 1 month_key total_gmv 2016-09-01 135 2016-10-01 40325 2016-12-01 11 2017-01-01 111798 2017-02-01 234223 2017-03-01 359199 2017-04-01 340670 2017-05-01 489338 2017-06-01 421923 2017-07-01 481605 2017-08-01 554700 2017-09-01 607400 2017-10-01 648248 2017-11-01 987765 2017-12-01 726033
Review Case Hands On Step 3 - Deep Dive with our assumptions What Should we do?
Review Case Hands On Step 3 - Deep Dive with our assumptions WITH summary_boxplot AS ( SELECT APPROX_QUANTILES ( total_product_price, 4 )[ OFFSET ( 2 )] q2_gmv, APPROX_QUANTILES ( total_product_price, 4 )[ OFFSET ( 3 )] - APPROX_QUANTILES ( total_product_price, 4 )[ OFFSET ( 1 )] iqr FROM `loyal-surfer-321414.Testing.data_mart_purchase_remake` WHERE order_status = 'delivered' ) , summary_batas AS ( SELECT q2_gmv + 1.5 * iqr batas_atas, q2_gmv - 1.5 * iqr batas_bawah FROM summary_boxplot ) , fact_tagging_anomaly AS ( SELECT * , CASE WHEN total_product_price <= ( SELECT batas_atas FROM summary_batas ) AND total_product_price >= ( SELECT batas_bawah FROM summary_batas ) THEN else 1 END outlier_tagging FROM `loyal-surfer-321414.Testing.data_mart_purchase_remake` ) SELECT outlier_tagging, COUNT ( DISTINCT order_id ) total_orders FROM fact_tagging_anomaly WHERE order_status = 'delivered' GROUP BY 1
Review Case Hands On Step 3 - Deep Dive with our assumptions
Review Case Hands On Step 3 - Bonus Retention Rate with Cohort Analysis Query Cohort Analysis reference
Review Case Hands On Step 4 - Summary … … … …
Review Case Hands On Coba Lengkapi slide 17 untuk bikin 3 kelompok berdasarkan pendekatan boxplot (didalam whisker pembelanjaan normal, pembelanjaan outlier dan pembelanjaan extreme) Bikin summary estimasi interval untuk rata-rata gmv dan kasih tagging mana transaksi yang diluar LCL dan UCL Bandingkan taggingan menggunakan boxplot dan pendekatan distribusi normal, simpulkan hasilnya Lengkapi slide 20 untuk summary yang bisa diambil dari data olist