4. Tools Proyek Data Science DTS-TA v.3.pptx

irvaimuhammad, 178 slides, Jul 03, 2024

About This Presentation

In this module, we get acquainted with several tools used by data scientists.
The tools used by any data scientist, just as for programmers,
are an essential element for success and improved performance. Most of the effort
in a data science project is spent on p...


Slide Content

Thematic Academy Data Scientist: Artificial Intelligence for Lecturers and Instructors, Session #4: Data Science Project Tools

Training Description: The main goal of this training module is to discuss data science tools by explaining a set of tools and techniques related to basic skills in computer science, mathematics, and statistics for carrying out tasks commonly associated with data science.

Learning Outcomes: In this topic, we will learn: the Python programming language; development environments; the basics of Python libraries for data science projects: NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-learn.

Python: a high-level programming language. Its code/syntax is simpler to write, and many libraries are available. It is open-source and cross-platform. First released by Guido van Rossum in 1991. Used by data professionals: Data Analyst, Data Engineer, Data Scientist, Business Intelligence, ML Engineer. Suitable for beginners; simple but powerful; a high-demand skill.

Why Python? Freely usable, including for commercial purposes. Image source: edureka.co

Python is popular

Python Ranks First in the List of Most In-Demand Skills (source: https://towardsdatascience.com)

Ways to run Python: web apps and APIs; standalone executables (py2exe, pyinstaller, cx_freeze); the interpreter; calls from within other programming languages (PL).

Python is easy to use (more concise) compared with the C language: #include <stdio.h> int main() { printf("Hello World!"); return 0; }

Why Python? Digital talent (see the Society 5.0 document: https://www.japan.go.jp/abenomics/_userdata/abenomics/pdf/society_5.0.pdf), data professionals, and Python as a high-demand skill.

Python Is Used at YouTube: “Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube's APIs is primarily written in Python, and it serves millions of requests per second!” (Dylan Trotter, YouTube Engineer, 2017) https://opensource.googleblog.com/2017/01/grumpy-go-running-python.html

Python Is Used at Quora: “We decided that Python was fast enough for most of what we need to do (since we push our performance-critical code to backend servers written in C++ whenever possible). As far as typechecking, we ended up writing very thorough unit tests which are worth writing anyway, and achieve most of the same goals.” (Adam D'Angelo, CEO Quora, 2014) https://www.quora.com/Why-did-Quora-choose-Python-for-its-development

Python Is Used in Several Industries

Applying Python in a Data Science Project. Stages: Data Exploration, Data Pre-Processing, Data Cleansing, Data Modeling. Typical tasks: scraping, crawling, and data mining; coding and queries; feature selection, descriptive statistics, class balancing, and data visualization; feature transformation (categorical encoding, binning); handling missing values and removing duplicate rows; data formatting and handling outliers; training on the data with machine learning algorithms; performing classification, regression, prediction, and clustering.

Getting Started with Python: Python is an interpreted language. Because no compilation step is needed, the edit-test-debug cycle can be shorter. To run Python, you need a runtime/interpreter environment to execute the code. Interactive mode: each command you type is interpreted and executed immediately, so you can see the result right away (e.g., IPython). Script mode: you put a set of Python code into a .py file, and the program is run line by line: .py file → Python interpreter → output.

IPython Concept: the REPL Environment. Read: the process of reading the code. Eval: the process of evaluating (executing) the code. Print: the process of displaying the result (output). Loop: repeating the Read-Eval-Print process.
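
The following toy REPL makes the four steps concrete. It is a minimal sketch for illustration only, since IPython itself is far more capable. The helper `toy_repl` is hypothetical and does not read from the keyboard. Instead, it reads from a list of code strings, evaluates each one with Python's built-in `eval()`, prints the result, and loops to the next line.

```python
def toy_repl(lines):
    """Hypothetical helper: a toy Read-Eval-Print-Loop over a list of code strings."""
    outputs = []
    namespace = {}                              # variables survive between iterations
    for line in lines:                          # Loop: repeat for every input line
        source = line.strip()                   # Read: take the next piece of code
        try:
            result = eval(source, namespace)    # Eval: evaluate it as an expression
        except SyntaxError:
            exec(source, namespace)             # statements such as "x = 3" are executed instead
            continue
        print(result)                           # Print: show the result
        outputs.append(result)
    return outputs

# Note: never eval()/exec() untrusted input; this is only a teaching sketch.
results = toy_repl(["x = 3", "x * 2", "x + 10"])
```

Running it prints `6` and then `13`. The assignment `x = 3` produces no output, just as in an interactive session.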

Development Environment Options: choose the development environment that is easiest and most comfortable for you. Anaconda Distribution (https://www.anaconda.com/distribution/): Python, Conda, and more than 1000 data science libraries. Miniconda (https://docs.conda.io/en/latest/miniconda.html): the Python interpreter and Conda. Jupyter Notebook (https://jupyter.org/). Python installer (https://www.python.org/downloads/). Google Colaboratory (https://colab.research.google.com/). Notebooks Azure (https://notebooks.azure.com/).

Anaconda Distribution: Anaconda Navigator is a dashboard application interface included in the Anaconda Distribution package.

Jupyter Notebook: a web-based interactive programming environment that supports many programming languages, including Python. It is widely used by researchers and academics for mathematical modeling, machine learning, statistical analysis, and teaching programming.

Jupyter Notebook: content can be written as Code (algorithms and mathematical formulas), Markdown/Heading (descriptive text and code explanations), or Raw NBConvert (content for conversion into different formats). Results appear immediately after you execute the Run command.

Google Colaboratory: content can be written as Code (algorithms and mathematical formulas) or Text (descriptive text and code explanations). It can be used at https://colab.research.google.com/, and results appear immediately after you execute the Run command.

Working with Git: Git is an open-source tool that makes it easier to work on projects of any size, small or large (https://git-scm.com/). Git has three main states that files can be in: modified, staged, and committed. Modified means you have changed the file but have not yet committed it to your database. Staged means you have marked a modified file in its current version to go into your next commit. Committed means the data is safely stored in your local database.

Working with Git: Initialize: git init. Stage: git add <file> (or git add . for all changes). Commit: git commit -m "first commit". Rename branch: git branch -M main. Add remote: git remote add origin https://github.com/[user]/[repo].git. Push: git push -u origin main. Pull: git pull origin [branch].

Hello World! C language: #include <stdio.h> int main() { printf("Hello World!"); return 0; } Python: print("Hello World!"). Python is simpler: no curly braces {..} and no semicolons (;).

Software Development: Kamus (declarations): all the variables and data structures used in the program. Algorithm: a series of instructions to achieve the program's goal.

Python Data Types: float: real number; int: integer; str: string, text; bool: True or False. In [1]: height = 1.84 In [2]: tall = True. Problem: storing many input values of the same data type in separate variables is inconvenient: In [3]: height1 = 1.84 In [4]: height2 = 1.79 In [5]: height3 = 1.82 In [6]: height4 = 1.90. Solution: Python List

Python List [a, b, c]: a collection of values that can contain several different data types. In [7]: [1.84, 1.79, 1.82, 1.90, 1.80] Out[7]: [1.84, 1.79, 1.82, 1.9, 1.8] In [8]: height = [1.84, 1.79, 1.82, 1.90, 1.80] In [9]: height Out[9]: [1.84, 1.79, 1.82, 1.9, 1.8] In [10]: famz = ["Abe", 1.84, "Beb", 1.79, "Cory", 1.82, "Dad", 1.90] In [11]: famz Out[11]: ['Abe', 1.84, 'Beb', 1.79, 'Cory', 1.82, 'Dad', 1.9]. Name-height pairs: ["Abe", 1.84], ["Beb", 1.79], ["Cory", 1.82], ["Dad", 1.90]
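
The slide ends by grouping the flat `famz` list into name-height pairs. Below is a minimal sketch of that grouping as a list of lists. `famz` comes from the slide, and `pairs` is a hypothetical name chosen for illustration.

```python
famz = ["Abe", 1.84, "Beb", 1.79, "Cory", 1.82, "Dad", 1.90]

# One list can hold different data types: str and float here
print(type(famz[0]), type(famz[1]))

# Step through the flat list two items at a time to build [name, height] pairs
pairs = [[famz[i], famz[i + 1]] for i in range(0, len(famz), 2)]
print(pairs)
```

The last line prints `[['Abe', 1.84], ['Beb', 1.79], ['Cory', 1.82], ['Dad', 1.9]]`. Python displays `1.90` as `1.9` because both are the same float.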

First Python program: the first thing to note is that a Python code line beginning with a hash symbol (#) is treated as a comment and is not executed.

Syntax error message: a mistake in writing a Python command yields an error message when the code is executed.

Semantic error: if there is a mistake in the program logic, Python will not report it as long as no syntax error is detected. For example, suppose the intention is to print 'Hello Python 101', but we accidentally write 'Hello Python 102'. Python won't complain, even though the code is wrong.
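
The sketch below contrasts the two kinds of error. It uses the built-in `compile()` to check a deliberately broken line without running it. The variable names are illustrative only.

```python
# Syntax error: Python refuses code that is not written correctly.
# compile() checks the code without executing it.
bad_code = 'print("Hello Python 101"'        # the closing parenthesis is missing
try:
    compile(bad_code, "<example>", "exec")
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    print("SyntaxError:", err.msg)

# Semantic error: the code is valid, so Python runs it without complaint,
# but it does not do what we intended.
intended = "Hello Python 101"
message = "Hello Python 102"                  # a logic mistake, not a syntax mistake
print(message)                                # runs fine, prints the wrong text
```

The exact wording of `err.msg` differs between Python versions, so only the exception type should be relied on.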

Outline Python: Overview Types, Expressions, Variables String operations Lists, Tuples Sets Dictionaries Conditions and Branching

Variables: a variable stores a value, and the value in a variable has a type. A value is put into a variable by assignment with the "=" sign. The variable is written to the left of "=", and the value to the right of "=". The value can be the result of a mathematical expression, a constant value, or the result of a function.

Python types (type: example expressions): int: 11, -14, 0, 2; float: 21.3201, 0.0, 0.8, -2.34; str: "Hello Python 101"; bool: True, False. Use the type() function to get the type of an expression, whether it is a variable or a constant value.
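
Here is a minimal sketch of the assignment rules and types described above. The value on the right of `=` is a constant, the result of an expression, or the result of a function. The variable names are illustrative.

```python
import math

count = 11                    # constant value -> int
height = 1.84                 # constant value -> float
name = "Hello Python 101"     # constant value -> str
tall = height > 1.80          # result of an expression -> bool
root = math.sqrt(16)          # result of a function call -> float

for value in (count, height, name, tall, root):
    print(value, type(value))
```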

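Expressions: note that / and // are different. / is true division and always returns a float, while // is floor division, which rounds the result down.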
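
This short sketch shows the difference between the two division operators, with the remainder operator `%` for comparison.

```python
true_div = 7 / 2     # 3.5: true division always returns a float
floor_div = 7 // 2   # 3: floor division rounds down
neg_floor = -7 // 2  # -4: "down" means toward negative infinity, not toward zero
remainder = 7 % 2    # 1: the remainder (modulo)
exact = 6 / 3        # 2.0: still a float even when the division is exact
print(true_div, floor_div, neg_floor, remainder, exact)
```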


Expressions with arithmetic operators and variables: mathematical expressions

Expressions with Assignment Operators and Variables

Other mathematical operations
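
The slides' own examples for these three topics were images and are not in this text. The following is a minimal sketch of the usual Python operators under each heading. The values are illustrative.

```python
x = 10
y = 3

# Arithmetic operators in an expression
total = x + y * 2        # multiplication before addition: 10 + 6 = 16
power = x ** 2           # exponentiation: 100

# Assignment operators update a variable using its current value
x += 5                   # same as x = x + 5  -> 15
x -= 3                   # same as x = x - 3  -> 12
x *= 2                   # same as x = x * 2  -> 24

# Other mathematical operations (built-in functions)
print(abs(-4.5))         # absolute value: 4.5
print(round(3.14159, 2)) # rounding to 2 decimal places: 3.14
print(divmod(17, 5))     # quotient and remainder together: (3, 2)
print(total, power, x)   # 16 100 24
```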

print(____): assignment, variable, value

Error: error location, error type, comment

input

Outline Python: Overview Types, Expressions, Variables String operations Lists, Tuples Sets Dictionaries Conditions and Branching

String slicing: you can get a substring by slicing. s[m:n] is the substring of s taken from the character at index m up to the character at index n-1. If n ≤ m (for non-negative indices), the result of slicing is an empty string. Example string "Semarang Tawang": positive indices run from 0 to 14 (left to right), and negative indices run from -15 to -1.
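
A quick sketch of slicing on the example string "Semarang Tawang", with positive indices, negative indices, and the empty result when n ≤ m.

```python
s = "Semarang Tawang"

print(len(s))      # 15 characters, so indices run from 0 to 14
print(s[0:8])      # 'Semarang'  (indices 0..7)
print(s[9:15])     # 'Tawang'    (indices 9..14)
print(s[-6:])      # 'Tawang'    (negative indices count from the end)
print(s[:3])       # 'Sem'       (an omitted start means 0)
print(s[5:3])      # ''          (n <= m gives an empty string)
```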

String concatenation We use “+” to concatenate strings

String replication Multiply a string with a number (using “*”) yields a new string containing the replication of the old string.

Strings are immutable! String is immutable: you cannot change the value of characters in a string. But you can reassign the variable to a new string
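
A minimal sketch of concatenation, replication, and immutability. The names are illustrative.

```python
city = "Semarang"
station = city + " " + "Tawang"      # concatenation with +
echo = "ha" * 3                      # replication with *: "hahaha"

try:
    city[0] = "s"                    # strings are immutable: item assignment fails
except TypeError as err:
    immutable_error = str(err)
    print("TypeError:", immutable_error)

city = "semarang"                    # but the variable can be reassigned to a new string
```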

String: escape sequences. Certain characters are difficult to type directly (e.g., inside the print() function), so we prefix them with a backslash "\" to use them, e.g., newline (\n), tab (\t), and the backslash itself (\\). A backslash can also be printed without escaping by using raw string notation (the r prefix).

Special characters. Error: error location, error type. Special characters and their meanings: \b ASCII Backspace (BS); \r ASCII Carriage Return (CR); \ooo character with octal value ooo; \xhh character with hex value hh.
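
A small sketch of the escape sequences listed above. It includes a raw string and the octal/hex forms, which both produce the character "A" (decimal 65).

```python
text = "Line 1\nLine 2\tTabbed"    # \n = newline, \t = tab
path = "C:\\data\\new"             # \\ = one literal backslash
raw_path = r"C:\data\new"          # raw string: backslashes are kept as-is
octal_a = "\101"                   # \ooo: octal 101 = 65 = "A"
hex_a = "\x41"                     # \xhh: hex 41 = 65 = "A"

print(text)
print(path, raw_path)              # both print C:\data\new
print(octal_a, hex_a)              # A A
```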

String functions: upper(). upper() returns an uppercase version of the string. A = "Surabaya is the city of heroes"; A.upper() gives B = "SURABAYA IS THE CITY OF HEROES".

String: replace a substring with another. replace() returns a new string with the given substring replaced by a new substring. A = "Jakarta is the largest"; replacing "largest" with "most crowded" gives B = "Jakarta is the most crowded".

String: find the occurrence of a substring. find() returns the index of the first occurrence of the given substring, or -1 if the substring does not occur. Example string "Semarang Tawang", indices 0 to 14.
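
A minimal sketch of `upper()`, `replace()`, and `find()` using the slide examples. All three methods return new values and leave the original string unchanged.

```python
a = "Surabaya is the city of heroes"
b = a.upper()                                  # uppercase copy; a itself is unchanged

c = "Jakarta is the largest"
d = c.replace("largest", "most crowded")      # new string with the substring replaced

s = "Semarang Tawang"
first = s.find("ang")      # index of the first occurrence: 5
missing = s.find("Bogor")  # -1 when the substring does not occur
print(b, d, first, missing, sep="\n")
```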

String operator

Outline Python: Overview Types, Expressions, Variables String operations Lists, Tuples Sets Dictionaries Conditions and Branching

Types for tuple, set, list, dict

List

Lists: accessing elements. Elements of lists can be accessed by index in the same way as tuples or strings. L = ['Surabaya', 3.4, 1293]: indices 0, 1, 2, or equivalently -3, -2, -1.

Lists are mutable Unlike tuples or strings, elements of a list can be changed.

Lists are mutable: if a list is referenced by two variables, changing its elements via one variable makes the change visible from the other variable. Suppose L and LL refer to the same list: we change L[1], and the change appears in LL too.

List slicing: slicing works like in tuples, but returns a new list (not sharing a reference with the old one). Lst[3] is changed, and Lst35 includes the element at Lst[3]. But Lst35 is NOT changed!

List slicing: thus slicing can be used to copy a list (a shallow copy). Lst_c is a copy of Lst. Lst[0] is changed, but Lst_c is NOT changed!
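
A sketch of aliasing versus copying by slicing. The slides do not show how `Lst35` was created. Here it is assumed to be `Lst[3:5]`, and the list contents are hypothetical.

```python
L = ["Surabaya", 3.4, 1293]
LL = L                 # LL refers to the SAME list object as L
L[1] = 2.9
print(LL)              # ['Surabaya', 2.9, 1293]: the change is visible through LL

Lst = [10, 20, 30, 40, 50, 60]   # hypothetical contents
Lst35 = Lst[3:5]       # assumption: slicing creates a NEW list [40, 50]
Lst_c = Lst[:]         # a full slice copies the whole list (shallow copy)
Lst[3] = 99
Lst[0] = -1
print(Lst35, Lst_c)    # neither copy sees the changes
```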

List: check list contents; add to a list

Add, Remove

Sort, Reverse, Max, Min

Copy List
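
The four slides above list the common list operations by name only. Below is a minimal sketch of each one, using the standard list methods and the built-ins `max()` and `min()`. The city names are illustrative.

```python
cities = ["Bogor", "Depok", "Bandung"]

print("Depok" in cities)            # check contents: True
cities.append("Jakarta")            # add one element at the end
cities.insert(0, "Bekasi")          # add an element at a given position
cities.extend(["Medan", "Ambon"])   # add several elements
cities.remove("Bandung")            # remove the first matching value
last = cities.pop()                 # remove and return the last element: "Ambon"

cities.sort()                       # sort in place (alphabetically)
cities.reverse()                    # reverse in place
print(max(cities), min(cities))     # Medan Bekasi

backup = cities.copy()              # shallow copy, equivalent to cities[:]
```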

Convert a string to a list: to convert a string (of words separated by spaces) into a list of words, we use the string method split(). If we want, we can specify which character to use as the separator/delimiter instead of spaces.
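
A quick sketch of `split()` with the default whitespace separator and with an explicit delimiter.

```python
sentence = "Jakarta Surabaya Bandung"
words = sentence.split()            # split on whitespace

csv_line = "Bogor,Depok,Bekasi"
parts = csv_line.split(",")         # split on a chosen delimiter
print(words, parts)
```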

More list methods … See https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range Or you can type: help(list)

Tuples: a tuple is an ordered sequence. Tuples are written as comma-separated items within parentheses. A tuple's length is the number of elements it contains.

Tuple access & slicing: we can access and slice tuples like strings, e.g., ('Alice', 85, 3.8, 'USA', 25) with indices 0 to 4.

Tuple nesting: tuples can be nested, i.e., a tuple may contain other tuples.

nt = (5, 7, ('Java', 'Bali'), (2, 11), ('Madura', (5, 7))). nt[2] is ('Java', 'Bali'), with nt[2][0] = 'Java' and nt[2][1] = 'Bali'. nt[3] is (2, 11), with nt[3][0] = 2 and nt[3][1] = 11. nt[4] is ('Madura', (5, 7)), with nt[4][0] = 'Madura' and nt[4][1] = (5, 7). Going one level deeper: nt[2][1][0] = 'B' and nt[2][1][1] = 'a'; nt[4][1][0] = 5 and nt[4][1][1] = 7.
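
The nested indexing above can be checked directly. The sketch also shows that slicing a tuple returns a tuple and that tuples cannot be changed.

```python
nt = (5, 7, ('Java', 'Bali'), (2, 11), ('Madura', (5, 7)))

print(len(nt))          # 5 top-level elements
print(nt[2])            # ('Java', 'Bali')
print(nt[2][1][0])      # 'B': indexing continues into the string 'Bali'
print(nt[4][1][1])      # 7
print(nt[1:3])          # slicing a tuple returns a tuple: (7, ('Java', 'Bali'))

try:
    nt[0] = 6            # tuples are immutable
except TypeError:
    print("tuples cannot be changed")
```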

Outline Python: Overview Types, Expressions, Variables String operations Lists, Tuples Sets Dictionaries Conditions and Branching

Sets: an unordered, mutable collection of unique elements. Like lists and tuples, sets can store elements of different Python types. Unlike lists and tuples, sets do not record element position. Also unlike lists and tuples, sets only contain unique elements, so duplicates are not stored. Elements of sets must be hashable: immutable objects, or immutable containers/collections whose elements are all immutable objects (such as tuples of numbers). So sets cannot contain lists or other sets.

Creating a set: write the elements within curly braces, or use the set() function. An empty set can only be created with the set() function, not with curly braces: {} is not a set (it is an empty dictionary)!

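Set: adding elements. Adding an element that is already in the set doesn't change the set: {"Bandung", "Bogor", "Depok"}; after adding "Jakarta": {"Bandung", "Bogor", "Depok", "Jakarta"}; after adding "Jakarta" again: still {"Bandung", "Bogor", "Depok", "Jakarta"}.

Set: removing elements. {"Bandung", "Bogor", "Depok", "Jakarta"} becomes {"Bandung", "Bogor", "Depok"} after removing "Jakarta".

Set: check whether the set contains a given element: {"Bandung", "Bogor", "Depok"}

Set intersection: {"Bogor", "Depok", "Jakarta"} and {"Bogor", "Depok", "Bandung"}

Set intersection result: {"Bogor", "Depok"}

Set union: {"Bogor", "Depok", "Jakarta"} and {"Bogor", "Depok", "Bandung"}

Set union result: {"Bogor", "Depok", "Jakarta", "Bandung"}. The shared elements "Bogor" and "Depok" appear only once.

Subset test: {"Bogor", "Depok", "Jakarta"} and {"Bogor", "Depok"}

Subset test: {"Bogor", "Depok"} and {"Bogor", "Depok", "Jakarta"}. {"Bogor", "Depok"} is a subset of {"Bogor", "Depok", "Jakarta"}, but not the other way around.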
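
A minimal sketch of the set operations from these slides, using the same city names.

```python
cities = {"Bandung", "Bogor", "Depok"}
cities.add("Jakarta")
cities.add("Jakarta")              # adding an existing element changes nothing
cities.remove("Jakarta")           # raises KeyError if the element is missing
has_bogor = "Bogor" in cities      # membership test: True

a = {"Bogor", "Depok", "Jakarta"}
b = {"Bogor", "Depok", "Bandung"}
common = a & b                     # intersection, same as a.intersection(b)
either = a | b                     # union, same as a.union(b)
is_sub = {"Bogor", "Depok"}.issubset(a)        # True
reverse_sub = a.issubset({"Bogor", "Depok"})   # False

empty = set()                      # {} would create an empty dict, not a set
```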

More set methods … See https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset

Outline Python: Overview Types, Expressions, Variables String operations Lists, Tuples Sets Dictionaries Conditions and Branching

Dictionaries: a collection of pairs. Each pair consists of a key followed by a value, separated by a colon. A dictionary's keys behave like a set: the keys are unique and immutable (hashable). Elements are not accessed by position (although, since Python 3.7, dictionaries preserve insertion order). Values may be immutable or mutable, and may be duplicated. A dictionary is written with a pair of curly braces, where each key-value pair is separated from the others by a comma.

Dictionary: analogy with a list. Dictionaries are similar to lists in that elements are looked up by an index. The difference is that a dictionary uses arbitrary immutable objects (keys) as indices instead of integers. List: index 0, 1, 2, 3, … maps to Element1, Element2, Element3, Element4, …. Dictionary: Key 1, Key 2, Key 3, Key 4, … maps to Value 1, Value 2, Value 3, Value 4, …. A key is an index by label.

Dictionary: {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9}

Dictionary: accessing elements using a key: {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9}

Dictionary: adding elements. {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9} becomes {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4} after adding the pair "Ambon": 0.4.

Dictionary: adding elements: {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9}

Dictionary: deleting elements. {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4} becomes {"Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4} after deleting the key "Surabaya".

Dictionary: deleting elements: {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4}

Dictionary: check whether a key is in the dictionary: {"Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4}

Dictionary: get all keys using the keys() method: {"Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4}. keys() returns a list-like view object containing the dictionary's keys, which can be converted to a list or another collection.

Dictionary: get all keys using the keys() method: {"Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4}

Dictionary: get all values using the values() method: {"Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4}. The values() method is similar, but returns all the values in the dictionary.

Dictionary: consists of a collection of keys and values.
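
Here is a minimal sketch of the dictionary operations walked through above, using the same city data. The update of "Depok" to 2.0 is a hypothetical value, added only to show that assigning to an existing key replaces its value.

```python
cities = {"Jakarta": 10.1, "Surabaya": 3.4, "Bogor": 1.1, "Depok": 1.8,
          "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9}

print(cities["Jakarta"])        # access by key: 10.1
cities["Ambon"] = 0.4           # add a new key-value pair
cities["Depok"] = 2.0           # hypothetical value: an existing key gets a new value
del cities["Surabaya"]          # delete a pair by key
print("Surabaya" in cities)     # membership checks the keys: False

keys = list(cities.keys())      # keys() returns a view; list() converts it
values = list(cities.values())  # values() works the same way for the values
print(keys)
```

Because dictionaries keep insertion order (Python 3.7+), `keys` lists the cities in the order they were added, with "Ambon" last.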

More dictionary methods … See https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

Python List: In [1]: height = [1.84, 1.79, 1.82, 1.90, 1.80] In [2]: height Out[2]: [1.84, 1.79, 1.82, 1.9, 1.8] In [3]: weight = [66.5, 60.3, 64.7, 89.5, 69.8] In [4]: weight Out[4]: [66.5, 60.3, 64.7, 89.5, 69.8] In [5]: weight / height ** 2 TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'. Problem: arithmetic operators are not applied element-wise to Python lists!

Solution: NumPy, the fundamental library for scientific computing with Python (https://numpy.org/). It offers an alternative to Python lists: NumPy arrays, which can be n-dimensional. It is easy to use and open source. If the library is not installed yet, run: pip install numpy. Then import it: import numpy as np. In [6]: import numpy as np In [7]: np_height = np.array(height) In [8]: np_height Out[8]: array([1.84, 1.79, 1.82, 1.9, 1.8]) In [9]: np_weight = np.array(weight) In [10]: np_weight Out[10]: array([66.5, 60.3, 64.7, 89.5, 69.8]) In [11]: bmi = np_weight / np_height ** 2 In [12]: bmi Out[12]: array([19.64201323, 18.81963734, 19.53266514, 24.79224377, 21.54320988])

Data to be processed can come in many forms and formats: documents, images, video, sound, numbers, or text. Such data is not processed in its raw form as video or audio. It is first transformed into an array or a matrix of numbers. An array with at least two dimensions forms a matrix, which can be handled with NumPy: import numpy as np, then type np.<TAB> (in IPython/Jupyter, TAB completion lists the available NumPy functions). Digital data

NumPy can also be used to create n-dimensional arrays. In [13]: import numpy as np In [14]: np_height = np.array([1.84, 1.79, 1.82, 1.9, 1.8]) In [15]: np_weight = np.array([66.5, 60.3, 64.7, 89.5, 69.8]) In [16]: type(np_height) Out[16]: numpy.ndarray In [17]: type(np_weight) Out[17]: numpy.ndarray (ndarray = n-dimensional array) In [18]: np_2d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) In [19]: np_2d Out[19]: array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) In [20]: np_2d.shape Out[20]: (2, 5). A 2-dimensional array with 2 rows and 5 columns is a matrix: M = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
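
A short sketch of working with the 2 x 5 matrix from the slide. It covers the shape, the number of dimensions, row/column indexing, and element-wise arithmetic. It assumes NumPy is installed (`pip install numpy`).

```python
import numpy as np

np_2d = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10]])

print(np_2d.shape)    # (2, 5): 2 rows, 5 columns
print(np_2d.ndim)     # 2 dimensions
print(np_2d[1, 2])    # row 1, column 2 -> 8
print(np_2d[:, 0])    # first column of every row -> [1 6]
print(np_2d * 2)      # element-wise arithmetic, no explicit loop needed
```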

SciPy (pronounced "Sigh Pie") is an open-source library available at https://www.scipy.org/. SciPy is built to work with NumPy arrays and provides a collection of numerical algorithms, including signal processing, optimization, and statistics. It is commonly used together with Matplotlib for data visualization. For some tasks SciPy offers more complete routines than NumPy; for example, scipy.linalg contains all the functions in numpy.linalg plus more advanced ones. If the library is not installed yet, run: pip install scipy

SciPy usage: from scipy import <module>

Handling Sparse Data

Spatial Data

Pandas (from "Panel Data") is a popular Python library used for data structures and data analysis. It is open source and available at https://pandas.pydata.org/. Pandas is closely related to NumPy. If the library is not installed yet, run: pip install pandas. Then import it: import pandas as pd. Data wrangling / data munging: reshaping (changing the shape of data), joining (combining data), splitting (separating data), time-series analysis (periodic data). Data cleansing: cleaning incomplete data (errors), handling outliers, removing duplicate data.

There are two data objects: Series and DataFrame. Series: 1-dimensional data, analogous to In [13]: np.array([1, 2, 3, 4, 5]) Out[13]: array([1, 2, 3, 4, 5]). DataFrame: 2-dimensional (tabular) data, analogous to In [14]: np.array([[1, 2], [3, 4]]) Out[14]: array([[1, 2], [3, 4]]). Example data representation by country, with columns Negara (Country), Populasi (Population), Area, and Ibukota (Capital): IN: Indonesia, 250, 123456, Jakarta; MA: Malaysia, 25, 3456, KL; SI: Singapura, 15, 456, Singapura; JP: Jepang, 60, 5678, Tokyo; TH: Thailand, 45, 678, Bangkok. Columns are features/attributes; rows are samples.

Pandas can import data from various formats: comma-separated values (CSV), text files, Microsoft Excel, SQL databases, and HDF5. Download the dataset: http://bit.ly/TabDataset. CSV file → DataFrame (import pandas as pd). Contents of Tab.csv:
,Negara,Populasi,Area,Ibukota
IN,Indonesia,250,123456,Jakarta
MA,Malaysia,25,3456,KL
SI,Singapura,15,456,Singapura
JP,Jepang,60,5678,Tokyo
TH,Thailand,45,678,Bangkok
In [1]: Tab = ... # table declaration
In [2]: Tab

In [3]: import pandas as pd
In [4]: Tab = pd.read_csv("Tab.csv", index_col=0)  # the first column (IN, MA, ...) becomes the row labels
In [5]: Tab
Out[5]:
In [6]: Tab["Negara"]  # column access
Out[6]:
In [7]: Tab.Ibukota  # column access
Out[7]:
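
To try this without downloading the file, the sketch below embeds the contents of Tab.csv as a string and reads it through `io.StringIO`. The `index_col=0` argument is an assumption. Without it, pandas keeps the unnamed first column (IN, MA, ...) as an ordinary column named "Unnamed: 0" instead of using it as row labels. Pandas must be installed (`pip install pandas`).

```python
import io
import pandas as pd

# Contents of Tab.csv, embedded so the example runs without the file
csv_text = """,Negara,Populasi,Area,Ibukota
IN,Indonesia,250,123456,Jakarta
MA,Malaysia,25,3456,KL
SI,Singapura,15,456,Singapura
JP,Jepang,60,5678,Tokyo
TH,Thailand,45,678,Bangkok
"""

# index_col=0 turns the unnamed first column into the row labels
Tab = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(Tab.shape)          # (5, 4): 5 rows (samples), 4 columns (features)
print(Tab["Negara"])      # a single column is a Series
print(Tab.Ibukota["JP"])  # attribute-style column access, then lookup by row label
```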

Matplotlib is a Python library for two-dimensional data visualization. It is open source and available at https://matplotlib.org/. Matplotlib works closely with NumPy and Pandas. If the library is not installed yet, run: pip install matplotlib. Then import it: import matplotlib.pyplot as plt. Chart types: bar chart, line chart, scatter plot.

In [1]: import matplotlib.pyplot as plt In [2]: year = [1980, 1990, 2000, 2010, 2020] In [3]: price = [2.5, 7.6, 9.7, 15.8, 22.9] In [4]: plt.plot(year, price) In [5]: plt.show()

In [6]: plt.scatter(year, price) In [7]: plt.bar(year, price)

Seaborn is a Python data visualization library (similar to Matplotlib, and based on it) that provides a high-level interface for drawing attractive and informative statistical graphics. The library is open source and available at https://seaborn.pydata.org/. If the library is not installed yet, run: pip install seaborn. Then import it: import seaborn as sns. Chart types: heatmap, line chart, scatter plot.

Scikit-learn is a library for practicing machine learning and building models. It is open source and available at https://scikit-learn.org/. Scikit-learn grew out of the SciPy (Scientific Python) ecosystem, which provides mathematical functions. If the library is not installed yet, run: pip install scikit-learn (the PyPI package name; sklearn is only the import name). Then import it: import sklearn. Classification: Support Vector Machines, Decision Tree, Random Forest, Neural Network, Nearest Neighbors. Clustering: K-Means Clustering, Hierarchical Clustering. Model Selection: Cross Validation, Metrics.

Outline Python: Overview Types, Expressions, Variables String operations Lists, Tuples Sets Dictionaries Conditions and Branching

Comparison operators: a comparison operator takes operands (values) and produces a Boolean.

Comparison: equality. x == 8 with x = 5 produces a Boolean.

Comparison: equality. 5 == 8? False

Comparison: equality. x == 8 with x = 8 produces a Boolean.

Comparison: equality. 8 == 8? True

Comparison: greater than, less than, etc. Explanations of the results: because 7 is greater than 6; because 6 is greater than or equal to 6; because 1 is less than 6, i.e., NOT greater than or equal to 6.

Comparison: greater than, less than, etc. Explanations of the results: because 1 is less than 6; because 1 is not equal to 6; because 1 is equal to 1.

Comparing strings: two strings (or, more generally, sequences) A and B are equal if they have the same length and, for each position i, A[i] is equal to B[i].
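
A minimal sketch of the comparison operators from these slides, including string equality.

```python
x = 5
print(x == 8)                  # False
x = 8
print(x == 8)                  # True

print(7 > 6, 6 >= 6, 1 >= 6)   # True True False
print(1 < 6, 1 != 6, 1 == 1)   # True True True

# Strings are equal only if they have the same length and the same
# character at every position
same = "Bogor" == "Bogor"      # True
case_differs = "Bogor" == "bogor"   # False: "B" and "b" are different characters
```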

The if statement: suppose entrance to a tourist attraction is only given to those who are at most 12 years old. If the age is 13, 14, or more, entrance is not granted and the person just moves on. If the age is 12, 11, or less, entrance is granted, and they move on after enjoying the ride. We can write this branching using an if statement:
if CONDITION:
    do_something_only_when_condition_holds
everyone_do_something
After the if-condition, the statements that we want to execute only when the condition is true MUST be indented!

The if statement: because 11 is less than or equal to 12, the if-condition holds. Because 13 is more than 12, the indented statement is NOT executed. Indented statements are executed only when the if-condition evaluates to True. The unindented statement after them is executed regardless of whether the if-condition is true or not.

The if-else statement: choosing one of two alternatives. If the if-condition is True (11 is less than or equal to 12), the if block is executed, the else block is NOT executed, and the statement after the if-else is executed regardless. If the if-condition is False (13 is more than 12), the if block is NOT executed, the else block is executed, and the statement after the if-else is executed regardless.

Branching (if): the elif statement. The if-condition is True (11 is less than or equal to 12), so the if block is executed and the elif block is NOT executed. The statement after the whole block is always executed.

The elif statement: one of several alternatives. The if-condition is False (13 is more than 12), and the elif-condition is True (13 is equal to 13), so the elif block is executed and the if block is NOT executed. The statement after the whole block is always executed.

The elif statement: one of several alternatives. The if-condition is False (15 is more than 12), and the elif-condition is also False (15 is not equal to 13), so the elif block is NOT executed. The statement after the whole block is always executed.

The if, else, and elif statements: generally, we may have the following form: an if statement, followed by zero or more elif statements, then followed by zero or one else statement. The if statement and every elif statement have a Boolean condition. The conditions are tested in turn from top to bottom until one is found to be true. Once a true condition is found, the indented statements under that if/elif statement are executed, and then we exit the whole if-elif-else block. If no condition is true and there is an else statement, the indented statements under it are executed, and then we exit the whole if-elif-else block. If there is no else statement, we exit the if-elif-else block directly.
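
The rules above can be seen in one small function based on the tourist-attraction example (age limit 12). The function `entrance` and the special rule for age 13 are hypothetical. They exist only to show how exactly one branch runs and how the code after the block always runs.

```python
def entrance(age):
    """Hypothetical example: at most one branch runs, tested top to bottom."""
    if age <= 12:
        result = "enjoy the ride"
    elif age == 13:
        result = "ask for a special pass"   # hypothetical rule for exactly 13
    else:
        result = "move on"
    return result   # always reached, whichever branch ran

for age in (11, 13, 15):
    print(age, entrance(age))
```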

Logic operators: a logic operator takes Boolean operands/values and produces a Boolean.

Logic operator not: not True is False; not False is True.

Logic operator or (A or B): A = False, B = False gives False; A = False, B = True gives True; A = True, B = False gives True; A = True, B = True gives True.

Logic operator or: the condition is true when pub_year is a number from {…, 1978, 1979, 1990, 1991, …}. The indented statement was executed because pub_year is 1990.

Logic operator and (A and B): A = False, B = False gives False; A = False, B = True gives False; A = True, B = False gives False; A = True, B = True gives True.

Logic operator and: the condition is true when pub_year is a number from {1980, 1981, …, 1989}. The indented statement was executed because pub_year is 1985.
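
A minimal sketch of the two pub_year conditions. The slides' exact expressions are not in this text, so `outside_80s` and `in_80s` are hypothetical helpers. They reproduce the sets of years described on the slides.

```python
def outside_80s(pub_year):
    # or: true for ..., 1978, 1979, 1990, 1991, ...
    return pub_year < 1980 or pub_year >= 1990

def in_80s(pub_year):
    # and: true for 1980, 1981, ..., 1989
    return pub_year >= 1980 and pub_year < 1990

print(not True, not False)                 # False True
print(outside_80s(1990), in_80s(1985))     # True True
```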

More on branching and conditions … https://docs.python.org/3/library/stdtypes.html#truth-value-testing https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not https://docs.python.org/3/library/stdtypes.html#comparisons https://docs.python.org/3/reference/expressions.html#comparisons https://docs.python.org/3/reference/expressions.html#boolean-operations https://docs.python.org/3/reference/expressions.html#conditional-expressions https://docs.python.org/3/reference/compound_stmts.html#the-if-statement

Outline Loops Functions Reading and Writing Files Pandas Numpy Arrays

The range object: we create a range object by calling its constructor range(). A range object is an immutable sequence of numbers following a strict, specific pattern, and it is very useful for loops/iteration. It is similar to a tuple, but a range cannot be created by listing its elements explicitly, and it does not support concatenation or repetition (multiplication). To see it as an explicit enumeration, convert it to a list or a tuple first.

The range object constructor. range(stop): stop must be a positive integer, otherwise the range object is empty; it generates the integers [0, 1, …, stop - 1], so length = stop. range(start, stop): start and stop must be integers; start < stop must hold, otherwise the range object is empty; it generates the integers [start, start + 1, …, stop - 1], so length = stop - start. range(start, stop, step): start and stop must be integers, and step must be a nonzero integer. If step > 0, start < stop must hold, otherwise the range object is empty. If step < 0, start > stop must hold, otherwise the range object is empty. It generates the integers [start, start + step, start + 2*step, …, start + k*step], where k is the largest integer for which start + k*step is still before stop (less than stop when step > 0, greater than stop when step < 0). The length is max(0, ⌈(stop - start)/step⌉).

The range object
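
The sketch below shows a few range objects converted to lists. It also checks a length formula, max(0, ceil((stop - start) / step)), against Python's own `len()`. The helper `range_length` is hypothetical.

```python
import math

print(list(range(5)))             # [0, 1, 2, 3, 4]
print(list(range(2, 6)))          # [2, 3, 4, 5]   (stop itself is excluded)
print(list(range(0, 10, 3)))      # [0, 3, 6, 9]
print(list(range(10, 0, -3)))     # [10, 7, 4, 1]  (negative step counts down)
print(list(range(5, 2)))          # []             (start >= stop with a positive step)

def range_length(start, stop, step):
    """Hypothetical helper: number of elements in range(start, stop, step)."""
    return max(0, math.ceil((stop - start) / step))

for args in [(0, 10, 3), (0, 9, 3), (10, 0, -3), (5, 2, 1)]:
    assert range_length(*args) == len(range(*args))
```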

More on range object See https://docs.python.org/3/library/stdtypes.html#range

Loops: for statements Suppose we are given a list of strings of color names, e.g., [“ red ”, “ orange ”, “ green ”, “ purple ”, “ blue ”] We wish to replace each color name with “black”, resulting in [“black”, “black”, “black”, “black”, “black”] How do we do that?

colors = [“ red ”, “ orange ”, “ green ”, “ purple ”, “ blue ”] 1 2 3 4 for colors 0 in colors, colors 0 = black for colors 1 in colors, colors 1 = black for colors 2 in colors, colors 2 = black for colors 3 in colors, colors 3 = black for colors 4 in colors, colors 4 = black colors = [“black”, “ orange ”, “ green ”, “ purple ”, “ blue ”] colors = [“black”, “black”, “ green ”, “ purple ”, “ blue ”] colors = [“black”, “black”, “black”, “ purple ”, “ blue ”] colors = [“black”, “black”, “black”, “black”, “ blue ”] colors = [“black”, “black”, “black”, “black”, “black”] running index from 0 to 4 each index value is used to perform color name change range(0,5) returns the sequence [0, 1, 2, 3, 4]

Loop: for statement
If we just want to use the value of each element in the list, without changing the list at all, we can iterate over the elements directly (without using an index). The loop variable color takes the value "red", then "orange", then "green", and so on.
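A minimal sketch of iterating directly over the elements: each value is read (printed and measured), but the list itself is never modified. Collecting the name lengths is just an illustrative use of the values.

```python
colors = ["red", "orange", "green", "purple", "blue"]

lengths = []
for color in colors:            # color is bound to each element in turn
    print(color)                # red, orange, green, purple, blue
    lengths.append(len(color))  # use the value; colors is not changed

print(lengths)  # [3, 6, 5, 6, 4]
```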

More for-loop examples

Nested Loop
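The slide's own nested-loop example is not recoverable from this text, so the following is a hypothetical illustration: a loop inside another loop that builds a small multiplication table. The inner loop runs completely for every single iteration of the outer loop.

```python
table = []
for i in range(1, 4):       # outer loop: rows 1..3
    row = []
    for j in range(1, 4):   # inner loop: columns 1..3, restarted for every i
        row.append(i * j)
    table.append(row)

for row in table:
    print(row)
# [1, 2, 3]
# [2, 4, 6]
# [3, 6, 9]
```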

Loop: for statement
We can also retrieve the index and the corresponding value simultaneously using the enumerate() function. enumerate() takes the list as its argument and returns a sequence of pairs, each consisting of an index and the corresponding element. Each iteration uses one pair at a time.
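A short sketch of enumerate(): each (index, element) pair is unpacked into two loop variables. Typecasting the result to a list shows the pairs explicitly.

```python
colors = ["red", "orange", "green"]

# enumerate() yields (index, element) pairs; each pair is unpacked into i and color
for i, color in enumerate(colors):
    print(i, color)
# 0 red
# 1 orange
# 2 green

pairs = list(enumerate(colors))
print(pairs)  # [(0, 'red'), (1, 'orange'), (2, 'green')]
```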

Loop: while statement
A for loop runs a fixed number of iterations, determined by the sequence it iterates over. A while loop runs as long as its loop condition remains True; it stops as soon as the condition becomes False for the first time.
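As an illustration of a loop whose number of iterations is not known in advance, this hypothetical example keeps doubling a number while it is below 100. The condition is checked before every iteration, and the loop ends the first time it is False.

```python
n = 3
steps = 0
while n < 100:   # checked before each iteration
    n = n * 2
    steps += 1

print(n, steps)  # 192 6
```

Here the values go 3, 6, 12, 24, 48, 96, 192; the check 192 < 100 is the first False, so the loop stops after 6 iterations.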

While example

Loop: while statement
Suppose we are given a list of color names whose length and content are unknown, e.g., "orange", "blue", "purple", etc. Starting from the left, we want to copy every "orange" to a new list, but stop copying as soon as a different color is found. We don't know how many "orange" entries are at the start of the list, or whether there are any at all. How do we do it?

Suppose we copy from the list colors to the list oranges:
if colors is ["orange", "orange", "blue", "orange", "orange", "orange"], then oranges is ["orange", "orange"];
if colors is ["green", "orange", "orange"], then oranges is [].
We start from the left and check whether the current color is "orange". If so, we add it to oranges; otherwise, we stop copying.

colors = ["orange", "orange", "blue", "orange", "orange", "orange"], oranges = []
colors[0] == "orange", so add "orange" to oranges → oranges = ["orange"]
colors[1] == "orange", so add "orange" to oranges → oranges = ["orange", "orange"]
colors[2] != "orange", so stop the whole repetition

The while statement does not change the running index i automatically, so we need to first initialize it (line 3) and increment it at every iteration (line 7).
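The slide's own code is not reproduced in this text, so the line numbers mentioned above refer to the original slide. In the following sketch, the initialization is i = 0 and the increment is i += 1. We also check i < len(colors) first, an addition of ours, so the loop stops safely (instead of raising IndexError) when every element is "orange".

```python
colors = ["orange", "orange", "blue", "orange", "orange", "orange"]
oranges = []

i = 0  # the while statement does not create or advance i for us
# Check the bound first, then the color; `and` stops early if the bound fails
while i < len(colors) and colors[i] == "orange":
    oranges.append(colors[i])
    i += 1  # move the running index forward manually

print(oranges)  # ['orange', 'orange']
```

With colors = ["green", "orange", "orange"], the condition is False on the very first check, so oranges stays [].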

More on loops … https://docs.python.org/3/reference/compound_stmts.html#the-while-statement https://docs.python.org/3/reference/compound_stmts.html#the-for-statement

Outline: Loops, Functions, Reading and Writing Files, Pandas, NumPy Arrays

Functions
A function takes some input and produces some output or change(s). It is simply a piece of code that can be reused. You can define your own functions, or, more often, simply use other people's functions. You just need to know how the function works (what its input and output are) and sometimes how to import the function into your program.

Repeated pieces of similar code are separated out into a function, which makes the main code shorter. The main code sends a value to the function and receives the function's return value.
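As a hypothetical illustration (not the slide's own code), the repeated computation x * x + 1 below is moved into a function named square_plus_one; the main code then only sends a value and receives the return value.

```python
# Without a function: the same pattern is written out every time
a1 = 3 * 3 + 1
a2 = 5 * 5 + 1
a3 = 7 * 7 + 1

# With a function: the repeated part is written once and reused
def square_plus_one(x):
    return x * x + 1

b1 = square_plus_one(3)
b2 = square_plus_one(5)
b3 = square_plus_one(7)

print(a1, a2, a3)  # 10 26 50
print(b1, b2, b3)  # 10 26 50
```

If the formula ever changes, only the function body needs editing, rather than every copy in the main code.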

Python built-in function example: len()
len(Q): Q is a sequence (e.g., string, tuple, list, range) or a collection (e.g., set, dictionary). Returns the length of Q, i.e., the number of elements it contains.
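A few calls to len() on the kinds of objects listed above; the sample values are arbitrary.

```python
print(len("data"))             # 4 characters
print(len([10, 20, 30]))       # 3 elements
print(len(range(0, 10, 2)))    # 5: 0, 2, 4, 6, 8
print(len({"a": 1, "b": 2}))   # 2: a dictionary's length is its number of keys
print(len({1, 1, 2}))          # 2: duplicates collapse in a set
```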

Python built-in function example: sum()
sum(Q): Q is an iterable (e.g., tuple, list, set) containing numbers (or other things that can be added). Returns the sum of all elements in Q.
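A short sketch of sum() over different iterables. The last line uses the optional start value, which sum() adds to the total (it defaults to 0).

```python
print(sum([1, 2, 3, 4]))     # 10
print(sum((0.5, 0.25)))      # 0.75
print(sum({1, 2, 3}))        # 6
print(sum(range(1, 101)))    # 5050
print(sum([1, 2, 3], 10))    # 16: the start value 10 plus 1 + 2 + 3
```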

Python built-in function example: sorted() and list.sort()
sorted(Q): Q is an iterable (list, tuple, etc.). Returns a new list containing the elements of Q in sorted order. The elements must be comparable with each other, e.g., numbers, characters, or strings. Q itself does not change after calling sorted().

Python built-in function example: sorted() and list.sort()
If Q is a list and we want Q itself to be sorted, use the list method sort(), i.e., call Q.sort(). Calling Q.sort() does not return a new list (it returns None); instead, Q itself is sorted in place.
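This sketch contrasts the two: sorted() leaves Q untouched and returns a new list, while Q.sort() changes Q and returns None. The reverse=True keyword shown at the end is an extra option of sorted() not mentioned above.

```python
Q = [3, 1, 2]

R = sorted(Q)       # returns a new sorted list
print(R)            # [1, 2, 3]
print(Q)            # [3, 1, 2]  (Q is unchanged)

result = Q.sort()   # sorts Q in place
print(Q)            # [1, 2, 3]
print(result)       # None: sort() does not return a new list

print(sorted("cab"))            # ['a', 'b', 'c']: works on any iterable
print(sorted(Q, reverse=True))  # [3, 2, 1]
```

A common mistake is writing Q = Q.sort(), which replaces the list with None.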

Making functions
Start with the keyword def.
Use a descriptive name: add1, since we want to return the input plus 1.
The function input is given as variables called formal parameters, written inside parentheses.
Add a documentation string (docstring); calling help(add1) will display this string.
Write the body as an indented code block of statements.
Return the function result using the keyword return.
After defining the function, we can call it.
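The function described above can be sketched as follows; the exact docstring wording is our assumption. We print add1.__doc__ directly, which is the same string help(add1) displays.

```python
def add1(a):
    """Return the input plus 1."""
    b = a + 1
    return b

c = add1(7)
print(c)             # 8
print(add1.__doc__)  # Return the input plus 1.
```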

Function: how does it work?
Variable c is to be assigned the value of add1(7):
1. Function add1 is called with argument value 7.
2. 7 is passed into the function as the parameter a.
3. The statements in the function run with a replaced by 7: b = a + 1 = 7 + 1 = 8.
4. The value of b (8) is returned, and the assignment executes: c = 8.

Function: variable scoping
All variables defined inside a function, including the parameter variables, are called local variables. Values are assigned to local variables only when the function is called, and they are gone after the function exits. In the next function call, those local variables are defined again from scratch, possibly with new values.

Function: variable scoping
When line 5 is executed, add1(a) is called with a assigned to 7. Variable b becomes 8, and 8 is returned. Variable c is now assigned 8, and the values of a and b are gone from memory. Then line 6 is executed. The process is similar, but now with a different value for variable a: b is defined again, gets a different value, and is returned.
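The slide's numbered code is not included in this text, so here is a sketch of the same idea with two calls to add1 (the second argument, 20, is arbitrary). After the calls, the local variable b does not exist outside the function, so referring to it raises NameError.

```python
def add1(a):
    b = a + 1      # a and b are local variables
    return b

c = add1(7)        # during this call: a = 7, b = 8
d = add1(20)       # a fresh call: a and b are created again (a = 20, b = 21)
print(c, d)        # 8 21

# b only existed inside the calls; it is not defined at this level
try:
    print(b)
except NameError as err:
    print("NameError:", err)
```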

A function can have multiple parameters
Function my_mult multiplies its two arguments. The function looks fine when given numbers. Careful: my_mult does not raise an error when arg1 is an integer and arg2 is a string, because multiplication can also mean repeating a string. Is this the intended behavior of my_mult? We need more testing to make sure the function behaves as intended.
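A sketch of my_mult, assuming its body is simply arg1 * arg2 (the slide's code is not reproduced here). The last call shows the surprising but legal string repetition.

```python
def my_mult(arg1, arg2):
    """Return arg1 multiplied by arg2."""
    return arg1 * arg2

print(my_mult(3, 4))      # 12
print(my_mult(2.5, 2))    # 5.0
print(my_mult(3, "ab"))   # ababab: no error, * repeats the string
```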


Return statement is optional
During a function's execution, Python exits the function when a return statement is executed, or when there are no more statements to execute. So functions can omit return statements, e.g., when we simply want to print something or make some changes to an object/data. If a function exits without a return statement, Python (silently) returns the object None to the caller. The function body cannot be empty: a trivial function that returns None has a pass statement as its only statement. pass means "do nothing".
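A minimal sketch of both cases, using two hypothetical functions: greet only prints, and do_nothing contains just pass. Both return None to the caller.

```python
def greet(name):
    print("Hello,", name)   # no return statement

def do_nothing():
    pass                    # the body cannot be empty; pass means "do nothing"

r1 = greet("Ana")           # prints: Hello, Ana
r2 = do_nothing()
print(r1, r2)               # None None
print(type(r1))             # <class 'NoneType'>
```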

A function can do multiple tasks
This function prints a statement and returns its argument plus 2.
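The slide's function is not reproduced in this text; a sketch with the described behavior, under the hypothetical name show_and_add2, could look like this.

```python
def show_and_add2(x):
    print("The input is", x)  # task 1: print a statement
    return x + 2              # task 2: return the argument plus 2

y = show_and_add2(5)  # prints: The input is 5
print(y)              # 7
```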

A function can contain all kinds of statements, such as loops.
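As a hypothetical example of a loop inside a function body, count_color counts how many times a given color name appears in a list.

```python
def count_color(colors, target):
    """Count how many times target appears in colors."""
    count = 0
    for color in colors:      # a for loop inside the function body
        if color == target:
            count += 1
    return count

n = count_color(["red", "blue", "red", "green"], "red")
print(n)  # 2
```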

Collecting arguments
We can define a function that receives a varying number of arguments (i.e., not fixed to one or two arguments in its definition).
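A sketch assuming the slide uses Python's * prefix on a parameter (commonly written *args): all positional arguments passed in the call are collected into one tuple. The function name total and the parameter name numbers are our own choices.

```python
def total(*numbers):
    """Add up any number of positional arguments.

    All arguments are collected into the tuple `numbers`.
    """
    result = 0
    for x in numbers:
        result += x
    return result

print(total())            # 0: numbers is the empty tuple ()
print(total(5))           # 5
print(total(1, 2, 3, 4))  # 10
```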

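Recursion
Call chain for n = 5:
Factorial(5) = 5 * Factorial(5-1)
Factorial(4) = 4 * Factorial(4-1)
Factorial(3) = 3 * Factorial(3-1)
Factorial(2) = 2 * Factorial(2-1)
Factorial(1) = 1
The return values then propagate back up: Factorial(1) = 1, Factorial(2) = 2, Factorial(3) = 6, Factorial(4) = 24, Factorial(5) = 120.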
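The call chain above corresponds to a recursive function like the sketch below. We use n <= 1 as the base case (a small safety choice of ours) so that a call with 0 also stops instead of recursing forever.

```python
def factorial(n):
    """Return n! computed recursively, for a non-negative integer n."""
    if n <= 1:                    # base case: stops the recursion
        return 1
    return n * factorial(n - 1)   # recursive case: n * (n-1)!

print(factorial(5))  # 120
```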

Lambda A lambda function is a small anonymous function. A lambda function can take any number of arguments, but can only have one expression.
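A few lambda sketches: one with a single argument, one with two arguments, and one passed directly as the key of sorted(), which is where lambdas are most often used. Binding a lambda to a name here is only for illustration; a def is usually preferred for named functions.

```python
add1 = lambda a: a + 1          # one argument, one expression
my_mult = lambda x, y: x * y    # two arguments, still one expression

print(add1(7))        # 8
print(my_mult(3, 4))  # 12

# Passed directly to another function: sort words by their length
words = ["banana", "kiwi", "apple"]
print(sorted(words, key=lambda w: len(w)))  # ['kiwi', 'apple', 'banana']
```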

Outline: Loops, Functions, Reading and Writing Files, Pandas, NumPy Arrays

Working with files
So far, we have worked only with data defined in our Jupyter Notebook. Now we will learn how to work with text files in our file system. A file in the file system is opened using the open() function.

The open() function
open() takes a file in the file system (content of line 1, content of line 2, content of line 3, …) and returns a file object. We can read the content of the file and write into it by calling various methods of the file object.
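A self-contained sketch of writing and then reading a text file through file objects. The file name example.txt and the temporary directory are hypothetical, chosen only so the example does not touch existing files. The with statement closes the file automatically when its block ends.

```python
import os
import tempfile

# Hypothetical file location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Mode "w": create the file (or overwrite it) and write lines via the file object
with open(path, "w") as f:
    f.write("Content of line 1\n")
    f.write("Content of line 2\n")
    f.write("Content of line 3\n")

# Mode "r": open for reading; readlines() returns a list with one string per line
with open(path, "r") as f:
    lines = f.readlines()

print(lines)
# ['Content of line 1\n', 'Content of line 2\n', 'Content of line 3\n']
```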