In this module, we get acquainted with several tools used by data scientists.
As with programmers, the tools a data scientist uses are an essential factor
in success and productivity. Most of the effort in a data science project
goes into data processing. Choosing the right tools can save a great deal of
time and thereby let us focus more on data analysis. The most basic decision
is which programming language to use.
Some people use only one programming language for all their tasks: the first
and only one they learned. Learning a new programming language can be a major
undertaking that, if possible, should be done only once. The main problem is
that a particular tool may not be available in that one language, so in the
end we must either reimplement the tool or build a bridge to another language
just for a specific task.
So we must be ready to switch to the best language for each task and then
collect the results. Alternatively, we can choose a very flexible language
with a rich ecosystem (for example, a wide range of open-source libraries).
In this module, we use Python as the programming language.
Python was designed by Guido van Rossum and first released in 1991 as an
open-source, general-purpose language. Python is a popular language, and it
also has properties that suit beginner programmers, which makes it ideal for
people who have never programmed before. Python is also cross-platform: it
runs on Windows, macOS, and Linux.
Among those properties are readable code, dynamic typing, and dynamic memory
usage. Python is an interpreted language, so code is executed immediately in
the Python console without a separate step of compiling to machine code.
Besides the Python console (included with every Python installation), there
are other interactive consoles, such as IPython (Interactive Python) and
Google Colab, which provide a more interactive environment for running
Python code.
Today, Python is one of the most flexible programming languages. A key
feature behind that flexibility is that it can be seen as a multiparadigm
language. This is especially useful for people who already know how to
program in another language, because they can quickly start programming in
Python in the same style. For example, a Java programmer will feel
comfortable with Python because it supports the object-oriented paradigm,
and a C programmer can mix Python and C code using Cython.
In this module, we use Python because, as explained earlier, it is a
programming language that is …
Size: 12.03 MB
Language: en
Added: Jul 03, 2024
Slides: 178 pages
Slide Content
Thematic Academy Data Scientist: Artificial Intelligence for Lecturers and Instructors. Session #4: Data Science Project Tools
Training Description: The main goal of this training module is to discuss data science tools by explaining a set of tools and techniques tied to basic skills in computer science, mathematics, and statistics for carrying out tasks commonly associated with data science.
Learning Outcomes: In this topic, we will study the Python programming language, development environments, and the basics of Python libraries for data science projects (NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-learn).
Python: A high-level programming language. Code and syntax are simpler to write, and many libraries are available. Open-source and cross-platform. Released by Guido van Rossum in 1991. Suitable for beginners, simple but powerful, and a high-demand skill for data professionals (data analyst, data engineer, data scientist, business intelligence, ML engineer).
Why Python? Freely usable, including for commercial purposes. Image source: edureka.co
Python is popular
Python ranks first in the list of most in-demand skills (source: https://towardsdatascience.com)
Ways to run Python code: web apps and APIs; executables (py2exe, pyinstaller, cx_freeze); the interpreter; calls from within other programming languages.
Python is easy to use (more concise) versus C: #include <stdio.h> int main() { printf("Hello World!"); return 0; }
Why Python? Digital talent: for data professionals, Python is a high-demand skill. https://www.japan.go.jp/abenomics/_userdata/abenomics/pdf/society_5.0.pdf
Python Is Used at YouTube: "Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube's APIs is primarily written in Python, and it serves millions of requests per second!" (Dylan Trotter, YouTube engineer, 2017) https://opensource.googleblog.com/2017/01/grumpy-go-running-python.html
Python Is Used at Quora: "We decided that Python was fast enough for most of what we need to do (since we push our performance-critical code to backend servers written in C++ whenever possible). As far as typechecking, we ended up writing very thorough unit tests which are worth writing anyway, and achieve most of the same goals." (Adam D'Angelo, CEO of Quora, 2014) https://www.quora.com/Why-did-Quora-choose-Python-for-its-development
Python Is Used in Several Industries
Applying Python in Data Science Projects:
Data Exploration: scraping, crawling, data mining; coding, queries.
Data Pre-Processing: feature selection, descriptive statistics, class balancing, data visualization; feature transformation (categorical encoding, binning).
Data Cleansing: handling missing values, removing duplicate rows; data formatting, handling outliers.
Data Modeling: training on data with machine learning algorithms; performing classification, regression, prediction, clustering.
Getting Started with Python: Python is an interpreted language, which shortens the edit-test-debug cycle because no compilation step is needed. To run Python, you need a runtime/interpreter environment that executes the code. Interactive mode (for example IPython): every command you type is interpreted and executed immediately, so you see the result right away. Script mode: you put a set of Python code in a .py file and the program is run line by line (.py file, then Python interpreter, then output).
The IPython Concept: REPL Environment. Read: reads the code. Eval: evaluates (executes) the code. Print: displays the result (output). Loop: repeats the Read-Eval-Print steps.
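The four REPL steps can be mimicked with a few lines of Python. This is only a toy sketch, not how IPython is implemented: the function name `tiny_repl` is hypothetical, it handles expressions only (no statements), and it reads from a list of strings instead of the keyboard.

```python
def tiny_repl(lines):
    """Toy Read-Eval-Print Loop over a list of expression strings."""
    namespace = {}   # names shared between iterations
    printed = []
    for source in lines:                     # Loop: repeat for every input line
        expression = source.strip()          # Read: take the next line of code
        value = eval(expression, namespace)  # Eval: execute the expression
        printed.append(repr(value))          # Print: show the result ...
        print(printed[-1])                   # ... the way a console would
    return printed


outputs = tiny_repl(["1 + 2", "'data'.upper()", "len([1, 2, 3])"])
```

Each string is read, evaluated, and printed before the next one is touched, which is exactly why interactive mode shows results immediately.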
Choosing a Development Environment: pick the development environment that is easiest and most comfortable for you. Anaconda Distribution ( https://www.anaconda.com/distribution/ ): Python, Conda, and more than 1,000 data science libraries. Miniconda ( https://docs.conda.io/en/latest/miniconda.html ): Python interpreter and Conda. Jupyter Notebook ( https://jupyter.org/ ). Python installer ( https://www.python.org/downloads/ ). Google Colaboratory ( https://colab.research.google.com/ ). Azure Notebooks ( https://notebooks.azure.com/ ).
Anaconda Distribution: Anaconda Navigator is a dashboard application included in the Anaconda Distribution package.
Jupyter Notebook: A web-based interactive programming environment that supports many programming languages, including Python. Widely used by researchers and academics for mathematical modeling, machine learning, statistical analysis, and teaching programming.
Jupyter Notebook: Cells can be written as Code (algorithms and mathematical formulas), Markdown/Heading (descriptive text, explanations of the code), or Raw NBConvert (content for conversion to other formats). The result is shown immediately after you choose Run.
Google Colaboratory: Cells can be written as Code (algorithms and mathematical formulas) or Text (descriptive text, explanations of the code). Available at https://colab.research.google.com/ ; the result is shown immediately after you choose Run.
Working with Git: Git is an open-source tool that makes it easier to work on projects both small and large ( https://git-scm.com/ ). A file in Git is in one of three main states: modified, staged, or committed. Modified means you have changed the file but have not yet committed it to your database. Staged means you have marked a modified file in its current version to go into the next commit. Committed means the data is safely stored in your local database.
Working with Git:
Initialize: git init
Commit: git commit -m "first commit"
Rename branch: git branch -M main
Add remote: git remote add origin https://github.com/[user]/[repo].git
Push: git push -u origin main
Pull: git pull origin [branch]
Hello World! In C: #include <stdio.h> int main() { printf("Hello World!"); return 0; } In Python: print("Hello World!") The Python version is simpler: no curly braces {..} and no semicolons ;
Software Development: Declarations (kamus): all the variables and data structures used in the program. Algorithm: the sequence of instructions that achieves the program's goal.
Python Data Types: float (real numbers), int (integers), str (strings, text), bool (True or False).
In [1]: height = 1.84
In [2]: tall = True
Problem: with many inputs of the same kind, one variable per value is inconvenient.
In [3]: height1 = 1.84
In [4]: height2 = 1.79
In [5]: height3 = 1.82
In [6]: height4 = 1.90
Solution: a Python list.
Python List: [a, b, c] is a collection of values, and it can contain several different data types.
In [7]: [1.84, 1.79, 1.82, 1.90, 1.80]
Out[7]: [1.84, 1.79, 1.82, 1.9, 1.8]
In [8]: height = [1.84, 1.79, 1.82, 1.90, 1.80]
In [9]: height
Out[9]: [1.84, 1.79, 1.82, 1.9, 1.8]
In [10]: famz = ["Abe", 1.84, "Beb", 1.79, "Cory", 1.82, "Dad", 1.90]
In [11]: famz
Out[11]: ['Abe', 1.84, 'Beb', 1.79, 'Cory', 1.82, 'Dad', 1.9]
Each name can also be grouped with its height in a sublist: ["Abe", 1.84], ["Beb", 1.79], ["Cory", 1.82], ["Dad", 1.90]
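As a small sketch of the types and the list-of-sublists idea from the slides above, the snippet below checks a few types with `type()` and groups each name with its height. The variable `count` and the use of `max()` with a `key` function are additions for illustration, not part of the slides.

```python
height = 1.84                      # float
tall = True                        # bool
name = "Abe"                       # str
count = 4                          # int (hypothetical extra variable)
heights = [1.84, 1.79, 1.82, 1.90, 1.80]

# type() reports the type of each value
type_names = [type(v).__name__ for v in (height, tall, name, count, heights)]
print(type_names)

# A list of sublists keeps each name next to its height
famz = [["Abe", 1.84], ["Beb", 1.79], ["Cory", 1.82], ["Dad", 1.90]]
tallest = max(famz, key=lambda pair: pair[1])   # compare by the height entry
print(tallest)
```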
First Python program: The first thing to note is that a line of Python code beginning with a hash symbol (#) is treated as a comment and is not executed.
Syntax error message: A mistake in writing a Python command yields an error message when the command is executed.
Semantic error: If there is a mistake in the program logic but no syntax error, Python will not report it. E.g., suppose the intention is to print 'Hello Python 101' but we accidentally write 'Hello Python 102'; Python won't complain even though the code is wrong.
Variables: A variable stores a value, and every value has a type. A value is put into a variable by assignment, using the = sign: the variable is written on the left of = and the value on the right. The value can come from a mathematical expression, a constant, or the result of a function.
Python types (use the type() function to get the type of an expression, variable, or constant):
int: 11, -14, 0, 2
float: 21.3201, 0.0, 0.8, -2.34
str: "Hello Python 101"
bool: True, False
Expressions Note that / and // are different
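A minimal sketch of the difference between `/` and `//` mentioned above (the numbers are arbitrary examples): `/` is true division and always gives a float, while `//` is floor division and rounds down toward negative infinity.

```python
true_division = 7 / 2      # 3.5
floor_division = 7 // 2    # 3
negative_floor = -7 // 2   # -4, because floor division rounds down
remainder = 7 % 2          # 1, the remainder that pairs with //
even_division = 8 / 2      # 4.0, still a float

print(true_division, floor_division, negative_floor, remainder, even_division)
```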
Expressions with arithmetic operators and variables (mathematical expressions)
Expressions with assignment operators and variables
Other mathematical operations
print(____): assignment, variable, value
Errors: location of the error, type of the error, comment
String slicing: You can get a substring by slicing. s[m:n] is the substring of s taken from the character at index m up to the character at index n-1. If position n is not after position m, the result of the slice is an empty string. Index positions for "Semarang Tawang":
character:       S    e    m    a    r    a    n    g  (space)  T    a    w    a    n    g
index:           0    1    2    3    4    5    6    7    8      9   10   11   12   13   14
negative index: -15  -14  -13  -12  -11  -10   -9   -8   -7     -6   -5   -4   -3   -2   -1
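The slicing rule can be tried out on the same string. This is a small sketch; the chosen index pairs are arbitrary examples.

```python
s = "Semarang Tawang"

first_word = s[0:8]     # indexes 0..7
second_word = s[9:15]   # indexes 9..14
from_the_end = s[-6:]   # the last six characters
middle = s[3:6]         # indexes 3, 4, 5
empty = s[5:2]          # the end is not after the start, so nothing is taken

print(first_word, second_word, from_the_end, middle, repr(empty))
```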
String concatenation: We use + to concatenate strings.
String replication: Multiplying a string by an integer (using *) yields a new string containing repeated copies of the original string.
Strings are immutable! You cannot change individual characters of a string, but you can reassign the variable to a new string.
String: escape sequences. Certain characters are difficult to type directly (e.g., for the print() function), so we prefix them with a backslash \ to use them, e.g., newline, tab, and the backslash itself. Printing a backslash without escaping it can be done with raw string notation (the r prefix).
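The four string slides above can be condensed into one short sketch. The example strings, including the Windows-style path, are made up for illustration.

```python
city = "Semarang"
station = city + " " + "Tawang"   # concatenation with +
echo = "ha" * 3                   # replication with *

# Strings are immutable: item assignment raises TypeError
try:
    city[0] = "s"
    changed_in_place = True
except TypeError:
    changed_in_place = False

city = "s" + city[1:]             # reassigning the variable is fine

two_lines = "line1\nline2"        # \n is a single newline character
raw_path = r"C:\new\table"        # raw string: backslashes stay as typed

print(station, echo, city)
print(two_lines)
print(raw_path)
```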
Special characters and their meaning:
\b: ASCII Backspace (BS)
\r: ASCII Carriage Return (CR)
\ooo: character with octal value ooo
\xhh: character with hex value hh
String methods: upper(). upper() returns an uppercase version of the string. With A = "Surabaya is the city of heroes", B = A.upper() is "SURABAYA IS THE CITY OF HEROES".
String: replace a substring with another. replace() returns a new string with the given substring replaced by a new substring. With A = "Jakarta is the largest", replacing "largest" with "most crowded" gives B = "Jakarta is the most crowded".
String: find the occurrence of a substring. find() returns the index of the first occurrence of the given substring. In "Semarang Tawang" the indexes run from 0 (the first S) to 14 (the final g), with the space at index 8.
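A short sketch of the three string methods from the slides above, using the same example sentences. The search terms passed to `find()` are chosen for illustration.

```python
A = "Surabaya is the city of heroes"
B = A.upper()                       # new string; A itself is unchanged

C = "Jakarta is the largest"
D = C.replace("largest", "most crowded")

s = "Semarang Tawang"
first_ang = s.find("ang")           # first occurrence only
tawang_at = s.find("Tawang")
missing = s.find("Solo")            # -1 means "not found"

print(B)
print(D)
print(first_ang, tawang_at, missing)
```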
Lists: Accessing elements. Elements of a list can be accessed by index in the same way as tuples or strings. For L = ['Surabaya', 3.4, 1293] the indexes are 0, 1, 2 (or -3, -2, -1 counting from the end).
Lists are mutable: Unlike tuples or strings, elements of a list can be changed.
Lists are mutable: If a list is referenced by two variables, changing its elements via one variable makes the change visible through the other variable. Suppose L and LL refer to the same list: if we change L[1], the change appears in LL too.
List slicing: Slicing works as it does for tuples, but returns a new list (not sharing a reference with the old one). If Lst35 is a slice that includes the element at Lst[3], then changing Lst[3] afterwards does NOT change Lst35.
List slicing: Thus slicing can be used to copy a list. If Lst_c is a copy of Lst made by slicing, then changing Lst[0] does NOT change Lst_c.
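The difference between sharing a reference and copying by slicing can be sketched as follows. The names `L`, `LL`, `Lst`, `Lst35`, and `Lst_c` follow the slides; the list contents for `Lst` are made-up values.

```python
# Two names, one list: a change through L is visible through LL
L = ["Surabaya", 3.4, 1293]
LL = L
L[1] = 9.9

# A slice is a new list, so later changes to Lst do not reach it
Lst = [10, 20, 30, 40, 50, 60]
Lst35 = Lst[3:5]        # elements at indexes 3 and 4
Lst[3] = -1

# A full slice copies the whole list
Lst_c = Lst[:]
Lst[0] = 0

print(LL, Lst35, Lst_c, Lst)
```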
List: checking the contents of a list, adding to a list
Add, remove
Sort, Reverse, Max, Min
Copy List
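The slide titles above (check, add, remove, sort, reverse, max, min, copy) map onto standard list operations. The snippet below is a sketch with made-up data; which calls the original slides actually showed is not known.

```python
cities = ["Bogor", "Depok"]
has_bogor = "Bogor" in cities   # check the contents
cities.append("Jakarta")        # add at the end
cities.insert(0, "Bandung")     # add at a given index
cities.remove("Depok")          # remove by value
last = cities.pop()             # remove (and return) the last element

numbers = [3, 1, 2]
numbers.sort()                  # in place: [1, 2, 3]
numbers.reverse()               # in place: [3, 2, 1]
largest, smallest = max(numbers), min(numbers)

backup = numbers.copy()         # independent copy
numbers.append(0)               # does not affect backup

print(cities, last, numbers, backup, largest, smallest)
```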
Convert string to list: To convert a string of words separated by spaces into a list of words, we use the string method split(). If we want, we can specify which character is used as the separator/delimiter instead of spaces.
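A brief sketch of `split()` with and without an explicit delimiter. The example strings are made up.

```python
sentence = "Python is easy to use"
words = sentence.split()            # split on whitespace

csv_line = "Bandung,Bogor,Depok"
cities = csv_line.split(",")        # split on a chosen delimiter

print(words)
print(cities)
```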
More list methods … See https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range Or you can type: help(list)
Tuples: A tuple is an ordered sequence, written as comma-separated items within parentheses. Its length is the number of elements it contains.
Tuple access & slicing: We can access and slice tuples like strings. For ('Alice', 85, 3.8, 'USA', 25) the indexes are 0, 1, 2, 3, 4.
Tuple nesting: Tuples can be nested: a tuple may contain other tuples.
nt = (5, 7, ('Java', 'Bali'), (2, 11), ('Madura', (5, 7)))
nt[2] is ('Java', 'Bali'): nt[2][0] is 'Java' and nt[2][1] is 'Bali'
nt[3] is (2, 11): nt[3][0] is 2 and nt[3][1] is 11
nt[4] is ('Madura', (5, 7)): nt[4][0] is 'Madura' and nt[4][1] is (5, 7)
One level deeper: nt[2][1][0] is 'B', nt[2][1][1] is 'a', nt[4][1][0] is 5, nt[4][1][1] is 7
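The nested-tuple indexing can be checked directly. This sketch reuses the tuples from the slides; the name `student` is an assumption for the unnamed example tuple.

```python
student = ("Alice", 85, 3.8, "USA", 25)
score_and_gpa = student[1:3]      # slicing a tuple gives a tuple
last_item = student[-1]

nt = (5, 7, ("Java", "Bali"), (2, 11), ("Madura", (5, 7)))
islands = nt[2]                   # ('Java', 'Bali')
first_letter = nt[2][1][0]        # 'Bali' -> 'B'
inner_number = nt[4][1][1]        # (5, 7) -> 7

print(len(student), score_and_gpa, last_item)
print(len(nt), islands, first_letter, inner_number)
```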
Sets: an unordered, mutable collection of unique elements. Like lists and tuples, a set can store elements of different Python types. Unlike lists and tuples, a set does not record element positions and contains only unique elements: duplicates are not stored. Elements of a set must be hashable, e.g., immutable objects, or immutable containers/collections whose elements are all immutable objects. So a set cannot contain lists or other sets.
Creating a set: write the elements inside curly braces, or use the set() function. An empty set can only be created with the set() function, not with curly braces: {} is not a set!
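A small sketch of the set-creation rules above. The element values are illustrative.

```python
cities = {"Bandung", "Bogor", "Depok", "Bogor"}   # the duplicate is dropped
letters = set("banana")                           # set() accepts any iterable

empty_set = set()      # the only way to write an empty set
empty_braces = {}      # this is an empty dictionary, not a set

# Lists are not hashable, so they cannot be set elements
try:
    bad = {[1, 2], [3]}
    list_allowed = True
except TypeError:
    list_allowed = False

tuples_ok = {(1, 2), (3, 4)}   # tuples of immutable items are fine

print(len(cities), sorted(letters), type(empty_set), type(empty_braces))
```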
Set: adding elements. Adding "Jakarta" to {"Bandung", "Bogor", "Depok"} gives {"Bandung", "Bogor", "Depok", "Jakarta"}. Adding the same element again doesn't change the set.
Set: removing elements. Removing "Jakarta" from {"Bandung", "Bogor", "Depok", "Jakarta"} gives {"Bandung", "Bogor", "Depok"}.
Set: check if the set contains the given element, e.g., whether a city is in {"Bandung", "Bogor", "Depok"}.
Set intersection: the intersection of {"Bogor", "Depok", "Jakarta"} and {"Bogor", "Depok", "Bandung"} is {"Bogor", "Depok"}.
Set union: the union of {"Bogor", "Depok", "Jakarta"} and {"Bogor", "Depok", "Bandung"} is {"Bogor", "Depok", "Jakarta", "Bandung"}.
Subset test: {"Bogor", "Depok"} is a subset of {"Bogor", "Depok", "Jakarta"}, but {"Bogor", "Depok", "Jakarta"} is not a subset of {"Bogor", "Depok"}.
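The set operations on these slides can be sketched with the same city names. The operators `&`, `|`, and `<=` are used here; the slides may equally have used the method forms `intersection()`, `union()`, and `issubset()`.

```python
cities = {"Bandung", "Bogor", "Depok"}
cities.add("Jakarta")
cities.add("Jakarta")            # adding again changes nothing
size_after_add = len(cities)
cities.remove("Jakarta")
has_bogor = "Bogor" in cities    # membership test

a = {"Bogor", "Depok", "Jakarta"}
b = {"Bogor", "Depok", "Bandung"}
common = a & b                   # intersection
combined = a | b                 # union

small = {"Bogor", "Depok"}
small_in_a = small <= a          # subset test
a_in_small = a <= small

print(sorted(cities), sorted(common), sorted(combined), small_in_a, a_in_small)
```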
More set methods … See https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
Dictionaries: a dictionary is a collection of key-value pairs. Each pair consists of a key followed by a value, separated by a colon. A dictionary's keys behave like a set: the keys are unique and must be hashable (typically immutable), and values are looked up by key rather than by position (since Python 3.7 a dictionary does preserve insertion order). Values may be immutable or mutable, and may be duplicated. A dictionary is written as a pair of curly brackets in which each key-value pair is separated from the others by a comma.
Dictionary: analogy with list. A dictionary is like a list, except that arbitrary immutable objects are used as indexes instead of integers. List: index 0 holds Element1, index 1 holds Element2, index 2 holds Element3, index 3 holds Element4, and so on. Dictionary: Key 1 holds Value 1, Key 2 holds Value 2, Key 3 holds Value 3, Key 4 holds Value 4, and so on. A key is an index by label.
Dictionary: check if an element is in the dictionary. Example dictionary: {"Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5, "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4}
Dictionary: get all keys using the keys() method. For the same example dictionary, keys() returns a list-like object containing the dictionary's keys, which can be converted to a list or another collection.
Dictionary: get all values using the values() method. The values() method is similar, but returns all values in the dictionary.
Dictionary: consists of a collection of keys and values.
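A minimal sketch of the dictionary operations above, using the example data from the slides. The slides do not say what the numbers measure, so the variable name `figures` is deliberately neutral.

```python
cities = {
    "Jakarta": 10.1, "Bogor": 1.1, "Depok": 1.8, "Bandung": 2.5,
    "Medan": 2.2, "Makassar": 1.4, "Denpasar": 0.9, "Ambon": 0.4,
}

has_bogor = "Bogor" in cities        # membership is tested on the keys
has_solo = "Solo" in cities

names = list(cities.keys())          # keys() converted to a list
figures = list(cities.values())      # values() converted to a list
bandung = cities["Bandung"]          # look up a value by its key

print(has_bogor, has_solo, names, figures, bandung)
```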
More dictionary methods … See https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Python List:
In [1]: height = [1.84, 1.79, 1.82, 1.90, 1.80]
In [2]: height
Out[2]: [1.84, 1.79, 1.82, 1.9, 1.8]
In [3]: weight = [66.5, 60.3, 64.7, 89.5, 69.8]
In [4]: weight
Out[4]: [66.5, 60.3, 64.7, 89.5, 69.8]
In [5]: weight / height ** 2
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
Problem: arithmetic operators do not work element by element on Python lists.
Solution: NumPy, the basic library for scientific computing with Python ( https://numpy.org/ ). An alternative to the Python list: the NumPy array, for n dimensions. Easy to use and open source. If the library is not yet installed, install it with: pip install numpy. Then import it: import numpy as np
In [6]: import numpy as np
In [7]: np_height = np.array(height)
In [8]: np_height
Out[8]: array([1.84, 1.79, 1.82, 1.9, 1.8])
In [9]: np_weight = np.array(weight)
In [10]: np_weight
Out[10]: array([66.5, 60.3, 64.7, 89.5, 69.8])
In [11]: bmi = np_weight / np_height ** 2
In [12]: bmi
Out[12]: array([19.64201323, 18.81963734, 19.53266514, 24.79224377, 21.54320988])
Digital data: Data to be processed comes in many forms and formats: documents, images, video, audio, numbers, or text. When such data is processed, it is not read raw as video or audio; it is first transformed into an array or matrix of numbers. An array with at least two dimensions forms a matrix and can be handled with NumPy. After import numpy as np, typing np.<TAB> in an interactive session lists what the library offers.
NumPy can also be used to create n-dimensional arrays (ndarray = n-dimensional array):
In [13]: import numpy as np
In [14]: np_height = np.array([1.84, 1.79, 1.82, 1.9, 1.8])
In [15]: np_weight = np.array([66.5, 60.3, 64.7, 89.5, 69.8])
In [16]: type(np_height)
Out[16]: numpy.ndarray
In [17]: type(np_weight)
Out[17]: numpy.ndarray
In [18]: np_2d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
In [19]: np_2d
Out[19]: array([[ 1,  2,  3,  4,  5],
                [ 6,  7,  8,  9, 10]])
In [20]: np_2d.shape
Out[20]: (2, 5)
This is an array with 2 rows and 5 columns, i.e., the matrix M with rows (1 2 3 4 5) and (6 7 8 9 10).
SciPy (pronounced "Sigh Pie") is an open-source library available at https://www.scipy.org/ . SciPy is built to work with NumPy arrays and provides a collection of numerical algorithms, including signal processing, optimization, and statistics; it is commonly used together with the Matplotlib library for data visualization. SciPy offers more specialized numerical routines than NumPy. If the library is not yet installed, install it with: pip install scipy
SciPy: to use it, write: from scipy import module
Handling Sparse Data
Spatial Data
Pandas (Panel Data) is a popular Python library for data structures and data analysis. It is open source and available at https://pandas.pydata.org/ . Pandas is closely tied to NumPy. If the library is not yet installed, install it with: pip install pandas. Then import it: import pandas as pd. Data wrangling / data munging: reshaping (changing the shape of data), joining (combining data), splitting (separating data), time-series analysis (periodic data). Data cleansing: cleaning incomplete or erroneous data, handling outliers, removing duplicate data.
There are 2 data objects: Series and DataFrame. A Series holds 1-dimensional data, comparable to:
In [13]: np.array([1, 2, 3, 4, 5])
Out[13]: array([1, 2, 3, 4, 5])
A DataFrame holds 2-dimensional (tabular) data, comparable to:
In [14]: np.array([[1, 2], [3, 4]])
Out[14]: array([[1, 2], [3, 4]])
Data representation (columns are features/attributes, rows are samples):
     Negara     Populasi  Area    Ibukota
IN   Indonesia  250       123456  Jakarta
MA   Malaysia   25        3456    KL
SI   Singapura  15        456     Singapura
JP   Jepang     60        5678    Tokyo
TH   Thailand   45        678     Bangkok
Pandas can import data from many formats: comma-separated values (CSV), text files, Microsoft Excel, SQL databases, and the HDF5 format. Download the dataset: http://bit.ly/TabDataset . The CSV file Tab.csv contains:
,Negara,Populasi,Area,Ibukota
IN,Indonesia,250,123456,Jakarta
MA,Malaysia,25,3456,KL
SI,Singapura,15,456,Singapura
JP,Jepang,60,5678,Tokyo
TH,Thailand,45,678,Bangkok
and is loaded into the DataFrame shown in the table above:
In [1]: Tab = ... # table declaration
In [2]: Tab
In [3]: import pandas as pd
In [4]: Tab = pd.read_csv("Tab.csv")
In [5]: Tab
In [6]: Tab["Negara"] # access a column
In [7]: Tab.Ibukota # access a column
Matplotlib is a Python library for two-dimensional data visualization. It is open source and available at https://matplotlib.org/ . Matplotlib works closely with NumPy and Pandas. If the library is not yet installed, install it with: pip install matplotlib. Then import it: import matplotlib.pyplot as plt. Example chart types: bar chart, line chart, scatter plot.
In [1]: import matplotlib.pyplot as plt
In [2]: year = [1980, 1990, 2000, 2010, 2020]
In [3]: price = [2.5, 7.6, 9.7, 15.8, 22.9]
In [4]: plt.plot(year, price)
In [5]: plt.show()
In [6]: plt.scatter(year, price)
In [7]: plt.bar(year, price)
Seaborn is a Python data visualization library (similar to Matplotlib) that provides a high-level interface for drawing attractive and informative statistical graphics. It is open source and available at https://seaborn.pydata.org/ . If the library is not yet installed, install it with: pip install seaborn. Then import it: import seaborn as sns. Example chart types: heatmap, line chart, scatter plot.
Scikit-learn is a library for practicing machine learning and building models. It is open source and available at https://scikit-learn.org/ . Scikit-learn grew out of the SciPy (Scientific Python) project, which contains mathematical functions. If the library is not yet installed, install it with: pip install scikit-learn. Then import it: import sklearn. Classification: Support Vector Machines, Decision Tree, Random Forest, Neural Network, Nearest Neighbors. Clustering: K-Means Clustering, Hierarchical Clustering. Model selection: cross validation, metrics.
Comparison: Greater than, Less than, etc. because 7 is greater than 6 because 6 is greater than or equal to 6 because 1 is less than 6, i.e., NOT greater than or equal to 6
Comparison: Greater than, Less than, etc. because 1 is less than 6 because 1 is not equal to 6 because 1 is equal to 1
Comparing strings Two strings (or generally, sequences) A and B are the same if their length is the same and for each position i , A[ i ] is equal to B[ i ].
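The comparison slides above describe results whose code was not preserved. The sketch below reconstructs plausible expressions from those descriptions; the exact expressions on the original slides are an assumption.

```python
greater = 7 > 6              # True: 7 is greater than 6
greater_or_equal = 6 >= 6    # True: 6 is equal to 6
not_greater = 1 >= 6         # False: 1 is less than 6
less = 1 < 6                 # True
not_equal = 1 != 6           # True
equal = 1 == 1               # True

# Strings are equal only if they have the same length and
# the same character at every position (case matters)
same_text = "Tawang" == "Tawang"
different_case = "Tawang" == "tawang"
different_length = "Tawang" == "Tawang "

print(greater, greater_or_equal, not_greater, less, not_equal, equal)
print(same_text, different_case, different_length)
```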
The if Statement: Suppose entrance to a tourist attraction is only given to those whose age is at most 12 years. E.g., if age is 13, 14, or more, then entrance is not granted and the person just moves on; if age is 12, 11, or less, then entrance is granted and (s)he moves on after enjoying the ride. We can write this branching using an if statement:
if CONDITION:
    do_something_only_when_condition_holds
everyone_do_something
After the if-condition, the statements that we want to execute only when the if-condition is true MUST be written with an indentation!
The if Statement: Indented statements are executed when the if-condition evaluates to true. Because 11 is less than or equal to 12, the if-condition holds and the indented statement is executed. Because 13 is more than 12, the indented statement is NOT executed. The statement after the indented block is executed regardless of whether the if-condition is true or not.
The if-else statement: choosing one of two alternatives. When the if-condition is True (11 is less than or equal to 12), the if branch is executed and the else branch is NOT executed. When the if-condition is False (13 is more than 12), the if branch is NOT executed and the else branch is executed. The statement after the if-else block is executed regardless of the if-condition.
Branching (IF), the elif statement: When the if-condition is True (11 is less than or equal to 12), the if branch is executed, the elif branch is NOT executed, and the statement after the block is always executed.
The elif statement: one of several alternatives. When the if-condition is False (13 is more than 12) and the elif-condition is True (13 is equal to 13), the if branch is NOT executed, the elif branch is executed, and the statement after the block is always executed.
The elif statement: one of several alternatives. When the if-condition is False (15 is more than 12) and the elif-condition is also False (15 is not equal to 13), neither the if branch nor the elif branch is executed; the statement after the block is always executed.
The if, else, and elif statements: Generally, we may have the following form: an if statement, followed by zero or more elif statements, then followed by zero or one else statement. The if statement and all of the elif statements have a Boolean condition. The Boolean conditions are tested in turn from top to bottom until one condition is found to be true. Once a true condition is found, the indented statements under the if/elif statement whose condition is true are executed, and then we exit the whole if-elif-else block. If no condition is found to be true and there is an else statement, then the indented statements under it are executed, and after that we exit the whole if-elif-else block. If there is no else statement, we directly exit the if-elif-else block.
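The entrance example can be sketched as a small function with all three branch types. The ages 12 and 13 come from the slides; the function name `entrance` and the returned messages are hypothetical.

```python
def entrance(age):
    """Return a message for the tourist-attraction example (messages are made up)."""
    if age <= 12:
        message = "enjoy the ride"          # only when the if-condition holds
    elif age == 13:
        message = "special case for 13"     # tested only if the if-condition failed
    else:
        message = "no entrance"             # runs when no condition was true
    # This line is reached regardless of which branch ran
    return message + ", then move on"


results = [entrance(age) for age in (11, 13, 15)]
print(results)
```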
Logic operator: or
A      B      A or B
False  False  False
False  True   True
True   False  True
True   True   True
Logic operator: or. The condition is true when pub_year is a number from {…, 1978, 1979, 1990, 1991, …}. The branch was executed because pub_year is 1990.
Logic operator: and
A      B      A and B
False  False  False
False  True   False
True   False  False
True   True   True
Logic operator: and. The condition is true when pub_year is a number from {1980, 1981, …, 1989}. The branch was executed because pub_year is 1985.
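The two pub_year conditions can be written out as follows. The code on the original slides was not preserved, so the exact comparisons (`< 1980`, `> 1989`, and so on) and the function names are assumptions that match the stated sets of years.

```python
def outside_eighties(pub_year):
    # True for {..., 1978, 1979, 1990, 1991, ...}
    return pub_year < 1980 or pub_year > 1989


def in_eighties(pub_year):
    # True for {1980, 1981, ..., 1989}
    return pub_year >= 1980 and pub_year <= 1989


# Truth tables generated from the operators themselves
or_table = [(a, b, a or b) for a in (False, True) for b in (False, True)]
and_table = [(a, b, a and b) for a in (False, True) for b in (False, True)]

print(outside_eighties(1990), in_eighties(1985))
print(or_table)
print(and_table)
```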
More on branching and conditions … https://docs.python.org/3/library/stdtypes.html#truth-value-testing https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not https://docs.python.org/3/library/stdtypes.html#comparisons https://docs.python.org/3/reference/expressions.html#comparisons https://docs.python.org/3/reference/expressions.html#boolean-operations https://docs.python.org/3/reference/expressions.html#conditional-expressions https://docs.python.org/3/reference/compound_stmts.html#the-if-statement
Outline Loops Functions Reading and Writing Files Pandas Numpy Arrays
The range object: We create a range object by calling its constructor range(). A range object is an immutable sequence of numbers (following a strict, specific pattern) and is very useful for loops/iteration. It is similar to a tuple, but a range cannot be written as an explicit enumeration and does not support concatenation or repetition (multiplication). To write it as an explicit enumeration, convert it to a list or a tuple first.
The range object constructor:
range(stop): stop must be a positive integer, otherwise the range object is empty. Generates the integers 0, 1, …, stop - 1; length = stop.
range(start, stop): start and stop must be integers, and start < stop must hold, otherwise the range object is empty. Generates the integers start, start + 1, …, stop - 1; length = stop - start.
range(start, stop, step): start and stop must be integers and step must be a nonzero integer. If step > 0, then start < stop must hold, otherwise the range object is empty; if step < 0, then start > stop must hold, otherwise the range object is empty. Generates the integers start, start + step, start + 2*step, …, start + k*step, where k is the largest integer for which start + k*step has not yet reached stop (stop itself is never included).
The range object
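The constructor rules can be checked by converting each range to a list. The arguments below are arbitrary examples chosen to show that the stop value is always excluded.

```python
up_to_five = list(range(5))            # 0 .. 4
two_to_five = list(range(2, 6))        # 2 .. 5, stop is excluded
by_threes = list(range(0, 10, 3))      # 0, 3, 6, 9
by_fives = list(range(0, 10, 5))       # 10 is the stop, so it is left out
countdown = list(range(10, 0, -4))     # negative step counts down
empty = list(range(5, 2))              # start is not below stop

length = len(range(2, 6))              # stop - start
as_tuple = tuple(range(3))

print(up_to_five, two_to_five, by_threes, by_fives, countdown, empty)
```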
More on range object See https://docs.python.org/3/library/stdtypes.html#range
Loops: for statements. Suppose we are given a list of strings of color names, e.g., ["red", "orange", "green", "purple", "blue"]. We wish to replace each color name with "black", resulting in ["black", "black", "black", "black", "black"]. How do we do that?
colors = ["red", "orange", "green", "purple", "blue"] has indexes 0 to 4. A running index goes from 0 to 4, and each index value is used to perform one color name change; range(0, 5) returns the sequence 0, 1, 2, 3, 4.
index 0: colors[0] = "black" gives colors = ["black", "orange", "green", "purple", "blue"]
index 1: colors[1] = "black" gives colors = ["black", "black", "green", "purple", "blue"]
index 2: colors[2] = "black" gives colors = ["black", "black", "black", "purple", "blue"]
index 3: colors[3] = "black" gives colors = ["black", "black", "black", "black", "blue"]
index 4: colors[4] = "black" gives colors = ["black", "black", "black", "black", "black"]
Loop: for statement If we just want to use the value of each element in the list, but not changing the list at all, we can even iterate on the elements directly (not using index) Directly iterate on elements of the list color takes value of “red”, then “orange”, then “green”, and so on
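Both looping styles can be sketched side by side. The color list is the one from the slides; the third loop is an added illustration of why the index is needed when the list itself must change.

```python
# Style 1: loop over indexes to change the list in place
colors = ["red", "orange", "green", "purple", "blue"]
for i in range(0, 5):
    colors[i] = "black"

# Style 2: loop over the elements directly when only reading them
names = ["red", "orange", "green"]
shouted = []
for color in names:
    shouted.append(color.upper())

# Assigning to the loop variable does not modify the list
unchanged = ["red", "orange"]
for color in unchanged:
    color = "black"

print(colors, shouted, unchanged)
```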
More for examples
Nested Loop
Loop: for statement. We can also retrieve the index and the corresponding value simultaneously using the enumerate() function. enumerate() takes the list as its argument and returns a sequence of pairs. Each pair consists of an index and the corresponding element, and each iteration uses one pair at a time.
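A short sketch of `enumerate()`, followed by a nested loop as named on an earlier slide. The size labels in the nested loop are made up for illustration.

```python
colors = ["red", "orange", "green", "purple", "blue"]

# enumerate() yields (index, element) pairs, one per iteration
labels = []
for index, color in enumerate(colors):
    labels.append(f"{index}: {color}")

# Nested loop: the inner loop runs completely for each outer value
pairs = []
for size in ["S", "M"]:            # hypothetical size labels
    for color in colors[:2]:
        pairs.append((size, color))

print(labels)
print(pairs)
```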
Loop: while statement. A for-loop runs a fixed number of iterations; the number of iterations is determined by the sequence given in the loop header. A while-loop runs as long as the loop condition remains True. The loop stops/terminates as soon as the loop condition becomes False for the first time.
While example
Loop: while statement. Suppose we are given a list of color names (whose length and content are unknown), e.g., orange, blue, purple, etc. Starting from the left, we want to copy all occurrences of "orange" to a new list, but stop copying once a different color is found. We don't know how many "orange" color names are in the initial part of the list, or whether there is any "orange" color name at all. How do we do it?
Suppose we copy from the list colors to oranges. If colors is ["orange", "orange", "blue", "orange", "orange", "orange"], then oranges is ["orange", "orange"]. If colors is ["green", "orange", "orange"], then oranges is []. We start from the left and check if the current color is orange. If so, add it to the list oranges. Otherwise, stop copying.
colors = ["orange", "orange", "blue", "orange", "orange", "orange"], starting with oranges = []
colors[0] == "orange", so add "orange" to oranges: oranges = ["orange"]
colors[1] == "orange", so add "orange" to oranges: oranges = ["orange", "orange"]
colors[2] != "orange", so stop the whole repetition
The while statement does not change the running index i automatically, so we need to first initialize it (line 3) and increment it at every iteration (line 7).
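The copying task can be sketched as follows. This is one possible version, not necessarily the code the line numbers above refer to. The extra check i < len(colors) is an assumption added here so the loop also stops safely when every element is "orange".

```python
colors = ["orange", "orange", "blue", "orange", "orange", "orange"]
oranges = []

i = 0  # the running index must be initialized by hand
# Test the bound first so colors[i] is never read past the end of the list
while i < len(colors) and colors[i] == "orange":
    oranges.append(colors[i])
    i += 1  # and incremented by hand at every iteration

print(oranges)  # ['orange', 'orange']
```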
More on loops … https://docs.python.org/3/reference/compound_stmts.html#the-while-statement https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
Outline Loops Functions Reading and Writing Files Pandas Numpy Arrays
Functions. A function takes some input and produces some output or makes some change(s). It is simply a piece of code that can be reused. You can define your own functions or, more often, use functions written by other people. You just need to know how the function works (what its input and output are) and, sometimes, how to import the function into your program.
When similar pieces of code are repeated, the repeated parts can be separated out into a function, which makes the main code shorter. The main code sends a value to the function and receives the function's return value.
Python built-in function example: len(). In len(Q), Q is a sequence (e.g., string, tuple, list, range) or a collection (e.g., set, dictionary). It returns the length of Q, i.e., the number of elements it contains.
Python built-in function example: sum(). In sum(Q), Q is an iterable (e.g., tuple, list, set) containing numbers (or anything that can be summed). It returns the sum of all elements in Q.
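A brief sketch of both built-ins on hypothetical values:

```python
numbers = [4, 8, 15]
word = "python"

print(len(numbers))  # 3, the number of elements in the list
print(len(word))     # 6, the number of characters in the string

total = sum(numbers)
print(total)         # 27
```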
Python built-in function example: sorted() and list.sort(). In sorted(Q), Q is an iterable (list, tuple, etc.). It returns a new list containing the elements of Q in sorted order. The elements must be things that can be compared with each other, e.g., numbers or strings. Q itself does not change after calling sorted().
If Q is a list and we want Q itself to be sorted, use the list method sort(), i.e., call Q.sort(). Calling Q.sort() does not return a new list (it returns None); instead, Q itself is changed by the method call.
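The difference between the two can be seen in a short sketch, using a hypothetical list q:

```python
q = [3, 1, 2]

new_list = sorted(q)   # returns a new sorted list
q_after_sorted = list(q)  # copy of q, taken to show that q is unchanged
print(new_list, q)     # [1, 2, 3] [3, 1, 2]

result = q.sort()      # sorts q in place and returns None
print(result, q)       # None [1, 2, 3]
```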
Making functions. Start with the keyword def. Use a descriptive name: here add1, since we want to return the input plus 1. The function input is given as variables called formal parameters, written inside parentheses. The body is an indented block of statements. It can begin with a documentation string; calling help(add1) will display this string. Return the function result using the keyword return. After defining the function, we can call it.
Function: how does it work? Consider the statement c = add1(7). Function add1 is called with the argument value 7, and 7 is passed into the function as the parameter a. The statements in the function run with the parameter variable a replaced by 7, so b = a + 1 becomes b = 7 + 1 = 8. The function returns the value of b, and the assignment is executed, so c becomes 8.
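Putting the pieces together, add1 could look like the sketch below. The exact wording of the documentation string is an assumption.

```python
def add1(a):
    """Return the input plus 1."""
    b = a + 1
    return b

c = add1(7)  # a is 7 inside the function, b becomes 8, and 8 is returned
print(c)     # 8
```

Calling help(add1) in an interactive session would show the documentation string.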
Function: variable scoping. All variables defined inside a function, including parameter variables, are called local variables. Values are assigned to local variables only when the function is called, and the variables are gone after the function exits. In the next function call, those local variables are defined again from scratch, possibly with new values.
Function: variable scoping. When line 5 is executed, add1(a) is called with a assigned to 7. Variable b becomes 8, and then 8 is returned. Variable c is now assigned 8, and the values of a and b are gone from memory. Then line 6 is executed. The process is similar, but now with a different value for variable a. Variable b is defined again, gets a different value, and is returned.
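A sketch of two consecutive calls, assuming the second call uses the argument 10 and stores its result in a variable named d (both are hypothetical; the original listing is not preserved). The try/except at the end is an addition that checks that the local variable b does not exist outside the function.

```python
def add1(a):
    b = a + 1  # a and b are local to this call
    return b

c = add1(7)   # inside the call: a = 7, b = 8
d = add1(10)  # a and b are created again: a = 10, b = 11

try:
    b  # b only existed inside add1
    local_b_visible = True
except NameError:
    local_b_visible = False

print(c, d, local_b_visible)  # 8 11 False
```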
A function can have multiple parameters. The function my_mult multiplies its two arguments. It looks fine when given numbers. Careful: my_mult doesn't raise an error when arg1 is an integer and arg2 is a string, because multiplying a string by an integer repeats the string. Is this the intended behavior of my_mult? More testing is needed to make sure the function behaves as intended.
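A minimal sketch of my_mult, assuming its body is a single multiplication of arg1 and arg2:

```python
def my_mult(arg1, arg2):
    return arg1 * arg2

print(my_mult(2, 3))     # 6
print(my_mult(2, "ab"))  # abab, since the string is repeated twice
```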
The return statement is optional. During a function's execution, Python exits the function when a return statement is executed, or when there are no more statements to execute. So a function can omit the return statement, e.g., when we simply want to print something or make some changes to an object or data. If a function exits without going through a return statement, Python (silently) returns None to the caller. A function body cannot be empty: a trivial function that only returns None has a pass statement as its only statement. pass means "do nothing".
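Both cases can be sketched with two hypothetical functions, greet and do_nothing:

```python
def greet(name):
    print("Hello,", name)  # no return statement

def do_nothing():
    pass  # placeholder statement; the body cannot be empty

result = greet("Ani")
print(result)        # None
print(do_nothing())  # None
```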
A function can do multiple tasks. This function prints a statement and returns its argument plus 2.
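The original function is not preserved in the text; one possible version, with the hypothetical name add2_and_report, is:

```python
def add2_and_report(x):
    print("Adding 2 to", x)  # task 1: print a statement
    return x + 2             # task 2: return the argument plus 2

value = add2_and_report(5)
print(value)  # 7
```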
A function can contain all kinds of statements, such as loops.
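As an illustration, the hypothetical function count_matches below uses a for loop and an if statement in its body:

```python
def count_matches(items, target):
    """Count how many elements of items are equal to target."""
    count = 0
    for item in items:
        if item == target:
            count += 1
    return count

print(count_matches(["orange", "blue", "orange"], "orange"))  # 2
```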
Collecting arguments. We can define a function that receives a varying number of arguments, i.e., its definition is not fixed to one argument or two arguments.
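In Python this is done with a starred parameter, which collects all positional arguments into a tuple. The function name total_of and parameter name numbers are hypothetical.

```python
def total_of(*numbers):
    # numbers is a tuple holding every positional argument passed in
    total = 0
    for n in numbers:
        total += n
    return total

print(total_of())         # 0
print(total_of(1, 2, 3))  # 6
```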
Lambda A lambda function is a small anonymous function. A lambda function can take any number of arguments, but can only have one expression.
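A short sketch with hypothetical names. Assigning a lambda to a variable is done here only for demonstration; lambdas are more commonly passed directly as arguments, as in the sorted() call.

```python
add1 = lambda a: a + 1         # one argument, one expression
multiply = lambda x, y: x * y  # two arguments, still one expression

print(add1(7))         # 8
print(multiply(3, 4))  # 12

words = ["banana", "fig", "apple"]
by_length = sorted(words, key=lambda w: len(w))  # sort by word length
print(by_length)       # ['fig', 'apple', 'banana']
```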
Outline Loops Functions Reading and Writing Files Pandas Numpy Arrays
Working with files. So far, we have worked only with data defined in our Jupyter Notebook. Now we will learn how to work with text files in our system. A file in our filesystem is opened using the open() function.
The open() function. Given a file in the file system (content of line 1, content of line 2, content of line 3, …), open() returns a file object. We can read the content of the file and write into it by calling various methods of the file object.
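A minimal sketch of writing and then reading a text file. The file name example.txt and the temporary directory are assumptions made so that the example is self-contained, and the with statement (not covered above) is used so that the file is closed automatically.

```python
import os
import tempfile

# Hypothetical file location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Mode "w" opens the file for writing (creating it if needed)
with open(path, "w") as f:
    f.write("Content of line 1\n")
    f.write("Content of line 2\n")

# Mode "r" opens the file for reading
with open(path, "r") as f:
    lines = f.readlines()  # list of lines, each ending with "\n"

print(lines)  # ['Content of line 1\n', 'Content of line 2\n']
```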