DATA SCIENCE USING PYTHON
Under the guidance of: R.J SHUKLA (N.I.L.E.T)
Name of student: VIKASH YADAV, ECE 4th Year
Roll No.: 2105250310075
Department of Electronics & Communication Engineering
BUDDHA INSTITUTE OF TECHNOLOGY, GIDA, GORAKHPUR
CONTENTS
1. Introduction
2. What is Data Science?
3. Why is Data Science required?
4. Why Python for Data Science?
5. Advantages & Disadvantages of Python
6. Python Libraries for Data Science
What is Data Science?
Data Science = Computer Science + Mathematics/Statistics + Visualization
Why Data Science?
Data is generated from many different sources, such as:
- Financial logs
- Text files
- Multimedia forms
- Audio files
- Video files
- Sensors
- Instruments
[Figure: total data stored]
Why data science ?
Data Science process
Who is a Data Scientist ?
Why Python for Data Science?
- Interpreted
- Intuitive and minimalistic code
- Expressive language
- Dynamically typed
- Automatic memory management
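The language features listed above can be illustrated with a short sketch. It uses only the standard library, and the sensor readings are hypothetical values chosen for illustration.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
print(type(value).__name__)   # int
value = "forty-two"
print(type(value).__name__)   # str

# Expressive, minimalistic code: a list comprehension replaces an explicit loop.
readings = [3.2, 4.8, 5.1, 2.9]   # hypothetical sensor readings
above_three = [r for r in readings if r > 3.0]
print(above_three)            # [3.2, 4.8, 5.1]

# Automatic memory management: objects are reclaimed by the interpreter
# once nothing refers to them, so there is no manual allocate/free step.
readings = None
```

Because Python is interpreted, these lines can also be typed one at a time into an interactive session and inspected immediately.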
WHY PYTHON FOR DATA SCIENCE?
Advantages
- Ease of programming
- Minimizes the time needed to develop and maintain code
- Modular and object-oriented
- Large community of users
- Large standard library and many user-contributed libraries
Disadvantages
- Interpreted, and therefore slower than compiled languages
- Decentralized, with functionality spread across many packages
Code Performance vs Development Time
Python Libraries for data science
Some popular Python libraries are:
- NumPy
- SciPy
- Pandas
- Scikit-Learn
Visualization libraries:
- Matplotlib
- Seaborn
All these libraries are installed on the SCC.
Python Libraries for data science
NumPy:
- Introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects.
- Provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance.
- Many other Python libraries are built on NumPy.
Link: http://www.numpy.org/
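A minimal sketch of what vectorization looks like in practice. The temperature values are hypothetical; the point is that arithmetic is applied to the whole array at once, without an explicit Python loop.

```python
import numpy as np

# Hypothetical data: two rows of temperatures in degrees Celsius.
celsius = np.array([[20.0, 22.5, 19.0],
                    [25.0, 27.5, 24.0]])

# Vectorized arithmetic: the formula is applied to every element at once.
fahrenheit = celsius * 9.0 / 5.0 + 32.0

# Statistical operation along an axis: the mean of each row.
row_means = celsius.mean(axis=1)

print(fahrenheit)
print(row_means)
```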
Python Libraries for data science
SciPy:
- Collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics and more
- Part of the SciPy Stack
- Built on NumPy
Link: https://www.scipy.org/scipylib/
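As a small illustration of two of the areas listed above, the sketch below integrates sin(x) numerically and minimizes a simple quadratic. The functions being integrated and minimized are arbitrary choices made for this example.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)
print(area)

# Optimization: find the x that minimizes (x - 2)^2; the minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print(result.x)
```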
PYTHON LIBRARIES FOR DATA SCIENCE
PANDAS: Panel Data System
- Pandas is an open source, BSD-licensed library.
- High-performance, easy-to-use data structures.
- Provides data analysis and data manipulation tools (reshaping, merging, sorting, slicing, aggregation etc.)
- Allows handling of missing data.
Link: http://pandas.pydata.org/
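The sketch below walks through a few of the operations named above (missing-data handling, aggregation, sorting, merging) on a tiny table. The column names and values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value.
sales = pd.DataFrame({
    "city": ["Gorakhpur", "Lucknow", "Gorakhpur", "Lucknow"],
    "units": [10, np.nan, 7, 5],
})

# Handling missing data: treat the missing entry as zero units.
sales["units"] = sales["units"].fillna(0)

# Aggregation and sorting: total units per city, largest first.
totals = sales.groupby("city", as_index=False)["units"].sum()
totals = totals.sort_values("units", ascending=False)

# Merging: attach a (hypothetical) zone label to each city.
zones = pd.DataFrame({"city": ["Gorakhpur", "Lucknow"],
                      "zone": ["East", "Central"]})
merged = totals.merge(zones, on="city")
print(merged)
```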
Python Libraries for data science
Scikit-Learn:
- Provides machine learning algorithms: classification, regression, clustering, model validation etc.
- Built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/
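A minimal classification and model-validation sketch, assuming the iris dataset bundled with scikit-learn and a k-nearest-neighbours classifier. Both are choices made for this example rather than anything prescribed by the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Classification: fit the model and measure accuracy on the held-out data.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)

# Model validation: 5-fold cross-validation on the full dataset.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(cv_scores.mean())
```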
PYTHON LIBRARIES FOR DATA SCIENCE
MATPLOTLIB:
- Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats
- A set of functionalities similar to those of MATLAB
- Line plots, scatter plots, bar charts, histograms, pie charts etc.
- Relatively low-level; some effort is needed to create advanced visualizations
Link: https://matplotlib.org/
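A short sketch of two of the plot types mentioned above, a line plot and a histogram, saved to a PNG file. The data is generated for illustration, the output file name is arbitrary, and the non-interactive "Agg" backend is selected only so the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)          # seeded generator for repeatable sample data

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot of sin(x) with axis labels and a legend.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.legend()

# Histogram of 500 normally distributed samples.
ax_hist.hist(rng.normal(size=500), bins=20)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

fig.tight_layout()
output_path = os.path.join(tempfile.gettempdir(), "matplotlib_example.png")
fig.savefig(output_path, dpi=150)
```

Even this small figure needs explicit calls for labels, legend and layout, which is what "relatively low-level" means in practice.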
PYTHON LIBRARIES FOR DATA SCIENCE
SEABORN:
- Based on matplotlib
- Provides a high-level interface for drawing attractive statistical graphics
- Similar (in style) to the popular ggplot2 library in R
Link: https://seaborn.pydata.org/
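To show what the high-level interface looks like, the sketch below draws a grouped scatter plot from a small table in a single call. The data frame and its column names are hypothetical, and the "Agg" backend is assumed only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed here so no display is needed
import pandas as pd
import seaborn as sns

# Hypothetical data: study hours and exam scores for two groups of students.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["A", "B", "A", "B", "A", "B"],
})

sns.set_theme(style="whitegrid")

# One call maps columns to the axes and to colour; seaborn returns a matplotlib Axes.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Score vs. study hours")
```

Since the returned object is a regular matplotlib Axes, it can be customized further with matplotlib methods.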
Loading Python Libraries
# Import Python libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns