XII IP Ch 1 Python Pandas - I Series.pdf

Python Pandas - I

Pandas
•Pandas is a software library written for the Python
programming language and used for data analysis.

•It is a fast, powerful, flexible and easy-to-use
open-source tool for data analysis and manipulation.

•The main author of Pandas is Wes McKinney.

Key features of Pandas
•It can read and write data in many different formats.
•It can perform calculations in both ways tabular data is
organized, i.e. across rows and down columns.
•It can easily select subsets of data from large datasets,
combine multiple datasets, and find and fill missing
data.
•It allows operations to be applied to independent groups
within the data.
•It supports reshaping data into different forms.
•It supports advanced time-series functionality.
•It supports visualization by working with the matplotlib and
seaborn libraries.

Installing Pandas
•To install the pandas library, the
following command is used:

pip install pandas

Working in Pandas
•To work with pandas, we need to import
the pandas library into the Python
environment.

import pandas as pd

Data Structures in Pandas
•A data structure is a way of storing and
organizing data in a computer so that it
can be accessed and worked with in an
appropriate way.
•Pandas provides three data structures:
–1D Series
–2D DataFrame
–3D Panel (deprecated in pandas 0.20 and removed in pandas 0.25)

Series
•A Series is a one-dimensional data
structure that can hold data of any
type (int, string, float, etc.); its
values are homogeneous (of the same type).
•It contains
– A sequence of values (the actual data)
– Associated data labels, called the index

DataFrame
•It is a data structure that stores data in
two-dimensional (tabular) form.
•Different columns may store values of different
data types.
•All values in a single column have the same
data type.

Series vs. DataFrame

Property        Series                          DataFrame
Dimension       1-dimensional                   2-dimensional
Type of data    Homogeneous: all values are     Heterogeneous: columns can have
                of the same data type           different data types
Mutability      Value mutable                   Value mutable and size mutable

Example (Series):
0 47
1 58
2 69
3 85
4 74

Example (DataFrame):
     Name  Class  Marks
0   Arpit    XII     47
1     Jai    XII     58
2  Piyush    XII     69
3  Aditya    XII     85
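The two examples in the comparison can be rebuilt in code to see the difference in dimensions. This is a minimal sketch, assuming the DataFrame is created from a dictionary of lists (DataFrame creation is covered separately); the names `marks` and `df` are illustrative.

```python
import pandas as pd

# Series: a single column of values with the default index 0..4
marks = pd.Series([47, 58, 69, 85, 74])

# DataFrame: several columns, each of which may hold a different data type
df = pd.DataFrame({
    'Name': ['Arpit', 'Jai', 'Piyush', 'Aditya'],
    'Class': ['XII', 'XII', 'XII', 'XII'],
    'Marks': [47, 58, 69, 85],
})

print(marks.ndim)   # 1
print(df.ndim)      # 2
print(df.shape)     # (4, 3)  -> 4 rows, 3 columns
print(df)
```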

Series

Creating Empty Series
import pandas as pd
s=pd.Series(dtype='float64')   # explicit dtype; some older pandas versions warn when it is omitted
print(s)

Method 1
Creating Series using List/Tuple
import pandas as pd
S=pd.Series([47,58,69,85,74])
print(S)

0 47
1 58
2 69
3 85
4 74
dtype: int64

import pandas as pd
A=[47,58,69,85,74]
S=pd.Series(A)
print(S)
0 47
1 58
2 69
3 85
4 74
dtype: int64

Mentioning Index while creating Series
import pandas as pd
S=pd.Series([47,58,69,85,74], ['A', 'B', 'C', 'D', 'E'])
print(S)

A 47
B 58
C 69
D 85
E 74
dtype: int64

Method 2
S=pd.Series([47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])
Method 3
S=pd.Series(data=[47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])
Method 4
S=pd.Series(index=['A', 'B', 'C', 'D', 'E'], data=[47,58,69,85,74])
Note: 1. Tuples can be used in place of lists.
2. If no index is given, the default index (0 to n-1, where n is the number of values) is used.
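The note above can be checked directly: the different ways of passing data and index build the same Series, tuples work like lists, and omitting the index gives the default 0 to n-1. A minimal sketch; the variable names are illustrative, and `Series.equals()` is used here only to compare the results.

```python
import pandas as pd

data = [47, 58, 69, 85, 74]
labels = ['A', 'B', 'C', 'D', 'E']

s1 = pd.Series(data, labels)                            # index passed positionally
s2 = pd.Series(data, index=labels)                      # Method 2
s3 = pd.Series(data=data, index=labels)                 # Method 3
s4 = pd.Series(index=labels, data=data)                 # Method 4
s_tuple = pd.Series(tuple(data), index=tuple(labels))   # tuples instead of lists
s_default = pd.Series(data)                             # no index given

# equals() is True only when both values and index match
print(s1.equals(s2) and s2.equals(s3) and s3.equals(s4) and s4.equals(s_tuple))  # True
print(s_default.index.tolist())   # [0, 1, 2, 3, 4]
```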

Method 5
Creating Series using a Dictionary
(the keys become the index and the values become the data)
import pandas as pd
S=pd.Series({'A':47 , 'B':58 , 'C':69 , 'D':85 , 'E':74})
print(S)

A 47
B 58
C 69
D 85
E 74
dtype: int64

Q1. WAP to create the given Series.

import pandas as pd
marks=[20,22,25,18,23,22]
names=['Abhinav','Udit','Ansh','Jai','Kunal','Arushi']
S=pd.Series(marks, index=names)
print(S)

Abhinav 20
Udit 22
Ansh 25
Jai 18
Kunal 23
Arushi 22
dtype: int64

Q2. WAP to create the given Series using a Dictionary.

import pandas as pd
d={'Q1':50000,'Q2':47000,'Q3':52500,'Q4':36000}
s=pd.Series(d)
print(s)

Q1 50000
Q2 47000
Q3 52500
Q4 36000
dtype: int64

Method 6
Creating Series using range() function
import pandas as pd
s=pd.Series(range(101,106), index=range(1,6))
print(s)
1 101
2 102
3 103
4 104
5 105
dtype: int64

Method 7
Creating Series using Numpy Array (ndarray)
import numpy as np
import pandas as pd
n=np.array([2,4,6,8])
s=pd.Series(n)
print(s)

0 2
1 4
2 6
3 8
dtype: int64

Method 8
Creating Series using Scalar/ Constant value
import pandas as pd
s=pd.Series(55, index=[1,2,3,4,5])
print(s)
1 55
2 55
3 55
4 55
5 55
dtype: int64

Using mixed datatypes while creating
Series
•A Series has a single data type (dtype).
•If the values given at the time of creating a Series are
of different types, the Series takes the type that comes
first in this order of precedence:
•String (object) > float > int

import pandas as pd
s = pd.Series([10,20,25.6,30,40])
print(s)

0 10.0
1 20.0
2 25.6
3 30.0
4 40.0
dtype: float64
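The precedence rule can be seen by checking the `dtype` of three small Series. A minimal sketch with made-up values; note that when a string is present the dtype becomes `object`, and each element keeps its own Python type rather than being converted to a string.

```python
import pandas as pd

s_int = pd.Series([10, 20, 30])           # only int          -> int64
s_float = pd.Series([10, 20, 25.6])       # int and float     -> float64
s_obj = pd.Series([10, 25.6, 'Delhi'])    # contains a string -> object

print(s_int.dtype, s_float.dtype, s_obj.dtype)   # int64 float64 object
print(s_obj[0] + 5)   # 15: the int element was not turned into a string
```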

Specifying the Datatype while creating
the Series

import pandas as pd
days=[31,28,31,30,31]
mon=['Jan','Feb','Mar','Apr','May']
s=pd.Series(data=days, index=mon, dtype=float)
print(s)

Jan 31.0
Feb 28.0
Mar 31.0
Apr 30.0
May 31.0
dtype: float64

Specifying missing values
•In Python, a missing value can be written as
None; pandas displays it as NaN (Not a Number).
•If None is added among integer values, the datatype
of the Series changes to float, because NaN is a float value.

import pandas as pd
s=pd.Series([10,20,None,40,None])
print(s)

0 10.0
1 20.0
2 NaN
3 40.0
4 NaN
dtype: float64
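NumPy's `np.nan` can be used in place of `None`, and `isna()` marks which entries are missing. A minimal sketch with the same values as above; `isna()` is not covered in these slides and is shown only as a convenient check.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, np.nan, 40, None])   # np.nan and None both become NaN

print(s.dtype)            # float64
print(s.isna().tolist())  # [False, False, True, False, True]
print(s.isna().sum())     # 2 (number of missing values)
```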

Specifying duplicate indexes
•While creating a Series object, the index values
need not be unique.
•There can be duplicate entries in the index.

import pandas as pd
A=[10,20,30,40,50,60]
B=[1,2,3,1,3,5]
s=pd.Series(A,index=B)
print(s)
1 10
2 20
3 30
1 40
3 50
5 60
dtype: int64
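With duplicate labels, selecting a repeated label returns every matching entry, while a unique label returns a single value. A minimal sketch using the Series above; `index.is_unique` and `.loc` are standard pandas, and `.loc` is used so the lookup is clearly label-based.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=[1, 2, 3, 1, 3, 5])

print(s.index.is_unique)   # False
print(s.loc[1])            # label 1 appears twice, so a Series of two values (10 and 40)
print(s.loc[5])            # label 5 is unique, so a single value: 60
```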

Specifying data/ indexes using a loop

import pandas as pd
s=pd.Series(range(1,15,3), index=[x for x in 'abcde'])
print(s)

a 1
b 4
c 7
d 10
e 13
dtype: int64

Getting the number of rows and the count of non-NA
values in a Series
•The len() function gives the number
of rows in a Series, including NaN values.
•The count() method counts only the non-NaN
values.

import pandas as pd
s=pd.Series([10,20,None,40,None])
print(len(s))       # 5
print(s.count())    # 3

Accessing Data from Series
•Data can be accessed from a Series using the
user-defined labels or the in-built (positional) indexes.
•Indexing retrieves a single value; slicing retrieves
a part of the Series.

1. In-built index (0,1,2…)
S = pd.Series([10,20,30,40,50])

Indexing: in-built index, +ve only
print(S[0])        # 10
print(S[-1])       # Error (KeyError, -1 is not a label)

Slicing: +ve and -ve positions
print(S[0:2])
0 10
1 20
print(S[-3:-1])
2 30
3 40

2. User-defined index (numeric)
S = pd.Series([10,20,30,40,50], index=[3,4,5,6,7])

Indexing: user-defined labels only
print(S[3])        # 10
print(S[0])        # Error (KeyError)
print(S[-1])       # Error (KeyError)

Slicing: +ve and -ve positions
print(S[0:2])
3 10
4 20
print(S[-3:-1])
5 30
6 40

3. User-defined index (text)
S = pd.Series([10,20,30,40,50], index=['A', 'B', 'C', 'D', 'E'])

Indexing: user-defined labels, +ve and -ve positions
print(S[0])        # 10
print(S[-1])       # 50
print(S['D'])      # 40
Note: newer pandas versions warn that using positions inside [ ] on a
labelled Series is deprecated; S.iloc[0] and S.iloc[-1] are preferred.

Slicing: user-defined labels, +ve and -ve positions
print(S[0:2])
A 10
B 20
print(S[-3:-1])
C 30
D 40
print(S['B':'D'])  # a label slice includes the end label
B 20
C 30
D 40

Attributes of Series
Attribute   Description
index       the index (row labels)
values      the data as a NumPy array
dtype       the data type of the values
shape       a tuple giving the dimensions, e.g. (4,)
nbytes      the number of bytes used by the data
ndim        the number of dimensions (1 for a Series)
size        the number of elements, including NaN values
hasnans     True if there are any NaN values, else False
empty       True if the Series is empty, else False
name        the name of the Series (can be changed)

Application of Attributes

import pandas as pd
s=pd.Series({'Jan':31 , 'Feb':28 , 'Mar':31 , 'Apr':30 })

print(s.index)      # Index(['Jan', 'Feb', 'Mar', 'Apr'], dtype='object')
print(s.values)     # [31 28 31 30]
print(s.dtype)      # int64
print(s.shape)      # (4,)
print(s.nbytes)     # 32
print(s.ndim)       # 1
print(s.size)       # 4
print(s.hasnans)    # False
print(s.empty)      # False
print(s.name)       # None
s.name='Days'
print(s.name)       # Days
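The example above has no missing values and is not empty, so `hasnans` and `empty` are both False. A minimal sketch with made-up data showing the opposite cases, and how `size` (counts NaN) differs from `count()` (skips NaN):

```python
import pandas as pd

t = pd.Series([10, None, 30])
print(t.hasnans)    # True
print(t.size)       # 3 (NaN is counted)
print(t.count())    # 2 (NaN is skipped)

e = pd.Series(dtype='float64')   # an empty Series
print(e.empty)      # True
print(e.shape)      # (0,)
```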

Operations on Series Object
•Modifying Elements of Series Object
•The head() and tail() Functions
•Vector operations
•Arithmetic
•Filtering Entries
•Sorting Series Values
•Adding & Removing values from Series Object

Modifying Elements of Series Object

Renaming indexes
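A minimal sketch of modifying values and renaming indexes, with made-up data. Values are changed by assigning to a label or a slice; all labels can be replaced by assigning a new list to `index`, and `rename()` with a dictionary changes selected labels and returns a new Series.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

s['B'] = 25            # modify one value by its label
s.iloc[2:4] = 0        # modify a slice of values by position
print(s.tolist())      # [10, 25, 0, 0]

s.index = ['P', 'Q', 'R', 'S']     # replace all labels (the length must match)
print(s.index.tolist())            # ['P', 'Q', 'R', 'S']

s = s.rename({'P': 'First'})       # rename selected labels; returns a new Series
print(s.index.tolist())            # ['First', 'Q', 'R', 'S']
```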

The head() and tail() Functions
•The head() function returns the first n rows and
tail() function returns the last n rows.
•If n is not specified, the default value is 5.

s=pd.Series([10,20,30,40,50,60], index=['A','B','C','D','E','F'])
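Applying head() and tail() to this Series, with and without n, gives the following. A minimal sketch; the comments describe which rows are returned rather than the exact printed layout.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'B', 'C', 'D', 'E', 'F'])

print(s.head())     # first 5 rows: A to E
print(s.head(2))    # first 2 rows: A 10, B 20
print(s.tail(3))    # last 3 rows: D 40, E 50, F 60
print(s.tail())     # last 5 rows: B to F
```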

Vector operations
•When a function or an expression is applied to a
Series, it is applied to each item of the
object individually.
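A minimal sketch of vector operations on a small made-up Series: each expression acts on every element and produces a new Series of the same length. `tolist()` is used only to print plain Python values.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])

print((s + 5).tolist())    # [15, 25, 35]  (5 is added to every element)
print((s * 2).tolist())    # [20, 40, 60]
print((s ** 2).tolist())   # [100, 400, 900]
print((s > 15).tolist())   # [False, True, True]
```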

Arithmetic
•When you perform arithmetic operations on
two Series objects, the data is first aligned on the
basis of matching indexes (called data
alignment), and then the arithmetic is performed.
•For non-overlapping indexes, the arithmetic
operation results in NaN.
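Data alignment can be seen with two Series that share only some labels. A minimal sketch with made-up data; the `add()` method with `fill_value` is an extra not covered in these slides, which treats a missing label as 0 instead of producing NaN.

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
b = pd.Series([1, 2, 3], index=['B', 'C', 'D'])

total = a + b                   # aligned on labels A, B, C, D
print(total.index.tolist())     # ['A', 'B', 'C', 'D']
print(total.tolist())           # [nan, 21.0, 32.0, nan]  (A and D exist in only one Series)

print(a.add(b, fill_value=0).tolist())   # [10.0, 21.0, 32.0, 3.0]
```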

Filtering Entries
A=pd.Series([10,20,30,40,50], index=[11,12,13,14,15])
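•Applying a condition to a Series performs a vector
operation and results in True/False for each entry.
•Using that condition as the subscript of the Series
returns the filtered result, i.e. only the entries
that fulfil the condition.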
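A minimal sketch of filtering the Series A above. The condition `A > 30` is an illustrative choice, not necessarily the one used in the original slide; conditions are combined with `&` (and) or `|` (or), each wrapped in parentheses.

```python
import pandas as pd

A = pd.Series([10, 20, 30, 40, 50], index=[11, 12, 13, 14, 15])

mask = A > 30                    # vector operation: True/False for each entry
print(mask.tolist())             # [False, False, False, True, True]

print(A[A > 30])                 # only the entries at labels 14 and 15
print(A[(A > 15) & (A < 45)].tolist())   # [20, 30, 40]
```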

Sorting Series Values
•You can sort the Series on the basis of values
or indexes.
•To sort on the basis of values
seriesobject.sort_values([ascending=True|False])

•To sort on the basis of indexes
seriesobject.sort_index([ascending=True|False])
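A minimal sketch of both sorting methods on a small made-up Series. Both return a new sorted Series and leave the original unchanged.

```python
import pandas as pd

s = pd.Series([30, 10, 50, 20], index=['C', 'A', 'D', 'B'])

print(s.sort_values().tolist())                  # [10, 20, 30, 50]
print(s.sort_values(ascending=False).tolist())   # [50, 30, 20, 10]
print(s.sort_index().index.tolist())             # ['A', 'B', 'C', 'D']
print(s.sort_index(ascending=False).index.tolist())   # ['D', 'C', 'B', 'A']
print(s.tolist())                                # [30, 10, 50, 20] (original unchanged)
```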

Adding Values in the Series
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})

A['Feb']=29 # modifies the value as index exists
A['May']=31 # adds a value

print(A)

Jan 31
Feb 29
Mar 31
Apr 30
May 31
dtype: int64

Removing Values from the Series
Temporary (drop() returns a new Series; A itself is unchanged)

A.drop('Apr')

Jan 31
Feb 29
Mar 31
May 31
dtype: int64
Permanent (A itself is changed)

A.drop('Apr', inplace=True)
OR
A = A.drop('Apr')
print(A)

Jan 31
Feb 29
Mar 31
May 31
dtype: int64

Viewing Values
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})

•Using user-defined Index
print(A['Mar'])
print(A.loc['Mar'])

•Using in-built Index
print(A[0])        # works, but newer pandas versions warn; A.iloc[0] is preferred
print(A.iloc[0])
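`loc` and `iloc` also accept slices and lists. A minimal sketch on the same Series; note that a `loc` slice includes the end label, while an `iloc` slice excludes the end position, as with Python lists.

```python
import pandas as pd

A = pd.Series({'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30})

print(A.loc['Feb':'Mar'].tolist())      # [28, 31]  label slice, end label included
print(A.iloc[1:3].tolist())             # [28, 31]  position slice, position 3 excluded
print(A.loc[['Jan', 'Apr']].tolist())   # [31, 30]  a list of labels selects several values
print(A.iloc[-1])                       # 30  negative positions work with iloc
```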
