XII IP Ch 1 Python Pandas - I Series.pdf

wecoyi4681, 45 slides, Jul 22, 2024

About This Presentation

Class 12 IP/IT Pandas Module Notes



Python Pandas - I

Pandas
•Pandas is a software library written for the Python programming language, used for data analysis.

•It is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool.

•The main author of Pandas is Wes McKinney.

Key features of Pandas
•It can read and write data in many different formats (for example CSV, Excel and JSON).
•It can perform calculations both across rows and down columns, whichever way the data is organized.
•It can easily select subsets of data from bulky data sets, combine multiple data sets, and find and fill missing data.
•It allows operations to be applied to independent groups within the data.
•It supports reshaping of data into different forms.
•It supports advanced time-series functionality.
•It supports visualization by working with the matplotlib and seaborn libraries.

Installing Pandas
•To install the pandas library, the following command is used:

pip install pandas

Working in Pandas
•To work with pandas, we need to import the pandas library into the Python environment:

import pandas as pd

Data Structures in Pandas
•A data structure is a way of storing and
organizing data in a computer so that it
can be accessed and worked with in an
appropriate way.
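•Pandas provides the following data structures:
–1D Series
–2D DataFrame
–3D Panel (deprecated in pandas 0.20 and removed in pandas 0.25; current versions provide only Series and DataFrame)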

Series
•A Series is a one-dimensional data structure that can hold data of any type (int, string, float, etc.), but all its values are of the same type (homogeneous).
•It contains
– A sequence of values (actual data)
– Associated data labels, called the index

DataFrame
•It is a data structure that stores data in two-dimensional (tabular) form, i.e. in rows and columns.
•Different columns may store values of different data types.
•A single column holds values of the same type.
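A minimal sketch of the DataFrame idea, reusing the Name/Class/Marks sample data from the comparison below: each column has one dtype, but different columns can have different dtypes. The variable name df is only illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column; each column gets a single dtype
df = pd.DataFrame({
    "Name": ["Arpit", "Jai", "Piyush", "Aditya"],
    "Class": ["XII", "XII", "XII", "XII"],
    "Marks": [47, 58, 69, 85],
})

print(df)
print(df.dtypes)   # Name and Class hold strings (object in pandas 2.x), Marks holds integers
```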

Series vs. DataFrame
                 SERIES                           DATAFRAME
Dimension        1-dimensional                    2-dimensional
Type of data     Homogeneous: all values are of   Heterogeneous: can have values of
                 the same data type.              different data types.
Mutable          Value mutable                    Value mutable, size mutable

Example (Series)
0 47
1 58
2 69
3 85
4 74

Example (DataFrame)
Name Class Marks
0 Arpit XII 47
1 Jai XII 58
2 Piyush XII 69
3 Aditya XII 85

Series

Creating Empty Series
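import pandas as pd
s=pd.Series(dtype=object)   # giving a dtype avoids a warning about the default dtype in pandas 1.x
print(s)                    # Series([], dtype: object)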

Method 1
Creating Series using List/Tuple
import pandas as pd
S=pd.Series([47,58,69,85,74])
print(S)


0 47
1 58
2 69
3 85
4 74
dtype: int64

import pandas as pd
A=[47,58,69,85,74]
S=pd.Series(A)
print(S)
0 47
1 58
2 69
3 85
4 74
dtype: int64

Mentioning Index while creating Series
import pandas as pd
S=pd.Series([47,58,69,85,74], ['A', 'B', 'C', 'D', 'E'])
print(S)

A 47
B 58
C 69
D 85
E 74
dtype: int64

Method 2
S=pd.Series([47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])
Method 3
S=pd.Series(data=[47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])
Method 4
S=pd.Series(index=['A', 'B', 'C', 'D', 'E'], data=[47,58,69,85,74])
Note: 1. Tuples can be used in place of lists.
2. If the index is not given, the default index (0, 1, 2, ...) is used.
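A minimal check that Methods 1 to 4 build the same Series, and that tuples work in place of lists; the names s1 to s6, marks and labels are illustrative only.

```python
import pandas as pd

marks = [47, 58, 69, 85, 74]
labels = ['A', 'B', 'C', 'D', 'E']

s1 = pd.Series(marks, labels)                      # Method 1: positional arguments
s2 = pd.Series(marks, index=labels)                # Method 2: index keyword
s3 = pd.Series(data=marks, index=labels)           # Method 3: both keywords
s4 = pd.Series(index=labels, data=marks)           # Method 4: keyword order does not matter
s5 = pd.Series(tuple(marks), index=tuple(labels))  # tuples instead of lists

# equals() compares the values, the index and the dtype
print(all(s1.equals(s) for s in (s2, s3, s4, s5)))   # True

s6 = pd.Series(marks)        # no index given, so the default index is used
print(list(s6.index))        # [0, 1, 2, 3, 4]
```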

Method 5
Creating Series using a Dictionary
S=pd.Series({'A':47 , 'B':58 , 'C':69 , 'D':85 , 'E':74})

A 47
B 58
C 69
D 85
E 74
dtype: int64
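A short sketch of how a dictionary maps onto a Series: the keys become the index. Passing an index as well (standard pandas behaviour, not shown on the slide) selects keys in the given order, and a label that is not a key gets NaN. The label 'Z' is a made-up example.

```python
import pandas as pd

d = {'A': 47, 'B': 58, 'C': 69, 'D': 85, 'E': 74}

s = pd.Series(d)               # keys become the index, values become the data
print(list(s.index))           # ['A', 'B', 'C', 'D', 'E']

# Only the listed keys are used, in the listed order; 'Z' is not a key,
# so its value is NaN and the dtype becomes float64
t = pd.Series(d, index=['C', 'A', 'Z'])
print(t)
```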

Q1. WAP (write a program) to create the given Series.

import pandas as pd
marks=[20,22,25,18,23,22]
names=['Abhinav','Udit','Ansh','Jai','Kunal','Arushi']
S=pd.Series(marks, index=names)
print(S)

Abhinav 20
Udit 22
Ansh 25
Jai 18
Kunal 23
Arushi 22
dtype: int64

Q2. WAP to create the given Series using a Dictionary.

import pandas as pd
d={'Q1':50000,'Q2':47000,'Q3':52500,'Q4':36000}
s=pd.Series(d)
print(s)

Q1 50000
Q2 47000
Q3 52500
Q4 36000
dtype: int64

Method 6
Creating Series using range() function
import pandas as pd
s=pd.Series(range(101,106), index=range(1,6))
print(s)
1 101
2 102
3 103
4 104
5 105
dtype: int64

Method 7
Creating Series using a NumPy Array (ndarray)
import numpy as np
import pandas as pd
n=np.array([2,4,6,8])
s=pd.Series(n)
print(s)

0 2
1 4
2 6
3 8
dtype: int64

Method 8
Creating Series using a Scalar/Constant value
import pandas as pd
s=pd.Series(55, index=[1,2,3,4,5])
print(s)
1 55
2 55
3 55
4 55
5 55
dtype: int64
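A small sketch of the scalar case: the constant is repeated once for every index label, and without an index the result has just one element. The names one and r are illustrative.

```python
import pandas as pd

s = pd.Series(55, index=[1, 2, 3, 4, 5])   # scalar repeated for every label
print(s.tolist())              # [55, 55, 55, 55, 55]

one = pd.Series(55)            # no index: a single element with label 0
print(len(one), one[0])        # 1 55

r = pd.Series(0, index=range(3))   # range() also works as the index
print(r.tolist())              # [0, 0, 0]
```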

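Using mixed datatypes while creating a Series
•A Series has one datatype (dtype) for all its values.
•If the values given at the time of creating a Series are of different types, pandas chooses the type according to this order of precedence:
•String (object) > float > int, i.e. if any value is a string the dtype is object; otherwise, if any value is a float, the dtype is float64.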

import pandas as pd
s = pd.Series([10,20,25.6,30,40])
print(s)

0 10.0
1 20.0
2 25.6
3 30.0
4 40.0
dtype: float64
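A quick sketch of the precedence rule on three small made-up lists: all integers stay int64, one float makes the whole Series float64, and one string makes it object.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])              # only ints -> int64
mixed_num = pd.Series([10, 20, 25.6])       # int + float -> float64
with_text = pd.Series([10, 25.6, 'abc'])    # any string -> object

print(ints.dtype)        # int64
print(mixed_num.dtype)   # float64
print(with_text.dtype)   # object
```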

Specifying the Datatype while creating the Series
import pandas as pd
days=[31,28,31,30,31]
mon=['Jan','Feb','Mar','Apr','May']
s=pd.Series(data=days, index=mon, dtype=float)
print(s)

Jan 31.0
Feb 28.0
Mar 31.0
Apr 30.0
May 31.0
dtype: float64

Specify missing values
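•A missing value can be given with the keyword None (or NumPy's np.nan); pandas displays it as NaN (Not a Number).
•When None is added to a Series of integers, the datatype changes to float (float64), because NaN is a floating-point value.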

import pandas as pd
s=pd.Series([10,20,None,40,None])
print(s)


0 10.0
1 20.0
2 NaN
3 40.0
4 NaN
dtype: float64
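A minimal sketch of finding and filling the missing values in the same Series; note that fillna() returns a new Series and does not change s. The name t is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, None, 40, None])

print(s.isna().tolist())      # [False, False, True, False, True]
print(s.hasnans)              # True
print(s.fillna(0).tolist())   # [10.0, 20.0, 0.0, 40.0, 0.0]

t = pd.Series([10, 20, np.nan])   # np.nan is another way to write a missing value
print(t.dtype)                    # float64
```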

Specifying duplicate indexes
•While creating a Series object, the index labels do not have to be unique.
•There can be duplicate entries in the index.


import pandas as pd
A=[10,20,30,40,50,60]
B=[1,2,3,1,3,5]
s=pd.Series(A,index=B)
print(s)
1 10
2 20
3 30
1 40
3 50
5 60
dtype: int64
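A short sketch of what duplicate labels mean when reading values: a repeated label returns a Series of all matching entries, while a unique label returns a single value. .loc is used here so that the integer labels are clearly treated as labels.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=[1, 2, 3, 1, 3, 5])

print(s.index.is_unique)   # False
print(s.loc[1].tolist())   # [10, 40]: label 1 occurs twice
print(s.loc[2])            # 20: label 2 occurs once
```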

Specifying data/indexes using a loop
import pandas as pd
s=pd.Series(range(1,15,3), index=[x for x in 'abcde'])
print(s)

a 1
b 4
c 7
d 10
e 13
dtype: int64

Getting the number of rows and the count of non-NA values in a Series
•len() returns the number of rows (elements) in a Series, including NaN values.
•count() returns the number of non-NaN values only.
•For example, for s=pd.Series([10,20,None,40,None]):

print(len(s))
5
print(s.count())
3

Accessing Data from Series
•Data can be accessed from a Series using the user-defined labels or the in-built positional indexes (0, 1, 2, ...).
•Indexing returns a single value; slicing returns part of a Series.

In-built index (0, 1, 2, ...)
S = pd.Series([10,20,30,40,50])

Indexing (single value): in-built index, positive only
print(S[0])
10
print(S[-1])
Error (KeyError)

Slicing (part of a Series): positive and negative positions
print(S[0:2])
0 10
1 20
print(S[-3:-1])
2 30
3 40

User-defined index (numeric)
S = pd.Series([10,20,30,40,50], index=[3,4,5,6,7])

Indexing (single value): user-defined labels only
print(S[3])
10
print(S[0])
Error (KeyError)
print(S[-1])
Error (KeyError)

Slicing (part of a Series): positive and negative positions
print(S[0:2])
3 10
4 20
print(S[-3:-1])
5 30
6 40

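User-defined index (text)
S = pd.Series([10,20,30,40,50], index=['A', 'B', 'C', 'D', 'E'])

Indexing (single value): user-defined labels, positive and negative positions
print(S[0])
10
print(S[-1])
50
print(S['D'])
40
Note: positional access with [] on a text index (S[0], S[-1]) is deprecated since pandas 2.1 and gives a FutureWarning; S.iloc[0] and S.iloc[-1] return the same values without the warning.

Slicing (part of a Series): user-defined labels, positive and negative positions
print(S[0:2])
A 10
B 20
print(S[-3:-1])
C 30
D 40
print(S['B':'D'])   (a label slice includes the end label)
B 20
C 30
D 40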
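The rules above depend on the kind of index. A minimal sketch of the unambiguous alternatives, assuming the numeric-index Series from the second case: .loc always uses labels and .iloc always uses positions.

```python
import pandas as pd

S = pd.Series([10, 20, 30, 40, 50], index=[3, 4, 5, 6, 7])

# .loc always uses labels
print(S.loc[3])               # 10
print(S.loc[4:6].tolist())    # [20, 30, 40]: a label slice includes the end label

# .iloc always uses positions (negative positions allowed)
print(S.iloc[0])              # 10
print(S.iloc[-1])             # 50
print(S.iloc[0:2].tolist())   # [10, 20]: a position slice excludes the end
```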

Attributes of Series
Attribute   Description
index       The index (row labels)
values      NumPy representation of the Series
dtype       dtype of the data
shape       A tuple representing the dimensions
nbytes      Number of bytes used by the data
ndim        Number of dimensions
size        Number of elements
hasnans     True if there are any NaN values, else False
empty       True if the Series is empty, else False
name        Name of the Series (None by default; can be changed)

Application of Attributes

import pandas as pd
s=pd.Series({'Jan':31 , 'Feb':28 , 'Mar':31 , 'Apr':30 })

print(s.index)     # Index(['Jan', 'Feb', 'Mar', 'Apr'], dtype='object')
print(s.values)    # [31 28 31 30]
print(s.dtype)     # int64
print(s.shape)     # (4,)
print(s.nbytes)    # 32
print(s.ndim)      # 1
print(s.size)      # 4
print(s.hasnans)   # False
print(s.empty)     # False
print(s.name)      # None
s.name='Days'
print(s.name)      # Days

Operations on Series Object
•Modifying Elements of Series Object
•The head() and tail() Functions
•Vector operations
•Arithmetic
•Filtering Entries
•Sorting Series Values
•Adding & Removing values from Series Object

Modifying Elements of Series Object

Renaming indexes
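The slide examples for these two headings were not recovered. As a sketch under that gap, with made-up sample data, elements can be modified by assigning through a label, a position or a slice, and indexes can be renamed by replacing the whole index or by calling rename() with a dictionary.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# Modifying elements
s['B'] = 25           # by label
s.iloc[0] = 15        # by position
s.loc['C':'D'] = 0    # a label slice can be given a single value
print(s.tolist())     # [15, 25, 0, 0]

# Renaming indexes: replace the whole index (same length required) ...
s.index = ['P', 'Q', 'R', 'S']
# ... or rename selected labels with a dict (returns a new Series)
s2 = s.rename({'P': 'First'})
print(list(s2.index))   # ['First', 'Q', 'R', 'S']
```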

The head() and tail() Functions
•The head() function returns the first n rows and the tail() function returns the last n rows.
•If n is not specified, the default value is 5.

s=pd.Series([10,20,30,40,50,60], index=['A','B','C','D','E','F'])
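A minimal sketch using this six-element Series: with no argument, head() and tail() return 5 rows; with an argument, they return that many.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'B', 'C', 'D', 'E', 'F'])

print(s.head())      # first 5 rows: A to E
print(s.head(3))     # first 3 rows: A 10, B 20, C 30
print(s.tail())      # last 5 rows: B to F
print(s.tail(2))     # last 2 rows: E 50, F 60
```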

Vector operations
•If we apply a function or an expression to a Series, it is applied individually to each item of the object.
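A short sketch with made-up values: each expression below is applied element by element and returns a new Series of the same length.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

print((s + 5).tolist())    # [15, 25, 35, 45]
print((s * 2).tolist())    # [20, 40, 60, 80]
print((s ** 2).tolist())   # [100, 400, 900, 1600]
print((s > 25).tolist())   # [False, False, True, True]
```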

Arithmetic
•When you perform arithmetic operations on two Series objects, the data is first aligned on the basis of matching indexes (called data alignment), and then the arithmetic is performed on each matching pair.
•For non-overlapping indexes, the result of the arithmetic operation is NaN.
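A minimal sketch of data alignment with two made-up Series whose indexes only partly overlap. The add() method with fill_value is a standard pandas alternative that treats a missing side as 0 instead of giving NaN.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = a + b              # values are matched by label, not by position
print(list(total.index))   # ['a', 'b', 'c', 'd']
print(total.tolist())      # [nan, 12.0, 23.0, nan]: 'a' and 'd' appear in only one Series

print(a.add(b, fill_value=0).tolist())   # [1.0, 12.0, 23.0, 30.0]
```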

Filtering Entries
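A=pd.Series([10,20,30,40,50], index=[11,12,13,14,15])
•Applying a condition to the Series (for example, A>30) performs a vector operation and results in a Series of True/False values.
•Using that condition inside [] (for example, A[A>30]) returns the filtered result, i.e. only the entries that fulfil the condition.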
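A minimal sketch of both steps on this Series; the condition A > 30 is only an illustrative choice, since the slide's own condition was not recovered.

```python
import pandas as pd

A = pd.Series([10, 20, 30, 40, 50], index=[11, 12, 13, 14, 15])

mask = A > 30                    # vector comparison: a Series of True/False
print(mask.tolist())             # [False, False, False, True, True]

filtered = A[A > 30]             # keeps only the entries where the mask is True
print(filtered.tolist(), list(filtered.index))   # [40, 50] [14, 15]

# Conditions can be combined with & (and) or | (or); each one needs parentheses
print(A[(A > 10) & (A < 50)].tolist())           # [20, 30, 40]
```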

Sorting Series Values
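•You can sort a Series on the basis of its values or its indexes.
•To sort on the basis of values
seriesobject.sort_values([ascending=True|False])

•To sort on the basis of indexes
seriesobject.sort_index([ascending=True|False])

•ascending is optional and defaults to True; both functions return a new sorted Series and leave the original unchanged.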
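A short sketch with a made-up, unsorted Series showing both functions in both directions.

```python
import pandas as pd

s = pd.Series([30, 10, 50, 20], index=['C', 'A', 'D', 'B'])

print(s.sort_values().tolist())                    # [10, 20, 30, 50]
print(s.sort_values(ascending=False).tolist())     # [50, 30, 20, 10]
print(list(s.sort_index().index))                  # ['A', 'B', 'C', 'D']
print(list(s.sort_index(ascending=False).index))   # ['D', 'C', 'B', 'A']

# The original Series keeps its order
print(list(s.index))                               # ['C', 'A', 'D', 'B']
```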

Adding Values in the Series
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})

A['Feb']=29 # modifies the value as index exists
A['May']=31 # adds a value

print(A)

Jan 31
Feb 29
Mar 31
Apr 30
May 31
dtype: int64

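Removing Values from the Series
Temporary: drop() returns a new Series without the given label; A itself is not changed.

print(A.drop('Apr'))

Jan 31
Feb 29
Mar 31
May 31
dtype: int64

Permanent: the change is stored in A.

A.drop('Apr', inplace=True)
OR
A = A.drop('Apr')
print(A)

Jan 31
Feb 29
Mar 31
May 31
dtype: int64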

Viewing Values
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})

•Using user-defined Index
print(A['Mar'])
print(A.loc['Mar'])

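•Using in-built (positional) index
print(A[0])        # positional access with [] on a text index is deprecated since pandas 2.1
print(A.iloc[0])   # preferred way to access a value by position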