ArhaanSiddiquee
1,059 views
43 slides
Nov 09, 2022
About This Presentation
Pandas Python notes for class 12 as per CBSE board
Size: 2.66 MB
Language: en
Added: Nov 09, 2022
Slides: 43 pages
Slide Content
Informatics Practices
Class XII (As per CBSE Board)
Chapter 1
Data Handling using Pandas -1
New syllabus 2020-21
Visit : python.mykvs.in for regular updates
Python Library – Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is used to:
1. Develop publication-quality plots with just a few lines of code
2. Use interactive figures that can zoom, pan, update...
We can customize and take full control of line styles, font properties, axes properties... as well as export and embed to a number of file formats and interactive environments.
Python Library – Pandas
It is one of the most famous Python packages for data science, offering powerful and flexible data structures that make data analysis and manipulation easy. Pandas makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give us a single, convenient place for data analysis and visualization work.
Basic Features of Pandas
1. The DataFrame object helps a lot in keeping track of our data.
2. With a pandas DataFrame, we can have different data types (float, int, string, datetime, etc.) all in one place.
3. Pandas has built-in functionality for easy grouping and easy joins of data, and for rolling windows.
4. Good IO capabilities; easily pull data from a MySQL database directly into a DataFrame.
5. With pandas, you can use patsy for R-style syntax in doing regressions.
6. Tools for loading data into in-memory data objects from different file formats.
7. Data alignment and integrated handling of missing data.
8. Reshaping and pivoting of datasets.
9. Label-based slicing, indexing and subsetting of large data sets.
Pandas – Installation/Environment Setup
The pandas module doesn't come bundled with standard Python. If we install the Anaconda Python distribution, pandas is installed by default.
Steps for Anaconda installation & use
1. Visit the site https://www.anaconda.com/download/
2. Download the appropriate Anaconda installer.
3. After the download, install it.
4. During installation, check the options for setting the path and for all users.
5. After installation, start the Spyder utility of Anaconda from the Start menu.
6. Type import pandas as pd in the left pane (temp.py).
7. Then run it.
8. If no error is shown, pandas is installed.
9. Like the default temp.py, we can create another .py file for a new program from the new window option of the File menu.
Pandas – Installation/Environment Setup
Pandas can also be installed in a standard Python distribution, using the following steps.
1. If we are using Windows, a service pack must be installed on the computer. If it is not installed, we will not be able to install pandas in the existing standard Python (which is already installed), so install it first (google it).
2. We can check it through the Properties option of the My Computer icon.
3. Now install the latest version (any one above 3.4) of Python.
Pandas – Installation/Environment Setup
4. Now move to the Scripts folder of the Python distribution in the command prompt (through the cmd command of Windows).
5. Execute the following commands in the command prompt serially.
>pip install numpy
>pip install six
>pip install pandas
Wait after each command for the installation to finish.
Now we will be able to use pandas in the standard Python distribution.
6. Type import pandas as pd in the Python (IDLE) shell.
7. If it executes without error, pandas is installed on your system.
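Beyond checking that the import raises no error, a short script can also report which version was installed. This is only a minimal sketch; the version string it prints depends on your system.

```python
import pandas as pd

# __version__ holds the installed pandas version as a string
version = pd.__version__
print("pandas version:", version)
```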
Data Structures in Pandas
Two important data structures of pandas are Series and DataFrame.
1. Series
A Series is a one-dimensional array-like structure with homogeneous data. For example, a Series can be a collection of integers.
Basic features of Series are
Homogeneous data
Size immutable
Values of data mutable
2. DataFrame
A DataFrame is like a two-dimensional array with heterogeneous data.
Basic features of DataFrame are
Heterogeneous data
Size mutable
Data mutable
SR. No.  Admn No  Student Name            Class  Section  Gender  Date Of Birth
1        001284   NIDHI MANDAL            I      A        Girl    07/08/2010
2        001285   SOUMYADIP BHATTACHARYA  I      A        Boy     24/02/2011
3        001286   SHREYAANG SHANDILYA     I      A        Boy     29/12/2010
Pandas Series
It is like a one-dimensional array capable of holding data of any type (integer, string, float, Python objects, etc.). A Series can be created using the constructor.
Syntax: pandas.Series(data, index, dtype, copy)
A Series can be created from
1. Array (ndarray)
2. Dict
3. Scalar value or constant
Pandas Series
Create an Empty Series
e.g.
import pandas as pseries
s = pseries.Series()
print(s)
Output
Series([], dtype: float64)
Note: recent pandas versions report dtype: object for an empty Series.
Pandas Series
Create a Series from ndarray
Without index
e.g.
import pandas as pd1
import numpy as np1
data = np1.array(['a','b','c','d'])
s = pd1.Series(data)
print(s)
Output
0    a
1    b
2    c
3    d
dtype: object
Note: the default index starts from 0.
With index position
e.g.
import pandas as p1
import numpy as np1
data = np1.array(['a','b','c','d'])
s = p1.Series(data, index=[100,101,102,103])
print(s)
Output
100    a
101    b
102    c
103    d
dtype: object
Note: the index starts from 100.
Pandas Series
Create a Series from dict
Eg.1 (without index)
import pandas as pd1
import numpy as np1
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd1.Series(data)
print(s)
Output
a    0.0
b    1.0
c    2.0
dtype: float64
Eg.2 (with index)
import pandas as pd1
import numpy as np1
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd1.Series(data, index=['b','c','d','a'])
print(s)
Output
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
Create a Series from Scalar
e.g.
import pandas as pd1
import numpy as np1
s = pd1.Series(5, index=[0, 1, 2, 3])
print(s)
Output
0    5
1    5
2    5
3    5
dtype: int64
Note: here 5 is repeated 4 times (once per index label).
Pandas Series
Maths operations with Series
e.g.
import pandas as pd1
s = pd1.Series([1,2,3])
t = pd1.Series([1,2,4])
u = s + t  # addition operation
print(u)
u = s * t  # multiplication operation
print(u)
Output
0    2
1    4
2    7
dtype: int64
0     1
1     4
2    12
dtype: int64
Pandas Series
Head function
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s.head(3))
Output
a    1
b    2
c    3
dtype: int64
Returns the first 3 elements.
Pandas Series
Tail function
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s.tail(3))
Output
c    3
d    4
e    5
dtype: int64
Returns the last 3 elements.
Accessing Data from Series with indexing and slicing
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s.iloc[0])  # value at index position 0 (s[0] is deprecated when the index holds labels)
print(s[:3])      # first 3 values
print(s[-3:])     # slicing for the last 3 values
Output
1
a    1
b    2
c    3
dtype: int64
c    3
d    4
e    5
dtype: int64
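Label slices and position slices treat their end point differently, which is easy to miss. The following is a minimal sketch reusing the same Series; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first_by_label = s['a']          # access by label
first_by_position = s.iloc[0]    # access by integer position
label_slice = s['b':'d']         # label slice: the end label 'd' is included
position_slice = s.iloc[1:3]     # position slice: position 3 is excluded

print(first_by_label, first_by_position)
print(label_slice)
print(position_slice)
```

The label slice returns three values (b, c, d) while the position slice returns two (positions 1 and 2).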
Pandas Series
Retrieve Data Using Label (as Index)
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s[['c','d']])
Output
c    3
d    4
dtype: int64
Pandas Series
Retrieve Data from selection
There are three methods for data selection:
loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
ix was deprecated and has been removed from pandas since version 1.0; use loc and iloc instead.
Pandas Series
Retrieve Data from selection
e.g.
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN
dtype: float64
>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
dtype: float64
>>> s.ix[:3] # the integer is in the index, so s.ix[:3] works like loc (pandas versions before 1.0 only)
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
dtype: float64
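The same comparison can be run as a script using only loc and iloc. This is a sketch under the assumption that counting the rows is enough to see the difference; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]   # first three rows, whatever their labels are
by_label = s.loc[:3]       # every row up to and including label 3

print(list(by_position.index))
print(list(by_label.index))
```

Here iloc stops after three rows, while loc keeps going until it reaches the row labelled 3.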
Pandas DataFrame
It is a two-dimensional data structure, just like any table (with rows & columns).
Basic Features of DataFrame
Columns may be of different types
Size can be changed (mutable)
Labeled axes (rows / columns)
Arithmetic operations on rows and columns
It can be created using the constructor
pandas.DataFrame(data, index, columns, dtype, copy)
Pandas DataFrame
Create DataFrame
It can be created from the following
Lists
dict
Series
NumPy ndarrays
Another DataFrame
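Of the sources listed above, NumPy ndarrays and another DataFrame can be sketched as follows. The column names x and y and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy ndarray: each inner row becomes a DataFrame row
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])

# From another DataFrame: copy=True asks for an independent copy
df_from_df = pd.DataFrame(df_from_array, copy=True)
df_from_df.loc[0, 'x'] = 100   # changes the copy only

print(df_from_array)
print(df_from_df)
```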
Create an Empty DataFrame
e.g.
import pandas as pd1
df1 = pd1.DataFrame()
print(df1)
Output
Empty DataFrame
Columns: []
Index: []
Pandas DataFrame
Create a DataFrame from Lists
e.g.1
import pandas as pd1
data1 = [1,2,3,4,5]
df1 = pd1.DataFrame(data1)
print(df1)
Output
   0
0  1
1  2
2  3
3  4
4  5
e.g.2
import pandas as pd1
data1 = [['Freya',10],['Mohak',12],['Dwivedi',13]]
df1 = pd1.DataFrame(data1, columns=['Name','Age'])
print(df1)
Output
      Name  Age
0    Freya   10
1    Mohak   12
2  Dwivedi   13
To store the numeric values as float, convert the Age column after creating the DataFrame (passing dtype=float to the constructor would also try to convert the names):
df1['Age'] = df1['Age'].astype(float)
Pandas DataFrame
Create a DataFrame from Dict of ndarrays / Lists
e.g.1
import pandas as pd1
data1 = {'Name':['Freya','Mohak'],'Age':[9,10]}
df1 = pd1.DataFrame(data1)
print(df1)
Output
    Name  Age
0  Freya    9
1  Mohak   10
Write below as the 3rd statement in the above program for indexing (one label per row):
df1 = pd1.DataFrame(data1, index=['rank1','rank2'])
Pandas DataFrame
Create a DataFrame from List of Dicts
e.g.1
import pandas as pd1
data1 = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}]
df1 = pd1.DataFrame(data1)
print(df1)
Output
   x  y    z
0  1  2  NaN
1  5  4  5.0
Write below as the 3rd statement in the above program for indexing:
df1 = pd1.DataFrame(data1, index=['first','second'])
Pandas DataFrame
Create a DataFrame from Dict of Series
e.g.1
import pandas as pd1
d1 = {'one': pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1)
Output
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
Column selection:
print(df1['one'])
Adding a new column by passing a Series:
df1['three'] = pd1.Series([10,20,30], index=['a','b','c'])
Adding a new column using the existing columns' values:
df1['four'] = df1['one'] + df1['three']
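Put together, the column steps above form the following runnable sketch. It shows how rows without a matching label end up as NaN in the new columns.

```python
import pandas as pd

d1 = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd.DataFrame(d1)

# The new Series has no value for label 'd', so that row gets NaN
df1['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Column arithmetic is element-wise; NaN in either input gives NaN
df1['four'] = df1['one'] + df1['three']

print(df1)
```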
Create a DataFrame from .txt file
Having a text file './inputs/dist.txt' as:
1 1 12.92
1 2 90.75
1 3 60.90
2 1 71.34
Pandas is shipped with built-in reader methods. For example, the pandas.read_table method is a good way to read a tabular data file (also in chunks).
import pandas
# sep=r'\s+' splits the columns on any run of whitespace
df = pandas.read_table('./inputs/dist.txt', sep=r'\s+', names=('A', 'B', 'C'))
This will create a DataFrame object with a column named A made of data of type int64, B of int64 and C of float64.
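To try the reader without creating a file, the same four lines can be held in memory. This is a sketch that assumes io.StringIO as a stand-in for the text file; the resulting column types can then be checked directly.

```python
import io
import pandas as pd

# Stand-in for the text file: the same four lines held in memory
text = "1 1 12.92\n1 2 90.75\n1 3 60.90\n2 1 71.34\n"

# sep=r'\s+' splits the columns on any run of whitespace
df = pd.read_table(io.StringIO(text), sep=r'\s+', names=('A', 'B', 'C'))

print(df)
print(df.dtypes)
```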
Create a DataFrame from a CSV (comma-separated values) file / import data from a CSV file
e.g.
Suppose the file filename.csv contains the following data
Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
import pandas as pd
# Read data from file 'filename.csv'
# (in the same directory as your Python program)
# Control delimiters, rows, column names with read_csv
data = pd.read_csv("filename.csv")
# Preview the first line of the loaded data
data.head(1)
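As a sketch of how read_csv options control the result, the example below reads the same data from memory (io.StringIO stands in for filename.csv) and uses the parse_dates and index_col parameters to turn the Date column into a datetime index.

```python
import io
import pandas as pd

csv_text = (
    'Date,"price","factor_1","factor_2"\n'
    '2012-06-11,1600.20,1.255,1.548\n'
    '2012-06-12,1610.02,1.258,1.554\n'
)

# parse_dates converts the Date column to datetime values;
# index_col makes it the row index
data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

print(data.head(1))
```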
Pandas DataFrame
Column addition
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
c = [7, 8, 9]
df['C'] = c
Column deletion (df1 is the DataFrame with columns 'one' and 'two' created earlier)
del df1['one']  # deleting the first column using the del statement
df1.pop('two')  # deleting another column using the pop() method
Rename columns
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6
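The three column operations can be combined on one DataFrame, as in this minimal sketch. Note that rename returns a new DataFrame, while del and pop change the original in place.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

df['C'] = [7, 8, 9]     # add a column from a list
popped = df.pop('B')    # remove 'B' and get it back as a Series
del df['C']             # remove 'C' in place

# rename returns a new DataFrame; df itself keeps the old names
renamed = df.rename(columns={"A": "a"})

print(list(df.columns))
print(list(renamed.columns))
```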
Pandas DataFrame
Iterate over rows in a DataFrame
e.g.
import pandas as pd1
import numpy as np1
raw_data1 = {'name': ['freya','mohak'],
             'age': [10,1],
             'favorite_color': ['pink','blue'],
             'grade': [88,92]}
df1 = pd1.DataFrame(raw_data1, columns=['name', 'age', 'favorite_color', 'grade'])
for index, row in df1.iterrows():
    print(row["name"], row["age"])
Output
freya 10
mohak 1
Pandas DataFrame
Head & Tail
head() returns the first n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number. tail() returns the last n rows. e.g.
import pandas as pd
import numpy as np
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
     'Age': pd.Series([25,26,25,23,30,29,23]),
     'Rating': pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The first two rows of the data frame are:")
print(df.head(2))
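The example above only calls head(); a matching sketch for tail() and for the default of five rows could look like this, using a shortened version of the same data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack'],
    'Age': [25, 26, 25, 23, 30, 29, 23],
})

last_two = df.tail(2)      # last two rows; their original index labels are kept
default_head = df.head()   # no argument: first five rows

print(last_two)
print(len(default_head))
```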
Pandas DataFrame
Indexing a DataFrame using .loc[ ] :
The .loc[ ] indexer selects data by the labels of the rows and columns.
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
                  index=['a','b','c','d','e','f','g','h'], columns=['A', 'B', 'C', 'D'])
# select all rows for a specific column
print(df.loc[:,'A'])
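Besides a whole column, .loc[ ] can also pick a single row, a block of rows and columns, or one cell. The sketch below swaps the random numbers for np.arange values (an assumption made so that the results are predictable).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape(8, 4),
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                  columns=['A', 'B', 'C', 'D'])

one_row = df.loc['a']                  # a single row as a Series
block = df.loc['a':'c', ['A', 'C']]    # rows 'a' to 'c' (inclusive), two columns
cell = df.loc['b', 'D']                # a single value

print(one_row)
print(block)
print(cell)
```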
Pandas DataFrame
Accessing a DataFrame with a boolean index :
In order to access a DataFrame with a boolean index, we have to create a DataFrame whose index contains boolean values, that is, True or False.
# importing pandas as pd
import pandas as pd
# dictionary of lists
data = {'name': ["Mohak", "Freya", "Roshni"],
        'degree': ["MBA", "BCA", "M.Tech"],
        'score': [90, 40, 80]}
# creating a DataFrame with a boolean index
df = pd.DataFrame(data, index=[True, False, True])
# accessing the DataFrame using .loc[]
print(df.loc[True])  # returns the rows of Mohak and Roshni only (index value True)
Pandas DataFrame
Binary operation over DataFrame with Series
e.g.
import pandas as pd
x = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9]})
y = pd.Series([1, 2, 3])
new_x = x.add(y, axis=0)
print(new_x)
Output
   0  1   2
0  2  5   8
1  4  7  10
2  6  9  12
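The axis argument decides whether the Series is matched against the row labels or the column labels. A minimal sketch comparing both settings on the same data:

```python
import pandas as pd

x = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})
y = pd.Series([1, 2, 3])

by_rows = x.add(y, axis=0)      # y[i] is added to every value in row i
by_columns = x.add(y, axis=1)   # y[j] is added to every value in column j

print(by_rows)
print(by_columns)
```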
Pandas DataFrame
Binary operation over DataFrame with DataFrame
import pandas as pd
x = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9]})
y = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9]})
new_x = x.add(y, axis=0)
print(new_x)
Output
   0   1   2
0  2   8  14
1  4  10  16
2  6  12  18
Note: similarly, we can use the sub, mul and div functions.
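The sub, mul and div functions mentioned in the note follow the same pattern as add. A short sketch on the same two DataFrames:

```python
import pandas as pd

x = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})
y = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})

difference = x.sub(y)   # same as x - y
product = x.mul(y)      # same as x * y
quotient = x.div(y)     # same as x / y, gives float values

print(difference)
print(product)
print(quotient)
```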
Pandas DataFrame
Merging/joining DataFrames
e.g.
import pandas as pd
left = pd.DataFrame({
    'id': [1,2],
    'Name': ['anil','vishal'],
    'subject_id': ['sub1','sub2']})
right = pd.DataFrame({
    'id': [1,2],
    'Name': ['sumer','salil'],
    'subject_id': ['sub2','sub4']})
print(pd.merge(left, right, on='id'))
Output
   id  Name_x subject_id_x Name_y subject_id_y
0   1    anil         sub1  sumer         sub2
1   2  vishal         sub2  salil         sub4
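The _x and _y endings are added because both DataFrames share column names other than the key. As a sketch, the suffixes parameter of pd.merge can replace them, and merging on a different key changes which rows match; the suffix strings used here are illustrative choices.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2],
                     'Name': ['anil', 'vishal'],
                     'subject_id': ['sub1', 'sub2']})
right = pd.DataFrame({'id': [1, 2],
                      'Name': ['sumer', 'salil'],
                      'subject_id': ['sub2', 'sub4']})

# Custom suffixes replace the default _x / _y on overlapping column names
by_id = pd.merge(left, right, on='id', suffixes=('_left', '_right'))

# Joining on subject_id keeps only rows whose subject appears in both frames
by_subject = pd.merge(left, right, on='subject_id')

print(by_id)
print(by_subject)
```

Only 'sub2' occurs in both DataFrames, so the second merge returns a single row.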
Concatenate two DataFrame objects with identical columns.
>>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df2 = pd.DataFrame([['c', 3], ['d', 4]],
...                    columns=['letter', 'number'])
>>> df2
  letter  number
0      c       3
1      d       4
>>> pd.concat([df1, df2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
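In the result above the index labels 0 and 1 appear twice, because each DataFrame keeps its own labels. If a fresh index is wanted, the ignore_index parameter of pd.concat can be used, as in this sketch:

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
df2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])

# ignore_index=True discards the old labels and numbers the rows from 0
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```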
Export Pandas DataFrame to a CSV File
e.g.
import pandas as pd
cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }
df = pd.DataFrame(cars, columns=['Brand', 'Price'])
df.to_csv(r'C:\export_dataframe.csv', index=False, header=True)
print(df)
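To check what was written, the file can be read back with read_csv. The sketch below assumes a temporary directory instead of the C:\ path, so that it runs on any system; the folder and path variables are illustrative.

```python
import os
import tempfile
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(cars, columns=['Brand', 'Price'])

# Hypothetical location: a file inside a temporary directory
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'export_dataframe.csv')
    df.to_csv(path, index=False, header=True)
    restored = pd.read_csv(path)   # read the file back to check it

print(restored)
```

Because index=False was passed, the row labels are not written to the file, and read_csv assigns a new default index.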