XII IP Ch 2 Python Pandas - II DataFrame.pdf

Python Pandas - II

DataFrame
•It is a data structure that stores data in two-dimensional (tabular) form.
•Different columns may store values of different data types.
•A single column holds values of the same type.

•It has two indices: a row index (axis 0) and a column index (axis 1).
•The indices can be numeric or string.
•It is value mutable, i.e. we can change the values.
•It is size mutable, i.e. we can add or delete rows and columns.
•The row index is specified using index.
•The column index is specified using columns.
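The properties listed above can be checked with a short sketch. The data and the labels 'a' and 'b' are illustrative only and are not part of the original slides.

```python
import pandas as pd

# Illustrative data: each column holds one type, but types differ across columns.
df = pd.DataFrame({'City': ['Delhi', 'Mumbai'], 'Maxtemp': [40, 29]},
                  index=['a', 'b'])

print(df.dtypes)               # one data type per column

df.loc['a', 'Maxtemp'] = 41    # value mutable: change a stored value
df['Rainfall'] = [24.1, 35.2]  # size mutable: add a column
df = df.drop('b')              # size mutable: remove a row (axis 0)

print(df)
```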

Creating Empty DataFrame
import pandas as pd
df=pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []

Method 1 - Using a nested list
import pandas as pd
df = pd.DataFrame([['Delhi',40,32,24.1],
                   ['Bengaluru',31,25,36.2],
                   ['Chennai',35,27,40.8],
                   ['Mumbai',29,21,35.2],
                   ['Kolkata',39,23,41.8]],
                  index=[1,2,3,4,5],
                  columns=['City','Maxtemp','Mintemp','Rainfall'])
print(df)

Method 1 - Using a nested list (stored in a variable)
import pandas as pd
L = [['Delhi',40,32,24.1],
     ['Bengaluru',31,25,36.2],
     ['Chennai',35,27,40.8],
     ['Mumbai',29,21,35.2],
     ['Kolkata',39,23,41.8]]
df = pd.DataFrame(L, index=[1,2,3,4,5],
                  columns=['City','Maxtemp','Mintemp','Rainfall'])
print(df)

Method 2 - Using a Dictionary of Lists
import pandas as pd
df = pd.DataFrame(
    {'City':['Delhi','Bengaluru','Chennai','Mumbai','Kolkata'],
     'Maxtemp':[40,31,35,29,39],
     'Mintemp':[32,25,27,21,23],
     'Rainfall':[24.1, 36.2, 40.8, 35.2, 41.8]},
    index=[1,2,3,4,5])
print(df)

Method 3 - Using a Nested Dictionary
import pandas as pd
df = pd.DataFrame(
    {'City':{1:'Delhi',2:'Bengaluru',3:'Chennai',4:'Mumbai',5:'Kolkata'},
     'Maxtemp':{1:40, 2:31, 3:35, 4:29, 5:39},
     'Mintemp':{1:32, 2:25, 3:27, 4:21, 5:23},
     'Rainfall':{1:24.1, 2:36.2, 3:40.8, 4:35.2, 5:41.8}})
print(df)

Method 4 - Using a List of Dictionaries
import pandas as pd
df= pd.DataFrame(
[{'City':'Delhi', 'Maxtemp':40, 'Mintemp':32, 'Rainfall':24.1},
{'City':'Bengaluru', 'Maxtemp':31, 'Mintemp':25, 'Rainfall':36.2},
{'City':'Chennai', 'Maxtemp':35, 'Mintemp':27, 'Rainfall':40.8},
{'City':'Mumbai', 'Maxtemp':29, 'Mintemp':21, 'Rainfall':35.2},
{'City':'Kolkata', 'Maxtemp':39, 'Mintemp':23, 'Rainfall':41.8}],
index=[1,2,3,4,5])
print(df)

Method 5 - Using Series Objects
import pandas as pd
A = pd.Series(['Delhi','Bengaluru','Chennai','Mumbai','Kolkata'],
              index=[1,2,3,4,5])
B = pd.Series([40,31,35,29,39], index=[1,2,3,4,5])
C = pd.Series([32,25,27,21,23], index=[1,2,3,4,5])
D = pd.Series([24.1,36.2,40.8,35.2,41.8], index=[1,2,3,4,5])
df = pd.DataFrame({'City':A, 'Maxtemp':B, 'Mintemp':C, 'Rainfall':D})
print(df)

Creating DataFrame from 2D NumPy Array
import numpy as np
import pandas as pd
A = np.array([[10,20,30],[40,50,60],[70,80,90]])
D = pd.DataFrame(A)
print(D)
    0   1   2
0  10  20  30
1  40  50  60
2  70  80  90

Attributes of DataFrame
Attribute   Description
index       The index (row labels)
columns     The column labels
axes        A list of both axes: axis 0 (the index) and axis 1 (the columns)
values      The values in the DataFrame
dtypes      The data type of each column
size        The number of elements
shape       A tuple representing the dimensions
ndim        The number of dimensions
empty       True/False (whether the DataFrame is empty)
T           Transposes the index and columns

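Application of Attributes
Here D is the city DataFrame created above. pandas 2.0 and later display the row index as Index([1, 2, 3, 4, 5], dtype='int64') instead of Int64Index.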
D.index Int64Index([1, 2, 3, 4, 5], dtype='int64')
D.columns Index(['City', 'Maxtemp', 'Mintemp', 'Rainfall'], dtype='object')
D.axes [Int64Index([1, 2, 3, 4, 5], dtype='int64'),
Index(['City', 'Maxtemp', 'Mintemp', 'Rainfall'],dtype='object')]
D.values array([['Delhi', 40, 32, 24.1],
['Bengaluru', 31, 25, 36.2],
['Chennai', 35, 27, 40.8],
['Mumbai', 29, 21, 35.2],
['Kolkata', 39, 23, 41.8]], dtype=object)

D.dtypes City object
Maxtemp int64
Mintemp int64
Rainfall float64
dtype: object
D.size 20
D.shape (5,4)
D.ndim 2
D.empty False
D.T

Getting number of rows in a DataFrame
•The len() function can be used to find the number of rows in a DataFrame.
print(len(D))
5

Indexing
•Indexing in pandas means simply selecting
particular rows and columns of data from a
DataFrame.
•Indexing could mean selecting all the rows and
some of the columns, some of the rows and all of
the columns, or some of each of the rows and
columns.

Selecting a Column
print(df.Rainfall)
OR
print(df['Rainfall'])

Selecting multiple Columns
To display multiple columns, we need to use
double square brackets.
print(df[['City','Rainfall','Maxtemp']])

Selecting a Row
print(df.loc[2])

Selecting multiple Rows
To display multiple rows, we need to use double
square brackets, or a range can be specified.
print(df.loc[[2,4,5]])
print(df.loc[2:4])

Obtaining a Subset using Row/Column names
We use loc to obtain a subset in the following
format:
df.loc[ row , col]
Here, row/col can be an individual value, a range or a list.

>>> print(df.loc[3,'Mintemp'])
27
>>> print(df.loc[2,'City'])
Bengaluru
>>> print(df.loc[3:5 ,'Mintemp'])
3 27
4 21
5 23
Name: Mintemp, dtype: int64

>>> print(df.loc[3,'City':'Mintemp'])
City       Chennai
Maxtemp         35
Mintemp         27
Name: 3, dtype: object
>>> print(df.loc[3:5,'City':'Mintemp'])
      City  Maxtemp  Mintemp
3  Chennai       35       27
4   Mumbai       29       21
5  Kolkata       39       23

>>> print(df.loc[[1,4],'Mintemp'])
1 32
4 21
Name: Mintemp, dtype: int64
>>> print(df.loc[2,['Maxtemp','Rainfall']])
Maxtemp       31
Rainfall    36.2
Name: 2, dtype: object
>>> print(df.loc[[1,2,4],['Maxtemp','Rainfall']])
   Maxtemp  Rainfall
1       40      24.1
2       31      36.2
4       29      35.2

>>> print(df.loc[1:4,['Maxtemp','Rainfall']])
   Maxtemp  Rainfall
1       40      24.1
2       31      36.2
3       35      40.8
4       29      35.2
>>> print(df.loc[:,['Maxtemp','Rainfall']])
   Maxtemp  Rainfall
1       40      24.1
2       31      36.2
3       35      40.8
4       29      35.2
5       39      41.8

Obtaining a Subset using in-built indexes
We use iloc to obtain a subset using the in-built (positional) indexes, which start at 0.
>>> df.iloc[0,2]
32
>>> df.iloc[0:3, 0:2]
        City  Maxtemp
1      Delhi       40
2  Bengaluru       31
3    Chennai       35
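One difference between the two accessors is easy to miss: a range given to loc includes its end label, while a range given to iloc excludes its end position. A minimal sketch on a cut-down copy of the city data:

```python
import pandas as pd

df = pd.DataFrame({'City': ['Delhi', 'Bengaluru', 'Chennai', 'Mumbai', 'Kolkata'],
                   'Maxtemp': [40, 31, 35, 29, 39]},
                  index=[1, 2, 3, 4, 5])

by_label = df.loc[2:4]      # labels 2, 3 and 4: the end label is included
by_position = df.iloc[2:4]  # positions 2 and 3: the end position is excluded

print(by_label)
print(by_position)
```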

Accessing Individual Value
For accessing an individual value, we can also use at in place of loc, and iat in place of iloc.
>>> df.loc[3,'Maxtemp'] OR
>>> df.at[3,'Maxtemp']
35
>>> df.iloc[0,2] OR
>>> df.iat[0,2]
32

Boolean Indexing
•If a DataFrame has Boolean values (True and False) as its index labels, accessing its rows through them is called Boolean indexing.
•The rows of such a DataFrame can be accessed using loc, as in any other DataFrame.
•1 and 0 can also be used to represent the Boolean values True and False respectively.

import pandas as pd
# named data rather than dict, so the built-in dict type is not hidden
data = {'name':["aparna", "pankaj", "sudhir", "Girish"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
df = pd.DataFrame(data, index=[True, False, True, False])
print(df)

print(df.loc[True])
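The sketch below shows what df.loc[True] returns for this data. It also shows a related idiom that is not in the original slides: selecting rows with a Boolean condition on a column. The threshold 85 is an arbitrary value chosen for illustration.

```python
import pandas as pd

data = {'name': ['aparna', 'pankaj', 'sudhir', 'Girish'],
        'degree': ['MBA', 'BCA', 'M.Tech', 'MBA'],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data, index=[True, False, True, False])

true_rows = df.loc[True]             # every row whose index label is True
high_scores = df[df['score'] > 85]   # rows where the condition holds

print(true_rows)
print(high_scores)
```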

Modifying the data in a DataFrame
The values in a DataFrame can be modified using the same notation that is used to access them.
>>> df['Mintemp']=[33,27,29,22,20]
>>> print(df)
        City  Maxtemp  Mintemp  Rainfall
1      Delhi       40       33      24.1
2  Bengaluru       31       27      36.2
3    Chennai       35       29      40.8
4     Mumbai       29       22      35.2
5    Kolkata       39       20      41.8

>>> df.loc[4] = ['Mumbai',33,20,35]
>>> print(df)
        City  Maxtemp  Mintemp  Rainfall
1      Delhi       40       33      24.1
2  Bengaluru       31       27      36.2
3    Chennai       35       29      40.8
4     Mumbai       33       20      35.0
5    Kolkata       39       20      41.8
>>> df.loc[5,'Maxtemp']=42
>>> print(df)
        City  Maxtemp  Mintemp  Rainfall
1      Delhi       40       33      24.1
2  Bengaluru       31       27      36.2
3    Chennai       35       29      40.8
4     Mumbai       33       20      35.0
5    Kolkata       42       20      41.8

Adding data in a DataFrame
If the column name or row index specified already exists in the DataFrame, the assignment modifies the existing values.
If it does not exist in the DataFrame, it is added as a new column/row.

>>> df['Humidity']=[30,40,55,38,60]
>>> print(df)
        City  Maxtemp  Mintemp  Rainfall  Humidity
1      Delhi       40       33      24.1        30
2  Bengaluru       31       27      36.2        40
3    Chennai       35       29      40.8        55
4     Mumbai       33       20      35.0        38
5    Kolkata       42       20      41.8        60
>>> df['Avgtemp'] = (df['Maxtemp']+df['Mintemp'])/2
>>> print(df)
        City  Maxtemp  Mintemp  Rainfall  Humidity  Avgtemp
1      Delhi       40       33      24.1        30     36.5
2  Bengaluru       31       27      36.2        40     29.0
3    Chennai       35       29      40.8        55     32.0
4     Mumbai       33       20      35.0        38     26.5
5    Kolkata       42       20      41.8        60     31.0

>>> df.loc[6] = ['Jaipur',48,26,12.0,16,37]
>>> print(df)
        City  Maxtemp  Mintemp  Rainfall  Humidity  Avgtemp
1      Delhi       40       33      24.1        30     36.5
2  Bengaluru       31       27      36.2        40     29.0
3    Chennai       35       29      40.8        55     32.0
4     Mumbai       33       20      35.0        38     26.5
5    Kolkata       42       20      41.8        60     31.0
6     Jaipur       48       26      12.0        16     37.0

Deleting Rows
To remove rows from the DataFrame, we use the function drop().
It returns a copy of the DataFrame without the row index mentioned in the drop() function; the original DataFrame is not changed.
To remove the row permanently, the parameter inplace=True has to be specified.

df.drop(3) OR
df.drop(3, axis=0) OR
df.drop([3], axis=0) OR
df.drop(index=3)
To remove the row permanently,
df.drop(3, inplace=True)
print(df)
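The difference between the two forms of drop() can be seen in a short sketch on a three-row version of the city data (the name smaller is chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'City': ['Delhi', 'Bengaluru', 'Chennai'],
                   'Maxtemp': [40, 31, 35]},
                  index=[1, 2, 3])

smaller = df.drop(3)           # returns a new DataFrame without row 3
print(len(df), len(smaller))   # df itself still has all three rows

df.drop([1, 2], inplace=True)  # modifies df directly and returns None
print(df)
```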

Deleting Columns
To remove columns from the DataFrame, we use the function drop() or pop(), or the del command.
To remove a column with drop(), we need to specify the parameter axis=1. It returns a copy of the DataFrame without the column mentioned; the original DataFrame is not changed.
To remove the column permanently, the parameter inplace=True has to be specified.

df.drop('Humidity', axis=1)
df.drop(['Humidity'], axis=1)
df.drop(columns = 'Humidity')
To remove the column permanently,
df.drop('Humidity', axis=1, inplace=True)

The pop() function or the del command can also be used to remove a column permanently from the DataFrame.
df.pop('Humidity')
print(df)
OR
del df['Humidity']
print(df)
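pop() and del differ in one respect: pop() hands back the removed column, while del returns nothing. A small sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'City': ['Delhi', 'Mumbai'],
                   'Maxtemp': [40, 29],
                   'Humidity': [30, 38]},
                  index=[1, 4])

removed = df.pop('Humidity')   # removes the column and returns it as a Series
del df['Maxtemp']              # removes the column; nothing is returned

print(removed)
print(df.columns.tolist())
```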

Renaming the Row indexes / Column headings
New indexes/column headings can be specified using the attributes index and columns.
The rename() function can also be used to rename existing index/column labels in a DataFrame.
The old and new index/column labels are to be provided in the
form of a dictionary, where keys are the old index/column labels
and the values are the new names for the same.
To make the changes permanent, inplace=True needs to be used.

Using attributes
df.index= ['A','B','C','D','E']
df.columns= ['P','Q','R','S','T']
print(df)

Renaming Rows
df.rename({1:'A', 2:'B', 6:'E'})
df.rename({1:'A', 2:'B', 6:'E'}, axis=0)
df.rename(index={1:'A', 2:'B', 6:'E'})
# To make the changes permanent
df.rename({1:'A', 2:'B', 6:'E'}, axis=0, inplace=True)

Renaming Columns
df.rename({'Maxtemp':'High', 'Mintemp':'Low'}, axis=1)
df.rename(columns={'Maxtemp':'High', 'Mintemp':'Low'})
# to make changes permanent in the DataFrame
df.rename({'Maxtemp':'High', 'Mintemp':'Low'}, axis=1,
inplace=True)
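As a rough illustration of how rename() treats its dictionary, the sketch below renames one row label and one column label in a single call. The label 9 is deliberately absent from the data to show that, by default, keys that match nothing are ignored.

```python
import pandas as pd

df = pd.DataFrame({'Maxtemp': [40, 29], 'Mintemp': [32, 21]}, index=[1, 4])

renamed = df.rename(index={1: 'A', 9: 'Z'},       # label 9 does not exist: ignored
                    columns={'Maxtemp': 'High'})  # 'Mintemp' is left unchanged

print(renamed)
print(df.columns.tolist())   # the original DataFrame is untouched
```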

To change the index column
To change the index column we can use the
function set_index()
To change the index back to the default indexes
(0,1,2…) we use the function reset_index()
To make the changes permanent, inplace=True
needs to be used.

df.set_index('City')
# To make the changes permanent
df.set_index('City', inplace=True)
print(df)

df.reset_index()
# To make the changes permanent
df.reset_index(inplace=True)
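A short sketch of the round trip, using a two-row version of the city data. The drop=True argument is not mentioned in the slides; it is shown here as an option for discarding the old index instead of keeping it as a column.

```python
import pandas as pd

df = pd.DataFrame({'City': ['Delhi', 'Mumbai'], 'Maxtemp': [40, 29]},
                  index=[1, 4])

by_city = df.set_index('City')           # 'City' becomes the row index
restored = by_city.reset_index()         # 'City' is a column again; index is 0, 1
renumbered = df.reset_index(drop=True)   # discard the old labels 1 and 4

print(by_city.loc['Mumbai', 'Maxtemp'])
print(restored)
print(renumbered)
```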

Iterating over a DataFrame
•To iterate over horizontal subsets, row wise
for i in df.iterrows():
    print(i)
•To iterate over vertical subsets, column wise
for i in df.items():    # iteritems() was removed in pandas 2.0; items() replaces it
    print(i)
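Each item produced by these loops is a pair, so it is often clearer to unpack it into two names. A minimal sketch (the variable names are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'City': ['Delhi', 'Mumbai'], 'Maxtemp': [40, 29]},
                  index=[1, 4])

# Row wise: each item is (row label, Series holding that row's values).
for label, row in df.iterrows():
    print(label, row['City'], row['Maxtemp'])

# Column wise: each item is (column name, Series holding that column's values).
for name, column in df.items():
    print(name, column.tolist())

row_labels = [label for label, _ in df.iterrows()]
column_names = [name for name, _ in df.items()]
```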

Binary Operations in a DataFrame
•Operations requiring two values are called binary operations.
•In a binary operation, the data from the two DataFrames are aligned. For matching row and column indexes the given operation is performed; for non-matching indexes, NaN is stored as the result.

df1=pd.DataFrame(
[[10,20,30],[40,50,60],[70,80,90]],
index=[1,2,3],
columns=['A','B','C'])
print(df1)
df2=pd.DataFrame([[1,2],[3,4]],
index=[1,2],
columns=['A','B'])
print(df2)
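Continuing with df1 and df2 as defined above, the sketch below adds them in two ways. The + operator leaves NaN wherever a label exists in only one DataFrame; the add() method with fill_value=0 is shown as one way to treat the missing side as 0 instead.

```python
import pandas as pd

df1 = pd.DataFrame([[10, 20, 30], [40, 50, 60], [70, 80, 90]],
                   index=[1, 2, 3], columns=['A', 'B', 'C'])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=[1, 2], columns=['A', 'B'])

total = df1 + df2                     # NaN for row 3 and for column C
filled = df1.add(df2, fill_value=0)   # a value missing from one side counts as 0

print(total)
print(filled)
```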

Statistics with Pandas
1. min()
It is used to find the minimum value from a DataFrame.
2. max()
It is used to find the maximum value from a DataFrame.

DF.max() DF.min()
DF.max(axis=1) DF.min(axis=1)

3. mean()
It is used to find mean (average)
DF.mean() DF.mean(axis=1)

4. count()
count() can be used to find the number of non-NA
values along the rows / columns.
DF.count()
OR DF.count(0)
OR DF.count(axis=0)
OR DF.count(axis='index')
DF.count(1)
OR DF.count(axis=1)
OR DF.count(axis='columns')

5. sum()
It is used to find the sum of values.
DF.sum() DF.sum(axis=1)
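The effect of the axis argument on these functions is easiest to see on a small table. The marks below are hypothetical, with one missing value included to show how count() and mean() treat it.

```python
import pandas as pd

# Hypothetical marks; None becomes a missing value (NaN).
DF = pd.DataFrame({'Maths': [80, 65, None], 'Science': [70, 90, 85]},
                  index=['Asha', 'Ravi', 'Meena'])

print(DF.max())         # one result per column (axis=0, the default)
print(DF.max(axis=1))   # one result per row
print(DF.mean())        # missing values are skipped
print(DF.count())       # number of non-NA values in each column
print(DF.sum(axis=1))   # row totals
```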

Applying functions on particular row/column
•Particular columns
DF[2016].min()
•Particular rows
DF.loc['Qtr1'].min()
•Particular subset
DF.loc['Qtr3':'Qtr4',2018:2019].count()
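The slides do not show the DataFrame these calls act on. The sketch below assumes a table of sales figures with years as column labels and quarters as row labels; all the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical sales figures: years as column labels, quarters as row labels.
DF = pd.DataFrame({2016: [340, 212, 310, 298],
                   2017: [350, 240, 305, 320],
                   2018: [362, 255, 330, 341],
                   2019: [380, 270, 345, 360]},
                  index=['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])

col_min = DF[2016].min()         # smallest value in the 2016 column
row_min = DF.loc['Qtr1'].min()   # smallest value in the Qtr1 row
# Non-NA count for each column of the 2 x 2 subset.
subset_count = DF.loc['Qtr3':'Qtr4', 2018:2019].count()

print(col_min, row_min)
print(subset_count)
```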

Sorting
Sorting means arranging the contents in ascending
or descending order.
The default sort order is ascending.
To arrange the data in descending order, add the
argument ascending=False

import pandas as pd
d = {'Name':['Sachin','Dhoni','Virat','Rohit','Shikhar'],
'Age':[26,25,25,24,31], 'Score':[87,67,89,55,47]}
df= pd.DataFrame(d)
print("DataFrame contents without sorting")
print (df)

df.sort_values('Score')
df.sort_values(by=['Age', 'Score'],ascending=[True,False])
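sort_values() returns a sorted copy and leaves df unchanged, so the result has to be stored or printed. A sketch using the same data (the names by_score and by_age_then_score are chosen here for illustration):

```python
import pandas as pd

d = {'Name': ['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar'],
     'Age': [26, 25, 25, 24, 31], 'Score': [87, 67, 89, 55, 47]}
df = pd.DataFrame(d)

# Highest score first.
by_score = df.sort_values('Score', ascending=False)
# Youngest first; players of the same age are ordered by highest score.
by_age_then_score = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print(by_score['Name'].tolist())
print(by_age_then_score['Name'].tolist())
print(df['Name'].tolist())   # df keeps its original order
```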

Head and Tail functions
•The head() function returns the first n rows
and tail() function returns the last n rows.
•If n is not specified, the default value is 5.
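A quick sketch on a ten-row DataFrame (the column name n is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'n': range(1, 11)})   # ten rows holding 1 to 10

print(df.head())     # first 5 rows (the default)
print(df.tail(3))    # last 3 rows

first_default = df.head()['n'].tolist()
last_three = df.tail(3)['n'].tolist()
```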
