class xii ip preeti arora.pdf

7,414 views 81 slides Oct 13, 2023
Slide 1
Slide 1 of 81
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81

About This Presentation

Class XII- IP pdf


Slide Content

Data Handling
es using Pandas-

1.1 INTRODUCTION
Data scence or data analytics isthe process of analyzing large
set of data points to get answers to questions related to that
dataset. The need for data analytics arises to handle huge data
which is an area of concern for large business organizations,
government bodies, communities and consumers. How to
handle this big data paves the way forthe field of data analytics,

Data processing is an important part of analyzing the data because data is not always available
in the desired format. This can be better explained through Data Life Cycle. Fig. 1.1 describes
‘exactly what data life cycle is.

smolts

Bull

Data is stored in different formats—csv file, an Excel file or an HTML file. This data is transformed
or converted to a single format and stored somewhere and that's where data warehousing
comes into picture. Once we have stored the data, we can perform certain analysis on i
ke, we can perform join and merge data operation, search operation, etc. Once analysis is done,
‘we can plot this data in the form of graph which is data visualization. All these sequences of
operations for data analysis can be easily and effectively performed by Python and is libraries.
Python Pandas has become a buzzword in the Python community It fs an important tool used
nowadays in the field of data sciences.

ig. 1.t2Dat Lite Cele

Thus, for performing data analysis using Python, we need to Import this particular Python module
or, we can say Python Library. So, the next question arises—what is Python Pandas?

MLamicu wit LA

1.2 PANDAS

Pandas (PANe! Data) isa Python module that makes data science
Pandas (PANDA) Rene vo

NA

hat offers powerful and flexible data

Weis a hee data manila tol developed hy Wes
Mekinny at on pages ke uy and Mage 10 .

a ‚em, which is an ecometric term for
he term Pandas is derived from "Panel data syste

sional structured dataset Pandas is an open-source Python library which provides
ide

ange of ies including academic and commercial domains, which include finance, economics,

CCM: Pandas i software bay written forthe Python programming language for data manipula
and analysis,

It is but on packages like NumPy and Matplolib and gives us a single, convenient place to do
most of our data analysis and visualization work. You may wonder where is the need for Pandas
¡when NumPy can be used for data analysis Following are some of the differences between
Pandas and Numb

1. A NumPy array requires homogencous data, while a Pandas dataframe can have different
data types (float, in, string, datetime, etc).
2. Pandas havea simpler interface for operations like file loading,

Plotting, selection joining,
GROUP BY, which come very handy in data-processing

applications
Pandas dataframe (with column names) make it very eas

0 keep track of data,
Pandas is used when data isin tabular format, whereas N
array-based data manipulation,

is used for numeric

1.2.1 Features of Pandas

Pandas is the most popular library in scientific Py
an handle several tasks related to data process
1

thon ecosystem for doing data analysis, Pandas
sing and offers the following

1 salient features:
It can read or write in many diferent data format

(integer, oat, double, etc)

Columns from a Pandas data structure can be deleted or inserted,

"supports group by operation for data aggregation and tr

Performance merging and joining of data

4. offers good 10 (Inp
directly into a dataframe,

nsformations, and allows high

utput) capabilites as it

sy pulls data from

MySQL database

{ean easily select subsets of data from bulky data sets and can even combi
sets together

tiple data

[thas the functionality to find and fl missing data,

allows us to apply operations to independent groups within the data,
"supports reshaping of data into different forms.

pramisu with Ca

9. supports advanced time-series functionality (which isthe use ofa model to predict future
values based on previously observed values)

10. It supports visualization by integrating libraries such as matplotlib and seaborn, etc.
Pandas is best at handling huge tabular data sets comprising different data formats

1.3 INSTALLING PANDAS

Pandas module doesn't come bundled with Standard Python,

‘Therefore, itis to be installed using “pip” command from PyPI

(Python Package Index). To install Pandas:

Step 1: Open command prompt and run as administrator as
shown in Fig. 12(8).

ing command
prompt as administrator

is.

Step 2: Once the administrator window gets opened, before installing with pip, make sure that you
are onthe right path, Le, Scripts sub folder in Python37-32 folder, as shown in Fig, 1.2(b).

SES

Fig. 1.210: Bowring to Scripts older through Python defaut path

Step 3: Move into Python37-32 installation scrips folder and type dir to view its files. You will
see a file called pip.exe which will install Pandas as shown in Fig. 12(0).
SE

Fig. 1210 command diplying pipe j
HLGIITIEU wit LA

204),

Fig. 1.26): Pandas

ge installed sing PP
Now, we will be able to use Pandas in St

dard Python Distribution.
With the installation of Pandas, NumPy (Numeric Python) will also be installed
automatically onto our system, Since Pandas cannot handle arrays on its own, NumPy
is required tobe installed. NumPy isa Python library which handles arrays

which we
have already discussed in detail in Class XL

‘Testing Pandas

‘To check whether Pandas has been installed or not
* Type import pandas as pd in Python (IDLE) shel
* Mitis executed without error it means that P

À

>

’andas has been installed on our system.

prarmisu win Ca

ME TT zu

[ton 37.0 cy. orteesecso3, AT 26
ESP bit Intel) cn mio

yee "copyright", Comet or "cense()" for

Se es

AI UN EE vas

Sntorsation

As shown in the above figure
Python shell prompt which indicates that Pandas has been installed,

pressing Enter key, the cursor starts blinking In front of

1.4 DATA STRUCTURES IN PANDAS

A data structure fs a way of storing and organizing data in a computer so that itcan be accessed
and worked with in an appropriate way. Iti a collection of data values and operations that
‘an be applied to that data. I enables efficent storage, retrieval and modification ofthe dat
Pandas provides and deals with the following three data structures

Series: It is a one-dimensional structure storing homogeneous (same data type) mutable
(which can be modified) data

> Dataframe: Its a two-dimensional structure storing heterogeneous (multiple data type)
mutable data

rane: It isa three-dimensional way of storing items.

We shall discuss Series and dataframes data structures only as Panel is beyond the scope of
this book

1.4.1 Series
Series is ike a one-dimensional structure with homogencous (same type) data. I contains a
sequence of values and an associated position of data labels called is index, which by default
have numeric data labels starting from zero. We can also assign values of other data types as

index
For example, he following series isa collection of integers.

o #8
1 ss
2 0
3 7
4 @

thus, series in Pandas is like a one-dimensional structure capable of holding homogeneous
data of any type (integer string, oa, Python objects, etc) We can imagine a Pandas series asa

Column ina spreadsheet. A seres can also be described as an ordered dictionary with mapping

of index values to data vales

Cn En Be
2

o rT sunday
144 won ‘Monday 2
14 0 m Tuesday 3
3 0 perl 30 ‘Wednesday 4

Fig. 13: rampes of Serie type Objects

wiui Ca

he series data is mutable (ie. can be changed) 9.

stat fe cannot change the size of ase

ne importan members
‘One important point to ren en
Aie ad mes n , Serie) method. Alo, any list or dictionary day

es in Pandas can be created using S

ser is method.

ing thi
converted into series using

beted structure capable of holding any data type integers, string,
‘Em, Series ta one dimensional abel

floating point numbers, Python objects e

1.4.2 Creation of Series
There are several ways to create a series-type object. Befor
Pandas modules in your programs.

hat, make sure you have importe

Let us discuss these methods to create a series.

1.4.2.1 Creating an Empty Series using Series() Method

A basi seis which can be created isan Empty Sete. This is created just by using Series)
with no parameter ts syntax is

Practical Implementation-1

create an Empty Series using Pandas

See Pati, ts e

ive copyright, seredita” or =1i *
[Ebel For mre Information:
pandas 3 pa
Gera

Here,
Lis the series variable
Series() method dis
type attribute dis
dis an alternate

plays an empty list
plays the series data
rame given tothe Pa

(by default) along with its defaut dat
‘ype which is float by default
das module

a type.

1422 Creating Serie using Series() with
Ati created u
Toit. ts syntax fs;

Arguments

Pint Series) method by passing index and data elem

2. A seal or constant value

4

‘mathematical expression/function

tanııcu will

Let us discuss these methods one by one,

1. Creating a series using List
Like an array a list is also a one-dimensional datatype. But the difference lies in the fact that
an array contains elements of same data type, while a list may contain elements of same oF
different data types. A list can be converted into a series using Series() method. Its syntax is:

pd. Series (Idatal,

where,
+ data can be a Python sequence such as list, dictionary or scalar value.
+ index is the numeric values displayed with their respective values.

Practical Implementation-2

To create a series using List

in the above program, list of integers is converted into a Pandas series using Series() method.
But in the output, two columns are displayed irrespective ofthe fact that we haven't given
‘any index to these elements. This is because Pandas creates a default index and automatically
“assigns the index values from 0 to 3, whichis the length ofthe List-1

Creating a series using two different Lists
‘You can create a series using two different ist. The two lists are passed as arguments to Series()
method, out of which the first list (argument) willbe index and the other one (lis) willbe the
values. As observed from the output window for Practical Implementation-3 given below, month
names, the first argument to Series) method, will become index for the generated series.

Practical Implementation-3
‘To create a series from two different Lists.

Data Handling using Pandas-1

ion
range) method

Practical Implement

‘To create a series using F

‘output displays two columns—the fist one

As shown above in the Python she window, the
from 0 to 4. The last element display

displays index values and the second displays values
‘sing range() is size =1, which I (size) ~

Creating a series with range() and for loop
in the previous implementation, we have used range() to display the index values as integers
‘This can be modified by displaying index values through for loop by iterating between the
characters given along with ‘in’ keyword using range() method for displaying the associated
elements ofthe series

Practical Implementation:

‘To create a series with range() and for lop.

oe
pe RER ann, we
[as

With reference to the above output, the series elements are displayed starting from 1 to 15 wit
difference of. Also the index value gets displayed iteratively (sing fr loop starting from a ta‘!

Practical Implementation-6
Handling Moating poln values for generating a series.
aa rane

ean
o> scien pena, cer.

Since 7.5, one ofthe elements inthe list, sa float value, it shall convert the rest of the integer

values into float and, hence, the result displays afloat series
Creating a series using missing values(NaN)

te a series object for which size is defined but some

elements or data are missing. This is handled by defining NaN (Not a number) value(s), which

is an attribute of NumPy library and this ean be achieved by defining a missing value using
NaN. But before that, we need to import both Pandas and NumPy:

3 sotjrpd: sere ((728,8-4 ep, 234,51)

(CTI: NAN i à special Moating point value, indicates missing or ul vales in Pandas. vis an auibute |
cof NumPy Library.

Practical Implementation-7
‘To reste a series with explicitly defined index values. (Modification of Practical implementation-2)
With reference to Practical Implementation-2, no Index value has been assigned to the list
elements, so Python automatically assigns indexes to it. You can also create a series with your
‘own defined index values,

Deal index]
located

[ri posabtero change] [Sie 3
the deni pace EA

You can modify the Index values in place as per requirement

Accessing Data from a Series with Position
a series by passing the positon value and even through si,

We can also access data from
index by the corresponding nung

Using a series, we can access any position values through

dex. (Retrieving by label)

To access single and multiple values based on in

ae
233 seciesaepa series ((20,20,30,40,5
doo Het le

Single vale

Le ornées poston |
|

ls — fuite vales accessed

eres tacst or motile ders |

As shown in the above output window, we can call the value of index or label ofthe series listo
A value, Thus, index value ‘of label shall return its value as 40. Therefore,

“and multiple index values using the commands described above.

we can retrieve sing}
Alternative,
Practical Implementation-9

‘To perform indexing slicing and accessing data from a series

men
en

Base |
Bean ers sales
yoy

ae

e 4 .
eos le >)
dee: intse ype int
8
a4
es
valet

As shown above, using the first statement, the element at‘ position from the series ‘shall
be displayed. The second statement shall display first three elements (Le, values on index 0 1,
and 2) from the series and the third statement shall print the last three index values because
‘of negative indexing (-3 Index value)

EN
eS

pramisu with Ca

iocf} and oc!) indexing func

Indexing and accessing can also be done using iloc|] and oc[] methods,

ilocl}—iloc is used for indexing or selecting based on position, Le, we have to specify integer
index of row and column. It refers to posiion-based indexing, The syntax is

loc[]—0c is used for indexing or selecting based on name, Le, we have to specify row name
and column name. It refers to name-based or label-based indexing. The syntax is

(CTI: loci} method does not include the last element ofthe range passed in unlike loc

‘To display the data using index - «110€ (1:4)
To display the data location-wise - s.1oc{"b':'e")

In the context of the above series ‘s, on applying iloc[] and loclj, the output will be as
shown below:

ort pandas 32 pd
AA)

Paine)

2. Creating a series from Scalar or Constant Values
series can be created usinga scalar or constant value as shown inthe code window. Here,
data isa scalar value for which itis a must to provide an index. This constant value shall
be repeated to match the length ofthe index.

Practical Implementation-10
‘To create a series from a numerical scalar or constant value.

pandas a2 pa
a Sales, sea 0,100,100)
Peine werten

maothez foraat,
Martes2 =pa.zeries(55, index =10,1,2,0
print sortes2)

‘RESTART: ¢:/Users/preetiy

Ss —
e mes

eyes does

Here, 55 is repeated 5 times (as per the number of index)

Alternatively this can also be done using range() method for accessing each index position in
the ls.

A)

In be first case, constant or scalar value of 10 gets displayed for each index, from 0 to 3, last
value being excluded,

In the second case, again the scalar value 15
between 1 and 6 with a difference/gap of 2.
shall be displayed for each scalar value 15,

shall be displayed forall the index
Hence, the index values 1, 3 and

dues lying
5 respectively

Creating a series with index of String (text) type

String or text can be used as an index to the

elements of à series. In other words, we c
that index ofa series can be of string type.

an say

Practical Implementation-11

To create a series using string both as index.

and constant value

vest paca as a
RE se eet ase y

veamicu wiul Ca

BESTANT: c:/Users/preettnppoata
trineo

As observed from the output window given above, the index is provided on the left with the
values on the right and in Pandas, series indexes can be of string type.
3. Creating a series from Dictionary

A series in Pandas can be created using dictionary also. Using dictionary for creating series

gives us the advantage of built-in keys used as index. We don't require declaring an index
as a separate lis; instead, built-in keys will be treated as the index.

Practical Implementation-12
To create a series from Dictionary,

oa

à dictionary can be passed as input and, Ifo index specified the dictionary keys are taken

in a sorted order to construct index.

Naminga Series
also give a name to the two columns, index and values oa serie

AAA ER

We can ss using ‘name’ property

apart pau REN
aces toes acne dass cole

e is assigned the name Month and daa Is assigned the name Days

ote that the index columi
es Days gets displayed.

displayed at the bottom of the series. Name:

4. Grea

Inga series using a Mathematical Expression/Function

A series object can be created by defining a function or a mathematical expression tha,
determines the values for data sequence using the syntax as follows:

Practical Implementation-13

To generate a series using a mathematical expression.

ecbpd Series tinder ost catacsi)
EA 7

Practical Implementation-14

To generate a seri
es using a mathematica fi

function (exponentiation).

5 cated using
ated series and data is

Para fo ed

dimensi
lonal (1D) y

Ah program, IA We must ensy that

prarmisu wıuı Ca

Practical Implementation-15

‘To generate a series using a one-dimensional (1D) array.

‘The following practical implementation shows that we can use letters or strings as indices:

Practical Implementation-16

‘To generate a series using a one-dimensional (1D) array with explicit index.

When index labels are passed with the array, then the length of the index and array must be of

the same size, else it will result in a ValueError.

1.5 SERIES OBJECT ATTRIBUTES
We can obtain various properties of a series through its associated attributes. Some common
attributes related to series object are described below and are accessed using the syntax:

= 2

E
‘Seresindex Returns index of the series E]
po Rt ny E |
Serexdype Reta ype object f the underying da z
Seiesstpe Ruins tpl ofthe shape of unering daa 3
Sesezmbyes _ Retuns number of ys ofundrng dat =
Seresndm ___ Rtunsthenunberofimension E
Seiesite ___ Retumsnunberofeiemens E
Sevsitemsze Retos the sie of the dope E
Seveanomans_ Retum wu ter aan NaN

Seriesempty Retuns ue eres object is empty

Practical Implementation-17

‘To illustrate the various attributes related to a series object

> sise

1.5.1 Retrieving Values from a series using head() and tail) Functions
The Series.head() function ina series fetches first 'n from a Pandas object. By default t gives
tus the top 5 rows of data in the series. On the contrary, Series.tail() function displays the last
5 elements by default. However, we can pass the number as parameter for the number of values
tobe pulled out from the Series and Pandas shall print out the specified number of rows.

Practical Implementation-18
To illustrate the working of head) and tail) functions on a series.

1.6 MATHEMATICAL OPERATIONS ON SERIES

Mathematical processing cane performed o seg sao vals and functions Ale
sted operar sth "ean be sce peed A
{Sep iind hate Inden hl be same frat ce ees On HE

prarmisu wıuı Ca

practical Implementation-19
lo lustrate the working of mathe
sill 8 of mathematical operators on series

jonsider four series 5, s1, 3 and s4
a | 53 and sé and the output obtained after

operations on them. ding mática

¡ES

er lo 12.0,
Pal 102 NaN 104
type: intse Addition of sand 53

wsionafsands! Jp 0.545455 | [arithmetic operation is posible
565217 | | objects of same index: otherwise

0.585333 | [su as na Mota Number)
lstype: floated |

1.7 VECTOR OPERATIONS ON SERIES

ons. Any operation to be performed on

ZR” “anno fem com el

Series also supports vector opera a series gets

performed on every single element of it.

For example

prarmisu witli Ca

1.8 RETRIEVING VALUES USING CONDITIONS

We can also give conditions to retriev

For example,

as
See |
As shown nthe Python set window the fst Statement displays e serie. Inthe second
eme sched for condition fo each lement in the sens and returns Bosa Tr
tl ther words this ia wecortaed operation which checks for every element nthe
ni le third statement itis performing the FILTER operation and earn tere result
encia oly those ues that eur True rte ven Bolan expresion [2 a om

‘etna eeu output. Sry meat statement 22. wl check forthe vale which
Arne than real and wil play he value hat cacy matches with the Bolen Trae

1.9 DELETING ELEMENTS FROM A SERIES
We can delete an element from series using drop() method by passing the index of the element
to be deleted as the argument to it

ode vale

prarmisu win Ca

1.10 DATAFRAME
In the previous sections, we have discussed series in detail, One limitation of series is that it
is not able to handle 2D or multidimensional data related to real time.

For such tasks, Python Pandas provides another data structure called dataframes

Dataframe object of Pandas can store 2D heterogeneous data. I is a two-dimensional data
structure, just like any table (with rows & columns). Dataframe is similar to spreadsheets or
SQL tables. While working with Pandas, dataframe isthe most commonly-used data structure.

‘The basic features of dataframe are:

(0) Columns can he of diferent types. Le Columns
is possible to have any kind of data in “Is le[ole]|
columns, Le, numeric, string or Moating
point, ete

(li) Size of dataframe is mutable, be. the
number of rows and columns can be
increased or decreased any time,

(iit) ts data/values are also mutable and can be changed any time.
(0) Labelled axes (rows /columns)
© An

(1) Indexes may constitute numbers, strings or letters.

smetic operations on rows and columns.

1.10.1 Attributes of a Dataframe

Like series, we can access certain properties called
attributes ofa dataframe by using that property with
the dataframe name, separated by dot (operator

synta

taFrameobjecto. <attribute_nane>

Let us understand all he attributes while considering
the below dataframe as an example which holds QPI
(Quality Performance Index) in the four subjects for
‘three successive years

Aecountancy 854 79
v 2m m
Economics 803 76 775
De Ta 762 005

1. index
‘This attribute is used to fetch the Index's names as the Index could be 0,1, 2,3 and so on.
Also could be some names as in our example indexes are: Accountancy, I, Economics,

English

syntax:

2. columns
This attribute is used to fetch the column's names, sin our case it should give column name
as: 2018, 2019, 2020.

Syntax:

|
3. axes

This atribute is used to fetch both index and column names.

Syntax

<DataFrareObject>. <axes>
or

Print (d£. axes)

taframe,

<DataFraneos
taFranedbject>. <atype

or ”
Eine (df. stypes)

prarmisu wıuı Ca

5. size
Tri abate used och the ste ft data which Ihe product of te number
of rows and columns.

Here, in our example, we have 4 rows and 3 columns, so 4*3, ie, 12 isthe size of our

dataframe.

Syntax:

<Datarr
or

(ae. s

6. shape
‘This attribute also gives you the size but it also mentions its shape, Le, the number of rows.
and number of columns,
syntax:
<D

. <shape>

20bje

or

print (df. shape)
Pe

7. Naim
“This attribute is used to fetch the dimension of the given dataframe, Le, whether its 1D,
20, or 3D.
We are working on 2D Data Structure.

Syn

<DataFramedbject>. <ndin>
or

print (af. ndim

8. empty
‘This attribute gives you a Boolean output in the form of True or Fi
yy empty or missing value present in the dataframe.

alse by which we can find

A using Pandes-}

out if there is am

synta
<bataFraneobject>. <empty>
ther attribute that can check the presence of NaNs (Not a Number),

which

[Data Handling

We have anot
is isna()-
syntax:
<DataFrameobjecto. <isnaQ)>

pranııcu

dtaframe. By default, it gives the count

This attribute gives the count ofthe items in the

the rows.

We can set count (0) or count (1); 0s for displaying the count of rows (this is by defaut)

and 1 is for displaying the count of columns.

Instead, we can use axis='index or axis=columns'

Syntax:

rameOoject>. <count (>

10. T (Transpose)
This attribut is used to transpose the data
become rows,

Syntax:

1.10.2 Creation of Dataframe and Display
Before creating a dataframe, we must understand

The table represents the data of students witht

‘The data is represented in rows and columns. Each column repres

marks, grade, etc) and each row represents reco

rl

je begins with the creati

Creating a datafram
is us

empty dataframe, DataFrame() metho
À fedazaframe_nane>)is used

ame, Le, rows become columns and columns

the layout of Daaframe (Table & description.

thei marks along with the respective grades
sens an attribute (like name,

rd ofa student.

fon of an empty dataframe. To create an
nd To display a dataframe, the command

Data Handling using Pandas-t

Dataframe can be created with the following constructs:
© Series
> Dictionary
= NumPy ndarrays
1. Creating Dataframe from Lists
The simples form of generating dataframe is using lists. The lists passed as an argumenl|
to DataFrame() method and gets converted into a dataframe with elements displayed,

columns and index automatically created by Pandas. We can also supply index values fy
arranging elements in the form of rows and columns.

Example 1: To create a dataframe from list.

222 Gch epa. patates assed)
Dos paar

andas se pal
[atar PTS 2, trato, 2, 1
el per an cian hey ae

2. Creating Dataframe from Series.
Dataframes are two-dimensional representation of
series in the form of rows and columns, À
concept better, let us cre
series comprises their m:

series. When we represent two or more
t becomes a dataframe To understand thi
ate two series and then pass those into a dataframe. The first

arks and the second series constitutes their age.

ppLarnmicu will

Practical lmplementation-20

To create dataframe from to series of student data,

" Brass

IT rn ares sa

one mt, ater 2901)

Sorting Data in Dataframes.

Ve can sort the data present inside the dataframe by using sor values() funcion. In this function,
‘wo arguments are passed out of which the frst ls the sorting field and the second is order of
sorting. you are not providing anything then by default, the data shall be sorted in the ascending
order or it can be explicitly defined as ascending = True; for descending, you have to give
ascending = False

Practical Implementation-21

To sort the data of student dataframe based on marks.

nds a8 pt
EEE Spin

Na

“y keyword defines the name of the field or column based on which the data isto be sorted.
By default, the data Is sorted in ascending order. n oder t sort the data in descending order,
‘you have to make ascending as Fals, This shall sort the data inside the dataframe on the basis
of marks in descending order as displayed in the output window.

Data Handling using Pandas-1

scart coupe a

ET dE

say =

Song an on tet
ER isi descending order
ES rr

3. Creating Dataframe from Dictionary
Dicionariescanbe passed san input datato create a dataframe. The dictionary keys bye
retaken as column names. The diferent ways to create a dataframe using dico

(@) Dictionary of ist
(6) Dictionary of series

(©) List of dietionaries

‘Consider a student table represented as a dataframe, created using series containing name
ofthe students and their marks in 4 subjects

(2) Creating a Dataframe using dictionary of ist.

First of all, we shall create a series from dictionary elements representing studen
names and marks in thelr respective subject

Beaten AA = 0

- 0 x

nos 3 een an, 0x FA
me izo, terrier, tereaieat
hernie goss oe/rrogems/ yen en rate

(0) Creating a Dataframe from dict

nary of series.
‘dictionary of series can also be used to create a dataframe, For example, Stud. Result
is a dictionary of series containing names and m
subjects

< of four students in three

| Gcencing a dotan Tees aleron ar sort A
AS

LE paie pa

ecpacseblesttonisee, een! snaucyar monta

[pepe derien 189,701 0

PARA MES

ropacceriestt9, e,

des tnformatice practices

ENS a =

Alternatively, we can also create a dataframe from a dictionary of series by directly
passing the values to the dictionary using Series() method as shown in the given
Implementation:

Practical Implementation-22

To create a dataframe from a dictionary of series by directly passing the values tothe dictionary
using Series() method.

AA

| seoapebulee onan tpd.serkes le
CEnpischeapacsartes (09, 78,89,500
TesbecolcstspacSerten (1878060
re eeacces spas series (09,7067 S000)

tccesting aatatease

separa a

Ps aa
[lemas cerner
NER caga temente intomties metes

Data Handling using Pandas

neu LA

4. Creating a Dataframe using list of dictio
Practical Implementation-23

‘To create a dataframe by passing a list of dictionaries.

import pandas as pd E
nevstudent ={("Rinku':67, "Ritu! 88, Aditya’ :92),
CRin +77, "Ritu! 27, "Panka)":65),

(Rinku 288, jay" OS

newdt «pa.DataFrane (novstudent)
Print (nude)

RESTART: C:/Usexs/preet/AppData/Local,
List.py
Aditya Ajay Pankaj Rinku Ritu
920 7 6 67 76.0
max 67 65 77 58.0
700 67 74 88 NN

‘As shown in the output window, NaN (Nota Number) is automatically added in missing places.
5. Creating a Dataframe using NumPy ndarray

We can create a dataframe using NumPy ndarray by importing NumPy module in ou
program,

Practical Implementation-24

To create a dataframe from NumPy ndarray.

a PR
Fue Eat tomas Rn Optont Window Help

Vereating Dutafrane Eres ndarzay
Import pandas as pd

ot = pd.batatvane (daca sar

|4\aisplaying che datatcene
brine tae)

ccolums = column values)

_ we cad

prarmisu wıuı Ca

Drs —— ,
fie at Se Dé Ost Winton ne

Poe DRE OA BEE BRE 0 WE VAY BEE Ot

4.10.3 Selecting and Accessing, Modifying Records from Dataframe

You can access and retrieve the records from a dataframe through slicing. Slicing shall result in
the display of retrieved records as per the range defined with the dataframe object

Practical Implementation-25

‘To display records from the first to the third row.

rang Speen) fe tie e 3 wong
me | |? E E

[Records from the 1st to 5 EE] #

the 3 ow ae displayed

Practical Implementation-26
‘To create an indexed dataframe using lists.

You can give index to the dataframe explicitly Inthe following format:

"snos

‚me (student, index= ["Snol", "Sn02", '5

Data Handling using Pandas-I

as = pan:
ARA AA SOE ES
vane English Economic
soot nines et 5
E] ë
FETE: a
peer E

MLamicu wit Ca

In the above case, the default index value starting from O has been replaced by serial nur

as the index for all the records inthe dataframe,

ion-27

Practical Implement

‘To change the Index Column,

lex is displayed from Oto n-1, where n s the total number of records Pag,
y to select the other column present in the dataframeg

ditional 012.. and n. This can be done using set. index

By default the ind
also provides us with the Mexibili
the index column instead of the trad
method inthe following format

suppose we have to take Name as the Index column, This is done as:

[Batak wer | Townley a

Inder column changed ces tememos
[rename mm

As observed from the output, the default index has been replaced by the field ‘Name’ as inplac
attribute has been setto True, This 1 0 because a dataframe by default in not replaced/change
‘and on changing the index, a new dataframe with modified values is returned. Therefore, i
‘order to make changes permanently in the original dataframe, inplace attribute is used an
set to True.

Practical Implementation-28

‘To reset Index column

‘To undo the above operation, ie, allocating default values according t the student names, ve
can get the original Index values, Thus, we can get our original table with default integer inde
{sit was in the previous implementations using reset index() method, The command is

‘The “inplace’ keyword is made “True and shall result In replacing the default
index value

ntegers as th

” HET a

Bogs À]

pramisu with Ca

In the above case, the default index value starting from O has been replaced by s

as the Index for all the records In the dataframe.

al nu
be

Practical Implementation-27
‘To change the Index Column.

By default, the index is displayed from 0 to n-1, where n is the total number of records, Pang,
also provides us with the flexibility to select the other column present in the dataframe à}
the index column instead of the traditional 0,1,2.- and n. This can be done using set index |
method in the following format:

x('Name', inplace=True)

Suppose we have to take Name as the index column. This is done as:

ane English Economies 19 Accounts

mal Fr)

>>> dt set_Andex (name, Snplacest sve)

o A 3 38-3

‘As observed from the output, the default index has been replaced by the field ‘Name’ as inplae
attribute has been set to True. This is so because a dataframe by default in not replaced/change
And on changing the Index, a new dataframe with modified values is returned. Therefor,
‘der to make changes permanently in the original dataframe, inplace attribute is used an

set to True.
Practical Implementation-28

To reset Index column.

‘To undo the above operation, Le, allocating default values according to the student names,

can get the original index values, Thus, we can get our original table with default integer inde

ge it was in the previous Implementations using reset index() method, The command is
df. reset_index (inplace= True)

“The ‘inplace’ keyword Is made True’ and shall result in replacing the default integers as UN
‚dex value

[o> Ares der haplace =

i EN A
5 as A]

prarmisu win Ca

Iris clear from the above output that the default integers are displayed in place or, in other
words, in their original format

4.10.4 Adging/Moditying a Row

Wecan add a new row toa dataframe using the DataFrame.loc[] method. Consider the dataframe
Newstudent that has three rows for the five students. We need to add anew record a location
3 in the dataframe.

a Dataframe

[e ep]

Eo, ne,
| ada

vi guar or

Ras comunista]

"lle jay rena nn ee

We cannot use this method to add a row of data with already existing (duplicate) index value
(label). In such a case, a row with this index label will be updated/modified as shown below:

es su

If we try to add a row wit lesser values than the number of columns in the dataframe, it results
in a Valuetror, with the error message:

Value8rror: Cannot seta row with mismatched columns.

Similarly if we try to add a column with lesser values than the number of rowsin the DataFrame,
it results in a ValueEror, with the error message:

VolueBrror: Length of values doesnot match length of inde.

Data Handling using Pandas-1

4.10.5 Renaming Column Name in Dataframe
By default, the column label given to dataframe is range index, e, 0 to n but Pandas provides
us with the Mesbility to even change or rename any column inside a dataframe

Letus take a ist of Admno of students as:
al = 101,102,103, 104,108)
We shall be renaming this column name at" to Admno.

Ss —
[Betoun column name

[so

the only given column e

a dataframe?

we have learnt tor
fic column of

above Practical Implementation,

In the
Jp same, But what happens when we have to rename SPS
tn Pandas, this Is done by using the function rename() 32 exhibited in the Praccl

Implementation below

practical Implementation-30

‚me student with colur

Also rename the cl

Seas
aie pre

+ of the student, student's marks it

ns such as nam
mate Practices.

Create a datafra
urn Name as Nm and IP as Infor

subjects IP and BST.

Krane

ode, rename() method has been used to rename a column IR

As is evident from the above €
Pandas dataframe.

‘nother way to change column names in Pandas isto use

pester ere ots nl tse Way Cow ac hance aves fee Using renamel)
nd not al column names need to be changed, hange names of specific columns casi

veamicu wiui Ca

One ofthe biggest advantages of using rename() function i that we can use rename to change
‘as many column names as we want. Its syntax is:

inplace=True)

Where dis a dictionary and the keys are the columns you want to change. The values are th
new names for these columns. Als, inplace=True is given as the atribute of rename() to change
column names in place, Thus, as shown in the given output window the column names of student
<dataframe, Name and IP have been renamed as Nm and Informatics Practices respectively.

4.10.6 Adding Column to a Dataframe
You can add new columns to an already existing dataframe,
Syntax to add or change a column:

<éfobject>.<tiew Col_Ne

me>[<row Label

Practical Implementation-31
o add a new column to a dataframe.

In Practical Implementation-29, we generated a dataframe for AdmNo of all students
Suppose we have to add a new column for Name, Physics, Chemistry, Maths into the same
dataframe.

It will be done as follows:

a ('Nane")=('5!
EL’ Physics’ ]-pd. Series ((89,78, 65, 45, 591)

d£[ "Chemistry" J=pd.Series ((77, 83,74, 60, 56), Sndex=10,1,2,3,41)
af Mathe" ]=pa. Series (188,65, 79,78, 58})

6f {Total "| -dE["Physics'] +4£[*Chenistry*]+d£["Maths")

Gunjan", Tanya’, "Kirti,"

Let us implement this in Pandas code.

‘tog tune es es mot

Now in the above code, we have created five new columns in addition to the column Admno
which we renamed in Practical Implementation-30. We have passed a list of Names for
the new column Name. This shall be automatically copied to all the rows by Pandas. For
the next columns, Physics, Chemistry and Maths, we have given all the elements with their
“associated index values. Then the sisth column comprises the total of these three columns

for every student.

ud

va

tic operators in any of the columns y

‘Therefore, we can update the column values by arithm
in oy ¢ Fits specific column,

the dataframe. Also, we can assign or copy the values of the dataframe oF
with the help of assignment operator (=).

Suppose we have to add a new column for Average of total marks in three subjects forall the
students. It will be done as:

aE ["Average") = d£["Total')/3

DS aa TTA |
Pyro mane Physics Chemistry Maths Total Average
jo “ier mnt 0 ge PR lso
E ETA
Bo m 8 M53 ns man
É ju du 8 BR 82 6000
His ve SS DO de Ses

While adding a new column to an already created dataframe, the length of values of new column should
match and be equal to the length of the Index column.

tation-32

Practical Impleı

Create the following dataframe by the name (test result) consisting of scores obtained by

students in the test

Add a new column ‘qualif to define the status of the participants in the qualifying exam.
rare nc dense 070 ~ ©

Alternative we can also adda new column using insert() method,
By using insert function mn
ee "we can add a new column to the existing dataframe at any position/

pranmicu wit LA

syntax to add a new column to the existing dataframe:

<afobjeot>. insert (n, <new_column_nane>, [data])

Here,

1 isthe index ofthe column where the new column isto be inserted

unn_nane>: new column to be inserted

[data]: values to be added to the new column

With respect tothe above example, we can add new column ‘qualify’ at index 3 In te dataframe
using Insert) function

140.7 Selecting a Column from a Dat
‘The method of selecting/accessing a column or column(s) ofa dataframe Is similar to slicing
using series. Pandas provides three methods to access a dataframe column(s).
> Using the format of square brackets followed by the name of the column passed as a string
value, lke df-object{‘column,name’)
> Using the dot notation like df-object.column.name.
> Selecting or Accessing Rows/Columns from a DataFrame— using loc) and loc() method
+ 10c(): label-based indexing
+ iloe(): index-based indexing
Using numeric indexing and the loc attribute, like df-objectocf: <column.rumber>].
Here, stands for integer, which signifies that this command shall return a numeric value
‘denoting the row and the column range.
= We can select and display any column from the dataframe by simply giving the name
‘of the column to dataframe object, f, ke this
flTotal (square bracket format) or d Total (using dot notation} and implemented in
Pandas as fellows:

+ Usingiloc—In the above command, single column or multiple columns) can be displayed a,
a series, but if we intend to display all the rows and columns of the dataframe, this can
done using loc.

Syntax:
exes)

Af. {loc [row-indexes, colum
For example,
Af you want to access the first and the Ach columns from the given dataframe, this can be

done using ioc lke:

As itis observed from the output displayed, all the rows of the first (Name) and fifth (Total

column are shown respectively. Here, [] signifies all rows and [1,5] indicates 1st and Sth index

columns.

‘This was for selective columns. Ifyou wish to extract a range of columns, suppose the first five

columns ofthe dataframe df, then the command shall be [0:5] as shown below:
SERA]

Similarly, or the frst two columns, it would be

Pam |

ft 102 Gunde

Practical Implementation-33

Implementation of loe() and loc() method with respect to “student

ts anar sia

(PESRRE: comprame

1.40.8 Deleting a Column from a Datatrame
Like adding a new column toa daaframe, you can alo delete a column from a datframe.
Cums canbe deleted from an existing dataframe in three was:
2 Using the del keyword: wil simply delete the series entre column adits contents) fom
the datatrame (in-place)
> Using the popf) method: op) wil delete the series and also return the series as à result
{also in-place,
> Using the drop() method: ts sytax is:
a
Ie wil return a new dataframe withthe columnf) deleted or removed, a
Column and O stands for Row. By default, e alu of axis is 0

pllabels, axise1)

(

prarmisu wiuı Ca

1.11 ITERATIONS IN DATAFRAME—ITERROWS
Sometimes we need to perform iteration on a complete dataframe, Le, accessing and retrieving
each record one by one in a dataframe. In such cases, itis difficult to write a code to access thy
values separately: Therefore, it is necessary to perform iteration on dataframe hich can be
(done using any ofthe two methods:

+ <DFObject>iterrows()—It represents dataframe row-wise, record by record,

+ <DFObject>iteritems()—It represents datafr
Let us see how the above two methods work. Consider three series for yearly sales of
ABC Lt.

2015—Qtrt: 34500, Qur2: 45000, Qtr3: 50000, Qurá: 39000
2016—QrI: 44500, Qtr2: 65000, Qtr3: 70000, Qtré: 49000
2017—Qurt: 54500, Qtr2: 42000, Qtr3: 40000, Qtrá: 89000

‘The first step Isto represent these series Into a dataframe and then to perform iteration
(repetition) for accessing and displaying each record one by one.

65000, ges)" $000
ats pa.patarrame(total_sales) ‘converting data series into Datafrane DA

ESTA: i /Osers/peeati/appoata/tocal
aus 2016 200 |
34300. 44300 44500
4000 65009. 65000
50000 10009 70000
35000 45000 _ 45000

Using iterrows():

The first step has been completed by creating a data
ext step isto display the record in the created data
code in the previous code:

frame from the quarterly sales series, The
frame one by one by adding the following

Print "Rovindex
Print "Containing

Por Go zowerien 15
| Print (rovseries}

prarmisu wi Ca

‘This code on execution shall display the records of the dataframe one by one.

eas 0820, types dea

Using iteritems()
This n
Aataframe, which we have done In the above example, write
column-wise as shown below:

ethod shall display the data from the dataframe column-wise. After the creation of the

e code for displaying the series

ES de
[E o nu

A |
estes Pair |

E, alae m

am eu wiuı Ca

Practical Implementation-34
Write a program to iterate over a dataframe containing names and marks, then calculate grades
as per marks (as per the following criteria) and add them to the grade column:

Marks >= 90 Grade At

Marks70-90 GradeA

Marks 60-70 Grade B

Marks 50-60 GradeC

Marks 40-50 Grade D

Marks < 40

o En

joov'ajeov ‘sanjay’, ‘ADDY
55,60)

ates acs)

aenpaoatatrane (ta, coking ane", "azks"))

te S Tenis wild ad Haw to all rec

of aatate:

{clycolseries) in at.

SEN cr
cece

A

boss

1.12 BINARY OPERATIONS

Wis possible to perfor
Provides the methods
carrying out binary py

N

add, subtract, multi

2, ung) re BY and de operations on da ,

erations on On mul), div() and related fam > dataframe, Pandas
"rn darum, étions radd(), rsub() fr

pranmicu wit LA

Out of these operations, addl(), sub), mul) and div() methods perform the basic mathematical |}
operations for addition, subtraction, multiplication and division of two dataframes,
‘The functions rsub and radd stand for right-side subtra

Ion and right-side addition. rsub()
function subtracts each element in a dataframe with the corresponding element in the other
dataframe, rada() function adds each element in a dataframe withthe corresponding element
in the other dataframe.

Since al these operations involve two dataframes to act upon, they are known as Binary CDI"
means two! and ‘ary’ means digits or Python dataframes in this case) operations.

Practical Implementation-35

To perform binary operations on two dataframes.

‘We will perform operations on the following two dataframes—marks obtained by two students
ina dass test

student = ("Unit Test-1*:(5,6,8,3,10), ‘Unit Test-2":17,8,9,6,

13,3,6,6,8), "Unit Test

5,9,8,10,5

Students eine
(a Datarrans (codant)
es Fpaeontarrane (stesentı)
prints

Basta

E
HS
[print l'addition")
saa")
daa)

prime cut iplicacion”)
print as ara)
Prise eleisienn)
PEA]

las

Data Handling using Pand:

wieu wilt Ca

"unit Test-1 unit Test-2

Unit Zest-2

6

1

lo 8
i >
2 1
5 3

Dance
‘at Fest Unit Test-2
5
2
A
=
unit Test-2
o ser Lo
2 22000000 0308088
BOL 280
5 3000 5.480000
[4 1250000 3.000000
bee ee!
As is evident from the above output,

ais Jas; thts they are performed on each and every element of the dataframe. On te
atter hand, multipication and division operations ne Item-wise execution and not vecter
‘multiplication or vector division. Hence, there js à considerable difference in the functionisg
of these arithmetic operators,

1.13 MATCHING AND BROADGA:

The term broadcastin

STING OPERATION

18 comes from Nu

oF scalar values. Bro,

dataframes, sere)
different shapes whit

las while dealing with arrays 0í
his method, usually the small
'ompatible shapes. Broadcastint
allows accessing each and even
ys for performing the necesso")
35 it leads to ineficient use of meno

metic operations, In y
AY 50 that they have cı

prarmisu wıuı Ca

practical Implementation

6

‘To perform broadcasting operation on two

ays of similar shape,
‘This isthe simplest broadcasting operation involving two dataframes of
Consider two series, = [2,5,67.8] and

shape/size.

58.94.10] to implement broadcasting,

ee reno

+ Broadcasting using a Scalar or constant value
White working with a Scalar value, the single constant value shall be added to the first
“ataframe by stretching the Scalar value which will behave as the second dataframe. The
few elements in dataframe are simply copies of the original Scalar value, This sa more
lent method of broadcasting as b's a Scala value rather than an array

Practical Implementation-37
‘To perform broadcasting operation u
“The second dataframe holds the Scala value as 50:

sing Scalar or constant value,

1 using Pants -1

PA Data Hana

= Broadcasting using a 1D Array

The array which Is he second ame 1
en ur have o broadcast stretch Y M

both the elements of the array:

{diferent shape as compared to the first
isan array, vertically to match

output:

curtsere proue

SEY o, 0,1001

andas 79
ROCCA
Bekannt |

ataframe ®’ adds cach ol,

ray, which isd
Its very clear from the output displayed that array the ov of the a

of dataframe a with each element ofthis array, and with bo

alle missing values. In other words, nig,

values with n computational significance are cl les nee wad
du the daa whichis unetned or unable or fr wc he wer han ne
tae Panda allots thse mis sales wih NaN (Na Number) These missing a
eae er a y sine na) method which we wi discus in he next implemento

110,0, 8,3

Replacing Missing Values with O using fillna() function.

The command used shall be: d£fi1.na (0)

9 1 2 3 4
o 2 5.0 6.0 7.0 8.0
5 8 mau man nam Raw
12 10 4.0 m aa an

3 5 80 50 mau nan
>>> ar.sitinale)

Le

Itis now clear from the above output where
Its used to fill NA/NaN values inthe dat
+ Replacing Constant Val

every NaN is replaced with O using filina() meth
frame.

lue Column-wise

In the previous program, very NaN value was rey
ive different values column-wis
28 per the requirement,

placed with O, Apart from thi, the wer
'se with appropriate constant value instead of entering?
Lets tke an example where column vk A
where 1* column values will be replaced with -8, 2": missing col
values with -10 and 3% column ‘values with =S, Een :

E)

pramisu wich Ca

AN

Since there is no NaN value in the column with index 0, no value has been replaced. But

in the other two columns (column 1 and 2), values of NaN have been replaced with
-10 and -5 respectively,

+ Interpolate Value

fills be missing values by copying the value 5 65 10 08

from above adjacent cel using he following |} à ain aon kag fae
meta bes mee

10420 60 70 80

1.15 COMPARING THE SERIES

nda allows a comparison between two series by comparing each e
cach element ofthe other series.

nt of ane series with

ratical Implementation-38.

Comparing two series element-by-clement.

erre = à
in ort tn pond ip nn
an aa
BETH
ESB
bei
Peres
ta sb

keinen >
keine c
[printing two serien

Irene co

pranııcu

1.16 COMBINING DATAFRAMES
Pandas provides the capabilt
of

ty to combine or concatenate two or more dataframes on the ba
ows (row-wise) or colum

"ns (Column-wise) using concat() method.

>

pranmicu wit LA

We will first begin by simpl
A imply combining two dataframes.
series to dataframes and then concatenating these pet ss

plementation

Practical I

Combining two dataframes, converted from two series.

ace ~paraatareane (cous = (0

secre op

om two series, we will now perform concatenation of
vd coluran-wise, The difference between the two is just the
| ten itis row-wise concatenation and in

After creating two dataframes fr
these dataframes, both row-wise an
Value of the attribute “axis: Ifthe value of axis
ase axis = 1, It signifies column-wise concatenation

prarmisu wıdı Ca

1.17 BOOLEAN REDUCTION
any). and all() to provide a way to summarize
You can apply the reductions: empty(). anyO: rie à
Poel The alt) 1 used to check al or any item is non-zero, no-empty, oF not Fas,
Let us discuss these functions in detail
cempty(): I returns True value if dataframe fs entirely empty.
»

MI): It returns True if all the elements in the series or in a dataframe axis are non-zero,
not-empty or not-False.

DataPrame (('X": (True, True!

any): Unlike dfallQ), any(), as the name suggests, performs an OR operation. If any of the
values along the specified axis is True, this will return True,

hhead() and tail) function
The head() function is used to get the first nrows. By default, it displays the first 5 rows.
This function returns the frst n rows for the object based on position. It is useful for quickly
testing if your object has the right type of data in it.

sy
parar:
Here, nis the no. of rows to be extracted, By default, it displays the last 5 rows.
‘The tall) function is used to get the last n rows,

This function returns last n rows from the object based on position. It is useful for quick
verifying data, for example, after sorting or appending rows.

head (n)

Syntax:

ail(n)

prarmisu wil Ca

jementation-40)

practical
tate had) and) method nan Employee tata

Tce pandas pa re
[uo cacaos: nor on

,
etepd.Datarrane (Emp data)

ise
Primetar heady) 1y deraux
PeimetaeseatiO) my deta

(er
>
[psa ES
o tor sad
io = Hein
is aa
16 el
pte a
o “tor 12-01-2012
1 ist LEE) fu ap m
2 018 BER] T rom
3 ise eu
8 es
pia E
we 152

103 | [un pis as]

104 shaurya Voie | | [Som |
105 "genta 09-09-2007

Yinay_ 16-01-2012

Intheaboveprogram,head() and tal) function returns rows from top and bottom respectively;
since no argument has been given to these functions, the default function displays 5 rows.
Similar to display first 2 rows, we can use head(2) and o returns last 2 rows, we can use
tal? and to return 3rd to 4th row, we can write dfl25)

[spore pandas as
Ex cata=(+Empla"*[101, 102, 103,106, 105,106

Mate Handling using Pants |

‘enape':('Ronie", 'Foo}a", ‘Reinet, Sshasrya
‘popes (s2c01-2012", 18-01-2012", 08-09-20 |
1087092007", 116-01-2012"1

1

steps.batarrane (Exp data)

print (an) Si

Print (bad (23)

ne (ar-eaii(2))

Prin (ati 235)) J

wıuı LA

osers/preets/App0303/

{toa rois 10122
2 103 prinal 63-09-2007
5 toe uma 10-01-2012

ee Pin 16-01-2017
CRC
102 Pooja 15-01-2012)

| mi De ES
Emploi E

CORTE

1.18 BOOLEAN INDEXING

Boolean indexing is a type of indexing which uses actual values ofthe data in the datafran
Le. using Boolean vector In order to access a dataframe with a boolean index, we have to cres
a dataframe in which index ofthe dataframe contains a boolean value, that is “True” or “False

Practical Implementation-41

[Romeo AREA = D x
ue es foma un pis Window He
lore pando ao

ee 110, 12,12,13,14,151,

E
Pyenon/ryenon39/peges0 py
ruse 16 via

PRET

In the above program, a dataframe is created which has a boolean value as its index. This
‘boolean index ofthe dataframe can be accessed by a user using the functions Joc[] and loc]
‘The argument passed to these functions is a boolean value, Le, True or False. print(dfl)
statement shall print the dataframe with boolean index

Executing the next statement, dfiloc(True), shall return the dataframe with index value as
True. It must be remembered that while using the iloc function, only the integer value is passed
asthe argument and displays the row with index as 1 since 1 is passed as the argument to loc

1.19 CONCATENATION IN DATAFRAME

concat{) function is used in Pandas for performing concatenation operations along an axis
while performing optional set logic (union or intersection) of the indexes (ifany) on the other
axes. Thus, the concat() function performs the concatenation operation along an axis

“The syntax for concat( is:

+ objs ~ This is a sequence or mapping of Series, dataframe, or panel objects

+ axis - (0,1, .), default 0. This isthe axis to concatenate along.

+ Join - (inner; ‘outer), default ‘outer: Outer for union and inner for intersection

+ join_axes - This s the list of Index objects Specific indexes to use forthe other (n-1) axes
instead of performing inner /outer set logic.

+ ignore.Index - boolean, default False. If True, do not use the index values on the
concatenation axis. The resulting axis will be labelled 0... n-1.

Practical Implementation-42
Program to concatenate two dataframes.

AAN
arre oscarrame D
a]

[ra ceca, at2)

EEE Sie TIEFER
alent

isu wit Ca

The stove proa combines so arms wine non cre, hi
be combined the arguments and eu a

A cmo cones of te dran, an do. Te index of hres

ee al ot ofthese en ram ech dam respec

You cana the final index as per he sequence and not a per the row label. The above eng

Isto e modified bt, which given In the next implementation,

Practical Implementation-43
Program to concatenate two dataframes without taking row labels (Modification of Practica
Implementation-42)

AP — OX

na, acht "ViSay",
251,
Anta", URL, "Vinay"

TU)

Uosnaucya!, "Pinky
tiepd.DacaFraze (1)
dtanpd.DataFrane (42)
¡£2=pa.concar (18f1,4£2], ¿qnore_Andex=Tcue)

were]

pes) ut v
= = LS cor
RESTART: C:/Python3#2/prog ¢
zoll no name
o 720 anxse
a 1 Pine
||? 12 Rinks
Row labels are adjusted] | [3 33 Yasn
‘automatically 4 14 vijay
u | |s 35 nikhit
6 20 Shaurya
ia 21 Pinky
8 22 Amubnay
> 22 Kaushi
E 24 Vinay
E 25 Neetu

The sve por 4
irene spon
me
sw date
he dames hae os
cms a NE De o Ara ye

is the modification of th
nt and set its value as Tr
the resultant dataframe sh,
ing the concat() function.

1. Previous program. Here we have taken
Tere’ will ignore the row labels from both
tall adjust the index automatically as per the join

MLamicu wilt LA

Practical Implementation-44

Program to concatenate two dataframes along columns,
Ft senso cocon rene OH
Fae fomar fan Options Window Help

pandas =< pa =

ee
se
Een u
tampa: occ (ah)
rompa acre te

taupe: conc (latte) [Ea

RESTART? TOA
pee y
a pinky

12 mins anota
Bent ect

‘The above code concatenates the two dataframes along columns. To do this, the axis parameter.
is set as 1; (axis=1) means along column or, we can say, column-wise. Hence, the output
obtained above is along the columns of the two dataframes.

1.20 MERGE OPERATION IN DATAFRAME
Pandas provides a single function, merge(), as the entry point for all standard database join
operations between dataframe objects. We will first discuss merge() function.

Practical Implementation-45

Program to perform merging of dataframes,

A A Rm — OX
ie Eat Fomet Run Ostors Window Helo

Fe panda os Pa
mt (10,11,12,13,14,15),

capers
uña eo) ein (Rinku, "Yash VA
get ro 0": (20,21, 22,23,24, 25),

5 120 2, 2229.20 nant nay eet)
pat sta0,2,52-33,24,25)

incl Rr
ara carros)
ar 2)
Sr annee (atl 4621)
Gracy oavareeme (9)
ES
Enr)

rampe, "Raja 1)

1.0")

y which are common bete

utp consisting ost ENTER us

ing column values, which &
= of a common column. But apa
dataframes, which my

‘The above program shall generate the
the two dataframes for the correspon
se on the bast

en the merging is don
“This was the station when he mergngis done onthe MEET qe

from this, it can also be done on the basis of di
sil now see in the next implementation.
Practical Implementation-46

columns and different names,
Program to perform merging of dataframes with uncommon col

{UO AAAS |
ea es eit
ia gs,
AN
ea rein
pa eme, den
pa sacarme)

a {efe_one'rol1_s0"-tight_one'r013_n0")

ON)

O)

ae ten ae

te ee program, columns ni te éme ar 1 be merged har dre
tunes such as ofmering the argument eo (tobe pacs
‚and "right,on” for the right dataframe name and hence the output. "

1.21 080 FE

SV (Comma Separated Values) isa sim
a spreadsh

1.22 DATA TRANSFER BETWEEN p
CSV format isa kind of tabula day

a bai ta sep

'ATAFRAMES AND .CSV FILE

arated by comma
Ê comma and is stored in he form of pain tet

À a. —

prarmisu wıuı Ca

In CSV format

+ Each row of the table is stored in one row,

+The field-values ofa row ate stored to

her with comma after every field value.
Advantages of CSV format

+ Asimple, compact and ul

tous format for data stor
+A common format for data interchange

+ can be opened in popular spreadsheet packages lke MS Excel, Open Ofice-Cale, ete
+ Nearly all spreadsheets and databases support import/export to CSV format

(CEM: CSV is à simple fle format sed o stor

ular data, such a a spreadsheet or database,

4.22.1 Creating and Reading CSV File

ACSV isa text file, so it canbe created and edited using any text editor; More frequently, however,
à CSV file is created by exporting a spreadsheet oF database inthe program that created it

All CSV files follow a standard format, Le, each column is separated by a delimiter (such as a
comma, semicolon, space ora tab) and each new line indicates a new row.

Let us create a CSV file using Microsoft Excel on the basis of "Employee" table

Table 1.

Employee table

Bp [casas 20 coo e000
02 ECC [27 TT 2000
CET ET ET 000
Mans 35] Mumbar | 2500
ch 26 | oe
Seen 30 | angers TE

1. Launch Microsoft Excel.
2. Type the data given in Table 11 in the Excel sheet (Fig 1.5). You will also notice that some
cell values are missing to represent missing values (NaN) in Pandas dataframe.

Data Handling using Pandas-4

Arme. 2000
Don m
23 Mons 8000
SO one 21000 |

Fg. 15 Microsoft Excel Worksheet for Employee

3. Savethefilewitha propername
by clicking File -> Save or Save
As or press Ctrl + $ to open
the Save As window as shown
in Fig. 16.

4. Type the name of the file as
Employee and select file type
as CSV (Comma delimited)
Cest) from the drop-down
arrow (Fig. 16)

5. ClickonSave button Excel will
ask for confirmation to select
CSV format.

6. Clickon OK as shownin Fig 17.

Tose ec tt, SOE

REET ASS roe ate era each, deme fe ne at po et

La Ca

king permission to save in CSV format

7. Ie wi display a dialog box for asking permission to keep comma as delimiter for CSV fle
(Fig 18).

‘ny tan fa a og CA sd, Oo yu et oles matt mt
sk a cod ame tr. a1

A

Fig. 1.8: Seting delimiter for CSV format

&. Lastly click on Yes to retain and save the Excel file in CSV format.
To view this GS fe pen any Text Editor (Notepad preferably) and explore the ol
‘containing Employee.sv file, (In this cas, the path for CSV ile is: E:\Data\Employee<sV

Ene as
Mem
ne

san 2 pu
EEE

tpn
o

6 Gen 30 Segoe 28009
A Fig. 1.8: CV file contents using Noteped

wit ca

1fyou open the file in a Notepad editor, you will observe th

comma () delimiter and each new line indios anew rca i ID) EU

record (Fig. 1.9).
4.22.2 Reading from a CSV File to Dataframe

once ou know the pat of our le. The eat snc ad he da ea Pada al
We know that multiple rames have multiple data types I a ,

or numbers, some are float, strings or dat te teas al these data types as
srr a eet os re is ts pe aly wen leading the
to an integer or float. ÓN

syntax for read,csv() method is:

rt pandas as pd

<sb>=pd. read

practical Implementation-47

“o create and open "Employee.csv” file using Pandas.

‘The fist line imports the Pandas module. The read.csv method loads the data in a Pandas
dataframe “AF pd.read.esv(“path") shall fetch the data from csv file and display all records at
the command prompt as shown in the output window.

san: coun pret /apsta/iecal/Peopaes
CE coat

FRS men 20
lue ae

peal

One thing to be remembered Is thatthe missing values fom the ES Ale shall be treated as
[NaN (Not a Number) in Pandas dataframe.

Practical Implementation-48
To display the shape (number of rows and colum

Ve can se the total number of rows (records) a
‘he help of shape command.

ns) of the CSV file

columns (els) presenti the able wit

pranııcu

Bangalore

hal

In the above case, we have directly displayed the row count and column count at Python
prompt by giving the command as dfshape. We can also display it using variables.

The read.esv() method automatically takes the first row of the csv file and assigns it as the
ataframe header. After the creation of dataframe from a CSV file, you can perform all the
dataframes operations on it.

> Reading CSV file with specific/selected columns
‘While working with large tables in CSV format, there can be several columns contained in
it But you may require selective columns to be read into a dataframe. This can be done
by using “usecols” atrbute or option along with read_esv() method. For example. inthe
Case of Employee” table, you have to access Name, Age and Salary of employees This can
be done by giving the command as:
>>> df = pd. read ¢

"E:\\Data\\Employee.csv",
usecols = ("Name

"Age! 'Salary"))
>> dt
Practical Implementation-49

To display name, age and salary from Employeecesy

pranmicu wit LA

Inthe above code, We have ven she value o row tribe sed wth reads Nrows
means number of rows. Inthe above example, represents the rs ve rears even emp
Fear containing NaN vales chung headers. encon ali patata oia,

I ES een een

oi PS mnt sl

Reading CSV file without header

Ifyou do not want to display the first row as the header for dataframe using Employee table,
then this can be done by specifying None argument for ‘header option or ‘skiprows’
‘option using read.csv() method.

51

Practical Implementati
‘To display records without header.

por pandas 26 pt .
EEE mere er ma

aresul, the resultant

‘None is the argument passed to ‘header atribute 35
Information as the fist row, shown in the output below:

Inthe above example,
dataframe will contain the header

MIA Data Handling using Pandas Im

Reading CSV file without Index
Moe canes ead and Tne the records In the dtaframe wine del fp
Cae bern, wich are opel by desa, lo EEE te a
Index col = 0 using the read.csv() method. à

pranııcu

Practical Implementation-52

‘To display records without index numbers.

As shown in the above output window, there are no index numbers displayed along wit

records. Now Empid willbe treated as the first column instead of indexes.

+ Reading CSV file with new column names
‘You can read and load data from CSV file into a dataframe with new column names, Le yo:
can rename the columns while reading the ‚sv file. Ifthe header exists, you have to skip
it using skiprows option along with names option for renaming the columns.

Practical Implementation-53
To display Employee file with new column names.

eS o mn

In he above code, we have given the option skiprows = 1 which wll omit the default a
names from the CSV file. Names option holds the new column names to be displayed whi
loading the CSV file into the dataframe. Hence, the following output shall be displayed:

aesey
LES

DS Met 20 RM O |
elas |

30:0 Mangalore 200.0

4.23 UPDATING/MODIFYING CONTENTS IN A CSV FILE

Le above section, we learnt how to change colu
a ng alum ane Snr we can modi or update
les place the salary of the employee whose salary is 16000

‘eth NaN value. This can be done by using na values option
We respective salary as 16000 values option along with read. esv method for

practical Implementation-54

‘fo modify the salary of employee earning 16000 with NaN value.
a ea

pus pe
re E

ssary
Hamad, 19000.0,
EN]
fyderabad 20000.0
musbai 25000.0
Deine man
Bangalore 28000.0

As shoum in the above output window, the salary of employee Aakash has been modified from
16000 to NaN

artos

1:24 WRITING A CSV FILE WITH DEFAULT INDEX
csv) method is used. We can do this either by
rf y the contents of the original

Data Handling using

Tocreate a CSV file from a dataframe, the to
‘ansferrng the records directly to the CSV file or by copy
SV fl to another file.

Copying Employee.csv to Empnew.cs¥-

We can copy/erite the data, Le reste a duplicate copy of Employees as Empretes

To create a new CSV file by copying the contents of Employec.csu

tical Implementation-55

Me shal be rested con
on xen te en command, Epes (up) A,
era ul ds les Ve can Bros fla

the contents of the newly-created fle Empnewesv as shown below:

As shown in the above screenshot, the contents of the duplicate file Empnew are the same |
that of Employeecsv. Creating a duplicate file allows the user to change the data or add new
ata to the already existing Employee.csv fil without affecting the original data. In such a case
Pandas provides us with the option of creating a duplicate fil with the previous data using te
above mentioned command,

+ Saving Dataframe as CSV file
This is the direct approach of creating a CSV file by first creating a dataframe and then
loading it Into a CSV fil format

Practical Implementation-56

To create a student CSV file from dataframe.

BE

Pre

|
[po de «po cataane (student, columns rotin

prarmisu wıuı Ca

Our next step isto read the Student.csv Into a dataframe using read_csv() method.

o> at paar emmener ee

Inthe above output window, Unnamed: O column gets displayed automatically along withthe index
values To avoid this column, use Ihe attribute index.col =O with read,csv() method as shown:

Data Handling using Pants}

1.25 COPYING FIELDS INTO A NEW FILE
In reference to section 1.24, we have created Empl
‘ows. In certain situations it is required to create a

Joyee.sv Me with complete set of columns and
duplicate file containing only the selected fields,

pranııcu

Practical Implementation-57

‘To create a duplicate file for Employee.csv containing Empid and Employee name,

pos oop =

Learning Tip ensure

names ar the same in bth the fies (ase sensitive),

> Aedes type objets a Pandas data
‘rar oft (fay NuPy-supported dat type) and an ant
sees pe objec,

ects are value-mutable but size-immu :

fonction red to

> Meter compañontanton tua pa

> Oma nn nen Le, Tora ap
neo data, say represented in tabula format

de and om index

values wth min values.

2 tala has two diferent innen con

he em Bolt fora standard database join operation Beat

tanııcu will

7, me acronym CVs shor for Comma Separated a
pe ae Values snareterstotabulr data savedas intent wet

me <> 10e) funtion sane Om 3 ES fie in our name

: the data of diraframe ona cs

tu

me ancien read Pan is

> and tes

(OBJECTIVE TYPE QUESTIONS —

y. itin he blanks.
(a ls the most popular open-source Python ibray use
(0) inorder towork with Pandas in Pthon, you menden,

(e) Theos

CE
indered data.

lor doing ta analysis,
vary in your Python envionment
lor data structures of Pond
nés are and
is a Pandas data structure that rer
Presents a one-dimensional arayıike obje of

and universally popul

le) Tocreate a series object, method used,

(0 To create an empty series object, Series} method used with

(a) Missing data in Pandas series and dtaframes canbe ied wth à valve
(0) Oatarame has

() Selecting à subset trom a dtaframe requires. function

ü function is used to read data from 3 CSV le in your trame.
w function save he data of dtaframe on a CSV Sle,

State whether the following statements are True or Fale.
(0) A series object is 20 aray that stores ordered collection columns that can store data of diferent
tuple.
(b) A ataltame is à 10 arrapike object containing an array of data and an associated array of data
labels
(e) To acess subset ofa dataframe, we can use lc) method
(4) The teres) iterate over vertical subset in the form o col indexes)
(6) The value NaN/NAT/None are the same in Panda.
(0 The terhems) brings horizontal subsets from a dtarame.
(8) Theat and any) functions are used to check allor any tem is non-zero, natemptyornat ate
AH) CV eters to tabular data saved as pain ext where dat ales ae separated by comms,
i read, su) method automaticly takes the ast row ofthe CS fle
(0 Data and index in an n-array must be ol the same lent
(8) We need to define an index in Pandos.

ie chic Questions (MCCS)
(a Wich f the fotowing com
1) pp ns pron pans

{3 python asta python
(9 Aro dimensional bei aray that
Tee (ii) Datatrame fiv) Panel

rand is ed to instal Pandas?
(i) pp instal pandas
(1) python instal pandas
isan ordered callection of columns t streheerapeneous

(Sees, do aya
(9 in sata, so str son
roe I ne tee

Li} Rows and Columas both

Data Handling using Pandas-1

E

SOLVED QUESTIONS

Ans

Ans.

An

Ans

Which ofthe folowing statements false?
(i) atatameisize-mutble
(i) Datarame is values mutable

(a

(i) Oataframe simmutabe.

(ro), Datarame is capable o holding multiple types of data

bi or Im Ndim (ii) Empty div) Shape
a or m dim (ii) Empty (iv) Shape
° or (i) Ndim (ii) Empty (iv) Shape

y

SV stands or:
(0) Comma Separated Values
(i) Colum Separated Values

(in. Comma Separated Variables
(iv) Column Separated Variables

What sa Pandas Series?

Series. one-dimensionallbelled array capable of holding data of any type integer, string, oat, yt

objets, etc)

In Pandas, Sis series withthe following est:

S=pd.Series({5, 10, 15, 20, 251)

The series objets automatically Indexed as O, 1,2, 3 4. Write a statement to asign the seres 53,4

Suindex = (at

Wiha is datarame?

Ostatrame isa tue dimensional array with heterogeneous data, usual represented in tabular format. The

(at Is represented in rows and columns. Each column represents an tribute and each row representa

Dictionary S_maks contains the following data:

S_narks = {‘nane':("Rashng
"Grade": ['ALS,

How can we check a datafame dí has any missing values?
»
A dictionary Grade contains the following data:

At.isnol1 ().va2ues.an

an = [hanes [Rania >

"Grade :(°A1", AZ, "BL, MAL", «824, ©

Wire statements for the following
(a) Geste datatrame called Gr
(0) Find the output of Grillo) and 645]
(©) Add a column calles Percentage with the
(6) Rearange the columns as Name, Percent
(6) Drop the column ie, Grade) by name.
(0 Delete heard and 5th rows,

(e) Mat wil the folowing codes do?

Gros» 0)

(i) Gr dropl(.1.2 Luis» 0)

following data: (82, 89, None, 95, 68, None, 93)
age and Grade,

(i) Grdroploaxs = index")

veamicu wiul Ca

à. o) GE = Pd-PataPeame (Grado)
M a output fo both he commands the same

ws
@
(a
ws
Fist rove wl be dropped,
(© qu Fist four ows wl be dropped

(i) Fit cow wit be drops,

riesthat stores the ares of 0 ‘
2. Gvenase me statesin km ite code tind out thelagestad the smallest
tive ares from the gen series, The geen seis hasbeen crested the thea 5

print (Se :
An trate code forthe above problem woud be

sop 3 largest areas

sort

nt (Serl.sort_valuı

Outpt:

Top 3 largest areas are
266711

37632

a

type: intét
3 rallest areas are:
2348
2 450
1 00
ea et, oth having sar vales,
a series object name
1. Guen two objects, a st obec namely it and ases 5
de.2.4.6,8 Find ut te output produced by the folowing
(D prime (ise2*2) oe
A. Te output produced by the given statement
6, 8, 2, 4, 6
Te output produced by the given st
2 2
where 0,1.

tement (i wil Be

are the indexes

9. What will be the output ofthe following code:

Ans

10. write output of the flowing code

(0 seport ands

91, (*Rink 121
(data, coluns

Cae

10.0

An.

prarmisu with Ca

su rm the series even et
A more than 50000 km’

aos

12, conser two objects» andy. xis as whereas ita series. Both have val
be the output ofthe folowing two statements consideras thatthe a
ready? thatthe stow

(pit (02) (prima)
sy our answer
ns. (1 Tepe£ror: an only concatenate it (not nt’) ot

ue 10,2, 30,100. what wi
objects have been created

won
12
22
3 102
ype: nts

Inthe fist case, adding integer value to à ist not permitted; ou can ad it to anther st but
ding an integer value t ist is not permitted Tis i because at dors ot allow brondeasing
Operation, fe, performing athmetic operation to each elements not permite.

fut inthe second case, series can very el implement the Broadcasting operation. Thus, adding an
Integer value to Panda series is permitted and pete fine an, hence, the output eso brain.

13. A dictionary Grade contains the following

vive"

Graden{ ame! "Rashmi", *Harsh", "Ganesh',

readers ("AL's A2 ‘BEY
Write statements for he following:
(0 Create a datatrame named "Or
(i) Adda column called “marks withthe following data
(9792.35, 89,96 82),
(Delete 3rd and Sth cows
Ans. (] Gr pd Dataframe(Grade)
(i) GerMarks"}-97.92.95,89 9682)
(i) Grarop(241)
14. rite a program to find the Total salary of al employee
segregate fonction
As. import pandas as pd
"impo": (1,2, 3), “En
spa, DataFrame (3)
peineiaey
santo
for 3 in range (ten (38?)
sunlesunl +df-10c(ir'S
Print (sunt)

sin the dtaframe employe without wine any

15. Assume adataframe df that contains data about climatic conditions of various cies with
and C5 as indexes and give the output of any four questions from () o ()
city Manemp
Dei 40
Bengalury 31
Chennai 35
Mumbai 29
Kolkata 39

MinTemp
32
25
2
a
2

Rainfall
a

a
a
a
6
() >o>a01.chap

64
CETTE

Tri]
[E renga 151

ti paar

ch bet
C2 bangaru
@ chemai
Im >>
sw
artenp
Mintemp
Rnb
0
och
Bears
Mumbai
Kola

362
408
352
as

Mintemp [Rainfall]

5 1362]

An.

Chennai
35

2

408

Ans.

Weite 2 Python program to sort the fll

un lowing data in ascending order of age,

age

Designation

Seema

36

Manager

Nisha

En

EJ

Raji

prarmisu wıuı Ca

joe datatrame object spect indexes
sg towed me oi

i ae data rows?

row index(ans-0} ang

msi ci
e weteacode statement vole

ellin he row and “tem
ae 4 em” column froma dt

a column inde ans,

eve, the st 5 ws ae being displayed, den
pre ee he error and rewrite the correct code so that only
HET (CBSE Sample Paper 20201
ms

the functions you can use to iterate over taf,
name th aa

rows) and iertem } rames,
an, nem}

1s the base ifference between terra
wats he Hand items
Lotes etes over vertical bets in he forma

sents over horizontal subst

(obinder, seis) pars whe <OF>terows

Sin the form of (row inde,

series poe
a cone the tale even s under
ine. [nome [Gender [besan — Jay Ju
a [mia CC ET |
EC Im — [programar — [mentar eses
03 | Rahul TN [sucesso
[eos [nist [Tans wcinow—[so173656
is] TE Programmer ET ET

te a program to pin oy the Name, Gender and Designation fr al rows

ass import pandas as pd
exp ((°E01", *Ritika", °F",

a

£03", "Rahul", "M, "TT Officer", "Ahmedabad", "786655778:
2804", MRE jul", Mt, "Analyst", fi
LES", "Anita", 'F*, 'Progranner*, "Bangalore", "9818765544" ]
Jder' "Designation

anager’ , "Delhi", "9867585958"),

"Programmer" ‚ tumba,

1:02", "Get.

scknou", "901736

df = pa.DataFrame (emp, colunns=

print (ae)
for i, row in df. iterrows()
(row Name”), "ME,
row[*Gender"), NE"
row["Designation"))

°c (eig Ina PL
= ie
Mer Tim cea [Baia [es
KL Rahul Kings Eleven E —T00 |

Andre Russel _ |Koikaa Knight Riders _[Batsman_|7-__
sort Bumrah | Mumbai Indians

[ECN
be |

Veatkoni [RC 1 m.
(Pot Sharma _ | Mumbai Indians. [gatsman 1
oie sharma _[mumbarindins |

(i) Waite a command o sort all players according to BdPrice

25. te à program to iterate over a dataframe containing names and marks, which then Colulates grades
25 per mark (05 per guidelines below) and adds them to the grade column
Maris>=90 grade Re; Marks S060 grade C;
Marks 7090 grade Marks 4050 grade D;
Marks 60-70 erde; Marks grade;

"grade ap
TE my o

prarmisu wıuı Ca

26. ete commanding sen neon oad
from the lst Sl-11000015000 20000] nya ec umn nthe place (3 place named Salary

go erstngdatatame named EMP steady having clues

wine a Python code to create 5

m. wie sma Pion code to cet atome wt
103,21134).5,6,07,811 en

Ans. srport pandas a

adings (a nd) from te ist given below

2. whois the main author ol Pandas?
30, What are the advantages of CSV file formats?
Ans. hövantges
3. Asimple, compact and ubiauitous format for data sorte.
2. Acommon format for data interchange.
3. ltca be opened in popular spreadsheet packages Ike MS Excl Ca ete
. Nearly all spreadsheets and databases support importespor to CV format,
1. By default, ead esi] uses the values ofthe fist row as column headers in ataframes. Which argument
‘ntl you give to ensure that the top/itst row data used as data and not as column headers?
Ans. header = none
Fer example:
raf = pG.read_cev("iydata.cov", heades=none)
32. Espai rey the CSV format of storing es
Ans The acronym CSV stands for Comma Separated Values anders
data ales are separated by commas. In CSV format
+ Each row of the table is stored in one row, ie

ls a cite a e)
es ether with commas after every field value. However, after the

tabular data saved plainte where

the number of rows in a CS lis equal tthe

number ol row
+ Thefilávaluesofaroware stored og

las ils value, no comma Is given just the end of Ine
re where the separator characteris’ Res ah et rows

a nt elec Make sure to read the first row
suf we re Guy, re Me

Inyour dtaframe. Give column hea
2 data and not a column headers.

sport pandas as pd

ata Handling using Pandas-1

pd. read_cev (dora.

34. You want to read data from a CSV file i a dataframe but you want to provide your own column nan
to the dataframe. What addtional argument would you specify in read_esvl 1?

Ans. Names, for example:

35. Which argument would you give to cead_ csv if you only want to read the top 5 rows of data?
As. nrows, for example:

rt pandas as pd
af = pd.read_csv("Enployee.ceu™, now
print (af)

36. Write a commando store data of dataframe maf into a CSV le Mydata.csv, with separate character as"
Ans. nd. to eo

ydata.cov", sep="t")

pandas as pd

d = ("Fridge":[12), 'Gooker':(5), "Ju
at = pd.Datarrame (a)
print (dé)

Sf. to_eev("£ile.csu")

38. Wa? 0 en dt om a CV fe where separator characters”.
as data, not as column headers. ”

Make sure thatthe top row is used

pandas as pe

Sf = pa.read_csv("read.cev!,
print(afy

39. Write a Pandas program to

et fist n records of a datafram
weer dataframe with three columns as col, col2 and col.

said vatarrane

MLamicu wilt LA

an. Mirta Panas program to get last
o 3 recorás of à datateame,

sample Output

UNSOLVED QUESTIONS
1. What the significance of Pandas Libray?
2. Mame some common data structures of Python's Pandas Library

2. Write shapes of the following ndarays aa
2468 m2 SEA
6
horizontal

4. tame the function t trate over a datalame

5. Name the function t trate over a ataframe vertical

6 wnat 5 csv fe?
2. Wats the use of rows?
U How can we create CSV fe? Explain wih an example
9. Geste à CSV ie with defaut inde.

y code sippet
1 ow do you erat over a dataframe? Ex

in withthe el

11. wre commands to print the folowing deals a series object sal
{b) Indexes of the series

0 {0} If the series stores any NaN values

(e) The data type ofunderhing data
12. How do you fi al missing vals with previous non missing values?

13, consider he following tables tem and Customer and answer the questions that follow:

‘able: em .
em homme. Manufacturer Price
ren Personal Compute Hct india 22000
cos. Laptop HPUSA 55000
Fe Personal Computer DellUSA 32000
cos Personal Computer Zenith USA 37000
cos Laptop Del USR 57000
Table: Customer

Rem D Customertome y
tos Nay Dei
cos High Mumbai
cos Pandey Dei
7 Csnarma Chen
reo Kaganval Bengalora

‘Assume thatthe Panda has Ben imported a pd
Create a atarame called for table tem,

(BI Create datafome called dC fr table Customer

(6) Perform the default jon operation on tem 1D using two dataframes: and lc

(E) Perform te left jin operation on tem.D sing two dataframes: dl and df.

(e) Perform the right join operation on Item_1D using two datatrames: fl and dC

() Prtorm tne defaut operation on tem D using two dataramesllan df withthe ft indexa te

(6) Prorm the outer jin operation on item using two ataframes: fl and af

AH) Create à ne dtarame oft using datarames: ef and af. The new datarame dat will ol both
Tet index and ght index tre vales

(0 ferange the astatame ot in descending order of Pie.

1 Arange the Dtaframe fin descending order of City and Price

14. Consider he folowing series objet name:
0.430271

‘Wat wil be retuned by the following statements?
(a) 52100 5
(9) 5>0
(0 51 = pa.Seriesis) (a) 839pa. series st) +3
the folowing code?

15. What will Bethe output produce by

ationery = [*pencils", ‘motebooks", "ocalas!
PA.Series(120, 33, 52, 101, index ne
Pa.Sertes(117, 13, 31, 32

Stationery)

la) prine(s(1:11) “
Per

1 the following code, consider,
(0 print (st

fe) princes

1 he Series objets given in 0157
an 1 pein

prarmisu wıltı Ca

J wate a program to eat and pri a étre one oe pira
1 rote erin the following code frapmen,

taime and ri nly fst fv ros

ws
(8 52 - pas
m. fdte error

Fata)
22 Cone te lang a1 le containing the data agen bla
[ram [none Accounts [Mathe [en Te les
Tea CO CC 2
{mnigu mehta [67 ree
87. EE MI je io |
CC EC CP ET
1a [Deepak Virani [567 26s [ee — [78 je

15 [sin atk 176 6 [lars les]
(a) Read the ce file nto dataframe df whieh is stored with tb (AE) separater.
(0) Wrtethecodetofindthe total marks Total marks) fr each student and adeitothe newly-created
dsattame,
Ic Abo calulate the percentage obtained by each student under à new column “Average” in the
¿salame
13. Wie program that reads students marks from à result CSV and displays percentage ofeach student,

A Wie the name of function o store data from a dataframe into a CS le.

15. How can we impor specific columns from a CSV fie?

2%. What are the advantages of CSV file formats?

hat a bars do you require in order to bring data from a CSV fl ino a datafame?

28 Youwantto ead data from a CSV file inadataframe but you want to provid our own com names 19
te tare, What addtional argument would you pectin reac a

28 by eat nd cs ses th vale o fist row a column headers in dtaames Wich rumen wi
dre I ensure thatthe top/TS Fons ata is used as data nd not mn hes

A Which argument would you give to read.cs you only want to read top 10 rows of data

PL Gen te fotowing daaframe by the name Project regarding a campetion and answer the GENE

trenton
| rlment Na. [Name des Ten [roc al
NE au (8 Dana Analysis —

ee ne [one Anais
Mn Dima Ti] machine earning
48 Geet El evelopment.
Ty ret rm App Develop

ee

‚wer the questions given below
32. Consider the following dtaframe: CORONA and answer the al

[10 ste Jens ]
100 ven Er
110 [Mumbai 4000,

[izo [chennai 5000

JETA 3500

(Greate the above given deionary with the given indexes a
{5} Waite codeto add anew column “Recovery using the seres method to store the numb
recover in every tte
(©) Tosd new column “Deaths ing the asign) method to store the numberof deathsin every ste
(6) Toadé anew row to store detal of another state using loc (assume values),
(@) Toaddanencolumn “Percentage sing the insert method to store the percentage of recovena
very state [assume values), The column should be added asthe fourth column in the data
(6) Te et the column Percentage" using del command
( To delete the column “Death using pop) method
(e) Toinsert anew row of values using ioc] atthe At position,
(1) Yo delete Coes and State temporari from the dataframe
33. Creates datatome rom two series
(a) ila the fist tree records fe
(8) Display the laste records ror
Sana eter con of ame St, Sub, Subd Sub, Subs of fe students
(a) Display the cataframe
(8) Delay he fist 5 rows and bo
Cm traes o sly off employees and do the aan
(2) Display both he dattes :
ib Md 500028 bos ia both datarames and display them
36. Greate a dtarame sng is 110,13, 2,3 19
Die men 1 1% 33-24 230453251 15,6065, 70.25 and 6 the along
(0) Ad thet, 2 34 37
7. Crea daaeame of 2,25),
(2) Display the datatrame à
(0) Replace the misin sn
(o) Replace th
(Bene he mina vl
ele the missing val
2 Gest ttrame of

Name and Grade, Name.
om student dataframe,
Im student dataframe.

and Marks of five students,

tom 3 rows of student dataframe.

o datatrame and display it
Al 134445. 46 ond othe flog:

ic that the missing value rom

tee d presented by Nan,
with 2, -2, à
1 by cop

1 and 02;
BEE Re à 100,100, 100$, 1008 1009) Name’
fey {elas 1002, 1003, 1004, 1

(tente rome, 25,106, Name
(2) concatenate come

38 Create atar

tor

columns 0, 4,
ing the value fi =

‘Om the above ce,

Sari’ Abhay 1

hu Gavin
USeema if u

“191 'Shweta’, Sonia’, Nishant)

MEA AD display whether
tis empty or not.

prarmisu wıuı Ca

Caras and CSV es.

putomatile data set

Gene a ésatame of 5,618. 03, 0) 610,
Gene date of A ru, Tue. una
À ati te se ol sttement: pie?
de, oferente Between del pop and drop

thes Automobile data set has diferen character
ng, price, mileage, horsepower ete

‘cas€-BASED/SOURCE-BASED INTEGRATED quesrior
ns

and pay th
heres of a) and any une

all Fale, Fle) and
Fle and pay he a) and

sof an automobile seh best, wheel bse,

From the given data set, print the i
Ping ist five rows
pandas as pd

z

Pring lat five rows

port pandas as pd

oC» pa.read
16)

Companys name.

.rseponer = ("Company

2 create two dataframes using the following
the second dataframe as a new column 10

"horsepower"

dt = pd. read_csv ("D:/proeti/autonobiledat

Ipreetä/autonebi:

Car_Prico = (*Company": ['Toyota',
ice": (23845, 17995
[rtoyeta'y
ma, 80, 282+

"Honda
135925,
onda!

two Dictionaries. Merge the
"he fst dataframe on the Bass of

two dataframes and append
the manufacturing

audi

A and ls fe records of automobiles of diferent makes

set /autenobile data.cn

caset /autonobile_data.c

stat, "Andi "Ty

Expected Output:
Company Price horsepower

0 Toyota 23845 les!
1 Honda 17995 cy
2 paw 135025 182
3 Ag rum 160

Python Pandas merges two dataframes and appends the new dataframe as a new column,

Ans. import pandas as p
company’

d.Data

carPricept

car_Horsepower = {'Company': [
"horsep
lorsepowerD£ = pd.D;

tal wer)

CarsD£ = pd.merge (carr:
(carsD£)

Company")

eu wi
Tags