dataframe_operations and various functions

JayanthiM19 4 views 13 slides Mar 06, 2025

About This Presentation

Data Frames

Slide Content

Applying Arithmetic Operations

Addition, subtraction, multiplication, and division can be applied directly to DataFrame columns.

    import pandas as pd

    d = {'py_score': pd.Series([88, 79, 81], index=['a', 'b', 'c']),
         'sql_score': pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd']),
         'ca_score': pd.Series([71, 95, 88], index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    print("Dataframe is:")
    print(df)
    print("sum of python and sql score")
    print(df['py_score'] + df['sql_score'])
    # weighted total of the three scores
    df['total'] = 0.4 * df['py_score'] + 0.3 * df['sql_score'] + 0.3 * df['ca_score']
    print(df)
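Because sql_score has a row 'd' that the other columns lack, the arithmetic above is aligned on the index and row 'd' ends up as NaN. The sketch below illustrates this with the same scores; using Series.add with fill_value=0 is one possible way to treat a missing score as zero, not something the slides prescribe.

```python
import pandas as pd

# Same scores as in the slide; only sql_score has a value for row 'd'.
df = pd.DataFrame({
    'py_score': pd.Series([88, 79, 81], index=['a', 'b', 'c']),
    'sql_score': pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd']),
    'ca_score': pd.Series([71, 95, 88], index=['a', 'b', 'c']),
})

# The + operator aligns on the index; a missing operand gives NaN for row 'd'.
plain_sum = df['py_score'] + df['sql_score']

# Series.add with fill_value=0 treats the missing operand as 0 instead.
filled_sum = df['py_score'].add(df['sql_score'], fill_value=0)

print(plain_sum)
print(filled_sum)
```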

Sorting a Pandas DataFrame

A DataFrame can be sorted with .sort_values(). The by parameter sets the label of the row or column to sort by, and ascending specifies whether to sort in ascending (True) or descending (False) order:

    df.sort_values(by='py_score', ascending=False)

To sort by multiple columns, pass lists as arguments for by and ascending:

    df.sort_values(by=['total', 'py_score'], ascending=[False, False])

In this case, the DataFrame is sorted by the column total, but if two values are the same, their order is determined by the values from the column py_score.
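A small sketch of the tie-breaking behaviour may help. The values below are hypothetical and chosen so that two rows share the same total; note that sort_values returns a new DataFrame and leaves the original in its existing order.

```python
import pandas as pd

# Hypothetical scores: rows 'b' and 'c' tie on total.
df = pd.DataFrame(
    {'py_score': [88, 79, 81, 90], 'total': [80.0, 85.5, 85.5, 70.0]},
    index=['a', 'b', 'c', 'd'],
)

# Sort by total (descending); ties are broken by py_score (descending).
by_total = df.sort_values(by=['total', 'py_score'], ascending=[False, False])

print(by_total)
print(df)  # the original DataFrame keeps its order
```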

Filtering Data

    filter_score = df['sql_score'] >= 80
    filter_score

The output is a Series, filter_score, filled with Boolean data. The expression df[filter_score] returns a Pandas DataFrame with the rows from df that correspond to True in filter_score.
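As a minimal, self-contained illustration of the mask-then-select pattern, the sketch below uses a reduced version of the score table (three rows, two columns, chosen here for brevity).

```python
import pandas as pd

# Reduced score table, assumed for illustration.
df = pd.DataFrame(
    {'py_score': [88, 79, 81], 'sql_score': [86, 81, 78]},
    index=['a', 'b', 'c'],
)

# Boolean Series: True where the SQL score is at least 80.
filter_score = df['sql_score'] >= 80

# Keep only the rows where the mask is True.
passed = df[filter_score]

print(filter_score)
print(passed)
```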

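Combining logical operations

Conditions can be combined with & (and), | (or), and ~ (not); each comparison must be wrapped in parentheses.

    df[(df['py_score'] >= 80) & (df['sql_score'] >= 80)]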
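The same pattern extends to "or" and "not". The sketch below uses hypothetical scores (row 'd' is invented so that one row fails both conditions) and shows all three operators side by side.

```python
import pandas as pd

# Hypothetical scores; row 'd' is below 80 in both subjects.
df = pd.DataFrame(
    {'py_score': [88, 79, 81, 70], 'sql_score': [86, 81, 78, 75]},
    index=['a', 'b', 'c', 'd'],
)

py_ok = df['py_score'] >= 80
sql_ok = df['sql_score'] >= 80

both = df[py_ok & sql_ok]        # at least 80 in both subjects
either = df[py_ok | sql_ok]      # at least 80 in one or both subjects
neither = df[~(py_ok | sql_ok)]  # below 80 in both subjects

print(both)
print(either)
print(neither)
```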

Handling Missing Data

Pandas usually represents missing data with NaN (not a number) values. Missing data can occur when no information is provided for one or more items or for a whole unit. Missing values can be checked with isnull() and notnull().
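A short sketch of the two checks on a single column, using hypothetical values: isnull() marks the missing entries, and notnull() is its element-wise opposite, which makes it convenient as a filter.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the entry for 'b' is missing.
scores = pd.Series([100, np.nan, 95], index=['a', 'b', 'c'])

missing = scores.isnull()    # True where the value is NaN
present = scores.notnull()   # the element-wise opposite

print(missing)
print(scores[present])       # keep only the entries that have a value
```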

Filling missing values using fillna()

    import pandas as pd
    import numpy as np

    # dictionary of lists (named scores to avoid shadowing the built-in dict)
    scores = {'First Score': [100, 90, np.nan, 95],
              'Second Score': [30, 45, 56, np.nan],
              'Third Score': [np.nan, 40, 80, 98]}
    # creating a dataframe from dictionary
    df = pd.DataFrame(scores)
    print(df)
    # filling missing values using fillna(); the slide leaves the fill value out,
    # so 0 is used here only as an example
    df.fillna(0)

Rows with at least one NaN value can be dropped with dropna().
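To compare the options, the sketch below applies three strategies to the same table: a constant fill, a per-column mean fill, and dropping incomplete rows. The choice of 0 and of the column mean are assumptions made for illustration; the slides do not say which fill value was intended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98],
})

zero_filled = df.fillna(0)          # replace every NaN with a constant
mean_filled = df.fillna(df.mean())  # replace NaN with that column's mean
dropped = df.dropna()               # drop rows with at least one NaN

print(zero_filled)
print(mean_filled)
print(dropped)
```

None of these calls modifies df; each returns a new DataFrame.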

Check for NaN in Pandas DataFrame
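The slide content for this step did not survive, so the following is only a sketch of common ways to check a whole DataFrame for NaN, using the score table from the fillna() example: a single yes/no answer, a count per column, and a check on one column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98],
})

has_nan = df.isnull().values.any()      # is there any NaN at all?
nan_per_column = df.isnull().sum()      # number of NaN values in each column
total_nan = int(nan_per_column.sum())   # number of NaN values overall
first_has_nan = df['First Score'].isnull().any()  # check a single column

print(has_nan, total_nan, first_has_nan)
print(nan_per_column)
```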

Import a CSV file into Google Colab session storage

Load Files into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame. A common format is CSV (comma-separated values) files.

    print(df.to_string())

to_string() prints the entire DataFrame. By default, printing a large DataFrame shows only the first 5 rows and the last 5 rows. The head() method returns the headers and a specified number of rows, starting from the top.
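The difference between the truncated and the full printout can be seen on a DataFrame that is large enough to be shortened. The 100-row frame below is hypothetical, and the exact truncation is governed by pandas display options such as display.max_rows, so the sketch only compares line counts.

```python
import pandas as pd

# Hypothetical 100-row frame, large enough to be truncated when printed.
df = pd.DataFrame({'n': range(100)})

short_text = str(df)         # what print(df) shows: shortened in the middle
full_text = df.to_string()   # header line plus every row

print(len(short_text.splitlines()), len(full_text.splitlines()))
print(df.head(3))            # headers and the first 3 rows
```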

Data Processing with Pandas DataFrame

    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df.head(3))        # first 3 rows
    print(df.tail(6))        # last 6 rows
    print(df['Age'].head())  # refer to the column Age
    # another method
    df.Age.head()
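Since data.csv is not included with the slides, the sketch below runs the same calls against an in-memory stand-in. The Name and Age columns and all values are assumptions made for illustration; io.StringIO simply lets read_csv read from a string instead of a file.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv; column names and values are invented.
csv_text = """Name,Age
Asha,23
Ben,31
Chen,27
Dina,45
Eli,38
Farah,29
Gus,52
"""
df = pd.read_csv(io.StringIO(csv_text))

first_three = df.head(3)   # first 3 rows
last_six = df.tail(6)      # last 6 rows
ages = df['Age'].head()    # head() with no argument returns the first 5 rows

print(first_three)
print(last_six)
print(ages)
```

Attribute access (df.Age) returns the same column as df['Age'], but it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute.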

A common goal of data analysis is to visualize data. To do this, we'll need matplotlib, a popular data visualization library. To install it, execute the command pip install matplotlib
