dataframe_operations and various functions

JayanthiM19 4 views 13 slides Mar 06, 2025

About This Presentation

Data Frames


Slide Content

Applying Arithmetic Operations
Addition, subtraction, multiplication, and division
    import pandas as pd
    d = {'py_score': pd.Series([88, 79, 81], index=['a', 'b', 'c']),
         'sql_score': pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd']),
         'ca_score': pd.Series([71, 95, 88], index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    print("Dataframe is:")
    print(df)
    print("sum of python and sql score")
    print(df['py_score'] + df['sql_score'])
    # weighted total of the three scores
    df['total'] = 0.4 * df['py_score'] + 0.3 * df['sql_score'] + 0.3 * df['ca_score']
    print(df)
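In the slide's data, sql_score has a row d that py_score lacks, so adding the two columns yields NaN for that row. The sketch below illustrates this index alignment. Treating the missing score as zero with Series.add(fill_value=0) is an illustrative choice, not something the slides prescribe.

```python
import pandas as pd

# Same score data as the slide; py_score has no entry for row 'd'.
py_score = pd.Series([88, 79, 81], index=['a', 'b', 'c'])
sql_score = pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd'])

# The + operator aligns on index labels; a label missing on one side gives NaN.
plain_sum = py_score + sql_score

# Assumption for illustration: count a missing score as 0 instead of NaN.
filled_sum = py_score.add(sql_score, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Whether a missing score should count as zero depends on the data; leaving it as NaN keeps the gap visible.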

Sorting a Pandas DataFrame
A DataFrame can be sorted with .sort_values():
    df.sort_values(by='py_score', ascending=False)
The by parameter sets the label of the row or column to sort by.
The ascending parameter specifies whether you want to sort in ascending (True) or descending (False) order.
To sort by multiple columns, pass lists as the arguments for by and ascending:
    df.sort_values(by=['total', 'py_score'], ascending=[False, False])
In this case, the DataFrame is sorted by the column total, but if two values are the same, then their order is determined by the values from the column py_score.
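The tie-breaking behaviour of a multi-column sort is easier to see with data that contains ties. The sketch below uses hypothetical values (not the slide's data) chosen so that two pairs of rows share the same total.

```python
import pandas as pd

# Hypothetical scores: rows a/c tie on total, and rows b/d tie on total.
df = pd.DataFrame(
    {'py_score': [88, 79, 81, 90],
     'total': [85.0, 80.0, 85.0, 80.0]},
    index=['a', 'b', 'c', 'd'],
)

# Sort by total (descending); ties are broken by py_score (descending).
ranked = df.sort_values(by=['total', 'py_score'], ascending=[False, False])

print(ranked)
print(df)  # sort_values returns a new DataFrame; df keeps its original order
```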

Filtering Data
    filter_score = df['sql_score'] >= 80
    filter_score
The output is a Series, filter_score, filled with Boolean data.
The expression df[filter_score] returns a Pandas DataFrame with the rows from df that correspond to True in filter_score.
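A minimal sketch of the two steps above, building the Boolean Series and then using it to select rows, with the sql_score values from the earlier slide:

```python
import pandas as pd

df = pd.DataFrame({'sql_score': [86, 81, 78, 88]}, index=['a', 'b', 'c', 'd'])

# Step 1: the comparison produces a Boolean Series with the same index as df.
filter_score = df['sql_score'] >= 80

# Step 2: indexing with that Series keeps only the rows marked True.
passed = df[filter_score]

print(filter_score)
print(passed)
```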

Combining logical operations
    df[(df['py_score'] >= 80) & (df['sql_score'] >= 80)]
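Each condition is wrapped in parentheses because & binds more tightly than >= in Python. The sketch below shows the same pattern with | (or) and ~ (not) as well. It reuses the slide's score data, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'py_score': pd.Series([88, 79, 81], index=['a', 'b', 'c']),
    'sql_score': pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd']),
})

py_ok = df['py_score'] >= 80    # row 'd' is NaN here, which compares as False
sql_ok = df['sql_score'] >= 80

both = df[py_ok & sql_ok]       # rows where both conditions hold
either = df[py_ok | sql_ok]     # rows where at least one condition holds
sql_below = df[~sql_ok]         # rows where the SQL condition does not hold

print(both)
print(either)
print(sql_below)
```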

Handling Missing Data
Missing data can occur when no information is provided for one or more items or for a whole unit.
Pandas usually represents missing data with NaN (not a number) values.
Checking for missing values using isnull() and notnull()
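A minimal sketch of isnull() and notnull() on a small DataFrame. The column names follow the next slide's example, and the remaining variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan]})

# isnull() marks every missing cell with True; notnull() is the inverse.
missing_mask = df.isnull()
present_mask = df.notnull()

# A column's mask can be used to select the rows where that column is missing.
rows_missing_first = df[df['First Score'].isnull()]

print(missing_mask)
print(rows_missing_first)
```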

Filling missing values using fillna()
    import pandas as pd
    import numpy as np
    # dictionary of lists (named data to avoid shadowing the built-in dict)
    data = {'First Score': [100, 90, np.nan, 95],
            'Second Score': [30, 45, 56, np.nan],
            'Third Score': [np.nan, 40, 80, 98]}
    # creating a dataframe from dictionary
    df = pd.DataFrame(data)
    print(df)
    # filling missing values using fillna(); it needs a fill value, 0 is used here as an example
    df.fillna(0)
Drop rows with at least one NaN value
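The slide's code for dropping rows did not survive extraction. A minimal sketch using DataFrame.dropna() on the same data might look like the following; the subset variant is an extra illustration beyond what the slide mentions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Drop every row that contains at least one NaN value.
complete_rows = df.dropna()

# Illustration: only drop rows where a particular column is missing.
has_first_score = df.dropna(subset=['First Score'])

print(complete_rows)
print(has_first_score)
```

dropna() returns a new DataFrame and leaves df unchanged.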

Check for NaN in Pandas DataFrame
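The slide's own example is not available in the text. One common way to check, sketched here under that caveat, is to aggregate the Boolean mask from isnull(): any NaN at all, the count per column, and the total count.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Is there any NaN anywhere in the DataFrame?
has_nan = bool(df.isnull().values.any())

# How many NaN values does each column contain?
nan_per_column = df.isnull().sum()

# How many NaN values are there in total?
total_nan = int(df.isnull().sum().sum())

print(has_nan)
print(nan_per_column)
print(total_nan)
```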

Import a CSV file into Google Colab session storage

Load Files into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
CSV Files (Comma-Separated Values files)
    print(df.to_string())
to_string() prints the entire DataFrame. By default, when you print a large DataFrame (more than 60 rows), you will only get the first 5 rows and the last 5 rows.
The head() method returns the headers and a specified number of rows, starting from the top.
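The difference between the default printout and to_string() can be sketched without a file on disk. The example below feeds hypothetical CSV text to read_csv through io.StringIO; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content with 100 data rows, built in memory.
lines = ["Name,Age"] + [f"person_{i},{20 + i % 30}" for i in range(100)]
csv_text = "\n".join(lines)

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

short_view = repr(df)        # what print(df) shows: truncated for large frames
full_view = df.to_string()   # every row, no truncation
first_three = df.head(3)     # the first 3 rows

print(short_view)
print(first_three)
```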

Data Processing with Pandas DataFrame
    import pandas as pd
    df = pd.read_csv('data.csv')
    print(df.head(3))        # first 3 rows
    print(df.tail(6))        # last 6 rows
    print(df['Age'].head())  # refer to the column Age
    # another method
    print(df.Age.head())
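Both ways of referring to a column return the same Series, but attribute access only works when the column name is a valid Python identifier. The sketch below uses a small hypothetical DataFrame in place of data.csv.

```python
import pandas as pd

# Hypothetical stand-in for data.csv.
df = pd.DataFrame({
    'Age': [23, 35, 41, 29, 52, 38, 47],
    'First Name': ['Ann', 'Bo', 'Cy', 'Di', 'Ed', 'Flo', 'Gus'],
})

top3 = df.head(3)    # first 3 rows
last2 = df.tail(2)   # last 2 rows

age_bracket = df['Age']  # bracket access
age_attr = df.Age        # attribute access, same column

# A name containing a space can only be reached with brackets.
first_names = df['First Name']

print(top3)
print(last2)
print(age_bracket.equals(age_attr))
```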

A common goal with data analysis is to visualize data. To do this, we'll need matplotlib, which is a popular data visualization library. To install it, execute the command:
    pip install matplotlib
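Once matplotlib is installed, a first plot might look like the sketch below, which draws the py_score values from the earlier slides as a bar chart. The axis labels and title are illustrative, and the Agg backend is an assumption made so the script also runs without a display; in a notebook such as Colab it can be omitted.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'py_score': [88, 79, 81]}, index=['a', 'b', 'c'])

# One bar per row, labelled by the index.
fig, ax = plt.subplots()
ax.bar(df.index, df['py_score'])
ax.set_xlabel('student')        # illustrative label
ax.set_ylabel('py_score')
ax.set_title('Python scores')   # illustrative title

# Render the figure to an in-memory PNG; in a notebook, plt.show() displays it instead.
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
```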