Sorting a Pandas DataFrame
A DataFrame can be sorted with .sort_values():
df.sort_values(by='py_score', ascending=False)
The by parameter sets the label of the row or column to sort by, and ascending specifies whether you want to sort in ascending (True) or descending (False) order.
To sort by multiple columns, pass lists as arguments for by and ascending:
df.sort_values(by=['total', 'py_score'], ascending=[False, False])
In this case, the DataFrame is sorted by the column total, but if two values are the same, their order is determined by the values from the column py_score.
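Here is a small runnable sketch of both calls. The names and scores are made-up example data, and only the column names py_score and total come from the slide.

```python
import pandas as pd

# Hypothetical example data using the column names from the slide
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cal", "Dee"],
    "py_score": [88, 79, 92, 79],
    "total": [170, 170, 181, 150],
})

# Sort by a single column, highest py_score first
by_py = df.sort_values(by="py_score", ascending=False)
print(by_py)

# Sort by total (descending); ties in total are broken by py_score (descending)
by_total = df.sort_values(by=["total", "py_score"], ascending=[False, False])
print(by_total)
```

Ann and Ben both have a total of 170, so py_score decides their order and Ann (88) comes before Ben (79).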
Filtering Data
filter_score = df['sql_score'] >= 80
filter_score
The output is a Series, filter_score, filled with Boolean data: True where sql_score is at least 80 and False elsewhere.
The expression df[filter_score] returns a Pandas DataFrame with the rows from df that correspond to True in filter_score.
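A minimal sketch of the two steps, with hypothetical sql_score values standing in for the course data:

```python
import pandas as pd

# Hypothetical data; only the sql_score column name comes from the slide
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cal", "Dee"],
    "sql_score": [85, 72, 80, 91],
})

# Boolean Series: True where sql_score is at least 80
filter_score = df["sql_score"] >= 80
print(filter_score)

# Keep only the rows where filter_score is True (original index labels are kept)
passed = df[filter_score]
print(passed)
```

Several conditions can be combined with & (and) or | (or); each condition needs its own parentheses, for example df[(df["sql_score"] >= 80) & (df["name"] != "Dee")].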
Handling Missing Data
Pandas usually represents missing data with NaN (Not a Number) values. Missing data can occur when no information is provided for one or more items or for a whole unit.
Checking for missing values using isnull() and notnull(): isnull() returns True where a value is missing, and notnull() returns True where a value is present.
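A short sketch of both methods on a DataFrame with a few NaN values (the scores are example data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, 45, 56, np.nan],
})

# isnull(): a same-shaped DataFrame of booleans, True where a value is missing
missing = df.isnull()
print(missing)

# notnull(): the element-wise inverse, True where a value is present
present = df.notnull()
print(present)

# Use a Boolean column as a filter: rows whose "First Score" is missing
rows_missing_first = df[df["First Score"].isnull()]
print(rows_missing_first)
```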
Filling missing values using fillna()
import pandas as pd
import numpy as np

# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a DataFrame from the dictionary
df = pd.DataFrame(data)
print(df)

# filling missing values with 0 using fillna(); a fill value is required
print(df.fillna(0))

Drop rows with at least one NaN value
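Dropping rows with at least one NaN value is what dropna() does by default. A minimal sketch with the same scores:

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Default how="any": drop every row that has at least one NaN
complete_rows = df.dropna()
print(complete_rows)

# how="all" drops only rows in which every value is NaN (none here)
print(df.dropna(how="all"))
```

Only row 1 has no missing score, so it is the only row that survives dropna(). Like fillna(), dropna() returns a new DataFrame and leaves df unchanged.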
Check for NaN in a Pandas DataFrame
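One common way to answer "is anything missing, and where?" is to reduce the isnull() mask. A sketch using the same example scores:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, 45, 56, np.nan],
    "Third Score": [np.nan, 40, 80, 98],
})

# Is there at least one NaN anywhere in the DataFrame?
has_nan = df.isnull().values.any()

# Number of NaN values in each column (True counts as 1 when summed)
nan_per_column = df.isnull().sum()

# Total number of NaN values in the whole DataFrame
total_nan = int(df.isnull().sum().sum())

print(has_nan, nan_per_column, total_nan, sep="\n")
```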
Import a CSV file into Google Colab session storage
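In Colab, google.colab.files.upload() opens a file picker and returns a dict mapping each uploaded filename to its contents as bytes; the file is typically also written to the session's working directory, so pd.read_csv("data.csv") usually works afterwards too. The helper below, csv_from_upload, is a hypothetical name for illustration, and it is exercised with a stand-in dict because google.colab only exists inside Colab.

```python
import io

import pandas as pd


def csv_from_upload(uploaded, filename):
    """Hypothetical helper: build a DataFrame from the dict returned by
    google.colab.files.upload(), assuming it maps filename -> raw bytes."""
    return pd.read_csv(io.BytesIO(uploaded[filename]))


# In a Colab notebook (not runnable outside Colab):
#   from google.colab import files
#   uploaded = files.upload()          # opens a file picker
#   df = csv_from_upload(uploaded, "data.csv")

# Outside Colab, a stand-in dict with the same shape shows the idea
fake_upload = {"data.csv": b"Name,Age\nAnn,23\nBen,31\n"}
df = csv_from_upload(fake_upload, "data.csv")
print(df)
```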
Load Files into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame. A common format is CSV (Comma-Separated Values) files.
By default, when you print a large DataFrame (one with more rows than the display.max_rows option, 60 by default), you will only get the first 5 rows and the last 5 rows. Use to_string() to print the entire DataFrame:
print(df.to_string())
The head() method returns the headers and a specified number of rows, starting from the top.
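A self-contained sketch of loading a CSV: to avoid depending on a file on disk, it reads a small CSV held in a string through io.StringIO. The rows are made-up example data; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; with a real file use pd.read_csv("data.csv")
csv_text = """Name,Age,City
Ann,23,Oslo
Ben,31,Lima
Cal,27,Pune
"""

df = pd.read_csv(io.StringIO(csv_text))

# to_string() renders every row, regardless of display.max_rows
full_text = df.to_string()
print(full_text)

# head(n) returns the header plus the first n rows
print(df.head(2))
```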
Data Processing with Pandas DataFrame
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head(3))         # first 3 rows
print(df.tail(6))         # last 6 rows
print(df['Age'].head())   # refer to the column Age by its label
# another method: attribute access
print(df.Age.head())
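Since data.csv is not included here, this sketch builds a similar DataFrame in memory (all values are hypothetical). It shows that head() and tail() return 5 rows when no count is given, and where attribute access such as df.Age stops working.

```python
import pandas as pd

# Hypothetical data standing in for data.csv
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cal", "Dee", "Eve", "Fay", "Gus"],
    "Age": [23, 31, 27, 45, 38, 29, 52],
    "Home City": ["Oslo", "Lima", "Pune", "Rome", "Kyiv", "Baku", "Suva"],
})

# head() and tail() return 5 rows when n is not given
print(df.head())   # rows 0-4
print(df.tail(2))  # last 2 rows

# df["Age"] and df.Age select the same Series
print(df.Age.head(3))

# Attribute access only works for labels that are valid Python identifiers
# and do not clash with DataFrame methods; "Home City" needs brackets
print(df["Home City"].head(3))
```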
A common goal of data analysis is to visualize the data. To do this, we'll need matplotlib, a popular data visualization library. To install it, execute the command:
pip install matplotlib
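Once matplotlib is installed, DataFrame.plot() draws with it directly. A minimal sketch, assuming made-up py_score values; it uses the non-interactive "Agg" backend and saves to an in-memory PNG so it also runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cal", "Dee"],
    "py_score": [88, 79, 92, 79],
})

# DataFrame.plot uses matplotlib under the hood and returns an Axes
ax = df.plot(x="name", y="py_score", kind="bar", legend=False, title="Python scores")
ax.set_ylabel("py_score")

# Save to an in-memory PNG; in a notebook, plt.show() would display it instead
buf = io.BytesIO()
fig = ax.get_figure()
fig.savefig(buf, format="png")
plt.close(fig)
```

Each row becomes one bar, so this chart has four bars labelled with the names.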