Creating a DataFrame from a Dict of Series or Dicts

Creating a DataFrame from a Dict of Series: You can create a DataFrame by passing a dictionary of Series objects; the dictionary keys become the column names. The Series do not need to have the same length: pandas aligns them on their index labels, the row index of the result is the union of the individual indexes, and missing entries are filled with NaN. For example:

    import pandas as pd
    data = {'A': pd.Series([1, 2, 3]), 'B': pd.Series([4, 5, 6])}
    df = pd.DataFrame(data)

Creating a DataFrame from a Dict of Dicts: You can also create a DataFrame by passing a dictionary of dictionaries, where the outer dictionary keys become the column names and the inner dictionary keys become the row index. For example:

    data = {'A': {'a1': 1, 'a2': 2, 'a3': 3}, 'B': {'a1': 4, 'a2': 5, 'a3': 6}}
    df = pd.DataFrame(data)

In both cases, the resulting DataFrame has the dictionary keys as the column names and the union of the Series indexes (or of the inner dictionary keys) as the row index.
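To make the index alignment concrete, here is a minimal sketch with made-up data in which the two Series carry different labels and lengths. The names `s_a`, `s_b`, and `df_aligned` are illustrative only.

```python
import pandas as pd

# Two Series with partly overlapping index labels (illustrative data)
s_a = pd.Series([1, 2, 3], index=["a1", "a2", "a3"])
s_b = pd.Series([4, 5], index=["a2", "a3"])

# pandas aligns the Series on their labels when building the DataFrame
df_aligned = pd.DataFrame({"A": s_a, "B": s_b})

# The row index is the union of both indexes; 'B' has no entry for 'a1',
# so that cell is filled with NaN.
print(df_aligned)
```

Because of the missing value, column 'B' is stored as a floating-point column even though the input values were integers.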
Creating DataFrames from a Dict of Ndarrays

You can create a DataFrame in Python from a dictionary of NumPy ndarrays (N-dimensional arrays). Here are the key points:

Creating a DataFrame from a Dict of Ndarrays: You can create a DataFrame by passing a dictionary of NumPy ndarray objects; the dictionary keys become the column names. Every ndarray in the dictionary must have the same length. For example:

    import numpy as np
    import pandas as pd
    data = {'A': np.array([1, 2, 3]), 'B': np.array([4, 5, 6])}
    df = pd.DataFrame(data)

In this case, the resulting DataFrame has the dictionary keys as the column names and a default integer row index running from 0 to n - 1, where n is the common length of the ndarrays.

The key difference from a dictionary of Series or dictionaries is that ndarrays carry no labels of their own, so there is nothing to align on: the arrays are matched purely by position, which is why their lengths must agree.
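The length requirement and the default index can be checked with a short sketch. The data and the variable names (`arrays`, `df_default`, `df_labelled`, `mismatch_error`) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

arrays = {"A": np.array([1, 2, 3]), "B": np.array([4.0, 5.0, 6.0])}

# Without an explicit index, the rows are labelled 0, 1, 2
df_default = pd.DataFrame(arrays)

# An explicit index can be supplied; it must have the same length as the arrays
df_labelled = pd.DataFrame(arrays, index=["r1", "r2", "r3"])

# Arrays of unequal length are rejected with a ValueError
try:
    pd.DataFrame({"A": np.array([1, 2, 3]), "B": np.array([4, 5])})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(df_default)
print(df_labelled)
print("mismatched lengths raise:", type(mismatch_error).__name__)
```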
Creating DataFrames from a Structured or Record Array

Creating a DataFrame from a Structured or Record Array: You can create a DataFrame directly from a NumPy structured or record array. A structured array is an ndarray whose elements are records: each element is one row, and the fields (name and data type) declared in the array's dtype define the columns. For example:

    import numpy as np
    import pandas as pd

    # Create a structured array
    data = np.array([('Alex', 10), ('Bob', 12), ('Clarke', 13)],
                    dtype=[('Name', 'U10'), ('Age', int)])

    # Create a DataFrame from the structured array
    df = pd.DataFrame(data)

In this case, the resulting DataFrame has the field names of the structured array as the column names, and each row corresponds to one element of the array. The key advantage of this route is that the column names and data types are taken from the array's dtype rather than having to be specified again, which makes it a convenient way to convert structured data into a tabular format.
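As a small follow-up sketch, the field names can be inspected on the resulting DataFrame, and `pd.DataFrame.from_records` can promote one field to the row index. The variable names `records`, `df_rec`, and `df_by_name` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Structured array with two fields, as in the example above
records = np.array(
    [("Alex", 10), ("Bob", 12), ("Clarke", 13)],
    dtype=[("Name", "U10"), ("Age", int)],
)

# Field names become column names
df_rec = pd.DataFrame(records)
print(df_rec.columns.tolist())
print(df_rec.dtypes)

# from_records can use one of the fields as the row index instead of a column
df_by_name = pd.DataFrame.from_records(records, index="Name")
print(df_by_name)
```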
Creating DataFrames from a List of Dicts

Creating a DataFrame from a List of Dicts: You can create a DataFrame by passing a list of dictionaries, where each dictionary represents one row and the dictionary keys become the column names. For example:

    data = [{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}]
    df = pd.DataFrame(data)

In this case, the resulting DataFrame has the dictionary keys as the column names, and each row corresponds to one dictionary in the list. The columns are the union of the keys across all dictionaries, in order of first appearance, not just the keys of the first dictionary. Where a dictionary lacks a key, the corresponding cell is NaN.

This method is useful when records arrive as a list of dictionaries (for example, parsed JSON), because it converts them into a tabular DataFrame in one step.
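The handling of differing keys can be sketched as follows; the rows are made-up data and `df_rows` is an illustrative name.

```python
import pandas as pd

# Dictionaries with different key sets (illustrative data)
rows = [{"A": 1, "B": 4}, {"A": 2, "C": 9}, {"B": 6}]

df_rows = pd.DataFrame(rows)

# Columns are the union of all keys; cells with no value are NaN
print(df_rows)
```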
Creating DataFrames from a Dict of Tuples, Selecting, Adding, and Deleting DataFrame Columns

Creating DataFrames from a Dict of Tuples: You can create a DataFrame from a dictionary whose values are tuples; the keys become the column names. For example:

    data = {'A': (1, 2, 3), 'B': (4, 5, 6)}
    df = pd.DataFrame(data)

In this case, the resulting DataFrame has the dictionary keys as the column names, and the length of the tuples (which must be the same for every key) determines the number of rows.
Selecting Data from a DataFrame

You can select data from a DataFrame in various ways:
- By column name: `df['A']` or `df.A`
- By row position: `df.iloc[0]` (first row)
- By row label: `df.loc['row_label']`
- By boolean indexing: `df[df['A'] > 2]`

You can also select multiple columns or rows using lists, slices, and boolean conditions:
- Select multiple columns: `df[['A', 'B']]`
- Select rows by position: `df.iloc[0:2]`
- Select rows by label: `df.loc['row1':'row3']` (both endpoints are included)
- Select rows by boolean condition: `df[(df['A'] > 2) & (df['B'] < 6)]`

Each comparison in a combined condition needs its own parentheses, because `&` binds more tightly than `>` and `<`. The key is to use the appropriate selection method (by position, label, or boolean condition) to extract the desired data from the DataFrame.
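A minimal sketch of combined boolean conditions and of the difference between label and position slices. The DataFrame, its row labels, and the result names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [4, 5, 6, 7]},
    index=["row1", "row2", "row3", "row4"],
)

# Each comparison is parenthesised; & means "and", | means "or"
both = df[(df["A"] > 2) & (df["B"] < 7)]
either = df[(df["A"] > 3) | (df["B"] < 5)]

# Label slices include the end label; position slices exclude the end position
by_label = df.loc["row1":"row3"]
by_pos = df.iloc[0:2]

print(both)
print(either)
print(len(by_label), len(by_pos))
```

Without the parentheses, Python would evaluate `2 & df['B']` first, which is not the intended condition.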
Adding and Deleting DataFrame Columns

To add a new column to a DataFrame:
- Assign a Series or scalar value to a new column name:
  `df['C'] = df['A'] * df['B']`
  `df['D'] = 0`
- Use the `assign()` method, which returns a new DataFrame with the extra columns:
  `df = df.assign(C=df['A'] * df['B'], D=0)`

To delete columns from a DataFrame:
- Drop columns by name or by position with `drop()`, which returns a new DataFrame:
  `df = df.drop('A', axis=1)`
  `df = df.drop(df.columns[[0, 1]], axis=1)`
- Delete a column in place with `del`:
  `del df['B']`

Assigning `None` to a column (`df['A'] = None`) does not delete it; the column stays in the DataFrame and its values are replaced by missing values. The `axis=1` argument specifies that the operation applies to columns rather than rows.
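The difference between assigning `None` and actually removing a column can be seen in a short sketch. The data and the names `cols_after_none`, `all_missing`, `popped`, and `df_no_a` are illustrative; `pop()` is shown as one further way to remove a column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Add columns by assignment
df["C"] = df["A"] * df["B"]
df["D"] = 0

# Assigning None keeps the column and blanks its values
df["D"] = None
cols_after_none = list(df.columns)
all_missing = bool(df["D"].isna().all())

# del removes a column in place; pop removes it and returns it as a Series
del df["D"]
popped = df.pop("C")

# drop returns a new DataFrame and leaves df unchanged
df_no_a = df.drop(columns=["A"])

print(cols_after_none, all_missing)
print(list(df.columns), list(df_no_a.columns))
```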
Assigning New Columns in Method Chains, Row Selection, Row Addition, Row Deletion

Assigning New Columns in Method Chains: Because `assign()` returns a new DataFrame instead of modifying the original, it can be chained with other methods, so that new columns are created and further operations are applied in a single expression. Example:

    df = df.assign(C=df['A'] * df['B'], D=0)

In this example, a new column 'C' is created by multiplying columns 'A' and 'B', and a new column 'D' is created with a constant value of 0.
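A longer chain might look like the sketch below. Passing a callable to `assign()` lets each step refer to the DataFrame as it exists at that point in the chain, including columns created one step earlier. The data, the column arithmetic, and the `query`/`sort_values` steps are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

result = (
    df
    .assign(C=lambda d: d["A"] * d["B"])   # new column from existing ones
    .assign(D=lambda d: d["C"] - d["A"])   # uses the column created just above
    .query("C > 5")                        # keep rows where C exceeds 5
    .sort_values("D", ascending=False)     # largest D first
)

print(result)
print(list(df.columns))  # the original DataFrame is unchanged
```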
Row Selection

- By position using `df.iloc`:
  `df.iloc[0]` selects the first row
  `df.iloc[0:2]` selects the first two rows
- By label using `df.loc`:
  `df.loc['row1']` selects the row with label 'row1'
  `df.loc['row1':'row3']` selects the rows with labels 'row1' to 'row3' (inclusive)
- By boolean indexing:
  `df[df['A'] > 2]` selects the rows where column 'A' is greater than 2
Row Addition

- Append a row using `df.append()` (pandas versions before 2.0 only; the method was deprecated in pandas 1.4 and removed in 2.0):
  `df = df.append({'A': 4, 'B': 7}, ignore_index=True)`
- Concatenate DataFrames using `pd.concat()`, which works in all versions and is the recommended approach (the one-row frame below repeats the row from the previous item):
  `df2 = pd.DataFrame({'A': [4], 'B': [7]})`
  `df = pd.concat([df, df2], ignore_index=True)`
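A minimal sketch of adding rows without `append()`, assuming a DataFrame with a default integer index. The data and the names `new_row` and `df_more` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Wrap the new row in a one-row DataFrame and concatenate
new_row = pd.DataFrame([{"A": 4, "B": 7}])
df_more = pd.concat([df, new_row], ignore_index=True)

# With a default integer index, assigning to the next label also adds a row
df_more.loc[len(df_more)] = [5, 8]

print(df_more)
```

For many rows, collecting them in a list and calling `pd.concat()` once is generally cheaper than growing the DataFrame one row at a time.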
Row Deletion

- Drop rows by position using `df.drop()` together with `df.index`:
  `df = df.drop(df.index[0])` deletes the first row
  `df = df.drop(df.index[0:2])` deletes the first two rows
- Drop rows by condition using boolean selection and `df.drop()`:
  `df = df.drop(df.loc[df['A'] < 2].index)` deletes the rows where 'A' is less than 2

`drop()` always works on index labels, so positions and conditions are first translated into labels through `df.index`. The key is to use the appropriate row selection method (by position, label, or boolean condition) to identify the rows you want to delete.
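The deletion patterns above can be sketched on a small labelled DataFrame. The data and result names are illustrative assumptions; the last two lines show that dropping rows by condition gives the same result as keeping the rows that satisfy the opposite condition.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [4, 5, 6, 7]},
    index=["row1", "row2", "row3", "row4"],
)

# By position: translate positions into labels via df.index
without_first = df.drop(df.index[0])
without_first_two = df.drop(df.index[:2])

# By label
without_row2 = df.drop(index=["row2"])

# By condition: drop the matching rows, or keep the complement
dropped = df.drop(df[df["A"] < 2].index)
kept = df[df["A"] >= 2]

print(without_first_two)
print(dropped.equals(kept))
```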
Exploring and Analysing a DataFrame

Exploring and analyzing a DataFrame in Python involves several techniques to understand and manipulate the data. Here are some key methods:

Exploring a DataFrame

1. Basic Information:
   - Use the `info()` method to get basic information about the DataFrame, including the number of rows and columns, data types, and memory usage.
   - Example: `df.info()`
2. Data Types:
   - Use the `dtypes` attribute to see the data type of each column.
   - Example: `df.dtypes`

3. Head and Tail:
   - Use the `head()` and `tail()` methods to view the first and last few rows of the DataFrame (five by default).
   - Example: `df.head()` and `df.tail()`
4. Descriptive Statistics:
   - Use the `describe()` method to get descriptive statistics such as count, mean, standard deviation, minimum, quartiles, and maximum for each numeric column.
   - Example: `df.describe()`
Analyzing a DataFrame

1. Grouping and Aggregating:
   - Use the `groupby()` method to group the data by one or more columns and apply aggregations such as `sum`, `mean`, `max`, etc.
   - Example: `grouped_df = df.groupby('column_name').sum()`

2. Filtering:
   - Use boolean indexing to filter rows based on conditions.
   - Example: `filtered_df = df[df['column_name'] > 5]`
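A minimal sketch of grouping and filtering on made-up data. The column names `team` and `score` and the result names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["x", "y", "x", "y", "x"], "score": [3, 8, 6, 2, 9]}
)

# One aggregation per group
totals = df.groupby("team")["score"].sum()

# Several aggregations at once
summary = df.groupby("team")["score"].agg(["mean", "max"])

# Boolean filtering keeps only the rows that satisfy the condition
high = df[df["score"] > 5]

print(totals)
print(summary)
print(high)
```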
3. Sorting:
   - Use the `sort_values()` method to sort the DataFrame by one or more columns.
   - Example: `sorted_df = df.sort_values(by='column_name')`

4. Plotting:
   - Use plotting libraries such as `matplotlib` and `seaborn` to visualize the data.
   - Example:

         import matplotlib.pyplot as plt
         plt.plot(df['column_name'], df['other_column'])
         plt.show()
5. Data Manipulation:
   - Use methods like `drop()`, `dropna()`, `fillna()`, `rename()`, and `reset_index()` to manipulate the DataFrame. By default these return a new DataFrame; pass `inplace=True` or reassign the result to keep the change.
   - Example: `df.drop('column_name', axis=1, inplace=True)`

Example

Here is a complete example of exploring and analyzing a DataFrame:

    import pandas as pd
    import matplotlib.pyplot as plt
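The manipulation methods named in item 5 can be sketched on a small DataFrame with missing values. The data and the names `raw`, `filled`, `complete`, and `tidy` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame(
    {"a": [np.nan, 2.0, 3.0, 4.0], "b": [5.0, 6.0, np.nan, 8.0]}
)

# Replace missing values with a constant
filled = raw.fillna(0)

# Remove every row that contains a missing value
complete = raw.dropna()

# Chain: drop incomplete rows, rename a column, renumber the rows from 0
tidy = (
    raw.dropna()
       .rename(columns={"a": "alpha"})
       .reset_index(drop=True)
)

print(filled)
print(complete)
print(tidy)
```

None of these calls modifies `raw`; each returns a new DataFrame.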
    # Create a sample DataFrame
    data = {'A': [1, 2, 3, 4, 5], 'B': [4, 5, 6, 7, 8], 'C': [7, 8, 9, 10, 11]}
    df = pd.DataFrame(data)

    # Basic information (info() prints its report itself and returns None)
    df.info()

    # Data types
    print(df.dtypes)

    # Head and tail
    print(df.head())
    print(df.tail())
    # Plotting
    plt.plot(df['A'], df['B'])
    plt.show()

    # Data manipulation
    df.drop('C', axis=1, inplace=True)
    print(df)

This example demonstrates how to explore a DataFrame by getting basic information, checking data types, and viewing the first and last few rows, and then how to plot two columns and drop a column. Descriptive statistics, grouping and aggregating, filtering, and sorting are applied in the same way, using `describe()`, `groupby()`, boolean indexing, and `sort_values()` as described in the earlier items.
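The steps that the example does not show in code (descriptive statistics, grouping, filtering, sorting) can be sketched on the same sample data. The grouping key `parity` is a hypothetical one derived from column 'A', since the sample data has no natural category column.

```python
import pandas as pd

data = {"A": [1, 2, 3, 4, 5], "B": [4, 5, 6, 7, 8], "C": [7, 8, 9, 10, 11]}
df = pd.DataFrame(data)

# Descriptive statistics for every numeric column
stats = df.describe()

# Grouping and aggregating: group rows by whether A is odd (1) or even (0)
parity = (df["A"] % 2).rename("parity")
by_parity = df.groupby(parity).sum()

# Filtering: rows where B is greater than 5
filtered = df[df["B"] > 5]

# Sorting: largest C first
sorted_df = df.sort_values(by="C", ascending=False)

print(stats)
print(by_parity)
print(filtered)
print(sorted_df)
```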
Indexing and Selecting DataFrames

1. Selecting Columns:
   - Access a column by name using square brackets `df['column_name']` or dot notation `df.column_name`. Dot notation only works when the name is a valid Python identifier and does not clash with a DataFrame attribute or method.
   - Select multiple columns using a list of column names: `df[['col1', 'col2']]`.
2. Selecting Rows:
   - By position using `df.iloc`:
     - `df.iloc[0]` # Select first row
     - `df.iloc[0:2]` # Select first two rows
   - By label using `df.loc`:
     - `df.loc['row1']` # Select row with label 'row1'
     - `df.loc['row1':'row3']` # Select rows with labels 'row1' to 'row3' (inclusive)
   - By boolean indexing:
     - `df[df['column_name'] > 2]` # Select rows where 'column_name' is greater than 2

3. Selecting Rows and Columns:
   - Combine row and column selection:
     - `df.loc['row1', 'column_name']` # Select value at row 'row1', column 'column_name'
     - `df.iloc[0, 1]` # Select value at row 0, column 1 (by position)
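Combined row and column selection can be sketched as follows, assuming a small DataFrame with string row labels. All names and data are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["row1", "row2", "row3"],
)

# A single cell, by label and by position
cell_by_label = df.loc["row1", "B"]
cell_by_pos = df.iloc[0, 1]

# A sub-table: lists of labels select several rows and columns at once
block = df.loc[["row1", "row3"], ["B"]]

# A boolean row condition combined with a column label
b_where_a_gt_1 = df.loc[df["A"] > 1, "B"]

print(cell_by_label, cell_by_pos)
print(block)
print(b_where_a_gt_1)
```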
4. Transposing a DataFrame:
   - Use the `T` attribute to transpose the DataFrame:
     - `df_transposed = df.T`

5. Interoperability with NumPy:
   - Element-wise NumPy functions can be applied directly to a DataFrame, and the data can be exported as an array:
     - `np.sqrt(df)` # Apply a NumPy function element-wise; the result is still a DataFrame
     - `df.sum()` # pandas' own column-wise sum, the counterpart of NumPy's sum
     - `df.values` # Access the data as a NumPy array (`df.to_numpy()` is the currently recommended spelling)

The key is to use the appropriate indexing method (by position, label, or boolean condition) to select the desired rows and columns from the DataFrame. The flexibility of DataFrame indexing allows you to extract specific subsets of the data for further analysis and manipulation.
Transposing a DataFrame, DataFrame Interoperability with NumPy Functions

Transposing a DataFrame

You can transpose a DataFrame using the `T` attribute:
- `df_transposed = df.T` [170]

This swaps the rows and columns of the DataFrame: the row index becomes the column labels and the column labels become the row index.

DataFrame Interoperability with NumPy Functions

DataFrames work together with NumPy in both directions:
- `np.sqrt(df)` # Apply a NumPy function element-wise; the result is still a DataFrame
- `df.sum()` # pandas' own column-wise sum, the counterpart of NumPy's sum
- `df.values` # Access the data as a NumPy array [171]
This allows you to use NumPy's numerical operations on DataFrames and to get at the data as a plain array when needed. Some key points:
- DataFrames provide methods such as `sum()`, `mean()`, and `std()` that mirror NumPy's reductions and operate column by column by default. Note that `df.std()` computes the sample standard deviation (ddof=1), whereas `np.std` defaults to ddof=0.
- Element-wise NumPy functions such as `np.sqrt` and `np.exp` can be called on a DataFrame directly and return a DataFrame with the same labels.
- The `values` attribute (or the `to_numpy()` method) returns the data as a NumPy ndarray. If the columns have different data types, the array uses a common type that can hold all of them, which may be `object`.
- This interoperability makes it easy to perform numerical and statistical analysis on tabular data.

By using these techniques, you can move between the labelled, high-level DataFrame and the low-level control of NumPy arrays when working with data in Python.
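The points above can be checked with a short sketch. The data and variable names are illustrative, and `to_numpy()` is used in place of `values`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 4.0, 9.0], "B": [16.0, 25.0, 36.0]},
    index=["r1", "r2", "r3"],
)

# Element-wise NumPy function: the result keeps the DataFrame's labels
roots = np.sqrt(df)

# pandas reduction: one total per column
col_totals = df.sum()

# Export the data as a plain ndarray (labels are dropped)
arr = df.to_numpy()

# Transpose: rows become columns and vice versa
transposed = df.T

print(roots)
print(col_totals)
print(arr.shape, transposed.shape)
```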