Visualization Techniques: Box Plot, Line Chart, Scatter Plot, Bar Chart

MeghaSharma504 689 views 14 slides May 14, 2024


Slide Content

Data Science Exploratory Data Analysis (EDA) (Box plot, Scatter plot, Bar plot, Line plot) Part-II

Box plots (Box-and-Whisker Plots): A box plot, also known as a box-and-whisker plot, summarizes the distribution of a dataset using five summary statistics: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It gives a visual summary of the central tendency, spread, and skewness of the data, and it flags potential outliers. In Matplotlib, the boxplot() function creates a box plot.

Box Plot (Box-and-Whisker Plots):
Minimum and Maximum: The smallest and largest exam scores in the dataset.
Quartiles (Q1, Q2, Q3): These divide the dataset into four equal parts. Q1 is the 25th percentile, Q2 is the median (50th percentile), and Q3 is the 75th percentile.
Interquartile Range (IQR): The range between the first and third quartiles (Q3 - Q1).
Whiskers: Lines extending from the box to the smallest and largest values that lie within 1.5 times the IQR of the first and third quartiles.
Outliers: Data points beyond the whiskers, indicating potential anomalies or extreme values.
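The components listed above can be computed directly. The following is a minimal sketch, assuming NumPy and Matplotlib are installed. The exam scores are made up for illustration, and the output file name is arbitrary. It relies on Matplotlib's boxplot() placing its whiskers at 1.5 times the IQR by default (whis=1.5).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical exam scores, invented for this example
scores = np.array([35, 52, 55, 58, 60, 62, 65, 67, 70, 72, 75, 78, 98])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Values beyond these fences are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the fences
inside = scores[(scores >= lower_fence) & (scores <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)

# Draw the box plot; outliers appear as individual points
fig, ax = plt.subplots()
ax.boxplot(scores)
ax.set_ylabel("Exam score")
ax.set_title("Box plot of exam scores (hypothetical data)")
fig.savefig("exam_scores_boxplot.png")
```

For this sample the two extreme scores fall outside the fences, so the whiskers stop at the nearest scores inside them instead of reaching the minimum and maximum.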

Interpreting the Box Plot:
If the median is at the center of the box and the whiskers are about the same length on both ends, the distribution is roughly symmetric, which is consistent with (but does not prove) a normal distribution.
If the median lies closer to the first quartile and the whisker at the lower end is shorter, the data has a positive skew (right skew).
If the median lies closer to the third quartile and the whisker at the upper end is shorter, the data has a negative skew (left skew).
Values that fall above or below the ends of the whiskers are plotted as dots. These points are often called outliers.
In the example of exam scores, the box plot helps visualize the performance of students and understand the spread of scores across the class.
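The median-position rule above can be checked numerically. The following is a rough sketch, assuming NumPy is installed. The helper name box_skew_hint and the sample data are hypothetical, and the rule is only a heuristic: it looks at where the median sits inside the box and ignores the whiskers.

```python
import numpy as np

def box_skew_hint(values):
    """Hypothetical helper: guess the skew direction from the median's position in the box."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    lower_half = median - q1  # distance from Q1 up to the median
    upper_half = q3 - median  # distance from the median up to Q3
    if np.isclose(lower_half, upper_half):
        return "roughly symmetric"
    if lower_half < upper_half:
        return "right (positive) skew"  # median sits closer to Q1
    return "left (negative) skew"       # median sits closer to Q3

# Made-up samples for illustration
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11]
left_tailed = [-v for v in right_tailed]  # mirror image of the sample above
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right_tailed", right_tailed),
                     ("left_tailed", left_tailed),
                     ("symmetric", symmetric)]:
    print(name, "->", box_skew_hint(sample))
```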

Skewness

Scatter plots: Scatter plots visualize the relationship between two numerical variables by plotting each observation as a point on a two-dimensional graph. They help in identifying patterns, trends, and correlations between variables. In Matplotlib, the scatter() function creates a scatter plot.

Interpreting the Scatter plots:
If the data points tend to move upwards from left to right, it indicates a positive relationship. Conversely, if they tend to move downwards from left to right, it indicates a negative relationship.
If the data points form a roughly straight line, it suggests a linear relationship. If they follow a curved pattern or do not conform to a straight line, it suggests a nonlinear relationship.
We can calculate a correlation coefficient (such as Pearson's correlation coefficient) to quantify the strength and direction of the linear relationship between the variables. A coefficient close to +1 indicates a strong positive relationship, close to -1 indicates a strong negative relationship, and close to 0 indicates little or no linear relationship (a nonlinear relationship may still exist).
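As an illustration of these points, the sketch below draws a scatter plot and computes Pearson's correlation coefficient with NumPy's corrcoef(). It assumes NumPy and Matplotlib are installed. The study-hours data, variable names, and output file name are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: hours studied and exam score for eight students
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
exam_scores = np.array([52, 55, 61, 64, 70, 72, 79, 83])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is Pearson's r
r_positive = np.corrcoef(hours, exam_scores)[0, 1]

# Reversing one variable flips the direction of the relationship
r_negative = np.corrcoef(hours, exam_scores[::-1])[0, 1]

# A perfectly curved (quadratic) pattern can still have r close to 0
x = np.arange(-3, 4)
r_curved = np.corrcoef(x, x ** 2)[0, 1]

print("positive trend r:", round(r_positive, 3))
print("negative trend r:", round(r_negative, 3))
print("curved pattern r:", round(r_curved, 3))

# Scatter plot of the positively related pair
fig, ax = plt.subplots()
ax.scatter(hours, exam_scores)
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Hours studied vs. exam score (hypothetical data)")
fig.savefig("hours_vs_score_scatter.png")
```

The quadratic case shows why a coefficient near 0 only rules out a linear relationship: the points follow a clear curve, yet Pearson's r does not detect it.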

Bar Plots: Bar plots represent a value for each category, most often its frequency or count, as bars of varying heights. They are effective for comparing categories or groups. In a bar chart, the x-axis represents the categories or groups, and the y-axis represents the frequency, count, or another summary value (such as an average) for each category. The bars are separated from each other to show that the categories are distinct. In Matplotlib, the bar() function creates a bar plot. Bar charts are versatile and can compare categorical data across different groups or time periods, such as sales by product category, votes by political party, or average temperatures by month.
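A minimal sketch of a frequency bar chart follows, assuming Matplotlib is installed. The letter grades are hypothetical, and collections.Counter is just one convenient way to get the counts.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical categorical data: letter grades for ten students
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

# Count how often each category occurs
counts = Counter(grades)
categories = sorted(counts)                 # ['A', 'B', 'C', 'D']
heights = [counts[c] for c in categories]   # frequency of each grade

# One bar per category; bar height is the count
fig, ax = plt.subplots()
bars = ax.bar(categories, heights)
ax.set_xlabel("Grade")
ax.set_ylabel("Number of students")
ax.set_title("Grade distribution (hypothetical data)")
fig.savefig("grade_distribution_bar.png")

print(dict(zip(categories, heights)))
```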

Line plots: Line plots show how a numerical variable changes over time or over another ordered variable, which makes them a common choice for time series analysis. Example: stock prices over time. Plotting the price (a continuous numerical variable) against time (the ordered variable on the x-axis) shows how it changes across different periods.
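The stock-price example can be sketched as follows, assuming Matplotlib is installed. The prices are invented, the trading days are simply numbered 1 to 7, and plot() is the standard Matplotlib function for line charts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical closing prices over seven consecutive trading days
days = [1, 2, 3, 4, 5, 6, 7]
prices = [101.5, 102.0, 100.8, 103.2, 104.1, 103.7, 105.0]

# Connect the observations in time order; markers show the individual points
fig, ax = plt.subplots()
(line,) = ax.plot(days, prices, marker="o")
ax.set_xlabel("Trading day")
ax.set_ylabel("Closing price")
ax.set_title("Stock price over time (hypothetical data)")
fig.savefig("stock_price_line.png")
```

Because the points are joined in x-axis order, the data should be sorted by time before plotting.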

Thanks for Watching! Please check the description box for the link to Machine Learning videos.