Visual Representation with Box Plots and Bar Plots

anjanasharma77573 161 views 34 slides Jul 04, 2024


Slide Content

Box Plot
A box plot is a graphical method for visualizing how data are distributed, which helps in gaining insights and making informed decisions. It depicts a group of numerical data through their quartiles and displays key summary statistics such as the median, the quartiles, and potential outliers. With a box plot you can summarize a distribution, identify potential outliers, and compare different datasets in a compact, visual manner.

Elements of Box Plot
A box plot gives a five-number summary of a set of data:
Minimum: the smallest value in the dataset, excluding outliers.
First Quartile (Q1): 25% of the data lies below the first (lower) quartile.
Median (Q2): the mid-point of the dataset; half of the values lie below it and half above.
Third Quartile (Q3): 75% of the data lies below the third (upper) quartile.
Maximum: the largest value in the dataset, excluding outliers.

The box spans the middle 50% of the data; its length is known as the Inter Quartile Range (IQR), calculated as:
IQR = Q3 - Q1
Outliers are the data points that lie below the lower limit or above the upper limit, where:
Lower Limit = Q1 - 1.5*IQR
Upper Limit = Q3 + 1.5*IQR
Values outside these limits are considered outliers. The minimum and maximum are then the smallest and largest data values that lie within the limits.
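The rules above can be sketched in a few lines of R. This is only an illustration: the helper name box_limits and the sample vector are hypothetical, and the quartiles are taken from base R's fivenum() (Tukey's hinges), which is one of several quartile conventions.

```r
# Hypothetical helper: IQR, outlier limits, whisker ends and outliers for a vector.
box_limits <- function(x, coef = 1.5) {
  h   <- fivenum(x)            # min, lower hinge (Q1), median, upper hinge (Q3), max
  q1  <- h[2]
  q3  <- h[4]
  iqr <- q3 - q1
  lower <- q1 - coef * iqr     # Lower Limit = Q1 - 1.5*IQR
  upper <- q3 + coef * iqr     # Upper Limit = Q3 + 1.5*IQR
  inside <- x[x >= lower & x <= upper]
  list(iqr = iqr, lower = lower, upper = upper,
       minimum = min(inside), maximum = max(inside),
       outliers = x[x < lower | x > upper])
}

# Made-up sample with one unusually large value
res <- box_limits(c(2, 4, 5, 6, 7, 8, 9, 30))
print(res)
```

The minimum and maximum returned here are the whisker ends, so they exclude any value flagged as an outlier.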

How to create a box plot?
Let us take some sample data to understand how to create a box plot. Here are the runs scored by a cricket team in a league of 12 matches: 100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110.
To draw a box plot for this data, first arrange the data in ascending order and then find the minimum, first quartile, median, third quartile and maximum.
Sorted data: 100, 110, 110, 110, 120, 120, 130, 140, 140, 150, 170, 220
Taking each quartile as the median of the lower or upper half of the sorted data gives Q1 = 110, Median = 125 and Q3 = 145, so IQR = Q3 - Q1 = 35.

We can now calculate the lower and upper limits to find the minimum, the maximum and any outliers.
Lower Limit = Q1 - 1.5*IQR = 110 - 1.5*35 = 57.5
Upper Limit = Q3 + 1.5*IQR = 145 + 1.5*35 = 197.5
So, within the range [57.5, 197.5], the minimum and maximum for our data are:
Minimum = 100
Maximum = 170
The only value outside this range is an outlier:
Outlier = 220
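The same numbers can be reproduced in base R. This is a sketch, not part of the original slides: it relies on boxplot.stats(), which uses Tukey's hinges and a default coefficient of 1.5, and that convention happens to match the quartiles used here. R's default quantile() method interpolates differently and would report Q3 as 142.5 for this data.

```r
# Runs scored in the 12 matches
runs <- c(100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110)

# Five-number summary of the raw data (the maximum still includes the outlier)
five <- fivenum(runs)

# Box plot statistics: whisker ends exclude points beyond 1.5 * IQR
bs <- boxplot.stats(runs)
print(bs$stats)  # lower whisker, Q1, median, Q3, upper whisker
print(bs$out)    # points flagged as outliers

# Draw the box plot itself
boxplot(runs, horizontal = TRUE, main = "Runs scored in 12 matches", xlab = "Runs")
```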

Draw a box plot for the following data points: 1, 1, 2, 3, 5, 7, 7, 8, 10, 12, 15

Use-Cases of Box Plot
Box plots provide a visual summary of the data from which we can quickly identify the central value of the data, how dispersed the data is, and whether the data is skewed (skewness).
The median gives the central value of the data: half of the observations lie on either side of it. It is not the arithmetic mean, and the two can differ when the data is skewed.
Box plots show the skewness of the data:
a) If the median is at the center of the box and the whiskers are about the same length at both ends, the distribution is roughly symmetric, as a normal distribution would be.
b) If the median lies closer to the first quartile and the whisker at the lower end is shorter (as in the above example), the data has a positive skew (right skew).
c) If the median lies closer to the third quartile and the whisker at the upper end is shorter, the data has a negative skew (left skew).

The dispersion or spread of the data can be seen from the minimum and maximum values, which are found at the ends of the whiskers.
The box plot also shows the outliers, the points that are numerically distant from the rest of the data.

How to compare box plots?
Let us look at how we can compare different box plots and derive statistical conclusions from them. Take the two plots below, Plot A and Plot B, as an example:

Compare the Medians: If the median line of one box plot lies outside the box of the box plot it is being compared with, there is likely to be a difference between the two groups. Here the median line of Plot B lies outside the box of Plot A.
Compare the Dispersion or Spread of data: The Inter Quartile Range (the length of the box) gives an idea of how dispersed the data is. Here Plot A has a longer box than Plot B, which means the data in Plot A is more dispersed than in Plot B. The length of the whiskers also gives an idea of the overall spread of the data. The extreme values (minimum and maximum) give the range of the distribution: the larger the range, the more scattered the data. Here Plot A has a larger range than Plot B.
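The two checks above can be sketched numerically in R. The data for groups a and b are made up for illustration (they are not the data behind Plot A and Plot B), and the quartiles again come from boxplot.stats().

```r
# Hypothetical samples standing in for two groups being compared
a <- c(10, 12, 15, 18, 20, 25, 30, 35, 40)
b <- c(38, 40, 41, 42, 43, 44, 45, 46, 48)

sa <- boxplot.stats(a)$stats  # lower whisker, Q1, median, Q3, upper whisker
sb <- boxplot.stats(b)$stats

# Median check: does the median of b fall outside the box (Q1..Q3) of a?
median_b_outside_box_a <- sb[3] < sa[2] || sb[3] > sa[4]

# Spread check: compare box lengths (IQR) and whisker-to-whisker ranges
iqr_a <- sa[4] - sa[2]
iqr_b <- sb[4] - sb[2]
range_a <- sa[5] - sa[1]
range_b <- sb[5] - sb[1]

print(c(median_b_outside_box_a = median_b_outside_box_a))
print(c(iqr_a = iqr_a, iqr_b = iqr_b, range_a = range_a, range_b = range_b))

# Side-by-side box plots for a visual comparison
boxplot(list(A = a, B = b), main = "Comparing two groups")
```

The median rule is a quick visual heuristic, not a formal statistical test.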

Comparing Outliers: Outliers are unusual data values that are distant from the rest of the data. A larger number of outliers means predictions will be more uncertain; we can be more confident when predicting values for a plot that has few or no outliers.
Compare Skewness: Skewness gives the direction and magnitude of the lack of symmetry. We have discussed above how to identify skewness. Here Plot A is positively (right) skewed and Plot B is negatively (left) skewed.
This is all for box plots: how to make them and how to derive information from them.

Construct a box plot for the following data set: 3, 5, 8, 8, 9, 11, 12, 12, 13, 13, 16

Bar Plot
A bar chart (aka bar graph, column chart) plots numeric values for levels of a categorical feature as bars. Levels are plotted on one chart axis, and values are plotted on the other axis. Each categorical value claims one bar, and the length of each bar corresponds to the bar's value. Bars are plotted on a common baseline to allow for easy comparison of values.

Types of Bar Graphs
The types of bar charts are as follows:
Vertical bar chart
Horizontal bar chart
Although a bar graph can be plotted horizontally or vertically, the vertical bar graph is the most common. The roles of the x-axis and y-axis are swapped depending on whether the chart is vertical or horizontal.
Apart from the vertical and horizontal bar graphs, there are two further types of bar chart:
Grouped Bar Graph
Stacked Bar Graph

Horizontal bars vs. vertical bars
A common choice when drawing a bar chart is whether to orient it vertically (with categories on the horizontal axis) or horizontally (with categories on the vertical axis). While the vertical bar chart is usually the default, it's a good idea to use a horizontal bar chart when you are faced with long category labels. In a vertical chart, these labels might overlap, and would need to be rotated or shifted to remain legible; the horizontal orientation avoids this issue.
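As a small illustration of the long-label case, here is a sketch using base R's barplot(). The category names and values are invented for the example, and the left margin size is an arbitrary choice that leaves room for the labels.

```r
# Hypothetical values with deliberately long category labels
units <- c("North-western regional office"  = 42,
           "South-eastern regional office"  = 35,
           "Central distribution warehouse" = 27)

# Widen the left margin so the labels fit, then draw horizontal bars
op <- par(mar = c(4, 14, 2, 1))
mids <- barplot(units, horiz = TRUE, las = 1, xlab = "Units")
par(op)  # restore the previous margins
```

Setting las = 1 keeps the category labels horizontal, so they can be read without rotating them.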

Grouped Bar Graph
The grouped bar graph, also called the clustered bar graph, represents discrete values for more than one object sharing the same category. In this type of bar chart, the bars for the different series are placed side by side within each category rather than being combined into a single bar. In other words, a grouped bar graph compares different sets of data items. A single colour represents each series across all the categories. The grouped bar graph can be drawn with either vertical or horizontal bars.

Stacked Bar Graph
The stacked bar graph, also called the composite bar chart, divides an aggregate into its parts. Each part can be shown in a different colour, which makes the categories easy to identify. The stacked bar chart requires specific labelling to show the different parts of the bar. In a stacked bar graph, each bar represents the whole and each segment represents a part of the whole.
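The difference between grouped and stacked bars can be sketched with base R's barplot() on the built-in mtcars data. The choice of variables (cylinders and transmission type) is only an illustration.

```r
# Count cars by transmission (rows) and number of cylinders (columns)
counts <- table(mtcars$am, mtcars$cyl)
rownames(counts) <- c("automatic", "manual")

# Grouped (clustered) bars: one bar per transmission type within each cylinder class
grouped_mids <- barplot(counts, beside = TRUE, legend.text = TRUE,
                        xlab = "Cylinders", ylab = "Number of cars",
                        main = "Grouped bar graph")

# Stacked bars: one bar per cylinder class, split into transmission segments
stacked_mids <- barplot(counts, beside = FALSE, legend.text = TRUE,
                        xlab = "Cylinders", ylab = "Number of cars",
                        main = "Stacked bar graph")
```

With beside = TRUE the heights within a category are easy to compare; with beside = FALSE the total per category is easier to read.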

Properties of Bar Graph
Some of the important properties of a bar graph are as follows:
All the bars should have a common base.
Each column in the bar graph should have equal width.
The height of the bar should correspond to the data value.
The distance between each bar should be the same.

Data visualization with R and ggplot2
ggplot2, an implementation of the Grammar of Graphics, is a free, open-source and easy-to-use visualization package that is widely used in the R programming language. It was written by Hadley Wickham.
Building blocks of layers in the grammar of graphics:
Data: the data set itself.
Aesthetics: the data is mapped onto aesthetic attributes such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width and line type.
Geometrics: how the data is displayed, using point, line, histogram, bar or boxplot.
Facets: display subsets of the data in columns and rows.
Statistics: binning, smoothing, descriptive and intermediate summaries.
Coordinates: the space between data and display, using Cartesian, fixed, polar or limits.
Themes: the non-data ink.

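Data Layer: ggplot2 in R
In the data layer we define the source of the information to be visualized. Let's use the mtcars dataset, which ships with base R (in the datasets package).

    library(ggplot2)
    library(dplyr)  # loaded in the original slides; not required for these plots

    # Data layer only: an empty plot with a title
    ggplot(data = mtcars) +
      labs(title = "MTCars Data Plot")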

Aesthetic Layer: ggplot2 in R
Here we map columns of the dataset onto certain aesthetics.

    # Aesthetic layer: hp on x, mpg on y, colour by disp
    ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
      labs(title = "MTCars Data Plot")

Geometric layer: ggplot2 in R
The geometric layer controls the essential elements, that is, how the data is displayed using point, line, histogram, bar or boxplot.

    # Geometric layer: draw the mapped data as points
    ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
      geom_point() +
      labs(title = "Miles per Gallon vs Horsepower",
           x = "Horsepower",
           y = "Miles per Gallon")

Geometric layer: adding size, color and shape, and then plotting a histogram

    # Adding size: point size reflects displacement
    ggplot(data = mtcars, aes(x = hp, y = mpg, size = disp)) +
      geom_point() +
      labs(title = "Miles per Gallon vs Horsepower",
           x = "Horsepower",
           y = "Miles per Gallon")

    # Adding shape and color: colour by cylinders, shape by transmission
    ggplot(data = mtcars, aes(x = hp, y = mpg, col = factor(cyl), shape = factor(am))) +
      geom_point() +
      labs(title = "Miles per Gallon vs Horsepower",
           x = "Horsepower",
           y = "Miles per Gallon")

    # Histogram plot of horsepower with bins 5 units wide
    ggplot(data = mtcars, aes(x = hp)) +
      geom_histogram(binwidth = 5) +
      labs(title = "Histogram of Horsepower",
           x = "Horsepower",
           y = "Count")
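The same layering applies to the box plots and bar plots covered earlier. The sketch below is an illustration rather than part of the original slides: it assumes ggplot2 is installed, and the choice of mtcars columns is arbitrary.

```r
library(ggplot2)

# Box plot: distribution of mpg for each cylinder class
p_box <- ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Miles per Gallon by Cylinders",
       x = "Cylinders",
       y = "Miles per Gallon")

# Grouped bar plot: number of cars per cylinder class, split by transmission
p_bar <- ggplot(data = mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge") +
  labs(title = "Cars by Cylinders and Transmission",
       x = "Cylinders",
       y = "Count",
       fill = "Transmission (am)")

print(p_box)
print(p_bar)
```

Replacing position = "dodge" with position = "stack" (the default for geom_bar) gives the stacked version of the same chart.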

Quantile Quantile plots
The quantile-quantile plot (Q-Q plot) is a graphical method for determining whether a dataset follows a certain probability distribution, or whether two samples of data came from the same population. Q-Q plots are particularly useful for assessing whether a dataset is normally distributed or follows some other known distribution. They are commonly used in statistics, data analysis and quality control to check assumptions and identify departures from expected distributions.

Quantiles And Percentiles
Quantiles are points in a dataset that divide the data into intervals containing equal probabilities or proportions of the total distribution. They are often used to describe the spread or distribution of a dataset. The most common quantiles are:
Median (50th percentile): the middle value of a dataset when it is ordered from smallest to largest. It divides the dataset into two equal halves.
Quartiles (25th, 50th and 75th percentiles): quartiles divide the dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data falls.
Percentiles: percentiles are similar to quartiles but divide the dataset into 100 equal parts. For example, the 90th percentile is the value below which 90% of the data falls.
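As a quick illustration, base R's quantile() computes any of these for a numeric vector. The sketch reuses the cricket runs from the box plot example. quantile() interpolates between data points by default, so its quartiles can differ slightly from values obtained with other conventions.

```r
# Runs scored in the 12 matches from the box plot example
runs <- c(100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110)

# Quartiles and the 90th percentile (default interpolation method)
q <- quantile(runs, probs = c(0.25, 0.50, 0.75, 0.90))
print(q)

# The 50th percentile is the median
print(median(runs))
```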

How to Draw Q-Q plot?
To draw a Quantile-Quantile (Q-Q) plot, you can follow these steps:
Collect the Data: gather the dataset for which you want to create the Q-Q plot. Ensure that the data are numerical and represent a random sample from the population of interest.
Sort the Data: arrange the data in ascending or descending order. This step is essential for computing quantiles accurately.
Choose a Theoretical Distribution: determine the theoretical distribution against which you want to compare your dataset. Common choices include the normal distribution, the exponential distribution, or any other distribution that fits your data well.
Calculate Theoretical Quantiles: compute the quantiles for the chosen theoretical distribution. For example, if you're comparing against a normal distribution, you would use the inverse cumulative distribution function (CDF) of the normal distribution to find the expected quantiles.
Plotting: plot the sorted dataset values on the x-axis and the corresponding theoretical quantiles on the y-axis, so that each point (x, y) pairs an observed value with its expected value. Then inspect how the points line up to judge the relationship between the dataset and the theoretical distribution: points falling close to a straight line suggest the data follow that distribution.
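The steps above can be sketched in base R for a comparison against the normal distribution. Several choices here are assumptions made for the example: the mpg column of mtcars as the data, ppoints() for the probability positions, and scaling the theoretical quantiles by the sample mean and standard deviation so that a perfect match lies on the line y = x.

```r
# Steps 1-2: collect and sort the data
x <- sort(mtcars$mpg)
n <- length(x)

# Steps 3-4: theoretical normal quantiles via the inverse CDF (qnorm)
p <- ppoints(n)  # evenly spread probabilities strictly between 0 and 1
theoretical <- qnorm(p, mean = mean(x), sd = sd(x))

# Step 5: observed values on the x-axis, theoretical quantiles on the y-axis
plot(x, theoretical,
     xlab = "Observed mpg (sorted)",
     ylab = "Theoretical normal quantiles",
     main = "Normal Q-Q plot of mpg")
abline(a = 0, b = 1, lty = 2)  # reference line y = x
```

Base R also provides qqnorm() and qqline() for the same purpose. Note that qqnorm() places the theoretical quantiles on the x-axis by default, the reverse of the layout described above.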