DrShubhamPatel2
43 slides
Jan 21, 2023
About This Presentation
Box plot presentation for statistics students.
If you want to understand box plots and learn how to draw one, read this presentation carefully.
Slide Content
Acharya Narendra Deva University of Agriculture and Technology, Kumarganj, Ayodhya (U.P.) 224229
Department of Statistics
Assignment
Topic: Box plot
Submitted to: Dr. Vishal Mehta, Assistant Professor, Department of Agricultural Statistics
Submitted by: Shubham Patel (ID No.: H-10325/18/22)
Course name: Stat 502 (3+1)
Course title: Statistical Methods in Applied Science
Session: 2022-23
M.Sc. (Horti.) Vegetable Science, 1st year, College of Agriculture
Content
Sr. no.  Content name
1.  Definition
2.  How to read a box plot
3.  Exception
4.  Parts of a box plot
5.  Formula of quartiles
6.  Box plot distribution
7.  Uses of box plot
8.  How to compare box plots
9.  Example for raw data no. 1
10. Example for raw data no. 2
11. Example for discrete frequency data
12. Example for continuous frequency data
Box plot
A box plot is a method of summarizing a set of data measured on an interval scale.
Or:
A box plot is a graphical summary of the important aspects of the distribution of numeric data. It is also called a box-and-whisker plot because it displays the data in a box-and-whiskers format.
Box plots can be drawn either vertically or horizontally.
[Figure: diagram of a box plot]
How to Read a Box Plot
A box plot is a way to show a five-number summary in a chart. The main part of the chart (the "box") shows where the middle portion of the data lies: the interquartile range. At the ends of the box you will find the first quartile (the 25% mark) and the third quartile (the 75% mark). In a horizontal box plot, the far left of the chart (the end of the left "whisker") is the minimum (the smallest number in the set) and the far right is the maximum (the largest number in the set). Finally, the median is represented by a vertical bar inside the box. Box plots aren't used that much in real life. However, they can be a useful tool for getting a quick summary of data.
Exception
If your data set has outliers (values that are very high or very low and fall far outside the other values of the data set), the ends of the whiskers may not show the minimum or maximum value. Instead, each whisker extends only to the most extreme data value that lies within one and a half times the interquartile range (1.5 × IQR) of the box, and values beyond that distance are plotted individually as outliers.
Parts of Box Plots
Minimum: The minimum value in the given data set.
First Quartile (Q1): The median of the lower half of the data set.
Median: The middle value of the data set, which divides it into two equal parts. The median is also the second quartile (Q2).
Third Quartile (Q3): The median of the upper half of the data set.
Maximum: The maximum value in the given data set.
Cont …
Apart from these five terms, the other terms used in a box plot are:
Interquartile Range (IQR): The difference between the third quartile and the first quartile, i.e. IQR = Q3 - Q1.
Outlier: A value lying on the far left or far right of the ordered data is tested as a possible outlier. Generally, outliers lie more than a specified distance (commonly 1.5 × IQR) below the first quartile or above the third quartile.
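The IQR and the outlier test can be sketched in a few lines of Python. This is only a minimal illustration, assuming the common 1.5 × IQR convention mentioned on the Exception slide. The function names `outlier_fences` and `find_outliers` are hypothetical, the quartiles are those of the first raw-data example in this presentation, and the extra value 90 is an invented observation added purely to show a detection.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) using the k * IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Return the values of data that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]


# Data and quartiles from the first raw-data example (Q1 = 15, Q3 = 40).
data = [12, 15, 20, 28, 30, 40, 50]
q1, q3 = 15, 40

print(outlier_fences(q1, q3))              # (-22.5, 77.5)
print(find_outliers(data, q1, q3))         # []
# Hypothetical extra value; quartiles are held fixed for illustration only.
print(find_outliers(data + [90], q1, q3))  # [90]
```

Since no value of the example data lies outside the fences, its whiskers would reach the true minimum and maximum.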
[Figure: parts of a box plot]
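Quartiles formula
For raw and discrete data sets:
$Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}}$ item value
Median or $Q_2 = \left(\frac{N+1}{2}\right)^{\text{th}}$ item value
$Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}}$ item value
where
N = number of observations
L = lower limit of the class interval containing the quartile
CF = cumulative frequency of the class preceding it (the row above in the table)
F = frequency of the selected class interval
h = width of the class interval
(L, CF, F and h are needed only for continuous data.)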
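The item-position rule for raw data can be sketched in Python as follows. This is an illustrative sketch, not a standard library routine: the function name `quartile_raw` is hypothetical, and it assumes the position k(N+1)/4 is counted from 1 in the sorted data, with linear interpolation between neighbouring items when the position is fractional (as done in the second raw-data example of this presentation).

```python
def quartile_raw(data, k):
    """k-th quartile (k = 1, 2, 3) using the k(N+1)/4-th item rule."""
    values = sorted(data)
    n = len(values)
    position = k * (n + 1) / 4       # 1-based position, may be fractional
    whole = int(position)            # item just below the position
    fraction = position - whole      # how far to move towards the next item
    if whole < 1:                    # position falls before the first item
        return values[0]
    if whole >= n:                   # position falls at or after the last item
        return values[-1]
    lower = values[whole - 1]
    upper = values[whole]
    return lower + fraction * (upper - lower)


first = [20, 28, 40, 12, 30, 15, 50]        # n = 7
second = [64, 25, 52, 32, 48, 29, 57, 21]   # n = 8

print([quartile_raw(first, k) for k in (1, 2, 3)])   # [15.0, 28.0, 40.0]
print([quartile_raw(second, k) for k in (1, 2, 3)])  # [26.0, 40.0, 55.75]
```

Other software may use a different quartile convention and give slightly different values for small data sets.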
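Cont..
For continuous data sets:
$Q_1 = L + \frac{\frac{N}{4} - CF}{F} \times h$
$Q_2 = L + \frac{\frac{N}{2} - CF}{F} \times h$
$Q_3 = L + \frac{\frac{3N}{4} - CF}{F} \times h$
where
N = number of observations
L = lower limit of the class interval containing the quartile
CF = cumulative frequency of the class preceding it (the row above in the table)
F = frequency of the selected class interval
h = width of the class interval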
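As a rough sketch, the continuous-data formula can be written as a small Python function. The name `quartile_grouped` and the tuple layout `(lower_limit, upper_limit, frequency)` are assumptions made for this illustration; the class intervals are those of the continuous frequency example in this presentation.

```python
def quartile_grouped(classes, k):
    """k-th quartile (k = 1, 2, 3) from grouped data.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    Applies Q_k = L + ((k*N/4 - CF) / F) * h.
    """
    n = sum(freq for _, _, freq in classes)
    target = k * n / 4               # the k*N/4-th item
    cf = 0                           # cumulative frequency of earlier classes
    for lower, upper, freq in classes:
        if cf + freq >= target:      # quartile class found
            h = upper - lower
            return lower + (target - cf) / freq * h
        cf += freq
    raise ValueError("no class contains the requested quartile")


# Class intervals and frequencies from the continuous example (N = 67).
table = [(10, 20, 12), (20, 30, 19), (30, 40, 5), (40, 50, 10),
         (50, 60, 9), (60, 70, 6), (70, 80, 6)]

print(quartile_grouped(table, 1))    # 22.5
```

For Q1 the loop stops at the class 20-30 with CF = 12 and F = 19, the same values picked out by hand in the worked example.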
Boxplot Distribution The box plot distribution will explain how tightly the data is grouped, how the data is skewed, and also about the symmetry of data. Positively Skewed : If the distance from the median to the maximum is greater than the distance from the median to the minimum, then the box plot is positively skewed.
Cont.. Negatively Skewed : If the distance from the median to minimum is greater than the distance from the median to the maximum, then the box plot is negatively skewed. Symmetric : The box plot is said to be symmetric if the median is equidistant from the maximum and minimum values
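The three cases above can be expressed as a short Python check. This is a simplified sketch that uses only the criterion stated on these slides (distance from the median to the minimum versus the maximum); the function name `skew_direction` and the sample numbers are hypothetical.

```python
def skew_direction(minimum, median, maximum):
    """Classify a box plot by comparing the two median-to-extreme distances."""
    upper = maximum - median
    lower = median - minimum
    if upper > lower:
        return "positively skewed"
    if lower > upper:
        return "negatively skewed"
    return "symmetric"


# Hypothetical summaries, chosen only to show each case.
print(skew_direction(0, 2, 10))   # positively skewed
print(skew_direction(0, 8, 10))   # negatively skewed
print(skew_direction(0, 5, 10))   # symmetric
```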
[Figure: box plot on a normal distribution]
Uses of box plot
Box plots are widely used in statistics, process improvement, scientific research, economics, and the social and human sciences. They are mainly used to explore data and to present it in an easy and understandable manner. Box plots provide a visual summary of the data from which we can quickly identify the central value of the data, how dispersed the data is, and whether the data is skewed. The median gives the central (middle) value of the data.
Cont …
Box plots show the skewness of the data. The dispersion or spread of the data can be seen from the minimum and maximum values, which are found at the ends of the whiskers. A box plot also gives an idea of the outliers, which are points numerically distant from the rest of the data.
How to compare box plots
Box plots make it very easy to compare the characteristics of data between categories. Let us look at how we can compare different box plots and draw statistical conclusions from them, taking the two plots below (Plot A and Plot B) as an example:
Compare the medians: If the median line of one box plot lies outside the box of the box plot it is being compared with, there is likely to be a difference between the two groups. Here the median line of Plot B lies outside the box of Plot A.
Compare the dispersion or spread of data: The interquartile range (the length of the box) gives an idea of how dispersed the data is. Here Plot A has a longer box than Plot B, which means the data in Plot A is more dispersed than in Plot B. The length of the whiskers also gives an idea of the overall spread of the data. The extreme values (minimum and maximum) give the range of the data: the larger the range, the more scattered the data. Here Plot A has a larger range than Plot B.
Cont …
Compare outliers: Outliers indicate unusual data values that are distant from the rest of the data. A larger number of outliers means predictions will be more uncertain. We can be more confident when predicting values for a plot that has few or no outliers.
Compare skewness: Skewness gives the direction and magnitude of the lack of symmetry. We have discussed above how to identify skewness. Here Plot A is positively (right) skewed and Plot B is negatively (left) skewed.
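The median, spread, and range comparisons can be sketched in Python. The figure for Plot A and Plot B is not available here, so the two five-number summaries below are hypothetical values chosen only to behave as the text describes; the names `Summary`, `median_outside_box`, `iqr`, and `data_range` are likewise illustrative.

```python
from typing import NamedTuple


class Summary(NamedTuple):
    """Five-number summary of one box plot."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float


def median_outside_box(a: Summary, b: Summary) -> bool:
    """True if the median of b lies outside the box [Q1, Q3] of a."""
    return b.median < a.q1 or b.median > a.q3


def iqr(s: Summary) -> float:
    """Length of the box."""
    return s.q3 - s.q1


def data_range(s: Summary) -> float:
    """Distance between the extreme values."""
    return s.maximum - s.minimum


# Hypothetical summaries standing in for Plot A and Plot B.
plot_a = Summary(10, 20, 28, 45, 80)
plot_b = Summary(30, 48, 55, 60, 65)

print(median_outside_box(plot_a, plot_b))       # True
print(iqr(plot_a), iqr(plot_b))                 # 25 12
print(data_range(plot_a), data_range(plot_b))   # 70 35
```

A median outside the other box is only a visual hint that the groups may differ; it is not a formal significance test.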
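Examples for raw data
Q.1- Draw the box plot from the given data.
Given: 20, 28, 40, 12, 30, 15, 50
Solution: First arrange the data in ascending order.
12, 15, 20, 28, 30, 40, 50
Number of observations (n) = 7
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ item value $= \frac{7+1}{4} = \frac{8}{4}$ = 2nd item, so Q1 = 15
$Q_2$ or median $= \left(\frac{n+1}{2}\right)^{\text{th}}$ item value $= \frac{7+1}{2} = \frac{8}{2}$ = 4th item value, so Q2 = 28
$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ item value $= \frac{3(7+1)}{4} = \frac{24}{4}$ = 6th item value, so Q3 = 40
Summary: Q1 = 15, Q2 = 28, Q3 = 40, Minimum value = 12, Maximum value = 50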
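[Figure: box plot of the data] The box is nearly symmetric about the median (Q2 - Q1 = 13, Q3 - Q2 = 12), but the upper whisker (Q3 to maximum = 10) is longer than the lower whisker (minimum to Q1 = 3), so the plot is slightly positively skewed rather than exactly symmetric.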
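Examples for raw data
Q.1 (second data set)- Draw the box plot from the given data.
Given: 64, 25, 52, 32, 48, 29, 57, 21
Solution: First arrange the data in ascending order.
21, 25, 29, 32, 48, 52, 57, 64
Number of observations (n) = 8
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ item value $= \frac{8+1}{4} = \frac{9}{4}$ = 2.25th item
= 2nd item + 0.25 × (3rd item - 2nd item) = 25 + 0.25 × (29 - 25) = 25 + 1, so Q1 = 26
$Q_2 = \left(\frac{n+1}{2}\right)^{\text{th}}$ item value $= \frac{9}{2}$ = 4.5th item
= 4th item + 0.5 × (5th item - 4th item) = 32 + 0.5 × (48 - 32) = 32 + 8, so Q2 = 40
$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}}$ item value $= \frac{27}{4}$ = 6.75th item
= 6th item + 0.75 × (7th item - 6th item) = 52 + 0.75 × 5 = 52 + 3.75, so Q3 = 55.75
Summary: Q1 = 26, Q2 = 40, Q3 = 55.75, Minimum value = 21, Maximum value = 64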
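These values can be cross-checked with Python's standard library. The sketch below assumes Python 3.8 or later, where `statistics.quantiles` is available; its `method="exclusive"` option places the quartiles at the k(n+1)/4-th positions with linear interpolation, which is the same rule used in this example.

```python
import statistics

data = [64, 25, 52, 32, 48, 29, 57, 21]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(min(data), q1, q2, q3, max(data))   # 21 26.0 40.0 55.75 64
```

Plotting libraries often default to a different quartile convention, so a box drawn by software may not match the hand calculation exactly for small data sets.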
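[Figure: box plot of the data] The box is roughly symmetric about the median (Q2 - Q1 = 14, Q3 - Q2 = 15.75), with a slightly longer upper whisker (Q3 to maximum = 8.25, minimum to Q1 = 5), so the plot is slightly positively skewed rather than exactly symmetric.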
Example for discrete frequency data
Q.2- Draw the box plot for the given discrete frequency data.
Given:
X   10   20   30   40   50   60
F    4    7   15    8    7    2
Solution:
X    F    CF
10    4    4
20    7   11
30   15   26
40    8   34
50    7   41
60    2   43
N = 43
Number of observations (N) = 43
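$Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}}$ item $= \frac{43+1}{4} = \frac{44}{4}$ = 11th item. Look up the 11th item in the CF column of the table: the cumulative frequency first reaches 11 at X = 20, so Q1 = 20.
$Q_2 = \left(\frac{N+1}{2}\right)^{\text{th}}$ item $= \frac{43+1}{2} = \frac{44}{2}$ = 22nd item. Look up the 22nd item in the table: the cumulative frequency first reaches 22 at X = 30 (CF = 26), so Q2 = 30.
$Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}}$ item $= \frac{3(43+1)}{4} = 3 \times 11$ = 33rd item. Look up the 33rd item in the table: the cumulative frequency first reaches 33 at X = 40 (CF = 34), so Q3 = 40.
Summary: Q1 = 20, Q2 = 30, Q3 = 40, Minimum value = 10, Maximum value = 60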
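The table lookup used for discrete frequency data can be sketched in Python. The function name `quartile_discrete` is hypothetical, and the sketch assumes the quartile is taken as the first X value whose cumulative frequency reaches the k(N+1)/4-th position; it does not interpolate when a fractional position falls between two different X values.

```python
from bisect import bisect_left
from itertools import accumulate


def quartile_discrete(values, freqs, k):
    """k-th quartile (k = 1, 2, 3) from a discrete frequency table."""
    cum = list(accumulate(freqs))        # cumulative frequency column
    n = cum[-1]                          # total number of observations
    position = k * (n + 1) / 4           # k(N+1)/4-th item
    index = bisect_left(cum, position)   # first CF >= position
    index = min(index, len(values) - 1)  # guard against running past the table
    return values[index]


x = [10, 20, 30, 40, 50, 60]
f = [4, 7, 15, 8, 7, 2]

print(list(accumulate(f)))                            # [4, 11, 26, 34, 41, 43]
print([quartile_discrete(x, f, k) for k in (1, 2, 3)])  # [20, 30, 40]
```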
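[Figure: box plot of the data] The box is symmetric about the median (Q2 - Q1 = 10, Q3 - Q2 = 10), but the upper whisker (Q3 to maximum = 20) is longer than the lower whisker (minimum to Q1 = 10), so by the criterion given earlier the plot is positively skewed.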
Example for continuous frequency data
Q.3- Draw the box plot for the given data.
Given:
C.I.   10-20   20-30   30-40   40-50   50-60   60-70   70-80
F         12      19       5      10       9       6       6
Solution:
C.I.     F    CF
10-20   12    12
20-30   19    31
30-40    5    36
40-50   10    46
50-60    9    55
60-70    6    61
70-80    6    67
N = 67
Number of observations (N) = 67
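$Q_1 = \left(\frac{N}{4}\right)^{\text{th}}$ item $= \frac{67}{4}$ = 16.75th item
Thus Q1 lies in the class 20-30, and the corresponding values are: L = 20, $\frac{N}{4}$ = 16.75, CF = 12, F = 19, h = 10
$Q_1 = L + \frac{\frac{N}{4} - CF}{F} \times h$ = 20 +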