Organizing and Displaying Data
A variable typically takes values that vary from one individual to another. The distribution of a variable tells us what values the variable takes and how often it takes each value. We can summarize a variable’s distribution with a frequency table, which shows the number of individuals having each data value, or a relative frequency table, which shows the proportion or percentage of individuals having each data value.
Frequency Distributions
When we have a large set of quantitative data, it’s useful to organize it into smaller intervals or classes and count how many data values fall into each class. A frequency table does just that.
Frequency Distributions
Simple Frequency Distribution
Simple frequency table: shows all the values that a variable can take and the number of times (frequency, f) that each value appears in the data set. The resulting table summarizes the “distribution” of the values in the sample or the population.
Grouped Frequency Distribution
Grouped frequency table: shows categories of values that the variable can take and the number of times (f) a value from the data set appears in a given category. The resulting table also summarizes the distribution of the values in the sample or population.
Frequency Distributions
Minutes to Commute (n = 10): 5, 8, 9, 10, 16, 20, 9, 9, 10, 16
Simple Frequency Distributions
Minutes to Commute (n = 10)

Minutes to Commute    Frequency (f)
 5                    1
 8                    1
 9                    3
10                    2
16                    2
20                    1
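The counting behind the simple frequency table can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method; the names `commute_minutes` and `simple_frequency_table` are assumptions introduced here, and the data are the ten commute times listed above.

```python
from collections import Counter

# The ten commute times (in minutes) from the example above.
commute_minutes = [5, 8, 9, 10, 16, 20, 9, 9, 10, 16]


def simple_frequency_table(values):
    """Return (value, frequency) pairs, sorted by value.

    Hypothetical helper: Counter tallies how often each distinct
    value occurs, and sorting puts the rows in ascending order.
    """
    return sorted(Counter(values).items())


table = simple_frequency_table(commute_minutes)

print("Minutes  Frequency (f)")
for value, f in table:
    print(f"{value:>7}  {f}")
```

The frequencies should add up to the sample size n, which is a quick way to check that no value was missed.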
Creating a Frequency Table
Example: In order to encourage carpooling, a study was made of the one-way commuting distances of workers in Montreal. A random sample of 60 of these workers was taken. The commuting distances of the workers in the sample are given below. Make a frequency table for these data.
Grouped Frequency Table
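Grouping values into classes can be sketched as follows. The 60 commuting distances are not reproduced in this text, so the sketch applies the idea to the ten commute times listed earlier instead. The class limits (1-8, 9-16, 17-24, a class width of 8) are borrowed from the commuting example's frequency table, and the function and parameter names are assumptions made for illustration. The sketch also assumes whole-number data.

```python
# The ten commute times (in minutes) from the earlier example.
commute_minutes = [5, 8, 9, 10, 16, 20, 9, 9, 10, 16]


def grouped_frequency_table(values, first_lower, class_width, num_classes):
    """Count how many integer values fall into each class.

    Hypothetical helper: classes are [first_lower, first_lower + width - 1],
    then the next `width` integers, and so on. Returns a list of
    (lower_limit, upper_limit, frequency) tuples.
    """
    counts = [0] * num_classes
    for v in values:
        index = (v - first_lower) // class_width
        if not 0 <= index < num_classes:
            raise ValueError(f"value {v} is outside the class limits")
        counts[index] += 1

    table = []
    for i, f in enumerate(counts):
        lower = first_lower + i * class_width
        upper = lower + class_width - 1
        table.append((lower, upper, f))
    return table


grouped = grouped_frequency_table(
    commute_minutes, first_lower=1, class_width=8, num_classes=3
)

for lower, upper, f in grouped:
    print(f"{lower}-{upper}: {f}")
```

Raising an error for values outside the classes is a design choice here: it makes a badly chosen set of class limits visible instead of silently dropping data.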
Some Important Calculations
Relative Frequency Distributions
Basic frequency tables, such as the one we have just completed, show how many data values fall into each class. It’s also useful to know the relative frequency of a class. The relative frequency of a class is the proportion of all data values that fall into that class. To find the relative frequency of a particular class, divide the class frequency f by the total of all frequencies n (the sample size).
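Relative Frequency
\[ \text{Relative frequency} = \frac{f}{n} = \frac{\text{Class frequency}}{\text{Total of all frequencies}} \]
\[ \text{Percentage frequency} = \frac{f}{n} \times 100\% = \frac{\text{Class frequency}}{\text{Total of all frequencies}} \times 100\% \]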
Relative Frequency Distributions
The sample size is n = 60. Notice that the sample size is the total of all the frequencies. Therefore, the relative frequency for the first class (the class from 1 to 8), which has frequency f = 14, is f/n = 14/60 ≈ 0.23.
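The division f/n can be checked for every class at once. The sketch below uses the class frequencies from the commuting example's frequency table; the variable names are assumptions introduced for illustration.

```python
# Class frequencies from the commuting example (class label -> f).
class_frequencies = {
    "1-8": 14,
    "9-16": 21,
    "17-24": 11,
    "25-32": 6,
    "33-40": 4,
    "41-48": 4,
}

# The sample size n is the total of all the frequencies.
n = sum(class_frequencies.values())

# Relative frequency = f / n; percentage frequency = f / n * 100.
relative = {label: f / n for label, f in class_frequencies.items()}
percentage = {label: 100 * f / n for label, f in class_frequencies.items()}

for label in class_frequencies:
    print(f"{label:>5}  f/n = {relative[label]:.2f}  ({percentage[label]:.1f}%)")
```

Up to rounding, the relative frequencies of all classes should sum to 1, and the percentages to 100%.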
Cumulative Frequency Distributions
The cumulative frequency of a class is the sum of the frequency of that class and the frequencies of all classes below it in the frequency distribution. Cumulative frequencies tell us how many data values are smaller than a given upper class boundary.
Cumulative Frequency Table
Once we have a frequency table, it is a fairly straightforward matter to add a column of cumulative frequencies.

Class    Frequency f    Cumulative Frequency cf    Relative Frequency f/n
1-8      14             14                         0.23
9-16     21             35                         0.35
17-24    11             46                         0.18
25-32     6             52                         0.10
33-40     4             56                         0.07
41-48     4             60                         0.07
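The cumulative column is a running total of the frequency column. A minimal sketch, assuming the class frequencies from the table above and using the standard-library `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

# Classes and frequencies from the commuting example, lowest class first.
class_labels = ["1-8", "9-16", "17-24", "25-32", "33-40", "41-48"]
frequencies = [14, 21, 11, 6, 4, 4]

# accumulate() yields the running totals: f1, f1 + f2, f1 + f2 + f3, ...
cumulative = list(accumulate(frequencies))

print("Class   f   cf")
for label, f, cf in zip(class_labels, frequencies, cumulative):
    print(f"{label:>5}  {f:>2}  {cf:>3}")
```

The last cumulative frequency always equals the sample size n, here 60.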
Cross Tabulations (Cross Tabs)
Histograms
Histograms provide visual displays of data that have been organized into frequency tables. We use a bar to represent each class, where the width of the bar is the class width and the height of the bar is the class frequency. For relative-frequency histograms, the height of the bar is the relative frequency of that class.
Histograms
Histograms from our commuting example: in both graphs, class boundaries are marked on the horizontal axis.
[Figures: Histogram; Relative Frequency Histogram]
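To make the link between a frequency table and its histogram concrete, here is a rough text-only sketch. It draws one row of symbols per class, with the bar length equal to the class frequency. The bars run horizontally, so this is only an approximation of a real histogram, and the function name `text_histogram` is an assumption introduced here.

```python
# Classes and frequencies from the commuting example.
classes = [
    ("1-8", 14),
    ("9-16", 21),
    ("17-24", 11),
    ("25-32", 6),
    ("33-40", 4),
    ("41-48", 4),
]


def text_histogram(rows, symbol="*"):
    """Return one text line per class; bar length equals the frequency."""
    lines = []
    for label, f in rows:
        lines.append(f"{label:>5} | {symbol * f} ({f})")
    return lines


lines = text_histogram(classes)
print("\n".join(lines))
```

For a relative-frequency version, the same shape results, because every bar is scaled by the same factor 1/n.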
Bar Graphs, Circle Graphs, and Time-Series Graphs
Histograms provide a useful visual display of the distribution of data. However, the data must be quantitative. We will now examine other types of graphs, some of which are suitable for qualitative or categorical data as well. Let’s start with bar graphs. These are graphs that can be used to display quantitative or qualitative data.
Bar Graphs
Bar Graphs
Below are two bar graphs (clustered bar graphs) depicting the life expectancies for men and women.
Pareto Charts
[Figure: Pareto chart of Reasons for Being Late against # of Times Late]
Circle Graph … or Pie Chart
Example

Time                   Number (frequency)    Percentage    Number of Degrees
Less than ½ an hour    296                   59.2%         0.592 × 360 ≈ 213°
½ an hour to 1 hour     83                   16.6%         0.166 × 360 ≈ 60°
More than 1 hour       121                   24.2%         0.242 × 360 ≈ 87°
TOTAL                  500
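The percentage and degree columns both come from the same proportion f/n: multiply by 100 for the percentage and by 360 for the angle of the slice. A short sketch using the counts from the example; the dictionary keys and the function name `pie_slices` are assumptions introduced for illustration.

```python
# Counts from the example (category -> frequency).
counts = {
    "Less than 1/2 an hour": 296,
    "1/2 an hour to 1 hour": 83,
    "More than 1 hour": 121,
}


def pie_slices(frequencies):
    """Return {category: (percentage, degrees)} for a circle graph."""
    total = sum(frequencies.values())
    slices = {}
    for label, f in frequencies.items():
        proportion = f / total
        slices[label] = (proportion * 100, proportion * 360)
    return slices


slices = pie_slices(counts)

for label, (percent, degrees) in slices.items():
    print(f"{label:<22} {percent:5.1f}%  {round(degrees):>3} degrees")
```

The unrounded angles always sum to 360 degrees; after rounding to whole degrees they may occasionally be off by one, although in this example they are not.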
Pie chart
Time Series Graph
A time-series graph is a graph showing data measurements in chronological order.
Other Types of Charts and Graphs
There are many other types of charts as well, some of which we will see later in the course, such as stem-and-leaf displays, box-and-whisker plots, and scatter plots.
The Shape of a Distribution
When a data set is graphed, it produces a shape that can give you useful information about the data set, such as its variability, mean, median, mode, or range. Distribution shapes are described by their modality (unimodal, bimodal, multimodal), symmetry or skewness, and kurtosis.
Distribution Shapes
[Figure panels: Symmetric, Uniform, Skewed Left, Skewed Right, Bimodal]
Bell-shaped or Symmetric Distribution
In a bell-shaped (symmetric) distribution, much of the data tends to cluster in the middle. This distribution is also called a normal distribution. As we will see in the next chapter, the peak of the normal curve is also the mean, median, and mode of the distribution.
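One informal way to see how far a data set is from this symmetric ideal is to compare its mean, median, and mode. The sketch below does this for the ten commute times from the earlier example, using Python's standard `statistics` module. Treat it as a rough check only: ten values are far too few to judge a distribution's shape, and comparing the three measures is a rule of thumb, not a formal test of skewness.

```python
import statistics

# The ten commute times (in minutes) from the earlier example.
commute_minutes = [5, 8, 9, 10, 16, 20, 9, 9, 10, 16]

mean = statistics.mean(commute_minutes)      # arithmetic average
median = statistics.median(commute_minutes)  # middle of the sorted data
mode = statistics.mode(commute_minutes)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")

# In a perfectly symmetric, bell-shaped distribution the three coincide.
# Here mean > median > mode, which suggests a longer tail to the right.
if mean > median > mode:
    print("The larger values pull the mean to the right of the median.")
```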
Kurtosis
Kurtosis represents the degree of peakedness of a distribution (i.e., how pointy or flat it is).
[Figure: Normal Distribution]