Summarizing Data and Measures of Central Tendency Chapter 13
WHAT ARE STATISTICS? The focus here is descriptive statistics, which are simply numbers, for example, percentages, numerals, fractions, and decimals. These numbers are used to describe or summarize a larger body of numbers. A GPA is an example.
WHY USE STATISTICS? Exposure to statistics will not go away, and the ability to understand statistical concepts can help in a number of areas, both professional and personal. With increasing calls for accountability, it will become all the more important that classroom teachers understand the statistics reported to them and the statistics reported to others.
TABULATING FREQUENCY DATA The first method is simply to list the scores in ascending or descending numerical order. The List – list scores in descending order; this makes it easier to identify trends, patterns, and individual scores (if the number of scores is small). p. 267 The Simple Frequency Distribution – summarizes data effectively only if the spread of scores is small. When the spread is large, simple frequency distributions tend to be so lengthy that it is difficult to make sense of the data.
TABULATING FREQUENCY DATA The Grouped Frequency Distribution – similar to the simple frequency distribution, except that ranges or intervals of scores are used as categories rather than treating each possible score as a category (pp. 268-269).
GRAPHING DATA A graph will almost always clarify or simplify the information presented by groups of numbers. The Bar Graph, or Histogram – the type of graph used most frequently to convey statistical data. It is best used for graphically representing discrete or noncontinuous data. See pages 274-275 for an example. The Frequency Polygon – best used to graphically represent continuous data, such as test scores. See pages 274-275.
GRAPHING DATA Symmetrical Distribution – each half or side of the distribution is a mirror image of the other side. Asymmetrical Distribution – has nonmatching sides or halves (p. 279). Positively Skewed Distribution – results from an asymmetrical distribution of scores. The majority of the scores fall below the middle of the score distribution (p. 280). There are many low scores but few high scores. Negatively Skewed Distribution – also results from an asymmetrical score distribution. The majority of the scores fall above the middle of the score distribution. There are many high scores but few low scores.
Tabulating Frequency Data Start with Data: 87 72 91 69 89 95 65 98 81 85 80 88 81 85 90 81 83 84 76 81 82 70 84 77 76 70 76 Just by looking at these scores, what, if anything, can you tell about how the class did? On average, how did the students do? Did most of the students perform well on this test?
In Excel: Enter all your data into one column in Excel. Click the Data tab at the top. The first button under the Data tab is Sort. Click Sort and choose descending order.
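For readers working outside Excel, the same descending sort can be sketched in Python. This is only an illustrative equivalent, not part of the original slides; the variable names are assumptions.

```python
# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

# sorted() returns a new list; reverse=True gives descending order.
descending = sorted(scores, reverse=True)

print(descending)
```

The original list is left unchanged, so the raw data stays available for later steps.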
Frequency A simple list summarizes data conveniently if N, the number of scores, is small. If N is large, lists become difficult to interpret: trends are not always clear, numbers tend to repeat themselves, and there are usually many missing scores (possible scores that no one earned). A simple frequency distribution considers all scores, including those that are missing.
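A minimal Python sketch of a simple frequency distribution for the class scores follows. It assumes that "missing" scores are possible scores between the lowest and highest that no student earned, and it lists them with a frequency of 0. The names are illustrative.

```python
from collections import Counter

# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

counts = Counter(scores)  # a Counter returns 0 for scores nobody earned

# One row per possible score, from the highest score down to the lowest.
simple_distribution = [(score, counts[score])
                       for score in range(max(scores), min(scores) - 1, -1)]

for score, f in simple_distribution:
    print(f"{score:>3}  {f}")
```

Even with only 27 scores the table runs to 34 rows, which illustrates why a simple frequency distribution gets lengthy when the spread is large.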
Grouped Frequency Distribution Ranges or intervals of scores are used for categories rather than considering each possible score as a category. Constructing a grouped frequency distribution: Step 1: Determine the range of scores (symbolized by R). The range (or spread) of scores is determined by subtracting the lowest score (L) from the highest score (H). Formula: R = H - L. Application: R = 98 - 65 = 33. The range of scores is 33.
Continued Step 2: Determine the appropriate number of intervals. The number of intervals or categories used in a grouped frequency distribution is somewhat flexible or arbitrary. In making this decision, though, be sure to use as many categories or intervals as are necessary to demonstrate variations in the frequencies of scores.
Continued Step 3: Divide the range by the number of intervals you decide to use and round to the nearest odd number. This will give you i, the interval width. Formula: i = R / (number of intervals). Application: i = 33 / 10 = 3.3, which rounds to the nearest odd number, 3. You can see there is an inverse relationship between the number of intervals and the width of each interval. That is, as fewer intervals are used, the width of each interval increases; as more intervals are used, the interval width decreases. Keep in mind that as i, the interval width, increases, we lose more and more information about individual scores.
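Steps 1 through 3 can be sketched in Python as follows. The helper nearest_odd is a hypothetical name introduced here, and its handling of exact ties (an even whole number rounds up to the next odd number) is an assumption, since the slides do not say how ties are resolved.

```python
import math

# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

def nearest_odd(x):
    """Round x to the nearest odd integer (ties at even integers go up)."""
    return 2 * math.floor(x / 2) + 1

H, L = max(scores), min(scores)
R = H - L                                  # Step 1: range of scores
number_of_intervals = 10                   # Step 2: chosen by the analyst
i = nearest_odd(R / number_of_intervals)   # Step 3: interval width

print(R, i)
```

Changing number_of_intervals shows the inverse relationship described above: fewer intervals give a larger i.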
Continued Step 4: Construct the interval column, making sure that the lowest score in each interval, called the lower limit (LL), is a multiple of the interval width (i). The upper limit of each interval (UL) is one point less than the lower limit of the next interval. With an interval width of 7, the LL of each interval could be 7, 14, 21, etc. (7x1, 7x2, 7x3, etc.). However, we eliminate those intervals below and above the intervals that include or “capture” the lowest and highest scores.
Example:
Intervals  Tally     f
91-97      ll        2
84-90      ll        2
77-83      l         1
70-76      l         1
63-69      lll       3
56-62      l         1
49-55      llll      4
42-48      lllll ll  7
35-41      ll        2
28-34      l         1
21-27      l         1
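Step 4 can also be sketched in Python. The function below is an illustrative helper, not from the slides, and it is applied to the 27 class scores from the "Start with Data" slide with i = 3 rather than to the width-7 example above. Lower limits are multiples of i, and only the intervals from the one capturing the lowest score to the one capturing the highest score are kept.

```python
# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

def grouped_frequency(scores, i):
    """Return (LL, UL, f) rows, highest interval first."""
    lowest_ll = (min(scores) // i) * i    # multiple of i at or below the lowest score
    highest_ll = (max(scores) // i) * i   # multiple of i at or below the highest score
    rows = []
    for ll in range(highest_ll, lowest_ll - 1, -i):
        ul = ll + i - 1                   # one point less than the next LL
        f = sum(1 for s in scores if ll <= s <= ul)
        rows.append((ll, ul, f))
    return rows

for ll, ul, f in grouped_frequency(scores, 3):
    print(f"{ll}-{ul}  {'l' * f:<6} {f}")
```

With i = 3 the class data falls into 12 intervals, from 96-98 down to 63-65, compared with 34 rows if every possible score were listed separately.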
To make a Frequency Polygon (this is optional to read; you are not expected to create one on your own): Put the midpoints (MP) in one column, A1 to A10, and the frequencies (f) in column B1 to B10. Click Insert, then Line Chart. Once you have the line chart, right-click the bottom line and choose "Select Data". A new window called Select Data Source will open. Under Horizontal, click the Edit button, highlight A1 to A10, and click OK. This changes the x-axis at the bottom to MP. Right-click the other line to delete it.
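The MP column can be computed rather than typed by hand. The sketch below assumes MP stands for the midpoint of each interval, halfway between its lower and upper limits, and uses the intervals and frequencies from the width-7 example. The variable names are illustrative.

```python
# (LL, UL, f) rows transcribed from the width-7 example.
rows = [(91, 97, 2), (84, 90, 2), (77, 83, 1), (70, 76, 1), (63, 69, 3),
        (56, 62, 1), (49, 55, 4), (42, 48, 7), (35, 41, 2), (28, 34, 1),
        (21, 27, 1)]

# Pair each interval's midpoint with its frequency: one point on the polygon.
points = [((ll + ul) / 2, f) for ll, ul, f in rows]

for mp, f in points:
    print(mp, f)
```

Each (MP, f) pair is one point on the frequency polygon, with MP on the horizontal axis and f on the vertical axis.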
Symmetrical, Asymmetrical, Negatively Skewed, and Positively Skewed Distributions [Figure: example curves labeled Symmetrical and Asymmetrical]
Mean Average = mean. Formula: Average = (sum of all the scores) / (total number of scores)
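As a quick check of the formula, the mean of the 27 class scores from the "Start with Data" slide can be computed in Python. This is a minimal sketch with illustrative variable names.

```python
# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

total = sum(scores)   # sum of all the scores
n = len(scores)       # total number of scores
mean = total / n

print(total, n, round(mean, 2))
```

The scores sum to 2196 across 27 students, so the class mean is about 81.33.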
Median The median is the score that splits a distribution in half: 50% of the scores lie above the median, and 50% of the scores lie below the median. It is also known as the 50th percentile. Example: Determine the median for the following set of scores: 90, 105, 95, 100, and 110. Steps: 1. Arrange the scores in ascending or descending numerical order (don't just take the middle score from the original distribution). 2. Circle the score that has equal numbers of scores above and below it; this score is the median. Application: 110, 105, 100, 95, 90. The middle score, 100, is the median.
Example 2: Even number of data Determine the median for the following set of scores: 90, 105, 95, 100, 110, 95. Steps: 1. Arrange the scores in numerical order. 2. Circle the two middle scores that have equal numbers of scores above and below them. 3. Compute the average of those two scores to determine the median. Application: 110, 105, 100, 95, 95, 90. Two middle scores: (95 + 100) / 2 = 195 / 2 = 97.5 = MDN. In this example the two middle scores are different scores, and the median is actually a decimal rather than a whole number (integer). This can be confusing unless you remember that the median is a value, not necessarily a score.
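Both median examples can be reproduced with a short Python function. This is a sketch of the steps above (sort, then take the middle score or average the two middle scores); the function name is an assumption.

```python
def median(scores):
    """Middle score of the sorted list, or the average of the two middle scores."""
    ordered = sorted(scores)          # step 1: arrange in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd N: one middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: average the two

print(median([90, 105, 95, 100, 110]))       # 100
print(median([90, 105, 95, 100, 110, 95]))   # 97.5
```

Sorting first matters: taking the middle entry of the unsorted first list would give 95, not the median.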
Median Since the median is not affected by extreme scores, it represents central tendency better than the mean when distributions are skewed. In skewed distributions, the mean is pulled toward the extremes, so that in some cases it may give a falsely high or falsely low estimate of central tendency.
Positively Skewed Distribution In the positively skewed distribution the few scores of 100 or above pull M toward them. The mean presents the impression that the typical student scored about 80 and passed the test. However, the MDN shows that 50% of the students scored 60 or below. In other words, not only did the typical student fail the test (if we consider the middle student typical), but the majority of students failed the test (assuming a score of 60 is failing, of course.)
Negatively Skewed Distribution In the negatively skewed distribution the few scores of 40 or below pull the mean down toward them. Thus the mean score gives the impression that the typical student scored about 60 and failed the test. Again, the median contradicts this interpretation. It shows that 50% of the students scored 80 or above on the test and that actually the majority of students passed the test.
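The pull of extreme scores on the mean can be seen with two small sets of scores. The numbers below are hypothetical, invented only for illustration, and are not the distributions pictured on the slides.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
positively_skewed = [50, 55, 60, 60, 65, 100, 100]   # a few very high scores
negatively_skewed = [40, 40, 75, 80, 80, 85, 90]     # a few very low scores

for name, data in [("positive", positively_skewed),
                   ("negative", negatively_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data))
```

In the positively skewed set the mean lands above the median, and in the negatively skewed set it lands below, which is the pattern the two slides describe.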
Percentiles- this is an important slide A percentile is a score below which a certain percentage of the scores lie. Percentiles divide a frequency distribution into 100 equal parts. Percentiles are symbolized P1, P2,…P99. P1 represents that score in a frequency distribution below which 1% of the scores lie. P2 represents that score in a frequency distribution below which 2% of the scores lie. P99 represents that score in a frequency distribution below which 99% of the scores lie.
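One way to connect this definition to data is to ask, for a given score, what percentage of scores lie below it. The helper below is a hypothetical sketch using a strict "below" comparison; textbooks and software differ in how they treat scores tied with the given score, so treat that choice as an assumption.

```python
# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

def percent_below(scores, x):
    """Percentage of scores that lie strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

print(round(percent_below(scores, 81), 1))
print(round(percent_below(scores, 95), 1))
```

For the class data, 10 of the 27 scores fall below 81, so a score of 81 sits at roughly the 37th percentile under this definition.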
MEASURES OF CENTRAL TENDENCY: Mean, Median, Mode
Mode and Median Mode: The mode is the least reported measure of central tendency. The mode, or modal score, in a distribution is the score that occurs most frequently. The mode is the least stable measure of central tendency; a few scores can influence the mode considerably. Median: Gives useful information in addition to the mean. It discounts (relatively speaking) any outliers; for example, one student who was absent and did really poorly would not affect the median the way that score affects the mean.
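The mode of the class scores from the "Start with Data" slide can be found by counting how often each score occurs. This is a minimal sketch with illustrative names; it returns a list because a distribution can have more than one mode.

```python
from collections import Counter

# Scores from the "Start with Data" slide.
scores = [87, 72, 91, 69, 89, 95, 65, 98, 81, 85, 80, 88, 81, 85,
          90, 81, 83, 84, 76, 81, 82, 70, 84, 77, 76, 70, 76]

counts = Counter(scores)
highest_frequency = max(counts.values())

# Every score that occurs with the highest frequency is a mode.
modes = [score for score, f in counts.items() if f == highest_frequency]

print(modes, highest_frequency)
```

Here 81 occurs four times and 76 occurs three times, so changing just two scores could move the mode from 81 to 76, which illustrates why the mode is the least stable measure.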