THE NORMAL CURVE Important Properties & Applications Presented by Anisha.G.V M.Ed. No. 2
What is Normal?
The literal meaning of the term normal is average. In fields of measurement concerning education, psychology, and sociology, those who are judged as average on some quality or characteristic are termed normal, and the others (below or above average) are described as deviating from the normal.
Elementary Principles of Probability
The simplest approach to understanding the normal probability curve is through the elementary principles of probability. The normal frequency distribution curve is based upon the law of probable occurrence of certain events. The probability of a given event is defined as the expected frequency of occurrence of this event among events of a like kind. Mathematically, it may be stated as a ratio: the probability of an unbiased coin falling heads is 1/2, and the probability of a die showing a four is 1/6.
Elementary Principles of Probability – contd.
A probability ratio is a fraction whose numerator is the number of desired outcomes and whose denominator is the total number of possible outcomes. A probability ratio always falls between the limits .00 (impossibility of occurrence) and 1.00 (certainty of occurrence). All possible degrees of likelihood may be expressed by appropriate ratios between these limits.
Example – Tossing 1 Coin
If a coin is tossed, either a head (H) or a tail (T) will turn up. The probability that a head will appear is 1 chance in 2. Expressed as a ratio, the probability of H is ½ and the probability of T is ½. Therefore, the total probability is H + T = ½ + ½ = 1.
Example – Tossing 2 Coins
If we toss 2 coins C1 and C2 at the same time, there are 4 possible arrangements which the coins may take:
(1) C1 = H, C2 = H
(2) C1 = H, C2 = T
(3) C1 = T, C2 = H
(4) C1 = T, C2 = T
Probability of both heads = ¼
Probability of both tails = ¼
Probability of 1 head and 1 tail (arrangement 2 or 3) = ¼ + ¼ = ½
Total probability = ¼ + ¼ + ½ = 1
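The counting in this example can be checked by listing every arrangement. The following is a minimal sketch, assuming fair coins; the function name `head_count_probabilities` is illustrative and not part of the presentation.

```python
from fractions import Fraction
from itertools import product

def head_count_probabilities(n_coins):
    """Return {number of heads: probability} for n fair coins, by enumeration."""
    outcomes = list(product("HT", repeat=n_coins))  # all equally likely arrangements
    total = len(outcomes)
    probs = {}
    for outcome in outcomes:
        heads = outcome.count("H")
        probs[heads] = probs.get(heads, Fraction(0)) + Fraction(1, total)
    return probs

two_coins = head_count_probabilities(2)
for heads in sorted(two_coins, reverse=True):
    print(heads, "head(s):", two_coins[heads])
```

Exact fractions are used so that the probabilities can be compared directly with ¼, ½, and ¼.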
Some Formulas
In the previous example of tossing 2 coins simultaneously, the expected appearance of heads and tails can be expressed by the formula $(H+T)^2 = H^2 + 2HT + T^2$, derived from $(a+b)^2 = a^2 + 2ab + b^2$. If we toss n coins simultaneously, the formula generalizes to the binomial expansion
$(a+b)^n = {}^nC_0\,a^n + {}^nC_1\,a^{n-1}b + {}^nC_2\,a^{n-2}b^2 + \dots + {}^nC_{n-1}\,a\,b^{n-1} + {}^nC_n\,b^n$
Substituting H for a and T for b gives the expected appearances of heads and tails.
Some Formulas
What is nCr? nCr is the binomial coefficient, whose value is ${}^nC_r = \dfrac{n!}{r!\,(n-r)!}$, where ! denotes the factorial: n! = n × (n−1) × (n−2) × … × 2 × 1, so 4! = 4 × 3 × 2 × 1 = 24, and 0! = 1 by definition.
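As a small illustration, nCr can be computed directly from the factorial definition n!/(r!(n−r)!). The helper name `n_choose_r` is hypothetical; Python's built-in `math.comb` (Python 3.8+) is used only as a cross-check.

```python
import math

def n_choose_r(n, r):
    """Binomial coefficient from the factorial definition n! / (r! * (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(math.factorial(4))   # 4! = 4*3*2*1
print(n_choose_r(10, 3))   # coefficient of H^7 T^3 when 10 coins are tossed
print(n_choose_r(10, 3) == math.comb(10, 3))
```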
Example – Tossing 10 Coins
Let us find the probabilities of the various combinations of heads and tails when tossing 10 coins once (or 1 coin 10 times). Expanding the binomial $(H+T)^{10}$ with the previous formula gives
$(H+T)^{10} = {}^{10}C_0 H^{10} + {}^{10}C_1 H^9 T + {}^{10}C_2 H^8 T^2 + {}^{10}C_3 H^7 T^3 + {}^{10}C_4 H^6 T^4 + {}^{10}C_5 H^5 T^5 + {}^{10}C_6 H^4 T^6 + {}^{10}C_7 H^3 T^7 + {}^{10}C_8 H^2 T^8 + {}^{10}C_9 H T^9 + {}^{10}C_{10} T^{10}$
Evaluating the coefficients gives
$(H+T)^{10} = H^{10} + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6 + 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^{10}$
The total number of possible outcomes is the sum of the coefficients (frequencies) of the terms, i.e., 1 + 10 + 45 + 120 + 210 + 252 + 210 + 120 + 45 + 10 + 1 = 1024.
Example – Tossing 10 Coins – contd.
Probability ratio of getting 10 H = 1/1024
Probability ratio of getting 9 H and 1 T = 10/1024
Probability ratio of getting 8 H and 2 T = 45/1024
Probability ratio of getting 7 H and 3 T = 120/1024
Probability ratio of getting 6 H and 4 T = 210/1024
Probability ratio of getting 5 H and 5 T = 252/1024
Probability ratio of getting 4 H and 6 T = 210/1024
Probability ratio of getting 3 H and 7 T = 120/1024
Probability ratio of getting 2 H and 8 T = 45/1024
Probability ratio of getting 1 H and 9 T = 10/1024
Probability ratio of getting 10 T = 1/1024
Sum of all probability ratios = 1024/1024 = 1
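The frequencies and probability ratios for 10 coins can be reproduced in a few lines. This is only a sketch under the assumption of fair coins; the variable names are illustrative.

```python
import math
from fractions import Fraction

n = 10
# Coefficient of H^(n-k) T^k in (H+T)^n, for k = 0 .. n tails.
coefficients = [math.comb(n, k) for k in range(n + 1)]
total_outcomes = sum(coefficients)  # equals 2**n
probabilities = [Fraction(c, total_outcomes) for c in coefficients]

for k, c in enumerate(coefficients):
    print(f"{n - k} H and {k} T: {c}/{total_outcomes}")
print("Sum of all probability ratios:", sum(probabilities))
```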
Graphic Representation
The data obtained in the above example $(H+T)^{10}$ can be represented graphically by plotting the 11 terms of the expansion on the horizontal axis and their frequencies on the vertical axis.
Graphic Representation – contd.
If we plot a frequency curve for the above histogram, it will look like a many-sided polygon.
Graphic Representation – contd.
If the number of factors (coins in the above case) determining this polygon is increased, the lines which constitute the polygon increase regularly in number and become progressively shorter. Finally, when the number of factors becomes very large, the polygon exhibits a perfectly smooth, bell-shaped outline.
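The smoothing described here can be illustrated numerically. The sketch below compares the coin-toss probabilities with a normal curve having the same mean (n/2) and standard deviation (√n/2). That choice of matching curve, and the function name `max_gap`, are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def max_gap(n_coins):
    """Largest gap between P(k heads in n fair coins) and the matching normal curve."""
    normal = NormalDist(mu=n_coins / 2, sigma=math.sqrt(n_coins) / 2)
    return max(
        abs(math.comb(n_coins, k) / 2**n_coins - normal.pdf(k))
        for k in range(n_coins + 1)
    )

gaps = {n: max_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(f"{n:>5} coins: largest gap = {gap:.6f}")
```

The gap shrinks as the number of coins grows, which is the sense in which the polygon approaches the smooth curve.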
Normal Probability Curve
The bell-shaped curve obtained as the limit of the polygon on the previous slide is called the normal probability curve, or simply the normal curve. It is not an actual distribution of scores on any test of ability or achievement, but a mathematical model. Distributions of test scores approach the theoretical normal distribution as a limit, but the fit is rarely ideal or perfect.
Normal Probability Curve – contd.
The normal distribution was first discovered by Abraham de Moivre (1667-1754), a French mathematician, who obtained it while working on certain problems in games of chance. Later, two mathematical astronomers, Pierre-Simon Laplace and Carl Friedrich Gauss, developed this distribution independently. The curve is also called the Gaussian curve in honour of Gauss, and is also known as the “normal curve of errors.” The normal curve embodies the law which states that “the greater a deviation from the mean or average value in a series, the less frequently it occurs.”
Characteristics and Properties of Normal Curve
- Bell-shaped curve: the shape of a normal curve is like that of a bell.
- The mean, median, and mode have the same numerical value.
- Perfectly symmetrical: the curve slopes away equally on both sides of its center.
- The curve is asymptotic: it approaches but never touches the baseline at the extremes. Theoretically, therefore, it extends from minus infinity (-∞) to plus infinity (+∞).
- As the curve does not touch the baseline, the mean is used as the starting point for working with the normal curve.
- Distances along the baseline on either side of the mean are measured in units of the standard deviation (σ).
- For all practical purposes, the curve is taken to extend from -3σ on the left to +3σ on the right.
Characteristics and Properties of Normal Curve – contd.
- The points of inflection occur at ±1 standard deviation (±1σ): the normal curve changes its direction from convex to concave at a point known as the point of inflection. Perpendiculars dropped from these 2 points of inflection to the horizontal axis meet it at a distance of one standard deviation on either side of the mean (±1σ).
- The percentage of the total area of the normal curve lying within given limits is fixed:
  - 68.26% of the total area falls between the limits M ± 1σ.
  - 95.44% of the total area falls between the limits M ± 2σ.
  - 99.73% of the total area falls between the limits M ± 3σ.
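The fixed areas can be checked without a table. This is a minimal sketch, assuming the standard relation between the normal curve and the error function; `area_within` is an illustrative name.

```python
import math

def area_within(k):
    """Fraction of the total area between M - k*sigma and M + k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"M ± {k}σ: {100 * area_within(k):.2f}% of the total area")
```

The computed percentages for ±1σ and ±2σ can differ from the quoted figures by 0.01, because the quoted figures come from doubling four-digit table entries (for example 2 × 34.13 = 68.26).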
Characteristics and Properties of Normal Curve – contd.
- The total area under the normal curve may be taken to represent 100% probability.
- The normal curve is bilateral: 50% of the area of the curve lies to the left of the maximum ordinate and 50% lies to the right.
- The normal curve is a mathematical model in the behavioural sciences: the curve is used as a measurement scale, and the unit of this scale is ±1σ.
- Various measures with respect to a normal curve:
  - Quartile deviation Q = probable error = 0.6745σ
  - Mean deviation AD = 0.7979σ
  - Skewness = 0
  - Kurtosis = 0.263
Characteristics and Properties of Normal Curve – contd.
- In this curve, the limits ±1.96σ include 95% and the limits ±2.58σ include 99% of the total area of the curve.
- Maximum ordinate of the curve: the curve has its maximum height, or ordinate, at the mean of the distribution.
- The normal curve serves as a model for describing the peakedness and flatness of a curve through the measure of kurtosis. For a normal curve, the value of kurtosis is 0.263. If the value of kurtosis for a distribution is more than 0.263, the distribution is flatter at the top than the normal curve (platykurtic). If the value is less than 0.263, the distribution is more peaked than the normal curve (leptokurtic).
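The constants quoted for the normal curve can be recomputed with Python's standard library. This is a sketch only: it assumes that Q is half the distance between the 25th and 75th percentiles and that the mean deviation of a normal curve equals σ√(2/π).

```python
import math
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)  # distances measured in sigma units

# Quartile deviation (probable error): half the interquartile range.
q = (unit_normal.inv_cdf(0.75) - unit_normal.inv_cdf(0.25)) / 2
# Mean (average) deviation of a normal curve, in sigma units.
ad = math.sqrt(2 / math.pi)
# Areas enclosed by the limits ±1.96σ and ±2.58σ.
area_196 = unit_normal.cdf(1.96) - unit_normal.cdf(-1.96)
area_258 = unit_normal.cdf(2.58) - unit_normal.cdf(-2.58)

print(f"Q  = {q:.4f} sigma")
print(f"AD = {ad:.4f} sigma")
print(f"±1.96σ encloses {100 * area_196:.1f}%, ±2.58σ encloses {100 * area_258:.1f}%")
```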
Equation of Normal Probability Curve
$y = \dfrac{N}{\sigma\sqrt{2\pi}}\; e^{-x^{2}/(2\sigma^{2})}$
in which
x = scores, expressed as deviations from the mean, laid off along the horizontal axis.
y = height of the curve above the horizontal axis, i.e., frequency of a given x value.
N = number of cases in the sample or group.
σ = standard deviation of the distribution.
π = 3.1416 (constant value).
e = 2.7183 (constant value).
Equation of Normal Probability Curve – contd.
When N and σ are known, it is possible to compute from the equation of the normal curve
- the frequency (or y) of a given value x;
- the number or percentage of cases between 2 points, or above or below a given point, in the distribution.
However, these calculations are rarely necessary, as tables are available from which this information may be readily obtained.
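The first of these computations might be sketched as follows, assuming the usual form of the equation, y = N/(σ√(2π)) · e^(−x²/(2σ²)). The function name and the sample values N = 1000 and σ = 10 are hypothetical.

```python
import math

def normal_curve_height(x, n_cases, sigma):
    """Height y of the normal curve at deviation x from the mean."""
    return n_cases / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x**2 / (2 * sigma**2))

# Hypothetical group: N = 1000 cases, sigma = 10 score points.
for x in (0, 10, 20, 30):
    print(f"x = {x:>2}: y = {normal_curve_height(x, 1000, 10):.2f}")
```

The height is greatest at x = 0 (the mean) and is the same for +x and −x, in line with the properties listed earlier.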
Table of Areas Under the Normal Curve
Table of Areas Under the Normal Curve – contd.
The table gives the fractional parts of the total area under the normal curve found between the mean and ordinates (y's) erected at various distances from the mean. The total area under the curve is taken arbitrarily to be 10,000 because of the greater ease with which fractional parts of the total area may then be calculated. The first column of the table, x/σ, gives distances in tenths of σ measured off on the baseline of the normal curve from the mean as origin, where x = X − M, i.e., x measures the deviation of a score X from the mean M. If x is divided by σ, the deviation from the mean is expressed in σ units. Such deviation scores are often called sigma scores or Z scores.
Table of Areas Under the Normal Curve – contd.
Distances from the mean in hundredths of σ are given by the column headings. To find the number of cases in the normal distribution between the mean and the ordinate erected at a distance of 1σ from the mean, go down the x/σ column until 1.0 is reached, and in the next column under .00 take the entry opposite 1.0, viz., 3413. This means that 3413 cases in 10,000, or 34.13% of the entire area of the curve, lie between the mean and 1σ. The same holds for -1σ, since both sides of the mean are symmetrical in a normal curve.
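Where the printed table is not to hand, its entries can be regenerated. The sketch below assumes each entry is the area between the mean and x/σ, scaled to a total area of 10,000 and rounded to a whole number; `table_entry` is an illustrative name.

```python
from statistics import NormalDist

def table_entry(z):
    """Area between the mean and z sigma, per 10,000 of total area."""
    return round(10_000 * (NormalDist().cdf(z) - 0.5))

for z in (0.6, 1.0, 1.8, 2.0):
    print(f"x/σ = {z:.2f}: {table_entry(z)}")
```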
Table of Areas Under the Normal Curve – contd.
While the normal curve does not actually meet the baseline until we are at an infinite distance to the right or left of the mean, for practical purposes the curve may be taken to end at points -3σ and +3σ distant from the mean. The figure shows that 99.73% of the entire distribution falls within -3σ and +3σ. By cutting off the curve at these 2 points, we disregard 0.27% of the distribution, a negligible amount even in a very large sample.
Applications of Normal Curve
The normal curve has wide significance and applications in the field of measurement. Some of the main applications are given below.
- Use as a model: the normal curve represents a model distribution. Hence it may be used as a model
  - to compare various distributions with it, i.e., to say whether a distribution is normal or not and, if not, in what way it diverges from the normal;
  - to compare 2 or more distributions in terms of overlapping.
- To convert raw scores into comparable standard normalized scores.
  Z score = (X − M)/σ, where X = raw score or actual score, M = mean, and σ = standard deviation. From this, the following interpretations can be made:
  - If raw score = mean, the Z score is 0.
  - If raw score > mean, the Z score is positive.
  - If raw score < mean, the Z score is negative.
Applications of Normal Curve – contd.
E.g.: In a test, Rama scored 60 in mathematics and 40 in science. The mean and SD in mathematics are 30 and 15 respectively. The mean and SD in science are 20 and 5 respectively. In which subject did Rama perform better?
The Z score for mathematics is (60 − 30)/15 = +2.00.
The Z score for science is (40 − 20)/5 = +4.00.
The Z score for science is higher than the Z score for mathematics, so Rama has done better in science than in mathematics.
Z values may carry decimals; for greater convenience, they are therefore converted into T scores by multiplying by a constant and adding a constant, i.e., T score = 10z + 50.
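Rama's comparison can be worked through in code. This is a small sketch using the figures given in the example; the function names are illustrative.

```python
def z_score(raw, mean, sd):
    """Standard score: deviation from the mean in SD units."""
    return (raw - mean) / sd

def t_score(z):
    """T score as defined in the example: T = 10z + 50."""
    return 10 * z + 50

z_math = z_score(60, mean=30, sd=15)
z_science = z_score(40, mean=20, sd=5)

print("Mathematics: z =", z_math, " T =", t_score(z_math))
print("Science:     z =", z_science, " T =", t_score(z_science))
print("Better subject:", "science" if z_science > z_math else "mathematics")
```

Comparing Z scores rather than raw marks is what makes the two subjects comparable, since their means and SDs differ.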
Applications of Normal Curve – contd.
- To compute percentiles and percentile ranks.
E.g.: The raw score of a student of class X on an achievement test is 60. The mean of the whole class is 50 with a standard deviation of 5. Find the percentile rank of the student.
First we convert the raw score 60 to a Z score by using the formula Z = (X − M)/σ = (60 − 50)/5 = +2.00.
According to the table of areas under the NPC, the area of the curve between M and +2σ is 47.72%. The total percentage of cases below the score 60 is therefore 50 + 47.72 = 97.72% (say 98%). The percentile rank of a student who scored 60 on the achievement test is thus 98.
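The percentile-rank calculation can be sketched with the standard library's `NormalDist` in place of the printed table. The function name is illustrative, and scores are assumed to be normally distributed.

```python
from statistics import NormalDist

def percentile_rank(raw, mean, sd):
    """Percentage of cases falling below `raw` under a normal curve."""
    z = (raw - mean) / sd
    return 100 * NormalDist().cdf(z)

pr = percentile_rank(60, mean=50, sd=5)
print(f"z = {(60 - 50) / 5:+.2f}, percentile rank = {pr:.2f}")
```

This agrees with adding the 50% of cases below the mean to the 47.72% of cases lying between the mean and +2σ.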
Applications of Normal Curve – contd.
E.g.: In a class, Rohit's percentile rank in mathematics is 75. The mean of the class in mathematics is 60 and the standard deviation is 10. Find Rohit's marks in the mathematics achievement test.
By the definition of percentile rank, 75% of cases fall below Rohit's score, so his position on the NPC scale is 25% of cases above the mean. According to the NPC table, the distance from the mean that includes 25% of cases is +0.67σ.
Applying the formula Z = (X − M)/σ, i.e., 0.67 = (X − 60)/10, or X − 60 = 10 × 0.67, or X = 60 + 6.7 = 66.7 (say 67).
Rohit's mark in mathematics is 67.
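The reverse lookup, from percentile rank back to a raw score, might be sketched as follows. `NormalDist.inv_cdf` stands in for reading the table backwards, and the function name is illustrative.

```python
from statistics import NormalDist

def score_at_percentile(percentile_rank, mean, sd):
    """Raw score below which `percentile_rank` percent of cases fall."""
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return mean + z * sd

rohit_score = score_at_percentile(75, mean=60, sd=10)
print(f"Rohit's score is about {rohit_score:.1f}")
```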
Applications of Normal Curve – contd.
- To understand and apply the concept of standard errors of measurement.
- For ability grouping.
E.g.: A group of 500 students has been administered a general ability test. The teacher wishes to classify the group into 5 categories and assign them grades A, B, C, D, and E according to ability. Assuming that general mental ability is normally distributed in the population, calculate the number of students that can be placed in groups A, B, C, D, and E.
We know that, for practical purposes, the total area under the normal curve extends from -3σ to +3σ, i.e., over a range of 6σ. Dividing this range by 5, we get the width of each category = 6σ/5 = 1.2σ. Thus each category is spread over a distance of 1.2σ.
Applications of Normal Curve – contd.
Category C will lie in the middle. Half of its area will be below the mean and the other half above the mean. The distance of each category is shown in the figure.
According to the NPC table, the total percentage of cases from the mean to 0.6σ is 22.57.
The total percentage of cases between -0.6σ and +0.6σ is 22.57 + 22.57 = 45.14%. Hence, in category C, the total percentage is 45.14.
Similarly, according to the NPC table, the total percentage of cases from the mean to 1.8σ is 46.41.
Applications of Normal Curve – contd.
The total percentage of cases in category B is 46.41 − 22.57 = 23.84%. In category A, the total percentage of cases is 50 − 46.41 = 3.59%. Similarly, in categories D and E, the total percentage of cases will be 23.84% and 3.59% respectively. Thus
- 3.59% of 500, i.e., about 18 students, fall in category A.
- 23.84% of 500, i.e., about 119 students, fall in category B.
- 45.14% of 500, i.e., about 226 students, fall in category C.
- 23.84% of 500, i.e., about 119 students, fall in category D.
- 3.59% of 500, i.e., about 18 students, fall in category E.
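The grouping procedure can be sketched in code. It follows the worked example in dividing the range -3σ to +3σ into equal widths and letting the two end categories absorb the tails, as 50 − 46.41 does above. The function name and its parameters are illustrative assumptions.

```python
from statistics import NormalDist

def grade_counts(n_students, grades="EDCBA", span=6.0):
    """Split a normal population into equal-width sigma bands, lowest grade first."""
    curve = NormalDist()
    width = span / len(grades)          # 6 sigma / 5 grades = 1.2 sigma each
    counts = {}
    for i, grade in enumerate(grades):
        low = -span / 2 + i * width
        high = low + width
        # End categories take in the tails so that the shares total 100%.
        p_low = 0.0 if i == 0 else curve.cdf(low)
        p_high = 1.0 if i == len(grades) - 1 else curve.cdf(high)
        counts[grade] = round(n_students * (p_high - p_low))
    return counts

counts = grade_counts(500)
for grade in "ABCDE":
    print(grade, counts[grade])
```

Rounding each category independently happens to total 500 here; for other group sizes the rounded counts may need a small adjustment.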
Applications of Normal Curve – contd.
- To transform and combine qualitative data.
- To determine the relative difficulty of test items.
- To determine the percentage of cases in a normal distribution within given limits.
- To determine the limits in any normal distribution which include a given percentage of the cases.
Use of Probability Curve in Mental Measurements
Measurements of many natural phenomena and of many mental and social traits tend, under certain conditions, to be distributed symmetrically about their means in proportions which approximate those of the normal probability distribution. Phenomena which follow the normal probability curve (at least approximately) may be classified as follows:
- Biological statistics: proportion of male to female births for the same country or community over a period of years, proportion of different types of plants and animals in cross-fertilization (Mendelian ratio).
- Anthropometrical data: height, weight, etc., for large groups of the same age and sex.
- Social and economic data: birth rate, death rate, wages and output of large numbers of workers, etc.
- Psychological measurements: intelligence as measured by standard tests, speed of association, perception span, reaction time, etc.
- Errors of observation: measures of height, speed of movement, linear magnitudes, and physical and mental traits contain certain errors which are as likely to make them deviate above as below their true values.
Importance of Normal Probability Curve
The normal distribution is by far the most used distribution in statistics. Some important reasons are:
- The normal distribution appears to be a reasonable model of the behaviour of many random phenomena.
- It may be convenient on mathematical grounds alone to assume a normally distributed population.
- The normal distribution can be used as a good approximation to a number of theoretical distributions such as the binomial and the Poisson.
- There is a very intimate connection between the size of a sample and the extent to which a sampling distribution approaches the normal form.
Importance of Normal Probability Curve – contd.
- Even if a variable is not normally distributed, it can sometimes be brought to normal form by a simple transformation of the variable.
- The entire theory of small-sample tests, viz., the t, F, and χ² tests, etc., is based on the fundamental assumption that the parent population from which the samples have been drawn follows a normal distribution.
- The normal distribution is widely applied in statistical quality control for setting control limits.
Divergence in Normality (Non-Normal Distribution)
In the normal curve model, the mean, median, and mode all coincide and there is perfect balance between the right and left halves of the curve. However, a distribution need not be described by an exactly perfect bell-shaped curve to be treated as normal. Such a perfectly symmetrical curve rarely exists in practice because we usually cannot measure an entire population; instead, we work with representative samples of the population. Therefore, in actual practice, a slightly deviated or distorted bell-shaped curve is also accepted as a normal curve, on the assumption that the characteristic measured is normally distributed in the entire population.
Divergence in Normality (Non-Normal Distribution) – contd.
In cases where the scores of individuals in the group deviate seriously from the average, the curves representing these distributions also deviate from the shape of a normal curve. This deviation or divergence from normality tends to vary in 2 ways:
- Skewness.
- Kurtosis.
Skewness
A distribution is said to be skewed when the mean and median fall at different points in the distribution and the balance, i.e., the center of gravity, is shifted to one side or the other (to the left or right). In a normal distribution, the mean equals the median and there is no skewness. There are 2 types of skewness:
- Negative skewness.
- Positive skewness.
Skewness – contd.
Negative Skewness
A distribution is said to be skewed negatively, or to the left, when scores are massed at the high end of the scale, i.e., the right side of the curve, and are spread out gradually towards the low end, i.e., the left side of the curve. In a negatively skewed distribution, the value of the median is higher than that of the mean.
Skewness – contd.
Positive Skewness
Distributions are skewed positively, or to the right, when scores are massed at the low, i.e., the left end of the scale, and are spread out gradually towards the high or right end.
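Skewness – contd.
Skewness in a given distribution may be computed by the following formula.
$Sk = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \dfrac{3(M - M_d)}{\sigma}$
When the percentiles are known, the value of skewness can be computed by the following formula.
$Sk = \dfrac{P_{90} + P_{10}}{2} - P_{50}$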
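A brief sketch of both measures follows. It assumes the two textbook definitions Sk = 3(Mean − Median)/SD and Sk = (P90 + P10)/2 − P50; these forms, the function names, and the sample figures are assumptions for illustration, not taken from the slide.

```python
def skewness_from_mean_median(mean, median, sd):
    """Assumed formula: Sk = 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd

def skewness_from_percentiles(p10, p50, p90):
    """Assumed formula: Sk = (P90 + P10) / 2 - P50."""
    return (p90 + p10) / 2 - p50

# Hypothetical distribution with scores massed at the high end (median above mean).
sk_negative = skewness_from_mean_median(mean=48, median=50, sd=10)
# Hypothetical symmetric distribution: P10 and P90 equally far from the median.
sk_symmetric = skewness_from_percentiles(p10=20, p50=50, p90=80)

print(sk_negative, sk_symmetric)
```

A negative value indicates negative skewness, a positive value positive skewness, and zero a symmetrical distribution.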
Kurtosis
Kurtosis refers to the divergence in the height of the curve, especially in its peakedness. There are 3 types:
- Platykurtic: flatter at the center than the normal curve.
- Leptokurtic: more peaked at the center than the normal curve.
- Mesokurtic: almost resembles a normal curve.
Kurtosis – contd.
The value of kurtosis for a given curve may be computed through the following formula.
$Ku = \dfrac{Q}{P_{90} - P_{10}}$
where Q is the quartile deviation, $Q = (P_{75} - P_{25})/2$.
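As an illustration, the sketch below assumes the percentile measure Ku = Q/(P90 − P10), with Q = (P75 − P25)/2, which is the definition that yields 0.263 for a normal curve. The function name and both sample distributions are hypothetical.

```python
from statistics import NormalDist

def percentile_kurtosis(p10, p25, p75, p90):
    """Assumed formula: Ku = Q / (P90 - P10), with Q = (P75 - P25) / 2."""
    q = (p75 - p25) / 2
    return q / (p90 - p10)

# Percentiles of a normal distribution with mean 50 and SD 10 (illustrative values).
scores = NormalDist(mu=50, sigma=10)
ku_normal = percentile_kurtosis(*(scores.inv_cdf(p) for p in (0.10, 0.25, 0.75, 0.90)))

# Percentiles of scores spread evenly from 0 to 100 (a flat-topped distribution).
ku_flat = percentile_kurtosis(10, 25, 75, 90)

print(f"normal: {ku_normal:.3f}  flat: {ku_flat:.4f}")
```

The evenly spread scores give a value above 0.263, matching the rule that larger values indicate a flatter (platykurtic) distribution.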
Factors Causing Divergence in Normal Curve/Normal Distribution
Some of the reasons for divergence are:
- Selection of the sample: the sample may be small or biased. Scores from a small and homogeneous group are likely to yield a narrow, leptokurtic distribution. Scores from a small and highly heterogeneous group yield a platykurtic distribution.
- Unsuitable or poorly made tests: the measuring tool or test may be inappropriate for the group to which it has been administered. If the test is too easy, scores will pile up at the high end of the scale; if the test is too difficult, scores will pile up at the low end of the scale.
- The trait being measured is non-normal.
- Errors in the construction and administration of tests: a poorly constructed test can cause asymmetry in the distribution of scores. Similarly, while administering a test, unclear instructions, errors in timing, errors in scoring, and lack of motivation to complete the test may cause deviation.
Conclusion
The normal distribution is a very important concept in behavioural science because many variables used in behavioural research are assumed to be normally distributed. The normal curve is very helpful in educational evaluation and measurement. It provides the relative position of an individual in a group, and it can also be used as a scale of measurement in behavioural science. The normal distribution is a significant tool in the hands of the teacher, who can use it to judge the nature of the distribution of scores obtained on a measured variable, to judge the difficulty level of test items in a question paper, and finally to know whether the class is homogeneous or heterogeneous.