ANALYSIS OF CONSTANTS (MEAN, VARIANCE, SKEWNESS, KURTOSIS)
MEMBERS: Pranav Sukale-123A9054, Tanay More-123A9056, Nikhil Wankhede-123A9062, Yuvraj Karunakaran-123A9063
INTRODUCTION
Purpose of Statistical Measures: Statistical measures summarize, describe, and analyze datasets, enabling better decision-making and a clearer understanding of underlying patterns.
Key Measures Covered: Mean, Variance, Skewness, Kurtosis
Mean – Measuring Central Tendency
Definition: The mean is the sum of all the values in a dataset divided by the number of values. It represents the "average" value of the dataset.
Formula: The mean (denoted μ for a population mean or x̄ for a sample mean) is:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Where:
$x_i$ = each individual data point in the dataset
$n$ = number of data points in the dataset
$\sum$ = summation (adding up all data points)
Types of Mean
Arithmetic Mean: The common average; sensitive to outliers.
Weighted Mean: Each value is given its own weight.
Geometric Mean: Used for multiplicative data (e.g., growth rates).
Harmonic Mean: Best for rates and ratios (e.g., speed).
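The four types of mean listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample inputs (exam weights, growth factors, speeds) are hypothetical and do not come from the slides.

```python
import math

def arithmetic_mean(values):
    # Sum of the values divided by how many there are.
    return sum(values) / len(values)

def weighted_mean(values, weights):
    # Each value contributes in proportion to its weight.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    # n-th root of the product, computed via logs; requires positive values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    # Reciprocal of the average reciprocal; requires non-zero values.
    return len(values) / sum(1 / v for v in values)

# Hypothetical inputs chosen only for illustration.
scores, score_weights = [80, 90], [1, 3]
growth_factors = [1.10, 1.20, 0.90]   # yearly growth multipliers
speeds_kmh = [40, 60]                 # two legs of equal distance

print(arithmetic_mean([2, 4, 6, 8, 10]))
print(weighted_mean(scores, score_weights))
print(geometric_mean(growth_factors))
print(harmonic_mean(speeds_kmh))
```

For the two speeds, the harmonic mean comes out at about 48 km/h rather than the arithmetic 50 km/h, which is why it is preferred for rates measured over equal distances.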
Mean in Data Analysis and Probability
In probability and statistics, the mean is a fundamental concept.
Expected Value (E[X]): In probability theory, the mean of a random variable is called the expected value; it represents the long-run average outcome of a random process. For a discrete random variable $X$ that takes values $x_1, \dots, x_n$ with probabilities $P(X = x_i)$:
$E[X] = \sum_{i=1}^{n} x_i \cdot P(X = x_i)$
Example: Six measurements of steel beam thickness (in mm):
$\text{Mean thickness} = \frac{5.2 + 5.0 + 4.9 + 5.1 + 5.3 + 5.2}{6} = \frac{30.7}{6} \approx 5.12\ \text{mm}$
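As a rough sketch, the expected-value formula and the beam-thickness example can be checked in Python. The fair six-sided die is an assumed illustration that is not part of the slides, and `expected_value` is a hypothetical helper name.

```python
def expected_value(outcomes, probabilities):
    # E[X] = sum of x_i * P(X = x_i) for a discrete random variable.
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))

# Assumed example: a fair six-sided die, each face with probability 1/6.
die_faces = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6
print(expected_value(die_faces, die_probs))  # approximately 3.5

# Beam-thickness measurements from the slide, in millimetres.
thickness_mm = [5.2, 5.0, 4.9, 5.1, 5.3, 5.2]
mean_thickness = sum(thickness_mm) / len(thickness_mm)
print(round(mean_thickness, 2))  # 5.12
```

When every outcome is equally likely, the expected value reduces to the ordinary arithmetic mean, which is what the beam example computes.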
Variance – Measuring Spread of Data
Definition: Variance is a statistical measure that indicates how much the values in a dataset differ from the mean. It quantifies the spread, or dispersion, of the data points: a higher variance means the data points are more spread out from the mean, while a lower variance means they lie closer to it. Variance is the average of the squared differences from the mean.
Formula: The population variance (denoted σ²) is:
$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
Where:
$x_i$ = each data point in the dataset
$\mu$ = mean of the dataset
$n$ = number of data points
$(x_i - \mu)^2$ = squared deviation of each data point from the mean
Explanation: Variance is calculated by:
1. Finding the mean of the dataset.
2. Subtracting the mean from each data point to get its deviation from the mean.
3. Squaring these deviations to eliminate negative values and give more weight to larger deviations.
4. Averaging the squared deviations to obtain the variance.
EXAMPLE
Dataset: {2, 4, 6, 8, 10}
Step-by-Step Calculation:
Find the mean: $\mu = \frac{2 + 4 + 6 + 8 + 10}{5} = 6$
Calculate the squared deviations from the mean: $(x_i - \mu)^2 = (2-6)^2, (4-6)^2, (6-6)^2, (8-6)^2, (10-6)^2 = 16, 4, 0, 4, 16$
Sum the squared deviations: $\sum (x_i - \mu)^2 = 16 + 4 + 0 + 4 + 16 = 40$
Divide by the number of data points (n = 5): $\sigma^2 = \frac{40}{5} = 8$
So, the variance of this dataset is 8.
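The step-by-step calculation above can be sketched in Python as follows. The function name `population_variance` is a hypothetical choice; the standard library's `statistics.pvariance` is used only as a cross-check.

```python
import statistics

def population_variance(values):
    n = len(values)
    mean = sum(values) / n                               # step 1: mean
    squared_deviations = [(x - mean) ** 2 for x in values]  # steps 2 and 3
    return sum(squared_deviations) / n                   # step 4: average

data = [2, 4, 6, 8, 10]
print(population_variance(data))      # 8.0
print(statistics.pvariance(data))     # cross-check with the standard library
```

Both calls divide by n, matching the population formula on the slide; a sample variance would divide by n - 1 instead.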
Skewness – Measuring Asymmetry of Data
Definition: Skewness measures the asymmetry, or degree of distortion, of a dataset's distribution. It indicates whether the data points are spread more to one side of the mean than the other. A perfectly symmetrical distribution has a skewness of 0.
Formula: The skewness is:
$\text{Skewness} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^3}{\sigma^3}$
Where:
$x_i$ = each data point
$\mu$ = mean of the dataset
$\sigma$ = standard deviation
$n$ = number of data points
The numerator term $\sum (x_i - \mu)^3$ raises each deviation to the third power, which amplifies larger deviations and preserves their sign, so it indicates the direction (positive or negative) of the skew.
EXPLANATION
Skewness tells us how lopsided the distribution is:
Positive Skew (Right-Skewed): The tail on the right side is longer or fatter. The bulk of the data points are concentrated on the left.
Negative Skew (Left-Skewed): The tail on the left side is longer or fatter. The bulk of the data points are concentrated on the right.
Zero Skewness (Symmetrical Distribution): The distribution is symmetrical around the mean, as in a normal distribution.
Example
Dataset 1 (Positive Skew): {2, 2, 3, 4, 20}
Step 1: Calculate the mean: $\mu = \frac{2 + 2 + 3 + 4 + 20}{5} = 6.2$
Step 2: Compute the deviations from the mean and cube them: $(2-6.2)^3, (2-6.2)^3, (3-6.2)^3, (4-6.2)^3, (20-6.2)^3 = -74.088, -74.088, -32.768, -10.648, 2628.072$
The extreme value (20) dominates the sum of cubed deviations (2436.48), which leads to a positive skew, with a skewness > 0 (about 1.46).
Dataset 2 (Negative Skew): {10, 21, 22, 22, 25}
Step 1: Calculate the mean: $\mu = \frac{10 + 21 + 22 + 22 + 25}{5} = 20$
Step 2: Compute the deviations from the mean and cube them: $(10-20)^3, (21-20)^3, (22-20)^3, (22-20)^3, (25-20)^3 = -1000, 1, 8, 8, 125$
The single low value (10) contributes a large negative cube that outweighs the positive ones (the sum is -858), which leads to a negative skew, with a skewness < 0 (about -1.24).
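A minimal Python sketch of the skewness formula, assuming the population (divide-by-n) form used on the slides. The function name is hypothetical, and the left-tailed dataset is constructed here as the mirror image of Dataset 1 purely to show the sign flipping; it is not taken from the slides.

```python
import math

def population_skewness(values):
    # Third central moment divided by the cube of the standard deviation.
    # Undefined (division by zero) when all values are identical.
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std_dev = math.sqrt(variance)
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / std_dev ** 3

right_tailed = [2, 2, 3, 4, 20]               # Dataset 1 from the slide
left_tailed = [-x for x in right_tailed]      # assumed mirror image
symmetric = [2, 4, 6, 8, 10]                  # evenly spaced, no skew

print(population_skewness(right_tailed))   # positive
print(population_skewness(left_tailed))    # negative, same magnitude
print(population_skewness(symmetric))      # zero
```

Mirroring a dataset reverses the sign of every deviation, so the cubed deviations change sign while the standard deviation stays the same; the skewness therefore flips sign without changing magnitude.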
Kurtosis – Measuring Tail Heaviness
Definition: Kurtosis measures the tailedness, or the weight of the tails, of a data distribution. It provides insight into the extremes of a distribution, indicating the presence of outliers. Unlike skewness, which measures asymmetry, kurtosis describes how much of the distribution lies in its tails.
Formula: The kurtosis is:
$\text{Kurtosis} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^4}{\sigma^4} - 3$
Where:
$x_i$ = each data point
$\mu$ = mean of the dataset
$\sigma$ = standard deviation
$n$ = number of data points
The "-3" is subtracted so that a normal distribution has a kurtosis of 0; the result is also known as excess kurtosis. This gives three categories:
Leptokurtic: Kurtosis > 0 (heavy tails)
Platykurtic: Kurtosis < 0 (light tails)
Mesokurtic: Kurtosis = 0 (normal distribution)
EXPLANATION
Kurtosis Categories
Leptokurtic (Kurtosis > 0): The distribution has heavy tails and, typically, a sharper peak. Indicates more outliers than a normal distribution. Example: Financial returns, where extreme changes (either gains or losses) can occur.
Platykurtic (Kurtosis < 0): The distribution has lighter tails and, typically, a flatter peak. Indicates fewer outliers than a normal distribution. Example: Data with consistent performance, such as test scores in a well-designed exam.
Mesokurtic (Kurtosis = 0): The distribution resembles the normal distribution. Indicates a moderate presence of outliers. Example: Standard normal distribution (bell curve).
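Example
Dataset 1 (Leptokurtic): {1, 1, 1, 1, 10}
Step 1: Calculate the mean: $\mu = \frac{1 + 1 + 1 + 1 + 10}{5} = 2.8$
Step 2: Compute the fourth power of the deviations from the mean: $(1-2.8)^4, (1-2.8)^4, (1-2.8)^4, (1-2.8)^4, (10-2.8)^4 = 10.4976, 10.4976, 10.4976, 10.4976, 2687.3856$
Step 3: Average them and divide by $\sigma^4$, with $\sigma^2 = 12.96$: $\frac{2729.376 / 5}{12.96^2} - 3 = 3.25 - 3 = 0.25$
The presence of the extreme value (10) leads to a positive kurtosis (indicating heavy tails).
Dataset 2 (Platykurtic): {5, 6, 7, 8, 9}
Step 1: Calculate the mean: $\mu = \frac{5 + 6 + 7 + 8 + 9}{5} = 7$
Step 2: Compute the fourth power of the deviations from the mean: $(5-7)^4, (6-7)^4, (7-7)^4, (8-7)^4, (9-7)^4 = 16, 1, 0, 1, 16$
Step 3: Average them and divide by $\sigma^4$, with $\sigma^2 = 2$: $\frac{34 / 5}{2^2} - 3 = 1.7 - 3 = -1.3$
The absence of extreme values leads to a negative kurtosis (indicating light tails).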
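The two kurtosis examples can be reproduced with a short Python sketch, assuming the population (divide-by-n) form with the "-3" adjustment from the formula slide. The names `excess_kurtosis` and `classify_kurtosis`, and the tolerance used to decide what counts as zero, are hypothetical choices for illustration.

```python
def excess_kurtosis(values):
    # Fourth central moment divided by the squared variance, minus 3.
    # Undefined (division by zero) when all values are identical.
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / variance ** 2 - 3

def classify_kurtosis(k, tolerance=1e-9):
    # Map the sign of the excess kurtosis to the category names.
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"

for dataset in ([1, 1, 1, 1, 10], [5, 6, 7, 8, 9]):
    k = excess_kurtosis(dataset)
    print(dataset, round(k, 4), classify_kurtosis(k))
```

With only five points per dataset these values are illustrative; kurtosis estimates from such small samples are very sensitive to individual observations.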