Unit 3: Measures of Central Tendency and Dispersion
Added: Oct 21, 2025
Slides: 11 pages
Unit 3: Measures of Central Tendency and Dispersion
Measures of Central Tendency
Mean, Median, Mode
Measures of Dispersion
Range, IQR, Standard Deviation
Applications
System Performance Analysis
Unit 3 Data Analysis and Visualization
Introduction to Descriptive Measures
What are Descriptive Measures?
Descriptive measures condense large datasets into key values
that characterize their fundamental properties.
Provide a comprehensive snapshot of data characteristics
Condense large amounts of information into meaningful values
Enable quick and intuitive understanding of data
Central Tendency
Measures the "center" or typical value
Dispersion
Quantifies the "spread" or variability
How They Work Together
Together, these measures give a fuller picture of the data: central tendency locates the typical value, and dispersion shows how widely values vary around it
Support data-driven decision making by highlighting key patterns
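A minimal Python sketch of why both kinds of measure are needed: the two hypothetical datasets below share the same mean but differ sharply in spread. It uses only the standard-library `statistics` module; the dataset names and values are invented for illustration.

```python
import statistics

# Hypothetical datasets: same center, very different spread
steady = [50, 50, 50, 50]
volatile = [20, 40, 60, 80]

for name, data in [("steady", steady), ("volatile", volatile)]:
    center = statistics.mean(data)   # central tendency
    spread = statistics.stdev(data)  # dispersion (sample standard deviation)
    print(f"{name}: mean={center}, stdev={spread:.2f}")
```

Both datasets report a mean of 50, but their standard deviations are 0 and about 25.82, so the mean alone would hide the difference.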
Arithmetic Mean
Definition & Formula
The sum of all values divided by the number of observations. The
most common type of average.
$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Strengths
Simple to calculate
Uses all data points
Good for symmetric data
Mathematically convenient
Limitations
Sensitive to outliers
Can be skewed by extreme values
Less reliable for small samples, where a single value has a large influence
Not meaningful for categorical (nominal) data
[Figure: Visualizing the Mean]
Common Applications
Time series analysis
System performance
Demographic studies
Financial analysis
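The formula translates directly into code. This is a minimal sketch with a hypothetical `arithmetic_mean` helper and invented latency values; it also illustrates the "sensitive to outliers" limitation by appending one extreme reading.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of observations."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical response times in milliseconds
latencies = [10, 12, 11, 13, 14]
print(arithmetic_mean(latencies))          # 12.0
# One extreme value pulls the mean far from the typical reading
print(arithmetic_mean(latencies + [200]))  # about 43.33
```

A single 200 ms reading more than triples the mean even though five of the six values lie between 10 and 14 ms.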
Geometric Mean and Trimmed Mean
Geometric Mean
Definition: The nth root of the product of n values.
$$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$$
or
$$GM = e^{\frac{1}{n}\sum_{i=1}^{n} \ln x_i}$$
When to Use:
Data that grows or decays multiplicatively
Only when all values are positive (the logarithm of zero or a negative number is undefined)
Use Cases:
Average growth rates
Network traffic growth
Trimmed Mean
Definition: Mean calculated after removing a specified
percentage of the smallest and largest values.
Sort data and remove x% from both ends
Calculate mean of remaining values
When to Use:
Datasets with outliers that skew the arithmetic mean
When a few extreme values exist
Use Cases:
System latency analysis
Performance metrics with irregular readings
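A minimal Python sketch of both measures, standard library only. `geometric_mean` uses the logarithm form of the formula, which avoids overflow when many values are multiplied. `trimmed_mean` assumes one common convention, removing `int(proportion * n)` values from each end of the sorted data; other tools may round differently. Function names and sample data are hypothetical.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of ln x) to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def trimmed_mean(values, proportion):
    """Sort, drop int(proportion * n) values from each end, average the rest."""
    if not values or not 0 <= proportion < 0.5:
        raise ValueError("need data and a proportion in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))  # values removed from EACH end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical yearly traffic growth factors: +10%, +20%, -10%
growth = [1.10, 1.20, 0.90]
print(geometric_mean(growth))  # average growth factor per year

# Hypothetical latencies (ms) with one extreme reading
latencies = [12, 11, 13, 10, 14, 250, 12, 11, 13, 9]
print(trimmed_mean(latencies, 0.10))  # 12.0 (drops 9 and 250)
```

The geometric mean of the growth factors is about 1.059 (roughly 5.9% per year), and 1.059 cubed recovers the overall factor 1.188; the arithmetic mean of the factors (about 1.067) would overstate the compounded growth. The 10% trimmed mean of the latencies is 12.0 ms, while the untrimmed mean is 35.5 ms.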
Median and Mode
Median
Definition: The middle value of a dataset when arranged in order.
Calculation Steps:
1. Arrange data in ascending order
2. For odd n: Middle value
3. For even n: Average of two middle values
Characteristics:
Robust to outliers
Represents the 50th percentile
Use Cases:
Skewed distributions
Response times with outliers
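The three calculation steps map onto a few lines of Python. This is a sketch with a hypothetical `median` helper and invented response times; the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the ordered data, following the three steps above."""
    if not values:
        raise ValueError("median requires at least one value")
    data = sorted(values)  # 1. arrange in ascending order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:         # 2. odd n: the middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # 3. even n: average of the two middle values


# Hypothetical response times (ms) with one outlier
print(median([120, 95, 110, 2000, 105]))  # 110
print(median([4, 1, 3, 2]))               # 2.5
```

The 2000 ms outlier has no effect on the median, which is why it suits response times with outliers.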
Mode
Definition: The value that appears most frequently in a dataset.
Calculation:
Identify value(s) with highest frequency. Can be:
Unimodal (one mode)
Multimodal (multiple modes)
Characteristics:
Not affected by extreme values
Useful for categorical data
Use Cases:
Identifying most common categories
Popular operating systems
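Because a dataset can be unimodal or multimodal, a mode helper should return every value tied for the highest frequency. A minimal sketch using `collections.Counter`; the `modes` name and the survey data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that has the highest frequency (one mode or several)."""
    if not values:
        raise ValueError("mode requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical survey of operating systems (categorical data)
systems = ["Linux", "Windows", "Linux", "macOS", "Windows", "Linux"]
print(modes(systems))          # ['Linux']  (unimodal)
print(modes([1, 2, 2, 3, 3]))  # [2, 3]     (multimodal)
```

This works directly on category labels, where the mean and median are undefined.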
[Figure: Visual Example]
Range, Quartile Deviation, and Inter-quartile Range
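These measures quantify the spread or variability in a dataset. The range depends only on the two most extreme values, while the inter-quartile range and quartile deviation describe the spread of the middle half of the data.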
Range
The difference between the maximum and minimum values.
Range = Max - Min
Sensitive to outliers
Quartile Deviation
Half of the Inter-Quartile Range.
Q.D. = IQR / 2
Average distance of Q1 and Q3 from the median
Inter-Quartile Range (IQR)
The difference between Q3 and Q1.
IQR = Q3 - Q1
Measures middle 50% spread
[Figure: Quartiles and IQR Visualization]
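Example Calculation
Dataset (n = 11, already sorted): 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49
Q1 = 15 (25th percentile): median of the lower half 6, 7, 15, 36, 39
Q2 = 41 (median, 50th percentile): the 6th of the 11 values
Q3 = 43 (75th percentile): median of the upper half 41, 43, 43, 47, 49
IQR = Q3 - Q1 = 43 - 15 = 28
Q.D. = IQR / 2 = 28 / 2 = 14
(Quartile conventions vary; here the overall median is excluded from both halves, and other methods can give slightly different Q1 and Q3.)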
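The worked example can be reproduced in a few lines of Python. This sketch assumes the "median of each half, excluding the overall median for odd n" convention used in the example; the helper names are hypothetical.

```python
def median(sorted_data):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, whole set, and upper half.

    Assumed convention: for odd n the overall median is excluded from both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(data), median(upper)


dataset = [6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49]
q1, q2, q3 = quartiles(dataset)
data_range = max(dataset) - min(dataset)  # Range = Max - Min
iqr = q3 - q1                             # IQR = Q3 - Q1
qd = iqr / 2                              # Q.D. = IQR / 2
print(q1, q2, q3, data_range, iqr, qd)    # 15 41 43 43 28 14.0
```

The range (43) is much larger than the IQR (28) because it is driven by the extreme values 6 and 49.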
Mean Absolute Deviation
What is MAD?
Mean Absolute Deviation (MAD) measures the average distance between data points and the mean, providing a simple, intuitive measure of dispersion.
Expressed in the same units as the data
Less sensitive to outliers than standard deviation
Formula & Calculation
$$\text{MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Steps:
1. Calculate the mean (x̄)
2. Find the absolute deviation of each value from the mean
3. Sum the absolute deviations
4. Divide by the number of data points (n)
[Figure: Visual Representation]
Advantages Over Other Measures
Outlier Resistance
Less affected by extreme values than the standard deviation, because deviations are not squared
Interpretability
Easier to understand than variance
Symmetric Treatment
Absolute values weight deviations above and below the mean equally, so they cannot cancel out
Simplicity
Easier to calculate by hand
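The four steps translate directly into Python. A minimal sketch with a hypothetical `mean_absolute_deviation` helper and a small invented dataset whose arithmetic is easy to check by hand.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    if not values:
        raise ValueError("MAD requires at least one value")
    n = len(values)
    mean = sum(values) / n                       # 1. mean
    abs_devs = [abs(x - mean) for x in values]   # 2. absolute deviations
    return sum(abs_devs) / n                     # 3.-4. sum, then divide by n


print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0  (mean 5; deviations 3, 1, 1, 3)
```

Note that "MAD" is also used for the median absolute deviation, a different measure; this sketch computes the mean-based version defined above.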
Variance and Standard Deviation
What They Measure
Variance and standard deviation quantify the spread or variability
of data points around the mean.
Larger values indicate greater spread
Smaller values indicate data points are closer to the mean
Key Differences
Variance
Average of squared deviations from the mean (divided by n - 1 for a sample)
Expressed in squared units, so not easily interpretable
Standard Deviation
Square root of variance
Expressed in original units
Example: Exam Scores
Data: 92, 95, 85, 80, 75, 50 (n = 6)
1. Calculate Mean: 477 / 6 = 79.5
2. Deviations: 12.5, 15.5, 5.5, 0.5, -4.5, -29.5
3. Squared Deviations: 156.25, 240.25, 30.25, 0.25, 20.25, 870.25
4. Sum of Squared Deviations: 1317.50
5. Divide by (n - 1) = 5: 1317.50 / 5 = 263.50
6. Variance (s²): 263.50
7. Standard Deviation (s): √263.50 ≈ 16.23
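The exam-score calculation can be checked in Python. This sketch uses a hypothetical `sample_variance` helper that follows steps 1 to 6, then cross-checks against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


scores = [92, 95, 85, 80, 75, 50]
var = sample_variance(scores)
sd = math.sqrt(var)
print(var, round(sd, 2))  # 263.5 16.23

# Cross-check against the standard library's sample statistics
print(statistics.variance(scores), statistics.stdev(scores))
```

Dividing by n instead (the population variance, `statistics.pvariance`) would give 1317.5 / 6, about 219.58.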
[Figure: Visualization]
Coefficient of Variation
What is the Coefficient of Variation?
A relative measure of dispersion that compares the standard
deviation to the mean, expressed as a percentage.
Enables comparison between datasets with different units or scales
Provides a standardized measure of relative variability
Formula and Calculation
Coefficient of Variation (CV) = (Standard Deviation ÷ Mean) × 100%
Calculation Steps:
1. Calculate the mean of the dataset
2. Calculate the standard deviation
3. Divide the standard deviation by the mean
4. Multiply by 100 to express as a percentage
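The four steps can be sketched in Python with the standard `statistics` module. This assumes the sample standard deviation (as in the exam-score example) and uses a hypothetical `coefficient_of_variation` helper with invented CPU and disk readings, showing how CV compares metrics measured in different units.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(values) / mean


# Hypothetical readings in different units
cpu_percent = [40, 45, 50, 55, 60]   # CPU usage (%)
disk_iops = [900, 1000, 1100]        # disk I/O (operations/second)
print(round(coefficient_of_variation(cpu_percent), 2))  # 15.81
print(round(coefficient_of_variation(disk_iops), 2))    # 10.0
```

The disk readings have the larger standard deviation (100 vs. about 7.9), but relative to their mean they are the more stable metric. CV is only meaningful for data with a positive mean on a ratio scale.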
[Figure: How CV Works]
Practical Applications
System Resources
Compare the variability of CPU usage (percentage) with that of disk I/O (operations/second)
Performance Metrics
Assess variability of response times across different system loads
Network Analysis
Compare stability of networks with different traffic volumes
Capacity Planning
Set safe capacity limits at mean + k × standard deviation; a higher CV means this buffer is larger relative to the mean
Applications in System Performance Analysis
Latency and Response Time
Median (50th percentile): Less affected by extreme outliers; preferred for response time analysis
Trimmed Mean: Excludes a specified percentage of the fastest and slowest responses, making performance regressions in typical requests easier to spot
Percentiles (90th, 95th, 99th): Reveal tail latency, the worst-case experience of the slowest requests
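Percentiles can be computed in several ways; this Python sketch assumes the simple nearest-rank method (the smallest value such that at least p% of the data is less than or equal to it). The helper name and latency values are hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Smallest value such that at least p% of the data is <= it (nearest-rank method)."""
    if not values or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    data = sorted(values)
    rank = math.ceil(p * len(data) / 100)  # 1-based rank
    return data[rank - 1]


# Hypothetical response times (ms); one request was very slow
latencies = [12, 15, 11, 14, 13, 250, 12, 16, 14, 13]
for p in (90, 95, 99):
    print(f"p{p}: {percentile_nearest_rank(latencies, p)} ms")
```

Here p90 is 16 ms while p95 and p99 are both 250 ms: the single slow request only shows up in the high percentiles. With only 10 samples the top percentiles are coarse; production monitoring tools typically interpolate or use many more samples.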
Resource Utilization
Standard Deviation: Monitors the stability of CPU, memory, and network bandwidth usage
Coefficient of Variation (CV): Compares relative variability across different resources
Anomaly Detection
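Identifies unusual system behavior using standard deviations (e.g., values more than 3 standard deviations from the mean) or the IQR method (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR)
Triggers alerts for events that fall outside expected ranges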
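Both detection rules can be sketched with the standard `statistics` module. The 1.5 × IQR fences and the 3-standard-deviation threshold are common defaults, not fixed rules; the helper names and CPU samples are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * sd]


# Hypothetical CPU utilization samples (%) with one spike
cpu = [41, 43, 40, 42, 44, 39, 41, 43, 95, 42]
print(iqr_outliers(cpu))          # [95]
print(zscore_outliers(cpu))       # []  (the spike inflates the standard deviation)
print(zscore_outliers(cpu, 2.0))  # [95]
```

The 3-standard-deviation rule misses the spike because the spike itself inflates the standard deviation; with n samples, no value can lie more than (n - 1)/√n sample standard deviations from the mean (about 2.85 for n = 10). The quartile-based fences are not distorted this way.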
Capacity Planning
Uses the mean plus a multiple of the standard deviation to set safe capacity limits
Ensures systems can handle peak loads with high confidence
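A minimal Python sketch of the "mean plus a multiple of the standard deviation" rule. The `capacity_limit` helper, the default multiplier `k = 3.0`, and the peak-load samples are all hypothetical choices for illustration.

```python
import statistics


def capacity_limit(samples, k=3.0):
    """Hypothetical sizing rule: mean load plus k sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.mean(samples) + k * statistics.stdev(samples)


# Hypothetical peak request rates (requests/second) from past days
peaks = [900, 1000, 1100]
print(capacity_limit(peaks))         # 1300.0
print(capacity_limit(peaks, k=2.0))  # 1200.0
```

If loads were roughly normally distributed, about 99.9% of values would fall below the mean plus 3 standard deviations; for skewed load data the actual coverage can differ considerably, so the choice of k is a judgment call.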
[Figure: System Performance Analysis]
Summary: Choosing the Right Measure
Key Decision Factors
Data Characteristics
Consider data distribution, presence of outliers, and sample size
Analysis Goals
Determine what you want to learn from the data
System Context
Consider the specific system and performance metric being analyzed
Common Pitfalls
Using mean with skewed data or outliers
Ignoring dispersion when focusing on central tendency
Comparing measures without considering their units
Decision Guide
The most effective measure depends on your specific context. Always consider the nature of your data and your analysis goals.