Introduction and Descriptive Statistics
Review Statistics and Probability
Modified by:
Dr. Achmad Nizar Hidayanto
Nur Fitriah Ayuning Budi
Khumaisa Nuraini
Learning Outcomes
•Review key statistical and research terms
•Review the concept of central tendency
•Review the concept of variability
Introduction to Statistics
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences
Eighth Edition
by Frederick J Gravetter and Larry B. Wallnau
1.1 Statistics, Science, and Observations
•“Statistics” means “statistical procedures”
•Uses of Statistics
–Organize and summarize information
–Determine exactly what conclusions are justified based on the results that were obtained
•Goals of statistical procedures
–Accurate and meaningful interpretation
–Provide standardized evaluation procedures
1.2 Populations and Samples
•Population
–The set of all the individuals of interest in a particular study
–Varies in size; often quite large
•Sample
–A set of individuals selected from a population
–Usually intended to represent the population in a research study
Figure 1.1
Relationship between population and sample
Variables and Data
•Variable
–A characteristic or condition that changes or has different values for different individuals
•Data (plural)
–Measurements or observations of a variable
•Data set
–A collection of measurements or observations
•A datum (singular)
–A single measurement or observation
–Commonly called a score or raw score
Parameters and Statistics
•Parameter
–A value, usually a numerical value, that describes a population
–Derived from measurements of the individuals in the population
•Statistic
–A value, usually a numerical value, that describes a sample
–Derived from measurements of the individuals in the sample
Descriptive & Inferential Statistics
•Descriptive statistics
–Summarize data
–Organize data
–Simplify data
•Familiar examples
–Tables
–Graphs
–Averages
•Inferential statistics
–Study samples to make generalizations about the population
–Interpret experimental data
•Common terminology
–“Margin of error”
–“Statistically significant”
Sampling Error
•A sample is generally not identical to the population it was selected from
•Sampling Error
–The discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter
•Example: Margin of Error in Polls
–“This poll was taken from a sample of registered voters and has a margin of error of plus-or-minus 4 percentage points” (Box 1.1)
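To make sampling error concrete, here is a minimal Python sketch (not part of the original slides). It treats a small, made-up list of scores as the population, draws random samples, and reports how far each sample mean (a statistic) falls from the population mean (the parameter). The population values, the seed, and the function name `sampling_errors` are illustrative assumptions.

```python
import random
import statistics

# Hypothetical population of scores (illustrative values, not from the slides)
population = [2, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12]

def sampling_errors(pop, n, trials, seed=0):
    """Return sample mean minus population mean for `trials` random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(pop)            # parameter: describes the whole population
    errors = []
    for _ in range(trials):
        sample = rng.sample(pop, n)       # n individuals drawn without replacement
        m = statistics.fmean(sample)      # statistic: describes this one sample
        errors.append(m - mu)             # sampling error for this sample
    return errors

for e in sampling_errors(population, n=4, trials=5):
    print(f"sampling error: {e:+.2f}")
```

If n equals the population size, every "sample" is the whole population and the error is zero; with smaller samples the error changes from one sample to the next.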
Figure 1.2
A demonstration of sampling error
Figure 1.3
Role of statistics in experimental research
1.3 Data Structures, Research Methods, and Statistics
•Individual Variables
–A variable is observed
–“Statistics” describe the observed variable
–Category and/or numerical variables
–Descriptive statistics
•Relationships between variables
–Two variables observed and measured
–One of two possible data structures is used to determine what type of relationship exists
Relationships Between Variables
•Data Structure I: The Correlational Method
–One group of participants
–Measurement of two variables for each participant
–Goal is to describe the type and magnitude of the relationship
–Patterns in the data reveal relationships
–Non-experimental method of study
Figure 1.4
Data structures for studies evaluating the
relationship between variables
Correlational Method Limitations
•Can demonstrate the existence of a relationship
•Does not provide an explanation for the relationship
•Most importantly, does not demonstrate a cause-and-effect relationship between the two variables
Relationships Between Variables
•Data Structure II: Comparing two (or more) groups of scores
–One variable defines the groups
–Scores are measured on a second variable
–Both experimental and non-experimental studies use this structure
Figure 1.5
Data structure for studies comparing groups
Experimental Method
•Goal of Experimental Method
–To demonstrate a cause-and-effect relationship
•Manipulation
–The level of one variable is determined by the experimenter
•Control rules out influence of other variables
–Participant variables
–Environmental variables
Figure 1.6
The structure of an experiment
Independent/Dependent Variables
•The independent variable is the variable manipulated by the researcher
–Independent because no other variable in the study influences its value
•The dependent variable is the one observed to assess the effect of the treatment
–Dependent because its value is thought to depend on the value of the independent variable
Experimental Method: Control
•Methods of control
–Random assignment of subjects
–Matching of subjects
–Holding the level of some potentially influential variables constant
•Control condition
–Individuals do not receive the experimental treatment
–They either receive no treatment or receive a neutral, placebo treatment
–Purpose: to provide a baseline for comparison with the experimental condition
•Experimental condition
–Individuals do receive the experimental treatment
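Random assignment, the first method of control listed above, can be sketched in a few lines of Python. This is an illustrative helper, not a procedure from the slides: the function name `randomly_assign`, the participant IDs, and the equal-split rule are assumptions. It shuffles the participants and splits them into a control group and an experimental group, so participant variables are not systematically tied to either condition.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into control and experimental groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "control": shuffled[:half],        # no treatment or a neutral placebo
        "experimental": shuffled[half:],   # receives the experimental treatment
    }

groups = randomly_assign([f"P{i:02d}" for i in range(1, 11)], seed=42)
print(groups["control"])
print(groups["experimental"])
```

With an odd number of participants, this sketch puts the extra person in the experimental group.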
Non-experimental Methods
•Non-equivalent Groups
–Researcher compares groups
–Researcher cannot control who goes into which group
•Pre-test / Post-test
–Individuals measured at two points in time
–Researcher cannot control the influence of the passage of time
•Independent variable is quasi-independent
Figure 1.7
Two examples of non-experimental studies
1.4 Variables and Measurement
•Scores are obtained by observing and measuring variables that scientists use to help define and explain external behaviors
•The process of measurement consists of applying carefully defined measurement procedures for each variable
Constructs & Operational Definitions
•Constructs
–Internal attributes or characteristics that cannot be directly observed
–Useful for describing and explaining behavior
•Operational definition
–Identifies the set of operations required to measure an external (observable) behavior
–Uses the resulting measurements as both a definition and a measurement of a hypothetical construct
Discrete and Continuous Variables
•Discrete variable
–Has separate, indivisible categories
–No values can exist between two neighboring categories
•Continuous variable
–Has an infinite number of possible values between any two observed values
–Every interval is divisible into an infinite number of equal parts
Figure 1.8
Example: Continuous Measurement
Real Limits of Continuous Variables
•Real limits are the boundaries of the intervals that represent scores measured on a continuous number line
–The real limit separating two adjacent scores is exactly halfway between the two scores
–Each score has two real limits
•The upper real limit marks the top of the interval
•The lower real limit marks the bottom of the interval
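A small Python sketch of the rule above, assuming each score is recorded to a fixed unit of measurement (whole numbers, so `unit=1`, unless stated otherwise). Each real limit then lies half a unit below or above the score. The function name `real_limits` is illustrative; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def real_limits(score, unit=1):
    """Return (lower, upper) real limits for a score measured to the nearest `unit`."""
    half = Fraction(unit) / 2        # each limit lies half a unit from the score
    score = Fraction(score)
    return score - half, score + half

lower, upper = real_limits(8)                   # whole-number measurement
print(float(lower), float(upper))               # 7.5 8.5
lower, upper = real_limits("3.5", unit="0.1")   # measured to one decimal place
print(float(lower), float(upper))               # 3.45 3.55
```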
Scales of Measurement
•Measurement assigns individuals or events to categories
–The categories can simply be names such as male/female or employed/unemployed
–They can be numerical values such as 68 inches or 175 pounds
•The complete set of categories makes up a scale of measurement
•Relationships between the categories determine different types of scales
Scales of Measurement
•Nominal
–Characteristics: label and categorize; no quantitative distinctions
–Examples: gender; diagnosis; experimental or control
•Ordinal
–Characteristics: categorizes observations; categories organized by size or magnitude
–Examples: rank in class; clothing sizes (S, M, L, XL); Olympic medals
•Interval
–Characteristics: ordered categories; intervals between categories of equal size; arbitrary or absent zero point
–Examples: temperature; IQ; golf scores (above/below par)
•Ratio
–Characteristics: ordered categories; equal intervals between categories; absolute zero point
–Examples: number of correct answers; time to complete task; gain in height since last year
Central Tendency
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences
Seventh Edition
by Frederick J Gravetter and Larry B. Wallnau
1.5 Overview of Central Tendency
•Central tendency
–A single score that defines the “center” of a distribution
•Purpose: find the single score that is most typical or best represents the entire group
Figure 1.9
What is the “center” of each distribution?
1.6 The Mean
•The mean is the sum of all the scores divided by the number of scores in the data.
Population mean: $\mu = \frac{\sum X}{N}$
Sample mean: $M = \frac{\sum X}{n}$
The Mean: Three Definitions
•Sum of the scores divided by the number of scores in the data
•Amount each individual receives when the total is divided equally among all: $M = \sum X / n$
•The balance point for the distribution
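The three definitions can be checked numerically. In this Python sketch (the scores are made-up example values), the mean computed as ΣX / n is also the amount each individual would receive if the total were shared equally, and the deviations from it sum to zero, which is what makes it the balance point. `Fraction` is used so the zero is exact rather than a rounding artifact.

```python
from fractions import Fraction

def mean(scores):
    """Sum of the scores divided by the number of scores (exact arithmetic)."""
    return sum(Fraction(x) for x in scores) / len(scores)

scores = [1, 2, 6, 6, 10]                 # illustrative data, n = 5
m = mean(scores)
print(m)                                  # 5

# Equal-share definition: n copies of the mean add back up to the total
print(m * len(scores) == sum(scores))     # True

# Balance-point definition: deviations above and below the mean cancel
print(sum(x - m for x in scores))         # 0
```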
Figure 1.10
Computing the Mean from a Frequency Distribution Table
Quiz score (X)   f          fX
10               1          10
9                2          18
8                4          32
7                0          0
6                1          6
Total            n = Σf = 8   ΣfX = 66
M = ΣfX / n = ?
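The table's computation, M = ΣfX / Σf, can be sketched in Python. The dictionary below only restates the table's X and f columns; the function name is illustrative.

```python
from fractions import Fraction

# Frequency distribution from the table: score X -> frequency f
freq = {10: 1, 9: 2, 8: 4, 7: 0, 6: 1}

def mean_from_frequency_table(table):
    n = sum(table.values())                        # n = Σf
    sum_fx = sum(x * f for x, f in table.items())  # ΣfX
    return Fraction(sum_fx, n)

m = mean_from_frequency_table(freq)
print(m, float(m))   # 33/4 8.25
```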
The Weighted Mean
•Combine two sets of scores
•Three steps:
–Determine the combined sum of all the scores
–Determine the combined number of scores
–Divide the sum of scores by the total number of scores
Overall mean (weighted): $M = \frac{\sum X_1 + \sum X_2}{n_1 + n_2}$
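The three steps translate directly into code. In this Python sketch the two samples are invented for illustration, and integer scores are assumed so that `Fraction(total, n)` is exact. The example also shows why the overall mean generally differs from a plain average of the two sample means when n₁ ≠ n₂.

```python
from fractions import Fraction

def weighted_mean(sample1, sample2):
    """Overall mean of two combined samples: (ΣX1 + ΣX2) / (n1 + n2)."""
    total = sum(sample1) + sum(sample2)   # step 1: combined sum of all the scores
    n = len(sample1) + len(sample2)       # step 2: combined number of scores
    return Fraction(total, n)             # step 3: divide the sum by the number (integer scores assumed)

a = [6] * 6     # n1 = 6, M1 = 6
b = [12] * 2    # n2 = 2, M2 = 12
print(weighted_mean(a, b))        # 15/2
print(Fraction(6 + 12, 2))        # 9: plain average of the two means, not the overall mean
```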
Characteristics of the Mean
•Changing the value of any score changes the mean.
•Introducing a new score or removing a score usually changes the mean.
•Adding a constant to (or subtracting a constant from) each score changes the mean by the same constant.
•Multiplying or dividing each score by a constant multiplies or divides the mean by that constant.
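Each characteristic can be verified with a few lines of Python. The scores and the constant `c` are illustrative choices; `Fraction` keeps the results exact so the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

def mean(scores):
    return sum(Fraction(x) for x in scores) / len(scores)

scores = [2, 4, 6, 8]     # illustrative data, mean = 5
c = 3                     # illustrative constant

print(mean(scores))                              # 5
print(mean([2, 4, 6, 12]))                       # 6: changing one score changes the mean
print(mean([x + c for x in scores]))             # 8: every score + c -> mean + c
print(mean([x * c for x in scores]))             # 15: every score * c -> mean * c
print(mean([Fraction(x, c) for x in scores]))    # 5/3: every score / c -> mean / c
```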
Figure 1.11
1.7 The Median
•The median is the midpoint of the scores in a distribution when they are listed in order from smallest to largest.
•The median divides the scores into two groups of equal size.
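A minimal Python sketch of the median for a (non-empty) list of scores. With an odd number of scores, the median is the middle score; with an even number, this sketch uses the common convention of averaging the two middle scores. The function name is illustrative; Python's standard library also provides `statistics.median`, which follows the same convention.

```python
def median(scores):
    """Middle score (odd n) or average of the two middle scores (even n)."""
    ordered = sorted(scores)          # list from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5, 8, 10, 11]))      # 8
print(median([1, 1, 4, 5, 7, 9]))     # 4.5
```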
Figure 1.12
Figure 1.13
The Precise Median for a Continuous Variable
•A continuous variable can be infinitely divided
•The precise median is located in the interval defined by the real limits of the value.
•It may be necessary to determine the fraction of the interval needed to divide the distribution exactly in half.
•$\text{fraction} = \frac{\text{number needed to reach 50\%}}{\text{number in the interval}}$
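The fraction formula above can be sketched as a Python function. Assumptions: the data are whole-number scores of a continuous variable (so each score's interval has width 1 and real limits X ± 0.5), given as a score-to-frequency mapping; the function name and example data are illustrative. The precise median is the lower real limit of the interval that contains the 50% point, plus the fraction of that interval's width.

```python
from fractions import Fraction

def precise_median(freq, width=1):
    """Interpolated median for scores of a continuous variable.

    freq maps each score to its frequency; each score X occupies the
    interval bounded by its real limits, X - width/2 and X + width/2.
    width should be an int or Fraction.
    """
    n = sum(freq.values())
    half = Fraction(n, 2)                  # number of scores that makes up 50%
    below = 0                              # scores in intervals below the current one
    for x in sorted(freq):
        f = freq[x]
        if f and below + f >= half:
            lrl = x - Fraction(width, 2)   # lower real limit of this interval
            fraction = (half - below) / f  # number needed to reach 50% / number in interval
            return lrl + fraction * width
        below += f
    raise ValueError("empty distribution")

scores = {1: 1, 2: 1, 3: 1, 4: 4, 6: 1}   # the list 1, 2, 3, 4, 4, 4, 4, 6
print(precise_median(scores))             # 15/4, i.e. 3.75
```

Here 4 scores make up 50%; 3 scores lie below the interval for X = 4 (real limits 3.5 to 4.5), so 1 of that interval's 4 scores is needed and the median is 3.5 + 1/4 = 3.75.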
Figure 1.14
Median, Mean, and Middle
•Mean is the balance point of a distribution
–Defined by distances
–Often is not the midpoint of the scores
•Median is the midpoint of a distribution
–Defined by number of scores
–Often is not the balance point of the scores
•Both measure central tendency, using two different concepts of middle or “central.”
Figure 1.15
1.8 The Mode
•The mode is the score or category that has the greatest frequency of any in the frequency distribution
–Can be used with any scale of measurement
–Corresponds to an actual score in the data
–The only measure of central tendency that can be used with nominal data
•It is possible to have more than one mode
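A short Python sketch that returns every mode, so it covers both nominal categories and distributions with more than one mode. The data and the function name are illustrative, and a non-empty list is assumed; the standard library's `statistics.multimode` offers similar behavior.

```python
from collections import Counter

def modes(observations):
    """Return every score or category that has the greatest frequency."""
    counts = Counter(observations)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes(["A", "B", "B", "O", "AB", "O", "B"]))   # ['B']: nominal data
print(modes([2, 3, 3, 5, 7, 7]))                      # [3, 7]: two modes (bimodal)
```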
Figure 1.16
1.9 Selecting a Measure of Central Tendency
•Mean
–Appropriate to choose when: no situation precludes it
–Should not be used when: extreme scores, skewed distribution, undetermined values, open-ended distribution, ordinal scale, nominal scale
•Median
–Appropriate to choose when: extreme scores, skewed distribution, undetermined values, open-ended distribution, ordinal scale
–Should not be used when: nominal scale
•Mode
–Appropriate to choose when: nominal scales, discrete variables, describing shape
–Should not be used when: interval or ratio data, except to accompany the mean or median
Figure 1.17
Figure 1.18
Means or Medians in a Line Graph
Figure 1.19
Means or Medians in a Bar Graph
1.10 Central Tendency and the Shape of the Distribution
•Symmetrical distributions
–Mean and median have the same value
–If there is exactly one mode, it has the same value as the mean and the median
–The distribution may have more than one mode, or no mode at all
Figure 1.20
Central Tendency in Skewed Distributions
•The mean is found far toward the long tail (positive or negative)
•The median is found toward the long tail, but not as far as the mean
•The mode is found near the piled-up scores
•If positively skewed, the order from left to right is mode, median, mean
•If negatively skewed, the order from left to right is mean, median, mode
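The ordering can be checked on a small, made-up data set with Python's `statistics` module. The positively skewed scores pile up at the low end with one extreme score in the right tail; reflecting them (11 − X) produces a negatively skewed set. Both data sets are illustrative assumptions.

```python
import statistics

# Illustrative positively skewed scores: a pile at the low end, a long right tail
scores = [1, 2, 2, 2, 3, 3, 4, 5, 10]

mode = statistics.mode(scores)
median = statistics.median(scores)
mean = statistics.fmean(scores)
print(mode, median, round(mean, 2))   # 2 3 3.56
print(mode < median < mean)           # True: mode, median, mean from left to right

# Reflecting the scores gives a negatively skewed set: mean, median, mode
neg = [11 - x for x in scores]
print(statistics.fmean(neg) < statistics.median(neg) < statistics.mode(neg))   # True
```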
Figure 1.21
Variability
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences
Seventh Edition
by Frederick J Gravetter and Larry B. Wallnau
1.11 Overview
•Variability can be defined in several ways
–A quantitative measure of the differences between scores
–Describes the degree to which the scores are spread out or clustered together
•Purposes of measures of variability
–Describe the distribution
–Measure how well an individual score represents the distribution
Figure 1.22
Population Distributions: Height, Weight
Three Measures of Variability
•The Range
•The Standard Deviation
•The Variance
1.12 The Range
•The distance covered by the scores in a distribution
–From the smallest value to the largest value
•For continuous data, real limits are used
•For discrete variables, the range is the number of categories
$\text{range} = \text{URL for } X_{\max} - \text{LRL for } X_{\min}$
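A Python sketch of the range formula, assuming scores are recorded in whole units so that the real limits are X ± 0.5. For whole-number data the result equals X_max − X_min + 1, which matches counting the categories from X_min to X_max. The function name and the data are illustrative.

```python
def score_range(scores, unit=1):
    """Range = URL for X_max - LRL for X_min, for scores measured in steps of `unit`."""
    half = unit / 2
    upper_real_limit = max(scores) + half   # top of the highest score's interval
    lower_real_limit = min(scores) - half   # bottom of the lowest score's interval
    return upper_real_limit - lower_real_limit

scores = [3, 7, 12, 0, 4]                   # illustrative whole-number data
print(score_range(scores))                  # 13.0, i.e. 12.5 - (-0.5)
print(max(scores) - min(scores) + 1)        # 13: categories 0 through 12
```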
1.13 Standard Deviation and Variance for a Population
•Most common and most important measure of variability
–A measure of the standard, or average, distance from the mean
–Describes whether the scores are clustered closely around the mean or are widely scattered
•Calculation differs for populations and samples
Developing the Standard Deviation
•Step One: Determine the deviation score (distance from the mean) for each score: deviation score = $X - \mu$
•Step Two: Calculate the mean (average) of the deviations
–Deviations sum to 0 because the mean is the balance point of the distribution
–The mean (average) deviation will always equal 0; another method must be found
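Steps One and Two can be traced in Python on a small, made-up population (N = 5). The deviation scores X − μ are computed exactly, and their mean comes out to 0, which is why another method is needed.

```python
from fractions import Fraction

population = [1, 9, 5, 8, 7]                        # illustrative population, N = 5
mu = Fraction(sum(population), len(population))     # μ = 30 / 5 = 6

deviations = [x - mu for x in population]           # Step One: X - μ for each score
print([int(d) for d in deviations])                 # [-5, 3, -1, 2, 1]
print(sum(deviations) / len(deviations))            # Step Two: mean deviation is 0
```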
Developing the Standard Deviation (2)
•Step Three: Get rid of the negatives in the deviations:
–Square each deviation score
–Using the squared values, compute the mean squared deviation, known as the variance
•Variability is now measured in squared units and is called the variance.
•Population variance equals the mean squared deviation: the variance is the average squared distance from the mean
Developing the Standard Deviation (3)
•Step Four:
–Variance measures the average squared distance from the mean; this is not quite the goal
–Correct for having squared all the deviations by taking the square root of the variance
$\text{standard deviation} = \sqrt{\text{variance}}$
Figure 1.23
Calculation of the Variance
Formulas for Population Variance and Standard Deviation
•$\text{variance} = \frac{\text{sum of squared deviations}}{\text{number of scores}}$
•SS (sum of squares) is the sum of the squared deviations of scores from the mean
•Two equations for computing SS
Two formulas for SS
Definitional Formula: $SS = \sum (X - \mu)^2$
•Find each deviation score, $(X - \mu)$
•Square each deviation score, $(X - \mu)^2$
•Sum up the squared deviations
Computational Formula: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$
•Square each score and sum the squared scores
•Find the sum of scores, square it, divide by N
•Subtract the second part from the first
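Both SS formulas can be written side by side in Python to confirm that they agree. Integer scores are assumed so that `Fraction` gives exact results. The data reuse a small illustrative population.

```python
from fractions import Fraction

def ss_definitional(scores):
    """SS = Σ(X - μ)²: square each deviation, then sum."""
    mu = Fraction(sum(scores), len(scores))
    return sum((x - mu) ** 2 for x in scores)

def ss_computational(scores):
    """SS = ΣX² - (ΣX)² / N: no deviation scores needed."""
    n = len(scores)
    return sum(x * x for x in scores) - Fraction(sum(scores) ** 2, n)

scores = [1, 9, 5, 8, 7]                                    # illustrative data, μ = 6
print(ss_definitional(scores), ss_computational(scores))   # 40 40
```

The computational form is convenient when the mean is not a whole number, because it avoids working with fractional deviation scores.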
Population Variance: Formula and Notation
Formula
•Variance: $\sigma^2 = \frac{SS}{N}$
•Standard deviation: $\sigma = \sqrt{\frac{SS}{N}}$
Notation
•The lowercase Greek letter sigma, σ, is used to denote the standard deviation of a population
•Because the standard deviation is the square root of the variance, we write the variance of a population as σ²
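A minimal Python sketch of σ² = SS / N and σ = √(SS / N) on illustrative data. The function names are our own; the standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities and are printed for comparison.

```python
import math
import statistics

def population_variance(scores):
    n = len(scores)
    mu = sum(scores) / n
    ss = sum((x - mu) ** 2 for x in scores)   # SS
    return ss / n                              # σ² = SS / N

def population_sd(scores):
    return math.sqrt(population_variance(scores))   # σ = √(SS / N)

scores = [1, 9, 5, 8, 7]                  # SS = 40, N = 5
print(population_variance(scores))        # 8.0
print(population_sd(scores))              # 2.8284271247461903
print(statistics.pvariance(scores), statistics.pstdev(scores))
```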
Figure 1.24
Graphic Representation of Mean and Standard Deviation
1.14 Standard Deviation and Variance for a Sample
•Goal of inferential statistics:
–Draw general conclusions about the population
–Based on limited information from a sample
•Samples differ from the population
–Samples tend to have less variability than the population
–Computing the variance and standard deviation in the same way as for a population would give a biased estimate of the population values
Figure 1.25
Population of Adult Heights
Variance and Standard Deviation for a Sample
•Sum of squares (SS) is computed as before
•Formula has n − 1 rather than N in the denominator
•Notation uses s instead of σ
•Sample variance: $s^2 = \frac{SS}{n - 1}$
•Sample standard deviation: $s = \sqrt{\frac{SS}{n - 1}}$
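The same computation for a sample changes only the denominator, from N to n − 1 (the degrees of freedom). This Python sketch assumes at least two scores; the function names and data are illustrative, and `statistics.variance` from the standard library also divides by n − 1.

```python
import math
import statistics

def sample_variance(scores):
    n = len(scores)
    m = sum(scores) / n                        # sample mean M
    ss = sum((x - m) ** 2 for x in scores)     # SS, computed as before
    return ss / (n - 1)                        # s² = SS / (n - 1)

def sample_sd(scores):
    return math.sqrt(sample_variance(scores))  # s = √(SS / (n - 1))

sample = [1, 9, 5, 8, 7]                       # SS = 40, n = 5, df = 4
print(sample_variance(sample))                 # 10.0
print(sample_sd(sample))                       # 3.1622776601683795
print(statistics.variance(sample))             # stdlib equivalent (also divides by n - 1)
```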
Degrees of Freedom
•Population variance
–Mean is known
–Deviations are computed from a known mean
•Sample variance as an estimate of the population variance
–Population mean is unknown
–Using the sample mean restricts variability
•Degrees of freedom
–Number of scores in the sample that are independent and free to vary
–Degrees of freedom (df) = n − 1
1.15 More about Variance and Standard Deviation
•Unbiased estimate of a population parameter
–Average value of the statistic is equal to the parameter
–Average value uses all possible samples of a particular size n
•Biased estimate of a population parameter
–Systematically overestimates or underestimates the population parameter (for example, a sample variance computed with n instead of n − 1 systematically underestimates the population variance)
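The "all possible samples" idea can be checked exactly in Python. This sketch assumes a tiny hypothetical population {0, 3, 6} and enumerates every ordered sample of size n = 2 drawn with replacement (so all 9 samples are equally likely). Averaged over those samples, SS / (n − 1) equals the population variance σ² = 6. SS / n averages only 3, which is a systematic underestimate.

```python
from fractions import Fraction
from itertools import product

population = [0, 3, 6]          # hypothetical population, N = 3
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N       # σ² = SS / N = 6

def ss(sample):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))   # all 9 ordered samples of size 2, with replacement
avg_unbiased = sum(ss(s) / (n - 1) for s in samples) / len(samples)
avg_biased = sum(ss(s) / n for s in samples) / len(samples)

print(sigma2, avg_unbiased, avg_biased)   # 6 6 3
```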
Transformations of Scale
•Adding a constant to each score
–The mean changes by the same constant
–The standard deviation is unchanged
•Multiplying each score by a constant
–The mean is multiplied by that constant
–The standard deviation is also changed
–The standard deviation is multiplied by that constant (by its absolute value if the constant is negative)
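A quick Python check of both transformations, using the population standard deviation (`statistics.pstdev`) on illustrative scores and an illustrative constant `c`. Adding c shifts the mean but leaves the standard deviation alone. Multiplying by c scales both.

```python
import statistics

scores = [2, 4, 6, 8]              # illustrative data
c = 3                              # illustrative constant

def describe(xs):
    """Return (mean, population standard deviation)."""
    return statistics.fmean(xs), statistics.pstdev(xs)

print(describe(scores))                      # original mean and SD
print(describe([x + c for x in scores]))     # mean + c, same SD
print(describe([x * c for x in scores]))     # mean * c, SD * c
print(describe([x * -2 for x in scores]))    # mean * -2, SD * 2 (absolute value)
```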
Variance and Inferential Statistics
•Goal of inferential statistics: to detect meaningful and significant patterns in research results
•Variability in the data influences how easy it is to see patterns
–High variability obscures patterns that would be visible in low-variability samples
–Variability is sometimes called error variance
Figure 1.27
Experiments with high and low variability