L.O 2-STATISTICS-compeletlyhhhhhhhhh.pdf

nadaazab009 11 views 100 slides Oct 25, 2025



Slide Content

Representing and describing quantitative data
Learning Outcome: Analyze, display and describe quantitative data with a focus on standard deviation.
L.O 2

Descriptive Statistics: Overview
Measures of Center (central tendency)
Mode
Median
Mean
Measures of Spread (dispersion)
Range
Interquartile Range
Variance
Standard deviation
Measures of Symmetry
Skewness

Find the mode, the median and the mean for the following values
Mean: The "average" number; found by adding all data points and dividing by the number of data points.
Example: The mean of 4, 1, and 7 is (4 + 1 + 7)/3 = 12/3 = 4.
Median: The middle number; found by ordering all data points and picking out the one in the middle (or if there are two middle numbers, taking the mean of those two numbers).
Example: The median of 4, 1, and 7 is 4 because when the numbers are put in order (1, 4, 7), the number 4 is in the middle.
Mode: The most frequent number, that is, the number that occurs the highest number of times.
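
The three definitions above can be checked with Python's standard `statistics` module. This is only a minimal sketch: the variable names are illustrative, and the second list is made-up data with a repeated value, added so that a single mode exists.

```python
from statistics import mean, median, mode

values = [4, 1, 7]            # the example values from the slide

print(mean(values))           # (4 + 1 + 7) / 3
print(median(values))         # middle of the ordered list 1, 4, 7

# Hypothetical data (not from the slides) with one repeated value.
repeated = [2, 3, 3, 5, 8]
print(mode(repeated))         # the value that occurs most often
```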

Representing data by dot plot

Describing the dot plot

Activity 1

[Figure: dot plot for Activity 1; recoverable axis labels: 18 to 34, and 48]

Describe the following distributions:
Both distributions are roughly symmetric.
The center for both distributions is 90 goals.
The two distributions have different amounts of VARIABILITY.

Measuring Spread/Variability
• Range: The difference between the largest and smallest observations.
• Quartiles: (uses median) The quartiles mark out the middle half and improve our description of spread.

Quartiles
Locate Q1 and Q3:
Q1 = 25
Q3 = 41

Quartiles
1. Arrange observations in ascending order and locate the median M.
2. The first quartile (Q1)
• lies one-quarter of the way up the list of ordered observations
• is the median of the lower half
• is larger than 25% of the ordered observations.

3. The third quartile (Q3)
• lies three-quarters of the way up the list of ordered observations
• is larger than 75% of the ordered observations
• is the median of the upper half.
4. The "second quartile"
• is the median (M)
• Note: is not larger than 50% of the ordered observations
• is at the 50% mark

Interquartile Range (IQR):
The distance between the first and third quartiles.
IQR = Q3 - Q1
IQR = 41 - 25 = 16
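
As a rough sketch of the quartile steps, Q1 and Q3 can be computed as the medians of the lower and upper halves of the ordered list. The function name `quartiles` is an assumption made for illustration; the data are the 16 home-run counts used elsewhere in this deck. Software packages use several slightly different quartile conventions, so library results may differ from this hand method.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # observations below the median position
    upper = data[n - half:]    # observations above the median position
    return median(lower), median(data), median(upper)

home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
q1, m, q3 = quartiles(home_runs)
iqr = q3 - q1
print(q1, m, q3, iqr)
```

For these data the sketch reproduces Q1 = 25, Q3 = 41 and IQR = 16.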

Note: If an observation falls within the interquartile range (between Q1 and Q3), it is not unusually high or low.
We use the IQR to identify suspected outliers.

Outliers
• Unusually high or unusually low
• Basic "rule of thumb" for identifying an outlier: the observation falls more than 1.5 x IQR above the third quartile or below the first quartile.

How to find Outliers?
3 steps
1. Find IQR
2. Q3 + 1.5 x IQR (upper cutoff)
3. Q1 - 1.5 x IQR (lower cutoff)

Ex: McDonald's Chicken Sandwiches
Problem: Determine whether the Premium Crispy Chicken Club Sandwich with 28 grams of fat is an outlier. (2 min)
Solution:
Here are the 14 amounts of fat in order:
9 9 10 10 12 15 16 16 17 17 17 20 23 28
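
The 1.5 x IQR rule can be applied to these 14 values as follows. This is a sketch under the assumption that quartiles are taken as medians of the lower and upper halves; the helper name `outlier_cutoffs` is hypothetical.

```python
from statistics import median

def outlier_cutoffs(values):
    """Return (lower cutoff, upper cutoff) from the 1.5 x IQR rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])         # median of the lower half
    q3 = median(data[n - half:])     # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

fat_grams = [9, 9, 10, 10, 12, 15, 16, 16, 17, 17, 17, 20, 23, 28]
lower_cutoff, upper_cutoff = outlier_cutoffs(fat_grams)
is_outlier = 28 < lower_cutoff or 28 > upper_cutoff
print(lower_cutoff, upper_cutoff, is_outlier)
```

Here Q1 = 10 and Q3 = 17, so the IQR is 7 and the upper cutoff is 17 + 10.5 = 27.5; the value 28 lies just above it.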

Which is resistant: range or IQR?
• Range:
• IQR:

The Shape
• The shape of a distribution is described by the following characteristics.
• Symmetry.
• Number of peaks.
• Skewness.
• Uniform.

Symmetry.
• When it is graphed, a symmetric distribution can be divided at the center so that each half is a mirror image of the other.
Number of peaks
• Distributions can have few or many peaks. Distributions with one clear peak are called unimodal, and distributions with two clear peaks are called bimodal. When a symmetric distribution has a single peak at the center, it is referred to as bell-shaped.

Skewness.
• When they are displayed graphically, some distributions have many more observations on one side of the graph than the other. Distributions with fewer observations on the right (toward higher values) are said to be skewed right; and distributions with fewer observations on the left (toward lower values) are said to be skewed left.

Uniform.
• When the observations in a set of data are equally spread across the range of the distribution, the distribution is called a uniform distribution. A uniform distribution has no clear peaks.

Interpreting Histograms
Assess where a distribution is centered by finding the median
Assess the spread of a distribution
Shape of a distribution: roughly symmetric, skewed to the right, or skewed to the left
Left and right sides are mirror images

Examples of Skewness

Shape: Type of Mound

Comparing the Mean and Median
Mean and median of a symmetric distribution are close
Mean is often preferred because it uses all the data
In a skewed distribution, the mean is farther out in the skewed tail than is the median
Median is preferred because it is better representative of a typical observation

Resistant Measures
A measure is resistant if extreme observations (outliers) have little, if any, influence on its value
Median is resistant to outliers
Mean is not resistant to outliers
www.stat.psu.edu

Number of Modes
One way to describe the shape of a distribution is by its number of peaks, or modes.
Uniform distribution: has no mode because all data values have the same frequency.

Any peak is considered a mode, even if all peaks do not have the same height.
A distribution with a single peak is called a single-peaked, or unimodal, distribution.
A distribution with two peaks, even though not the same size, is a bimodal distribution.
What is the following distribution?

Symmetry or Skewness
A distribution is symmetric if its left half is a mirror image of its right half.
A symmetric distribution with a single peak and a bell shape is known as a normal distribution.

Symmetry or Skewness
A distribution is left-skewed (or negatively skewed) if the values are more spread out on the left, meaning that some low values are likely to be outliers.
A distribution is right-skewed (or positively skewed) if the values are more spread out on the right. It has a tail pulled toward the right.

What is the relationship between mean, median and mode for a normal distribution?
Find the mean, median and mode of:
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7
Mean is 4.
Median is 4.
Mode is 4.

What is the relationship between mean, median and mode of a left-skewed distribution?
Find the mean, median and mode of:
0, 5, 10, 20, 40, 45, 45, 50, 50, 50, 60, 60, 60, 60, 60, 60, 70, 70, 70, 70, 70, 70, 70, 70
The mean is 51.5.
The median is 60.
The mode is 70.

What is the relationship between mean, median and mode of a right-skewed distribution?
Find the mean, median, and mode of:
20, 20, 20, 20, 20, 20, 20, 20, 30, 30, 30, 30, 30, 30, 45, 45, 45, 50, 50, 60, 70, 90
The mean is 36.1.
The median is 30.
The mode is 20.
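
The three data sets from these slides can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the list names are chosen here for illustration.

```python
from statistics import mean, median, mode

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]
left_skewed = [0, 5, 10, 20, 40, 45, 45, 50, 50, 50] + [60] * 6 + [70] * 8
right_skewed = [20] * 8 + [30] * 6 + [45] * 3 + [50, 50, 60, 70, 90]

for name, data in [("symmetric", symmetric),
                   ("left-skewed", left_skewed),
                   ("right-skewed", right_skewed)]:
    # Report mean (rounded to one decimal), median and mode for each set.
    print(name, round(mean(data), 1), median(data), mode(data))
```

In the left-skewed set the order is mean < median < mode, and in the right-skewed set it is reversed: the mean is pulled toward the long tail.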

Variation
Variation describes how widely data are spread out about the center of a data set.
How would you expect the variation to differ between times in a 5K city run and a 5K run in a state meet?

Resistant Measure (of center)
Resists influence of extreme observations.
• Mean (average) is not resistant
• Median (midpoint) is resistant
• The mean and median are exactly the same if the distribution is exactly symmetric.
• In a skewed distribution, the mean is farther out in the long tail than the median.

How do outliers affect the median?
Are there any outliers? If so, remove them and find M.
• n = 15
• M new = 34 vs M old = 34
Median is resistant

How do outliers affect the mean?
16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73
Find the mean of the original data.
• Enter data into L1
• STAT: CALC: 1-Var Stats: L1
• Mean = 35.44
Find the mean without the outlier.
• Remove 73 from L1
• STAT: CALC: 1-Var Stats: L1
• Mean = 32.93
We can also find the median with 1-Var Stats. Scroll down the list.
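
For readers without a graphing calculator, the same comparison can be sketched in Python. This is an illustrative equivalent of the keystrokes above, not a reproduction of the calculator's output; the variable names are assumptions.

```python
from statistics import mean, median

home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
without_outlier = [x for x in home_runs if x != 73]   # drop the single extreme value

# The mean shifts noticeably when 73 is removed; the median does not move.
print(round(mean(home_runs), 2), median(home_runs))
print(round(mean(without_outlier), 2), median(without_outlier))
```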

Is Barry Bonds's 73 an outlier?
• IQR = 41 - 25 = 16
• 1.5 x IQR = 1.5 x 16 = 24
• Q3 + 24 = 41 + 24 = 65 (upper cutoff)
• Yes or no?
Yes, Bonds's record-setting year of 73 is an outlier because 73 is above the upper cutoff of 65.

Stem Plots

• In a stemplot, the entries on the left are called stems; and the entries on the right are called leaves. In the example above, the stems are tens (8 represents 80, 9 represents 90, 10 represents 100, and so on); and the leaves are ones. However, the stems and leaves could be other units: millions, thousands, ones, tenths, etc.
• Some stemplots include a key to help the user interpret the display correctly. The key in the stemplot above indicates that a stem of 11 with a leaf of 7 represents an IQ score of 117.
• Looking at the example above, you should be able to quickly describe the distribution of IQ scores. Most of the scores are clustered between 90 and 109, with the center falling in the neighborhood of 100. The scores range from a low of 81 (two students have an IQ of 81) to a high of 151. The high score of 151 might be classified as an outlier.
• Note: In the example above, the stems and leaves are explicitly labeled for educational purposes. In the real world, however, stemplots usually do not include explicit labels for the stems and leaves.

Stemplots
Another simple graphical display for small data sets is a stemplot. (Also called a stem-and-leaf plot.)
Stemplots give us a quick picture of the distribution while including the actual numerical values.

Stem & Leaf Plots Review
Given the following values, draw a stem and leaf plot
20, 32, 45, 44, 26, 37, 51, 29, 34, 32, 25, 41, 56
Ages | Occurrences
-----+------------
2    | 0, 6, 9, 5
3    | 2, 7, 4, 2
4    | 5, 4, 1
5    | 1, 6
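
A stem-and-leaf plot like the one above can be built programmatically. The sketch below assumes non-negative whole numbers with the last digit as the leaf; the function name `stemplot` is hypothetical. Unlike the hand-drawn version, it sorts the leaves within each stem.

```python
from collections import defaultdict

def stemplot(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

ages = [20, 32, 45, 44, 26, 37, 51, 29, 34, 32, 25, 41, 56]
plot = stemplot(ages)

# Print every stem in the range, including stems that have no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```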

Stem-and-leaf plots
• Summarizes quantitative variables
• Separate each observation into a stem (first part of #) and a leaf (last digit)
• Write each leaf to the right of its stem; order leaves if desired
[Figure: Sodium in Cereals]

Example: Who's Taller?
• Which gender is taller, males or females? A sample of 14-year-olds from the United Kingdom was randomly selected using the website.
• Here are the heights of the students (in cm):
• Construct stemplots. Report back.
• Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174, 165, 165, 183, 180
• Female: 160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166


https://www.khanacademy.org/math/ap-statistics/quantitative-data-ap/histograms-stem-leaf/e/reading_stem_and_leaf_plots

Boxplot

• A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence, the name), which goes from the first quartile (Q1) to the third quartile (Q3).
• Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier.
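
The pieces of a boxplot described above can be computed before drawing anything. The following is a sketch, assuming quartiles are medians of the lower and upper halves and outliers are defined by the 1.5 x IQR rule; the function name `boxplot_summary` and the dictionary keys are hypothetical, and the data are the home-run counts from this deck.

```python
from statistics import median

def boxplot_summary(values):
    """Return the box, whisker ends and outliers for a boxplot."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1, q2, q3 = median(data[:half]), median(data), median(data[n - half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # outlier cutoffs
    inside = [x for x in data if low <= x <= high]  # non-outliers
    return {
        "whisker_low": inside[0],    # smallest non-outlier
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": [x for x in data if x < low or x > high],
    }

home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
summary = boxplot_summary(home_runs)
print(summary)
```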

How to Interpret a Boxplot
• Here is how to read a boxplot. The median is indicated by the vertical line that runs down the center of the box. In the boxplot above, the median is about 400.
• Additionally, boxplots display two common measures of the variability or spread in a data set.

• Range. If you are interested in the spread of the data, it is represented on a boxplot by the horizontal distance between the smallest value and the largest value, including any outliers. In the boxplot above, data values range from about -700 (the smallest outlier) to 1700 (the largest outlier), so the range is 2400. If you ignore outliers, the range is illustrated by the distance between the opposite ends of the whiskers: about 1000 in the boxplot above.
• Interquartile range (IQR). The middle half of a data set falls within the interquartile range. In a boxplot, the interquartile range is represented by the width of the box (Q3 minus Q1). In the chart above, the interquartile range is equal to 600 minus 300, or about 300.

Stem (and leaf) plot
Graphs quantitative data and is used for small or medium-sized data sets. (Very common graph.)
• Tip for organizing the data: First enter the data into a list. Then sort.

How to construct a stemplot
1. Separate each observation into a stem consisting of all but the rightmost digit, and a leaf, the rightmost digit (1-digit leaves).
2. Write stems vertically in increasing order and draw a vertical line to their right; write each leaf to the right of its stem.
3. If the data were not entered into a calculator list, write the stems again and rearrange the leaves in increasing order out from the stem (you don't need to do this if you sort with the calculator).

How to construct a stemplot, cont.
4. Title the graph and add a key describing what the stems and leaves represent
Key: 3|5 means the soft drink contains 35 mg of caffeine per 8-ounce serving.

Split-stem and leaf plot
• When splitting stems, be sure each stem is assigned an equal number of possible leaf digits.
• Notice graph "a" is very "skyscraper-ish".
• So split the stems.

Back-to-back stemplots: good for comparing two sets of data
Remember: 5 stems is a good minimum
• Advantages: easy to construct; display actual data values
• Disadvantages: do not work well with large data sets

Stem Plots
• A stemplot gives a quick picture of the shape of a distribution while including the numerical values
- Separate each observation into a stem and a leaf, e.g. 14 g -> 1|4, 256 -> 25|6, 32.9 oz -> 32|9
- Write stems in a vertical column and draw a vertical line to the right of the column
- Write each leaf to the right of its stem
• Note:
- Stemplots do not work well for large data sets
- Not available on calculator

Example 1
The ages (measured by last birthday) of the employees of Dewey, Cheatum and Howe are listed below.
a) Construct a stem graph of the ages
b) Construct a back-to-back stemplot comparing the offices
c) Construct a histogram of the ages
22 31 21 49 26 42
42 30 28 31 39 39
20 37 32 36 35 33
45 47 49 38 28 48
Office A
Office B

Histograms
• Like a bar chart, a histogram is made up of columns plotted on a graph. Usually, there is no space between adjacent columns. Here is how to read a histogram.
• The columns are positioned over a label that represents a quantitative variable.
• The column label can be a single value or a range of values.
• The height of the column indicates the size of the group defined by the column label.

Histogram

Histograms
Histograms break the range of data values into classes and display the count or % of observations that fall into each class
Divide the range of data into equal-width classes
Count the observations in each class: "frequency"
Draw bars to represent classes: height = frequency
Bars should touch (unlike bar graphs).

Histograms
Quantitative variables often take many values. A graph of the distribution may be clearer if nearby values are grouped together.
The most common graph of the distribution of one quantitative variable is a histogram.

Histogram versus Bar Chart
            Histogram      Bar Chart
variables   quantitative   categorical
bar space   no space       spaces between

Example
Below are times obtained from a mail-order company's shipping records concerning time from receipt of order to delivery (in days) for items from their catalogue.
a) Construct a histogram of the delivery times
3 7 10 5 14 12
6 2 9 22 25 11
5 7 12 10 22 23
14 8 5 4 7 13
27 31 13 21 6 8
3 10 19 12 11 8

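Example: Histogram
n = 36
k = √36 = 6
w = (31 - 2)/6 = 29/6 ≈ 4.83, rounded up to a class width of 5
K   range   Nr
1   2-6     9
2   7-11    12
3   12-16   7
4   17-21   2
5   22-26   4
6   27-31   2
[Figure: histogram of Days to Delivery (class boundaries 2, 7, 12, 17, 22, 27, 32) against Frequency (axis from 2 to 12)]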
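
The class counts in this example can be reproduced with a short script. This is a sketch of the square-root rule for choosing the number of classes as used on the slide; rounding the width up to the next whole number is an assumption inferred from the class ranges, and the variable names are illustrative.

```python
import math

delivery_days = [
    3, 7, 10, 5, 14, 12,
    6, 2, 9, 22, 25, 11,
    5, 7, 12, 10, 22, 23,
    14, 8, 5, 4, 7, 13,
    27, 31, 13, 21, 6, 8,
    3, 10, 19, 12, 11, 8,
]

n = len(delivery_days)
k = round(math.sqrt(n))                                  # number of classes
raw_width = (max(delivery_days) - min(delivery_days)) / k
width = math.ceil(raw_width)                             # round the width up
start = min(delivery_days)

counts = [0] * k
for x in delivery_days:
    counts[(x - start) // width] += 1                    # index of the class containing x

for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo}-{lo + width - 1}: {c}")
```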

Displaying Quantitative Variables
Histograms: The most common graph for the distribution of one quantitative variable. No spaces between groups.

How to construct a histogram
1. Divide the range of data into classes of equal width and count the number of observations in each class; be sure to specify classes precisely so that each observation falls into exactly one class.
2. Label and scale the axes and title the graph!
3. Draw a bar that represents the count in each class.

Remember:
• Leave no space between bars.
• Add a break-in-scale symbol (//) on an axis that does not start at 0.
• 5 classes is a good minimum.
• Histogram Tips (page 39)

Histograms
Graph that uses bars to portray frequencies or relative frequencies of possible outcomes for a quantitative variable

Constructing a Histogram
1. Divide into intervals of equal width
2. Count # of observations in each interval
[Figure: Sodium in Cereals]

Constructing a Histogram
3. Label endpoints of intervals on horizontal axis
4. Draw a bar over each value or interval with height equal to its frequency (or percentage)
5. Label and title
[Figure: Sodium in Cereals]


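Example 1: Histogram
n = 24
k = √24 ≈ 4.9 so pick k = 4
w = (49 - 20)/4 = 29/4 = 7.25, rounded up to a class width of 8
K   range   Nr
1   20-27   4
2   28-35   8
3   36-43   7
4   44-51   5
[Figure: histogram of Ages (classes 20-27, 28-35, 36-43, 44-51) against Numbers of Personnel (axis from 2 to 8)]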

Bar Chart
• A bar chart is made up of columns plotted on a graph. Here is how to read a bar chart.
• The columns are positioned over a label that represents a categorical variable.
• The height of the column indicates the size of the group defined by the column label.

Skewed to the right (positively skewed)
• Right side of histogram extends much farther out than left side
• pulled to the right
• long tail to the right

154 109 137 115 152 140 154 178 101
103 126 126 137 165 165 129 200 148
Make a stemplot of these data.
Enter into L2 and sort, then use:
• Stems from 10-20
• Leaves: ones; so 154 looks like 15|4

10 | 1 3 9
11 | 5
12 | 6 6 9
13 | 7 7
14 | 0 8
15 | 2 4 4
16 | 5 5
17 | 8
18 |
19 |
20 | 0

Are there any potential outliers?
About where is the center of the distribution?
Potential outliers: 200
Center: about 138.5
Median: mean of the 9th and 10th observations (n = 18)

Shape
Describe shape:
The overall shape of the distribution is irregular, as often happens when only a few observations are available.

Spread
What is the spread of the scores (ignoring any outliers)?
178 - 101 = 77, or from 101 to 178

Center
b. Find the mean score from the formula for the mean.
By hand:
Sum of the 18 observations / 18 = 2539/18 = 141.06
Calculator keystrokes:
2nd STAT (list): MATH: MEAN(L2) or
STAT: CALC: 1-Var Stats (L2) = 141.06

c. Find the median of these scores.
Which is larger: the median or the mean?
Median = average of the 9th and 10th scores = 138.5
vs. mean = 141.06
Explain why.
The mean is larger than the median because of the outlier at 200, which pulls the mean toward the long right tail of the distribution.
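
The pull of the outlier on the mean can be seen by recomputing both measures with and without the value 200. This is an illustrative sketch only; dropping the outlier is done here to show the effect and is not a step in the original exercise.

```python
from statistics import mean, median

scores = [154, 109, 137, 115, 152, 140, 154, 178, 101,
          103, 126, 126, 137, 165, 165, 129, 200, 148]

print(sum(scores), round(mean(scores), 2), median(scores))

# Illustration: remove the potential outlier and compare.
trimmed = [s for s in scores if s != 200]
print(round(mean(trimmed), 2), median(trimmed))
```

Removing 200 moves the mean by several points but the median by only 1.5, which is what resistance means in practice.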

Describing Distributions
Shape
symmetric, skewed (left or right), multi-modal
Outliers
Extreme values, unlike the rest of the sample, which with no special treatment could lead to over-estimates.
Do they exist, how many, and on which ends?
Center
appropriate measure (mean, median, or mode)
Spread
appropriate measure (standard deviation or IQR)

Describing Shape
When you describe a distribution's shape, concentrate on the main features.
Look for rough symmetry or clear skewness.

Statistics

Central tendency
• Seeks to provide a single value that best represents a distribution
• Typical measures are
- mode
- median
- mean

Mode
• the most frequently occurring score value
• corresponds to the highest point on the frequency distribution
For a given sample N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39

Mode
• The mode is not sensitive to extreme scores.
For a given sample N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 50
The mode = 39

Mode
• a distribution may have more than one mode
For a given sample N = 16:
34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40
The modes = 35 and 39

Mode
• there may be no unique mode, as in the case of a rectangular distribution
For a given sample N = 16:
33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40
No unique mode
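
The three situations (one mode, two modes, no unique mode) can be told apart with `statistics.multimode`, which returns every value tied for the highest frequency. This is a minimal sketch; the list names are chosen here for illustration.

```python
from statistics import multimode

one_mode = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
two_modes = [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40]
rectangular = [33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40]

print(multimode(one_mode))      # a single most frequent value
print(multimode(two_modes))     # two values tied for most frequent
print(multimode(rectangular))   # every value ties, so there is no unique mode
```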

Median
• the score value that cuts the distribution in half (the "middle" score)
• 50th percentile
For N = 15 the median is the eighth score = 37

Median
For N = 16 the median is the average of the eighth and ninth scores = 37.5

Mean
• this is what people usually have in mind when they say "average"
• the sum of the scores divided by the number of scores
Changing the value of a single score may not affect the mode or median, but it will affect the mean.
For a population: $\mu = \frac{\sum X}{N}$
For a sample: $\bar{X} = \frac{\sum X}{N}$

Mean
$\bar{X}$ = 7.07
In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter.
In order for the mean to be meaningful, the variable of interest must be measured on an interval scale.
[Figure: frequency chart of Score for the categories Buddhist, Protestant, Catholic, Jewish, Muslim, with $\bar{X}$ = 2.4]

Mean
The mean is sensitive to extreme scores and is appropriate for more symmetrical distributions.
$\bar{X}$ = 36.8
$\bar{X}$ = 36.5
$\bar{X}$ = 93.2

Symmetry
• a symmetrical distribution exhibits no skewness
• in a symmetrical distribution the Mean = Median = Mode

Skewed distributions
• Skewness refers to the asymmetry of the distribution
• A positively skewed distribution is asymmetrical and points in the positive direction.
Mode = $70,000
Median = $88,700
Mean = $93,600
• mode < median < mean

Skewed distributions
• A negatively skewed distribution
• mode > median > mean

Measures of central tendency
Mode
+ quick & easy to compute
+ useful for nominal data
- poor sampling stability
Median
+ not affected by extreme scores
- somewhat poor sampling stability
Mean
+ sampling stability
+ related to variance
- inappropriate for discrete data
- affected by skewed distributions

Distributions
• Center: mode, median, mean
• Shape: symmetrical, skewed
• Spread

Measures of Spread
• the dispersion of scores from the center
• a distribution of scores is highly variable if the scores differ wildly from one another
• Three statistics to measure variability
- range
- interquartile range
- variance

Range
• largest score minus the smallest score
• these two distributions have the same range (80) but their spreads look different
• says nothing about how scores vary around the center
• greatly affected by extreme scores (defined by them)

Interquartile range
• Q1 is the "middle" value in the first half of the rank-ordered data set.
• Q2 is the median value in the set.
• Q3 is the "middle" value in the second half of the rank-ordered data set.

Interquartile range
• the distance between the 25th percentile and the 75th percentile
• Q3 - Q1 = 70 - 30 = 40
• Q3 - Q1 = 55 - 45 = 10
• effectively ignores the top and bottom quarters, so extreme scores are not influential
• dismisses 50% of the distribution

Deviation measures
• To see how 'deviant' the distribution is relative to another, we could sum these deviation scores
• But this would leave us with a big fat zero
Name        Score   Deviation
Amy         10      -40
Theo        20      -30
Max         30      -20
Henry       40      -10
Leticia     50      0
Charlotte   60      10
Pedro       70      20
Tricia      80      30
Lulu        90      40
SUM                 0

Deviation measures
So we use squared deviations from the mean
Name        Score   Deviation   Sq. Deviation
Amy         10      -40         1600
Theo        20      -30         900
Max         30      -20         400
Henry       40      -10         100
Leticia     50      0           0
Charlotte   60      10          100
Pedro       70      20          400
Tricia      80      30          900
Lulu        90      40          1600
SUM                 0           6000
This is the sum of squares (SS)
SS = $\sum (X - \bar{X})^2$

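Variance
We take the "average" squared deviation from the mean and call it VARIANCE.
For a population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
For a sample: $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
(The sample formula divides by N - 1 to correct for the fact that sample variance tends to underestimate population variance.)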

Variance
1. Find the mean.
2. Subtract the mean from every score.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the SS by N or N - 1.
Name        Score   Dev'n   Sq. Dev.
Amy         10      -40     1600
Theo        20      -30     900
Max         30      -20     400
Henry       40      -10     100
Leticia     50      0       0
Charlotte   60      10      100
Pedro       70      20      400
Tricia      80      30      900
Lulu        90      40      1600
SUM                 0       6000
6000/8 = 750
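
The five steps can be followed literally in code. This is a sketch using the nine scores from the table; the variable names are illustrative, and the standard-library functions `variance` and `pvariance` are included only as a cross-check of the hand calculation.

```python
import math
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)

mean = sum(scores) / n                        # step 1: find the mean
deviations = [x - mean for x in scores]       # step 2: subtract the mean
squared = [d ** 2 for d in deviations]        # step 3: square the deviations
ss = sum(squared)                             # step 4: sum of squares (SS)
sample_variance = ss / (n - 1)                # step 5: divide by N - 1 (sample)
population_variance = ss / n                  #         or by N (population)
sample_sd = math.sqrt(sample_variance)        # standard deviation

print(ss, sample_variance, round(population_variance, 2), round(sample_sd, 2))
```

The deviations sum to zero, which is why they are squared before being added.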

Standard deviation
The standard deviation is the square root of the variance
The standard deviation measures spread in the original units of measurement, while the variance does so in units squared.
Variance is good for inferential stats.
Standard deviation is nice for descriptive stats.

Example
N = 28, $\bar{X}$ = 50, $s^2$ = 555.55, s = 23.57
N = 28, $\bar{X}$ = 50, $s^2$ = 140.74, s = 11.86

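Descriptive Statistics: Quick Review
Mean: for a population, $\mu = \frac{\sum X}{N}$; for a sample, $\bar{X} = \frac{\sum X}{N}$
Variance: for a population, $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$; for a sample, $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
Standard deviation: for a population, $\sigma = \sqrt{\sigma^2}$; for a sample, $s = \sqrt{s^2}$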

Exercise
• Treat this little distribution as a sample and calculate:
- Mode, median, mean
- Range, variance, standard deviation

Unusual Features
• Gaps
• Outliers

Gaps
• Gaps refer to areas of a distribution where there are no observations. The first figure below has a gap; there are no observations in the middle of the distribution.

Outliers
• Sometimes, distributions are characterized by extreme values that differ greatly from the other observations. These extreme values are called outliers. The second figure below illustrates a distribution with an outlier. Except for one lonely observation (the outlier on the extreme right), all of the observations fall between 0 and 4. As a "rule of thumb", an extreme value is often considered to be an outlier if it is at least 1.5 interquartile ranges below the first quartile (Q1), or at least 1.5 interquartile ranges above the third quartile (Q3).

Dot Plot Example
Compared to other types of graphic display, dot plots are used most often to plot frequency counts within a small number of categories, usually with small sets of data.
• Here is an example to show what a dot plot looks like and how to interpret it. Suppose 30 first graders are asked to pick their favorite color. Their choices can be summarized in a dot plot, as shown below.

Dot Plot Example
[Figure: dot plot of the favorite colors chosen by 30 first graders]

• Each dot represents one student, and the number of dots in a column represents the number of first graders who selected the color associated with that column. For example, Red was the most popular color (selected by 9 students), followed by Blue (selected by 7 students). Selected by only 1 student, Indigo was the least popular color.
• In this example, note that the category (color) is a qualitative variable; so it is not appropriate to talk about the symmetry or skewness of this dot plot. The dot plot in the next section uses a quantitative variable, so we will illustrate skewness and symmetry of dot plots in the next section.
