Business Statistics for Managers with SPSS[1].pptx
profgnagarajan
90 slides
Sep 21, 2024
About This Presentation
BSM with SPSS
Size: 973.98 KB
Slide Content
Business Statistics for Managers
Dr. D. Pradeep Kumar
Statistics for Business and Economics, Chapter 1: Statistics, Data, & Statistical Thinking
Contents
The Science of Statistics; Types of Statistical Applications in Business; Fundamental Elements of Statistics; Processes; Types of Data; Collecting Data; The Role of Statistics in Managerial Decision Making
Learning Objectives
Introduce the field of statistics; demonstrate how statistics applies to business; establish the link between statistics and data; identify the different types of data and data-collection methods; differentiate between population and sample data; differentiate between descriptive and inferential statistics.
What Is Statistics? Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.
1.2 Types of Statistical Applications in Business
Application Areas
Economics: forecasting, demographics
Sports: individual and team performance
Engineering: construction, materials
Business: consumer preferences, financial trends
Statistics: Two Processes
1. Describing sets of data
2. Drawing conclusions (making estimates, decisions, predictions, etc.) about sets of data based on sampling
Descriptive Statistics
Involves: collecting data, presenting data, characterizing data
Purpose: describe data
[Figure: example summaries such as $\bar{X} = 30.5$ and $S^2 = 113$, and a chart of values by quarter, Q1 to Q4]
Inferential Statistics
Involves: estimation, hypothesis testing
Purpose: make decisions about population characteristics
[Figure: a sample drawn from a population whose characteristics are unknown ("Population?")]
1.3 Fundamental Elements of Statistics
Fundamental Elements
Experimental unit: the object upon which we collect data
Population: all items of interest
Variable: a characteristic of an individual experimental unit
Sample: a subset of the units of a population
Mnemonic: P in Population and Parameter; S in Sample and Statistic
Fundamental Elements
Statistical inference: an estimate, prediction, or generalization about a population based on information contained in a sample
Measure of reliability: a statement (usually quantified) about the degree of uncertainty associated with a statistical inference
Four Elements of Descriptive Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the population or sample units) that are to be investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data
Five Elements of Inferential Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the population units) that are to be investigated
3. The sample of population units
4. The inference about the population based on information contained in the sample
5. A measure of reliability for the inference
1.4 Processes
Process A process is a series of actions or operations that transforms inputs to outputs. A process produces or generates output over time.
Process
A process whose operations or actions are unknown or unspecified is called a black box. Any set of output (objects or numbers) produced by a process is called a sample.
1.5 Types of Data
Types of Data Quantitative data are measurements that are recorded on a naturally occurring numerical scale. Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.
Types of Data
Quantitative data and qualitative data.
Quantitative Data
Measured on a numeric scale. Examples: number of defective items in a lot; salaries of CEOs of oil companies; ages of employees at a company.
Qualitative Data
Classified into categories. Examples: college major of each student in a class; gender of each employee at a company; method of payment (cash, check, credit card).
1.6 Collecting Data
Obtaining Data
Data from a published source; data from a designed experiment; data from a survey; data collected observationally.
Obtaining Data
Published source: book, journal, newspaper, Web site.
Designed experiment: the researcher exerts strict control over the units.
Survey: a group of people is surveyed and their responses are recorded.
Observational study: units are observed in their natural setting and the variables of interest are recorded.
Samples A representative sample exhibits characteristics typical of those possessed by the population of interest. A random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.
Random Sample Every sample of size n has an equal chance of selection.
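A minimal Python sketch of drawing a simple random sample, assuming the population can be listed as a sampling frame. The helper name `simple_random_sample` and the frame of 500 employee IDs are hypothetical, chosen only for illustration. `random.Random.sample` draws without replacement, so every subset of size n is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Hypothetical helper: draw n units without replacement from a sampling frame.

    Every subset of size n has the same chance of being selected.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # separate generator; a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: employee IDs 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Seeding is optional; it only makes the draw repeatable, which helps when an analysis must be audited.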
1.7 The Role of Statistics in Managerial Decision Making
Statistical Thinking Statistical thinking involves applying rational thought and the science of statistics to critically assess data and inferences. Fundamental to the thought process is that variation exists in populations and process data.
Nonrandom Sample Errors
Selection bias results when a subset of the experimental units in the population is excluded so that these units have no chance of being selected for the sample.
Nonresponse bias results when the researchers conducting a survey or study are unable to obtain data on all experimental units selected for the sample.
Measurement error refers to inaccuracies in the values of the data recorded. In surveys, the error may be due to ambiguous or leading questions and the interviewer's effect on the respondent.
Key Ideas: Types of Statistical Applications (Descriptive)
1. Identify population and sample (collection of experimental units)
2. Identify variable(s)
3. Collect data
4. Describe data
Key Ideas: Types of Statistical Applications (Inferential)
1. Identify population (collection of all experimental units)
2. Identify variable(s)
3. Collect sample data (subset of population)
4. Inference about population based on sample
5. Measure of reliability for inference
Key Ideas: Types of Data
1. Quantitative (numerical in nature)
2. Qualitative (categorical in nature)
Key Ideas: Data-Collection Methods
1. Observational
2. Published source
3. Survey
4. Designed experiment
The mean, median, and mode are measures of central tendency used to identify the central position of a data set. Which one is appropriate depends on the type of data and the level of measurement:
Nominal data: the mode is the only appropriate measure of central tendency. The mode is the most frequent value in the data set.
Ordinal data: the median or the mode is usually the best choice. The median is the middle value of the sorted data set.
Interval or ratio data: the mean, median, and mode can all be used. The mean is the arithmetic average.
Skewed distribution: the median is often the best measure of central tendency.
Symmetric, unimodal distribution of continuous data: the mean, median, and mode are all equal.
Data with extreme scores: the median is preferred, because a single outlier can have a large effect on the mean.
Data with undetermined (open-ended) values: the median is preferred, because it can still be located when the exact values at the extremes are unknown.
The mean is the most commonly used measure of central tendency, but the best measure depends on the type of data.
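A small Python sketch of these rules using the standard-library `statistics` module. The salary and payment-method values are hypothetical, invented for illustration: one extreme salary pulls the mean well above the median, while for nominal data only the mode is meaningful.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands); 400 is an extreme score
salaries = [42, 45, 47, 47, 50, 52, 55, 400]
salary_mean = mean(salaries)      # pulled upward by the outlier
salary_median = median(salaries)  # average of the two middle sorted values
salary_mode = mode(salaries)      # most frequent value

# Hypothetical nominal data: only the mode is meaningful here
payments = ["cash", "credit card", "credit card", "check", "cash", "credit card"]
payment_mode = mode(payments)

print(salary_mean, salary_median, salary_mode, payment_mode)
```

Here the mean is 92.25, while the median is 48.5 and the mode is 47, so the median better describes a typical salary. The modal payment method is "credit card".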
Measures of Central Tendency
Greg C Elvers, Ph.D.
Measures of Central Tendency
A measure of central tendency is a descriptive statistic that describes the average, or typical, value of a set of scores. There are three common measures of central tendency: the mode, the median, and the mean.
The Mode
The mode is the score that occurs most frequently in a set of data.
Bimodal Distributions
When a distribution has two "modes," it is called bimodal.
Multimodal Distributions
If a distribution has more than two "modes," it is called multimodal.
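A minimal Python sketch using `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. The helper `classify_modality` is hypothetical, and it counts only exact ties: in practice a distribution with two clear peaks of unequal height is often also called bimodal, and if every value occurs once, `multimode` returns all of them.

```python
from statistics import multimode

def classify_modality(scores):
    """Hypothetical helper: modes plus a label based on ties for the top frequency."""
    modes = multimode(scores)  # every value that occurs with the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

print(classify_modality([1, 2, 2, 3]))
print(classify_modality([1, 1, 2, 3, 3]))
print(classify_modality([1, 1, 2, 2, 3, 3, 4]))
```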
When To Use the Mode
The mode is not a very useful measure of central tendency. It is insensitive to large changes in the data set; that is, two data sets that are very different from each other can have the same mode.
When To Use the Mode
The mode is primarily used with nominally scaled data. It is the only measure of central tendency that is appropriate for nominally scaled data.
The Median
The median is simply another name for the 50th percentile. It is the score in the middle: half of the scores are larger than the median and half of the scores are smaller than the median.
How To Calculate the Median
Conceptually, it is easy to calculate the median, but many minor problems can occur, so it is best to let a computer do it.
1. Sort the data from highest to lowest.
2. Find the position of the middle score: middle = (N + 1) / 2.
3. If N, the number of scores, is even, the median is the average of the middle two scores.
Median Example
What is the median of the following scores: 10 8 14 15 7 3 3 8 12 10 9?
Sort the scores: 15 14 12 10 10 9 8 8 7 3 3.
Determine the middle position: middle = (N + 1) / 2 = (11 + 1) / 2 = 6.
The 6th score is the median: 9.
Median Example
What is the median of the following scores: 24 18 19 42 16 12?
Sort the scores: 42 24 19 18 16 12.
Determine the middle position: middle = (N + 1) / 2 = (6 + 1) / 2 = 3.5.
Because N is even, the median is the average of the 3rd and 4th scores: (19 + 18) / 2 = 18.5.
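The (N + 1) / 2 procedure can be sketched in Python as below; `median_by_position` is a hypothetical helper that sorts from highest to lowest as in the steps above. For real work, `statistics.median` gives the same answer.

```python
def median_by_position(scores):
    """Hypothetical helper: median via the (N + 1) / 2 rule, sorted highest to lowest."""
    if not scores:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    if n % 2 == 1:
        middle = (n + 1) // 2  # 1-based position of the middle score
        return ordered[middle - 1]
    # Even N: (N + 1) / 2 falls between positions N/2 and N/2 + 1
    lower = n // 2
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9]))
print(median_by_position([24, 18, 19, 42, 16, 12]))
```

These reproduce the two examples: 9 and 18.5.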
Median Example for a Discrete Frequency Distribution

| x | f | CF |
|---|---|---|
| 1 | 8 | 8 |
| 2 | 10 | 18 |
| 3 | 11 | 29 |
| 4 | 16 | 45 |
| 5 | 20 | 65 |
| 6 | 25 | 90 |
| 7 | 15 | 105 |
| 8 | 9 | 114 |
| 9 | 6 | 120 |

$N = \sum f_i = 120$, so $N/2 = 60$. The cumulative frequency just greater than 60 is 65, which belongs to $x = 5$. Median = 5.

Median for a Continuous Frequency Distribution

| Wages | No. of workers (f) | CF |
|---|---|---|
| 2000-3000 | 3 | 3 |
| 3000-4000 | 5 | 8 |
| 4000-5000 | 20 | 28 |
| 5000-6000 | 10 | 38 |
| 6000-7000 | 5 | 43 |

$N = \sum f_i = 43$, so $N/2 = 21.5$. The cumulative frequency just greater than 21.5 is 28, so the median class is 4000-5000.

$$\text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - C\right)$$

where L = lower limit of the median class, f = frequency of the median class, h = magnitude (width) of the median class, and C = cumulative frequency of the class preceding the median class. Here L = 4000, f = 20, h = 1000, and C = 8:

$$\text{Median} = 4000 + \frac{1000}{20}(21.5 - 8) = 4000 + 50 \times 13.5 = 4675$$
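The two median procedures above can be sketched in Python as follows. The helper names are hypothetical, `median_discrete` follows the "cumulative frequency just greater than N/2" rule, and `median_grouped` assumes contiguous classes of equal width h.

```python
from itertools import accumulate

def median_discrete(values, freqs):
    """Hypothetical helper: value whose cumulative frequency first exceeds N/2."""
    half = sum(freqs) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf > half:
            return x
    raise ValueError("frequencies must be positive")

def median_grouped(lower_limits, width, freqs):
    """Hypothetical helper: interpolated median L + (h / f) * (N/2 - C)."""
    half = sum(freqs) / 2
    cf_before = 0  # C: cumulative frequency of the classes before the median class
    for lower, f in zip(lower_limits, freqs):
        if cf_before + f > half:
            return lower + (width / f) * (half - cf_before)
        cf_before += f
    raise ValueError("frequencies must be positive")

print(median_discrete(range(1, 10), [8, 10, 11, 16, 20, 25, 15, 9, 6]))
print(median_grouped([2000, 3000, 4000, 5000, 6000], 1000, [3, 5, 20, 10, 5]))
```

These return 5 and 4675.0, matching the worked examples.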
When To Use the Median
The median is often used when the distribution of scores is either positively or negatively skewed. The few really large scores (positively skewed) or really small scores (negatively skewed) will not overly influence the median.
The Mean
The mean is:
the arithmetic average of all the scores, $(\sum X)/N$;
the number $m$ that makes $\sum (X - m)$ equal to 0;
the number $m$ that makes $\sum (X - m)^2$ a minimum.
The mean of a population is represented by the Greek letter $\mu$; the mean of a sample is represented by $\bar{X}$.
Calculating the Mean
Calculate the mean of the following data: 1 5 4 3 2.
Sum the scores ($\sum X$): 1 + 5 + 4 + 3 + 2 = 15.
Divide the sum ($\sum X = 15$) by the number of scores ($N = 5$): 15 / 5 = 3.
Mean = $\bar{X}$ = 3.
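A short Python check of the calculation and of two defining properties of the mean, using the slide's data 1 5 4 3 2. The helper names are hypothetical, and the grid of candidate values is only a spot check of the least-squares property, not a proof.

```python
def arithmetic_mean(scores):
    """Hypothetical helper: (sum X) / N."""
    return sum(scores) / len(scores)

scores = [1, 5, 4, 3, 2]
m = arithmetic_mean(scores)  # 15 / 5 = 3.0

# Property 1: deviations from the mean sum to zero
deviation_sum = sum(x - m for x in scores)

# Property 2: the sum of squared deviations is smallest at the mean
def sum_squared_deviations(c):
    return sum((x - c) ** 2 for x in scores)

candidates = [2.0, 2.5, 3.0, 3.5, 4.0]
best = min(candidates, key=sum_squared_deviations)
print(m, deviation_sum, best)
```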
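Calculating the Mean for Discrete Data

| $x_i$ | $f_i$ | $f_i x_i$ |
|---|---|---|
| 1 | 5 | 5 |
| 2 | 9 | 18 |
| 3 | 12 | 36 |
| 4 | 17 | 68 |
| 5 | 14 | 70 |
| 6 | 10 | 60 |
| 7 | 6 | 42 |
| Total | 73 | 299 |

Mean = $\frac{\sum f_i x_i}{\sum f_i} = \frac{299}{73} \approx 4.10$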
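A minimal Python sketch of the frequency-weighted mean for the table above; `frequency_mean` is a hypothetical helper that returns the column totals along with the mean so each step can be checked.

```python
def frequency_mean(values, freqs):
    """Hypothetical helper: mean of a discrete frequency distribution, sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx, total_f, total_fx / total_f

total_fx, total_f, mean_x = frequency_mean(range(1, 8), [5, 9, 12, 17, 14, 10, 6])
print(total_fx, total_f, round(mean_x, 2))
```

This gives totals of 299 and 73 and a mean of about 4.1.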
When To Use the Mean
You should use the mean when the data are interval or ratio scaled and the data are not skewed. Many people will use the mean with ordinally scaled data too. The mean is preferred because it is sensitive to every score: if you change one score in the data set, the mean will change.
Relations Between the Measures of Central Tendency
In symmetrical distributions, the median and mean are equal.
For normal distributions, mean = median = mode.
In positively skewed distributions, the mean is greater than the median.
In negatively skewed distributions, the mean is smaller than the median.
Measures of Dispersion
Greg C Elvers, Ph.D.
Definition
Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other. The more similar the scores are to each other, the lower the measure of dispersion will be. The less similar the scores are to each other, the higher the measure of dispersion will be. In general, the more spread out a distribution is, the larger the measure of dispersion will be.
Measures of Dispersion
Which of the distributions of scores has the larger dispersion? The upper distribution has more dispersion because the scores are more spread out; that is, they are less similar to each other.
Measures of Dispersion
There are three main measures of dispersion: the range, the semi-interquartile range (SIR), and variance/standard deviation.
The Range
The range is defined as the difference between the largest score and the smallest score in the set of data, $X_L - X_S$.
What is the range of the following data: 4 8 1 6 6 2 9 3 6 9? The largest score ($X_L$) is 9 and the smallest score ($X_S$) is 1, so the range is $X_L - X_S = 9 - 1 = 8$.
When To Use the Range
The range is used when you have ordinal data or are presenting your results to people with little or no knowledge of statistics. The range is rarely used in scientific work because it is fairly insensitive: it depends on only two scores in the set of data, $X_L$ and $X_S$. Two very different sets of data can have the same range: 1 1 1 1 9 vs. 1 3 5 7 9.
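The range and its insensitivity can be shown in a few lines of Python; `score_range` is a hypothetical helper name.

```python
def score_range(scores):
    """Hypothetical helper: largest score minus smallest score (X_L - X_S)."""
    return max(scores) - min(scores)

print(score_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))
# Two very different data sets with the same range
print(score_range([1, 1, 1, 1, 9]), score_range([1, 3, 5, 7, 9]))
```

All three calls return 8, even though the last two data sets are spread very differently.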
The Semi-Interquartile Range
The semi-interquartile range (SIR) is defined as the difference of the first and third quartiles divided by two. The first quartile ($Q_1$) is the 25th percentile; the third quartile ($Q_3$) is the 75th percentile.
$SIR = (Q_3 - Q_1) / 2$
SIR Example
What is the SIR for the data to the right? 25% of the scores are below 5, so 5 is the first quartile. 25% of the scores are above 25, so 25 is the third quartile.
$SIR = (Q_3 - Q_1) / 2 = (25 - 5) / 2 = 10$
When To Use the SIR
The SIR is often used with skewed data, as it is insensitive to extreme scores.
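A minimal Python sketch of the SIR using `statistics.quantiles` (Python 3.8+). The scores 1 to 9 are hypothetical, since the slide's own data were shown only in a figure. Quartile conventions differ between textbooks and software, so the same data can give slightly different quartiles; the `method` argument shows this.

```python
from statistics import quantiles

def semi_interquartile_range(scores, method="exclusive"):
    """Hypothetical helper: SIR = (Q3 - Q1) / 2, quartiles from statistics.quantiles."""
    q1, _, q3 = quantiles(scores, n=4, method=method)
    return (q3 - q1) / 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical scores
print(semi_interquartile_range(data))                      # exclusive: Q1 = 2.5, Q3 = 7.5
print(semi_interquartile_range(data, method="inclusive"))  # inclusive: Q1 = 3, Q3 = 7
```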
Variance
Variance is defined as the average of the squared deviations:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
What Does the Variance Formula Mean?
First, it says to subtract the mean from each of the scores. This difference is called a deviate or a deviation score. The deviate tells us how far a given score is from the typical, or average, score. Thus, the deviate is a measure of dispersion for a given score.
What Does the Variance Formula Mean?
Why can't we simply take the average of the deviates? That is, why isn't variance defined as:
$$\frac{\sum (X - \mu)}{N}$$
This is not the formula for variance!
What Does the Variance Formula Mean?
One of the definitions of the mean was that it always makes the sum of the scores minus the mean equal to 0. Thus, the average of the deviates must be 0, since the sum of the deviates must equal 0. To avoid this problem, statisticians square the deviate scores prior to averaging them. Squaring makes every squared deviate positive or zero.
What Does the Variance Formula Mean?
Variance is the mean of the squared deviation scores. The larger the variance is, the more the scores deviate, on average, away from the mean. The smaller the variance is, the less the scores deviate, on average, from the mean.
Standard Deviation
When the deviate scores are squared in variance, their unit of measure is squared as well. For example, if people's weights are measured in pounds, then the variance of the weights would be expressed in pounds² (squared pounds). Since squared units of measure are often awkward to deal with, the square root of variance is often used instead. The standard deviation is the square root of variance.
Standard Deviation
Standard deviation = $\sqrt{\text{variance}}$
Variance = standard deviation$^2$
Computational Formula
When calculating variance, it is often easier to use a computational formula, which is algebraically equivalent to the definitional formula:
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
where $\sigma^2$ is the population variance, X is a score, $\mu$ is the population mean, and N is the number of scores.
Computational Formula Example
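A minimal Python sketch showing that the definitional formula, $\sum (X - \mu)^2 / N$, and the computational formula, $(\sum X^2 - (\sum X)^2 / N) / N$, give the same population variance. The function names are hypothetical, and the scores 1 5 4 3 2 are reused from the mean example.

```python
def variance_definitional(scores):
    """Hypothetical helper: population variance, sum((X - mu)^2) / N."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n

def variance_computational(scores):
    """Hypothetical helper: population variance, (sum(X^2) - (sum X)^2 / N) / N."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / n

scores = [1, 5, 4, 3, 2]
print(variance_definitional(scores), variance_computational(scores))
```

Both print 2.0. With floating-point arithmetic, the computational form can lose precision when the scores are large relative to their spread, so the definitional form is usually safer in code.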
Variance of a Sample
Because the sample mean is not a perfect estimate of the population mean, the formula for the variance of a sample is slightly different from the formula for the variance of a population:
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
where $s^2$ is the sample variance, X is a score, $\bar{X}$ is the sample mean, and N is the number of scores.
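A short Python sketch contrasting the N − 1 sample formula with the N population formula. `sample_variance` is a hypothetical helper; `statistics.variance` and `statistics.pvariance` are the standard-library sample and population versions, used here as a cross-check on the same hypothetical scores.

```python
import math
from statistics import pvariance, variance

def sample_variance(scores):
    """Hypothetical helper: s^2 = sum((X - X_bar)^2) / (N - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    x_bar = sum(scores) / n
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)

scores = [1, 5, 4, 3, 2]
s2 = sample_variance(scores)
s = math.sqrt(s2)  # sample standard deviation
print(s2, variance(scores), pvariance(scores))
```

For these scores the sample variance is 2.5, while the population variance is 2. Dividing by N − 1 instead of N makes the sample variance slightly larger.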
Measure of Skew
Skew is a measure of the symmetry (or asymmetry) in the distribution of scores.
[Figure panels: positive skew, negative skew, normal (skew = 0)]
Measure of Skew
The following formula can be used to determine skew:
$$\text{skew} = \frac{\sum (X - \bar{X})^3}{N s^3}$$
where s is the standard deviation.
Measure of Skew
If $s^3 < 0$, then the distribution has a negative skew.
If $s^3 > 0$, then the distribution has a positive skew.
If $s^3 = 0$, then the distribution is symmetrical.
The further $s^3$ is from 0, the greater the skew in the distribution.
Kurtosis (Not Related to Halitosis)
Kurtosis measures whether the scores are spread out more or less than they would be in a normal (Gaussian) distribution.
[Figure panels: mesokurtic ($s^4 = 3$), leptokurtic ($s^4 > 3$), platykurtic ($s^4 < 3$)]
Kurtosis
When the distribution is normally distributed, its kurtosis equals 3 and it is said to be mesokurtic. When the distribution is less spread out than normal, its kurtosis is greater than 3 and it is said to be leptokurtic. When the distribution is more spread out than normal, its kurtosis is less than 3 and it is said to be platykurtic.
Measure of Kurtosis
The measure of kurtosis is given by:
$$\text{kurtosis} = \frac{\sum (X - \bar{X})^4}{N s^4}$$
where s is the standard deviation.
$s^2$, $s^3$, and $s^4$
Collectively, the variance ($s^2$), skew ($s^3$), and kurtosis ($s^4$) describe the shape of the distribution.
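A minimal Python sketch of the three shape measures, assuming the standard moment formulas skew = $\sum (X - \bar{X})^3 / (N s^3)$ and kurtosis = $\sum (X - \bar{X})^4 / (N s^4)$, with s taken as the N-denominator standard deviation. That choice of s is an assumption. Packages such as SPSS use bias-adjusted sample formulas and report excess kurtosis (normal = 0), so their values will differ. The helper name and data are hypothetical.

```python
def shape_statistics(scores):
    """Hypothetical helper: (variance, skew, kurtosis) from N-denominator moments."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    var = sum(d ** 2 for d in deviations) / n  # assumption: divide by N, not N - 1
    if var == 0:
        raise ValueError("skew and kurtosis are undefined when all scores are equal")
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    skew = m3 / var ** 1.5    # sum((X - mean)^3) / (N s^3)
    kurtosis = m4 / var ** 2  # sum((X - mean)^4) / (N s^4); about 3 for normal data
    return var, skew, kurtosis

print(shape_statistics([1, 2, 3, 4, 5]))  # hypothetical symmetric data: skew 0, kurtosis < 3
print(shape_statistics([1, 1, 1, 1, 9]))  # hypothetical data with a long right tail
```

For 1 2 3 4 5 this gives a variance of 2, a skew of 0, and a kurtosis of about 1.7 (platykurtic). For 1 1 1 1 9 the skew is about 1.5 (positive skew).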
Karl Pearson's Coefficient of Skewness vs. Bowley's Coefficient of Skewness

| | Karl Pearson's method | Bowley's method |
|---|---|---|
| Based on | Mean, mode, and standard deviation | Quartiles |
| When used | The usual method of finding the coefficient of skewness | Usually used when the quartiles are given |
| Skewness | Mean - Mode | $Q_3 + Q_1 - 2\,\text{Median}$ |
| Coefficient of skewness | $Sk_P = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$ | $Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$ |
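Both coefficients can be sketched in Python as below. The function names and the skewed data set are hypothetical. Bowley's quartiles here come from `statistics.quantiles` with the inclusive method, which is an assumption, since quartile conventions vary.

```python
from statistics import quantiles

def pearson_skewness(mean, mode, sd):
    """Hypothetical helper: Karl Pearson's coefficient, (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def bowley_skewness(scores, method="inclusive"):
    """Hypothetical helper: Bowley's coefficient, (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(scores, n=4, method=method)  # q2 is the median
    if q3 == q1:
        raise ValueError("Bowley's coefficient is undefined when Q3 equals Q1")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(bowley_skewness([1, 2, 3, 4, 5, 6, 7, 8, 9]))     # symmetric: 0.0
print(bowley_skewness([1, 2, 3, 4, 5, 8, 12, 20, 30]))  # long right tail: positive
```

For the second data set, $Q_1 = 3$, the median is 5, and $Q_3 = 12$, so $Sk_B = (12 + 3 - 10)/9 \approx 0.56$. Bowley's coefficient always lies between -1 and 1.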
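Calculate Karl Pearson's coefficient of skewness for the following data:
X: 20 30 40 50 60 70
f: 8 12 20 10 6 4

| X | f | fX | X² | fX² |
|---|---|---|---|---|
| 20 | 8 | 160 | 400 | 3200 |
| 30 | 12 | 360 | 900 | 10800 |
| 40 | 20 | 800 | 1600 | 32000 |
| 50 | 10 | 500 | 2500 | 25000 |
| 60 | 6 | 360 | 3600 | 21600 |
| 70 | 4 | 280 | 4900 | 19600 |
| Total | 60 | 2460 | | 112200 |

$Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$
Mean = $\frac{\sum fX}{\sum f} = \frac{2460}{60} = 41$
Mode = 40 (the value with the highest frequency, f = 20)
Standard deviation $\sigma = \sqrt{\frac{\sum fX^2}{N} - \text{Mean}^2} = \sqrt{\frac{112200}{60} - 41^2} = \sqrt{1870 - 1681} = \sqrt{189} \approx 13.75$
$Sk_P = \frac{41 - 40}{13.75} \approx 0.07$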
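The worked example can be checked with a short Python sketch. `pearson_from_frequencies` is a hypothetical helper. It assumes a single most frequent value for the mode (ties resolve to the first value) and uses an N-denominator standard deviation, as in the table.

```python
import math

def pearson_from_frequencies(values, freqs):
    """Hypothetical helper: mean, mode, SD, and Pearson's coefficient for a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mode = values[freqs.index(max(freqs))]  # first value with the highest frequency
    # SD with an N denominator: sqrt(sum(f * x^2) / N - mean^2)
    sd = math.sqrt(sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2)
    return mean, mode, sd, (mean - mode) / sd

mean, mode, sd, sk_p = pearson_from_frequencies([20, 30, 40, 50, 60, 70], [8, 12, 20, 10, 6, 4])
print(mean, mode, round(sd, 2), round(sk_p, 2))
```

This reproduces Mean = 41, Mode = 40, σ ≈ 13.75, and $Sk_P \approx 0.07$, a slight positive skew.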