Introduction to statistics and data.pptx

daku3579 43 views 52 slides Sep 03, 2024

About This Presentation

This presentation introduces the main types of data and shows how to gain insights from a given data set.


Slide Content

Introduction to Statistics and Data, by Dr. Peeyush Vats

Scope of statistics
1. Social Sciences
- Manpower planning
- Crime rates
- Income and wealth analysis of society
2. Planning
- Agriculture, industry, textiles, education, etc.
- For example, the five-year plans in India
3. Mathematics
- Statistics is now treated as applied mathematics in modern business analysis
4. Economics
- Family budgeting
- Applied in solving economic problems related to production, consumption, and distribution of products according to income and wealth patterns, wages, prices, profits, individual savings, investments, unemployment, poverty, etc.

Scope of statistics
5. Business Management
- Trend analysis
- Market research and analysis
- Product life cycle
- Marketing
- Finance
- Sales
- Personnel

Data Collection and Data Sources
Data collection is the process of preparing and collecting data. Data are the basic inputs to any decision-making process in business. The purpose of data collection is:
- to obtain information
- to keep on record
- to make decisions about important issues
- to pass information on to others

Classification of data sources

Primary Data
Primary data are collected from the field under the control and supervision of an investigator. They are original data collected for a specific purpose, gathered afresh and for the first time. They are useful for both current and future studies. For example: your own questionnaire.

Demerits of primary data
- Collecting the data requires a lot of time
- Requires a lot of finance
- In some enquiries it is not possible to collect primary data
- Requires a lot of labour

Secondary Data
Secondary data are data gathered and recorded by someone else, prior to and for a purpose other than the current project. In other words, secondary data are data being reused, usually in a different context. Data collected from a source that has already been published in any form are also called secondary data. Using secondary data involves less time, cost, and effort.

Sources of Secondary Data
1. Published printed sources: books, journals/periodicals, magazines/newspapers
2. Published electronic sources: e-journals, general websites, weblogs
3. Unpublished personal records: diaries, letters
4. Government records: census data/population statistics, health records, educational institute records

Merits of Secondary Data
- Use is very convenient
- Saves time and finance
- In some enquiries primary data cannot be collected
- Reliable secondary data are generally available for many investigations

Demerits of Secondary Data
- Very difficult to find sufficiently accurate secondary data
- Very difficult to find secondary data that exactly fulfil the needs of the present investigation
- Extra caution is required when using secondary data
- Not available for all types of enquiries

Qualitative and Quantitative Data
Qualitative data: a categorical measurement expressed not in terms of numbers, but by means of a natural-language description.
Quantitative data: measures of values or counts, expressed as numbers. Some examples of quantitative data are your height, your shoe size, and the length of your fingernails.

Populations
A population is the set of all individuals or events of interest in a particular study. Populations:
- Are generally very large
- Can consist of arbitrary categories of people, objects, and events
- Can include hypothetical or counterfactual events

Samples
It is usually impractical for a researcher to examine every individual in the population. Instead, researchers typically select a small representative group (a sample) from the population and limit their studies to the individuals in the sample. The goal is to use the results obtained from the sample to help answer questions about the population.

Types of Statistics
The study of statistics can be organized in a variety of ways. One of the main ways is to subdivide statistics into two branches: descriptive statistics and inferential statistics.

Descriptive statistics
If a business analyst is using data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive statistics. For example, if an instructor produces statistics to summarize a class’s examination effort and uses those statistics to reach conclusions about that class only, the statistics are descriptive.

Inferential Statistics
If a researcher gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken, the statistics are inferential statistics. The data gathered from the sample are used to infer something about a larger group. One application of inferential statistics is in pharmaceutical research. Some new drugs are expensive to produce, and therefore tests must be limited to small samples of patients. Utilizing inferential statistics, researchers can design experiments with small, randomly selected samples of patients and attempt to reach conclusions and make inferences about the population.

Descriptive Statistics
Descriptive statistics is concerned with organising, summarising, and describing the data of a group.

Descriptive statistics
- Central Tendency: Mean, Median, Mode, Quartile
- Dispersion: Range, Variance, Coefficient of Variation

Central Tendency
Measures of central tendency yield information about the center, or middle part, of a group of numbers: a single number that serves as a representative value around which all the numbers in the set tend to cluster. It is sometimes referred to as a “middle” number of the data. Measures of central tendency include:
- Mean (average)
- Median (middle)
- Mode (most)
- Quartile (four equal parts)

1. Mean
The (arithmetic) mean is the sum of the observations divided by the number of observations. The mean of a sample is denoted by $\bar{x}$ (read “x bar”). The mean of a complete population is denoted by $\mu$ (the lower case Greek letter mu). The mean of n data items $x_1, x_2, \ldots, x_n$ is given by the formula:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

Mean Example 01
Ten students were polled as to the number of siblings in their individual families. The raw data is the following set: {3, 2, 2, 1, 3, 6, 3, 3, 4, 2}. Find the mean number of siblings for the ten students.
Solution:
$$\bar{x} = \frac{3 + 2 + 2 + 1 + 3 + 6 + 3 + 3 + 4 + 2}{10} = \frac{29}{10} = 2.9 \text{ siblings}$$
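The sibling example can be checked with a few lines of Python. This is only a minimal sketch: the helper name `mean` and the variable names are illustrative and do not come from the slides.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Raw data from the example: number of siblings for ten students.
siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]
sibling_mean = mean(siblings)
print(sibling_mean)  # 2.9
```

Python's standard library offers `statistics.mean` for the same calculation; the hand-written version is shown only to mirror the formula.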

Weighted mean
The weighted mean of n numbers $x_1, x_2, \ldots, x_n$ that are weighted by the respective factors $f_1, f_2, \ldots, f_n$ is given by the formula:
$$\bar{w} = \frac{\sum (x \cdot f)}{\sum f}$$

Weighted Mean Example 01
You take three 100-point exams in your statistics class and score 80, 80 and 95. The last exam is much easier than the first two, so your professor has given it less weight. The weights for the three exams are:
- Exam 1: 40 % of your grade. (Note: 40 % as a decimal is 0.4.)
- Exam 2: 40 % of your grade.
- Exam 3: 20 % of your grade.
What is your final weighted average for the class?
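The slide leaves the answer open. As a worked check, assuming the usual weighted-mean formula (sum of each score times its weight, divided by the sum of the weights), the arithmetic comes out as follows:

```latex
\bar{w} = \frac{\sum (x \cdot f)}{\sum f}
        = \frac{80(0.4) + 80(0.4) + 95(0.2)}{0.4 + 0.4 + 0.2}
        = \frac{32 + 32 + 19}{1}
        = 83
```

Because the weights already sum to 1, the denominator does not change the result here.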

Weighted Mean Example 02
Listed below are the grades of a student's semester courses. Calculate the Grade Point Average (GPA).
Course    Grade   Points (x)   Credits (f)   x * f
Math      A       4            5             20
History   B       3            3             9
Health    A       4            2             8
Art       C       2            2             4
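A short Python sketch of the GPA calculation, treating grade points as the values and credits as the weights. The function name `weighted_mean` and the tuple layout are illustrative assumptions, not part of the slides.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * f for x, f in zip(values, weights)) / total_weight


# (course, grade points x, credits f) taken from the table in the example.
courses = [("Math", 4, 5), ("History", 3, 3), ("Health", 4, 2), ("Art", 2, 2)]
points = [x for _, x, _ in courses]
credits = [f for _, _, f in courses]

gpa = weighted_mean(points, credits)
print(round(gpa, 2))  # 3.42
```

The sum of x * f is 41 and the sum of the credits is 12, so the GPA is 41/12, or about 3.42.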

The Mean (Arithmetic Average)
- It is the arithmetic average of the data values:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where $\bar{x}$ is the sample mean.
- The most common measure of central tendency
- Affected by extreme values (outliers)
[Figure: two number lines. First data set (axis 0 to 10): Mean = 5. Second data set, which includes an extreme value (axis 0 to 14): Mean = 6.]

2. Median
- It is an important measure of central tendency.
- In an ordered array, the median is the “middle” number of the data set: there is an equal number of observations below and above it.
- If n is odd, the median is the ((n + 1)/2)th number.
- If n is even, the median is the average of the (n/2)th and ((n/2) + 1)th numbers.
- Not affected by extreme values.
[Figure: two number lines. First data set (axis 0 to 10): Median = 5. Second data set, which includes an extreme value (axis 0 to 14): Median = 5.]

Median
The table contains IQ scores for 10 individuals. Rank the observations, i.e., write them down in order of size, beginning with the smallest:
69, 75, 76, 79, 81, 84, 85, 98, 100, 102
The median is the observation that has as many observations above it as below it in the ranked order.

Median
When n (the total number of observations) is odd: Median = ((n + 1)/2)th observation.
When n is even: Median = average of the (n/2)th and ((n/2) + 1)th observations.
Here n = 10 is even. Ordered data: 69, 75, 76, 79, 81, 84, 85, 98, 100, 102
Median = (81 + 84)/2 = 82.5
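The odd/even rule can be sketched in Python as follows. The function name `median` is illustrative; note that the 1-based positions used on the slide become 0-based list indices in code.

```python
def median(values):
    """Median via the odd/even rule applied to the ranked observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1)/2)th observation, which is index n // 2 (0-based).
        return ordered[mid]
    # n even: average of the (n/2)th and ((n/2) + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


iq_scores = [69, 75, 76, 79, 81, 84, 85, 98, 100, 102]
iq_median = median(iq_scores)
print(iq_median)  # 82.5
```

The standard library's `statistics.median` applies the same rule.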

Median Example 01
Ex 01: Ten students in a math class were polled as to the number of siblings in their individual families and the results were: 3, 2, 2, 1, 1, 6, 3, 3, 4, 2. Find the median number of siblings for the ten students.
Solution: Since n = 10 is even, the median is the average of the (10/2)th and ((10/2) + 1)th values, i.e., it lies between the 5th and 6th values.
Data in order: 1, 1, 2, 2, 2, 3, 3, 3, 4, 6
Median = (2 + 3)/2 = 2.5 siblings

Median Example 02
Ex 02: Nine students in a math class were polled as to the number of siblings in their individual families and the results were: 3, 2, 2, 1, 6, 3, 3, 4, 2. Find the median number of siblings for the nine students.
Solution: Position of the median = (9 + 1)/2 = 5th observation, so the 5th value in the ordered data set is the median.
Data in order: 1, 2, 2, 2, 3, 3, 3, 4, 6
Median = 3 siblings

Median Example 03 – Frequency Distribution
Ex 03: Find the median for the distribution.
Value (x)       1   2   3   4   5
Frequency (f)   4   3   2   4   2
Solution: Since Σf = 15 is odd, the position of the median is (Σf + 1)/2 = (15 + 1)/2 = 8, i.e., the 8th term. Add the frequencies from either side until the sum reaches 8. The 8th term is the median, and its value is 3.

Median Example 04 – Frequency Distribution
Ex 04: Find the median for the distribution.
Value (x)       1   2   3   4   5
Frequency (f)   4   3   2   4   1
Solution: Since Σf = 14 is even, the median is the average of the (Σf/2)th and ((Σf/2) + 1)th terms, i.e., the 7th and 8th terms.
Median = (2 + 3)/2 = 2.5
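The "add the frequencies until you reach the median position" procedure can be sketched in Python. The function name `median_from_frequencies` and the dictionary representation (value mapped to frequency) are assumptions made for illustration.

```python
def median_from_frequencies(freq):
    """Median of a frequency distribution given as {value: frequency}."""
    total = sum(freq.values())
    if total == 0:
        raise ValueError("the distribution has no observations")

    def term(k):
        # Walk the values in ascending order, accumulating frequencies
        # until the running total reaches the k-th term (1-based).
        running = 0
        for value in sorted(freq):
            running += freq[value]
            if running >= k:
                return value

    if total % 2 == 1:
        return term((total + 1) // 2)
    return (term(total // 2) + term(total // 2 + 1)) / 2


ex03 = {1: 4, 2: 3, 3: 2, 4: 4, 5: 2}  # sum of f = 15 (odd)
ex04 = {1: 4, 2: 3, 3: 2, 4: 4, 5: 1}  # sum of f = 14 (even)
print(median_from_frequencies(ex03))  # 3
print(median_from_frequencies(ex04))  # 2.5
```

Accumulating the frequencies avoids expanding the table back into a full list of observations, which matters when the frequencies are large.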

3. Mode
- The mode is the value of the variable that occurs most frequently.
- There may be several modes, or there may be no mode at all.
- Used for either numerical or categorical data.
- Not affected by extreme values.
[Figure: two number lines. First data set (axis 0 to 14): Mode = 9. Second data set (axis 0 to 6): No mode.]

Mode
The mode of a data set is the value that occurs most often. If a distribution has two modes, it is called bimodal.
Example: Ten students in a math class were polled as to the number of siblings in their individual families and the results were: 3, 2, 2, 1, 3, 6, 3, 3, 4, 2. Find the mode for the number of siblings.
Solution: The value 3 occurs most often (four times), so the mode for the number of siblings is 3.
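A small Python sketch for finding the mode, using `collections.Counter` from the standard library. The function name `modes` is illustrative. Returning an empty list when no value repeats is an assumed convention chosen to match the "no mode" case; other conventions exist.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, in ascending order.

    Assumed convention: if no value occurs more than once, there is no
    mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]
print(modes(siblings))      # [3]
print(modes([1, 1, 2, 2]))  # [1, 2] (bimodal)
print(modes([0, 1, 2, 3]))  # [] (no mode)
```

Returning a list rather than a single value keeps bimodal and multimodal data sets from being silently reduced to one mode.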

Mode Example 02 – Frequency Distribution
Ex 02: Find the mode for the distribution.
Value (x)       1   2   3   4   5
Frequency (f)   4   3   2   6   8
Solution: The mode in a frequency distribution is the value that has the largest frequency. Therefore the mode for this frequency distribution is 5, as it occurs eight times.

Application of Mean, Median and Mode
- Mean: average age and height of students.
- Median: for the prices of houses, we use the median in place of the mean.
- Mode: most shoe stores prefer to stock the most popular size, so we go with the mode.
- The mean and median are used for quantitative data, whereas the mode can also be used for qualitative data.

Dispersion
Measures of dispersion describe the spread, or variation, of a set of data:
- Range
- Variance and Standard Deviation
- Coefficient of Variation
Note: Using measures of variability in conjunction with measures of central tendency makes possible a more complete numerical description of the data.

Measures of Dispersion
1. Range
- The range is the difference between the highest and lowest values in a set of data: Range = X_largest - X_smallest
- It is the crudest measure of dispersion.
- It uses only two observations, the highest and lowest values.
- It ignores the pattern of distribution of the observations in between the highest and lowest values.
[Figure: two data sets plotted on an axis from 7 to 12; each has Range = 12 - 7 = 5.]

Range Example
Note: To get the range for a variable, you subtract its lowest value from its highest value.

Measures of Dispersion
2. Variance
Variance is a statistical measure of how much a set of observations differ from each other, expressed as variation about the mean.
For the population (use N in the denominator):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
For the sample (use n - 1 in the denominator):
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
- High variance means that most scores are far away from the mean.
- Low variance means that most scores cluster tightly about the mean.

Comparing Standard Deviations
The standard deviation is a summary statistic of how much scores vary from the mean; it is the square root of the variance.
Ex 01: Data: 10 12 14 15 17 18 18 24
Solution: N = 8 and Mean = 16, so
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \approx 4.31$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \approx 4.03$$
The value of the standard deviation is larger when the data are treated as a sample, because the sample formula divides by n - 1 rather than N. The sample standard deviation also depends on which sample was drawn, so it varies from sample to sample.
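The two standard deviations for this data set can be checked with Python's standard `statistics` module, where `pstdev` divides by N (population) and `stdev` divides by n - 1 (sample). The variable names are illustrative.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

# Population formulas: divide the sum of squared deviations by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample formulas: divide the sum of squared deviations by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(statistics.mean(data))                         # 16
print(round(population_sd, 2), round(sample_sd, 2))  # 4.03 4.31
```

The sum of squared deviations from the mean is 130 for this data set, so the population variance is 130/8 = 16.25 and the sample variance is 130/7, about 18.57.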

Comparing Standard Deviations
[Figure: three data sets plotted on the same axis, from 11 to 21.]
- Data A: Mean = 15.5, s = 3.338
- Data B: Mean = 15.5, s = 0.9258
- Data C: Mean = 15.5, s = 4.57

4. Coefficient of Variation
- Measure of relative variation
- Always expressed as a percentage (%)
- Shows variation relative to the mean
- Used to compare two or more groups
Formula (for the sample coefficient of variation):
$$CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$$
The coefficient of variation allows investors to determine how much volatility, or risk, is assumed in comparison to the amount of return expected from investments. The lower the ratio of standard deviation to mean return, the better the risk-return trade-off.

Comparing Coefficient of Variation
Stock A: Average price last year = $14.40, Standard deviation = $4.48
Stock B: Average price last year = $13, Standard deviation = $3.03
Solution: The coefficient of variation (a measure of risk) is given by
$$CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$$
Therefore:
Stock A: CV = (4.48 / 14.40) · 100% = 31.11 % (more risky)
Stock B: CV = (3.03 / 13) · 100% = 23.31 %
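The stock comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative, and the results are rounded to two decimal places.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (S / mean) * 100."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


cv_a = coefficient_of_variation(4.48, 14.40)  # Stock A
cv_b = coefficient_of_variation(3.03, 13)     # Stock B
print(f"{cv_a:.2f}% {cv_b:.2f}%")  # 31.11% 23.31%

# The stock with the larger CV carries more risk relative to its average price.
riskier = "A" if cv_a > cv_b else "B"
print(riskier)  # A
```

Because the CV is unit-free, it can compare groups whose means differ, which a raw standard deviation cannot do fairly.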

Correlation

Airline Cost Data
Number of Passengers (X)   Cost in $1,000 (Y)
61                         4.280
63                         4.080
67                         4.420
69                         4.170
70                         4.480
74                         4.300
76                         4.820
81                         4.700
86                         5.110
91                         5.130
95                         5.640
97                         5.560

Three Degrees of Correlation
[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0.]


Meaning
- A correlation coefficient of 1 means that for every increase in one variable, there is an increase of a fixed proportion in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length.
- A correlation coefficient of -1 means that for every increase in one variable, there is a decrease of a fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed.
- A correlation coefficient of zero means that an increase in one variable is not associated with any consistent increase or decrease in the other: the two are not linearly related.
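To make the coefficient concrete, here is a minimal Python sketch that computes r for the airline cost data from the earlier slide. It assumes the slides mean the standard Pearson product-moment correlation, whose formula is not stated in the deck; the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: SSxy / sqrt(SSxx * SSyy) (assumed formula)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Airline cost data: number of passengers (X) and cost in $1,000 (Y).
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
        4.820, 4.700, 5.110, 5.130, 5.640, 5.560]

r = pearson_r(passengers, cost)
print(round(r, 3))
```

For this data the result is close to +1, indicating a strong positive relationship: flights with more passengers tend to cost more.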

Research Papers
- A demand aggregation approach for inventory control in two echelon supply chain under uncertainty, OPSEARCH, 56 (3), 840-868, 2019 (SCIE, Scopus).
- A review of multi-objective inventory control problem, International Journal of Intelligent Enterprise, 5 (3), 213-230, 2018 (Scopus).
- Grey-based decision-making approach for the selection of distributor in a supply chain, International Journal of Intelligent Enterprise, 9 (2), 207-225, 2022 (Scopus).
- Risk pooling approach in multi-product multi-period inventory control model under uncertainty, IEEE Xplore, 2018 (Scopus).
- Risk-Pooling Approach in Inventory Control Model for Multi-products in a Distribution Network Under Uncertainty, Advanced Engineering Optimization Through Intelligent Techniques, Springer Nature, 2020 (Scopus).
- 4 articles are under communication (1 SCI, 3 Scopus).