Introduction to Statistics Lesson 1. Singapore Management University
jizzyjizzywong
Sep 18, 2024
About This Presentation
Overview of Statistics
Size: 3.55 MB
Language: en
Added: Sep 18, 2024
Slides: 30 pages
Slide Content
8 January 2022 COR-STAT1202 -- Part I
I. Overview of Statistics. What are Probability and Statistics? Identify types of data and levels of measurement. Describe different sampling methods. Types of statistical study: observational versus experimental.
What are Probability and Statistics? Probability is the likelihood or chance that a particular event will occur. Statistics has two definitions: (1) data that describe or summarize something; (2) the science of learning from data (collecting, organizing, modeling and interpreting data).
Two branches of Statistics. Descriptive statistics: collecting, summarizing, and describing data. Inferential statistics: drawing conclusions and/or making decisions concerning a population based only on sample data.
Descriptive Statistics. Collect data: use government records, or conduct a survey or an experiment. Present data: use tables and graphs to present the results. Summarize data: provide numerical results, such as the sample mean.
Inferential Statistics. Use sample statistics to estimate or test population parameters (inferential statistics). A primary purpose of inferential statistics is to help researchers assess the validity of a conclusion, or how well a sample statistic estimates a population parameter. The margin of error of an estimate describes the range of values (based on the raw data) that is likely to contain the population parameter.
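As a rough illustration of the margin of error described above, the sketch below computes a sample mean and an approximate 95% margin of error. The slides do not give a formula: the common approximation 1.96 * s / sqrt(n), the 95% level and the made-up height data are all assumptions chosen for illustration.

```python
import math
import statistics

# Hypothetical sample of heights in cm (made-up data, for illustration only).
sample = [168, 172, 175, 169, 180, 171, 174, 177, 170, 173]

n = len(sample)
sample_mean = statistics.mean(sample)   # the sample statistic
sample_sd = statistics.stdev(sample)    # sample standard deviation (n - 1 divisor)

# Assumed approximate 95% margin of error: 1.96 * s / sqrt(n).
margin_of_error = 1.96 * sample_sd / math.sqrt(n)

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"estimate: {sample_mean:.1f} cm, interval: {lower:.1f} to {upper:.1f} cm")
```

With a sample as small as this one, a t-based multiplier would usually replace 1.96; the normal multiplier is kept here only to keep the arithmetic easy to follow.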
Use statistics to make decisions. The primary purpose of statistics is to make good decisions about issues that involve uncertainty. For example: to measure the degree of acid rain based on the pH levels of the rain collected; to predict diamond prices based on the weight of the diamonds; to estimate the amounts of different ingredients, such as tar, nicotine, and carbon dioxide, present in cigarettes.
An example of the whole process of a statistical study. Goal: get the average height of male SMU students. (1) Start with the population: all male SMU students. (2) Randomly draw a sample of 80 male SMU students. (3) Use the raw data to summarize: compute the average height of the 80 students. (4) Make inferences: estimate the average height of all male SMU students. (5) Draw conclusions.
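The process on this slide can be sketched in a few lines of Python. The population below is simulated (5,000 heights drawn from a normal distribution with mean 172 cm and standard deviation 6 cm); those numbers are assumptions for illustration and do not come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 5,000 male students, simulated.
population = [random.gauss(172, 6) for _ in range(5000)]

# Step 1: randomly draw 80 students from the population.
sample = random.sample(population, 80)

# Step 2: summarize the raw data with the sample average.
sample_mean = statistics.mean(sample)

# Step 3: use the sample average as the estimate of the population average.
# (The true average is known here only because the population is simulated.)
population_mean = statistics.mean(population)
print(f"sample mean: {sample_mean:.1f} cm, population mean: {population_mean:.1f} cm")
```

In a real study the population average is unknown, which is why the sample average, together with a statement of its uncertainty, is reported instead.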
Types of data. Data are either categorical (qualitative) or numerical (quantitative). Numerical data are either discrete or continuous.
Levels of measurement and measurement scales (from lowest to highest level):
Level          Scale                Example
Nominal data   Categories           Nationality
Ordinal data   Ordered categories   Excellent, good, bad
Interval data  No true zero         Temperature in Celsius
Ratio data     True zero exists     Number of hours
Data sources. Primary data collection: observation or survey; experiment. Secondary data compilation: Internet; print documents or existing studies.
Key definitions. A variable is a characteristic of an item or individual. A random variable is a numerical measurement of the outcome of a random phenomenon. Often, the randomness results from the use of random sampling or a randomized experiment to gather the data.
Key definitions (continued). A population is the collection of all items or things under consideration. A sample is a portion of the population selected for analysis. A parameter is a summary measure that describes a characteristic of the population. A statistic is a summary measure computed from a sample.
Population vs. Sample. [Figure: a population of values with a sample drawn from it; values as extracted: 2 3 4 5 5 6 7 7 8 8 3 5 5 2 5 6 6 7 2 5 5 7 8] Measures used to describe the population are called parameters, e.g. the population mean is 5. Measures computed from sample data are called statistics, e.g. the sample mean is 5.4.
Reasons for sampling. A census is the collection of data from every member of a population: it may be too expensive, too time consuming, or impossible to carry out. A sample is a random selection of some data from the population: it is less expensive, less time consuming, and more practical.
Probability sampling. Items in the sample are chosen based on known probabilities. Types of probability samples: simple random, systematic, stratified, cluster, and other sampling approaches.
Simple random samples. Choose a sample of items in such a way that every subject in the population has an equal chance of being selected. (1) Have the IDs/names of all the subjects in the population (the frame). (2) Randomly select IDs. (3) Obtain the sample.
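The three steps above can be sketched with Python's standard library. The frame of 500 IDs and the sample size of 20 are hypothetical values chosen for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Step 1: hypothetical frame holding the IDs of all 500 subjects.
frame = [f"S{i:03d}" for i in range(1, 501)]

# Step 2: randomly select IDs. random.sample draws without replacement,
# so every ID has the same chance of being chosen and none repeats.
sample_ids = random.sample(frame, 20)

# Step 3: the selected IDs identify the subjects in the sample.
print(sample_ids)
```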
Systematic sampling. Partition the N items in the frame into n groups of k items, e.g. N = 90, n = 10, k = 9. Randomly choose a number s from 1 to k = 9, say s = 3. Starting from item s, select every kth member thereafter from the entire frame, here items 3, 12, 21, and so on up to 84.
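A minimal sketch of the procedure, using the slide's numbers (N = 90, n = 10, k = 9, s = 3). The function name `systematic_sample` and its `start` parameter are hypothetical, introduced only for this example.

```python
import random

def systematic_sample(frame, n, start=None):
    """Pick every k-th item from the frame, where k = len(frame) // n."""
    k = len(frame) // n
    # Random start s between 1 and k (1-based), unless one is supplied.
    s = random.randint(1, k) if start is None else start
    # Convert the 1-based positions s, s + k, s + 2k, ... to 0-based indices.
    return [frame[i] for i in range(s - 1, s - 1 + n * k, k)]

frame = list(range(1, 91))          # items numbered 1..90, so N = 90
chosen = systematic_sample(frame, n=10, start=3)
print(chosen)  # [3, 12, 21, 30, 39, 48, 57, 66, 75, 84]
```

Passing `start=3` reproduces the slide's example; leaving it out picks the starting point at random, as the procedure requires.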
Cluster sampling. Cluster sampling involves the selection of all members in randomly selected groups, or clusters. For example: randomly select some blocks from the Pine Grove Estate, then obtain information from all the residences in the selected blocks.
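The block example above might look like this in code. The estate layout (6 blocks of 4 residences) and the choice of 2 blocks are made-up values for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical estate: 6 blocks, each with 4 residences.
blocks = {
    f"Block {b}": [f"Block {b} unit {u}" for u in range(1, 5)]
    for b in "ABCDEF"
}

# Stage 1: randomly select some blocks (the clusters).
selected_blocks = random.sample(sorted(blocks), 2)

# Stage 2: take every residence in each selected block.
cluster_sample = [unit for block in selected_blocks for unit in blocks[block]]
print(selected_blocks, len(cluster_sample))
```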
Stratified sampling. The population is divided into subgroups, or strata, and the sample consists of randomly selected members from each stratum. E.g. use blocks as the subgroups and randomly select some residences from each block.
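For contrast with cluster sampling, here is a sketch of the block example in which every block contributes a few residences. The estate layout (6 blocks of 10 residences) and the 2 residences drawn per block are hypothetical.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical strata: 6 blocks with 10 residences each.
strata = {
    f"Block {b}": [f"Block {b} unit {u}" for u in range(1, 11)]
    for b in "ABCDEF"
}

# Randomly select 2 residences from every block (stratum).
stratified_sample = {
    block: random.sample(units, 2) for block, units in strata.items()
}
for block, units in stratified_sample.items():
    print(block, units)
```

Every stratum appears in the sample, whereas cluster sampling keeps only the selected groups and takes all of their members.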
Convenience sampling (non-probability samples). The sample is chosen for convenience rather than by a more sophisticated procedure. Example (STAT201): "I want to know what proportion of SMU students is left-handed. Left-handed students, please raise your hands." Since 4 out of 40 students are left-handed, the proportion of left-handed SMU students is estimated to be 10% (not scientific!).
Bias. Bias can arise if the design or conduct of a statistical study tends to favor certain results: (1) members of the sample differ in some specific way from the members of the general population; (2) the researcher intentionally or unintentionally distorts the true meaning of the data; (3) data are collected, intentionally or unintentionally, in a way that makes them unrepresentative of the population; (4) the presentation of sample results tells only part of the story or shows the data in a misleading way.
Types of survey errors. Coverage error or selection bias: some groups are excluded from the frame and have no chance of being selected. Non-response error: people who refuse to respond may be different from those who respond. Measurement error: the characteristic of the population is not measured correctly. Sampling error: variation from sample to sample.
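Sampling error, the last item above, is easy to see in a small simulation: repeated random samples from the same population give different means. The population of 1,000 values, the sample size of 30 and the five repetitions are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 values (simulated for illustration).
population = [random.randint(1, 10) for _ in range(1000)]
population_mean = statistics.mean(population)

# Draw five independent samples of size 30 and compare their means.
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(5)
]

for m in sample_means:
    # The gap between each sample mean and the population mean is sampling error.
    print(f"sample mean {m:.2f}, error {m - population_mean:+.2f}")
```

Unlike the other three error types, this variation is present even in a perfectly designed survey; it shrinks as the sample size grows.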
Types of statistical studies. The people or objects chosen for the sample are called the subjects. In an observational study, researchers observe or measure characteristics of the subjects, but do not attempt to influence or modify these characteristics. In an experimental study, researchers apply some treatment and observe its effects on the subjects of the study.
Experimental study. The treatment group is the group of subjects who receive the treatment being tested. The control group is the group of subjects who do not receive the treatment being tested.
Assigning treatment and control groups. Assign subjects to the treatment or control group at random, so that each subject has an equal chance of being assigned to either group. Use a sufficiently large number of subjects.
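One simple way to carry out the random assignment described above is to shuffle the subject list and split it in half. The 40 subject labels are hypothetical, and splitting into two equal halves is a design choice of this sketch rather than something the slides prescribe.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical subject list.
subjects = [f"subject_{i}" for i in range(1, 41)]

# Shuffle a copy, then split it in half: each subject is equally likely
# to land in either group.
shuffled = subjects[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]
print(len(treatment_group), len(control_group))
```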
Case-control studies. Sometimes it may be impractical or unethical to create a controlled experiment, e.g. to study how smoking affects health. A case-control study is an observational study that resembles an experiment because the sample naturally divides into two (or more) groups. The subjects form groups by their own choice before the observation (such studies are also called retrospective studies). The subjects who engage in the behavior under study form the cases (like a treatment group in an experiment). The subjects who do not engage in the behavior under study are the controls (like the control group in an experiment). (Example on slide: published in The Straits Times, 22/5/2004.)
Confounding. A study suffers from confounding if the effects of different factors are mixed so that we cannot determine the effects of the specific factors we are studying. The factors that lead to the confusion are called the confounding factors.
Real-life example of confounding: “Pregnant Women Warned of Chlorinated Tap Water: Miscarriage Risk – But More Study Urged” (San Francisco Chronicle, February 2, 1998). The subjects were 5144 pregnant women enrolled in the health plan. The outcome was whether the woman had a miscarriage in the first trimester of pregnancy. There were two groups: those who drank 5 or more glasses a day of highly chlorinated tap water, and those who drank fewer than 5 glasses of tap water. This is a case-control study. A possible confounding factor is income, because those drinking “better” water, such as bottled water, may tend to have higher incomes, and higher-income women tend to have a lower rate of miscarriage or healthier pregnancies.
Useful and interesting websites:
http://wps.prenhall.com/bp_groebner_busstats_8/145/37311/9551672.cw/-/9551709/index.html (old version, free)
http://wps.pearsoned.com/phstat/ (new version, not free; needs an access code packaged with the textbook)
Try to download PHStat, which is compatible with some Windows computers. Don’t spend too much time if PHStat does not work on your computer.
Supplemental resources for the textbook: https://media.pearsoncmg.com/intl/ge/2020pp/cws/ge_levine_smume_9/lsxl9ege_student_download.html
Recommended questions from the textbook (answers are at the back of the textbook):
Question            Page
1.2, 1.4, 1.6       48-49
1.30, 1.32, 1.34    59-60
1.52                62