Objectives Define statistics. Enumerate the importance and limitations of statistics. Explain the process of statistics. Know the difference between descriptive and inferential statistics. Distinguish between qualitative and quantitative variables. Distinguish between discrete and continuous variables. Determine the level of measurement of variables.
STATISTICS?
Definition of Statistics STATISTICS is the science of collecting, organizing, summarizing and analyzing information to draw conclusions or answer questions.
Definition of Statistics Collection of information. Organization and summarization of information. Information is analyzed to draw conclusions or answer specific questions. Results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
Importance of Statistics
It enables people to make decisions based on empirical evidence. It provides us with the tools needed to convert massive data into pertinent information that can be used in decision making. It provides us with information that we can use to make sensible decisions.
DATA DATA are factual information used as a basis for reasoning, discussion, or calculation.
Field of Statistics
Field of Statistics Mathematical Statistics - The study and development of statistical theory and methods in the abstract. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data and the development of new statistical methodology motivated by real problems.
Limitation of Statistics
Limitation of Statistics 1. Statistics is not suited to the study of qualitative phenomena. 2. Statistics does not study individuals. 3. Statistical laws are not exact. 4. Statistical tables may be misused. 5. Statistics is only one of the methods of studying a problem.
Process of Statistics
Process of Statistics 1. Identify the research objective - A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.
Process of Statistics 2. Collect the information needed to answer the questions. - Conducting research on an entire population is often difficult and expensive, so we typically look at a sample.
EXAMPLE The Philippine Mental Health Association contacted 1,028 teenagers who are 13 to 17 years of age and live in Laoag City and asked whether or not they had been prescribed medications for any mental disorders, such as depression or anxiety. Population: Teenagers 13 to 17 years of age who live in Laoag City. Sample: The 1,028 teenagers 13 to 17 years of age who live in Laoag City and were contacted.
EXAMPLE A farmer wanted to learn about the weight of his corn crop. He randomly sampled 100 plants and weighed the corn on each plant. Population: The entire corn crop. Sample: The 100 selected corn plants.
Process of Statistics 3. Organize and summarize the information - Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.
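Process of Statistics 4. Draw conclusions from the information - Information collected from the sample is generalized to the population. - Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.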
Take Note! If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.
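Illustration The contrast between steps 3 and 4 can be sketched in Python. This is only a rough sketch: the weights are hypothetical values made up for illustration, the function names are not from the slides, and the interval uses a normal approximation (a small sample like this one would normally call for a t-based interval).

```python
import math
import statistics

# Hypothetical weights (in grams) of corn from a small sample of plants.
sample_weights = [212.0, 198.5, 225.3, 240.1, 205.7, 219.9, 231.4, 208.8]

def describe(values):
    """Descriptive statistics: summarize only the data we actually have."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def approx_mean_interval(values, confidence=0.95):
    """Inferential statistics: a rough normal-approximation interval
    for the unknown population mean, based on the sample alone."""
    summary = describe(values)
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * summary["stdev"] / math.sqrt(summary["n"])
    return summary["mean"] - margin, summary["mean"] + margin

summary = describe(sample_weights)
low, high = approx_mean_interval(sample_weights)
print(summary)
print(f"Approximate 95% interval for the population mean: {low:.1f} to {high:.1f}")
```

The first function only describes the eight plants that were measured. The second makes a statement about the whole crop and attaches a measure of how reliable that statement is.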
EXAMPLE 1. A badminton player wants to know his average score for the past 10 games.
EXAMPLE 2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries.
EXAMPLE 3. Janine wants to determine the variability of her six exam scores in Algebra.
EXAMPLE 4. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns.
EXAMPLE 5. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years.
Distinction Between Qualitative and Quantitative Variables
Qualitative and Quantitative Variables Variables - characteristics of the individuals within the population.
Qualitative and Quantitative Variables Qualitative Variable - a variable that yields categorical responses. It is a word or a code that represents a class or category.
Qualitative and Quantitative Variables Quantitative Variable - a variable that takes on numerical values representing an amount or quantity.
EXAMPLE 1. Hair Color 2. Temperature 3. Stages of Breast Cancer 4. Number of Hamburger Sold
EXAMPLE 5. Number of Children 6. Zip Code 7. Place of Birth 8. Degree of Pain
Distinction Between Discrete and Continuous
Discrete and Continuous Discrete Variable - a quantitative variable that has either a finite number of possible values or a countable number of possible values.
Discrete and Continuous Continuous Variable - a quantitative variable that has an infinite number of possible values that are not countable.
EXAMPLE 1. The number of heads obtained after flipping a coin five times. 2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and 1:00 P.M. 3. The distance a 2005 Toyota car can travel in city conditions with a full tank of gas.
EXAMPLE 4. Number of words correctly spelled. 5. Time of a runner to finish one lap.
Levels of Measurement
Levels of Measurement Qualitative: Nominal, Ordinal. Quantitative: Interval, Ratio.
Levels of Measurement Nominal - They are sometimes called categorical scales or categorical data. Such a scale classifies persons or objects into two or more categories.
Example (Nominal) Method of payment. Type of school. Eye color.
Levels of Measurement Ordinal - This involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Example (Ordinal) Food preferences. Stage of disease. Socioeconomic class. Severity of pain.
Levels of Measurement Interval - This measurement level not only classifies and orders the measurements, but also specifies that the distances between adjacent points on the scale are equal along the whole scale, from low to high.
Example (Interval) Temperature on a Fahrenheit/Celsius thermometer. Trait anxiety. IQ.
Levels of Measurement Ratio - A ratio scale represents the highest, most precise level of measurement. It has the properties of the interval level of measurement, and the ratios of the values of the variable have meaning.
Example (Ratio) Height and weight. Time. Time until death.
Example Ranking of college athletic teams. Employee number. Number of vehicles registered. Brands of soft drinks. Number of car passers along C5 on a given day.
Assessments/Activities
Identify each of the following data sets as either a Population or a Sample. The grade point averages (GPAs) of all students at a college. The GPAs of a randomly selected group of students at a college campus. The ages of the nine Supreme Court Justices of the United States on January 1, 1842. The gender of every second customer who enters a movie theater. The lengths of Atlantic croakers caught on a fishing trip to the beach.
Identify the following measures as either Quantitative or Qualitative. The gender of the first 40 newborns in a hospital one year. The natural hair color of 20 randomly selected fashion models. The ages of 20 randomly selected fashion models. The fuel economy in miles per gallon of 20 new cars purchased last month. The political affiliation of 500 randomly selected voters.
Data Collection and Basic Concepts in Sampling Design
Objectives Determine the sources of data (primary and secondary data). Distinguish the different methods of data collection under primary and secondary data. Determine the appropriate sample size. Differentiate the various sampling techniques. Know the sources of errors in sampling.
Data Collection Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.
Consequences from Improperly Collected Data
Data Collection Inability to answer research questions accurately. Inability to repeat and validate the study. Distorted findings resulting in wasted resources. Misleading other researchers to pursue fruitless avenues of investigation. Compromising decisions for public policy. Causing harm to human participants and animal subjects.
Steps in Data Gathering
Steps in Data Gathering Set the objectives for collecting data. Determine the data needed based on the set objectives. Determine the method to be used in data gathering and define the comprehensive data collection points. Design data gathering forms to be used. Collect data.
Choosing a Method of Data Collection
Data Collection Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high.
Sources of Data
Primary Sources Provide a first-hand account of an event or time period and are considered to be authoritative.
Primary Data Data documented by the primary source. The data collectors documented the data themselves.
Secondary Sources Offer an analysis, interpretation, or restatement of primary sources and are considered to be persuasive.
Secondary Data Data documented by a secondary source. The data collectors had the data documented by other sources.
Primary Data Can Be Collected Using 5 Methods
Methods 1. Direct Personal Interviews - the researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.
Methods 2. Indirect/Questionnaire Method - this method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.
Questions to be Considered What exactly do we want to know, according to the objectives and variables we identified earlier?
Questions to be Considered Of whom will we ask questions and what techniques will we use?
Questions to be Considered Are our informants mainly literate or illiterate?
Questions to be Considered How large is the sample that will be interviewed?
Key Design Principles of a Good Questionnaire
Key Design Principles of a Good Questionnaire 1. Keep the questionnaire as short as possible. 2. Decide on the type of questionnaire (open-ended or closed-ended). 3. Write the questions properly. 4. Order the questions appropriately.
Key Design Principles of a Good Questionnaire 5. Avoid questions that prompt or motivate the respondent to say what you would like to hear. 6. Write an introductory letter or an introduction. 7. Write special instructions for interviewers or respondents.
Key Design Principles of a Good Questionnaire 8. Translate the questions if necessary. 9. Always test your questions before taking the survey.
Open-Ended Question & Closed-ended Question
Open-ended Question - type of question that does not include response categories. The respondent is not given any possible answers to choose from.
Closed-ended Question - is a type of question that includes a list of response categories from which respondent will select his/her answer.
Advantages
Open-ended VS Closed-ended Open-ended: More detailed answer. Could reveal additional insights. Closed-ended: Easy to encode, tabulate, and analyze. Easy to understand. Enables inter-study comparison. Saves time and money. High response rate.
Disadvantages
Open-ended VS Closed-ended Open-ended: Difficult to encode, tabulate, and analyze. Low response rate. Respondent has to be articulate. Respondent could feel threatened. Responses could have different levels of detail. Closed-ended: Could frustrate respondents. Potentially biased response sets. Difficult or impossible to detect if respondent truly understood the questions.
Methods 3. Focus Group - is a group interview of approximately six to twelve people who share similar characteristics or common interest.
Methods 4. Experiment - is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
Experiment Ethical, moral, and legal concerns. Unrealistic controlled environments. Inability to control for all variables.
Methods 5. Observation - is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of getting specified information.
Methods (Secondary Data) Published reports in newspapers and periodicals. Financial data reported in annual reports. Records maintained by the institution. Internal reports of government departments. Information from official publications.
Take Note! Always investigate the validity and reliability of the data by examining the collection method employed by your source. Do not use inappropriate data for your research. The choice of methods of data collection is largely based on the accuracy of the information they yield.
Sample Size
Sample Size “How many participants should be chosen for a survey?”
Sample Size - is typically denoted by n and it is always a positive integer. - no exact sample size can be mentioned here and it can vary in different research settings.
Take Note! Representativeness, not size, is the more important consideration. Use no less than 30 subjects if possible. If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method)
Non-Statistical and Statistical Considerations
Non-Statistical Considerations - These may include the availability of resources, manpower, budget, ethics, and the sampling frame.
Statistical Considerations - It will include the desired precision of the estimate.
Criteria in Determining the Appropriate Sample Size
1. Level of Precision - Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
2. Confidence Level - It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range.
3. Degree of Variability - depending upon the target population and attributes under consideration, the degree of variability varies considerably.
Methods in Determining the Sample Size
1. Estimate the Mean or Average - The sample size required to estimate the population mean µ to within a specified margin of error e at a $(1-\alpha)100\%$ level of confidence is $n = \left( \frac{z_{\alpha/2}\,\sigma}{e} \right)^2$, rounded up to the next whole number, where σ is the population standard deviation.
Take Note! If σ is unknown, it is common practice to conduct a preliminary survey to determine s and use it as an estimate of σ, or to use results from previous studies to obtain an estimate of σ. When using this approach, the size of the sample should be at least 30.
Example A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident that our sample mean will be within 0.03 ounce from the true mean.
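Worked sketch The soft drink example can be checked with a few lines of Python. This is a minimal sketch, assuming the usual formula n = (z·σ/e)² with the result rounded up; the function name is made up for illustration, and the z-value is taken from the standard normal distribution rather than from a printed table.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a population mean.

    Uses n = (z * sigma / e)^2, rounded up to the next whole number,
    where z is the two-sided critical value for the confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Soft drink machine: sigma = 0.5 ounce, e = 0.03 ounce, 95% confidence.
n = sample_size_for_mean(sigma=0.5, margin_of_error=0.03, confidence=0.95)
print(n)  # 1068
```

Rounding up rather than to the nearest integer keeps the margin of error no larger than the one requested.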
2. Estimating Proportion (Infinite Population) - The sample size required to obtain a $(1-\alpha)100\%$ confidence interval for p with specified margin of error e is $n = \frac{z_{\alpha/2}^2\, p(1-p)}{e^2}$, rounded up to the next whole number.
Example Suppose we are doing a study on the inhabitants of a large town, and want to find out how many households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. So p = 0.5. We want 99% confidence and at least 1% precision.
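Worked sketch The breakfast example can be computed as follows. This is a sketch under the assumption that the intended formula is n = z²·p(1 − p)/e², rounded up; the function name is illustrative only. The z-value here is computed exactly, so the answer can differ slightly from one obtained with a rounded table value such as 2.58.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a proportion in a large population.

    Uses n = z^2 * p * (1 - p) / e^2, rounded up to the next whole number.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# Breakfast study: p = 0.5 (maximum variability), 99% confidence, e = 0.01.
n = sample_size_for_proportion(p=0.5, margin_of_error=0.01, confidence=0.99)
print(n)  # 16588
```

Because p(1 − p) is largest at p = 0.5, any other assumed proportion would give a smaller required sample.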
3. Slovin’s Formula - Slovin’s formula is used to calculate the sample size n given the population size N and the margin of error e: $n = \frac{N}{1 + N e^2}$.
Example A researcher plans to conduct a survey about food preference of BS Stat students. If the population of students is 1000, find the sample size if the error is 5%
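Worked sketch The BS Stat example can be computed directly, assuming Slovin's formula in its usual form n = N / (1 + N·e²) with the result rounded up. The function name is made up for illustration.

```python
import math

def slovin_sample_size(population_size, error):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population_size / (1 + population_size * error ** 2))

# 1000 students, 5% margin of error.
n = slovin_sample_size(population_size=1000, error=0.05)
print(n)  # 286
```

Here 1000 / (1 + 1000 × 0.0025) = 1000 / 3.5 ≈ 285.7, which rounds up to 286 students.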
4. Finite Population Correction - If the population is small then the sample size can be reduced slightly.
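Worked sketch One common form of the finite population correction is n = n0 / (1 + (n0 − 1)/N), where n0 is the sample size computed for an infinite population and N is the population size. The slide does not show which form it intends, so treat the formula, the function name, and the population size of 2000 below as assumptions for illustration.

```python
import math

def finite_population_correction(n0, population_size):
    """Adjust an infinite-population sample size n0 for a population of size N.

    Assumed form: n = n0 / (1 + (n0 - 1) / N), rounded up.
    """
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))

# Hypothetical: n0 = 1068 was computed without regard to population size,
# but the population has only 2000 units.
n = finite_population_correction(n0=1068, population_size=2000)
print(n)  # 697
```

The correction matters most when n0 is a large fraction of N. For a very large population the adjusted size stays close to n0.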
Online Calculator of Sample Size https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/ https://www.calculator.net/sample-size-calculator.html
Basic Sampling Design
Reason for Sampling It is important that the individuals included in a sample represent a cross section of the individuals in the population. If a sample is not representative, it is biased, and you cannot generalize to the population from your statistical data.
Observation Unit An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element.
Target Population The complete collection of observations we want to study.
Sampled Population The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
Sample A subset of a population.
Sampling Unit A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population.
Sampling Frame A list, map, or other specification of sampling units in the population from which a sample may be selected.
Sampling Bias This involves problems in your sampling, which reveals that your sample is not representative of your population.
Advantages of Sampling Over Complete Enumeration
Advantage of Sampling Less Labor Reduced Cost Greater Speed Greater Scope Greater Efficiency and Accuracy Convenience Ethical Considerations
Two Types of Samples
1. Probability Sample Samples are obtained using some objective chance mechanism, thus involving randomization. They require the use of a complete listing of the elements of the universe called sampling frame.
2. Non-Probability Sample Samples are obtained haphazardly, selected purposively, or taken as volunteers. The probabilities of selection are unknown. They should not be used for statistical inference.
Sampling Procedure
Sampling Procedure Identify the population. Determine if the population is accessible. Select a sampling method. Choose a sample that is representative of the population. Ask the question: can I generalize to the general population from the accessible population?
Basic Sampling Technique of Probability Sampling
1. Simple Random Sampling Most basic method of drawing a probability sample. Assigns equal probabilities of selection to each possible sample. Results in a simple random sample.
Simple Random Sampling Advantages and Disadvantages It is very simple and easy to use. The sample chosen may be distributed over a wide geographic area.
When to Use Simple Random Sampling This is preferable to use if the population is not widely spread geographically.
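Illustration A simple random sample can be drawn from a sampling frame with Python's standard `random` module. This is a minimal sketch: the frame of 500 student ID numbers and the function name are hypothetical, and the seed is fixed only so the draw can be repeated.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every possible sample of size n
    is equally likely (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: student ID numbers 1 to 500.
student_ids = list(range(1, 501))
sample = simple_random_sample(student_ids, 50, seed=42)
print(sorted(sample))
```

Note that the whole frame must be listed before any unit can be drawn, which is why this method needs a complete sampling frame.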
2. Systematic Random Sampling It is obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
Obtaining a Systematic Random Sample
Obtaining a Systematic Random Sample Decide on a method of assigning a unique serial number, from 1 to N, to each one of the elements in the population.
Obtaining a Systematic Random Sample Compute the sampling interval: $k = N/n$, where N is the population size and n is the sample size.
Obtaining a Systematic Random Sample Select a number r, from 1 to k, using a randomization mechanism. The element in the population assigned to this number is the first element of the sample. The other elements of the sample are those assigned to the numbers r + k, r + 2k, r + 3k, and so on until you get a sample of size n.
Example We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked from the sampling frame, where k = 500/50 = 10.
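Illustration The procedure for the 50-out-of-500 example can be sketched in Python. The sketch assumes N is a multiple of n so that the interval k = N/n is a whole number; the frame of ID numbers and the function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample: random start r in 1..k, then every kth unit."""
    k = len(frame) // n              # sampling interval k = N / n
    rng = random.Random(seed)
    r = rng.randint(1, k)            # random start between 1 and k, inclusive
    # Serial numbers r, r + k, r + 2k, ... (list indices are zero-based).
    return [frame[r - 1 + i * k] for i in range(n)]

# Hypothetical sampling frame: students numbered 1 to 500.
students = list(range(1, 501))
sample = systematic_sample(students, 50, seed=7)
print(sample[:5])
```

Only one random number is needed for the whole sample, which is what makes the method easy to administer in the field.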
Systematic Random Sampling Advantages and Disadvantages Drawing of the sample is easy. It is easy to administer in the field, and the sample is spread evenly over the population. May give poor precision when unsuspected periodicity is present in the population.
When to Use Systematic Random Sampling This is advisable to use if the ordering of the population is essentially random and when stratification with numerous data is used.
3. Stratified Random Sampling It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.
Example A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions A and B. The number of students in the institution A is 200 and the institution B is 300. How will you draw the sample using proportional allocation?
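Worked sketch Under proportional allocation, each stratum contributes to the sample in proportion to its share of the population: n_h = n × N_h / N. The sketch below applies this to institutions A and B and then draws a simple random sample within each stratum. The student labels and function names are hypothetical, and the plain rounding used here is only safe when the allocations come out as whole numbers, as they do in this example.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum: n_h = n * N_h / N (rounded)."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Simple random sample from each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Hypothetical frames: 200 students in institution A, 300 in institution B.
strata = {
    "A": [f"A-{i}" for i in range(1, 201)],
    "B": [f"B-{i}" for i in range(1, 301)],
}
allocation = proportional_allocation({"A": 200, "B": 300}, 50)
sample = stratified_sample(strata, 50, seed=3)
print(allocation)  # {'A': 20, 'B': 30}
```

Institution A holds 200/500 of the students, so it supplies 20 of the 50 sampled students, and institution B supplies the remaining 30.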
Stratified Random Sampling Advantages and Disadvantages Stratification of respondents is advantageous in terms of precision of the estimates of the characteristics of the population. Values of the stratification variable may not be easily available for all units in the population, especially if the characteristic of interest is homogeneous.
When to Use Stratified Random Sampling If the population is such that the distribution of the characteristics of the respondents under consideration is concentrated in small and widely spread segments of the population.
4. Cluster Sampling You take a sample from naturally occurring groups in your population. The clusters are constructed such that the sampling units are heterogeneous within the cluster and homogeneous among the clusters.
Obtaining a Cluster Sample
Obtaining a Cluster Sample Divide the population into non-overlapping clusters. Number the clusters in the population from 1 to N.
Obtaining a Cluster Sample Select n distinct numbers from 1 to N using a randomization mechanism. The selected clusters are the clusters associated with the selected numbers. The sample will consist of all elements in the selected clusters.
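Illustration The steps above can be sketched in Python. The school names and student labels are hypothetical placeholders, and the function name is made up for illustration; only the list of clusters is needed to make the selection.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # n distinct clusters
    elements = [unit for name in chosen for unit in clusters[name]]
    return chosen, elements

# Hypothetical population: 6 schools (clusters) with 30 students each.
clusters = {
    f"School {s}": [f"Student {s}-{i}" for i in range(1, 31)]
    for s in range(1, 7)
}
chosen, elements = cluster_sample(clusters, n_clusters=2, seed=11)
print(chosen, len(elements))
```

The randomization happens at the level of schools, not students: once a school is selected, all of its students enter the sample.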
Example A researcher wants to survey academic performance of high school students in MIMAROPA.
Cluster Sampling Advantages and Disadvantages There is no need to come up with a list of units in the population; all that is needed is a list of the clusters. In actual field applications, adjacent households tend to have more similar characteristics than households that are far apart.
When to Use Cluster Sampling If the population can be grouped into clusters where individual population elements are known to be different with respect to the characteristics under study, this is preferable to use.
5. Multi-Stage Sampling Selection of the sample is done in two or more steps or stages, with sampling units varying in each stage. The population is first divided into number of first-stage sampling units from which a sample is drawn.
Obtaining a Multi-Stage Sample
Obtaining a Multi-Stage Sample Organize the sampling process into stages where the unit of analysis is systematically grouped. Select a sampling technique for each stage. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.
Example Suppose we wish to study the expenditure patterns of households in NCR. We can select a sample of households for this study using simple three-stage sampling.
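Illustration A three-stage selection can be sketched in Python. The slide does not name the stages, so the choice of city, then barangay, then household is an assumption, as are the placeholder names, the sizes of the frame, and the use of simple random sampling at every stage.

```python
import random

def three_stage_sample(frame, n_cities, n_barangays, n_households, seed=None):
    """Stage 1: sample cities. Stage 2: sample barangays within each
    selected city. Stage 3: sample households within each selected barangay."""
    rng = random.Random(seed)
    selected = []
    for city in rng.sample(sorted(frame), n_cities):
        barangays = frame[city]
        for barangay in rng.sample(sorted(barangays), n_barangays):
            for household in rng.sample(barangays[barangay], n_households):
                selected.append((city, barangay, household))
    return selected

# Hypothetical frame: 4 cities, 5 barangays per city, 20 households per barangay.
frame = {
    f"City {c}": {
        f"Barangay {c}-{b}": [f"Household {c}-{b}-{h}" for h in range(1, 21)]
        for b in range(1, 6)
    }
    for c in range(1, 5)
}
sample = three_stage_sample(frame, n_cities=2, n_barangays=2, n_households=5, seed=1)
print(len(sample))  # 20
```

A household list is only needed for the barangays that survive the first two stages, which is why adequate sampling frames are easier to build under this design.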
Multi-Stage Sampling Advantages and Disadvantages It is easier to generate adequate sampling frames. Transportation costs are greatly reduced since there is some form of clustering among ultimate or final samples. Its complexity in theory may make it difficult to apply in the field. Estimation procedures may be difficult for non-statisticians to follow.
When to Use Multi-Stage Sampling If no population list is available and if the population covers a wide area.
Take Note! Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.
Basic Sampling Technique of Non-Probability Sampling
Accidental Sampling There is no system of selection but only those whom the researcher or interviewer meets by chance.
Quota Sampling A specified number of persons of certain types is included in the sample.
Convenience Sampling It is the process of picking out people in the most convenient and fastest way to get reactions immediately.
Purposive Sampling It is based on certain criteria laid down by the researcher.
Judgement Sampling Selects sample in accordance with an expert’s judgement.
Cases wherein Non-Probability Sampling is Useful
Cases wherein Non-Probability Sampling is Useful Only few are willing to be interviewed. Extreme difficulties in locating or identifying subjects. Probability sampling is more expensive to implement. Cannot enumerate the population elements.
Sources of Errors in Sampling
1. Non-sampling Error Errors that result from the survey process. Any errors that cannot be attributed to sample-to-sample variability.
Sources of Non-sampling Error Non-response. Interview error. Misrepresented answers. Data entry errors. Questionnaire design. Wording of questions. Selection bias.
2. Sampling Error Error that results from taking one sample instead of examining the whole population. Error that results from using sampling to estimate information regarding a population.
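Illustration Sampling error can be seen by drawing several samples from the same population and comparing each sample mean with the population mean. The population below is simulated, so every number in it is hypothetical; the sketch only shows that the error changes from sample to sample and disappears when the whole population is examined.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sample_mean(n):
    """Mean of one simple random sample of size n from the population."""
    return statistics.mean(rng.sample(population, n))

# Five different samples give five different estimates of the same mean.
errors = [sample_mean(100) - population_mean for _ in range(5)]
print([round(e, 2) for e in errors])

# Examining the whole population leaves no sampling error.
census_error = sample_mean(len(population)) - population_mean
print(abs(census_error) < 1e-9)  # True
```

Non-sampling errors such as non-response or data entry mistakes are not captured by this simulation; they would remain even in a complete enumeration.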