Sampling_and_Data research methodology.pptx

ssuserbc4c21, 103 slides, Jun 06, 2024

About This Presentation

Sampling in research


Slide Content

Research Methods By: Wondmagegn D. (PhD Candidate)

Discussion points
1. Define the following terms: source population, study population, sampling, sampling unit, study unit, sampling frame, sampling interval
2. Advantages and disadvantages of sampling
3. Discuss the different types of sampling methods
4. How is sample size determined?
3/19/24 2

Population
The population under consideration should be clearly and explicitly defined in terms of place, time, and other relevant criteria. Selecting a study population on the basis of suitability usually affects the validity of subsequent generalizations from the findings.

Reference population (target population): the population of interest, to which the investigators would like to generalize the results of the study.
Study population: the population from which the sample was actually drawn and about which a conclusion can be made.
Two examples of study populations that may differ from the reference population: volunteer populations and hospital populations.

a) Volunteer populations: Persons who volunteer to enter a study may differ in many respects from those who do not volunteer; therefore, the findings in a volunteer population do not necessarily apply to the population at large.
b) Hospital or clinic populations: Persons receiving medical care are not representative of the general population from which they come. That is, persons treated in hospital for a certain disease may differ from patients with the same disease who are not receiving care for it.

Operational Definition of Variables
Definition: A variable is a characteristic of a person, object, or phenomenon that can take on different values. A simple example of a variable is a person's age, which can take on values such as 20 years, 30 years, and so on.

Operationalizing variables by choosing appropriate indicators is important. Operationalizing variables means making them ‘measurable’.
E.g. In a study on VCT (voluntary counselling and testing) acceptance, you want to determine the level of knowledge concerning HIV in order to find out to what extent the factor ‘poor knowledge’ influences willingness to be tested for HIV. The variable ‘level of knowledge’ cannot be measured as such. You would need to develop a series of questions to assess a person’s knowledge, for example on modes of HIV transmission and prevention methods.

The answers to these questions form an indicator of someone’s knowledge of this issue, which can then be categorized. If 10 questions were asked, you might decide that the knowledge of those with:
0 to 3 correct answers is poor,
4 to 6 correct answers is reasonable, and
7 to 10 correct answers is good.
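
Once each respondent's correct answers have been counted, the cut-offs above can be applied mechanically. The sketch below is illustrative only; the function name `knowledge_level` and the example scores are hypothetical.

```python
def knowledge_level(correct_answers: int) -> str:
    """Categorize a knowledge score (0-10 correct answers) using the cut-offs above."""
    if not 0 <= correct_answers <= 10:
        raise ValueError("score must be between 0 and 10")
    if correct_answers <= 3:
        return "poor"
    if correct_answers <= 6:
        return "reasonable"
    return "good"


# Hypothetical scores of five respondents
scores = [2, 5, 7, 10, 4]
levels = [knowledge_level(s) for s in scores]
print(levels)  # ['poor', 'reasonable', 'good', 'good', 'reasonable']
```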

Sampling
Sampling is the process of selecting a portion of the population to represent the entire population. Sampling allows one to obtain a representative picture of the population without studying the entire population.
Reference population (target or source population): the population of interest, to which the investigators would like to generalize the results of the study.

Study population: the subset of the target population from which a sample will be drawn and about which conclusions are made.
Sample: the actual group in which the study is conducted.
Why sampling? Often it is too expensive or impossible to collect information on an entire population. For appropriately chosen samples, accurate statistical estimates of population parameters are possible.

Sampling frame: the list of all the units in the reference population from which a sample is to be picked.
Sampling fraction: the ratio of the number of units in the sample to the number of units in the reference population (n/N).
Sample size: the number of units in the sample (n).

Advantages: saves resources; improves the quality of data.
Disadvantages: sampling error, i.e., error arising because a sample rather than the whole population is studied. A different sample would give a different estimate, the difference being due to sampling variation. Sampling error should be minimized.
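
Sampling variation can be seen by drawing several samples from the same population. The simulation below is a hypothetical illustration: the population of 1,000 people with a true prevalence of 30%, the sample size of 50 and the seed are all assumptions. The five estimates will generally differ from one another and from 0.3; that spread is sampling variation.

```python
import random

# Hypothetical population of 1,000 people: 1 = has the condition, 0 = does not.
# The true prevalence is fixed at 30% for illustration.
population = [1] * 300 + [0] * 700
true_prevalence = sum(population) / len(population)

rng = random.Random(2024)  # fixed seed so the run is reproducible


def sample_prevalence(pop, n, rng):
    """Draw a simple random sample of size n and return its prevalence."""
    sample = rng.sample(pop, n)
    return sum(sample) / n


# Five different samples of 50 people from the same population
estimates = [sample_prevalence(population, 50, rng) for _ in range(5)]
print("true prevalence:", true_prevalence)  # 0.3
print("sample estimates:", estimates)
```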

When taking a sample, we will be confronted with the following questions:
What is the group of people from which we want to draw a sample?
How many people do we need in our sample?
How will these people be selected?

Sampling methods
An important issue influencing the choice of the most appropriate sampling method is whether a sampling frame is available, that is, a listing of all the units that compose the study population. There are two broad categories: non-probability and probability sampling methods.

A) Non-probability sampling methods
Used when a sampling frame does not exist. Examples:
1. Convenience sampling
2. Quota sampling
3. Snowball sampling
4. Volunteer sampling
These sampling methods do not claim to be representative of the entire population.

1. Convenience sampling: for the sake of convenience, the study units that happen to be available at the time of data collection are selected.
2. Quota sampling: a method that ensures that a certain number of sample units from different categories with specific characteristics appear in the sample, so that all these characteristics are represented.

In this method the investigator interviews as many people in each category of study unit as he can find until he has filled his quota.
3. Snowball sampling: the data collector identifies further persons to interview from the indications of persons already interviewed.

4. Volunteer sampling: As the term implies, this type of sampling occurs when people volunteer to be involved in the study. In psychological experiments or pharmaceutical trials (drug testing), for example, it would be difficult and unethical to enlist random participants from the general public; in these instances, the sample is taken from a group of volunteers.
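
Quota and snowball sampling, described above, are both rules for deciding whom to interview next. The sketch below expresses those rules in code; the function names, the arrival order, the quotas and the referral network are all hypothetical.

```python
from collections import deque


def quota_sample(arrivals, quotas):
    """Quota sampling: take people in the order they are encountered,
    keeping each one only if the quota for their category is not yet full."""
    filled = {category: [] for category in quotas}
    for person_id, category in arrivals:
        if category in filled and len(filled[category]) < quotas[category]:
            filled[category].append(person_id)
        if all(len(filled[c]) == quotas[c] for c in quotas):
            break  # every quota is filled; stop interviewing
    return filled


def snowball_sample(referrals, seeds, target_size):
    """Snowball sampling: start from initial interviewees and follow the
    people each interviewee names until target_size persons are reached."""
    sampled, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)
        for named in referrals.get(person, []):
            if named not in seen:  # do not interview anyone twice
                seen.add(named)
                queue.append(named)
    return sampled


# Hypothetical people met in this order, with a quota of 2 per sex
arrivals = [(1, "male"), (2, "male"), (3, "female"), (4, "male"), (5, "female"), (6, "female")]
print(quota_sample(arrivals, {"male": 2, "female": 2}))  # {'male': [1, 2], 'female': [3, 5]}

# Hypothetical referral network: who each interviewee points to
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
print(snowball_sample(referrals, ["A"], target_size=4))  # ['A', 'B', 'C', 'D']
```

Neither procedure gives every unit a known chance of selection, which is why these samples do not claim to be representative.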

B) Probability sampling methods
These involve random selection: every unit of the study population should have an equal, or at least a known, chance of being included in the sample. Findings from the sample can therefore be generalized to the study population.

Examples of probability sampling methods:
1. Simple random sampling (SRS)
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Multi-stage sampling

1. Simple Random Sampling
The least biased of all sampling techniques: there is no subjectivity, and each member of the total population has an equal chance of being selected. Taking a simple random sample is one of the easiest and most convenient methods for achieving reliable inferences about a population.

To select a simple random sample you need to:
1. Make a numbered list of all the units in the population, numbering each unit from 1 to N (where N is the size of the population).
2. Select the required number of units using the lottery method, a table of random numbers, or a computer program (ensuring the randomness of the sample).
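
The "computer program" option in step 2 can be as short as the sketch below, which uses Python's standard `random` module to draw units without replacement. The frame size N = 200, the sample size n = 20 and the seed are hypothetical; the sampling fraction n/N from the definitions above is printed as well.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Select n distinct serial numbers from 1..N; every unit has the same
    chance of selection (sampling without replacement)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return sorted(rng.sample(range(1, N + 1), n))


# Hypothetical frame of N = 200 numbered units, sample of n = 20
N, n = 200, 20
selected = simple_random_sample(N, n, seed=1)
sampling_fraction = n / N
print(selected)
print("sampling fraction n/N =", sampling_fraction)  # 0.1
```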

Example (random numbers): Suppose there are N = 850 students in a school from which a sample of n = 10 students is to be taken. The students are numbered from 1 to 850. Since the serial numbers run to three digits, we use random numbers that contain three digits. All numbers exceeding 850 are ignored because they do not correspond to any serial number in the list. If the same number occurs again, the repetition is skipped.
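
The table-reading rules of this example can be mimicked in code. In the sketch below, a pseudo-random three-digit number (000 to 999) stands in for the next entry of a printed random number table, which is an assumption of the illustration; the function name and seed are hypothetical.

```python
import random


def sample_with_random_digits(N, n, seed=None):
    """Mimic reading three-digit random numbers from a table (assumes N <= 999):
    numbers outside 1..N are ignored and repeated numbers are skipped."""
    rng = random.Random(seed)
    selected = []
    while len(selected) < n:
        number = rng.randint(0, 999)  # one three-digit random number, 000-999
        if number < 1 or number > N:
            continue  # no student has this serial number
        if number in selected:
            continue  # repetition is skipped
        selected.append(number)
    return selected


chosen = sample_with_random_digits(N=850, n=10, seed=7)
print(chosen)
```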

2. Systematic Sampling
Individuals are selected from the sampling frame systematically rather than randomly. The units are numbered from 1 to N (the sampling frame). If the sample size is n, the sampling interval is k = N/n; a random number between 1 and k is generated, and that unit is selected as the first member of the sample.

The other units in the sample are obtained by successively adding the sampling interval k to the random starting number. Individuals are thus chosen at regular intervals (every kth unit) from the sampling frame, with the first unit taken at random from among the first k units.
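
The two steps above (random start, then every kth unit) fit in a few lines. This is a sketch: rounding k down with integer division when N/n is not a whole number is an assumption, and the frame size N = 1,000, sample size n = 50 and seed are hypothetical.

```python
import random


def systematic_sample(N, n, seed=None):
    """Select every kth unit (k = N // n) after a random start in 1..k."""
    k = N // n                 # sampling interval, rounded down if N/n is not whole
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start among the first k units
    return [start + i * k for i in range(n)]


chosen = systematic_sample(N=1000, n=50, seed=3)
print(chosen)
```

With N = 1,000 and n = 50 the interval is k = 20, so if the random start were 7 the sample would be units 7, 27, 47, and so on up to 987.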

3. Stratified Random Sampling
This is done when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification. The population is divided into homogeneous, mutually exclusive groups called strata according to a characteristic of interest (e.g., sex, geographic area, prevalence of disease). A separate sample is then taken independently from each stratum.
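
The sketch below groups the frame into strata and then draws an independent simple random sample within each one. The urban/rural rule, the frame of 120 people, and the allocation of 8 and 4 units (10% of each stratum, i.e. proportional allocation) are hypothetical choices for illustration.

```python
import random


def stratified_sample(units, stratum_of, sizes, seed=None):
    """Draw an independent simple random sample from each stratum.

    units      -- the sampling frame (list of unit identifiers)
    stratum_of -- function giving the stratum of a unit (e.g. sex or area)
    sizes      -- dict: stratum -> number of units to sample from it
    """
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {s: sorted(rng.sample(strata[s], sizes[s])) for s in sizes}


def area(person_id):
    """Hypothetical rule: persons 1-80 live in urban areas, 81-120 in rural areas."""
    return "urban" if person_id <= 80 else "rural"


frame = list(range(1, 121))
# Proportional allocation (10% of each stratum) is an illustrative choice
sample = stratified_sample(frame, area, {"urban": 8, "rural": 4}, seed=5)
print(sample)
```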

4. Cluster Sampling
A method of sampling in which the element selected is a group (as distinguished from an individual), called a cluster. It is widely used to reduce cost. Ideally, the clusters are similar to one another (each being internally heterogeneous, like a small version of the population), unlike stratified sampling, where the strata differ from one another. The sampling unit is a cluster, and the sampling frame is a list of these clusters.
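
In cluster sampling the random draw is made from the list of clusters, and every unit in a selected cluster is included. In the sketch below, the six villages of four households each, the number of clusters chosen, and the seed are hypothetical.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters; every unit in a chosen cluster is included.

    clusters -- dict mapping cluster name -> list of units in that cluster
                (the sampling frame is the list of cluster names)
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}


# Hypothetical frame: 6 villages (clusters) of 4 households each
villages = {f"village_{i}": [f"v{i}_hh{j}" for j in range(1, 5)] for i in range(1, 7)}
sample = cluster_sample(villages, m=2, seed=11)
households = [hh for units in sample.values() for hh in units]
print(sorted(sample))
print(len(households), "households")  # 8 households
```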

5. Multi-stage Sampling
Similar to cluster sampling, except that a sample is picked from within each chosen cluster rather than including all units in the cluster. This method is appropriate when the reference population is large and widely scattered. Selection is done in stages until the final sampling units are reached, so this type of sampling requires at least two stages.
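
A two-stage version of the procedure is sketched below. Stage 1 selects clusters as in cluster sampling, and stage 2 takes a simple random sample within each selected cluster instead of including every unit. The frame of 10 villages with 30 households each, the numbers selected at each stage, and the seed are hypothetical.

```python
import random


def multistage_sample(clusters, m, units_per_cluster, seed=None):
    """Two-stage sampling.
    Stage 1: select m clusters at random.
    Stage 2: take a simple random sample of units within each selected cluster."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), m)
    return {name: rng.sample(clusters[name], units_per_cluster) for name in stage1}


# Hypothetical frame: 10 villages with 30 households each
villages = {f"village_{i}": [f"v{i}_hh{j}" for j in range(1, 31)] for i in range(1, 11)}
sample = multistage_sample(villages, m=3, units_per_cluster=5, seed=4)
for village, households in sorted(sample.items()):
    print(village, households)
```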

Strata: externally heterogeneous, internally homogeneous (stratified sampling).
Clusters: internally heterogeneous, externally homogeneous (cluster sampling).

Errors
During sampling, our results will not exactly equal the correct results for the whole population; that is, our results will be subject to error. This error has two components: sampling and non-sampling errors.
a) Sampling error (i.e., random error): Random error, the opposite of reliability (i.e., precision or repeatability), consists of random deviations from the true value, which can occur in any direction. It arises by chance from which particular units happen to be selected into the sample, and it can be minimized by increasing the sample size. Reliability (precision) is the repeatability of a measure, i.e., the degree of closeness between repeated measurements of the same value.

b) Non-sampling error (i.e., bias): Bias, the opposite of validity, consists of systematic deviations from the true value, always in the same direction. Non-sampling error (bias) can be eliminated or reduced by careful design of the sampling procedure.
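
The claim that sampling error shrinks as the sample grows can be made concrete with a standard result for simple random sampling, which is not itself stated in the slides: the standard error of a sample proportion is sqrt(p(1 - p)/n). The sketch assumes a true proportion p = 0.3. Each fourfold increase in n halves the standard error. Bias, by contrast, does not shrink with n.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical true proportion p = 0.3 at three sample sizes
for n in (25, 100, 400):
    print(n, round(standard_error(0.3, n), 4))
# 25 0.0917
# 100 0.0458
# 400 0.0229
```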

Sample Size Determination
In planning any investigation, we must decide how many people need to be studied in order to answer the study objectives. An appropriate sample size produces sufficiently precise results. If the study is too small, we may fail to detect important effects, or may estimate effects too imprecisely. If the study is too large, we waste resources and subject more patients or animals than necessary to the burdens of the study; hence an inappropriate sample size is unethical.

In order to calculate the required sample size (n), you need to know the following:
1. A reasonable estimate of the key proportion to be studied (p), taken from a previous study (article) on the topic; if p is not known, use p = 0.5.
2. The degree of accuracy required (d), that is, the allowed deviation from the true proportion in the population as a whole.
3. The confidence level required, usually 95% ($Z_{\alpha/2} = 1.96$).
4. The size of the population that the sample is to represent (N).

Determination of Sample Size for Estimating a Proportion
The minimum sample size (n) required for a very large population (N > 10,000) is:
$$n = \frac{Z_{\alpha/2}^{2}\, p(1-p)}{d^{2}}$$
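
The formula above translates directly into code. In this sketch the defaults (p = 0.5, d = 0.05, Z = 1.96 for 95% confidence) are illustrative assumptions, and the result is rounded up because a fraction of a person cannot be sampled.

```python
import math


def sample_size_proportion(p=0.5, d=0.05, z=1.96):
    """Minimum n for estimating a proportion in a very large population:
    n = z^2 * p * (1 - p) / d^2, rounded up to the next whole person."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(sample_size_proportion())       # 385 (p unknown, so p = 0.5)
print(sample_size_proportion(p=0.3))  # 323
```

With p = 0.5 the calculation is 1.96² × 0.25 / 0.05² = 384.16, which rounds up to 385.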

Note: The above formula assumes a very large population (N > 10,000); for a finite population, a corrected formula should be used.
Take the values of p and standard deviations from other published research findings.
If there is more than one p value, take the one that gives the maximum value of p × q (where q = 1 - p).
Another option is to do a pilot study (this takes time but is preferred).
If the value of p is unknown, take 50%.
Consider non-response (5-10%) in your sample size calculation.
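
The rules in this note can be chained together. This sketch rests on two loudly flagged assumptions, because the slide's own finite-population formula is not reproduced in this text. First, the finite-population step uses the commonly cited correction n = n0 / (1 + n0/N). Second, the non-response step divides by (1 - rate); some texts multiply by (1 + rate) instead. All inputs are hypothetical.

```python
import math


def choose_p(candidate_ps):
    """Among several published p values, pick the one with the largest p * q."""
    return max(candidate_ps, key=lambda p: p * (1 - p))


def finite_population_correction(n0, N):
    """ASSUMPTION: uses the common correction n = n0 / (1 + n0 / N);
    this text does not give the slide's own finite-population formula."""
    return math.ceil(n0 / (1 + n0 / N))


def adjust_for_nonresponse(n, rate=0.10):
    """Inflate n so that roughly n responses remain after non-response.
    ASSUMPTION: divides by (1 - rate); some texts multiply by (1 + rate)."""
    return math.ceil(n / (1 - rate))


# Hypothetical inputs: three published p values, d = 0.05, 95% confidence, N = 2,000
p = choose_p([0.2, 0.35, 0.6])  # 0.6 gives the largest p * q (0.24)
n0 = math.ceil(1.96 ** 2 * p * (1 - p) / 0.05 ** 2)
n_finite = finite_population_correction(n0, N=2000)
n_final = adjust_for_nonresponse(n_finite, rate=0.10)
print(p, n0, n_finite, n_final)  # 0.6 369 312 347
```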

Variables

Inclusion and exclusion criteria
Age is not an exclusion criterion. The groups we exclude are those that could obscure our results.

Plan for Data Collection
A plan for data collection should be developed so that:
you will have a clear overview of what tasks have to be carried out, who should perform them, and the duration of these tasks;
you can organise both human and material resources for data collection in the most efficient way; and
you can minimise errors and delays which may result from lack of planning (for example, the population not being available or data forms being misplaced).

Stages
Three main stages can be distinguished:
Stage 1: Permission to proceed (IRB of the institution, ….., the permission of the participants)
Stage 2: Data collection
Stage 3: Data handling (data cleaning)
Pretesting is the best way to adjust the data collection tool.

Data Collection Methods
Methods of Collecting Quantitative Data
The most commonly used methods of collecting quantitative data are:
the use of documentary sources;
interviewer-administered questionnaires; and
self-administered questionnaires.

A. The use of documentary sources
Clinical records and other personal records, death certificates, published mortality statistics, census publications, etc.
Advantages: Documents can provide ready-made information relatively easily, and they are the best means of studying past events.

Disadvantages:
Problems of reliability and validity (because the information is collected by a number of different persons who may have used different definitions or methods of obtaining data).
Errors during data extraction.
Since the records are maintained not for research purposes but for clinical, administrative, or other ends, the required information may not be recorded at all, or may be only partly recorded.

B. Self-administered Questionnaire
The respondent reads the questions and fills in the answers by himself.
Advantages: Simpler and cheaper; questionnaires can be administered to many persons simultaneously, and they can be sent by post.
Disadvantage: Demands a certain level of education on the part of the respondent.

C. Interviewer-administered Questionnaire
The interview may be highly structured or relatively unstructured.
Advantages: the interviewer can
stimulate and maintain the respondent's interest;
allay anxiety if it arises (e.g., "Why am I being asked these questions?");
repeat unclear questions;
ask follow-up or probing questions to clarify a response; and
make observations during the interview.

Disadvantages: expensive and time-consuming; risk of leading/guiding questions; difficult to address sensitive issues; social desirability bias, which occurs because subjects are systematically more likely to provide a socially acceptable response.
In general, apart from their expense, interviews are preferable to self-administered questionnaires, provided that they are conducted by skilled interviewers.

Methods of Collecting Qualitative Data
Qualitative approaches to data collection usually involve direct interaction with individuals on a one-to-one basis or in a group setting. The main methods of collecting qualitative data are:
A. Qualitative interviews (individual interviews)
B. Focus group discussions
C. Observation

A. Qualitative interviews (individual interviews)
Qualitative interviews are semi-structured or unstructured.
Semi-structured interview (focused interview): Semi-structured interviews tend to work well when the interviewer has already identified a number of aspects he wants to be sure of addressing. It involves a series of open-ended questions based on the topic areas the researcher wants to cover.

Unstructured interview (in-depth interview): This has very little structure at all. The interviewer goes into the interview with the aim of discussing a limited number of topics, sometimes as few as one or two, and frames the questions on the basis of the interviewee's previous responses. It is important for obtaining very detailed information.

B. Focus group discussion: Sometimes it is preferable to collect information from groups of people rather than from a series of individuals. A focus group discussion (FGD) is a discussion among 6-12 persons guided by a facilitator, during which group members talk freely and spontaneously about a certain topic. The purpose of an FGD is to obtain in-depth information on the concepts, perceptions, and ideas of the group. It aims to be more than a question-answer interaction.

C. Observation: Not all qualitative data collection approaches require direct interaction with people. Observation can be used when data collected through other means are of limited value or are difficult to validate. It can also serve to verify or nullify information provided in face-to-face encounters. E.g. cleanliness of latrines, availability of windows in a house, etc.

The choice of methods of data collection is based on:
the accuracy of the information they yield; and
practical considerations, such as the need for personnel, time, equipment, and other facilities, in relation to what is available.

Questionnaire Design
Questionnaires are a very convenient way of collecting useful, comparable data from a large number of individuals. However, they can only produce valid and meaningful results if the questions are clear and precise and if they are asked consistently across all respondents. Therefore, careful consideration needs to be given to the design of the questionnaire.

The type and content of a questionnaire depend largely on your research question and research objectives (be clear about your dependent and independent variables).
All questionnaires require a title (short description).
The questionnaire needs to be appealing and inviting.
It needs a confidential unique identifier.

Questions may take two general forms:
Open-ended questions: the subject answers in his own words.
Closed questions: respondents answer by choosing from a number of fixed alternative responses.

In questionnaire design, remember to:
a) Use familiar and appropriate language
b) Avoid abbreviations, double negatives, etc.
c) Avoid collecting two elements through one question
d) Pre-code the responses to facilitate data processing
e) Avoid embarrassing and painful questions
f) Watch out for ambiguous wording
g) Avoid language that suggests a response
h) Start with simpler questions
i) Ask the same questions of all respondents
j) Provide ‘other’ or ‘don’t know’ options where appropriate

k) Provide the unit of measurement for continuous variables (years, months, kg, etc.)
l) For open-ended questions, provide sufficient space for the response
m) Arrange questions in a logical sequence
n) Group questions by topic, and place a few sentences of transition between topics
o) Provide complete training for interviewers
p) Pre-test the questionnaire on 5% of respondents in an actual field situation
q) Check all filled questionnaires at field level
r) Include a “thank you” after the last question

Bias in Data Collection
Bias in data collection is a distortion in the collected data so that it does not represent reality.
A) Possible sources of bias during data collection:
1. Defective instruments
2. Observer bias
3. Effect of the interview on the informant
4. Information bias

Data Quality Assurance
Assuring data quality is important for obtaining valid research findings. It can be assured through:
providing training for data collectors;
supervision;
pre-testing and a pilot study; and
assigning appropriate and skilled personnel.

Pre-test and Pilot Study
A pre-test usually refers to a small-scale trial of a particular research component. A pilot study is a preliminary study that goes through the entire research procedure with a small sample.

Plan for Data Processing and Analysis
Data processing and analysis should start in the field with:
checking the data for completeness;
performing quality control checks; and
sorting the data by instrument used and by group of informants.
Data from small samples may even be processed and analyzed as soon as they are collected.

The plan for data processing and analysis must be made after careful consideration of the objectives of the study as well as of the tools developed to meet the objectives. Preparation of a plan for data processing and analysis will provide you with better insight into the feasibility of the analysis to be performed as well as the resources that are required.

What Should the Plan Include?
When making a plan for data processing and analysis, the following issues should be considered: sorting data, performing quality-control checks, data processing, and data analysis.

1. Sorting data
An appropriate system for sorting the data is important for facilitating subsequent processing and analysis. If you have different study populations (for example village health workers, village health committees, and the general population), you would obviously number the questionnaires separately.

In a comparative study, it is best to sort the data right after collection into the two or three groups that you will be comparing during data analysis. For example, in a study concerning the reasons for low acceptance of family planning services, users and non-users would be the basic categories. In a case-control study, the cases are obviously compared with the controls.

2. Performing quality control checks
Usually the data have already been checked in the field to ensure that all the information has been properly collected and recorded. Before and during data processing, however, the information should be checked again for completeness and internal consistency. If a questionnaire has not been filled in completely, you will have missing data for some of your variables.
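
When data are processed by computer, the completeness and internal-consistency checks described above can be automated. The sketch below is one way to do this. The variable names, the consistency rule (age at marriage should not exceed current age) and the three records are all hypothetical.

```python
def check_record(record, required, rules):
    """Return the problems found in one questionnaire record.

    required -- names of variables that must not be missing
    rules    -- (description, test) pairs; test(record) returns True when
                the record is internally consistent
    """
    problems = [f"missing: {v}" for v in required if record.get(v) in (None, "")]
    for description, test in rules:
        try:
            if not test(record):
                problems.append(f"inconsistent: {description}")
        except (KeyError, TypeError):
            pass  # a rule cannot be checked when its inputs are missing
    return problems


# Hypothetical variables and consistency rule
required = ["id", "age", "sex"]
rules = [("age at marriage exceeds current age",
          lambda r: r["age_at_marriage"] <= r["age"])]

records = [
    {"id": 1, "age": 25, "sex": "F", "age_at_marriage": 19},
    {"id": 2, "age": 22, "sex": "", "age_at_marriage": 30},
    {"id": 3, "age": None, "sex": "M", "age_at_marriage": 20},
]
for r in records:
    print(r["id"], check_record(r, required, rules))
```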

3. Data processing
Decide whether to process and analyze the data from questionnaires:
manually, using data master sheets or manual compilation of the questionnaires; or
by computer, for example using a micro-computer and existing software or self-written programmes for data analysis.

Data processing involves:
categorising the data;
coding; and
summarising in data master sheets, by manual compilation, or by compilation by computer.

A. Categorizing data
Decisions have to be made concerning how to categorize responses. For categorical variables that are investigated through closed questions or observation, the categories have been decided upon beforehand, for example the presence or absence of a latrine in a household. In interviews, the answers to open-ended questions can be pre-categorized to a certain extent, depending on knowledge of the possible answers.

B. Coding
Coding is assigning a separate (non-overlapping) numerical or letter code to each answer and to missing values. For example, the answer ‘yes’ may be coded as ‘Y’ or 1, ‘no’ as ‘N’ or 2, and ‘no response’ or ‘unknown’ as ‘U’ or 9. Common responses should have the same code in each question, as this minimizes mistakes by coders.
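
With a single codebook shared by every question, the numeric scheme above (yes = 1, no = 2, no response/unknown = 9) can be applied automatically. This sketch assumes that a blank answer is treated as "no response". An answer that is not in the codebook raises an error, so new categories cannot slip through uncoded. The answers are hypothetical.

```python
# One codebook reused for every question, following the example above
CODES = {"yes": 1, "no": 2, "no response": 9, "unknown": 9}


def code_answer(answer):
    """Convert a recorded answer to its numeric code.
    ASSUMPTION: a blank or missing answer is coded 9 ('no response')."""
    if answer is None or answer.strip() == "":
        return 9
    return CODES[answer.strip().lower()]  # KeyError for an uncoded answer


answers = ["Yes", "no", "", "Unknown", "yes"]
coded = [code_answer(a) for a in answers]
print(coded)  # [1, 2, 9, 9, 1]
```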

C. Summarizing the data
Data master sheets are used when raw data are processed manually. On a data master sheet, all the answers of individual respondents are entered by hand. Master sheets facilitate data analysis.

4. Data analysis: quantitative data
Analysis of quantitative data involves the production and interpretation of frequencies, tables, graphs, etc., that describe the data.
A. Frequency counts
From the data master sheets, simple tables can be made with frequency counts for each variable. A frequency count is an enumeration of how often a certain measurement or a certain answer to a specific question occurs.
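
On a computer, a frequency count for one variable is a single call to `collections.Counter` from Python's standard library. The ten answers below are hypothetical. The percentages are calculated with the total number of respondents as the denominator.

```python
from collections import Counter

# Hypothetical answers of ten respondents to "Does the household have a latrine?"
answers = ["yes", "no", "yes", "yes", "no", "yes", "unknown", "yes", "no", "yes"]

counts = Counter(answers)
for answer, count in counts.most_common():
    print(f"{answer:<8}{count:>3}{100 * count / len(answers):7.1f}%")
```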

B. Cross-tabulations
Cross-tabulation is used to obtain the frequency distribution of one variable by the subsets of another variable, either to describe the problem (descriptive) or to arrive at possible explanations for it (analytic). The headings of the dependent variable are usually placed horizontally (across the top), and those of the independent variable vertically (down the side).

E.g. The following table shows the relationship between smoking (independent variable, rows) and lung cancer status (dependent variable, columns):

Smoking status | Lung cancer: yes | Lung cancer: no | Total
Smokers        | 15               | 36              | 51
Non-smokers    | 5                | 88              | 93
Total          | 20               | 124             | 144
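
A cross-tabulation is a count of each (row, column) combination. The sketch below rebuilds individual records that reproduce the smoking table above and then recomputes the cells, the row totals and the column totals from those records. Recreating records from the published counts is purely for illustration.

```python
from collections import Counter

# Individual records rebuilt from the table above: (smoking status, lung cancer)
records = ([("smoker", "yes")] * 15 + [("smoker", "no")] * 36
           + [("non-smoker", "yes")] * 5 + [("non-smoker", "no")] * 88)

cells = Counter(records)  # count of each (row, column) pair
rows, cols = ["smoker", "non-smoker"], ["yes", "no"]
row_totals = [sum(cells[(r, c)] for c in cols) for r in rows]
col_totals = [sum(cells[(r, c)] for r in rows) for c in cols]

print(f"{'':<12}{'yes':>6}{'no':>6}{'total':>7}")
for r, total in zip(rows, row_totals):
    print(f"{r:<12}" + "".join(f"{cells[(r, c)]:>6}" for c in cols) + f"{total:>7}")
print(f"{'total':<12}" + "".join(f"{t:>6}" for t in col_totals) + f"{sum(col_totals):>7}")
```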

Dummy Table
A dummy table is constructed to visualize how the data can be organized and summarized. It contains all the elements of a real table, except that the cells are still empty. It is prepared during proposal development.

E.g. The following is a dummy table (independent variable: smoking status; dependent variable: lung cancer status):

Smoking status | Lung cancer: yes | Lung cancer: no | Total
Smokers        |                  |                 |
Non-smokers    |                  |                 |
Total          |                  |                 |

What to consider in data analysis?
What to do with the data after collection
What type of analysis you will perform: descriptive, analytic, or both
What type of test statistic/model you will use: Z-test, t-test, χ²-test, etc.
How to analyze your data
How to present your data
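
As an illustration of the χ²-test in the list above, the sketch below computes the Pearson chi-square statistic (without continuity correction) for the smoking and lung cancer table. The statistic is compared with 3.841, the critical value for 1 degree of freedom at the 5% level. Applying the test to that table is an illustrative choice; in practice, check that the expected counts are large enough (all are above 5 here).

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]: sum of (observed - expected)^2 / expected."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Smokers: 15 with lung cancer, 36 without; non-smokers: 5 with, 88 without
chi2 = chi_square_2x2(15, 36, 5, 88)
print(round(chi2, 2))                               # 15.91
print("significant at 5% (df = 1)?", chi2 > 3.841)  # True
```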

Ethics in Research
Ethical issues relating to research participants (their consent, incentives, sensitive information, harm to participants, etc.)
Ethical issues relating to the researcher (avoiding bias, using appropriate research methodology, correct reporting, etc.)

Ethical considerations:
balance between protecting participants and the quest for knowledge;
the IRB provides one mechanism;
informed consent/assent;
confidentiality and anonymity;
justification of procedures;
right to services.

Ethical Considerations
Why do we need ethical approval? Before you embark on research with human subjects, you are likely to require ethical approval. Ethical decisions are based on three main approaches: duty-based, rights-based, and goal-based.

Research studies should be judged ethically on three sets of criteria:
1. Ethical principles
2. Ethical rules
3. Scientific criteria
The latter is often neglected but is important, since if a study is poorly designed or its sample size is insufficient, it is not capable of demonstrating anything and could consequently be regarded as unethical.

Ethical principles

2. Ethical Rules
These rules are essential for developing trust between researchers and study participants.
a) Veracity: All subjects in any research project should always be told the truth.
b) Privacy: When subjects enrol in a research study, they grant access to themselves, but this is not unlimited access. Access is a broad term and generally includes viewing, touching, or having information about them.

c) Confidentiality: No information obtained with the patient’s or subject’s permission from their medical records should be disclosed to any third person without that individual’s consent.
d) Fidelity: Fidelity means keeping our promises and avoiding negligence with information. For example, if we agree to send a summary of our research findings to participants in a study, we should do so.

3. Applying to the Ethics Committee (scientific criteria)
Remember that the key questions the Ethics Committee will be asking are:
Is the research valid? How important is the research question? Can the question be answered?
Is the welfare of the research subject under threat? What will participating involve? Are the risks necessary and acceptable?
Is the dignity of the research subject upheld? Will consent be sought? Will confidentiality be respected?

Dissemination and Utilization of Results
Briefly describe the dissemination plan:
feedback to the community;
feedback to local authorities;
identifying relevant agencies that need to be informed;
scientific publication;
presentation in meetings/conferences.
