What is the Independent Samples T Test Method of Analysis and How Can it Benefit an Organization?
ElegantJ-BusinessIntelligence
22 slides
Jun 29, 2018
About This Presentation
The independent samples t-test is a statistical method of hypothesis testing that determines whether there is a statistically significant difference between the means of two independent samples. It is helpful when an organization wants to compare two categories, groups, or items and determine whether any difference between them is statistically significant or could plausibly be due to chance.
Size: 807.36 KB
Language: en
Slide Content
Master the Art of Analytics
A Simplistic Explainer Series for Citizen Data Scientists
Journey Towards Augmented Analytics
Independent Sample T-test
Basic Terminologies
Sample data is a subset of the population data used to represent the entire group as a whole. For instance, if we want to find the average value of all cars in the United States, it is impractical to assess the value of every car in the country, add these numbers, and divide by the total number of cars. Instead, we can randomly select some of the cars, say 200, get the value of each of these 200 cars, and find the average of these 200 numbers. These 200 values of randomly selected cars are called sample data of the entire set of car values in the United States (the population data). There are various sampling techniques, such as simple random sampling, stratified sampling, and systematic sampling, which are explained in the annexure section.
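The idea can be sketched in a few lines of Python. The population below is a hypothetical, synthetic set of 50,000 car values (uniformly distributed between 5,000 and 60,000, generated with a fixed seed), not real data; the point is only that the mean of a random sample of 200 lands close to the population mean.

```python
import random
import statistics

# Hypothetical population: 50,000 synthetic car values (not real data).
rng = random.Random(42)
population = [rng.uniform(5_000, 60_000) for _ in range(50_000)]

# Simple random sample of 200 cars, drawn without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean:      {population_mean:,.0f}")
print(f"Sample mean (n=200):  {sample_mean:,.0f}")
```

Rerunning with a different seed gives a slightly different sample mean each time; that sampling variability is exactly what the t-test accounts for.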
Basic Terminologies
The null hypothesis in an independent samples t-test is the statement that there is no statistically significant difference between the means of the two groups. The alternative hypothesis is the statement that there is a statistically significant difference between them. For instance, suppose an online store's marketing manager wants to test the hypothesis that females have a significantly higher tendency to shop online than males. In this case, the null and alternative hypotheses would be:
Null hypothesis: There is no significant difference between males and females in terms of tendency to shop online.
Alternative hypothesis: There is a statistically significant difference between males and females in terms of tendency to shop online.
Because the manager's claim is directional (females higher), a one-tailed test could also be used; the hypotheses above are the two-tailed form.
Basic Terminologies
P-value: In an independent samples t-test, the p-value is the probability of observing a difference between the two sample means at least as large as the one found, assuming the null hypothesis (no difference) is true. A small p-value is evidence that the difference is statistically significant. The p-value is compared against a threshold called the significance level, which equals 1 minus the desired confidence level and is the accepted chance of wrongly declaring a difference significant. For instance, at a 95% confidence level (5% significance level), we check the p-value against 0.05: if the p-value is less than 0.05, the difference is significant; otherwise it is not. Similarly, at a 98% confidence level (2% significance level), we check the p-value against 0.02: if the p-value is less than 0.02, the difference is significant; otherwise it is not, and so on.
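The threshold rule above is simple enough to write as a function. In this minimal sketch, `is_significant` is a hypothetical helper name, and `alpha` is the significance level (1 minus the confidence level), passed in directly rather than computed from the confidence level to avoid floating-point surprises such as 1 - 0.95 not being exactly 0.05.

```python
def is_significant(p_value: float, alpha: float) -> bool:
    """Return True when p_value falls below the significance level alpha.

    alpha = 1 - confidence level, e.g. 0.05 for 95% or 0.02 for 98%.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return p_value < alpha


for alpha in (0.05, 0.02):
    verdict = "significant" if is_significant(0.041, alpha) else "not significant"
    print(f"{1 - alpha:.0%} confidence: p = 0.041 is {verdict}")
# 95% confidence: p = 0.041 is significant
# 98% confidence: p = 0.041 is not significant
```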
Introduction
The independent samples t-test is a statistical test that determines whether there is a statistically significant difference between the means of two independent samples. For instance, we might check whether the average value of sedans is significantly different from that of SUVs. Here the hypotheses would be set as follows:
Null hypothesis: The mean values of SUVs and sedans do not differ significantly.
Alternative hypothesis: The mean values of SUVs and sedans differ significantly.
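The slides do not show how the test statistic is computed. For reference, these are the two standard forms: Student's t-test, which pools the two sample variances, and Welch's t-test, which does not assume equal variances. Which variant a particular tool uses is an assumption to confirm in its documentation. Here x̄_i, s_i² and n_i denote the mean, sample variance and size of group i, and ν the degrees of freedom.

```latex
% Student's (pooled-variance) t-test
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
\nu = n_1 + n_2 - 2

% Welch's (unequal-variance) t-test, Welch-Satterthwaite degrees of freedom
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

In both cases the two-sided p-value is the probability that a t-distributed variable with ν degrees of freedom is at least |t| in absolute value.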
Example: Input
Let's conduct the independent samples t-test on the following two variables: one is a dimension with two values (the two independent groups) and the other is a measure (the dependent variable).
Group   Value
A       90
A       95
A       80
B       78
B       75
B       70
B       65
Example: Output
Group "A" mean value: 88.33
Group "B" mean value: 72.0
Mean difference: 16.33
P-value: 0.041
At the 95% confidence level (5% significance level): the p-value of 0.041 is less than 0.05, so there is a statistically significant difference between the means of groups A and B, and the mean of group A is significantly higher than that of group B.
At the 98% confidence level (2% significance level): the p-value of 0.041 is greater than 0.02, so there is no statistically significant difference between the means of groups A and B.
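The example can be recomputed with only the Python standard library. This is a minimal sketch that assumes Welch's unequal-variance t-test (the slides do not name the variant); the two-sided p-value uses the regularized incomplete beta function, evaluated with the classic continued-fraction (Lentz) method described in Numerical Recipes. On the seven rows shown, group A's mean is 265 / 3 ≈ 88.33, the mean difference is about 16.33, and the sketch's p-value comes out at roughly 0.04, in line with the reported 0.041. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(group_a, group_b):
    """Welch's two-sample t-test; returns (t, degrees of freedom, two-sided p)."""
    n_a, n_b = len(group_a), len(group_b)
    va = statistics.variance(group_a) / n_a  # squared standard error, group A
    vb = statistics.variance(group_b) / n_b  # squared standard error, group B
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    # Two-sided p-value: P(|T| >= |t|) for T ~ t(df)
    p = _betai(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


group_a = [90, 95, 80]
group_b = [78, 75, 70, 65]
t, df, p = welch_t_test(group_a, group_b)
print(f"Mean A = {statistics.mean(group_a):.2f}, mean B = {statistics.mean(group_b):.2f}")
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.3f}")
```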
Standard input parameters & sample UI
Sample output 1 : Interpretation
Sample output 2 : Model Summary
Sample output 3: Outliers
Outliers are data values that differ greatly from the majority of a set of data.
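The slides do not say how the tool decides which values are outliers. One common rule, shown here as an assumption rather than as the tool's method, is Tukey's fences: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. `tukey_outliers` is a hypothetical helper name.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], in their original order."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(tukey_outliers([10, 12, 11, 13, 12, 100]))  # [100]
```

Because outliers can pull a group's mean and inflate its variance, checking for them before running the t-test is a sensible habit.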
Limitations
The test can be applied to only two samples at a time (one dimension with exactly two values and one measure).
Observations within each group must be independent, and the two groups must be independent of each other.
The values in each group should be approximately normally distributed.
As a rule of thumb, the number of data points should be at least 30.
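Two of these limitations can be checked mechanically before running the test. The helper below (a hypothetical name, not part of any product) groups (label, value) rows, rejects input that does not have exactly two groups, and reports groups smaller than 30. Applying the 30-point rule per group is an assumption, since the slide does not say whether it means per group or in total. Independence and approximate normality still have to be judged from how the data was collected.

```python
from collections import defaultdict


def check_t_test_input(rows, min_per_group=30):
    """Group (label, value) rows and flag groups below the rule-of-thumb size.

    Returns (groups, small_groups): groups maps each label to its values;
    small_groups maps each label with fewer than min_per_group values to its size.
    Raises ValueError unless there are exactly two groups.
    """
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    if len(groups) != 2:
        raise ValueError(f"expected exactly 2 groups, found {len(groups)}: {sorted(groups)}")
    small = {label: len(vals) for label, vals in groups.items() if len(vals) < min_per_group}
    return dict(groups), small


rows = [("A", 90), ("A", 95), ("A", 80), ("B", 78), ("B", 75), ("B", 70), ("B", 65)]
groups, small = check_t_test_input(rows)
print(small)  # {'A': 3, 'B': 4}
```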
General applications
Medicine: Has the quality of life improved for patients who took drug A as opposed to patients who took drug B?
Sociology: Are men more satisfied with their jobs than women? Do they earn more?
Biology: Are foxes in one specific habitat larger than in another?
Economics: Is the economic growth of developing nations larger than the economic growth of the first world?
Marketing: Does customer segment A spend more on groceries than customer segment B?
Use case 1
Use case 1: Input Dataset
Gender   Income
Male     21000
Male     15000
Male     25600
Male     23000
Female   19750
Female   25000
Female   21250
Female   14400
Female   10000
Use case 1: Output
"Male" mean income value: 21150.0
"Female" mean income value: 18080.0
Mean difference: 3070.0
P-value: 0.406
The p-value of 0.406 is greater than 0.05, which indicates that there is no statistically significant difference between the incomes of males and females at the 95% confidence level.
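A quick check of the use case 1 arithmetic with Python's `statistics` module. The male mean is 84,600 / 4 = 21,150 and the female mean is 90,400 / 5 = 18,080, a difference of 3,070. For comparison, the mean of all nine incomes is 175,000 / 9 ≈ 19,444.44, a figure that is easy to mistake for a group mean. The reported p-value of 0.406 appears consistent with Welch's unequal-variance t-test on these nine rows.

```python
import statistics

male = [21000, 15000, 25600, 23000]
female = [19750, 25000, 21250, 14400, 10000]

male_mean = statistics.mean(male)              # 84600 / 4
female_mean = statistics.mean(female)          # 90400 / 5
overall_mean = statistics.mean(male + female)  # 175000 / 9, both groups pooled

print(f"Male mean:   {male_mean:.2f}")
print(f"Female mean: {female_mean:.2f}")
print(f"Difference:  {male_mean - female_mean:.2f}")
print(f"All nine:    {overall_mean:.2f}")
```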
Use case 2
Use case 3
Sampling Methods
There are three common types of sampling:
Simple random sampling: Here, selection is based purely on chance, and every item has an equal chance of being selected. A lottery system is an example of simple random sampling.
Stratified sampling: Here, the population data is divided into subgroups known as strata. The members of each subgroup share similar attributes and characteristics, such as demographics, income, or location. A random sample is taken from each subgroup in proportion to the subgroup's size relative to the population size, and these subgroup samples are then combined to form the final stratified random sample. This method achieves higher statistical precision because variability within each subgroup is low, and it requires a smaller sample size than simple random sampling to reach the same precision.
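Both methods can be sketched with Python's `random` module. The function names and the region-based strata below are hypothetical. The stratified version uses proportional allocation, giving each stratum round(n × stratum size / population size) members; because of rounding, the total can differ slightly from n for some stratum sizes.

```python
import random


def simple_random_sample(population, n, rng):
    """Every item has the same chance of selection (sampling without replacement)."""
    return rng.sample(population, n)


def stratified_sample(strata, n, rng):
    """Proportional stratified sample.

    strata maps a stratum name to its list of members. Each stratum is sampled
    at random in proportion to its share of the population, and the per-stratum
    samples are combined into one final sample.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / total)  # this stratum's share of n
        sample.extend(rng.sample(members, n_h))
    return sample


# Hypothetical population of 1,000 people in three regions.
strata = {
    "north": [("north", i) for i in range(600)],
    "south": [("south", i) for i in range(300)],
    "west": [("west", i) for i in range(100)],
}
rng = random.Random(7)
stratified = stratified_sample(strata, 100, rng)
srs = simple_random_sample(list(range(1000)), 50, rng)
print(len(stratified), len(srs))  # 100 50
```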
Sampling Methods
Government policymakers often use stratified random sampling to design better-targeted solutions.
Systematic sampling: Here, the researcher first decides the sample size and then the sampling interval, which is the fixed distance between consecutive sampled elements. Divide the total population size by the sample size to get this interval. For instance, say you want to create a systematic random sample of 1,000 people from a population of 10,000, so the interval is 10,000 / 1,000 = 10. Using a list of the total population, number each person from 1 to 10,000. Then randomly choose a starting number between 1 and 10, such as 4. The person numbered 4 is your first selection, and every tenth person after that is included in your sample. Your sample would then be composed of persons numbered 4, 14, 24, 34, 44, 54, and so on, up to the person numbered 9,994.
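The systematic sampling example translates directly into list slicing. In this sketch, `systematic_sample` is a hypothetical helper; it computes the interval k as population size // sample size and, if no start is given, picks a random 1-based start within the first interval.

```python
import random


def systematic_sample(population, n, start=None, rng=None):
    """Take every k-th member, where k = len(population) // n.

    start is a 1-based position within the first interval (1..k); if omitted,
    it is chosen at random.
    """
    k = len(population) // n
    if start is None:
        start = (rng or random).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie within the first interval")
    # Convert the 1-based start to a 0-based index, then step by k.
    return population[start - 1::k][:n]


people = list(range(1, 10_001))  # persons numbered 1..10,000
sample = systematic_sample(people, 1_000, start=4)
print(sample[:5], sample[-1], len(sample))  # [4, 14, 24, 34, 44] 9994 1000
```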
Want to Learn More? Get in touch with us at [email protected] and do check out the Learning section on Smarten.com.
June 2018