What is the Independent Samples T Test Method of Analysis and How Can it Benefit an Organization?

ElegantJ-BusinessIntelligence, 22 slides, Jun 29, 2018

About This Presentation

The independent sample t-test is a statistical method of hypothesis testing that determines whether there is a statistically significant difference between the means of two independent samples. It is helpful when an organization wants to determine whether there is a statistical difference between tw...


Slide Content

Master the Art of Analytics
A Simplistic Explainer Series for Citizen Data Scientists
Journey Towards Augmented Analytics

Independent Sample T-test

Basic Terminologies

Sample data is a subset of population data used to represent the entire group as a whole. For instance, if we want to find the average value of all cars in the United States, it is impractical to assess the value of each car in the United States, add these numbers up and divide by the total number of cars. Instead, we can randomly select some of the cars, say 200, get the value of each of these 200 cars and find the average of these 200 numbers. These 200 randomly selected car values are called sample data, drawn from the values of all cars in the United States (population data). There are various sampling techniques, such as simple random sampling, stratified sampling and systematic sampling, which are explained in the annexure section.
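The car example can be sketched in Python. Everything here is illustrative: the 10,000 car values are synthetic (drawn uniformly between 5,000 and 60,000 dollars, an arbitrary assumption, since the slides give no real figures), and `simple_random_sample` is a hypothetical helper wrapping the standard library's `random.sample`, which draws without replacement so that every car has the same chance of being selected.

```python
import random
from statistics import mean


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: 10,000 synthetic car values in US dollars,
# drawn uniformly between 5,000 and 60,000 purely for illustration.
gen = random.Random(42)
population = [round(gen.uniform(5_000, 60_000)) for _ in range(10_000)]

sample = simple_random_sample(population, 200, seed=7)
print(f"population mean: {mean(population):,.0f}")
print(f"sample mean of 200 cars: {mean(sample):,.0f}")
```

Changing the seed gives a different sample and a slightly different estimate of the population mean; that sample-to-sample variation is what a significance test has to account for.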

Null hypothesis: in the case of the independent samples t-test, this is the general statement that there is no statistically significant difference between the means of the two samples. Alternative hypothesis: in the case of the independent samples t-test, this is the statement that there is a statistically significant difference between the means of the two samples. For instance, an online store's marketing manager decides to test the hypothesis that females have a significantly higher tendency to shop online than males. In this case the null and alternative hypotheses would be:
Null hypothesis: There is no significant difference between males and females in terms of tendency to shop online.
Alternative hypothesis: There is a statistically significant difference between males and females in terms of tendency to shop online.
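Written in symbols, with μ_F and μ_M denoting the mean tendency to shop online for females and males (notation introduced here for illustration), the hypotheses above are two-sided. Since the manager's claim is directional, a one-sided alternative is the closer match if only a higher female tendency is of interest:

```latex
\begin{aligned}
H_0  &: \mu_F = \mu_M    && \text{(no difference in tendency to shop online)}\\
H_1  &: \mu_F \neq \mu_M && \text{(two-sided alternative, as stated above)}\\
H_1' &: \mu_F > \mu_M    && \text{(one-sided alternative matching the directional claim)}
\end{aligned}
```

When the observed difference lies in the hypothesized direction, the one-sided p-value of a t-test is half of the two-sided one.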

P-value: in the case of the independent samples t-test, the p-value is the probability of observing a difference between the two sample means at least as large (in either direction) as the one found, assuming the null hypothesis is true. The smaller it is, the stronger the evidence that the difference between the two samples is statistically significant. For different confidence levels, the p-value is checked against different thresholds and the inference is made accordingly. For instance, for a confidence level of 95% (error = 5%), we check the p-value against the threshold of 0.05: if the p-value is < 0.05, the difference is significant; otherwise it is not significant. Similarly, for a confidence level of 98% (error = 2%), we check the p-value against the threshold of 0.02: if the p-value is < 0.02, the difference is significant; otherwise it is not significant, and so on.
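The threshold rule above can be written as a small helper. The function name `is_significant` is hypothetical; it converts the confidence level into its error threshold and compares the p-value against it, here using the p-value of 0.041 from the example output in this deck.

```python
def is_significant(p_value: float, confidence_level: float) -> bool:
    """Return True if p_value is below the threshold implied by confidence_level.

    A 95% confidence level gives a threshold of 0.05, 98% gives 0.02, and so on.
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Rounding removes floating-point noise, e.g. 1 - 0.95 == 0.050000000000000044.
    threshold = round(1.0 - confidence_level, 10)
    return p_value < threshold


for level in (0.95, 0.98):
    verdict = "significant" if is_significant(0.041, level) else "not significant"
    print(f"p-value 0.041 at {level:.0%} confidence: {verdict}")
```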

Introduction

The independent samples t-test is a statistical test that determines whether there is a statistically significant difference between the means of two independent samples. For instance, it can check whether the average value of sedan cars is significantly different from that of SUVs. Here the hypotheses would be set as follows:
Null hypothesis: SUV and sedan car types do not differ significantly in terms of value.
Alternative hypothesis: The values of SUVs and sedans differ significantly.
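For reference, the statistic behind the test in its common pooled-variance (Student's) form is given below, where x̄ᵢ, sᵢ² and nᵢ are the mean, sample variance and size of group i. The slides do not state which variant the tool uses; Welch's variant, which drops the equal-variance assumption, uses √(s₁²/n₁ + s₂²/n₂) as the denominator and an adjusted number of degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

The p-value is then the probability, under the null hypothesis, of a t value at least as far from zero as the one observed, read from a t distribution with df degrees of freedom.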

Example : Input

Let's conduct the independent samples t-test on the following two variables: one is a dimension containing two values (Group, defining the two independent groups) and the other is a measure (Value, the dependent variable).

Group   Value
A       90
A       95
A       80
B       78
B       75
B       70
B       65

Example : Output

Group “A” mean value    79.0
Group “B” mean value    72.0
Mean difference         7.0
P-value                 0.041

At the 95% confidence level (5% chance of error): the p-value of 0.041 is less than 0.05, so there is a statistically significant difference between the means of groups A and B, and the mean of group A is significantly higher than that of group B.
At the 98% confidence level (2% chance of error): the p-value of 0.041 is greater than 0.02, so there is no statistically significant difference between the means of groups A and B.
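The pooled test can be sketched in plain Python without third-party libraries, assuming the pooled-variance (Student's) form, since the slides do not state which variant produced their output. The two-sided p-value comes from the regularized incomplete beta function, computed with the continued-fraction method popularized by Numerical Recipes. Run on the seven rows in the example input table, it does not reproduce the output above: the three A values listed average 88.33 rather than 79.0, so the reported figures were presumably computed on more rows than the table shows.

```python
import math
from statistics import mean, variance
from typing import NamedTuple

TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


class TTestResult(NamedTuple):
    mean_difference: float
    t_statistic: float
    df: int
    p_value: float


def independent_t_test(group_a, group_b):
    """Pooled-variance (Student's) independent samples t-test."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    if pooled_var == 0:
        raise ValueError("both groups are constant, so t is undefined")
    diff = mean(group_a) - mean(group_b)
    t = diff / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return TTestResult(diff, t, df, t_two_sided_p(t, df))


# The seven rows listed in the example input table.
result = independent_t_test([90, 95, 80], [78, 75, 70, 65])
print(result)
```

With SciPy installed, `scipy.stats.ttest_ind(group_a, group_b)` computes the same pooled t statistic and two-sided p-value; passing `equal_var=False` switches it to Welch's test.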

Standard input parameters & sample UI

Sample output 1 : Interpretation

Sample output 2 : Model Summary

Sample output 3 : Outliers

Outliers are data values that differ greatly from the majority of a set of data.
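The slides do not say how the tool flags outliers. One common rule, used here purely as an assumption, is Tukey's fences: a value is an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The helper name `iqr_outliers` and the sample data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data: one value far from the rest.
print(iqr_outliers([10, 12, 11, 13, 12, 100]))
```

Because outliers shift group means and inflate variances, checking for them before running the t-test helps avoid misleading results.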

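Limitations

The test can be applied to only two samples at a time (one dimension with two values and one measure).
Observations within each group must be independent.
The values in each group should be approximately normally distributed.
The number of data points should be at least 30. This is a rule of thumb: with fewer data points, the result depends more heavily on the normality assumption.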
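The first and last limitations lend themselves to an automatic pre-check. The sketch below is hypothetical, not the tool's code: it groups rows by the dimension, rejects input whose dimension does not have exactly two values, and lists any group smaller than 30, applying the rule of thumb per group as an assumption. The rows come from the Use case 1 input table.

```python
from collections import defaultdict


def split_two_groups(rows, min_size=30):
    """Split (dimension_value, measure) rows into exactly two groups.

    Raises ValueError unless the dimension has exactly two distinct values,
    and also returns the labels of groups with fewer than min_size observations.
    """
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(float(value))
    if len(groups) != 2:
        raise ValueError(f"expected 2 distinct dimension values, found {len(groups)}")
    small = [label for label, vals in groups.items() if len(vals) < min_size]
    return dict(groups), small


# Rows from the Use case 1 input table (Gender, Income).
rows = [("Male", 21000), ("Male", 15000), ("Male", 25600), ("Male", 23000),
        ("Female", 19750), ("Female", 25000), ("Female", 21250),
        ("Female", 14400), ("Female", 10000)]
groups, small = split_two_groups(rows)
print({label: len(vals) for label, vals in groups.items()}, "below 30:", small)
```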

General applications

Medicine: Has the quality of life improved for patients who took drug A as opposed to patients who took drug B?
Sociology: Are men more satisfied with their jobs than women? Do they earn more?
Biology: Are foxes in one specific habitat larger than in another?
Economics: Is the economic growth of developing nations larger than that of developed nations?
Marketing: Does customer segment A spend more on groceries than customer segment B?

Use case 1

Use case 1 : Input Dataset

Gender   Income
Male     21000
Male     15000
Male     25600
Male     23000
Female   19750
Female   25000
Female   21250
Female   14400
Female   10000

Use case 1 : Output

“Male” mean income value     19444.44
“Female” mean income value   18080.0
Mean difference              1364.44
P-value                      0.406

A p-value of 0.406 (> 0.05) indicates that, at the 95% confidence level, there is no statistically significant difference between the mean incomes of males and females.

Use case 2

Use case 3

Sampling Methods

There are three common sampling methods:
Simple random sampling: here, selection is based purely on chance, and every item has an equal chance of being selected. A lottery system is an example of simple random sampling.
Stratified sampling: here, the population data is divided into subgroups known as strata. The members of each subgroup share similar attributes and characteristics, such as demographics, income or location. A random sample is taken from each subgroup in proportion to the subgroup's size relative to the population size, and these subsamples are then combined to form the final stratified random sample. This method achieves higher statistical precision because variability within each subgroup is low, and it requires a smaller sample size than simple random sampling to reach the same precision.
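Proportional stratified sampling, as described above, can be sketched as follows. The strata, their sizes and the function name `stratified_sample` are hypothetical; within each stratum the draw is a simple random sample, and each stratum's share of the sample matches its share of the population.

```python
import random


def stratified_sample(strata, total_n, seed=None):
    """Proportional stratified random sampling.

    strata maps a stratum name to its list of members. Each stratum gets
    round(total_n * stratum_size / population_size) members, drawn by simple
    random sampling within the stratum. Because of rounding, the final sample
    size can differ slightly from total_n.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = min(len(members), round(total_n * len(members) / population_size))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata: customers grouped by region (illustration only).
strata = {
    "North": [f"N{i}" for i in range(600)],
    "South": [f"S{i}" for i in range(300)],
    "West": [f"W{i}" for i in range(100)],
}
picked = stratified_sample(strata, total_n=100, seed=1)
print({name: len(members) for name, members in picked.items()})
```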

Government policymakers generally use stratified random sampling to come up with better-targeted solutions.
Systematic sampling: here, the researcher first decides the sample size and then the sampling interval, i.e. the standard distance between sampled elements, obtained by dividing the total population size by the sample size. For instance, say you want to create a systematic random sample of 1,000 people from a population of 10,000, so the interval is 10. Using a list of the total population, number each person from 1 to 10,000. Then randomly choose a starting number between 1 and 10, say 4. The person numbered 4 is your first selection, and every tenth person from then on is included in your sample, so the sample consists of persons numbered 4, 14, 24, 34, 44, 54 and so on, until you reach the person numbered 9,994.
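The systematic sampling example above translates directly into code. The function name `systematic_sample` is hypothetical; the interval is the population size divided by the sample size (10,000 / 1,000 = 10), and a start of 4 reproduces the selection described in the text.

```python
import random


def systematic_sample(population, sample_size, start=None, seed=None):
    """Systematic sampling: pick every k-th member after a random start.

    k = len(population) // sample_size is the sampling interval. `start` is the
    1-based position of the first selection; if omitted, it is drawn at random
    from 1..k.
    """
    k = len(population) // sample_size
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the interval k")
    return population[start - 1::k][:sample_size]


people = list(range(1, 10_001))  # persons numbered 1 to 10,000
sample = systematic_sample(people, 1_000, start=4)
print(sample[:4], "...", sample[-1], f"({len(sample)} people)")
```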

Want to Learn More? Get in touch with us @ [email protected] and do check out the Learning section on Smarten.com.

June 2018