Two-sample t-test (for independent samples)
This test lets us check whether there is a significant difference between the means of two groups. I'll provide some sample data and explain the steps along the way.
We will use the following Python libraries:
pandas (for data handling)
scipy.stats (for statistical tests)
seaborn and matplotlib (for visualizations)
Hypothesis:
Let’s assume we have two groups of students from different sections of a class, and we want to check if there is a significant difference in their test scores.
Null Hypothesis (H₀): The mean scores of both sections are equal.
Alternative Hypothesis (H₁): The mean scores of both sections are not equal.
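The hypotheses above can be tested with a minimal sketch like the one below. The scores are invented purely for illustration, and the column names `section` and `score` and the 0.05 significance level are assumptions rather than values from the original notebook.

```python
import pandas as pd
from scipy import stats

# Hypothetical test scores for two sections (made up for illustration).
scores = pd.DataFrame({
    "section": ["A"] * 8 + ["B"] * 8,
    "score": [78, 85, 92, 70, 88, 76, 95, 81,
              72, 80, 68, 75, 83, 71, 77, 69],
})

section_a = scores.loc[scores["section"] == "A", "score"]
section_b = scores.loc[scores["section"] == "B", "score"]

# Independent two-sample t-test (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind(section_a, section_b)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"mean A = {section_a.mean():.2f}, mean B = {section_b.mean():.2f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

A p-value below the chosen significance level would lead us to reject H₀ and conclude that the section means differ; otherwise the data do not provide enough evidence of a difference.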
The remaining examples cover:
Paired t-test for dependent samples.
ANOVA for comparing more than two groups.
Chi-Square Test for categorical data.
Paired t-test (for dependent samples)
This is useful when you have two related groups (e.g., before and after measurements from the same subjects).
Hypothesis:
Null Hypothesis (H₀): The mean difference between the paired samples is 0.
Alternative Hypothesis (H₁): The mean difference between the paired samples is not 0.
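As a rough sketch of the paired case, the snippet below compares hypothetical before and after scores for the same eight subjects. The numbers are invented for illustration, and the variable names and the 0.05 threshold are assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same 8 subjects (made up for illustration).
before = np.array([65, 70, 58, 80, 72, 68, 75, 62])
after = np.array([70, 74, 63, 82, 78, 71, 80, 66])

# Paired t-test: tests whether the mean of (after - before) is 0.
t_stat, p_value = stats.ttest_rel(after, before)

# Equivalent view: a one-sample t-test on the per-subject differences.
differences = after - before
one_sample = stats.ttest_1samp(differences, 0)

alpha = 0.05  # assumed significance level
print(f"mean difference = {differences.mean():.2f}")
print(f"t = {t_stat:.3f}, p = {p_value:.6f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The order of the two arguments only changes the sign of the t statistic; the two-sided p-value is the same either way.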
ANOVA (Analysis of Variance) (for comparing more than two groups)
ANOVA is useful when you want to compare the means of three or more independent groups.
Hypothesis:
Null Hypothesis (H₀): The means of all groups are equal.
Alternative Hypothesis (H₁): At least one group mean is different from the others.
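A one-way ANOVA on three groups might look like the sketch below. The class scores are invented for illustration, and the group names and 0.05 threshold are assumptions.

```python
from scipy import stats

# Hypothetical test scores from three classes (made up for illustration).
class_a = [85, 90, 78, 92, 88]
class_b = [75, 80, 72, 78, 74]
class_c = [82, 85, 80, 88, 84]

# One-way ANOVA: H0 is that all group means are equal.
f_stat, p_value = stats.f_oneway(class_a, class_b, class_c)

alpha = 0.05  # assumed significance level
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Rejecting H₀ here only indicates that at least one mean differs; identifying which groups differ would need a follow-up (post hoc) comparison.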
Chi-Square Test (for categorical data)
This test is used to examine the relationship between two categorical variables in a contingency table.
Hypothesis:
Null Hypothesis (H₀): The two categorical variables are independent.
Alternative Hypothesis (H₁): The two categorical variables are not independent.
Example: We want to test whether there is a significant relationship between gender (male/female) and preference for a product (yes/no).
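The gender and product preference example could be sketched as follows. The counts in the contingency table are invented for illustration, and the 0.05 threshold is an assumption.

```python
import pandas as pd
from scipy import stats

# Hypothetical 2x2 contingency table of counts (made up for illustration).
observed = pd.DataFrame(
    {"Yes": [30, 25], "No": [20, 25]},
    index=["Male", "Female"],
)

# Chi-square test of independence.
# Note: for 2x2 tables SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

alpha = 0.05  # assumed significance level
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")
print("Expected counts under independence:")
print(expected)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The `expected` array holds the counts we would see if gender and preference were independent, which is a useful sanity check alongside the p-value.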
Summary of Hypothesis Tests and Use Cases
Test Type | Use Case | Python Function |
---|---|---|
Two-Sample t-test | Compare means of two independent groups (e.g., test scores of two sections). | stats.ttest_ind() |
Paired t-test | Compare means of two related groups (e.g., before and after measurements from the same individuals). | stats.ttest_rel() |
ANOVA | Compare means of three or more independent groups (e.g., test scores from multiple classes). | stats.f_oneway() |
Chi-Square Test | Test the association or independence between categorical variables (e.g., gender vs product preference). | stats.chi2_contingency() |
Test Descriptions:
Two-Sample t-test:
Used when comparing the means of two independent groups.
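Assumes the data in each group are approximately normally distributed and that the two groups have equal variances. If the variances differ, pass equal_var=False to stats.ttest_ind() to run Welch's t-test instead.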
Example: Testing whether the mean scores of students from two different sections are significantly different.
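One way to handle the equal-variance assumption is sketched below: check it with Levene's test, then run Welch's t-test, which does not assume equal variances. The data are invented for illustration, and using Levene's test as the check is a choice made here, not something the original notebook specifies.

```python
from scipy import stats

# Hypothetical scores: group_a is tightly clustered, group_b is widely spread.
group_a = [82, 85, 88, 79, 91, 84, 86, 83]
group_b = [60, 95, 72, 88, 55, 99, 68, 90, 77, 64]

# Levene's test: H0 is that the groups have equal variances.
levene_stat, levene_p = stats.levene(group_a, group_b)

# Welch's t-test: same function, but without the equal-variance assumption.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.4f}")
print(f"Welch:  t = {t_welch:.3f}, p = {p_welch:.4f}")
```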
Paired t-test:
Used when comparing means from the same group at different times or under different conditions.
Example: Testing if there is a significant improvement in test scores before and after a training program.
ANOVA (Analysis of Variance):
Used when comparing the means of three or more independent groups.
Example: Comparing test scores from students in three different classes to see if there is a significant difference between the classes.
Chi-Square Test:
Used to examine the relationship between two categorical variables.
Example: Testing if gender is associated with product preference (Yes/No) in a sample population.