Two-sample t-test

The two-sample t-test checks whether the means of two independent groups differ significantly. I'll provide some sample data and explain the steps along the way.

We will use the following Python libraries:

- numpy (for generating sample data)
- pandas (for data handling)
- scipy.stats (for statistical tests)
- seaborn and matplotlib (for visualizations)

Hypothesis:

Let’s assume we have two groups of students from different sections of a class, and we want to check if there is a significant difference in their test scores.

Null Hypothesis (H₀): The mean scores of both sections are equal.
Alternative Hypothesis (H₁): The mean scores of both sections are not equal.

In [ ]:

# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Create sample data
np.random.seed(42)  # For reproducibility

# Scores for Section A (Normally distributed with mean=75, std=10)
section_a_scores = np.random.normal(75, 10, 30)

# Scores for Section B (Normally distributed with mean=80, std=12)
section_b_scores = np.random.normal(80, 12, 30)

# Creating a DataFrame for easy handling
data = pd.DataFrame({
    'Scores': np.concatenate([section_a_scores, section_b_scores]),
    'Section': ['A']*30 + ['B']*30
})

# Visualizing the data
plt.figure(figsize=(8, 5))
sns.boxplot(x='Section', y='Scores', data=data)
plt.title('Test Scores by Section')
plt.show()

# Perform a two-sample t-test
# (ttest_ind assumes equal variances by default; pass equal_var=False for Welch's t-test)
t_stat, p_value = stats.ttest_ind(section_a_scores, section_b_scores)

# Output the result
print(f"T-Statistic: {t_stat:.3f}")
print(f"P-Value: {p_value:.3f}")

# Set significance level
alpha = 0.05

# Hypothesis testing conclusion
if p_value < alpha:
    print("We reject the null hypothesis (H₀). There is a significant difference between the two groups.")
else:
    print("We fail to reject the null hypothesis (H₀). There is no significant difference between the two groups.")
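
The sample data above is drawn with different standard deviations (10 and 12), while `stats.ttest_ind()` pools the variances by default. As a minimal sketch, the same comparison can be run with Welch's t-test, which drops the equal-variance assumption. The data is regenerated with the same seed so the block runs on its own; the variable names `t_student`, `t_welch`, and so on are illustrative.

```python
import numpy as np
from scipy import stats

# Recreate the sample data from the cell above (same seed, same order of draws)
np.random.seed(42)
section_a_scores = np.random.normal(75, 10, 30)
section_b_scores = np.random.normal(80, 12, 30)

# Student's t-test (the default): pools the two sample variances
t_student, p_student = stats.ttest_ind(section_a_scores, section_b_scores)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(
    section_a_scores, section_b_scores, equal_var=False
)

print(f"Student: t = {t_student:.3f}, p = {p_student:.3f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.3f}")
```

With equal group sizes, as here, both versions give the same t-statistic; only the degrees of freedom, and therefore the p-value, differ.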

The sections below cover three more tests:

- Paired t-test for dependent samples.
- ANOVA for comparing more than two groups.
- Chi-Square Test for categorical data.

Paired t-test (for dependent samples)

This is useful when you have two related groups (e.g., before and after measurements from the same subjects).

Hypothesis:

Null Hypothesis (H₀): The mean difference between the paired samples is 0.
Alternative Hypothesis (H₁): The mean difference between the paired samples is not 0.

In [ ]:

# Simulating paired data (e.g., before and after treatment scores)
np.random.seed(42)

# Scores before treatment (mean=70, std=8)
before_treatment = np.random.normal(70, 8, 30)

# Scores after treatment: each subject's score changes by a draw from N(mean=5, std=5)
after_treatment = before_treatment + np.random.normal(5, 5, 30)  # Average improvement of about 5 points

# Paired t-test (with this argument order, a negative t-statistic means scores rose after treatment)
t_stat, p_value = stats.ttest_rel(before_treatment, after_treatment)

# Output the result
print(f"Paired T-Statistic: {t_stat:.3f}")
print(f"Paired P-Value: {p_value:.3f}")

# Set significance level
alpha = 0.05

# Hypothesis testing conclusion
if p_value < alpha:
    print("We reject the null hypothesis (H₀). The treatment had a significant effect.")
else:
    print("We fail to reject the null hypothesis (H₀). The treatment did not have a significant effect.")
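
The null hypothesis for the paired test is stated in terms of the mean difference, so the paired t-test is equivalent to a one-sample t-test on the per-subject differences. The sketch below checks this on the same simulated data; `differences` and `t_manual` are names introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Recreate the paired data from the cell above
np.random.seed(42)
before_treatment = np.random.normal(70, 8, 30)
after_treatment = before_treatment + np.random.normal(5, 5, 30)

# Per-subject differences (before minus after, matching the argument order above)
differences = before_treatment - after_treatment

# Paired t-test and the equivalent one-sample test against a mean of 0
t_paired, p_paired = stats.ttest_rel(before_treatment, after_treatment)
t_one, p_one = stats.ttest_1samp(differences, 0.0)

# By hand: t = mean(d) / (sample std of d / sqrt(n))
n = len(differences)
t_manual = differences.mean() / (differences.std(ddof=1) / np.sqrt(n))

print(f"ttest_rel:   t = {t_paired:.3f}, p = {p_paired:.3g}")
print(f"ttest_1samp: t = {t_one:.3f}, p = {p_one:.3g}")
print(f"By hand:     t = {t_manual:.3f}")
```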

ANOVA (Analysis of Variance) (for comparing more than two groups)

ANOVA is useful when you want to compare the means of three or more independent groups.

Hypothesis:

Null Hypothesis (H₀): The means of all groups are equal.
Alternative Hypothesis (H₁): At least one group mean is different from the others.

In [ ]:

# Simulating data for 3 groups (e.g., test scores from 3 different classes)
np.random.seed(42)

# Scores for Class A, B, and C (Normally distributed with different means and the same std=10)
class_a_scores = np.random.normal(70, 10, 30)
class_b_scores = np.random.normal(75, 10, 30)
class_c_scores = np.random.normal(80, 10, 30)

# Combine the data into a DataFrame
data_anova = pd.DataFrame({
    'Scores': np.concatenate([class_a_scores, class_b_scores, class_c_scores]),
    'Class': ['A']*30 + ['B']*30 + ['C']*30
})

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(class_a_scores, class_b_scores, class_c_scores)

# Output the result
print(f"F-Statistic: {f_stat:.3f}")
print(f"ANOVA P-Value: {p_value:.3f}")

# Set significance level
alpha = 0.05

# Hypothesis testing conclusion
if p_value < alpha:
    print("We reject the null hypothesis (H₀). At least one class has a significantly different mean score.")
else:
    print("We fail to reject the null hypothesis (H₀). There is not enough evidence that the class means differ.")
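
To make the F-statistic less of a black box, here is a minimal sketch that computes it by hand as the ratio of between-group to within-group mean squares and compares the result with `stats.f_oneway()`. The data is regenerated with the same seed; names such as `ss_between` and `ms_within` are illustrative.

```python
import numpy as np
from scipy import stats

# Recreate the three groups from the cell above
np.random.seed(42)
groups = [
    np.random.normal(70, 10, 30),
    np.random.normal(75, 10, 30),
    np.random.normal(80, 10, 30),
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

# p-value from the F distribution with (k - 1, n_total - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"By hand:  F = {f_manual:.3f}, p = {p_manual:.3g}")
print(f"f_oneway: F = {f_scipy:.3f}, p = {p_scipy:.3g}")
```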

Chi-Square Test (for categorical data)

This test is used to examine the relationship between two categorical variables in a contingency table.

Hypothesis:

Null Hypothesis (H₀): The two categorical variables are independent.
Alternative Hypothesis (H₁): The two categorical variables are not independent.

Example: We want to test whether there is a significant relationship between gender (male/female) and preference for a product (yes/no).

In [ ]:

# Counts in long format (Gender vs Product Preference), one row per cell of the 2x2 table
data_chi2 = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Preference': ['Yes', 'No', 'Yes', 'No'],
    'Count': [20, 15, 30, 10]
})

# Reshape data into a contingency table
contingency_table = data_chi2.pivot(index='Gender', columns='Preference', values='Count')

# Perform Chi-Square Test
# (for a 2x2 table, chi2_contingency applies Yates' continuity correction by default)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(contingency_table)

# Output the result
print(f"Chi-Square Statistic: {chi2_stat:.3f}")
print(f"Chi-Square P-Value: {p_value:.3f}")

# Set significance level
alpha = 0.05

# Hypothesis testing conclusion
if p_value < alpha:
    print("We reject the null hypothesis (H₀). There is a significant relationship between gender and product preference.")
else:
    print("We fail to reject the null hypothesis (H₀). There is no significant relationship between gender and product preference.")
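
The `expected` array returned above holds the counts we would see if gender and preference were independent. As a rough sketch, these can be reproduced by hand from the row and column totals, along with the uncorrected chi-square statistic. The `observed` array is typed in directly, with rows Female, Male and columns No, Yes (the sorted order that `pivot()` produces); the other names are illustrative.

```python
import numpy as np
from scipy import stats

# Rows: Female, Male; columns: No, Yes
observed = np.array([[10, 30],
                     [15, 20]])

# Expected count for each cell = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic (no continuity correction)
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# SciPy without and with Yates' continuity correction
chi2_plain, p_plain, dof, expected = stats.chi2_contingency(observed, correction=False)
chi2_yates, p_yates, _, _ = stats.chi2_contingency(observed)

print("Expected counts:\n", expected_manual.round(2))
print(f"By hand:           chi2 = {chi2_manual:.3f}")
print(f"SciPy, no Yates:   chi2 = {chi2_plain:.3f}, p = {p_plain:.3f}")
print(f"SciPy, with Yates: chi2 = {chi2_yates:.3f}, p = {p_yates:.3f}")
```

The corrected statistic is smaller than the uncorrected one, so the default result for a 2x2 table is the more conservative of the two.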

Summary of Hypothesis Tests and Use Cases

| Test Type | Use Case | Python Function |
| --- | --- | --- |
| Two-Sample t-test | Compare means of two independent groups (e.g., test scores of two sections). | `stats.ttest_ind()` |
| Paired t-test | Compare means of two related groups (e.g., before and after measurements from the same individuals). | `stats.ttest_rel()` |
| ANOVA | Compare means of three or more independent groups (e.g., test scores from multiple classes). | `stats.f_oneway()` |
| Chi-Square Test | Test the association or independence between categorical variables (e.g., gender vs product preference). | `stats.chi2_contingency()` |

Test Descriptions:

Two-Sample t-test:
- Used when comparing the means of two independent groups.
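- Assumes each group is approximately normally distributed. The default `stats.ttest_ind()` also assumes equal variances; pass `equal_var=False` for Welch's t-test when they differ.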
- Example: Testing whether the mean scores of students from two different sections are significantly different.
Paired t-test:
- Used when comparing means from the same group at different times or under different conditions.
- Example: Testing if there is a significant improvement in test scores before and after a training program.
ANOVA (Analysis of Variance):
- Used when comparing the means of three or more independent groups.
- Example: Comparing test scores from students in three different classes to see if there is a significant difference between the classes.
Chi-Square Test:
- Used to examine the relationship between two categorical variables.
- Example: Testing if gender is associated with product preference (Yes/No) in a sample population.
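
The normality and equal-variance assumptions listed for the two-sample t-test can be checked before running it. The sketch below is one possible approach, using the Shapiro-Wilk test (`stats.shapiro()`) for normality and Levene's test (`stats.levene()`) for equal variances on the section scores simulated earlier. The choice of these two checks is an assumption of this example, not something the tests above require.

```python
import numpy as np
from scipy import stats

# Recreate the two-section sample data (same seed as the first example)
np.random.seed(42)
section_a_scores = np.random.normal(75, 10, 30)
section_b_scores = np.random.normal(80, 12, 30)

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
normality_p_values = {}
for name, scores in [("A", section_a_scores), ("B", section_b_scores)]:
    w_stat, p_norm = stats.shapiro(scores)
    normality_p_values[name] = p_norm
    print(f"Section {name}: Shapiro-Wilk W = {w_stat:.3f}, p = {p_norm:.3f}")

# Levene: H0 = the groups have equal variances
lev_stat, p_lev = stats.levene(section_a_scores, section_b_scores)
print(f"Levene statistic = {lev_stat:.3f}, p = {p_lev:.3f}")
```

A small p-value from either check suggests the corresponding assumption may not hold; for unequal variances, Welch's t-test (`equal_var=False`) is the usual fallback.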
