Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/april_18/lessons/lesson-04/A-B Testing.ipynb
1904 views
Kernel: Python 2

A/B Testing Hypothesis tests

Scenario

You are a data scientist working for a web-based company and you are planning to roll out a new site design soon. For random samples of users one of two competing designs were presented and the ultimate purchase total was recorded (if any).

Your task is to determine which of the two designs yields higher total purchases, and if the result is statistically significant.

from matplotlib import pyplot as plt import numpy as np import pandas as pd from scipy import stats import seaborn as sns %matplotlib inline
# Generate some data and randomize # Some people bought nothing, the others bought with some distribution data1 = [0] * 50 data1.extend(stats.norm(12, 4).rvs(150)) np.random.shuffle(data1) # The second design hooked less people, but those that were hooked bought more stuff data2 = [0] * 100 data2.extend(stats.norm(20, 5).rvs(100)) np.random.shuffle(data2) # Make a data frame df = pd.DataFrame() df["A"] = data1 df["B"] = data2 df.head()
# Let's plot the data first plt.hist(df["A"], bins=50, label="A") plt.ylabel("A counts") plt.xlabel("Total Purchase") # Exercise: Make the same plot for dataset "B"
# Seaborn is also useful sns.violinplot(data=df) plt.ylabel("Total Purchase") sns.plt.show()

Exercises

  • In the next cell, use pandas to make box plots from the data frame. Feel free to Google around if you can't remember how to do this.

Do you like the boxplots or violin plots better?

  • Are our datasets normal, or approximately so? Use what we learned in the previous lesson to decide. Hint: You need to compute a statistical value for each of the two data sets.

# Work out the exercises here

Statistical Tests

There are a few good statistical tests for A/B testing:

Each test makes various assumptions:

  • ANOVA assumes normal distributions and equal variances in the two data sets

  • The Welch t-test assumes normal distributions but not necessarily equal variances, and accounts for small sample sizes better

  • The Mann-Whitney test assumes nothing about the distributions but requires at least 20 data points in each set, and produces a weaker p-value

Typically you need to choose the most appropriate test. Tests that make more assumptions are more discriminating (stronger p-values) but can be misleading on data sets that don't satisfy the assumptions.

Which test is most appropriate for our data?

All of these tests are available in the scipy library, a stats library for python:

# For example, to use ANOVA statistic, pvalue = stats.f_oneway(df["A"], df["B"]) print pvalue
# Use the Mann-Whitney test on our data # look up the function in scipy from the link above

Interpret your results

  • Is there a significant difference in the mean total purchases in the two designs?

  • Which design do you recommend and why? Write two sentences explaining your results and your recommendation.

Your explanation here.

Final Exercise

Your results may or may not have been significant depending on the randomness of the data set. Try running the notebook a few more times to see if anything changes. You can do this in one step by using the Kernal menu and selecting "Restart & Run All".

If you can't get the opposite outcome, try altering the distributions of the data sets (change the variance in the second argument) until you get the other outcome.