Lab Question: Association Rule Mining (ARM) with Support, Confidence, and Lift
Objective: Calculate the Support, Confidence, and Lift for given transaction data sets to identify significant association rules.
Instructions:
Use the provided transaction data sets.
For each data set, identify all frequent itemsets.
Calculate the Support, Confidence, and Lift for the generated association rules.
Analyze the results to identify strong association rules.
Data Sets
Data Set 1: Supermarket Transactions
Transaction ID | Items Purchased |
---|---|
1 | Milk, Bread, Butter |
2 | Bread, Butter |
3 | Milk, Bread |
4 | Milk, Butter |
5 | Bread, Butter, Cheese |
Data Set 2: Online Retail Transactions
Transaction ID | Items Purchased |
---|---|
1 | Laptop, Mouse, Keyboard |
2 | Laptop, Mouse |
3 | Mouse, Keyboard |
4 | Laptop, Keyboard |
5 | Laptop, Mouse, Keyboard, Webcam |
Data Set 3: Movie Rentals
Transaction ID | Movies Rented |
---|---|
1 | Action, Comedy, Drama |
2 | Action, Drama |
3 | Comedy, Drama |
4 | Action, Comedy |
5 | Comedy, Drama, Thriller |
Data Set 4: E-commerce Purchases
Transaction ID | Items Purchased |
---|---|
1 | Smartphone, Headphones, Charger |
2 | Smartphone, Charger |
3 | Smartphone, Headphones |
4 | Headphones, Charger |
5 | Smartphone, Headphones, Charger, Case |
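Before starting the tasks, it can help to put the four tables above into a form a script can work with. The following is a minimal sketch, not part of the original lab: the names `DATASETS` and `unique_items` and the dictionary keys are illustrative assumptions.

```python
# Each transaction is stored as a frozenset so subset tests are simple.
# DATASETS, its keys, and unique_items are hypothetical names chosen for this sketch.
DATASETS = {
    "supermarket": [
        frozenset({"Milk", "Bread", "Butter"}),
        frozenset({"Bread", "Butter"}),
        frozenset({"Milk", "Bread"}),
        frozenset({"Milk", "Butter"}),
        frozenset({"Bread", "Butter", "Cheese"}),
    ],
    "online_retail": [
        frozenset({"Laptop", "Mouse", "Keyboard"}),
        frozenset({"Laptop", "Mouse"}),
        frozenset({"Mouse", "Keyboard"}),
        frozenset({"Laptop", "Keyboard"}),
        frozenset({"Laptop", "Mouse", "Keyboard", "Webcam"}),
    ],
    "movie_rentals": [
        frozenset({"Action", "Comedy", "Drama"}),
        frozenset({"Action", "Drama"}),
        frozenset({"Comedy", "Drama"}),
        frozenset({"Action", "Comedy"}),
        frozenset({"Comedy", "Drama", "Thriller"}),
    ],
    "ecommerce": [
        frozenset({"Smartphone", "Headphones", "Charger"}),
        frozenset({"Smartphone", "Charger"}),
        frozenset({"Smartphone", "Headphones"}),
        frozenset({"Headphones", "Charger"}),
        frozenset({"Smartphone", "Headphones", "Charger", "Case"}),
    ],
}


def unique_items(transactions):
    """Return the sorted list of distinct items across all transactions."""
    return sorted(set().union(*transactions))


for name, transactions in DATASETS.items():
    print(f"{name}: {len(transactions)} transactions, items = {unique_items(transactions)}")
```

Storing each transaction as a set means the check "does this transaction contain itemset X" becomes a plain subset test, which is all that Support needs.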
Tasks
Task 1: Identify Frequent Itemsets
For each data set, list all unique items.
Calculate the support for each item and each combination of items (itemsets).
Identify frequent itemsets with a support threshold of 50%.
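The steps of Task 1 can be sketched as a brute-force enumeration, which is practical for data sets this small. This is an illustrative sketch rather than a prescribed solution: the function names are assumptions, and it assumes that "a support threshold of 50%" means support greater than or equal to 0.5. It uses Data Set 1 as input.

```python
from itertools import combinations

# Data Set 1 (Supermarket Transactions), one set per transaction.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter", "Cheese"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support=0.5):
    """Test every combination of the unique items (brute force, not Apriori)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:  # assumption: the threshold is inclusive
                result[frozenset(combo)] = s
    return result


frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Enumerating every combination grows exponentially with the number of items, so for larger data an algorithm such as Apriori, which prunes supersets of infrequent itemsets, would be the usual choice.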
Task 2: Generate Association Rules
Generate association rules from the frequent itemsets identified in Task 1.
For each rule, calculate the Confidence.
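One way to approach Task 2 is to split every frequent itemset with two or more items into an antecedent and a consequent in all possible ways. The sketch below assumes the frequent itemsets are already known and passed in as a list. The helper names `count` and `rules_with_confidence` are hypothetical, and the input shown is the one multi-item itemset of Data Set 1 that meets the 50% threshold.

```python
from itertools import combinations

# Data Set 1 (Supermarket Transactions).
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter", "Cheese"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


def rules_with_confidence(frequent_itemsets, transactions):
    """Map (antecedent, consequent) to Confidence for every split of each itemset."""
    rules = {}
    for itemset in frequent_itemsets:
        itemset = frozenset(itemset)
        for k in range(1, len(itemset)):  # single-item itemsets yield no rules
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                # Support(A ∪ B) / Support(A); the transaction total cancels out.
                rules[(antecedent, consequent)] = (
                    count(itemset, transactions) / count(antecedent, transactions)
                )
    return rules


frequent_multi_item = [{"Bread", "Butter"}]  # assumed input from Task 1
for (a, b), conf in rules_with_confidence(frequent_multi_item, transactions).items():
    print(f"{sorted(a)} -> {sorted(b)}: confidence = {conf:.2f}")
```

Working with raw counts instead of support fractions gives the same ratio and avoids small floating-point rounding differences.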
Task 3: Calculate Lift
Calculate the Lift for each generated rule.
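Interpret the Lift values to determine the strength of the association: Lift > 1 indicates that the items occur together more often than expected under independence, Lift = 1 indicates independence, and Lift < 1 indicates that they occur together less often than expected.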
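Task 3 can be sketched with exact fractions so that rounding does not blur whether a Lift value sits above or below 1. The function names and the wording of the labels returned by `interpret` are assumptions made for this sketch. The rule Milk → Bread from Data Set 1 is used as input.

```python
from fractions import Fraction

# Data Set 1 (Supermarket Transactions).
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter", "Cheese"},
]


def support(itemset, transactions):
    """Exact fraction of transactions that contain every item in `itemset`."""
    return Fraction(sum(set(itemset) <= t for t in transactions), len(transactions))


def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) / Support(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


def interpret(value):
    """Label a Lift value relative to 1 (labels are illustrative)."""
    if value > 1:
        return "positive association"
    if value < 1:
        return "negative association"
    return "independent"


value = lift({"Milk"}, {"Bread"}, transactions)
print(f"Lift(Milk -> Bread) = {value} ≈ {float(value):.2f} ({interpret(value)})")
```

Note that these helpers raise `ZeroDivisionError` if the antecedent or consequent never appears in the data, which cannot happen for rules built from frequent itemsets.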
Task 4: Analysis
Discuss the rules with the highest Confidence and Lift values.
Identify any interesting patterns or relationships in the data.
Formulas
Support: $\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}$
Confidence: $\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$
Lift: $\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}$
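As a short derivation from the formulas above, substituting the Confidence definition into the Lift formula gives a symmetric form. Here $A \cup B$ denotes the itemset containing every item of both $A$ and $B$, so $\text{Support}(A \cup B)$ is the fraction of transactions that contain all of those items together.

$$
\text{Lift}(A \rightarrow B)
= \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}
= \frac{\text{Support}(A \cup B)}{\text{Support}(A)\,\text{Support}(B)}
= \text{Lift}(B \rightarrow A)
$$

Two consequences follow: Lift does not depend on the direction of the rule, whereas Confidence generally does, and Lift equals 1 exactly when $\text{Support}(A \cup B) = \text{Support}(A)\,\text{Support}(B)$, which is the case of $A$ and $B$ occurring independently.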
Example Calculation (Data Set 1)
Frequent Itemsets:
Support(Milk) = 3/5 = 0.6
Support(Bread) = 4/5 = 0.8
Support(Butter) = 4/5 = 0.8
Support(Bread ∪ Butter) = 3/5 = 0.6
Association Rules and Confidence:
Confidence(Milk → Bread) = Support(Milk ∪ Bread) / Support(Milk) = (2/5) / (3/5) = 2/3 ≈ 0.67
Confidence(Bread → Butter) = Support(Bread ∪ Butter) / Support(Bread) = (3/5) / (4/5) = 3/4 = 0.75
Note that Support(Milk ∪ Bread) = 2/5 = 0.4 is below the 50% threshold, so Milk → Bread is included only to illustrate the arithmetic.
Lift:
Lift(Milk → Bread) = Confidence(Milk → Bread) / Support(Bread) = (2/3) / 0.8 ≈ 0.83
Lift(Bread → Butter) = Confidence(Bread → Butter) / Support(Butter) = 0.75 / 0.8 ≈ 0.94
Submission
Submit a report containing:
Detailed calculations for Support, Confidence, and Lift for each data set.
Tables listing frequent itemsets, generated rules, and corresponding Confidence and Lift values.
Analysis and interpretation of results.
This lab question will help you understand the practical application of Association Rule Mining and how to identify significant relationships within transaction data.