GitHub Repository: suyashi29/python-su
Path: blob/master/Machine Learning Unsupervised Methods/Day 1 Lab to understand ARM terms.ipynb

Lab Question: Association Rule Mining (ARM) with Support, Confidence, and Lift

Objective: Calculate the Support, Confidence, and Lift for the given transaction data sets to identify significant association rules.

Instructions:

  1. Use the provided transaction data sets.

  2. For each data set, identify all frequent itemsets.

  3. Calculate the Support, Confidence, and Lift for the generated association rules.

  4. Analyze the results to identify strong association rules.


Data Sets

Data Set 1: Supermarket Transactions

| Transaction ID | Items Purchased |
| --- | --- |
| 1 | Milk, Bread, Butter |
| 2 | Bread, Butter |
| 3 | Milk, Bread |
| 4 | Milk, Butter |
| 5 | Bread, Butter, Cheese |

Data Set 2: Online Retail Transactions

| Transaction ID | Items Purchased |
| --- | --- |
| 1 | Laptop, Mouse, Keyboard |
| 2 | Laptop, Mouse |
| 3 | Mouse, Keyboard |
| 4 | Laptop, Keyboard |
| 5 | Laptop, Mouse, Keyboard, Webcam |

Data Set 3: Movie Rentals

| Transaction ID | Movies Rented |
| --- | --- |
| 1 | Action, Comedy, Drama |
| 2 | Action, Drama |
| 3 | Comedy, Drama |
| 4 | Action, Comedy |
| 5 | Comedy, Drama, Thriller |

Data Set 4: E-commerce Purchases

| Transaction ID | Items Purchased |
| --- | --- |
| 1 | Smartphone, Headphones, Charger |
| 2 | Smartphone, Charger |
| 3 | Smartphone, Headphones |
| 4 | Headphones, Charger |
| 5 | Smartphone, Headphones, Charger, Case |
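
One possible way to bring the four data sets into Python is to store each transaction as a set of item names. The sketch below is only an illustration: the dictionary keys and the helper name `unique_items` are assumptions made for this example, not part of the lab.

```python
# Each transaction is a set of item names.
# The dictionary keys are labels chosen for this sketch only.
datasets = {
    "supermarket": [
        {"Milk", "Bread", "Butter"},
        {"Bread", "Butter"},
        {"Milk", "Bread"},
        {"Milk", "Butter"},
        {"Bread", "Butter", "Cheese"},
    ],
    "online_retail": [
        {"Laptop", "Mouse", "Keyboard"},
        {"Laptop", "Mouse"},
        {"Mouse", "Keyboard"},
        {"Laptop", "Keyboard"},
        {"Laptop", "Mouse", "Keyboard", "Webcam"},
    ],
    "movie_rentals": [
        {"Action", "Comedy", "Drama"},
        {"Action", "Drama"},
        {"Comedy", "Drama"},
        {"Action", "Comedy"},
        {"Comedy", "Drama", "Thriller"},
    ],
    "ecommerce": [
        {"Smartphone", "Headphones", "Charger"},
        {"Smartphone", "Charger"},
        {"Smartphone", "Headphones"},
        {"Headphones", "Charger"},
        {"Smartphone", "Headphones", "Charger", "Case"},
    ],
}


def unique_items(transactions):
    """Return the sorted list of distinct items across all transactions."""
    return sorted(set().union(*transactions))


for name, transactions in datasets.items():
    print(f"{name}: {len(transactions)} transactions, items = {unique_items(transactions)}")
```

Using sets makes the later "does this transaction contain the itemset?" check a simple subset test.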

Tasks

Task 1: Identify Frequent Itemsets

  1. For each data set, list all unique items.

  2. Calculate the support for each item and each combination of items (itemsets).

  3. Identify frequent itemsets with a support threshold of 50%.
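
The three steps of Task 1 can be sketched as a brute-force search, shown here on Data Set 2. This is a minimal illustration rather than a full Apriori implementation; the function names `support` and `frequent_itemsets` are assumptions made for this example.

```python
from itertools import combinations

# Data Set 2: Online Retail Transactions
transactions = [
    {"Laptop", "Mouse", "Keyboard"},
    {"Laptop", "Mouse"},
    {"Mouse", "Keyboard"},
    {"Laptop", "Keyboard"},
    {"Laptop", "Mouse", "Keyboard", "Webcam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support=0.5):
    """Enumerate itemsets by size and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return result


freq = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{sorted(itemset)}: support = {s:.2f}")
```

Enumerating every combination is fine for five transactions and four items. For larger data, a level-wise algorithm such as Apriori prunes candidates much earlier.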

Task 2: Generate Association Rules

  1. Generate association rules from the frequent itemsets identified in Task 1.

  2. For each rule, calculate the Confidence.
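
As a rough sketch of Task 2, the snippet below splits one frequent itemset into every antecedent/consequent pair and computes the Confidence of each rule. It uses Data Set 3 and takes {Comedy, Drama} as the frequent itemset, since it appears in 3 of 5 transactions. The helper names are assumptions made for this example.

```python
from itertools import combinations

# Data Set 3: Movie Rentals
transactions = [
    {"Action", "Comedy", "Drama"},
    {"Action", "Drama"},
    {"Comedy", "Drama"},
    {"Action", "Comedy"},
    {"Comedy", "Drama", "Thriller"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_from_itemset(itemset, transactions):
    """Yield (antecedent, consequent, confidence) for every split of itemset."""
    itemset = frozenset(itemset)
    itemset_support = support(itemset, transactions)
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            consequent = itemset - antecedent
            confidence = itemset_support / support(antecedent, transactions)
            yield antecedent, consequent, confidence


rules = list(rules_from_itemset({"Comedy", "Drama"}, transactions))
for antecedent, consequent, confidence in rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: confidence = {confidence:.2f}")
```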

Task 3: Calculate Lift

  1. Calculate the Lift for each generated rule.

  2. Interpret the Lift values to determine the strength of the association.
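
A minimal sketch of Task 3 on Data Set 4 follows. The `interpret` helper applies the usual reading of Lift (above 1: positive association, about 1: independence, below 1: negative association). Its name and the tolerance are assumptions made for this example.

```python
import math

# Data Set 4: E-commerce Purchases
transactions = [
    {"Smartphone", "Headphones", "Charger"},
    {"Smartphone", "Charger"},
    {"Smartphone", "Headphones"},
    {"Headphones", "Charger"},
    {"Smartphone", "Headphones", "Charger", "Case"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Lift(A -> B) = Confidence(A -> B) / Support(B)."""
    a, b = frozenset(antecedent), frozenset(consequent)
    confidence = support(a | b, transactions) / support(a, transactions)
    return confidence / support(b, transactions)


def interpret(lift_value, tol=1e-9):
    """Map a Lift value to a short verbal description."""
    if math.isclose(lift_value, 1.0, abs_tol=tol):
        return "independent"
    return "positive association" if lift_value > 1 else "negative association"


value = lift({"Smartphone"}, {"Headphones"}, transactions)
print(f"Lift(Smartphone -> Headphones) = {value:.4f} ({interpret(value)})")
```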

Task 4: Analysis

  1. Discuss the rules with the highest Confidence and Lift values.

  2. Identify any interesting patterns or relationships in the data.
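
For the analysis step, it may help to collect all single-item rules in one list and sort them. The sketch below does this for Data Set 3, ranking by Lift and breaking ties by Confidence. The function name `pair_rules` and the dictionary layout are assumptions made for this example.

```python
from itertools import permutations

# Data Set 3: Movie Rentals
transactions = [
    {"Action", "Comedy", "Drama"},
    {"Action", "Drama"},
    {"Comedy", "Drama"},
    {"Action", "Comedy"},
    {"Comedy", "Drama", "Thriller"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.5):
    """Return rules A -> B (single items) whose itemset {A, B} meets min_support."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        joint = support({a, b}, transactions)
        if joint < min_support:
            continue
        confidence = joint / support({a}, transactions)
        rules.append({
            "rule": f"{a} -> {b}",
            "support": joint,
            "confidence": confidence,
            "lift": confidence / support({b}, transactions),
        })
    # Highest Lift first; ties are broken by Confidence.
    return sorted(rules, key=lambda r: (r["lift"], r["confidence"]), reverse=True)


ranked = pair_rules(transactions, min_support=0.5)
for r in ranked:
    print(f"{r['rule']}: support={r['support']:.2f}, "
          f"confidence={r['confidence']:.2f}, lift={r['lift']:.2f}")
```

Lowering `min_support` (for example to 0.2) brings rare items such as Thriller into the ranking. This shows how a high Lift can rest on very few transactions.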


Formulas

  • Support: $\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}$

  • Confidence: $\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$

  • Lift: $\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}$
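
In these formulas, Support(A ∪ B) is the fraction of transactions that contain every item of both A and B. Substituting the Confidence definition into the Lift formula gives an equivalent form that can be easier to reason about. The rearrangement below is a standard identity and is not stated in the lab itself:

```latex
\text{Lift}(A \rightarrow B)
  = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}
  = \frac{\text{Support}(A \cup B)}{\text{Support}(A)\,\text{Support}(B)}
```

The right-hand side is symmetric in A and B, so Lift(A → B) = Lift(B → A). A value above 1 suggests the itemsets co-occur more often than they would if independent; a value below 1 suggests they co-occur less often.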


Example Calculation (Data Set 1)

Frequent Itemsets:

  • Support(Milk) = 3/5 = 0.6

  • Support(Bread) = 4/5 = 0.8

  • Support(Butter) = 4/5 = 0.8

  • Support(Bread ∪ Butter) = 3/5 = 0.6

Association Rules and Confidence:

  • Confidence(Milk → Bread) = Support(Milk ∪ Bread) / Support(Milk) = (2/5) / (3/5) = 2/3 ≈ 0.67

  • Confidence(Bread → Butter) = Support(Bread ∪ Butter) / Support(Bread) = (3/5) / (4/5) = 3/4 = 0.75

Lift:

  • Lift(Milk → Bread) = Confidence(Milk → Bread) / Support(Bread) = (2/3) / 0.8 ≈ 0.83

  • Lift(Bread → Butter) = Confidence(Bread → Butter) / Support(Butter) = 0.75 / 0.8 ≈ 0.94
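
Hand calculations like these are easy to get wrong, so it can be worth cross-checking them against the transaction table. The following sketch recomputes the same quantities directly from Data Set 1. The function names are assumptions made for this example.

```python
# Data Set 1: Supermarket Transactions
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter", "Cheese"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(A -> B) = Support(A and B together) / Support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Lift(A -> B) = Confidence(A -> B) / Support(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


for item in ("Milk", "Bread", "Butter"):
    s = support({item}, transactions)
    print(f"Support({item}) = {s:.2f}")

for a, b in (("Milk", "Bread"), ("Bread", "Butter")):
    c = confidence({a}, {b}, transactions)
    l = lift({a}, {b}, transactions)
    print(f"Confidence({a} -> {b}) = {c:.2f}, Lift({a} -> {b}) = {l:.2f}")
```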


Submission

Submit a report containing:

  1. Detailed calculations for Support, Confidence, and Lift for each data set.

  2. Tables listing frequent itemsets, generated rules, and corresponding Confidence and Lift values.

  3. Analysis and interpretation of results.


This lab question will help you understand the practical application of Association Rule Mining and how to identify significant relationships within transaction data.