suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/Machine Learning Unsupervised Methods/Day 2.1 Quick Revision ARM(Association rule minning).ipynb
Kernel: Python 3 (ipykernel)

Importing libraries

pip install --proxy http://<username>:<password>@<proxy-host>:8000 plotly

Using the Groceries data shared with you, apply association rule mining (ARM) to find associations between the products purchased.

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from apyori import apriori
import warnings

# Ignore warnings related to pandas_profiling
warnings.filterwarnings('ignore')

Loading Dataset

df = pd.read_csv("Groceries.csv", parse_dates=['Date'])
df.head()

Any null values

df.isnull().any()
Member_number      False
Date               False
itemDescription    False
dtype: bool
df.describe(include="object")

Total Products

all_products = df['itemDescription'].unique()
print("Total products: {}".format(len(all_products)))
Total products: 167

Top 10 frequently sold products

def distribution_plot(x, y, name=None, xaxis=None, yaxis=None):
    # Draw a bar chart of y against x with an optional title and axis labels
    fig = go.Figure([
        go.Bar(x=x, y=y)
    ])
    fig.update_layout(
        title_text=name,
        xaxis_title=xaxis,
        yaxis_title=yaxis
    )
    fig.show()

# Count purchases per product and keep the ten most frequent
x = df['itemDescription'].value_counts()
x = x.sort_values(ascending=False)
x = x[:10]
distribution_plot(x=x.index, y=x.values, yaxis="Count", xaxis="Products")

One-hot representation of products purchased

# One-hot encode the product column; the grouping step below needs one column per product
one_hot = pd.get_dummies(df['itemDescription'], dtype=int)
df.drop('itemDescription', inplace=True, axis=1)
df = df.join(one_hot)
df.head(50)

Transactions

Note: if a customer bought multiple products on the same day, we treat them as one transaction.

# Sum the one-hot columns per (member, date) so each row becomes one transaction
records = df.groupby(["Member_number", "Date"])[list(all_products)].sum()
records = records.reset_index()[all_products]
## Replacing non-zero values with product names
def get_Pnames(x):
    # Cast the row to object dtype so product names can replace the integer counts
    x = x.astype(object)
    for product in all_products:
        if x[product] > 0:
            x[product] = product
    return x

records = records.apply(get_Pnames, axis=1)
records.head()
print("total transactions: {}".format(len(records)))
total transactions: 14963
## Removing zeros
x = records.values
x = [sub[~(sub == 0)].tolist() for sub in x if sub[sub != 0].tolist()]
transactions = x

Example transactions

transactions[0:10]
[['whole milk', 'pastry', 'salty snack'],
 ['whole milk', 'yogurt', 'sausage', 'semi-finished bread'],
 ['soda', 'pickled vegetables'],
 ['canned beer', 'misc. beverages'],
 ['sausage', 'hygiene articles'],
 ['whole milk', 'rolls/buns', 'sausage'],
 ['whole milk', 'soda'],
 ['frankfurter', 'soda', 'whipped/sour cream'],
 ['frankfurter', 'curd'],
 ['beef', 'white bread']]
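The grouping rule from the note above (all products bought by one member on one date form a single transaction) can also be sketched without the one-hot step. This is a minimal plain-Python illustration, not the notebook's method; the sample rows and the `build_transactions` helper are hypothetical and only mimic the dataset's `Member_number`, `Date`, `itemDescription` layout.

```python
from collections import defaultdict

# Hypothetical sample rows: (Member_number, Date, itemDescription)
rows = [
    (1000, "2015-03-15", "whole milk"),
    (1000, "2015-03-15", "pastry"),
    (1000, "2015-03-15", "whole milk"),  # repeated item in the same basket
    (1000, "2015-05-27", "soda"),
    (2000, "2015-03-15", "sausage"),
]

def build_transactions(rows):
    """Group rows by (member, date) and keep each product once per basket."""
    baskets = defaultdict(list)
    for member, date, item in rows:
        basket = baskets[(member, date)]
        if item not in basket:  # drop duplicates, keep first-seen order
            basket.append(item)
    # Sort the keys so the output order is deterministic
    return [baskets[key] for key in sorted(baskets)]

transactions_demo = build_transactions(rows)
print(transactions_demo)
# [['whole milk', 'pastry'], ['soda'], ['sausage']]
```

As in the one-hot approach, a product bought twice in the same basket appears only once in the resulting transaction.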

Association Rules

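# Note: apyori reads only min_support, min_confidence, min_lift and max_length.
# The misspelled min_confidance (plus min_length and target) is silently ignored,
# so no confidence threshold is applied to the output below.
rules = apriori(transactions, min_support=0.00030, min_confidance=0.05, min_lift=2, min_length=3, target="rules")
association_results = list(rules)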
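To get a feel for how small this support threshold is, the sketch below converts `min_support=0.00030` into a number of transactions, using the 14963 transactions counted earlier. It assumes an itemset is kept when its support is at least the threshold; `min_count` is a name introduced here for illustration.

```python
import math

n_transactions = 14963  # total transactions reported above
min_support = 0.00030   # threshold passed to apriori

# Smallest whole number of transactions whose share reaches the threshold
min_count = math.ceil(min_support * n_transactions)
print(min_count)  # 5

# Support of an itemset that occurs in exactly that many transactions
print(min_count / n_transactions)
```

Under that assumption, an itemset needs to occur in only 5 of the 14963 transactions to qualify, so many of the resulting rules rest on very few observations.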
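for item in association_results:
    # item[0] is the frequent itemset; only its first two items are used as the rule label
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item[1]))
    # item[2][0] is the first ordered statistic: (items_base, items_add, confidence, lift)
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")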
Rule: soda -> artif. sweetener
Support: 0.00046782062420637575
Confidence: 0.2413793103448276
Lift: 2.4857251346797353
=====================================
Rule: condensed milk -> berries
Support: 0.0003341575887188398
Confidence: 0.015337423312883436
Lift: 2.3417741329660697
=====================================
Rule: brandy -> whole milk
Support: 0.0008688097306689834
Confidence: 0.34210526315789475
Lift: 2.1662805978127717
=====================================
Rule: sweet spreads -> butter
Support: 0.0003341575887188398
Confidence: 0.009487666034155597
Lift: 2.087705101015738
=====================================
Rule: butter milk -> long life bakery product
Support: 0.0006683151774376796
Confidence: 0.03802281368821293
Lift: 2.1228931388683954
=====================================
Rule: butter milk -> packaged fruit/vegetables
Support: 0.0003341575887188398
Confidence: 0.019011406844106463
Lift: 2.23990299691626
=====================================
Rule: butter milk -> pot plants
Support: 0.0003341575887188398
Confidence: 0.019011406844106463
Lift: 2.431347697507393
=====================================
Rule: canned beer -> liver loaf
Support: 0.00040098910646260775
Confidence: 0.008547008547008546
Lift: 2.5577777777777775
=====================================
Rule: frozen fish -> chocolate
Support: 0.00040098910646260775
Confidence: 0.016997167138810197
Lift: 2.4934177637060486
=====================================
Rule: citrus fruit -> sauces
Support: 0.0003341575887188398
Confidence: 0.006289308176100629
Lift: 2.1387935963407663
=====================================
Rule: cling film/bags -> curd
Support: 0.0003341575887188398
Confidence: 0.06756756756756757
Lift: 2.005979193479194
=====================================
Rule: condensed milk -> waffles
Support: 0.0003341575887188398
Confidence: 0.05102040816326531
Lift: 2.7560229868120536
=====================================
Rule: cream cheese -> liquor
Support: 0.0003341575887188398
Confidence: 0.014124293785310734
Lift: 2.051862212714607
=====================================
Rule: detergent -> ham
Support: 0.0003341575887188398
Confidence: 0.03875968992248062
Lift: 2.26547359496124
=====================================
Rule: specialty chocolate -> flour
Support: 0.0003341575887188398
Confidence: 0.03424657534246575
Lift: 2.1440648822147073
=====================================
Rule: mustard -> frankfurter
Support: 0.0005346521419501437
Confidence: 0.014159292035398233
Lift: 2.302885725278954
=====================================
Rule: frozen fish -> specialty chocolate
Support: 0.0003341575887188398
Confidence: 0.049019607843137254
Lift: 3.0689556157190907
=====================================
Rule: fruit/vegetable juice -> liver loaf
Support: 0.00040098910646260775
Confidence: 0.011787819253438114
Lift: 3.52762278978389
=====================================
Rule: grapes -> herbs
Support: 0.0003341575887188398
Confidence: 0.02314814814814815
Lift: 2.192188232536334
=====================================
Rule: pickled vegetables -> ham
Support: 0.0005346521419501437
Confidence: 0.03125
Lift: 3.4895055970149254
=====================================
Rule: pasta -> hamburger meat
Support: 0.00046782062420637575
Confidence: 0.02140672782874618
Lift: 2.647180731417596
=====================================
Rule: hamburger meat -> soft cheese
Support: 0.0006014836596939117
Confidence: 0.027522935779816515
Lift: 2.7455045871559633
=====================================
Rule: hamburger meat -> spread cheese
Support: 0.0003341575887188398
Confidence: 0.015290519877675842
Lift: 2.2879204892966363
=====================================
Rule: hard cheese -> soft cheese
Support: 0.00040098910646260775
Confidence: 0.02727272727272727
Lift: 2.7205454545454546
=====================================
Rule: oil -> herbs
Support: 0.00046782062420637575
Confidence: 0.04430379746835443
Lift: 2.972725208605324
=====================================
Rule: meat -> roll products
Support: 0.0003341575887188398
Confidence: 0.019841269841269844
Lift: 3.620547812620984
=====================================
Rule: sausage -> meat spreads
Support: 0.0003341575887188398
Confidence: 0.14285714285714288
Lift: 2.3671887359595005
=====================================
Rule: salt -> misc. beverages
Support: 0.0003341575887188398
Confidence: 0.0211864406779661
Lift: 3.5619405827461437
=====================================
Rule: spread cheese -> misc. beverages
Support: 0.0003341575887188398
Confidence: 0.0211864406779661
Lift: 3.170127118644068
=====================================
Rule: mustard -> white bread
Support: 0.0003341575887188398
Confidence: 0.05434782608695652
Lift: 2.2651992249000847
=====================================
Rule: napkins -> semi-finished bread
Support: 0.00046782062420637575
Confidence: 0.021148036253776436
Lift: 2.2284370877834987
=====================================
Rule: pip fruit -> sweet spreads
Support: 0.0005346521419501437
Confidence: 0.010899182561307903
Lift: 2.3983010097772084
=====================================
Rule: pork -> popcorn
Support: 0.0003341575887188398
Confidence: 0.10416666666666667
Lift: 2.808370870870871
=====================================
Rule: salty snack -> red/blush wine
Support: 0.00046782062420637575
Confidence: 0.044585987261146494
Lift: 2.374164154407598
=====================================
Rule: sausage -> rum
Support: 0.0003341575887188398
Confidence: 0.15625
Lift: 2.589112679955703
=====================================
Rule: spices -> sausage
Support: 0.0003341575887188398
Confidence: 0.005537098560354374
Lift: 2.0712901439645623
=====================================
Rule: soups -> seasonal products
Support: 0.0003341575887188398
Confidence: 0.04716981132075471
Lift: 14.704205974842766
=====================================
Rule: soda -> spices
Support: 0.0006014836596939117
Confidence: 0.006194081211286993
Lift: 2.317050929112182
=====================================
Rule: spread cheese -> sugar
Support: 0.00040098910646260775
Confidence: 0.06
Lift: 3.3878490566037733
=====================================
Rule: sweet spreads -> tropical fruit
Support: 0.0007351466951814476
Confidence: 0.16176470588235295
Lift: 2.387066365007542
=====================================
Rule: beef -> bottled beer
Support: 0.00040098910646260775
Confidence: 0.0025391451544646637
Lift: 2.3745768091409225
=====================================
Rule: beef -> whipped/sour cream
Support: 0.00046782062420637575
Confidence: 0.01377952755905512
Lift: 2.988160447335388
=====================================
Rule: beverages -> sausage
Support: 0.0003341575887188398
Confidence: 0.020161290322580648
Lift: 2.2512939335580167
=====================================
Rule: bottled beer -> sausage
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 2.3478001631833303
=====================================
Rule: bottled beer -> sausage
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 3.8050554368833285
=====================================
Rule: chicken -> bottled beer
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 2.1636589739140493
=====================================
Rule: bottled beer -> whole milk
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 2.507877447036739
=====================================
Rule: bottled beer -> domestic eggs
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 2.1636589739140493
=====================================
Rule: bottled beer -> frankfurter
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 2.0063019576293915
=====================================
Rule: hard cheese -> bottled beer
Support: 0.0003341575887188398
Confidence: 0.007374631268436578
Lift: 3.9409502739148756
=====================================
Rule: yogurt -> bottled water
Support: 0.00040098910646260775
Confidence: 0.0066079295154185015
Lift: 2.29940579858621
=====================================
Rule: frozen vegetables -> canned beer
Support: 0.0003341575887188398
Confidence: 0.008880994671403198
Lift: 6.644316163410303
=====================================
Rule: canned beer -> sausage
Support: 0.00040098910646260775
Confidence: 0.010657193605683837
Lift: 4.309826700590467
=====================================
Rule: pork -> whole milk
Support: 0.0003341575887188398
Confidence: 0.009009009009009009
Lift: 2.0119671910716685
=====================================
Rule: soda -> frankfurter
Support: 0.0003341575887188398
Confidence: 0.009487666034155597
Lift: 3.086172758023265
=====================================
Rule: soda -> shopping bags
Support: 0.0003341575887188398
Confidence: 0.009487666034155597
Lift: 2.150968891955609
=====================================
Rule: butter milk -> canned beer
Support: 0.0003341575887188398
Confidence: 0.019011406844106463
Lift: 4.9046151829028455
=====================================
Rule: yogurt -> candy
Support: 0.0003341575887188398
Confidence: 0.023255813953488372
Lift: 2.974160206718346
=====================================
Rule: chicken -> canned beer
Support: 0.00040098910646260775
Confidence: 0.008547008547008546
Lift: 2.5076252723311545
=====================================
Rule: canned beer -> sausage
Support: 0.0003341575887188398
Confidence: 0.007122507122507123
Lift: 3.437873357228196
=====================================
Rule: canned beer -> hygiene articles
Support: 0.00040098910646260775
Confidence: 0.008547008547008546
Lift: 4.918803418803418
=====================================
Rule: pork -> canned beer
Support: 0.0003341575887188398
Confidence: 0.007122507122507123
Lift: 2.089687726942629
=====================================
Rule: soda -> yogurt
Support: 0.00040098910646260775
Confidence: 0.03333333333333333
Lift: 5.732950191570881
=====================================
Rule: yogurt -> whole milk
Support: 0.0003341575887188398
Confidence: 0.02777777777777778
Lift: 2.4888556220891553
=====================================
Rule: citrus fruit -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.009920634920634922
Lift: 2.0617008377425043
=====================================
Rule: citrus fruit -> yogurt
Support: 0.0003341575887188398
Confidence: 0.006289308176100629
Lift: 2.4764978483945717
=====================================
Rule: citrus fruit -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.006289308176100629
Lift: 2.0022748561488024
=====================================
Rule: citrus fruit -> other vegetables
Support: 0.00040098910646260775
Confidence: 0.003284072249589491
Lift: 2.2336169577548888
=====================================
Rule: citrus fruit -> other vegetables
Support: 0.00046782062420637575
Confidence: 0.00880503144654088
Lift: 2.233045517535444
=====================================
Rule: citrus fruit -> yogurt
Support: 0.00040098910646260775
Confidence: 0.007547169811320754
Lift: 2.4549630844954877
=====================================
Rule: frankfurter -> coffee
Support: 0.0003341575887188398
Confidence: 0.010570824524312896
Lift: 2.875840861041707
=====================================
Rule: soda -> frankfurter
Support: 0.0003341575887188398
Confidence: 0.010570824524312896
Lift: 3.438505377332475
=====================================
Rule: pastry -> coffee
Support: 0.0003341575887188398
Confidence: 0.010570824524312896
Lift: 3.2952343199436225
=====================================
Rule: soda -> coffee
Support: 0.00040098910646260775
Confidence: 0.012684989429175475
Lift: 2.4026012256804132
=====================================
Rule: cream cheese -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.014124293785310734
Lift: 2.6752127583494243
=====================================
Rule: fruit/vegetable juice -> sausage
Support: 0.0003341575887188398
Confidence: 0.009920634920634922
Lift: 5.497868900646679
=====================================
Rule: sausage -> margarine
Support: 0.0003341575887188398
Confidence: 0.009920634920634922
Lift: 5.301516439909298
=====================================
Rule: rolls/buns -> sausage
Support: 0.00040098910646260775
Confidence: 0.011904761904761906
Lift: 2.226636904761905
=====================================
Rule: yogurt -> sausage
Support: 0.00046782062420637575
Confidence: 0.01388888888888889
Lift: 2.416505167958656
=====================================
Rule: soda -> rolls/buns
Support: 0.0003341575887188398
Confidence: 0.003037667071688943
Lift: 2.066027836076439
=====================================
Rule: sausage -> dessert
Support: 0.00040098910646260775
Confidence: 0.0066445182724252485
Lift: 2.7617201919527496
=====================================
Rule: frankfurter -> domestic eggs
Support: 0.00040098910646260775
Confidence: 0.010810810810810811
Lift: 2.100807300807301
=====================================
Rule: frankfurter -> other vegetables
Support: 0.00040098910646260775
Confidence: 0.010619469026548672
Lift: 2.011381203091744
=====================================
Rule: frankfurter -> other vegetables
Support: 0.00040098910646260775
Confidence: 0.003284072249589491
Lift: 2.2336169577548888
=====================================
Rule: soda -> frankfurter
Support: 0.0003341575887188398
Confidence: 0.003441156228492774
Lift: 2.340455483951699
=====================================
Rule: root vegetables -> rolls/buns
Support: 0.0003341575887188398
Confidence: 0.011933174224343675
Lift: 2.0762335571959816
=====================================
Rule: grapes -> yogurt
Support: 0.0003341575887188398
Confidence: 0.02314814814814815
Lift: 2.0740463517409626
=====================================
Rule: hard cheese -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.022727272727272728
Lift: 3.7785353535353536
=====================================
Rule: soda -> hygiene articles
Support: 0.0003341575887188398
Confidence: 0.024390243902439025
Lift: 2.097420801794225
=====================================
Rule: pip fruit -> ice cream
Support: 0.0003341575887188398
Confidence: 0.022026431718061675
Lift: 4.453804024288606
=====================================
Rule: pork -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.01037344398340249
Lift: 2.63081088684155
=====================================
Rule: pip fruit -> margarine
Support: 0.00040098910646260775
Confidence: 0.008174386920980926
Lift: 2.005136909813731
=====================================
Rule: shopping bags -> margarine
Support: 0.0003341575887188398
Confidence: 0.01037344398340249
Lift: 3.1043568464730287
=====================================
Rule: yogurt -> sausage
Support: 0.00040098910646260775
Confidence: 0.012448132780082987
Lift: 2.1658303580044387
=====================================
Rule: pastry -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.00859106529209622
Lift: 2.337238363011559
=====================================
Rule: pastry -> sausage
Support: 0.0003341575887188398
Confidence: 0.00859106529209622
Lift: 2.678085624284078
=====================================
Rule: pastry -> soda
Support: 0.0003341575887188398
Confidence: 0.00859106529209622
Lift: 2.107346065010422
=====================================
Rule: soda -> sausage
Support: 0.00040098910646260775
Confidence: 0.0066445182724252485
Lift: 2.8406264831514
=====================================
Rule: yogurt -> tropical fruit
Support: 0.0003341575887188398
Confidence: 0.016501650165016504
Lift: 3.1655665566556657
=====================================
Rule: yogurt -> onions
Support: 0.00046782062420637575
Confidence: 0.023102310231023104
Lift: 2.069939329262268
=====================================
Rule: shopping bags -> other vegetables
Support: 0.0005346521419501437
Confidence: 0.004378762999452655
Lift: 2.2592907158900024
=====================================
Rule: other vegetables -> sausage
Support: 0.0003341575887188398
Confidence: 0.002736726874657909
Lift: 3.4124703521255246
=====================================
Rule: other vegetables -> sausage
Support: 0.0003341575887188398
Confidence: 0.005537098560354374
Lift: 2.1244001476559613
=====================================
Rule: soda -> other vegetables
Support: 0.00040098910646260775
Confidence: 0.003284072249589491
Lift: 2.136503176982937
=====================================
Rule: yogurt -> other vegetables
Support: 0.0003341575887188398
Confidence: 0.002736726874657909
Lift: 2.1552444329213842
=====================================
Rule: other vegetables -> sugar
Support: 0.00046782062420637575
Confidence: 0.002962336013542108
Lift: 2.3329175668752926
=====================================
Rule: pastry -> soda
Support: 0.0005346521419501437
Confidence: 0.008859357696567
Lift: 2.173156872356263
=====================================
Rule: pastry -> yogurt
Support: 0.00046782062420637575
Confidence: 0.007751937984496124
Lift: 2.1480045937410277
=====================================
Rule: pork -> soda
Support: 0.00040098910646260775
Confidence: 0.0066445182724252485
Lift: 2.2093761535622
=====================================
Rule: pork -> sausage
Support: 0.0006014836596939117
Confidence: 0.003808717731696996
Lift: 2.4778192791035716
=====================================
Rule: pork -> yogurt
Support: 0.00040098910646260775
Confidence: 0.0066445182724252485
Lift: 2.1613462371804126
=====================================
Rule: processed cheese -> rolls/buns
Support: 0.00040098910646260775
Confidence: 0.039473684210526314
Lift: 2.826051372450264
=====================================
Rule: yogurt -> whipped/sour cream
Support: 0.0006014836596939117
Confidence: 0.007003891050583658
Lift: 2.3818004952246197
=====================================
Rule: yogurt -> shopping bags
Support: 0.00040098910646260775
Confidence: 0.004669260700389105
Lift: 2.4091775124111097
=====================================
Rule: yogurt -> sausage
Support: 0.0014702933903628951
Confidence: 0.024363233665559245
Lift: 2.1829165589087607
=====================================
Rule: pastry -> soda
Support: 0.0003341575887188398
Confidence: 0.002736726874657909
Lift: 2.924974587536164
=====================================
Rule: yogurt -> sausage
Support: 0.0003341575887188398
Confidence: 0.003037667071688943
Lift: 2.066027836076439
=====================================
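The printing loop above labels each rule with the first two items of the itemset, in whatever order the frozenset yields them, and reports only the first ordered statistic. For itemsets with three items this drops one item from the label, which is likely why the same pair appears several times with different metrics. A minimal sketch of a formatter that prints every directed rule with its own antecedent and consequent is shown below. The two namedtuples are local stand-ins that mirror what apyori's result records are assumed to look like (`items`, `support`, `ordered_statistics`, and `items_base`, `items_add`, `confidence`, `lift`); `format_rules` and the demo record are hypothetical.

```python
from collections import namedtuple

# Local stand-ins mirroring the assumed field names of apyori's result records
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def format_rules(results):
    """Return one line per directed rule: antecedent -> consequent with its metrics."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            antecedent = ", ".join(sorted(stat.items_base))
            consequent = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{antecedent} -> {consequent} | support={record.support:.6f} "
                f"confidence={stat.confidence:.4f} lift={stat.lift:.4f}"
            )
    return lines

# Hypothetical record for illustration only (not taken from the output above)
demo = [
    RelationRecord(
        items=frozenset({"a", "b", "c"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a"}), frozenset({"b", "c"}), 0.2, 2.5),
            OrderedStatistic(frozenset({"a", "b"}), frozenset({"c"}), 0.5, 3.0),
        ],
    )
]
for line in format_rules(demo):
    print(line)
# a -> b, c | support=0.010000 confidence=0.2000 lift=2.5000
# a, b -> c | support=0.010000 confidence=0.5000 lift=3.0000
```

If the field names match, `format_rules(association_results)` could be called on the list built earlier; that call is untested here.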

Please note:

A lift value greater than 1 indicates that the association rule is useful: the antecedent and the consequent occur together more often than they would if they were independent. A lift value of 1 means the antecedent and the consequent are independent of each other, and a value below 1 means they occur together less often than expected, so such a rule is not useful.
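To make the three metrics concrete, here is a minimal sketch that computes them by hand on a toy dataset, using the standard definitions: support is the share of transactions containing both sides of the rule, confidence is that support divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent. The `rule_metrics` helper and the `toy` transactions are hypothetical and not part of the notebook.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n
    confidence = n_both / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift

# Hypothetical toy transactions
toy = [
    ["milk", "bread"],
    ["milk", "bread"],
    ["milk", "soda"],
    ["soda"],
]

s, c, l = rule_metrics(toy, ["milk"], ["bread"])
print(round(s, 3), round(c, 3), round(l, 3))  # 0.5 0.667 1.333

s2, c2, l2 = rule_metrics(toy, ["milk"], ["soda"])
print(round(s2, 3), round(c2, 3), round(l2, 3))  # 0.25 0.333 0.667
```

In this toy data, milk -> bread has a lift above 1 (bought together more often than chance), while milk -> soda has a lift below 1.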