How to use the R package arules from Python using arulespy
This document is also available as an IPython notebook or you can open and run it directly in Google Colab.
Installation
The package can be installed using pip via the terminal
pip install arulespy
Or use the following magic command (note: use %conda if you use conda).
%pip install arulespy
Requirement already satisfied: arulespy in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (0.1.4)
Requirement already satisfied: pandas>1.5.3 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (2.1.0)
Requirement already satisfied: numpy>=1.14.2 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (1.25.2)
Requirement already satisfied: scipy>=1.10.1 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (1.11.2)
Requirement already satisfied: rpy2>=3.5.11 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (3.5.14)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from pandas>1.5.3->arulespy) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from pandas>1.5.3->arulespy) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from pandas>1.5.3->arulespy) (2023.3)
Requirement already satisfied: cffi>=1.10.0 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from rpy2>=3.5.11->arulespy) (1.15.1)
Requirement already satisfied: jinja2 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from rpy2>=3.5.11->arulespy) (3.1.2)
Requirement already satisfied: tzlocal in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from rpy2>=3.5.11->arulespy) (5.0.1)
Requirement already satisfied: pycparser in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from cffi>=1.10.0->rpy2>=3.5.11->arulespy) (2.21)
Requirement already satisfied: six>=1.5 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas>1.5.3->arulespy) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from jinja2->rpy2>=3.5.11->arulespy) (2.1.3)
Note: you may need to restart the kernel to use updated packages.
The code below may be needed for Windows users.
## Windows users: These environment variables may be necessary until rpy2 sets them automatically
#from rpy2 import situation
#import os
#
#r_home = situation.r_home_from_registry()
#r_bin = r_home + '\\bin\\x64\\'
#os.environ['R_HOME'] = r_home
#os.environ['PATH'] = r_bin + ";" + os.environ['PATH']
#os.add_dll_directory(r_bin)
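Before importing arulespy, it can be useful to check that rpy2 finds your R installation. The following is an optional check (not part of the original example); rpy2 also ships the command-line diagnostic python -m rpy2.situation.

# Optional check: print the R home directory that rpy2 will use.
# An empty result means R_HOME needs to be set (see the Windows notes above).
from rpy2 import situation
print(situation.get_r_home())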
Basic Usage
Import the arules module from the arulespy package. The first import can take a while because all required R packages need to be installed.
from arulespy.arules import Transactions, apriori, parameters, concat
Creating transaction data
The data needs to be prepared as a Pandas dataframe. Here we have 10 transactions with three items called A, B and C. True means that the transaction contains the item.
import pandas as pd
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))
df
| A | B | C |
---|---|---|---|
0 | True | True | True |
1 | True | False | False |
2 | True | True | True |
3 | True | False | False |
4 | True | True | True |
5 | True | False | True |
6 | True | True | True |
7 | False | False | True |
8 | False | True | True |
9 | True | False | True |
Convert the pandas dataframe into a sparse transactions object.
trans = Transactions.from_df(df)
print(trans)
trans.as_df()
transactions in sparse format with 10 transactions (rows) and 3 items (columns)
| items | transactionID |
---|---|---|
1 | {A,B,C} | 0 |
2 | {A} | 1 |
3 | {A,B,C} | 2 |
4 | {A} | 3 |
5 | {A,B,C} | 4 |
6 | {A,C} | 5 |
7 | {A,B,C} | 6 |
8 | {C} | 7 |
9 | {B,C} | 8 |
10 | {A,C} | 9 |
trans.itemLabels()
['A', 'B', 'C']
Working with transactions
We can calculate item frequencies, sample transactions or remove duplicate transactions. All available functions can be found at the end of this document.
trans.itemFrequency(type = 'relative')
[0.8, 0.5, 0.8]
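The R function itemFrequency() also supports type = 'absolute' to return counts instead of relative frequencies. Assuming the argument is passed through unchanged by arulespy (this call is not in the original document), counts can be obtained the same way:

trans.itemFrequency(type = 'absolute')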
trans.sample(3).as_df()
| items | transactionID |
---|---|---|
8 | {C} | 7 |
10 | {A,C} | 9 |
9 | {B,C} | 8 |
trans.unique().as_df()
| items | transactionID |
---|---|---|
1 | {A,B,C} | 0 |
2 | {A} | 1 |
6 | {A,C} | 5 |
8 | {C} | 7 |
9 | {B,C} | 8 |
Create new data from a pandas dataframe using the item encoding of an existing transactions object. Note that the following dataframe has the columns (items) in reverse order, which is corrected when the item encoding of trans is used.
trans2 = Transactions.from_df(pd.DataFrame(
    [
        [True, True, False],
        [False, False, True],
    ],
    columns=list('CBA')), trans)
trans2.as_df()
| items | transactionID |
---|---|---|
1 | {B,C} | 0 |
2 | {A} | 1 |
Similar transactions can also be created from a list of lists of item labels. Note that the order of the items is again fixed to match trans.
trans3 = Transactions.from_list([['B', 'A'],
['C']],
trans)
trans3.as_df()
| items |
---|---|
1 | {A,B} |
2 | {C} |
Add the new transactions to the existing transactions.
concat([trans, trans2]).as_df()
| items | transactionID |
---|---|---|
1 | {A,B,C} | 0 |
2 | {A} | 1 |
3 | {A,B,C} | 2 |
4 | {A} | 3 |
5 | {A,B,C} | 4 |
6 | {A,C} | 5 |
7 | {A,B,C} | 6 |
8 | {C} | 7 |
9 | {B,C} | 8 |
10 | {A,C} | 9 |
11 | {B,C} | 0 |
12 | {A} | 1 |
Converting transactions into Python data structures
Transactions can be converted into several Python formats, including 0-1 matrices, lists of item labels, lists of item indices, and sparse matrices.
trans.as_matrix()
array([[1, 1, 1],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [0, 1, 1],
       [1, 0, 1]], dtype=int32)
trans.as_list()
[['A', 'B', 'C'], ['A'], ['A', 'B', 'C'], ['A'], ['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['C'], ['B', 'C'], ['A', 'C']]
trans.as_int_list()
[[1, 2, 3], [1], [1, 2, 3], [1], [1, 2, 3], [1, 3], [1, 2, 3], [3], [2, 3], [1, 3]]
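Note that the item indices above are 1-based, following R's convention (item 1 is 'A'). As a small illustration (not part of the original document), the indices can be mapped back to item labels:

labels = trans.itemLabels()
[[labels[i - 1] for i in t] for t in trans.as_int_list()]   # reproduces trans.as_list()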
trans.as_csc_matrix()
<3x10 sparse matrix of type '<class 'numpy.int64'>' with 21 stored elements in Compressed Sparse Column format>
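The sparse matrix has one row per item and one column per transaction. As a small illustration (not part of the original document), it can be turned into a labeled pandas dataframe:

# dense items-by-transactions view with the item labels as row index
pd.DataFrame(trans.as_csc_matrix().toarray(), index = trans.itemLabels())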
Mixing nominal and numeric variables
A dataframe with nominal and numeric variables can also be converted. Nominal variables are converted into items of the form variable=value, and numeric variables are first discretized (see arules.discretizeDF()).
df2 = pd.DataFrame(
    [
        ['red',   12, True],
        ['blue',  10, False],
        ['red',   18, True],
        ['green', 18, False],
        ['red',   16, True],
        ['blue',   9, False]
    ],
    columns=['color', 'size', 'class'])
trans2 = Transactions.from_df(df2)
trans2.as_df()
| items | transactionID |
---|---|---|
1 | {color=red,size=[11.3,16.7),class} | 0 |
2 | {color=blue,size=[9,11.3)} | 1 |
3 | {color=red,size=[16.7,18],class} | 2 |
4 | {color=green,size=[16.7,18]} | 3 |
5 | {color=red,size=[11.3,16.7),class} | 4 |
6 | {color=blue,size=[9,11.3)} | 5 |
Details on item label creation can be retrieved using arules.itemInfo().
trans2.itemInfo()
R[write to console]: In addition:
R[write to console]: Warning message:
R[write to console]: Column(s) 1, 2 not logical or factor. Applying default discretization (see '? discretizeDF').
| labels | variables | levels |
---|---|---|---|
1 | color=blue | color | blue |
2 | color=green | color | green |
3 | color=red | color | red |
4 | size=[9,11.3) | size | [9,11.3) |
5 | size=[11.3,16.7) | size | [11.3,16.7) |
6 | size=[16.7,18] | size | [16.7,18] |
7 | class | class | TRUE |
Mine association rules
arules.apriori() calls the Apriori algorithm and converts the result into a Python arulespy.arules.Rules object. Parameters for the algorithm are specified as a dict wrapped by the parameters() function.
rules = apriori(trans,
parameter = parameters({"supp": 0.1, "conf": 0.8}),
control = parameters({"verbose": False}))
rules.as_df()
| LHS | RHS | support | confidence | coverage | lift | count |
---|---|---|---|---|---|---|---|
1 | {} | {A} | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
2 | {} | {C} | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
4 | {B} | {C} | 0.5 | 1.0 | 0.5 | 1.25 | 5 |
5 | {A,B} | {C} | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
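Rules 1 and 2 have an empty LHS. If such rules are not wanted, a minimum rule length of 2 can be requested via the standard Apriori parameter minlen. A sketch (rules2 is a new name, not used in the original document):

rules2 = apriori(trans,
parameter = parameters({"supp": 0.1, "conf": 0.8, "minlen": 2}),
control = parameters({"verbose": False}))
rules2.as_df()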
rules.quality()
| support | confidence | coverage | lift | count |
---|---|---|---|---|---|
1 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
2 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
3 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
4 | 0.5 | 1.0 | 0.5 | 1.25 | 5 |
5 | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
6 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
Python-style len() and slicing are available.
len(rules)
6
rules[0:3].as_df()
| LHS | RHS | support | confidence | coverage | lift | count |
---|---|---|---|---|---|---|---|
1 | {} | {A} | 0.8 | 0.8 | 1.0 | 1.0 | 8 |
2 | {} | {C} | 0.8 | 0.8 | 1.0 | 1.0 | 8 |
3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1.0 | 4 |
rules[[True, False, True, False, True, False]].as_df()
| LHS | RHS | support | confidence | coverage | lift | count |
---|---|---|---|---|---|---|---|
1 | {} | {A} | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
5 | {A,B} | {C} | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
Accessing Rules
Rules can be converted into various Python data structures.
rules.labels()
['{} => {A}', '{} => {C}', '{B} => {A}', '{B} => {C}', '{A,B} => {C}', '{B,C} => {A}']
rules.items().as_df()
| items |
---|---|
1 | {A} |
2 | {C} |
3 | {A,B} |
4 | {B,C} |
5 | {A,B,C} |
6 | {A,B,C} |
rules.lhs().as_df()
| items |
---|---|
1 | {} |
2 | {} |
3 | {B} |
4 | {B} |
5 | {A,B} |
6 | {B,C} |
rules.lhs().as_list()
[[], [], ['B'], ['B'], ['A', 'B'], ['B', 'C']]
rules.rhs().as_df()
| items |
---|---|
1 | {A} |
2 | {C} |
3 | {A} |
4 | {C} |
5 | {C} |
6 | {A} |
The LHS and RHS of rules are of type itemMatrix, in the same way as transactions are. Therefore, all conversions (to lists, sparse matrices, etc.) are also available.
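For example, assuming the conversion methods shown above for transactions are available on all itemMatrix objects, the LHS item sets can also be accessed as matrices (a small illustration, not part of the original document):

rules.lhs().as_csc_matrix()   # sparse representation of the LHS item sets
rules.lhs().as_matrix()       # dense 0-1 matrix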
rules.sort(by = 'lift').as_df()
| LHS | RHS | support | confidence | coverage | lift | count |
---|---|---|---|---|---|---|---|
4 | {B} | {C} | 0.5 | 1.0 | 0.5 | 1.25 | 5 |
5 | {A,B} | {C} | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
1 | {} | {A} | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
2 | {} | {C} | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
Work With Interest Measures
Interest measures are stored as the quality attribute in rules and itemsets.
rules.quality()
| support | confidence | coverage | lift | count |
---|---|---|---|---|---|
1 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
2 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
3 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
4 | 0.5 | 1.0 | 0.5 | 1.25 | 5 |
5 | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
6 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
Additional interest measures can be calculated with interestMeasure() and added to rules or itemsets using addQuality(). See the arules documentation for all available measures. To calculate some measures, the transactions need to be specified.
im = rules.interestMeasure(["phi", 'support'])
im
| phi | support |
---|---|---|
1 | NaN | 0.8 |
2 | NaN | 0.8 |
3 | 0.000000 | 0.4 |
4 | 0.500000 | 0.5 |
5 | 0.408248 | 0.4 |
6 | 0.000000 | 0.4 |
rules.addQuality(im)
rules.as_df()
| LHS | RHS | support | confidence | coverage | lift | count | phi |
---|---|---|---|---|---|---|---|---|
1 | {} | {A} | 0.8 | 0.8 | 1.0 | 1.00 | 8 | NaN |
2 | {} | {C} | 0.8 | 0.8 | 1.0 | 1.00 | 8 | NaN |
3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 | 0.000000 |
4 | {B} | {C} | 0.5 | 1.0 | 0.5 | 1.25 | 5 | 0.500000 |
5 | {A,B} | {C} | 0.4 | 1.0 | 0.4 | 1.25 | 4 | 0.408248 |
6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1.00 | 4 | 0.000000 |
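Measures that require the transaction data (see the note above) can be computed by passing the transactions explicitly, mirroring the positional call used for the manually created rules later in this document. A sketch:

rules.interestMeasure(["phi"], trans)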
Filter Redundant Rules
rules[[not x for x in rules.is_redundant()]].as_df()
| LHS | RHS | support | confidence | coverage | lift | count | phi |
---|---|---|---|---|---|---|---|---|
1 | {} | {A} | 0.8 | 0.8 | 1.0 | 1.00 | 8 | NaN |
2 | {} | {C} | 0.8 | 0.8 | 1.0 | 1.00 | 8 | NaN |
4 | {B} | {C} | 0.5 | 1.0 | 0.5 | 1.25 | 5 | 0.5 |
rules.is_redundant()
[False, False, True, False, True, True]
Find maximal rules.
rules.is_maximal()
[False, False, False, False, True, True]
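Since is_maximal() returns a Python list of booleans, it can be used directly for indexing, just like the boolean list shown earlier. For example, to keep only the maximal rules (a sketch, not part of the original document):

rules[rules.is_maximal()].as_df()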
Create Rules Objects
To import rules from other tools or to create rules manually, rules for arules can be created from lists of sets of items. The item labels (i.e., the sparse representation) are taken from the transactions trans. The LHS and RHS of rules are of type itemMatrix and can be created by conversion from pandas dataframes or lists of lists.
import rpy2.robjects as ro
from arulespy.arules import Rules, ItemMatrix
trans = Transactions.from_df(pd.read_csv("https://mhahsler.github.io/arulespy/examples/Zoo.csv"))
lhs = [
['hair', 'milk', 'predator'],
['hair', 'tail', 'predator'],
['fins']
]
rhs = [
['type=mammal'],
['type=mammal'],
['type=fish']
]
r = Rules.new(ItemMatrix.from_list(lhs, itemLabels = trans),
ItemMatrix.from_list(rhs, itemLabels = trans))
r.as_df()
| LHS | RHS |
---|---|---|
1 | {hair,milk,predator} | {type=mammal} |
2 | {hair,predator,tail} | {type=mammal} |
3 | {fins} | {type=fish} |
Next, we add interest measures calculated on the transactions.
r.addQuality(r.interestMeasure(['support', 'confidence', 'lift'], trans))
r.as_df().round(2)
R[write to console]: In addition:
R[write to console]: Warning message:
R[write to console]: Column(s) 13, 17 not logical or factor. Applying default discretization (see '? discretizeDF').
| LHS | RHS | support | confidence | lift |
---|---|---|---|---|---|
1 | {hair,milk,predator} | {type=mammal} | 0.20 | 1.00 | 2.46 |
2 | {hair,predator,tail} | {type=mammal} | 0.16 | 1.00 | 2.46 |
3 | {fins} | {type=fish} | 0.13 | 0.76 | 5.94 |
Find Super and Subsets
Subset calculation returns a large binary matrix. Since this matrix is often sparse, it is represented as a sparse matrix. For example, it can be used to check which transactions contain the items in the LHS of the rules. The result is a sparse matrix with one row per transaction and one column per rule.
superset = trans.is_superset(r.lhs(), sparse = True)
superset
<101x3 sparse matrix of type '<class 'numpy.int64'>' with 53 stored elements in Compressed Sparse Column format>
superset[0:1, ].toarray()
array([[1, 0, 0]])
The first row, shown above as a dense vector, indicates that transaction 1 is a superset of the LHS of the first rule. That is, transaction 1 contains all the items in the LHS of rule 1.
print("Transaction 1:", trans[0:1].as_list(), "\n")
print("Rule 1:\n", r[0:1].as_df())
Transaction 1: [['hair', 'milk', 'predator', 'toothed', 'backbone', 'breathes', 'legs=[4,8]', 'catsize', 'type=mammal']]

Rule 1:
                    LHS            RHS  support  confidence      lift
1  {hair,milk,predator}  {type=mammal}  0.19802         1.0  2.463415
This information can be used to find the LHS support count for the three rules by summing each column (i.e., counting the transactions that contain the LHS of each rule).
superset.sum(axis = 0)
matrix([[20, 16, 17]])
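Dividing these counts by the number of transactions gives the relative LHS support (coverage) of each rule. A small illustration (not part of the original document):

superset.sum(axis = 0) / superset.shape[0]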
Online Help for Functions Available via arulespy
help(apriori)
Help on function wrapper in module arulespy.arules:

wrapper(*args, **kwargs)
Wrapper around an R function. The docstring below is built from the R documentation.

description
-----------
Mine frequent itemsets, association rules or association hyperedges using the Apriori algorithm.

apriori(
data,
parameter = rinterface.NULL,
appearance = rinterface.NULL,
control = rinterface.NULL,
___ = (was "..."). R ellipsis (any number of parameters),
)

Args:
data : object of class transactions. Any data structure which can be coerced into transactions (e.g., a binary matrix, a data.frame or a tibble) can also be specified and will be internally coerced to transactions.
parameter : object of class APparameter or named list. The default behavior is to mine rules with minimum support of 0.1, minimum confidence of 0.8, maximum of 10 items (maxlen), and a maximal time for subset checking of 5 seconds (maxtime).
appearance : object of class APappearance or named list. With this argument item appearance can be restricted (implements rule templates). By default all items can appear unrestricted.
control : object of class APcontrol or named list. Controls the algorithmic performance of the mining algorithm (item sorting, report progress (verbose), etc.)
... : Additional arguments are for convenience added to the parameter list.

details
-------
The Apriori algorithm (Agrawal et al, 1993) employs level-wise search for frequent itemsets. The used C implementation of Apriori by Christian Borgelt (2003) includes some improvements (e.g., a prefix tree and item sorting).

Warning about automatic conversion of matrices or data.frames to transactions: It is preferred to create transactions manually before calling apriori() to have control over item coding. This is especially important when you are working with multiple datasets or several subsets of the same dataset. To read about item coding, see itemCoding. If a data.frame is specified as x, then the data is automatically converted into transactions by discretizing numeric data using discretizeDF() and then coercion to transactions. The discretization may fail if the data is not well behaved.

Apriori only creates rules with one item in the RHS (consequent). The default value in APparameter for minlen is 1. This means that rules with only one item (i.e., an empty antecedent/LHS) like {} => {beer} will be created. These rules mean that no matter what other items are involved, the item in the RHS will appear with the probability given by the rule's confidence (which equals the support). If you want to avoid these rules then use the argument parameter = list(minlen = 2).

Notes on run time and memory usage: If the minimum support is chosen too low for the dataset, then the algorithm will try to create an extremely large set of itemsets/rules. This will result in very long run time and eventually the process will run out of memory. To prevent this, the default maximal length of itemsets/rules is restricted to 10 items (via the parameter element maxlen = 10) and the time for checking subsets is limited to 5 seconds (via maxtime = 5). The output will show if you hit these limits in the "checking subsets" line of the output. The time limit is only checked when the subset size increases, so it may run significantly longer than what you specify in maxtime. Setting maxtime = 0 disables the time limit.

Interrupting execution with Control-C/Esc is not recommended. Memory cleanup will be prevented resulting in a memory leak. Also, interrupts are only checked when the subset size increases, so it may take some time till the execution actually stops.
Low-level R arules interface
arules functions can also be called directly using R_arules.<arules R function>() and R_arulesViz.<arulesViz R function>(). The result will be an rpy2 data type. Transactions, itemsets and rules can be manually converted to Python classes using arules2py().
from arulespy.arules import R_arules, Itemsets, arules2py
help(R_arules.random_patterns)
Help on DocumentedSTFunction in module rpy2.robjects.functions:

<rpy2.robjects.functions.DocumentedSTFunction ob...ebe43c0> [RTYPES.CLOSXP]
R classes: ('function',)

Wrapper around an R function. The docstring below is built from the R documentation.

description
-----------
Simulate random transactions using different methods.

random.patterns(
nItems,
nPats = 2000.0,
method = rinterface.NULL,
lPats = 4.0,
corr = 0.5,
cmean = 0.5,
cvar = 0.1,
iWeight = rinterface.NULL,
verbose = False,
)

Args:
nItems : an integer. Number of items to simulate.
nTrans : an integer. Number of transactions to simulate.
method : name of the simulation method used (see Details Section).
... : further arguments used for the specific simulation method (see details).
verbose : report progress?
nPats : number of patterns (potential maximal frequent itemsets) used.
lPats : average length of patterns.
corr : correlation between consecutive patterns.
cmean : mean of the corruption level (normal distribution).
cvar : variance of the corruption level.
iWeight : item selection weights to build patterns.

details
-------
Currently two simulation methods are implemented:

"independent" (Hahsler et al, 2006): All items are treated as independent. The transaction size is determined by rpois(lambda - 1) + 1, where lambda can be specified (defaults to 3). Note that one is subtracted from lambda and added to the size to avoid empty transactions. The items in the transactions are randomly chosen using the numeric probability vector iProb of length nItems (default: 0.01 for each item).

"agrawal" (see Agrawal and Srikant, 1994): This method creates transactions with correlated items using random.patterns(). The simulation is a two-stage process. First, a set of nPats patterns (potential maximal frequent itemsets) is generated. The length of the patterns is Poisson distributed with mean lPats and consecutive patterns share some items controlled by the correlation parameter corr. For later use, for each pattern a pattern weight is generated by drawing from an exponential distribution with a mean of 1, and a corruption level is chosen from a normal distribution with mean cmean and variance cvar. The function returns the patterns as an itemsets object which can be supplied to random.transactions() as the argument patterns. If no argument patterns is supplied, the default values given above are used. In the second step, the transactions are generated using the patterns. The length of the transactions follows a Poisson distribution with mean lPats. For each transaction, patterns are randomly chosen using the pattern weights till the transaction length is reached. For each chosen pattern, the associated corruption level is used to drop some items before adding the pattern to the transaction.
its_r = R_arules.random_patterns(100, 10)
its_r
<rpy2.robjects.methods.RS4 object at 0x7f441886f600> [RTYPES.S4SXP] R classes: ('itemsets',)
Since we directly called an R function, we need to manually wrap the R object as a Python object before we use it in Python.
its_p = Itemsets(its_r)
its_p.as_df()
| items | pWeights | pCorrupts |
---|---|---|---|
1 | {item51,item53,item55,item59} | 0.016862 | 0.000000 |
2 | {item7,item10,item51,item62,item78} | 0.094877 | 0.479575 |
3 | {item62,item91} | 0.136921 | 0.030957 |
4 | {item53,item62,item76,item98} | 0.116791 | 0.770604 |
5 | {item53,item61,item74,item78,item93} | 0.184119 | 0.689259 |
6 | {item53,item61,item74,item93} | 0.261557 | 0.808408 |
7 | {item61,item93} | 0.019522 | 0.000000 |
8 | {item23,item79,item92} | 0.007628 | 0.860331 |
9 | {item23,item32,item62,item75,item82,item92} | 0.114453 | 0.892963 |
10 | {item62,item82} | 0.047270 | 0.856183 |
trans = arules2py(R_arules.random_transactions(10, 1000))
print(trans)
transactions in sparse format with 1000 transactions (rows) and 10 items (columns)
Access the sparse representation directly.
from scipy.sparse import csc_matrix
trans.items().as_csc_matrix()
<10x1000 sparse matrix of type '<class 'numpy.int64'>' with 2976 stored elements in Compressed Sparse Column format>
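As a final illustration (not part of the original document), summing the columns of this items-by-transactions matrix gives the number of items in each transaction:

sizes = trans.items().as_csc_matrix().sum(axis = 0)   # 1 x 1000 matrix of transaction sizes
sizes[0, :10]                                         # sizes of the first 10 transactions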