arulespy is a Python module available from PyPI. The arules module in arulespy provides an easy-to-install Python interface to the R package arules for association rule mining, built with rpy2.
The R arules package implements a comprehensive infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat, and optimized C/C++ code for mining and manipulating association rules using sparse matrix representation.
The arulesViz module provides plot() for visualizing association rules using the R package arulesViz.
arulespy provides Python classes for
Transactions: Convert pandas dataframes into transaction data
Rules: Association rules
Itemsets: Itemsets
ItemMatrix: Sparse matrix representation of sets of items
with Python-style slicing and len().
Most arules functions are interfaced as methods for the four classes with conversion from the R data structures to Python. Documentation is available in Python via help(). Detailed online documentation for the R package is available here.
Low-level arules functions can also be used directly in the form R.<arules R function>(). The result will be an rpy2 data type. Transactions, itemsets and rules can be converted manually to Python classes using the helper function a2p().
To cite the Python module ‘arulespy’ in publications use:
Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: 10.48550/arXiv.2305.15263
arulespy is based on the Python package rpy2, which requires an R installation. Here are the installation steps:
Install the latest version of R (>4.0) from https://www.r-project.org/
Install the libcurl development library if your system does not already provide it.
On Debian/Ubuntu: sudo apt-get install libcurl4-openssl-dev
On macOS (Homebrew): brew install curl
Install arulespy, which will automatically install rpy2 and pandas:
pip install arulespy
Optional: Set the environment variable R_LIBS_USER to decide where R packages are stored (see .libPaths() for details). If it is not set, R will determine a suitable location.
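The optional R_LIBS_USER step can also be done from Python, as long as it happens before arulespy is imported for the first time in the process. The following is a minimal sketch, not part of arulespy: the helper name `use_r_library` and the example path are hypothetical, and it assumes that the embedded R reads the variable from the process environment when it starts.

```python
import os
from pathlib import Path


def use_r_library(path):
    """Create `path` if needed and point R_LIBS_USER at it.

    Hypothetical helper: call it before the first import of arulespy,
    because R is assumed to read the variable only when it starts.
    """
    lib_dir = Path(path).expanduser()
    lib_dir.mkdir(parents=True, exist_ok=True)
    os.environ["R_LIBS_USER"] = str(lib_dir)
    return lib_dir
```

For example, calling `use_r_library("~/R/arulespy-library")` (an illustrative location) at the top of a script, followed by the arulespy imports, should keep the R packages in that directory.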
arulespy will install the needed R packages when it is imported for the first time. This may take a while. R packages can also be preinstalled. Start R and run
install.packages(c("arules", "arulesViz"))
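The same preinstallation can be scripted without opening an interactive R session. The sketch below is an assumption-laden illustration rather than something arulespy provides: the function names are hypothetical, it relies on the Rscript command-line tool that ships with R being on PATH, and it passes an explicit CRAN mirror (here the cloud mirror) because a non-interactive session cannot ask for one.

```python
import shutil
import subprocess


def build_install_command(packages, repos="https://cloud.r-project.org"):
    """Build the argv list for a non-interactive install.packages() call."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    expr = f'install.packages(c({quoted}), repos = "{repos}")'
    return ["Rscript", "-e", expr]


def preinstall(packages=("arules", "arulesViz")):
    """Run the install command; return False if Rscript is missing or fails."""
    if shutil.which("Rscript") is None:
        return False
    result = subprocess.run(build_install_command(packages))
    return result.returncode == 0


# Show the command that preinstall() would run, without running it
print(" ".join(build_install_command(["arules", "arulesViz"])))
```

Calling `preinstall()` runs the command and reports whether R exited successfully.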
The most likely issue is that rpy2 does not find R or R’s shared library. This will cause the Python kernel to die or exit without explanation when the package arulespy is imported. Run python -m rpy2.situation to see if R and R’s libraries are found. If you use IPython notebooks, you can include the following code block in your notebook to check:
from rpy2 import situation

for row in situation.iter_info():
    print(row)
The output should include a line saying Loading R library from rpy2: OK.
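If the check fails, it can help to confirm independently of rpy2 whether an R executable is reachable at all. The following sketch uses only the Python standard library; the function name `locate_r` is hypothetical. It looks for R on PATH, reads the R_HOME environment variable, and otherwise asks R for its home directory with the `R RHOME` command.

```python
import os
import shutil
import subprocess


def locate_r():
    """Return (r_executable, r_home); either may be None if not found."""
    r_exe = shutil.which("R")
    r_home = os.environ.get("R_HOME")
    if r_home is None and r_exe is not None:
        try:
            # `R RHOME` prints the R home directory and exits
            out = subprocess.run([r_exe, "RHOME"], capture_output=True,
                                 text=True, timeout=30, check=True)
            r_home = out.stdout.strip() or None
        except (OSError, subprocess.SubprocessError):
            r_home = None
    return r_exe, r_home


r_exe, r_home = locate_r()
print("R executable:", r_exe or "not found on PATH")
print("R home:", r_home or "unknown")
```

If neither value is found, the problem is likely the R installation or PATH rather than rpy2 itself.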
rpy2 currently does not fully support Windows and the installation is somewhat tricky. I was able to use it with the following setup:
I use the following code to set the environment variables needed by Windows before I import from arulespy:
import os
from rpy2 import situation

# locate R via the Windows registry and expose it to rpy2
r_home = situation.r_home_from_registry()
r_bin = r_home + '\\bin\\x64\\'
os.environ['R_HOME'] = r_home
os.environ['PATH'] = r_bin + ";" + os.environ['PATH']
os.add_dll_directory(r_bin)

# check that rpy2 now finds R
for row in situation.iter_info():
    print(row)
The output should include a line saying Loading R library from rpy2: OK.
More information on installing rpy2 can be found here.
from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True]
    ],
    columns=list('ABC'))

# convert dataframe to transactions
trans = Transactions.from_df(df)

# mine association rules
rules = apriori(trans,
                parameter=parameters({"supp": 0.1, "conf": 0.8}),
                control=parameters({"verbose": False}))

# display the rules as a pandas dataframe
rules.as_df()
|   | LHS | RHS | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|
| 1 | {} | {A} | 0.8 | 0.8 | 1 | 1 | 8 |
| 2 | {} | {C} | 0.8 | 0.8 | 1 | 1 | 8 |
| 3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
| 4 | {B} | {C} | 0.5 | 1 | 0.5 | 1.25 | 5 |
| 5 | {A,B} | {C} | 0.4 | 1 | 0.4 | 1.25 | 4 |
| 6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
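To read the columns of this table, it may help to see how the measures relate to one another. The sketch below uses the standard definitions of support, confidence, coverage and lift in plain Python; the function `rule_measures` is a hypothetical illustration and not part of arulespy. The inputs are taken from the table itself: rule 4 has count 5 and support 0.5, which implies 10 transactions in total, its coverage of 0.5 implies 5 transactions containing B, and rule 2 gives 8 transactions containing C.

```python
def rule_measures(count_rule, count_lhs, count_rhs, n):
    """Return the interest measures for a rule LHS => RHS.

    count_rule: transactions containing LHS and RHS together
    count_lhs:  transactions containing LHS
    count_rhs:  transactions containing RHS
    n:          total number of transactions
    """
    support = count_rule / n             # how often the whole rule occurs
    coverage = count_lhs / n             # how often the LHS occurs
    confidence = count_rule / count_lhs  # support divided by coverage
    lift = confidence / (count_rhs / n)  # confidence relative to RHS support
    return {"support": support, "confidence": confidence,
            "coverage": coverage, "lift": lift}


# Rule 4 in the table: {B} => {C}
measures = rule_measures(count_rule=5, count_lhs=5, count_rhs=8, n=10)
print(measures)
```

Up to floating-point rounding, the result agrees with the values listed for rule 4: support 0.5, confidence 1, coverage 0.5 and lift 1.25.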
Complete examples: