How to use the R package arules from Python using arulespy¶

This document is also available as an IPython notebook or you can open and run it directly in Google Colab.

Installation¶

The package can be installed using pip via the terminal

pip install arulespy

Or it can be installed with the following magic command (note: use %conda if you use conda)

In [ ]:
%pip install arulespy
Requirement already satisfied: arulespy in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (0.1.4)
Requirement already satisfied: pandas>1.5.3 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (2.1.0)
Requirement already satisfied: numpy>=1.14.2 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (1.25.2)
Requirement already satisfied: scipy>=1.10.1 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (1.11.2)
Requirement already satisfied: rpy2>=3.5.11 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from arulespy) (3.5.14)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from pandas>1.5.3->arulespy) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from pandas>1.5.3->arulespy) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from pandas>1.5.3->arulespy) (2023.3)
Requirement already satisfied: cffi>=1.10.0 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from rpy2>=3.5.11->arulespy) (1.15.1)
Requirement already satisfied: jinja2 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from rpy2>=3.5.11->arulespy) (3.1.2)
Requirement already satisfied: tzlocal in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from rpy2>=3.5.11->arulespy) (5.0.1)
Requirement already satisfied: pycparser in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from cffi>=1.10.0->rpy2>=3.5.11->arulespy) (2.21)
Requirement already satisfied: six>=1.5 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas>1.5.3->arulespy) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/hahsler/baR/arulespy/.venv/lib/python3.10/site-packages (from jinja2->rpy2>=3.5.11->arulespy) (2.1.3)
Note: you may need to restart the kernel to use updated packages.

The code below may be needed for Windows users.

In [ ]:
## Windows users: These environment variables may be necessary until rpy2 sets them automatically
#from rpy2 import situation
#import os
#
#r_home = situation.r_home_from_registry()
#r_bin = r_home + '\\bin\\x64\\'
#os.environ['R_HOME'] = r_home
#os.environ['PATH'] =  r_bin + ";" + os.environ['PATH']
#os.add_dll_directory(r_bin)

Basic Usage¶

Import the arules module from the arulespy package. This will take a while the first time you run it, since all the needed R packages have to be installed.

In [ ]:
from arulespy.arules import Transactions, apriori, parameters, concat

Creating transaction data¶

The data needs to be prepared as a pandas dataframe. Here we have 10 transactions with three items called A, B, and C. True means that a transaction contains the item.

In [ ]:
import pandas as pd

df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))

df
Out[ ]:
A B C
0 True True True
1 True False False
2 True True True
3 True False False
4 True True True
5 True False True
6 True True True
7 False False True
8 False True True
9 True False True

Convert the pandas dataframe into a sparse transactions object.

In [ ]:
trans = Transactions.from_df(df)
print(trans)

trans.as_df()
transactions in sparse format with
 10 transactions (rows) and
 3 items (columns)

Out[ ]:
items transactionID
1 {A,B,C} 0
2 {A} 1
3 {A,B,C} 2
4 {A} 3
5 {A,B,C} 4
6 {A,C} 5
7 {A,B,C} 6
8 {C} 7
9 {B,C} 8
10 {A,C} 9
In [ ]:
trans.itemLabels()
Out[ ]:
['A', 'B', 'C']

Working with transactions¶

We can calculate item frequencies, sample transactions or remove duplicate transactions. All available functions can be found at the end of this document.

In [ ]:
trans.itemFrequency(type = 'relative')
Out[ ]:
[0.8, 0.5, 0.8]
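
Since True is treated as 1, the relative item frequencies can be cross-checked directly on the original dataframe with plain pandas (a sanity check, independent of arulespy):

```python
import pandas as pd

# The toy transaction data from above as a 0-1 dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))

# Column means of a boolean 0-1 matrix are the relative item frequencies.
print(df.mean().tolist())  # [0.8, 0.5, 0.8]
```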
In [ ]:
trans.sample(3).as_df()
Out[ ]:
items transactionID
8 {C} 7
10 {A,C} 9
9 {B,C} 8
In [ ]:
trans.unique().as_df()
Out[ ]:
items transactionID
1 {A,B,C} 0
2 {A} 1
6 {A,C} 5
8 {C} 7
9 {B,C} 8

Create new data from a pandas dataframe that uses the same item encoding as an existing transaction set. Note that the following dataframe has the columns (items) in reverse order, which is corrected when the item encoding of trans is used.

In [ ]:
trans2 = Transactions.from_df(pd.DataFrame(
    [
        [True, True, False],
        [False, False, True],
    ],
    columns=list('CBA')), trans)

trans2.as_df()
Out[ ]:
items transactionID
1 {B,C} 0
2 {A} 1

Transactions can also be created from a list of lists. Note that the order of the items is fixed to match trans.

In [ ]:
trans3 = Transactions.from_list([['B', 'A'],
                        ['C']], 
                        trans)

trans3.as_df()
Out[ ]:
items
1 {A,B}
2 {C}

Add the new transactions to the existing transactions.

In [ ]:
concat([trans, trans2]).as_df()
Out[ ]:
items transactionID
1 {A,B,C} 0
2 {A} 1
3 {A,B,C} 2
4 {A} 3
5 {A,B,C} 4
6 {A,C} 5
7 {A,B,C} 6
8 {C} 7
9 {B,C} 8
10 {A,C} 9
11 {B,C} 0
12 {A} 1

Converting transactions into Python data structures¶

Transactions can be converted into several Python formats including 0-1 matrices, lists of item labels, lists of item indices (1-based, following R's convention), or a sparse matrix.

In [ ]:
trans.as_matrix()
Out[ ]:
array([[1, 1, 1],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [0, 1, 1],
       [1, 0, 1]], dtype=int32)
In [ ]:
trans.as_list()
Out[ ]:
[['A', 'B', 'C'],
 ['A'],
 ['A', 'B', 'C'],
 ['A'],
 ['A', 'B', 'C'],
 ['A', 'C'],
 ['A', 'B', 'C'],
 ['C'],
 ['B', 'C'],
 ['A', 'C']]
In [ ]:
trans.as_int_list()
Out[ ]:
[[1, 2, 3],
 [1],
 [1, 2, 3],
 [1],
 [1, 2, 3],
 [1, 3],
 [1, 2, 3],
 [3],
 [2, 3],
 [1, 3]]
In [ ]:
trans.as_csc_matrix()
Out[ ]:
<3x10 sparse matrix of type '<class 'numpy.int64'>'
	with 21 stored elements in Compressed Sparse Column format>
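
Note that the sparse matrix is stored items × transactions (here 3×10), i.e., the transpose of what as_matrix() returns. The same matrix can be built by hand with scipy from the dense 0-1 matrix above (a sketch for illustration):

```python
import numpy as np
from scipy.sparse import csc_matrix

# Dense 0-1 matrix from as_matrix(): 10 transactions x 3 items
dense = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1],
                  [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1], [1, 0, 1]])

# arules stores transactions column-wise, so transpose to items x transactions.
sparse = csc_matrix(dense.T)
print(sparse.shape, sparse.nnz)  # (3, 10) 21
```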

Mixing nominal and numeric variables¶

Dataframes that mix nominal and numeric variables can also be converted. Nominal variables are converted into items of the form variable=value, and numeric variables are first discretized (see arules.discretizeDF()).

In [ ]:
df2 = pd.DataFrame(
    [
        ['red',   12, True],
        ['blue',  10, False],
        ['red',   18, True],
        ['green', 18, False],
        ['red',   16, True],
        ['blue',   9, False]
    ],
    columns=['color', 'size', 'class'])

trans2 = Transactions.from_df(df2)
trans2.as_df()
Out[ ]:
items transactionID
1 {color=red,size=[11.3,16.7),class} 0
2 {color=blue,size=[9,11.3)} 1
3 {color=red,size=[16.7,18],class} 2
4 {color=green,size=[16.7,18]} 3
5 {color=red,size=[11.3,16.7),class} 4
6 {color=blue,size=[9,11.3)} 5

Details on item label creation can be retrieved using arules.itemInfo().

In [ ]:
trans2.itemInfo()
R[write to console]: In addition: 
R[write to console]: Warning message:

R[write to console]: Column(s) 1, 2 not logical or factor. Applying default discretization (see '? discretizeDF'). 

Out[ ]:
labels variables levels
1 color=blue color blue
2 color=green color green
3 color=red color red
4 size=[9,11.3) size [9,11.3)
5 size=[11.3,16.7) size [11.3,16.7)
6 size=[16.7,18] size [16.7,18]
7 class class TRUE
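
The default discretization appears to be equal-frequency binning into three intervals: the break points 11.3 and 16.7 in the item labels above match the tertiles of the size column. A rough pandas analogue using qcut (an approximation for illustration, not the exact arules code path):

```python
import pandas as pd

size = pd.Series([12, 10, 18, 18, 16, 9])

# Quantile-based (equal-frequency) binning into 3 intervals
bins = pd.qcut(size, 3)
print(bins.cat.categories)
# The upper break points fall at about 11.33 and 16.67,
# matching the size=[...] item labels produced above.
```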

Mine association rules¶

arules.apriori() calls the Apriori algorithm and converts the result into a Python arulespy.arules.Rules object. Parameters for the algorithm are specified as a dict wrapped in the parameters() function.

In [ ]:
rules = apriori(trans,
                    parameter = parameters({"supp": 0.1, "conf": 0.8}), 
                    control = parameters({"verbose": False}))  


rules.as_df()
Out[ ]:
LHS RHS support confidence coverage lift count
1 {} {A} 0.8 0.8 1.0 1.00 8
2 {} {C} 0.8 0.8 1.0 1.00 8
3 {B} {A} 0.4 0.8 0.5 1.00 4
4 {B} {C} 0.5 1.0 0.5 1.25 5
5 {A,B} {C} 0.4 1.0 0.4 1.25 4
6 {B,C} {A} 0.4 0.8 0.5 1.00 4
In [ ]:
rules.quality()
Out[ ]:
support confidence coverage lift count
1 0.8 0.8 1.0 1.00 8
2 0.8 0.8 1.0 1.00 8
3 0.4 0.8 0.5 1.00 4
4 0.5 1.0 0.5 1.25 5
5 0.4 1.0 0.4 1.25 4
6 0.4 0.8 0.5 1.00 4
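
The quality measures can be verified by hand on the 0-1 matrix. For a rule {B} => {C}: support = P(B,C), confidence = P(B,C)/P(B), and lift = confidence/P(C). A quick check with plain pandas on the toy data from above:

```python
import pandas as pd

# The toy transaction data from above as a 0-1 dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))

p_B = df['B'].mean()               # coverage of the rule: P(B) = 0.5
p_C = df['C'].mean()               # P(C) = 0.8
p_BC = (df['B'] & df['C']).mean()  # support: P(B and C) = 0.5

confidence = p_BC / p_B            # 1.0
lift = confidence / p_C            # 1.25
print(p_BC, confidence, lift)
```

These values match row 4 ({B} => {C}) of the quality table above.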

Python-style len() and slicing are available.

In [ ]:
len(rules)
Out[ ]:
6
In [ ]:
rules[0:3].as_df()
Out[ ]:
LHS RHS support confidence coverage lift count
1 {} {A} 0.8 0.8 1.0 1.0 8
2 {} {C} 0.8 0.8 1.0 1.0 8
3 {B} {A} 0.4 0.8 0.5 1.0 4
In [ ]:
rules[[True, False, True, False, True, False]].as_df()
Out[ ]:
LHS RHS support confidence coverage lift count
1 {} {A} 0.8 0.8 1.0 1.00 8
3 {B} {A} 0.4 0.8 0.5 1.00 4
5 {A,B} {C} 0.4 1.0 0.4 1.25 4

Accessing Rules¶

The rules object can be converted into various Python data structures.

In [ ]:
rules.labels()
Out[ ]:
['{} => {A}',
 '{} => {C}',
 '{B} => {A}',
 '{B} => {C}',
 '{A,B} => {C}',
 '{B,C} => {A}']
In [ ]:
rules.items().as_df()
Out[ ]:
items
1 {A}
2 {C}
3 {A,B}
4 {B,C}
5 {A,B,C}
6 {A,B,C}
In [ ]:
rules.lhs().as_df()
Out[ ]:
items
1 {}
2 {}
3 {B}
4 {B}
5 {A,B}
6 {B,C}
In [ ]:
rules.lhs().as_list()
Out[ ]:
[[], [], ['B'], ['B'], ['A', 'B'], ['B', 'C']]
In [ ]:
rules.rhs().as_df()
Out[ ]:
items
1 {A}
2 {C}
3 {A}
4 {C}
5 {C}
6 {A}

The LHS and RHS of rules are of type itemMatrix, the same type as transactions. Therefore, all conversions (to lists, sparse matrices, etc.) are also available.

In [ ]:
rules.sort(by = 'lift').as_df()
Out[ ]:
LHS RHS support confidence coverage lift count
4 {B} {C} 0.5 1.0 0.5 1.25 5
5 {A,B} {C} 0.4 1.0 0.4 1.25 4
1 {} {A} 0.8 0.8 1.0 1.00 8
2 {} {C} 0.8 0.8 1.0 1.00 8
3 {B} {A} 0.4 0.8 0.5 1.00 4
6 {B,C} {A} 0.4 0.8 0.5 1.00 4

Work With Interest Measures¶

Interest measures are stored as the quality attribute in rules and itemsets.

In [ ]:
rules.quality()
Out[ ]:
support confidence coverage lift count
1 0.8 0.8 1.0 1.00 8
2 0.8 0.8 1.0 1.00 8
3 0.4 0.8 0.5 1.00 4
4 0.5 1.0 0.5 1.25 5
5 0.4 1.0 0.4 1.25 4
6 0.4 0.8 0.5 1.00 4

Additional interest measures can be calculated with interestMeasure() and added to rules or itemsets using addQuality(). See the arules documentation for a list of all available measures. To calculate some measures, transactions need to be specified.

In [ ]:
im = rules.interestMeasure(["phi", 'support'])
im
Out[ ]:
phi support
1 NaN 0.8
2 NaN 0.8
3 0.000000 0.4
4 0.500000 0.5
5 0.408248 0.4
6 0.000000 0.4
In [ ]:
rules.addQuality(im)
rules.as_df()
Out[ ]:
LHS RHS support confidence coverage lift count phi
1 {} {A} 0.8 0.8 1.0 1.00 8 NaN
2 {} {C} 0.8 0.8 1.0 1.00 8 NaN
3 {B} {A} 0.4 0.8 0.5 1.00 4 0.000000
4 {B} {C} 0.5 1.0 0.5 1.25 5 0.500000
5 {A,B} {C} 0.4 1.0 0.4 1.25 4 0.408248
6 {B,C} {A} 0.4 0.8 0.5 1.00 4 0.000000

Filter Redundant Rules¶

In [ ]:
rules[[not x for x in rules.is_redundant()]].as_df()
Out[ ]:
LHS RHS support confidence coverage lift count phi
1 {} {A} 0.8 0.8 1.0 1.00 8 NaN
2 {} {C} 0.8 0.8 1.0 1.00 8 NaN
4 {B} {C} 0.5 1.0 0.5 1.25 5 0.5
In [ ]:
rules.is_redundant()
Out[ ]:
[False, False, True, False, True, True]
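
A rule is redundant if a more general rule (a proper subset of its LHS with the same RHS) has at least the same confidence. This can be sketched in plain Python on the six rules mined above (hand-coded for illustration; arules uses its own optimized implementation):

```python
# (lhs, rhs, confidence) for the six rules mined above
rules = [
    (frozenset(),     'A', 0.8),  # {} => {A}
    (frozenset(),     'C', 0.8),  # {} => {C}
    (frozenset('B'),  'A', 0.8),  # {B} => {A}
    (frozenset('B'),  'C', 1.0),  # {B} => {C}
    (frozenset('AB'), 'C', 1.0),  # {A,B} => {C}
    (frozenset('BC'), 'A', 0.8),  # {B,C} => {A}
]

def is_redundant(rule, rules):
    lhs, rhs, conf = rule
    # Redundant if a strictly more general rule is at least as confident.
    return any(l < lhs and r == rhs and c >= conf for l, r, c in rules)

print([is_redundant(r, rules) for r in rules])
# [False, False, True, False, True, True]
```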

Find maximal rules.

In [ ]:
rules.is_maximal()
Out[ ]:
[False, False, False, False, True, True]
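
One way to read maximality, consistent with the output above: a rule is maximal if no other rule in the set uses a proper superset of its items (LHS ∪ RHS). A hand-coded sketch for illustration only:

```python
# LHS union RHS for the six rules mined above
itemsets = [
    frozenset('A'),    # {} => {A}
    frozenset('C'),    # {} => {C}
    frozenset('AB'),   # {B} => {A}
    frozenset('BC'),   # {B} => {C}
    frozenset('ABC'),  # {A,B} => {C}
    frozenset('ABC'),  # {B,C} => {A}
]

def is_maximal(s, itemsets):
    # Maximal if no other itemset in the set is a proper superset.
    return not any(s < other for other in itemsets)

print([is_maximal(s, itemsets) for s in itemsets])
# [False, False, False, False, True, True]
```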

Create Rules Objects¶

To import rules from other tools or to create rules manually, rules for arules can be created from lists of itemsets. The item labels (i.e., the sparse representation) are taken from the transactions trans.

The LHS and RHS of rules are of type itemMatrix and can be created by conversion from pandas dataframes or lists of lists.

In [ ]:
import rpy2.robjects as ro
from arulespy.arules import Rules, ItemMatrix

trans = Transactions.from_df(pd.read_csv("https://mhahsler.github.io/arulespy/examples/Zoo.csv"))


lhs = [
    ['hair', 'milk', 'predator'],
    ['hair', 'tail', 'predator'],
    ['fins']
]
rhs = [
    ['type=mammal'],
    ['type=mammal'],
    ['type=fish']
]

r = Rules.new(ItemMatrix.from_list(lhs, itemLabels = trans), 
              ItemMatrix.from_list(rhs, itemLabels = trans))

r.as_df()
Out[ ]:
LHS RHS
1 {hair,milk,predator} {type=mammal}
2 {hair,predator,tail} {type=mammal}
3 {fins} {type=fish}

Next, we add interest measures calculated on the transactions.

In [ ]:
r.addQuality(r.interestMeasure(['support', 'confidence', 'lift'], trans))

r.as_df().round(2)
R[write to console]: In addition: 
R[write to console]: Warning message:

R[write to console]: Column(s) 13, 17 not logical or factor. Applying default discretization (see '? discretizeDF'). 

Out[ ]:
LHS RHS support confidence lift
1 {hair,milk,predator} {type=mammal} 0.20 1.00 2.46
2 {hair,predator,tail} {type=mammal} 0.16 1.00 2.46
3 {fins} {type=fish} 0.13 0.76 5.94

Find Super and Subsets¶

Subset calculation returns a large binary matrix. Since this matrix is often sparse, it is represented as a sparse matrix. For example, is_superset() can be used to check which transactions contain all the items in the LHS of the rules. The result is a sparse matrix with one row per transaction and one column per rule.

In [ ]:
superset = trans.is_superset(r.lhs(), sparse = True)
superset
Out[ ]:
<101x3 sparse matrix of type '<class 'numpy.int64'>'
	with 53 stored elements in Compressed Sparse Column format>
In [ ]:
superset[0:1, ].toarray()
Out[ ]:
array([[1, 0, 0]])

Show the first row as a dense vector. Transaction 1 is a superset of the LHS of the first rule; that is, transaction 1 contains all the items in the LHS of rule 1.

In [ ]:
print("Transaction 1:", trans[0:1].as_list(), "\n")

print("Rule 1:\n", r[0:1].as_df())
Transaction 1: [['hair', 'milk', 'predator', 'toothed', 'backbone', 'breathes', 'legs=[4,8]', 'catsize', 'type=mammal']] 

Rule 1:
                     LHS            RHS  support  confidence      lift
1  {hair,milk,predator}  {type=mammal}  0.19802         1.0  2.463415

This information can be used to find the LHS support count for the three rules by summing along the columns.

In [ ]:
superset.sum(axis = 0)
Out[ ]:
matrix([[20, 16, 17]])
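
The superset check itself can be expressed as a matrix operation: transaction i contains LHS j exactly when the dot product of the transaction row with the LHS indicator row equals the LHS size. A sketch on the toy data and rules from the Basic Usage section (the Zoo-based numbers above come from arules itself):

```python
import numpy as np

# 0-1 matrix of the 10 toy transactions (columns: A, B, C)
T = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1],
              [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1], [1, 0, 1]])

# LHS indicator matrix for the six toy rules: {}, {}, {B}, {B}, {A,B}, {B,C}
L = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0],
              [0, 1, 0], [1, 1, 0], [0, 1, 1]])

# Transaction i is a superset of LHS j iff it contains every LHS item.
superset = (T @ L.T) == L.sum(axis=1)

# Column sums give the LHS support counts (coverage x number of transactions).
print(superset.sum(axis=0))  # counts: 10, 10, 5, 5, 4, 5
```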

Online Help for Functions Available via arulespy¶

In [ ]:
help(apriori)
Help on function wrapper in module arulespy.arules:

wrapper(*args, **kwargs)
    Wrapper around an R function.
    
    The docstring below is built from the R documentation.
    
    description
    -----------
    
    
     Mine frequent itemsets, association rules or association hyperedges using
     the Apriori algorithm.
     
    
    
    apriori(
        data,
        parameter = rinterface.NULL,
        appearance = rinterface.NULL,
        control = rinterface.NULL,
        ___ = (was "..."). R ellipsis (any number of parameters),
    )
    
    Args:
       data :  object of class transactions. Any data structure which can be
      coerced into transactions (e.g., a binary matrix, a
      data.frame or a tibble) can also be specified and will be
      internally coerced to transactions.
    
       parameter :  object of class APparameter or named list.  The default
      behavior is to mine rules with minimum support of 0.1,
      minimum confidence of 0.8, maximum of 10 items (maxlen), and
      a maximal time for subset checking of 5 seconds (‘maxtime’).
    
       appearance :  object of class APappearance or named list.  With this
      argument item appearance can be restricted (implements rule
      templates).  By default all items can appear unrestricted.
    
       control :  object of class APcontrol or named list. Controls the
      algorithmic performance of the mining algorithm (item
      sorting, report progress (verbose), etc.)
    
       ... :  Additional arguments are for convenience added to the
      parameter list.
    
    details
    -------
    
    
     The Apriori algorithm (Agrawal et al, 1993) employs level-wise search for
     frequent itemsets. The used C implementation of Apriori by Christian
     Borgelt (2003) includes some improvements (e.g., a prefix tree and item sorting).
     
     Warning about automatic conversion of matrices or data.frames to transactions. 
     It is preferred to create transactions manually before
     calling  apriori()  to have control over item coding. This is especially
     important when you are working with multiple datasets or several subsets of
     the same dataset. To read about item coding, see  itemCoding .
     
     If a data.frame is specified as  x , then the data is automatically
     converted into transactions by discretizing numeric data using
     discretizeDF()  and then coercion to transactions. The discretization
     may fail if the data is not well behaved.
     
     Apriori only creates rules with one item in the RHS (Consequent). 
     The default value in  APparameter  for  minlen  is 1.
     This meains that rules with only one item (i.e., an empty antecedent/LHS)
     like
     
     \{\} => \{beer\} {} => {beer} 
     
     will be created.  These rules mean that no matter what other items are
     involved, the item in the RHS will appear with the probability given by the
     rule's confidence (which equals the support).  If you want to avoid these
     rules then use the argument  parameter = list(minlen = 2) .
     
     Notes on run time and memory usage: 
     If the minimum  support  is
     chosen too low for the dataset, then the algorithm will try to create an
     extremely large set of itemsets/rules. This will result in very long run
     time and eventually the process will run out of memory.  To prevent this,
     the default maximal length of itemsets/rules is restricted to 10 items (via
     the parameter element  maxlen = 10 ) and the time for checking subsets is
     limited to 5 seconds (via  maxtime = 5 ). The output will show if you hit
     these limits in the "checking subsets" line of the output. The time limit is
     only checked when the subset size increases, so it may run significantly
     longer than what you specify in maxtime.  Setting  maxtime = 0  disables
     the time limit.
     
     Interrupting execution with  Control-C/Esc  is not recommended.  Memory
     cleanup will be prevented resulting in a memory leak. Also, interrupts are
     only checked when the subset size increases, so it may take some time till
     the execution actually stops.

Low-level R arules interface¶

arules functions can also be called directly using R_arules.<arules R function>() and R_arulesViz.<arulesViz R function>(). The result will be an rpy2 data type. Transactions, itemsets, and rules can be manually converted to Python classes using arules2py() or the corresponding class constructors (e.g., Itemsets).

In [ ]:
from arulespy.arules import R_arules, Itemsets, arules2py
In [ ]:
help(R_arules.random_patterns)
Help on DocumentedSTFunction in module rpy2.robjects.functions:

<rpy2.robjects.functions.DocumentedSTFunction ob...ebe43c0> [RTYPES.CLOSXP]
R classes: ('function',)
    Wrapper around an R function.
    
    The docstring below is built from the R documentation.
    
    description
    -----------
    
    
     Simulate random  transactions  using different methods.
     
    
    
    random.patterns(
        nItems,
        nPats = 2000.0,
        method = rinterface.NULL,
        lPats = 4.0,
        corr = 0.5,
        cmean = 0.5,
        cvar = 0.1,
        iWeight = rinterface.NULL,
        verbose = False,
    )
    
    Args:
       nItems :  an integer. Number of items to simulate
    
       nTrans :  an integer. Number of transactions to simulate
    
       method :  name of the simulation method used (see Details Section).
    
       ... :  further arguments used for the specific simulation method
      (see details).
    
       verbose :  report progress?
    
       nPats :  number of patterns (potential maximal frequent itemsets)
      used.
    
       lPats :  average length of patterns.
    
       corr :  correlation between consecutive patterns.
    
       cmean :  mean of the corruption level (normal distribution).
    
       cvar :  variance of the corruption level.
    
       iWeight :  item selection weights to build patterns.
    
    details
    -------
    
    
     Currently two simulation methods are implemented:
     
       "independent"  (Hahsler et al, 2006): All items
     are treated as independent. The transaction size is determined by
     rpois(lambda - 1) + 1 , where  lambda  can be specified (defaults to 3).
     Note that one subtracted from lambda and added to the size to avoid
     empty transactions. The items in the transactions are randomly chosen using
     the numeric probability vector  iProb  of length  nItems 
     (default: 0.01 for each item).
       "agrawal"  (see Agrawal and Srikant, 1994): This
     method creates transactions with correlated items using  random.patters() .
     The simulation is a two-stage process. First, a set of  nPats  patterns
     (potential maximal frequent itemsets) is generated.  The length of the
     patterns is Poisson distributed with mean  lPats  and consecutive
     patterns share some items controlled by the correlation parameter
     corr .  For later use, for each pattern a pattern weight is generated
     by drawing from an exponential distribution with a mean of 1 and a
     corruption level is chosen from a normal distribution with mean  cmean 
     and variance  cvar .
     The function returns the patterns as an  itemsets  objects which can be
     supplied to  random.transactions()  as the argument  patterns .  If
     no argument  patterns  is supplied, the default values given above are
     used.
     
     In the second step, the transactions are generated using the patterns.  The
     length the transactions follows a Poisson distribution with mean
     lPats . For each transaction, patterns are randomly chosen using the
     pattern weights till the transaction length is reached. For each chosen
     pattern, the associated corruption level is used to drop some items before
     adding the pattern to the transaction.

In [ ]:
its_r = R_arules.random_patterns(100, 10)
its_r
Out[ ]:
<rpy2.robjects.methods.RS4 object at 0x7f441886f600> [RTYPES.S4SXP]
R classes: ('itemsets',)

Since we called an R function directly, we need to manually wrap the R object as a Python object before using it in Python.

In [ ]:
its_p = Itemsets(its_r)
its_p.as_df()
Out[ ]:
items pWeights pCorrupts
1 {item51,item53,item55,item59} 0.016862 0.000000
2 {item7,item10,item51,item62,item78} 0.094877 0.479575
3 {item62,item91} 0.136921 0.030957
4 {item53,item62,item76,item98} 0.116791 0.770604
5 {item53,item61,item74,item78,item93} 0.184119 0.689259
6 {item53,item61,item74,item93} 0.261557 0.808408
7 {item61,item93} 0.019522 0.000000
8 {item23,item79,item92} 0.007628 0.860331
9 {item23,item32,item62,item75,item82,item92} 0.114453 0.892963
10 {item62,item82} 0.047270 0.856183
In [ ]:
trans = arules2py(R_arules.random_transactions(10, 1000))

print(trans)
transactions in sparse format with
 1000 transactions (rows) and
 10 items (columns)

Directly access the sparse representation.

In [ ]:
from scipy.sparse import csc_matrix

trans.items().as_csc_matrix()
Out[ ]:
<10x1000 sparse matrix of type '<class 'numpy.int64'>'
	with 2976 stored elements in Compressed Sparse Column format>