[1] |
Ron Kohavi and Foster Provost.
Glossary of terms.
Machine Learning, 30(2--3):271--274, 1998.
[ bib ]
Definitions of measures from machine learning. These are especially interesting for comparison and evaluation of association rule algorithms. Keywords: evaluation |
[2] |
M. Sebag and M. Schoenauer.
Generation of rules with certainty and confidence factors from
incomplete and incoherent learning bases.
In Proceedings of the European Knowledge Acquisition Workshop
(EKAW'88), Gesellschaft fuer Mathematik und Datenverarbeitung mbH, 1988.
[ bib ]
Defines the Sebag measure for rules. Keywords: measure |
[3] |
G. Piatetsky-Shapiro.
Discovery, analysis, and presentation of strong rules.
In G. Piatetsky-Shapiro and W.J. Frawley, editors, Knowledge
Discovery in Databases. AAAI/MIT Press, Cambridge, MA, 1991.
[ bib ]
Introduces the measure LEVERAGE which is the simplest function which satisfies his principles for rule-interest functions (0 if the variables are statistically independent; monotonically increasing if the variables occur more often together; monotonically decreasing if one of the variables alone occurs more often). Keywords: kdd, measure |
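As a rough illustration of the first principle (zero under independence), leverage can be computed as supp(X and Y) - supp(X) * supp(Y). The sketch below assumes this common formulation; the function names and the toy transactions are hypothetical and not taken from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def leverage(lhs, rhs, transactions):
    """leverage(X -> Y) = supp(X and Y) - supp(X) * supp(Y)."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint - support(lhs, transactions) * support(rhs, transactions)


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

# Positive value: butter and bread co-occur more often than independence predicts.
print(leverage({"butter"}, {"bread"}, transactions))
# Negative value: bread and milk co-occur slightly less often than expected.
print(leverage({"bread"}, {"milk"}, transactions))
```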
[4] |
Padhraic Smyth and R. Goodman.
Rule induction using information theory.
In Knowledge Discovery in Databases, 1991.
[ bib ]
Introduces the J-Measure as a scaled measure of cross-entropy for the information content of a rule. Keywords: measure |
[5] |
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami.
Database mining: A performance perspective.
IEEE Transactions on Knowledge and Data Engineering,
5(6):914--925, 1993.
[ bib |
DOI ]
Places association rule mining together with classification and sequence mining into the context of rule discovery in database mining. The authors present basic operations and an algorithm to discover classification rules. For the evaluation they generate artificial survey data using different classification functions. Keywords: evaluation |
[6] |
R. Agrawal, T. Imielinski, and A. Swami.
Mining association rules between sets of items in large databases.
In Proceedings of the ACM SIGMOD International Conference on
Management of Data, pages 207--216, Washington D.C., May 1993.
[ bib |
DOI ]
Introduces association rules and the SUPPORT-CONFIDENCE framework and an algorithm to mine large itemsets. The algorithm is sometimes called AIS after the authors' initials. Keywords: algorithm |
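A minimal sketch of the two measures behind the support-confidence framework, assuming the usual definitions (support as the fraction of transactions containing all items of the rule, confidence as the fraction of transactions containing the antecedent that also contain the consequent). The helper names and the toy data are hypothetical.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def rule_support(lhs, rhs, transactions):
    """supp(X -> Y): fraction of transactions containing X and Y together."""
    return count(set(lhs) | set(rhs), transactions) / len(transactions)


def rule_confidence(lhs, rhs, transactions):
    """conf(X -> Y) = count(X and Y) / count(X); assumes X occurs at least once."""
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

print(rule_support({"bread"}, {"milk"}, transactions))
print(rule_confidence({"bread"}, {"milk"}, transactions))
```

A rule would be reported only if both values reach user-chosen minimum thresholds.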
[7] |
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, and
A. Inkeri Verkamo.
Finding interesting rules from large sets of discovered association
rules.
In Nabil R. Adam, Bharat K. Bhargava, and Yelena Yesha, editors,
Third International Conference on Information and Knowledge Management
(CIKM'94), pages 401--407. ACM Press, 1994.
[ bib |
DOI ]
Introduces the use of rule templates. Keywords: constraint |
[8] |
Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo.
Efficient algorithms for discovering association rules.
In Usama M. Fayyad and Ramasamy Uthurusamy, editors, AAAI
Workshop on Knowledge Discovery in Databases (KDD-94), pages 181--192,
Seattle, Washington, 1994. AAAI Press.
[ bib ]
Develops improvements to candidate generation similar to those of APRIORI. Itemsets with at least minimum support are called covering sets. The paper also introduces sampling from the database and gives bounds for the resulting estimate of support. Keywords: algorithm,sampling |
[9] |
Rakesh Agrawal and Ramakrishnan Srikant.
Fast algorithms for mining association rules in large databases.
In Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors,
Proceedings of the 20th International Conference on Very Large Data Bases,
VLDB, pages 487--499, Santiago, Chile, September 1994.
[ bib ]
Introduction of the APRIORI algorithm (the best-known algorithm; it uses a breadth-first search strategy to count the support of itemsets). The algorithm uses an improved candidate generation function which exploits the downward closure property of support and makes it more efficient than AIS. An algorithm to generate synthetic transaction data is also presented. Such synthetic transaction data are widely used for the evaluation and comparison of new algorithms. Keywords: algorithm, evaluation |
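The level-wise search with downward-closure pruning can be sketched as follows. This is a simplified illustration only: it recounts support by scanning the transaction list and does not reproduce the data structures of the paper. The names `apriori` and `keep_frequent` and the toy data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (breadth-first)."""
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        counted = {c: supp(c) for c in candidates}
        return {c: s for c, s in counted.items() if s >= min_support}

    items = {item for t in transactions for item in t}
    level = keep_frequent(frozenset([item]) for item in items)
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

for itemset, s in sorted(apriori(transactions, 0.4).items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), s)
```

Because "beer" alone is infrequent at this threshold, no candidate containing it is ever counted, which is the saving that downward closure provides.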
[10] |
Rakesh Agrawal and Ramakrishnan Srikant.
Mining sequential patterns.
In Philip S. Yu and Arbee S. P. Chen, editors, Eleventh
International Conference on Data Engineering, pages 3--14, Taipei, Taiwan,
1995. IEEE Computer Society Press.
[ bib ]
Introduces mining sequential patterns. A sequential pattern is a maximal sequence that exceeds minimum support (a minimum number of customers). The algorithms AprioriSome and AprioriAll (based on Apriori) are presented. Keywords: sequential |
[11] |
Ashok Savasere, Edward Omiecinski, and Shamkant Navathe.
An efficient algorithm for mining association rules in large
databases.
In Proceedings of the 21st VLDB Conference, pages 432--443,
Zurich, Switzerland, 1995.
[ bib ]
Introduction of the PARTITION algorithm. The database is scanned only twice. For the first scan the DB is partitioned and in each partition support is counted. Then the counts are merged to generate potential large itemsets. In the second scan the potential large itemsets are counted to find the actual large itemsets. Keywords: algorithm |
[12] |
Ramakrishnan Srikant and Rakesh Agrawal.
Mining generalized association rules.
In Proceedings of the 21st VLDB Conference, Zurich,
Switzerland, 1995.
[ bib |
DOI ]
Generalized association rules use a taxonomy (is-a hierarchy) on items. The paper introduces R-interesting rules as rules with a support which is R times higher than the support of their closest ancestor (a rule with at least one item generalized). Algorithms that use R-interestingness in addition to support and confidence are presented and evaluated. Keywords: generalized |
[13] |
Jean-Marc Bernard and Camilo Charron.
L'analyse implicative bayésienne : une méthode pour l'étude des
dépendances orientées. 2. modele logique sur un tableau de contingence.
Mathématiques et sciences humaines, 134:5--18, 1996.
[ bib |
DOI ]
Introduces Varying Rates Liaison. Keywords: measure |
[14] |
Usama M. Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth.
From Data Mining to Knowledge Discovery: An Overview, pages
1--36.
MIT Press, Cambridge, MA, 1996.
[ bib ]
Introduction to the KDD process. Keywords: kdd |
[15] |
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, and Takeshi Tokuyama.
Mining optimized association rules for numeric attributes.
In PODS '96 Proceedings of the fifteenth ACM
SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages
182--191. ACM Press, 1996.
[ bib |
DOI ]
Finds appropriate ranges for quantitative attributes automatically by maximizing the support on the condition that the confidence ratio is at least a given threshold value or by maximizing the confidence ratio on the condition that the support is at least a given threshold number. The paper also introduces the measure gain: gain(R) = sup(R) - minConf * sup(lhs(R)) = sup(lhs(R)) * (conf(R) - minConf). Keywords: quantitative |
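The gain measure can be checked numerically. The sketch below evaluates sup(R) - minConf * sup(lhs(R)) directly and, assuming conf(R) = sup(R) / sup(lhs(R)), in the factored form obtained by pulling out sup(lhs(R)). The function names and the input values are made up for illustration.

```python
import math


def gain(sup_rule, sup_lhs, min_conf):
    """gain(R) = sup(R) - minConf * sup(lhs(R))."""
    return sup_rule - min_conf * sup_lhs


def gain_factored(sup_rule, sup_lhs, min_conf):
    """Same quantity written as sup(lhs(R)) * (conf(R) - minConf)."""
    conf = sup_rule / sup_lhs  # assumes sup(lhs(R)) > 0
    return sup_lhs * (conf - min_conf)


# Hypothetical rule: support 0.3, antecedent support 0.5, so confidence 0.6.
direct = gain(0.3, 0.5, 0.4)
factored = gain_factored(0.3, 0.5, 0.4)
print(direct, factored, math.isclose(direct, factored))
```

The gain is positive exactly when the rule's confidence exceeds minConf.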
[16] |
Régis Gras, Saddo Ag Almouloud, Marc Bailleul, Annie Larher, Maria Polo,
Harrisson Ratsimba-Rajohn, and André Totohasina.
L'implication statistique, nouvelle méthode exploratoire de
données.
ARDM, 01 1996.
[ bib ]
Introduces the implication index. Keywords: measure |
[17] |
Heikki Mannila and Hannu Toivonen.
Multiple uses of frequent sets and condensed representations.
In Proceedings of the Second International Conference on
Knowledge Discovery and Data Mining (KDD-96), pages 189--194. AAAI Press,
1996.
[ bib ]
Introduces general rules with disjunctions and negations in the antecedent and the consequent. The confidence of any such rules can be approximated by using the support of frequent itemsets only (applying the inclusion-exclusion principle). Using the negative border, an error bound for the estimates can be calculated. The authors also show that frequent itemsets with a support of epsilon are a concise representation (epsilon-adequate representation) which can approximate the confidence of any itemset with an error of at most epsilon. Keywords: concise |
[18] |
Hannu Toivonen.
Sampling large databases for association rules.
In VLDB '96: Proceedings of the 22th International Conference on
Very Large Data Bases, pages 134--145, San Francisco, CA, USA, 1996. Morgan
Kaufmann Publishers Inc.
[ bib ]
Finds frequent itemsets in a random sample of a database (one that fits into main memory) and then verifies the found frequent itemsets against the full database. Keywords: algorithm,sampling |
[19] |
Brian Lent, Arun N. Swami, and Jennifer Widom.
Clustering association rules.
In Proceedings of the Thirteenth International Conference on
Data Engineering, April 7--11, 1997 Birmingham U.K., pages 220--231. IEEE
Computer Society, 1997.
[ bib ]
Joins adjacent intervals for quantitative association rules to produce more general rules. Keywords: clustering,quantitative |
[20] |
Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo.
Discovery of frequent episodes in event sequences.
Data Mining and Knowledge Discovery, 1(3):259--289, 1997.
[ bib ]
Keywords: sequential |
[21] |
Ramakrishnan Srikant, Quoc Vu, and Rakesh Agrawal.
Mining association rules with item constraints.
In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy
Uthurusamy, editors, Proceedings of the 3rd International Conference
Knowledge Discovery and Data Mining (KDD-97), pages 67--73. AAAI Press,
1997.
[ bib ]
Integrates BOOLEAN CONSTRAINTS on items (absence, presence) into the mining algorithm to reduce the search space. Algorithms are discussed. Keywords: constraint |
[22] |
Mohammed Javeed Zaki, Srinivasan Parthasarathy, Wei Li, and Mitsunori Ogihara.
Evaluation of sampling for data mining of association rules.
In Proceedings of the 7th International Workshop on Research
Issues in Data Engineering (RIDE '97) High Performance Database Management
for Large-Scale Applications, pages 42--50. IEEE Computer Society, 1997.
[ bib ]
Evaluates random sampling with replacement as presented in Mannila et al. 1994 using several datasets. The experiments show that Chernoff bounds overestimate the needed sample size and that sampling seems to be an effective tool for practical purposes. Keywords: sampling |
[23] |
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur.
Dynamic itemset counting and implication rules for market basket
data.
In SIGMOD 1997, Proceedings ACM SIGMOD International Conference
on Management of Data, pages 255--264, Tucson, Arizona, USA, May 1997.
[ bib |
DOI ]
Introduces CONVICTION (as an improvement to confidence based on implication rules) and INTEREST (later called LIFT). Keywords: measure |
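A small sketch of both measures, assuming the commonly used formulations lift(X -> Y) = supp(X and Y) / (supp(X) * supp(Y)) and conviction(X -> Y) = (1 - supp(Y)) / (1 - conf(X -> Y)). The helper names and the toy data are hypothetical, and both functions assume that X and Y each occur at least once.

```python
import math


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def lift(lhs, rhs, transactions):
    """Ratio of observed joint support to the support expected under independence."""
    n = len(transactions)
    joint = count(set(lhs) | set(rhs), transactions)
    return joint * n / (count(lhs, transactions) * count(rhs, transactions))


def conviction(lhs, rhs, transactions):
    """(1 - supp(Y)) / (1 - conf(X -> Y)); infinite for rules that always hold."""
    n = len(transactions)
    conf = count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)
    if conf == 1.0:
        return math.inf
    return (1 - count(rhs, transactions) / n) / (1 - conf)


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

print(lift({"bread"}, {"milk"}, transactions))
print(conviction({"bread"}, {"milk"}, transactions))
print(conviction({"butter"}, {"bread"}, transactions))
```

Unlike lift, conviction is not symmetric: swapping antecedent and consequent generally changes the value.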
[24] |
Sergey Brin, Rajeev Motwani, and Craig Silverstein.
Beyond market baskets: Generalizing association rules to
correlations.
In SIGMOD 1997, Proceedings ACM SIGMOD International Conference
on Management of Data, pages 265--276, Tucson, Arizona, USA, May 1997.
[ bib |
DOI ]
Proposes to use the chi-square test for correlation. For an itemset of length l, the test is carried out on an l-dimensional contingency table. Problems are cells with low counts and multiple tests. Keywords: no-support |
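For the simplest case of two items (l = 2), the chi-square statistic on the 2x2 contingency table can be sketched as below. This only illustrates the statistic and is not the paper's mining algorithm; the function name and the toy data are hypothetical. With one degree of freedom, the usual critical value at the 5% level is about 3.84.

```python
def chi_square_2x2(x, y, transactions):
    """Chi-square statistic for presence/absence of items x and y."""
    n = len(transactions)
    a = sum(x in t and y in t for t in transactions)      # both present
    b = sum(x in t and y not in t for t in transactions)  # only x
    c = sum(x not in t and y in t for t in transactions)  # only y
    d = n - a - b - c                                      # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("an item occurs in all or in no transactions")
    return n * (a * d - b * c) ** 2 / denom


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

# Far below 3.84, so independence is not rejected. The tiny cell counts
# here also show the low-count problem: the approximation is unreliable.
print(chi_square_2x2("bread", "milk", transactions))
```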
[25] |
Mohammed J. Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li.
New algorithms for fast discovery of association rules.
Technical Report 651, Computer Science Department, University of
Rochester, Rochester, NY 14627, July 1997.
[ bib ]
Quickly identifies MAXIMAL FREQUENT ITEMSETS (a frequent itemset is maximal if it is not a proper subset of any other frequent itemset) using different database layout schemes (regular, inverted) and clustering techniques (equivalence class ECLAT, max. clique). See also Zaki 2000, Scalable Algorithms for Association Mining. Keywords: maximal |
[26] |
C. C. Aggarwal and P. S. Yu.
A new framework for itemset generation.
In PODS 98, Symposium on Principles of Database Systems, pages
18--24, Seattle, WA, USA, 1998.
[ bib |
DOI ]
Points out weaknesses of the large frequent itemset method using support (spuriousness, dense datasets) and that lift gives only values close to one for items which are very frequent, even if they are perfectly positively correlated. COLLECTIVE STRENGTH is introduced. Collective strength uses the violation rate for an itemset, which is the fraction of transactions which contain some, but not all, items of the itemset. The violation rate is compared to the expected violation rate under independence. Collective strength is downward closed. Keywords: measure |
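A sketch of the comparison between observed and expected violation rates, assuming the usual formulation C(I) = (1 - v(I)) / (1 - E[v(I)]) * E[v(I)] / v(I), where E[v(I)] is computed from the individual item frequencies under independence. The function name and the toy data are hypothetical.

```python
from math import inf, prod


def collective_strength(itemset, transactions):
    """Collective strength of an itemset with at least two items."""
    items = set(itemset)
    n = len(transactions)
    # A violation is a transaction holding some, but not all, of the items.
    v = sum(0 < len(items & t) < len(items) for t in transactions) / n
    # Expected violation rate if the items occurred independently.
    p = [sum(i in t for t in transactions) / n for i in items]
    expected_v = 1 - prod(p) - prod(1 - q for q in p)
    if v == 0:
        return inf
    if expected_v == 1:
        raise ValueError("expected violation rate is 1; measure undefined")
    return (1 - v) / (1 - expected_v) * expected_v / v


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

# Values below 1 indicate fewer co-occurrences than independence predicts.
print(collective_strength({"bread", "milk"}, transactions))
```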
[27] |
Bing Liu, Wynne Hsu, and Yiming Ma.
Integrating classification and association rule mining.
In Proceedings of the 4th International Conference Knowledge
Discovery and Data Mining (KDD-98), pages 80--86. AAAI Press, 1998.
[ bib ]
Mines only the subset of association rules with the classification class attribute on the right-hand side (CARs). From these CARs a classifier is built by using the rules with the highest confidence to cover the whole database. The presented algorithm is called Classification Based on Associations (CBA). In experiments, the resulting classifiers are more accurate than C4.5. Keywords: classification |
[28] |
Nimrod Megiddo and Ramakrishnan Srikant.
Discovering predictive association rules.
In Rakesh Agrawal, Paul E. Stolorz, and Gregory Piatetsky-Shapiro,
editors, Proceedings of the Fourth International Conference on Knowledge
Discovery and Data Mining (KDD-98), pages 274--278. AAAI Press, 1998.
[ bib ]
Introduces several STATISTICAL TESTS: a test of whether the observed support count is significantly greater than a support threshold, and a chi-square test of independence (see also Brin et al. 1997). Also deals with the Bonferroni effect (multiple-comparison problem) by finding an upper bound on the number of tested hypotheses and proposing a resampling procedure using an independence model. The paper introduces confidence intervals for support and confidence. Finally, the authors find that the support-confidence framework does a good job of eliminating statistically insignificant rules (on market basket data). Keywords: theory |
[29] |
Raymond T. Ng, Laks V.S. Lakshmanan, Jiawei Han, and Alex Pang.
Exploratory mining and pruning optimizations of constrained
associations rules.
In Proceedings of the ACM SIGMOD Conference, Seattle, WA, pages
13--24, 1998.
[ bib |
DOI ]
Characterizes various constraints (contains, minimum, maximum, count, sum, avg) according to anti-monotonicity and succinctness. Anti-monotonicity is the property which allows iterative pruning (generate and test candidates) used e.g., on support by Apriori. Succinctness is a property that enables us to generate only those itemsets which satisfy the constraint without the need to test them. Keywords: constraint |
[30] |
Craig Silverstein, Sergey Brin, and Rajeev Motwani.
Beyond market baskets: Generalizing association rules to dependence
rules.
Data Mining and Knowledge Discovery, 2:39--68, 1998.
[ bib ]
Journal version of Brin et al. (1997). Keywords: no-support |
[31] |
M. J. Zaki and M. Ogihara.
Theoretical foundation of association rules.
In SIGMOD'98 Workshop on Research Issues in Data Mining and
Knowledge Discovery (SIGMOD-DMKD'98), Seattle, Friday, June 5, 1998, 1998.
[ bib ]
Presents the lattice-theoretic foundations of mining associations based on FORMAL CONCEPT ANALYSIS and shows that frequent itemsets are determined by the set of frequent concepts. The paper studies the generation of a minimal set of rules (called a base) from which all other association rules can be inferred. The paper also presents some complexity considerations using the connection between frequent itemsets and maximal bipartite cliques. It is shown that for very sparse databases association rule algorithms should scale linearly in the number of items. Keywords: theory |
[32] |
Robert J. Bayardo Jr. and Rakesh Agrawal.
Mining the most interesting rules.
In Proceedings of the fifth ACM SIGKDD international conference
on Knowledge discovery and data mining (KDD-99), pages 145--154. ACM Press,
1999.
[ bib |
DOI ]
Shows that for all rules with the same antecedent, the best (optimal, most interesting) rules according to measures such as confidence, support, gain, chi-square value, gini, entropy gain, laplace, lift, and conviction must all reside along a support/confidence border. The paper also shows that many measures are monotone functions of support and confidence. Keywords: theory |
[33] |
Jinyan Li, Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao, and Qun Sun.
Efficient mining of high confidence association rules without support
thresholds.
In J. Zytkow and J. Rauch, editors, Principles of Data Mining
and Knowledge Discovery PKDD'99, LNAI 1704, Prague, Czech Republic, pages
406--411. Springer-Verlag, 1999.
[ bib |
DOI ]
This paper uses JUMPING EMERGING PATTERNS to mine a border for top rules (rules with 100% confidence) for a given consequent. The drawbacks are that only one consequent is mined at a time and that finding rules with less than 100% confidence is difficult. Keywords: no-support |
[34] |
Bing Liu, Wynne Hsu, and Yiming Ma.
Mining association rules with multiple minimum supports.
In Proceedings of the fifth ACM SIGKDD international conference
on Knowledge discovery and data mining (KDD-99), pages 337--341. ACM Press,
1999.
[ bib |
DOI ]
Adapts APRIORI to work with different minimum support thresholds assigned to different items (minimum item supports, MIS). To preserve the downward closure property of support, items are sorted by their MIS values. Keywords: var-support |
[35] |
Bing Liu, Wynne Hsu, and Yiming Ma.
Pruning and summarizing the discovered associations.
In Proceedings of the fifth ACM SIGKDD international conference
on Knowledge discovery and data mining (KDD-99), pages 125--134. ACM Press,
1999.
[ bib |
DOI ]
Removes insignificant rules using the chi-square test to test for correlation between the antecedent and the consequent of a rule. DIRECTION SETTING (DS) RULES are also introduced. A DS rule has a positively correlated antecedent and consequent and is not built from a rule with a shorter antecedent which is a DS rule. Normally, only a small and concise fraction of rules are DS rules. Keywords: measures,theory |
[36] |
Nada Lavrač, Peter Flach, and Blaz Zupan.
Rule evaluation measures: A unifying view.
In Sašo Džeroski and Peter Flach, editors, Inductive
Logic Programming, pages 174--185, Berlin, Heidelberg, 1999. Springer Berlin
Heidelberg.
[ bib |
DOI ]
Introduces relative accuracy/gain. Keywords: measure |
[37] |
Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal.
Discovering frequent closed itemsets for association rules.
In Proceeding of the 7th International Conference on Database
Theory, Lecture Notes In Computer Science (LNCS 1540), pages 398--416.
Springer, 1999.
[ bib |
DOI ]
Introduces CLOSED ITEMSETS. An itemset X is closed if no proper superset of X is contained in every transaction in which X is contained, which means that no proper superset of X has the same support count as X. Keywords: closed |
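The definition can be turned into a direct check: an itemset is closed when it equals the intersection of all transactions that contain it. The sketch below enumerates every itemset by brute force, which is exponential in the number of items and only suitable for tiny examples; it is not the algorithm of the paper. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset` (None if none do)."""
    covering = [frozenset(t) for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def closed_itemsets(transactions, min_count=1):
    """Return {itemset: support count} for all closed itemsets."""
    items = sorted(set().union(*transactions))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            n_covering = sum(itemset <= t for t in transactions)
            if n_covering >= min_count and closure(itemset, transactions) == itemset:
                closed[itemset] = n_covering
    return closed


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

for itemset, n_covering in closed_itemsets(transactions, min_count=2).items():
    print(sorted(itemset), n_covering)
```

In this data {butter} is not closed, because every transaction with butter also contains bread, so {bread, butter} has the same support count.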
[38] |
Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal.
Efficient mining of association rules using closed itemset lattices.
Information Systems, 24(1):25--46, 1999.
[ bib |
DOI ]
Presents the CLOSE algorithm to mine frequent closed itemsets. Keywords: closed |
[39] |
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal.
Mining frequent patterns with counting inference.
SIGKDD Explorations, 2(2):66--75, 2000.
[ bib |
DOI ]
Proposes the algorithm PASCAL (an APRIORI optimization) to mine frequent and closed itemsets. This approach uses frequent key patterns to infer the counts of frequent non-key patterns. Keywords: closed |
[40] |
Rakesh C. Agrawal, Charu C. Aggarwal, and V. V. V. Prasad.
Depth first generation of long patterns.
In Proceedings of the ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD-2000), pages 108--118, 2000.
[ bib |
DOI ]
Introduces the algorithm DepthProject which builds a lexicographic tree in a depth first order. Keywords: maximal |
[41] |
Khalil M. Ahmed, Nagwa M. El-Makky, and Yousry Taha.
A note on “Beyond market baskets: Generalizing association rules
to correlations”.
SIGKDD Explorations, 1(2):46--48, 2000.
[ bib ]
A reply to Brin et al. (1997). The authors state that the chi-square test tests the whole contingency table, but for larger than 2x2 tables we want to test dependence for single cells. Keywords: no-support |
[42] |
R. Bayardo, R. Agrawal, and D. Gunopulos.
Constraint-based rule mining in large, dense databases.
Data Mining and Knowledge Discovery, 4(2/3):217--240, 2000.
[ bib ]
Introduces the MINIMUM IMPROVEMENT constraint for confidence (mine only rules with a confidence which is minimp greater than the confidence of any of its proper subset-rules). DenseMiner, an algorithm that enforces minimum support, minimum confidence and minimum improvement already during a breadth-first search for all rules for a given consequent C is presented. Keywords: constraint |
[43] |
Alex A. Freitas.
Understanding the crucial differences between classification and
discovery of association rules -- a position paper.
SIGKDD Explorations, 2(1):65--69, 2000.
[ bib |
DOI ]
Keywords: classification |
[44] |
Jochen Hipp, Ulrich Güntzer, and Gholamreza Nakhaeizadeh.
Algorithms for association rule mining -- A general survey and
comparison.
SIGKDD Explorations, 2(2):1--58, 2000.
[ bib |
DOI ]
Describes the fundamentals of association rule mining and presents a systematization of existing algorithms. Keywords: algorithm |
[45] |
Ron Kohavi, Carla Brodley, Brian Frasca, Llew Mason, and Zijian Zheng.
KDD-Cup 2000 organizers' report: Peeling the onion.
SIGKDD Explorations, 2(2):86--98, 2000.
[ bib |
DOI ]
Also introduces some freely available data sets for algorithm performance evaluation. Keywords: evaluation |
[46] |
Jian Pei, Jiawei Han, and Runying Mao.
CLOSET: An efficient algorithm for mining frequent closed
itemsets.
In ACM SIGMOD Workshop on Research Issues in Data Mining and
Knowledge Discovery, 2000.
[ bib ]
Introduces the algorithm CLOSET which mines frequent closed itemsets using FP-growth (a depth-first search using support counting). Keywords: closed |
[47] |
Craig Silverstein, Sergey Brin, Rajeev Motwani, and Jeffrey D. Ullman.
Scalable techniques for mining causal structures.
Data Mining and Knowledge Discovery, 4(2/3):163--192, 2000.
[ bib ]
Explores the applicability of constraint-based causal discovery (known from Bayesian learning) to discover causal relationships in market basket data. Keywords: causal |
[48] |
Geoffrey I. Webb.
Efficient search for association rules.
In Proceedings of the 6th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Boston, Massachusetts, United States,
August 20 -- 23, 2000, pages 99--107, 2000.
[ bib |
DOI ] |
[49] |
Rüdiger Wirth and Jochen Hipp.
Crisp-dm: Towards a standard process model for data mining.
In Proceedings of the 4th International Conference on the
Practical Applications of Knowledge Discovery and Data Mining, Manchester,
UK, April 2000.
[ bib ]
Keywords: kdd |
[50] |
Mohammed J. Zaki.
Scalable algorithms for association mining.
IEEE Transactions on Knowledge and Data Engineering,
12(3):372--390, May/June 2000.
[ bib |
DOI ]
Introduces six new algorithms combining several features (database format, the decomposition technique, and the search procedure). Includes Eclat (Equivalence CLAss Transformation), MaxEclat, Clique, MaxClique, TopDown, and AprClique. ECLAT is a well known depth-first search algorithm using set intersection. Keywords: algorithm |
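The core idea of a depth-first search using set intersection can be sketched as follows: the database is stored vertically (one set of transaction ids per item), and the support count of a longer itemset is obtained by intersecting tidsets. This is a simplified illustration and omits the equivalence-class decomposition and other optimizations of the paper. The function names and the toy data are hypothetical.

```python
def eclat(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that are frequent together
        # with `prefix`; each one is extended depth-first.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # support via set intersection
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return result


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter", "beer"},
]

for itemset, n_covering in eclat(transactions, min_count=2).items():
    print(sorted(itemset), n_covering)
```

After the vertical layout is built, no further pass over the original transactions is needed; all counts come from tidset intersections.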
[51] |
Jean-Marc Adamo.
Data Mining for Association Rules and Sequential Patterns.
Springer, New York, 2001.
[ bib |
DOI ]
Introduction to association rules and mining sequential patterns. Keywords: sequential |
[52] |
Stephen D. Bay and Michael J. Pazzani.
Detecting group differences: Mining contrast sets.
Data Mining and Knowledge Discovery, 5(3):213--246, 2001.
[ bib ]
Finds sets with substantially different support in different groups. Uses interest based pruning and statistical surprise for filtering (summarizing) contrast sets. The search error is controlled using different (Bonferroni) correction for sets of different size. Keywords: theory |
[53] |
Dario Bruzzese and Cristina Davino.
Pruning of discovered association rules.
Computational Statistics, 16:387--398, 2001.
[ bib ]
The authors construct several statistical tests to evaluate the significance of discovered associations. Keywords: measure |
[54] |
Douglas Burdick, Manuel Calimlim, and Johannes Gehrke.
MAFIA: A maximal frequent itemset algorithm for
transactional databases.
In Proceedings of the 17th International Conference on Data
Engineering, pages 443--452, Washington, DC, 2001. IEEE Computer Society.
[ bib ]
MAFIA (MAximal Frequent Itemset Algorithm) finds maximal itemsets using a depth-first traversal of the itemset lattice, a compressed vertical bitmap representation of the database, additional pruning techniques (Parent Equivalence Pruning, Frequent Head Union Tail pruning) and dynamic reordering. The authors claim that MAFIA outperforms DepthProject (Agrawal et al., 2000) by a factor of 3 to 5 on average. Keywords: maximal |
[55] |
Igor V. Cadez, Padhraic Smyth, and Heikki Mannila.
Probabilistic modeling of transaction data with applications to
profiling, visualization, and prediction.
In F. Provost and R. Srikant, editors, Proceedings of the ACM
SIGKDD International Conference on Knowledge Discovery in Databases and Data
Mining (KDD-01), pages 37--45. ACM Press, 2001.
[ bib |
DOI ]
The authors construct a model (profile with weights) for each individual's behavior as a mixture of several components. This mixture provides the probabilities for a multinomial probability model (each item has a constant probability to be chosen for a transaction). Finally, the authors compare several estimation methods and model variants empirically using store choice data. Keywords: evaluation |
[56] |
Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk,
Rajeev Motwani, Jeffrey D. Ullman, and Cheng Yang.
Finding interesting associations without support pruning.
IEEE Transactions on Knowledge and Data Engineering,
13(1):64--78, 2001.
[ bib |
DOI ]
Uses similarity measures between hashed values of rows in a transaction database. The approach in the paper was only shown for associations between two items. Keywords: no-support |
[57] |
William DuMouchel and Daryl Pregibon.
Empirical Bayes screening for multi-item associations.
In F. Provost and R. Srikant, editors, Proceedings of the ACM
SIGKDD International Conference on Knowledge Discovery in Databases and Data
Mining (KDD-01), pages 67--76. ACM Press, 2001.
[ bib |
DOI ]
Search for unusually frequent itemsets using statistical methods. First, the authors propose stratification of the data to avoid finding spurious associations within strata. Then the deviation of the observed frequency over a baseline frequency (based on independence) is used. Since the deviation is unreliable for low counts, an empirical Bayes model (its 95% confidence limit) is used to produce a posterior distribution of the true ratio of actual to baseline frequencies. The Bayes model gives ratios close to the observed ratios for large samples and reduces (shrinks) the ratio if the sample size gets small (to smooth away noise). For multi-item associations log-linear models are proposed to find higher order associations which cannot be explained by pairwise associations. Keywords: no-support,theory |
[58] |
Heike Hofmann and Adalbert F. X. Wilhelm.
Visual comparison of association rules.
Comput. Stat., 16(3):399--415, 2001.
[ bib |
DOI ]
Introduces difference of confidence. Keywords: measure |
[59] |
Yves Kodratoff.
Comparing machine learning and knowledge discovery in databases: An
application to knowledge discovery in texts.
In Georgios Paliouras, Vangelis Karkaletsis, and Constantine D.
Spyropoulos, editors, Machine Learning and Its Applications, Advanced
Lectures, volume 2049 of Lecture Notes in Computer Science, pages
1--21. Springer, 2001.
[ bib |
DOI ]
Introduces causal support and causal confidence informed by negatives. Keywords: measure |
[60] |
Jian Pei, Jiawei Han, and Laks V.S. Lakshmanan.
Mining frequent itemsets with convertible constraints.
In Proceedings of the 17th International Conference on Data
Engineering, April 2--6, 2001, Heidelberg, Germany, pages 433--442, 2001.
[ bib ]
Develops a technique of how constraints on avg, median and sum can be converted so that they can be used already during the search phase of the FP-growth algorithm. The constraints are classified into constraints that are: convertible anti-monotone, convertible monotone and strongly convertible. Keywords: constraint |
[61] |
Masakazu Seno and George Karypis.
Lpminer: An algorithm for finding frequent itemsets using length
decreasing support constraint.
In Nick Cercone, Tsau Young Lin, and Xindong Wu, editors,
Proceedings of the 2001 IEEE International Conference on Data Mining, 29
November -- 2 December 2001, San Jose, California, USA, pages 505--512. IEEE
Computer Society, 2001.
[ bib ]
To find longer frequent itemsets, the minimal support requirement decreases as a function of the itemset length. An algorithm based on the FP-tree is presented and a property called small valid extension (SVE) is introduced which makes mining efficient in the absence of downward closure. Keywords: var-support |
[62] |
Hee Seok Song, Soung Hie Kim, and Jae Kyeong Kim.
A methodology for detecting the change of customer behavior based on
association rule mining.
In Proceedings of the Pacific Asia Conference on Information
System, pages 871--885. PACIS, 2001.
[ bib ]
Develops a methodology to detect changes of customer behavior automatically by comparing association rules between different time snapshots of data. Defines emerging pattern, unexpected change and the added/perished rule based on similarity and difference measures for rule matching. Keywords: changing |
[63] |
Ke Wang, Yu He, and David W. Cheung.
Mining confident rules without support requirement.
In Proceedings of the tenth international conference on
Information and knowledge management, pages 89 -- 96, New York, NY, 2001.
ACM Press.
[ bib |
DOI ]
The paper shows that for data with categorical attributes a UNIVERSAL-EXISTENTIAL UPWARD CLOSURE exists for confidence. With this property, algorithms with confidence-based pruning are possible that use a level-wise (from k to k-1) candidate generation. The paper also discusses a disk-based implementation. Keywords: no-support |
[64] |
Zijian Zheng, Ron Kohavi, and Llew Mason.
Real world performance of association rule algorithms.
In F. Provost and R. Srikant, editors, Proceedings of the ACM
SIGKDD International Conference on Knowledge Discovery in Databases and Data
Mining (KDD-01), pages 401--406. ACM Press, 2001.
[ bib |
DOI ]
Compares the performance of association rule algorithms (APRIORI, CHARM, FP-growth, CLOSET, MagnumOpus) using one IBM-Artificial dataset and three real-world e-commerce datasets. It shows that some improvements demonstrated on artificial datasets do not carry over to real-world datasets. Keywords: evaluation |
[65] |
Yingjiu Li, Peng Ning, X. Sean Wang, and Sushil Jajodia.
Generating market basket data with temporal information.
In ACM KDD Workshop on Temporal Data Mining, August 2001.
[ bib ]
Develops a generator for synthetic data with temporal patterns based on the generator by Agrawal and Srikant (1994). Keywords: sequential,evaluation |
[66] |
Charu C. Aggarwal, Cecilia Magdalena Procopiuc, and Philip S. Yu.
Finding localized associations in market basket data.
IEEE Transactions on Knowledge and Data Engineering, 14(1):51--62, 2002.
[ bib |
DOI ]
Proposes to cluster transactions using a similarity measure based on the new affinity measure (which measures similarity between pairs of items). Association rules are then mined within the identified clusters. Keywords: sampling |
[67] |
J. Azé and Y. Kodratoff.
Evaluation de la résistance au bruit de quelques mesures
d’extraction de règles d’association.
In D. Hérin and D.A. Zighed, editors, Extraction des
connaissances et apprentissage, volume 1, pages 143--154. Hermes, 2002.
[ bib ]
Introduces least contradiction. Keywords: measure |
[68] |
Christian Borgelt and Rudolf Kruse.
Induction of association rules: Apriori implementation.
In 15th Conference on Computational Statistics (Compstat 2002),
Heidelberg, Germany, 2002. Physica Verlag.
[ bib |
DOI ]
An efficient implementation of APRIORI. Keywords: implementation |
[69] |
Toon Calders and Bart Goethals.
Mining all non-derivable frequent itemsets.
In Tapio Elomaa, Heikki Mannila, and Hannu Toivonen, editors,
Proceedings of the 6th European Conference on Principles of Data Mining and
Knowledge Discovery, volume 2431 of Lecture Notes in Computer Science,
pages 74--85. Springer-Verlag, 2002.
[ bib |
DOI |
arXiv ]
Introduce NON-DERIVABLE ITEMSETS (NDIs). The support of all frequent NDIs allows for computing the support of all frequent itemsets using deduction rules based on the inclusion-exclusion principle. Keywords: concise |
[70] |
F. Galiano, I. J. Blanco, D. Sánchez, and M. Vila.
Measuring the accuracy and interest of association rules: A new
framework.
Intell. Data Anal., 6:221--235, 2002.
[ bib |
DOI ]
Introduces casual support and casual confidence informed by negatives. Keywords: measure |
[71] |
Gerd Stumme, Rafik Taouil, Yves Bastide, Nicolas Pasquier, and Lotfi Lakhal.
Computing iceberg concept lattices with titanic.
Data & Knowledge Engineering, 42(2):189--222, 2002.
[ bib |
DOI ]
The paper shows how iceberg concept lattices can be used as a condensed method to represent and visualize frequent (closed) itemsets. Iceberg concept lattices only show the top-most part of a concept lattice (known from Formal Concept Analysis). To compute iceberg concept lattices, the algorithm TITANIC is presented, which computes closed sets (a closure system) in a level-wise approach using weights (e.g., support), equivalence classes and key sets (minimal sets in an equivalence class). TITANIC is compared experimentally to Next-Closure and performs better. PASCAL (Bastide et al. 2000) is a modified version of TITANIC to mine all frequent itemsets. Keywords: theory |
[72] |
Mohammed J. Zaki and Ching-Jui Hsiao.
CHARM: An efficient algorithm for closed itemset mining.
In Proceedings of the Second SIAM International Conference on
Data Mining, Arlington, VA, 2002. SIAM.
[ bib |
DOI ]
The algorithm CHARM enumerates all frequent closed itemsets and uses a number of improvements: (a) It uses an IT-tree (itemset-tidset tree based on equivalence classes) to search the itemset space and the transaction space simultaneously. (b) It uses a fast hash-based elimination of non-closed itemsets. (c) It uses diffsets, which represent the database in a compact way that should fit into main memory. (d) It uses efficient intersection operations. The performance testing shows that CHARM can provide significant improvement over algorithms such as Apriori, Close, Pascal, Mafia, and Closet. Keywords: closed |
[73] |
Y. Aumann and Y. Lindell.
Statistical theory for quantitative association rules.
Journal of Intelligent Information Systems, 20(3):255--283,
2003.
[ bib ]
Defines QUANTITATIVE ASSOCIATION RULES using statistical measures (e.g., mean and variance) of continuous data. Algorithms are also discussed. Keywords: quantitative |
[74] |
Brock Barber and Howard J. Hamilton.
Extracting share frequent itemsets with infrequent subsets.
Data Mining and Knowledge Discovery, 7:153--185, 2003.
[ bib ]
ITEMSET SHARE is the fraction of some measure (e.g., sales, profit) contributed by the items in the set. An itemset is share frequent if its share exceeds a threshold. Share frequency is not downward closed! The article presents several algorithms and heuristics to mine share frequent itemsets. Keywords: measure |
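The share definition in the entry above can be illustrated with a small sketch. It assumes one common reading of the definition: the share of an itemset is the measure value contributed by its items in the transactions that contain the whole itemset, divided by the total measure value of the database. The function name `itemset_share` and the dictionary-based transaction format (item mapped to its measure value) are assumptions made for this example, not notation from the article.

```python
from fractions import Fraction


def itemset_share(itemset, transactions):
    """Share of `itemset` under the assumed definition.

    Each transaction is a dict mapping an item to its measure value
    (e.g., the sales amount of that item in the transaction).
    """
    items = set(itemset)
    # Total measure value over all items in all transactions.
    total = sum(sum(t.values()) for t in transactions)
    # Measure value contributed by the itemset's items, counted only in
    # transactions that contain every item of the itemset.
    local = sum(
        sum(t[item] for item in items)
        for t in transactions
        if items.issubset(t)
    )
    return Fraction(local, total)
```

With transactions in which item `a` always carries a small value and item `b` a large one, the itemset {a, b} can exceed a share threshold although its subset {a} does not, which is why share frequency is not downward closed.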
[75] |
Jean-Francois Boulicaut, Artur Bykowski, and Christophe Rigotti.
Free-sets: A condensed representation of boolean data for the
approximation of frequency queries.
Data Mining and Knowledge Discovery, 7(1):5--22, 2003.
[ bib ]
Presents a new epsilon-adequate representation for frequent itemsets called frequent FREE-SETS. An itemset is a free-set if it has no subset with (almost) the same support; thus, the items in the itemset cannot be used to form a (nearly) exact rule. Keywords: concise |
[76] |
Edward R. Omiecinski.
Alternative interest measures for mining associations in databases.
IEEE Transactions on Knowledge and Data Engineering,
15(1):57--69, Jan/Feb 2003.
[ bib |
DOI ]
Omiecinski introduced several alternatives to support. The first measure, ANY-CONFIDENCE, is defined as the confidence of the rule with the largest confidence which can be generated from an itemset. The author states that although finding all itemsets with a set any-confidence would enable us to find all rules with a given minimum confidence, any-confidence cannot be used efficiently as a measure of interestingness since confidence is not downward closed. The second introduced measure is ALL-CONFIDENCE. This measure is defined as the smallest confidence of all rules which can be produced from an itemset, i.e., all rules produced from an itemset will have a confidence greater than or equal to its all-confidence value. BOND, the last measure, is defined as the ratio of the number of transactions which contain all items of an itemset to the number of transactions which contain at least one of these items. Omiecinski showed that bond and all-confidence are downward closed and, therefore, can be used for efficient mining algorithms. Keywords: no-support |
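The three measures in the entry above can be sketched directly from their verbal definitions. This is a brute-force illustration that enumerates every rule of an itemset; it is not the mining algorithm from the paper. The function names are assumptions made for this example, and the itemset is assumed to have at least two items, each occurring in at least one transaction.

```python
from fractions import Fraction
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def rule_confidences(itemset, transactions):
    """Confidences of all rules A -> (itemset minus A).

    A ranges over the nonempty proper subsets of `itemset`.
    """
    items = sorted(itemset)
    whole = count(items, transactions)
    confidences = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            confidences.append(Fraction(whole, count(antecedent, transactions)))
    return confidences


def any_confidence(itemset, transactions):
    """Largest confidence of any rule generated from the itemset."""
    return max(rule_confidences(itemset, transactions))


def all_confidence(itemset, transactions):
    """Smallest confidence of any rule generated from the itemset."""
    return min(rule_confidences(itemset, transactions))


def bond(itemset, transactions):
    """Transactions with all items divided by transactions with at least one."""
    items = set(itemset)
    at_least_one = sum(1 for t in transactions if items & set(t))
    return Fraction(count(items, transactions), at_least_one)
```

Enumerating all antecedents is exponential in the itemset size and only serves to mirror the definitions; because support is downward closed, the smallest confidence is always attained by a rule with a single-item antecedent.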
[77] |
Dmitry Pavlov, Heikki Mannila, and Padhraic Smyth.
Beyond independence: Probabilistic models for query approximation on
binary transaction data.
IEEE Transactions on Knowledge and Data Engineering,
15(6):1409--1421, 2003.
[ bib |
DOI ]
Investigates the use of probabilistic models (independence model, pair-wise interactions stored in a Chow-Liu Tree, mixtures of independence models, itemset inclusion-exclusion model, and the maximum entropy method) for the problem of generating fast approximate answers to queries for large sparse binary data sets. Keywords: theory |
[78] |
Ganesh Ramesh, William A. Maniatty, and Mohammed J. Zaki.
Feasible itemset distributions in data mining: theory and
application.
In Symposium on Principles of Database Systems, PODS 2003, San
Diego, CA, USA, 2003. ACM Press.
[ bib |
DOI ]
Studies the length distributions of frequent and frequent maximal itemsets (the number of frequent itemsets with the same length). The length distribution determines an algorithm's performance and is important for generating realistic synthetic datasets. Keywords: theory |
[79] |
Sam Y. Sung, Zhao Li, Chew L. Tan, and Peter A. Ng.
Forecasting association rules using existing data sets.
IEEE Transactions on Knowledge and Data Engineering,
15(6):1448--1459, Nov/Dec 2003.
[ bib |
DOI ]
Resample datasets proportional to background attributes (e.g., distribution of customers' sex) to forecast rules in a new situation (e.g., a new store at a new location). Keywords: sampling |
[80] |
Feng Tao, Fionn Murtagh, and Mohsen Farid.
Weighted association rule mining using weighted support and
significance framework.
In Proceedings of The Ninth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD-2003), Washington, DC, 2003. ACM
Press.
[ bib |
DOI ]
Uses attributes of the items (e.g., price, page dwelling time) to WEIGHT SUPPORT. A support and significance framework is presented which possesses a weighted downward closure property important for pruning the search space. Keywords: var-support |
[81] |
Jaakko Hollmén, Jouni K. Seppänen, and Heikki Mannila.
Mixture models and frequent sets: Combining global and local methods
for 0--1 data.
In SIAM International Conference on Data Mining (SDM'03), San
Francisco, May 2003.
[ bib |
DOI ]
Clusters binary data first using the EM algorithm (looks like LCA; Cadez et al. (2001) seem to do the same to find profiles). Then the authors mine frequent itemsets in each cluster. Finally, they use the maximum entropy technique to obtain local models from the frequent itemsets and combine these models to approximate the joint distribution. Keywords: clustering |
[82] |
Christian Borgelt.
Efficient implementations of apriori and eclat.
In Bart Goethals and Mohammed J. Zaki, editors, Proceedings of
the IEEE ICDM Workshop on Frequent Itemset Mining Implementations,
Melbourne, FL, USA, November 2003.
[ bib ]
Discusses the efficient implementation of APRIORI (with prefix tree) and ECLAT. Keywords: implementation |
[83] |
Salvatore Orlando, Claudio Lucchese, Paolo Palmerini, Raffaele Perego, and
Fabrizio Silvestri.
kdci: a multi-strategy algorithm for mining frequent sets.
In Bart Goethals and Mohammed J. Zaki, editors, FIMI'03:
Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining
Implementations, November 2003.
[ bib ]
Introduces the kDCI algorithm. Keywords: algorithm,implementation |
[84] |
Hui Xiong, Pang-Ning Tan, and Vipin Kumar.
Mining strong affinity association patterns in data sets with skewed
support distribution.
In Bart Goethals and Mohammed J. Zaki, editors, Proceedings of
the IEEE International Conference on Data Mining, November 19--22, 2003,
Melbourne, Florida, pages 387--394, November 2003.
[ bib ]
Support-based pruning strategies are not effective for data sets with skewed support distributions. The authors propose the concept of hyperclique pattern, which uses an objective measure called h-confidence (equal to all-confidence by Omiecinski, 2003) to identify strong affinity patterns. The generation of so-called cross-support patterns (patterns with items with substantially different support) is avoided by h-confidence's cross-support property. Keywords: no-support |
[85] |
Frans Coenen, Graham Goulbourne, and Paul Leng.
Tree structures for mining association rules.
Data Mining and Knowledge Discovery, 8:25--51, 2004.
[ bib |
DOI ]
Describes how to compute PARTIAL SUPPORT COUNTS in one DB-pass and how to store them in an enumeration tree (P-Tree). Keywords: algorithm |
[86] |
Frans Coenen, Paul Leng, and Shakil Ahmed.
Data structures for association rule mining: T-trees and P-trees.
IEEE Transactions on Knowledge and Data Engineering,
16(6):774--778, 2004.
[ bib |
DOI ]
Describes two new structures for association rule mining: T-trees (total support trees) and P-trees (partial support trees). The T-tree is a data structure (a compressed set enumeration tree) to store itemsets. The P-tree is a compressed way to represent a database in memory for mining. Keywords: implementation |
[87] |
Bart Goethals and Mohammed J. Zaki.
Advances in frequent itemset mining implementations: Report on
FIMI'03.
SIGKDD Explorations, 6(1):109--117, 2004.
[ bib |
DOI ]
This paper reports on the performance of different frequent itemset mining implementations on several real-world and artificial databases. The authors conclude that the latest algorithms (patricia, kdci, fpclose, fpmax*) outperform older ones but that currently no tested algorithm gracefully scales up to very large databases with millions of transactions. Keywords: implementation |
[88] |
Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao.
Mining frequent patterns without candidate generation.
Data Mining and Knowledge Discovery, 8:53--87, 2004.
[ bib ]
Describes the data mining method FP-growth (frequent pattern growth) which uses an extended prefix-tree (FP-tree) structure to store the database in a compressed form. FP-growth adopts a divide-and-conquer approach to decompose both the mining tasks and the databases. It uses a pattern fragment growth method to avoid the costly process of candidate generation and testing. Keywords: algorithm |
[89] |
CL Sistrom and CW Garvan.
Proportions, odds, and risk.
Radiology, 230(1):12--19, 2004.
[ bib |
DOI ]
Introduces relative risk. Keywords: measure |
[90] |
Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava.
Selecting the right objective measure for association analysis.
Information Systems, 29(4):293--313, 2004.
[ bib |
DOI ]
Compares the properties of 21 objective measures (of interest). In general, the measures do not agree with each other. However, the authors show that if support-based pruning or table standardization (of the contingency tables) is used, the measures become highly correlated. Keywords: measures |
[91] |
Guizhen Yang.
The complexity of mining maximal frequent itemsets and maximal
frequent patterns.
In Proceedings of the 2004 ACM SIGKDD international conference
on Knowledge discovery and data mining, Seattle, WA, USA, 2004. ACM Press.
[ bib |
DOI ]
Shows that enumerating all maximal frequent itemsets is NP-hard and the associated counting problem is #P-complete. Keywords: theory |
[92] |
Mohammed Zaki.
Mining non-redundant association rules.
Data Mining and Knowledge Discovery, 9:223--248, 2004.
[ bib |
DOI ]
Compares frequent itemsets and frequent closed itemsets and shows that frequent closed itemsets can be used to generate NON-REDUNDANT association rules. Non-redundant rules are a set of rules with the most general rules (smallest antecedent and consequent) without loss of information. Keywords: closed |
[93] |
Julien Blanchard, Fabrice Guillet, Henri Briand, and Regis Gras.
Assessing rule interestingness with a probabilistic measure of
deviation from equilibrium.
In Proceedings of the 11th international symposium on Applied
Stochastic Models and Data Analysis ASMDA-2005, pages 191--200. ENST, 2005.
[ bib ]
Presents a statistical test for the deviation from the equilibrium of a rule. The equilibrium for rule a -> b is defined as: the number of transactions which contain a and b together is equal to the number of transactions which contain a and not b. Keywords: measure |
[94] |
Francesco Bonchi, Fosca Giannotti, Alessio Mazzanti, and Dino Pedreschi.
ExAnte: A preprocessing method for frequent-pattern mining.
IEEE Intelligent Systems, 20(3):25--31, 2005.
[ bib |
DOI ]
Reduces the database size before mining by iteratively applying mu-reduction and alpha-reduction. Mu-reduction removes transactions which do not meet monotone constraints. Alpha-reduction removes infrequent items from the transactions. Keywords: constraint |
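The interplay of the two reductions in the entry above can be sketched as a fixpoint loop. This is a minimal illustration built only from the description in the annotation, not the authors' implementation; the function name `exante_reduce`, the `satisfies` callback for the monotone constraint, and the use of an absolute minimum count are assumptions made for this example.

```python
from collections import Counter


def exante_reduce(transactions, min_count, satisfies):
    """Alternate mu-reduction and alpha-reduction until nothing changes.

    `satisfies(transaction)` must be a monotone constraint: if a set of
    items satisfies it, every superset satisfies it as well.
    """
    current = [frozenset(t) for t in transactions]
    while True:
        # Mu-reduction: drop transactions that violate the constraint.
        reduced = [t for t in current if satisfies(t)]
        # Alpha-reduction: drop items that are infrequent in what is left.
        counts = Counter(item for t in reduced for item in t)
        frequent = {item for item, n in counts.items() if n >= min_count}
        reduced = [t & frequent for t in reduced]
        reduced = [t for t in reduced if t]  # discard emptied transactions
        if reduced == current:
            return reduced
        current = reduced
```

Each reduction can enable the other: removing transactions may make further items infrequent, and removing items may make further transactions violate the constraint, which is why the loop repeats until the database stops shrinking.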
[95] |
Karam Gouda and Mohammed J. Zaki.
GenMax: An efficient algorithm for mining maximal frequent
itemsets.
Data Mining and Knowledge Discovery, 11:1--20, 2005.
[ bib |
DOI ]
Presents a backtrack search based algorithm for mining maximal frequent itemsets. Uses: progressive focusing for maximality checking, and diffset propagation for frequency computation. Keywords: maximal |
[96] |
Daniel R. Jeske, Behrokh Samadi, Pengyue J. Lin, Lan Ye, Sean Cox, Rui Xiao,
Ted Younglove, Minh Ly, Douglas Holt, and Ryan Rich.
Generation of synthetic data sets for evaluating the accuracy of
knowledge discovery systems.
In Proceeding of the eleventh ACM SIGKDD international
conference on Knowledge discovery in data mining, pages 756--762, New York,
NY, USA, 2005. ACM Press.
[ bib |
DOI ]
Generate synthetic data (e.g., credit card transaction data) for accuracy evaluation using semantic graphs. Keywords: evaluation |
[97] |
Tobias Scheffer.
Finding association rules that trade support optimally against
confidence.
Intelligent Data Analysis, 9(4):381--395, 2005.
[ bib |
DOI ]
Introduces predictive accuracy, which is the expected value of the confidence of a rule with respect to the process underlying the database. The author shows how predictive accuracy can be calculated from confidence and support measured on a data set using a Bayesian frequency correction (very simplified: confidence is discounted for rules with low support). An algorithm is also presented which finds the top n most predictive association rules (redundant rules with a 0 predictive accuracy improvement are removed), and the author shows how to estimate the prior distribution needed for the correction. Keywords: theory,measures |
[98] |
Masakazu Seno and George Karypis.
Finding frequent itemsets using length-decreasing support constraint.
Data Mining and Knowledge Discovery, 10:197--228, 2005.
[ bib ]
See Seno and Karypis 2001. Keywords: var-support |
[99] |
Geoffrey I. Webb and Songmao S. Zhang.
k-optimal-rule-discovery.
Data Mining and Knowledge Discovery, 10(1):39--79, 2005.
[ bib |
DOI ]
Develops GRD (based on the OPUS search strategy) which discovers all rules satisfying a set of constraints (max. number of rules, min support, min confidence, max coverage, max leverage) in a depth-first search. (An early draft of the paper was called: Beyond association rules: Generalized rule discovery) Keywords: constraint |
[100] |
Mohammed Zaki and Ching-Jui Hsiao.
Efficient algorithms for mining closed itemsets and their lattice
structure.
IEEE Transactions on Knowledge and Data Engineering,
17(4):462--478, 2005.
[ bib |
DOI ]
Describes the algorithm CHARM. Keywords: closed |
[101] |
Francesco Bonchi and Claudio Lucchese.
On condensed representations of constrained frequent patterns.
Knowledge and Information Systems, 9(2):180--201, 2006.
[ bib ]
Presents an algorithm to efficiently mine closed and constrained frequent itemsets. Keywords: closed,constraint |
[102] |
Liqiang Geng and Howard J. Hamilton.
Interestingness measures for data mining: A survey.
ACM Computing Surveys, 38(3):9, 2006.
[ bib |
DOI ]
Keywords: measures |
[103] |
Jiuyong Li.
On optimal rule discovery.
IEEE Transactions on Knowledge and Data Engineering,
18(4):460--471, 2006.
[ bib |
DOI ]
An optimal rule set (with respect to a metric of interestingness) contains all rules except those with no greater interestingness than one of their more general rules. An optimal rule set is a subset of a nonredundant rule set. The authors present an algorithm called ORD to find an optimal rule set. Classifiers built on optimal class association rules are at least as accurate as those built from CBA and C4.5 rules. Keywords: measures,classification |
[104] |
Geoffrey I. Webb.
Discovering significant rules.
In KDD '06: Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages 434--443, New York,
NY, USA, 2006. ACM Press.
[ bib |
DOI ]
Compares two approaches (the well-known Bonferroni adjustment and a new evaluation using holdout data) to control the experimentwise risk of false discoveries for statistical hypothesis tests. Experimental results indicate that neither of the two approaches dominates the other. Keywords: theory |
[105] |
Toon Calders, Christophe Rigotti, and Jean-Francois Boulicaut.
A survey on condensed representations for frequent sets.
In Jean-Francois Boulicaut, Luc Raedt, and Heikki Mannila, editors,
Constraint-Based Mining and Inductive Databases: European Workshop on
Inductive Databases and Constraint Based Mining, Hinterzarten, Germany, March
11-13, 2004, Revised Selected Papers, volume 3848 of Lecture Notes in
Computer Science, pages 64--80, February 2006.
[ bib |
DOI ]
Keywords: concise |
[106] |
Michael Hahsler.
A model-based frequency constraint for mining associations from
transaction data.
Data Mining and Knowledge Discovery, 13(2):137--166, September
2006.
[ bib |
DOI ]
Develops a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) and uses a user-specified precision threshold to find local frequency thresholds for groups of itemsets (NB-frequent itemsets). The paper shows that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier for the user to set and interpret. Keywords: no-support |
[107] |
Jean Diatta, Henri Ralambondrainy, and André Totohasina.
Towards a unifying probabilistic implicative normalized quality
measure for association rules.
In Fabrice J. Guillet and Howard J. Hamilton, editors, Quality
Measures in Data Mining, pages 237--250. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2007.
[ bib |
DOI ]
Introduces the Ralambondrainy measure. Keywords: measure |
[108] |
Michael Hahsler and Kurt Hornik.
New probabilistic interest measures for association rules.
Intelligent Data Analysis, 11(5):437--455, 2007.
[ bib |
DOI |
arXiv |
http |
.pdf ]
Develops the interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The measures are related to Fisher's exact test and show significantly better performance than lift for applications where spurious rules are problematic. Keywords: measures |
[109] |
J. Han, H. Cheng, D. Xin, and X. Yan.
Frequent pattern mining: Current status and future directions.
Data Mining and Knowledge Discovery, 14(1), 2007.
[ bib |
DOI ]
Gives a complete overview of the state of the art in frequent pattern mining and identifies future research directions. Keywords: algorithm, concise, sequential |
[110] |
Philippe Lenca, Benoît Vaillant, Patrick Meyer, and Stephane Lallich.
Association rule interestingness measures: Experimental and
theoretical studies.
In Fabrice J. Guillet and Howard J. Hamilton, editors, Quality
Measures in Data Mining, pages 51--76. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2007.
[ bib |
DOI ]
Compares interest measures. Keywords: measure |
[111] |
Ron Kenett and Silvia Salini.
Relative linkage disequilibrium: A new measure for association rules.
In Petra Perner, editor, Advances in Data Mining. Medical
Applications, E-Commerce, Marketing, and Theoretical Aspects, pages
189--199, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
[ bib ]
Introduces Relative Linkage Disequilibrium. RLD is an association measure motivated by indices used in population genetics. It evaluates the deviation of the support of the whole rule from the support expected under independence given the supports of X and Y. Keywords: measure |
[112] |
P.D. McNicholas, T.B. Murphy, and M. O’Regan.
Standardising the lift of an association rule.
Computational Statistics & Data Analysis, 52(10):4712--4721,
2008.
[ bib |
DOI ]
Standardized lift uses the minimum and maximum lift that each rule can reach to standardize lift between 0 and 1. Keywords: measure |
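The standardization described in the entry above can be sketched as a min-max rescaling. The bounds used here are an assumption: they follow only from the elementary limits max(0, supp(A) + supp(B) - 1) <= supp(A and B) <= min(supp(A), supp(B)). The published measure may use tighter bounds (for example, ones that account for mining thresholds); those are not reproduced here, and the function name is hypothetical.

```python
def standardized_lift(supp_a, supp_b, supp_ab):
    """Rescale the lift of A -> B to [0, 1] using its attainable range.

    All arguments are relative supports in (0, 1]. The bounds ignore any
    minimum support or confidence threshold (assumption of this sketch).
    """
    expected = supp_a * supp_b  # support of A and B under independence
    lift = supp_ab / expected
    lowest = max(0.0, supp_a + supp_b - 1.0) / expected
    highest = min(supp_a, supp_b) / expected
    if highest == lowest:
        # Happens when A or B occurs in every transaction: lift is fixed.
        raise ValueError("lift has no range to standardize over")
    return (lift - lowest) / (highest - lowest)
```

A value of 0 means the rule has the smallest lift possible for the given marginal supports, and a value of 1 means it has the largest.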
[113] |
Mojdeh Jalali-Heravi and Osmar R. Zaïane.
A study on interestingness measures for associative classifiers.
In Proceedings of the 2010 ACM Symposium on Applied Computing,
SAC '10, pages 1039--1046. ACM, 2010.
[ bib |
DOI ]
Compares associative classifiers using 53 different objective measures for association rules. Keywords: classification |
[114] |
Tianyi Wu, Yuguo Chen, and Jiawei Han.
Re-examination of interestingness measures in pattern mining: a
unified framework.
Data Mining and Knowledge Discovery, January 2010.
[ bib |
DOI ]
Re-examines a set of null-invariant interestingness measures (AllConf, Coherence, Cosine, Kulc, MaxConf) and shows that they can be expressed as the generalized mathematical mean, leading to a total ordering of them. Also proposes a new measure called Imbalance Ratio. Keywords: measure |
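The idea of expressing these measures as means of the two conditional probabilities can be illustrated for four of them, using their widely known definitions. This is a sketch and not the paper's framework: Coherence and the Imbalance Ratio are left out because their exact form is not given in the annotation above, and the function names are assumptions made for this example.

```python
import math


def conditionals(n_a, n_b, n_ab):
    """Return P(B|A) and P(A|B) from transaction counts.

    The total number of transactions is never needed, which is what makes
    the measures below null-invariant.
    """
    return n_ab / n_a, n_ab / n_b


def all_conf(n_a, n_b, n_ab):
    """Minimum of the two conditional probabilities."""
    return min(conditionals(n_a, n_b, n_ab))


def cosine(n_a, n_b, n_ab):
    """Geometric mean of the two conditional probabilities."""
    p, q = conditionals(n_a, n_b, n_ab)
    return math.sqrt(p * q)


def kulc(n_a, n_b, n_ab):
    """Arithmetic mean of the two conditional probabilities."""
    p, q = conditionals(n_a, n_b, n_ab)
    return (p + q) / 2


def max_conf(n_a, n_b, n_ab):
    """Maximum of the two conditional probabilities."""
    return max(conditionals(n_a, n_b, n_ab))
```

Because minimum, geometric mean, arithmetic mean and maximum of the same two numbers are always ordered in that sequence, the four measures are totally ordered for every pair of itemsets.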
[115] |
José L. Balcázar.
Formal and computational properties of the confidence boost of
association rules.
ACM Trans. Knowl. Discov. Data, 7(4), December 2013.
[ bib |
DOI ]
Introduces the measure confidence boost to help to obtain small and crisp sets of mined association rules. Keywords: measure |
[116] |
Jiuyong Li, Jixue Liu, Hannu Toivonen, Kenji Satou, Youqiang Sun, and Bingyu
Sun.
Discovering statistically non-redundant subgroups.
Knowledge Based Systems, 67:315--327, 2014.
[ bib |
DOI ]
Uses a confidence interval around the rule's odds ratio to define redundant rules. Following this definition, the paper presents an efficient algorithm to mine non-redundant rules. Keywords: measure |
[117] |
Griselda López, Joaquín Abellán, Alfonso Montella, and Juan de Oña.
Patterns of single-vehicle crashes on two-lane rural highways in
Granada province, Spain: In-depth analysis through decision rules.
Transportation Research Record, 2432(1):133--141, 2014.
[ bib |
DOI ]
Introduces lift increase (LIC). Keywords: measure |
[118] |
Suresh Ochin and Nisheeth Joshi Kumar.
Rule power factor: A new interest measure in associative
classification.
In 6th International Conference On Advances In Computing and
Communications, ICACC 2016, Cochin, India, 2016.
[ bib |
DOI ]
The rule power factor weights the confidence of a rule by its support. Keywords: measure |
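Read literally, the weighting in the entry above is the product of a rule's support and its confidence. The sketch below assumes that reading; the function name and the count-based arguments are hypothetical choices for this example.

```python
def rule_power_factor(n_x, n_xy, n_total):
    """Support times confidence for a rule X -> Y (assumed definition).

    n_x     -- transactions containing the antecedent X
    n_xy    -- transactions containing both X and Y
    n_total -- all transactions in the database
    """
    support = n_xy / n_total
    confidence = n_xy / n_x
    return support * confidence
```

Under this reading, two rules with the same confidence are ranked by how often they apply, so a rule that holds in many transactions outranks an equally confident rule that holds in few.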
[119] |
Michael Hahsler, Christian Buchta, Bettina Gruen, and Kurt Hornik.
arules: Mining Association Rules and Frequent Itemsets, 2023.
R package version 1.7-7.
[ bib |
http ]
R infrastructure for association rule mining. Implements several algorithms and generalizations of measures. Keywords: measure, algorithm |