A Randomization Test for extracting Robust Association Rules

* Corresponding author
1 PAROLE - Analysis, perception and recognition of speech (INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications)

Abstract: An association rule "if A then B" is a link between two sets of database properties, A and B. Because rules of this type are not deduced from hypotheses but discovered by exploring the data, association rule extraction belongs to Data Mining techniques (Han et al., 2001). At present, more than fifty different measures are used to assess the quality of association rules, each reflecting a different semantics. This shows the great variety of links between properties that these rules express, but also how difficult it is to be sure they are meaningful. To test whether an association rule is robust, that is, whether the link it brings out is not due to chance, a Randomization Test (Edgington, 1995) is developed. To this end, simulations are defined that generate numerous artificial databases identical to the original database except for the links between properties. Only the links that are found in the original database and in less than 5% of the artificial databases are judged statistically significant, with a type I error risk of less than 5% (Snedecor et al., 1967), and only these produce significant association rules. This simulation technique is far more efficient than the acceptance-rejection method and allows the associated randomization test to be used on a variety of databases.
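
The general idea of such a test can be illustrated with a minimal Python sketch. It is not the simulation procedure of the paper, whose details the abstract does not give. It assumes a binary database (rows are records, columns are properties), assumes that "identical except for the links" means each property keeps its frequency, and assumes that a link counts as "found" in an artificial database when the joint support of A and B is at least as high as in the original. The function names and the default of 200 simulations are hypothetical choices made for illustration.

```python
import random

def support_count(rows, a, b):
    """Number of rows in which properties a and b are both present."""
    return sum(1 for row in rows if row[a] and row[b])

def shuffled_copy(rows, rng):
    """Build one artificial database (assumption: independent column shuffles).

    Each column is permuted on its own, so every property keeps its
    frequency while any link between properties is broken.
    """
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]

def randomization_p_value(rows, a, b, n_sim=200, seed=0):
    """Share of artificial databases whose support for (a, b) reaches
    the support observed in the original database."""
    rng = random.Random(seed)
    observed = support_count(rows, a, b)
    hits = sum(
        1 for _ in range(n_sim)
        if support_count(shuffled_copy(rows, rng), a, b) >= observed
    )
    return hits / n_sim

# Toy database: properties 0 and 1 always occur together.
demo_rows = [[1, 1]] * 10 + [[0, 0]] * 10
demo_p = randomization_p_value(demo_rows, 0, 1)
is_significant = demo_p < 0.05  # 5% threshold, as in the abstract
```

Under these assumptions, a rule is retained only when fewer than 5% of the artificial databases reproduce its link, which mirrors the decision rule stated in the abstract. Independent column shuffles are the simplest way to break links; a procedure that also preserves the number of properties per record would need a different generator.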

Keywords: statistic, data mining, quality measure, association rule, threshold, itemset, simulation, randomization test, permutation test

Author: Martine Cadot

Source: https://hal.archives-ouvertes.fr/

