New probabilistic interest measures for association rules - Computer Science > Databases

Abstract: Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start with presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left hand side of rules and that lift performs poorly to filter random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.

Authors: Michael Hahsler, Kurt Hornik
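
The abstract names four measures but gives no formulas. As a rough illustration only, the sketch below computes confidence and lift from their standard definitions, and then computes hyper-confidence and hyper-lift under the assumption that the co-occurrence count of X and Y in independent data follows a hypergeometric distribution. The function names, the default quantile `delta=0.99`, and the handling of a zero quantile are choices made for this sketch, not details taken from the abstract.

```python
from math import comb

def confidence(c_xy, c_x):
    """Standard confidence of X -> Y: fraction of X-transactions that also contain Y."""
    return c_xy / c_x

def lift(c_xy, c_x, c_y, m):
    """Standard lift: observed co-occurrence count over its expectation under independence."""
    return (c_xy * m) / (c_x * c_y)

def hypergeom_pmf(k, c_x, c_y, m):
    """P(C_XY = k) when c_x of m transactions contain X and c_y contain Y,
    assuming X and Y occur independently (hypergeometric model)."""
    return comb(c_y, k) * comb(m - c_y, c_x - k) / comb(m, c_x)

def hyper_confidence(c_xy, c_x, c_y, m):
    """Assumed definition: P(C_XY < c_xy) under the hypergeometric model."""
    return sum(hypergeom_pmf(k, c_x, c_y, m) for k in range(c_xy))

def hyper_lift(c_xy, c_x, c_y, m, delta=0.99):
    """Assumed definition: observed count divided by the delta-quantile of C_XY."""
    cdf = 0.0
    quantile = min(c_x, c_y)
    for k in range(min(c_x, c_y) + 1):
        cdf += hypergeom_pmf(k, c_x, c_y, m)
        if cdf >= delta:
            quantile = k  # smallest k whose cumulative probability reaches delta
            break
    if quantile == 0:
        # Sketch-specific convention: avoid dividing by a zero quantile.
        return float("inf") if c_xy > 0 else float("nan")
    return c_xy / quantile

# Hypothetical counts: 1000 transactions, X and Y each in 100, together in 20.
m, c_x, c_y, c_xy = 1000, 100, 100, 20
print(confidence(c_xy, c_x))              # 0.2
print(lift(c_xy, c_x, c_y, m))            # 2.0
print(hyper_confidence(c_xy, c_x, c_y, m))
print(hyper_lift(c_xy, c_x, c_y, m))
```

With these made-up counts, lift compares the observed count to its expected value under independence (10), while the hyper-lift in this sketch compares it to a high quantile of the null distribution, so it is the more conservative of the two.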