Oracle® Data Mining Concepts
11g Release 1 (11.1)

Part Number B28129-01

10 Apriori

This chapter describes Apriori, the algorithm used by Oracle Data Mining for calculating association rules.

See Also:

Chapter 8, "Association Rules"

This chapter contains the following topics: Association Rules and Frequent Item Sets, and Data Preparation for Association Rules.

Association Rules and Frequent Item Sets

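Oracle Data Mining calculates associations using the Apriori algorithm. The association mining problem can be decomposed into two subproblems: finding all frequent itemsets, that is, combinations of items that occur in the transactions with at least a specified minimum frequency (support), and generating rules from those frequent itemsets.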

The number of frequent itemsets is controlled by the minimum support parameter. The number of rules generated is controlled by the number of frequent itemsets and the confidence parameter. If the confidence parameter is set too high, the association model may contain frequent itemsets but no rules.
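
The interaction between the two parameters can be illustrated with a minimal Python sketch. It is not Oracle Data Mining's implementation; the function names (`frequent_itemsets`, `rules`), the toy transactions, and the threshold values are all assumptions made for illustration. The first function finds frequent itemsets level by level in the Apriori style, and the second derives rules from them.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support meets the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only
        # those whose every k-subset is itself frequent (Apriori pruning).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for qualifying rules."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence = support(itemset) / support(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

itemsets = frequent_itemsets(transactions, min_support=0.5)
print(len(itemsets))                               # 6 frequent itemsets
print(len(rules(itemsets, min_confidence=0.6)))    # 6 rules
print(len(rules(itemsets, min_confidence=0.9)))    # 0 rules
```

In this toy data every two-item rule has a confidence of about 0.67, so raising the confidence threshold to 0.9 leaves six frequent itemsets but no rules, while raising the minimum support to 0.75 would leave only the three single-item itemsets.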

Data Preparation for Association Rules

When equi-width binning is used to prepare numerical data for Apriori, outliers cause most of the data to concentrate in a few bins, sometimes a single bin. As a result, the discriminating power of the algorithm can be significantly reduced.

For example, an association model might have all the values of a numerical attribute concentrated in a single bin, except for one value (the outlier) that belongs to a different bin. If this attribute is income, no rules will reflect different levels of income. Every rule containing income reflects only the range of that single bin, which is essentially the income range of the whole population.
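
The effect of a single outlier on equi-width bins can be seen in a small Python sketch. The helper `equi_width_bins` and the income figures are hypothetical and serve only as an illustration; they do not reproduce Oracle Data Mining's binning routines.

```python
def equi_width_bins(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum lies on the upper edge; clamp it into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical incomes; the last value is the outlier.
incomes = [28_000, 35_000, 41_000, 47_000, 52_000,
           58_000, 63_000, 70_000, 5_000_000]

print(equi_width_bins(incomes, 5))       # [8, 0, 0, 0, 1]
print(equi_width_bins(incomes[:-1], 5))  # [2, 1, 2, 1, 2]
```

With the outlier present, each bin is almost one million wide, so every ordinary income lands in the first bin. Once the outlier is removed, the same incomes spread across all five bins, and rules could then distinguish between income levels.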