Oracle® Data Mining Concepts
11g Release 1 (11.1)

Part Number B28129-01

6 Anomaly Detection

This chapter describes anomaly detection, an unsupervised mining function for detecting rare cases in the data.

See Also:

"Unsupervised Data Mining".

This chapter contains the following sections:

About Anomaly Detection

The goal of anomaly detection is to identify cases that are unusual. Anomaly detection is an important tool for detecting fraud, network intrusion, and other rare events that are significant but hard to find.

Anomaly detection can be used to solve problems like the following:

Counter-examples

Counter-examples are cases that do not fall within a given class. Sometimes examples are easy to find, but counter-examples are either hard to specify or expensive to collect. For example, in text document classification, it is easy to classify a document under a given topic. However, the universe of documents that do not belong to this topic can be very large, and it may not be feasible to provide counter-examples.

Outliers

Outliers are cases that are unusual because they fall outside the distribution that is considered normal for the data. For example, census data might show a median household income of $60,000 and a mean household income of $70,000, but one or two households might have an income of $200,000. These cases would probably be identified as outliers.

The distance from the center of a normal distribution indicates how typical a given point is with respect to the distribution of the training data. Each case can be ranked according to the probability that it is either typical or atypical.
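The ranking idea can be illustrated with a minimal sketch. It assumes a single numeric attribute and uses the z-score (distance from the mean, in standard deviations) as the measure of atypicality. The function name and the income figures are hypothetical and are not part of Oracle Data Mining.

```python
import statistics


def rank_by_atypicality(values):
    """Return (value, z_score) pairs, most atypical first.

    The z-score is the absolute distance from the mean, measured in
    population standard deviations. This is an illustrative measure,
    not the scoring used by Oracle Data Mining.
    """
    center = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values are identical, so no case is more atypical than another.
        return [(v, 0.0) for v in values]
    scored = [(v, abs(v - center) / spread) for v in values]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical household incomes, for illustration only.
incomes = [52_000, 58_000, 60_000, 61_000, 63_000, 66_000, 70_000, 200_000]
ranking = rank_by_atypicality(incomes)
most_atypical_value, most_atypical_score = ranking[0]
print(most_atypical_value, round(most_atypical_score, 2))
```

Because the mean and standard deviation are themselves pulled toward extreme values, a sketch like this is only reasonable when atypical cases are few.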

One-Class Classification

Anomaly detection is a form of classification. See "About Classification" for an overview of the classification mining function.

Anomaly detection is implemented as one-class classification, because only one class is represented in the training data. An anomaly detection model predicts whether a data point is typical for a given distribution or not. An atypical data point can be either an outlier or an example of a previously unseen class.

Normally, a classification model must be trained on data that includes both examples and counterexamples for each class so that the model can learn to distinguish between them. For example, a model that predicts side effects of a medication should be trained on data that includes a wide range of responses to the medication.

A one-class classifier develops a profile that generally describes the training data. Any deviation from the profile is identified as an anomaly. One-class classifiers are sometimes referred to as positive security models, because they seek to identify "good" behaviors and assume that all other behaviors are bad.
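As a rough illustration of profile-based detection, the following sketch learns a per-attribute mean and standard deviation from typical cases only, then flags any case that deviates from that profile by more than a chosen number of standard deviations. The class name, the threshold, and the sample data are assumptions made for this example. It is a toy stand-in for the idea and is not the One-Class SVM algorithm.

```python
import statistics


class ProfileClassifier:
    """Toy one-class classifier built from typical cases only."""

    def __init__(self, threshold=3.0):
        # Number of standard deviations tolerated per attribute (assumed value).
        self.threshold = threshold
        self.profile = []

    def fit(self, rows):
        # Build the profile: one (mean, standard deviation) pair per attribute.
        columns = list(zip(*rows))
        self.profile = [
            (statistics.mean(col), statistics.pstdev(col)) for col in columns
        ]
        return self

    def is_typical(self, row):
        # A case is typical only if every attribute stays within the profile.
        for value, (mean, std) in zip(row, self.profile):
            if abs(value - mean) > self.threshold * std:
                return False
        return True


# Hypothetical training data: (requests per minute, bytes per request).
# Only "good" behavior is represented; there are no counterexamples.
typical_sessions = [(10, 200), (12, 210), (11, 190), (9, 205), (13, 195)]
model = ProfileClassifier(threshold=3.0).fit(typical_sessions)

print(model.is_typical((11, 202)))  # close to the learned profile
print(model.is_typical((40, 200)))  # request rate far outside the profile
```

The model never sees an example of "bad" behavior. Anything outside the learned profile is treated as an anomaly, which is the sense in which such classifiers act as positive security models.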

Note:

Solving a one-class classification problem is difficult, and the accuracy of one-class classifiers cannot usually match the accuracy of standard classifiers built with meaningful counterexamples.

The goal of anomaly detection is to provide some useful information where no information was previously attainable. However, if there are enough of the "rare" cases so that stratified sampling could produce a training set with enough counterexamples for a standard classification model, then that would generally be a better solution.
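The stratified sampling alternative mentioned in this note can be sketched as follows. The example assumes that each case already carries a class label and draws an equal number of cases from each class. The function name, the labels, and the class sizes are hypothetical.

```python
import random


def stratified_sample(cases, label_of, per_class, seed=0):
    """Draw up to `per_class` cases at random from each class.

    Equal allocation per class is an assumption of this sketch; other
    allocation schemes are possible.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_class = {}
    for case in cases:
        by_class.setdefault(label_of(case), []).append(case)
    sample = []
    for members in by_class.values():
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data set: 1000 common cases and 20 rare cases.
cases = [("normal", i) for i in range(1000)] + [("rare", i) for i in range(20)]
training_set = stratified_sample(cases, label_of=lambda c: c[0], per_class=20)

rare_count = sum(1 for label, _ in training_set if label == "rare")
normal_count = len(training_set) - rare_count
print(normal_count, rare_count)
```

With both classes represented in the training set, a standard classification model can be built instead of a one-class model.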

Anomaly Detection Algorithm

Oracle Data Mining supports One-Class SVM for anomaly detection. When used for anomaly detection, SVM classification does not use a target.