Oracle9i Data Mining Concepts
Release 9.2.0.2

Part Number A95961-02

A ODM Sample Programs

Oracle9i Data Mining (ODM) includes sample input data and sample Java programs that illustrate techniques available with ODM's Java API. This appendix contains instructions for compiling and executing these sample programs.

The input data used by the sample programs is created as part of the ODM install; the data is available in the odm_mtr schema.

You can compile and execute the sample programs using provided scripts; you can also compile and execute the sample programs in JDeveloper using a JDeveloper project that you can download from Oracle Technology Network (http://otn.oracle.com). For information about installing and using the JDeveloper project for ODM, see the README that is included in the download.

A.1 Overview of the ODM Sample Programs

The ODM sample programs illustrate the main operations of the data mining process: building a model, testing the model and computing lift, and applying (scoring) the model.

Data mining models are either supervised or unsupervised.

A supervised model is used to predict the value of a designated variable, called a target, together with the confidence associated with each prediction. Supervised models are illustrated in the sample programs for Naive Bayes (NB) and Adaptive Bayes Networks (ABN).

An unsupervised model has no target variable, but is used to predict group membership or relationships of an individual. Unsupervised models are illustrated in the sample programs for Clustering and Association Rules.

The Discretization sample programs illustrate the technique of binning, that is, forming groups of categorical values and ranges of numerical values to satisfy the requirements of ODM's algorithms.

The Model Seeker sample program illustrates the creation and testing of several supervised models with a variety of parameter settings; the model with the best test results is saved.

The Attribute Importance sample program illustrates the analysis of data in order to rank the variables by the influence of each in predicting target values.

The PMML sample programs illustrate the production and consumption (export/import) of data mining models conforming to the emerging standards for Predictive Model Markup Language.

The short sample programs Sample_NaiveBayesBuild_short.java and Sample_NaiveBayesApply_short.java illustrate basic ODM usage. They are described in detail in Chapter 3.

A.1.1 ODM Java API

This appendix does not include a detailed description of the ODM API classes and methods. For detailed information about the ODM API, see the ODM Javadoc in the directory $ORACLE_HOME/dm/doc (UNIX) or %ORACLE_HOME%\dm\doc (Windows) on any system where ODM is installed.

A.1.2 Oracle9i JDeveloper Project for the Sample Programs

If you want to use Oracle9i JDeveloper to exercise the sample programs, you can either create a new project or download an existing one. The ODM sample programs are available on Oracle Technology Network (otn.oracle.com) as an Oracle9i JDeveloper project. For information about the JDeveloper project, including installation instructions, see the readme file included with the download.

A.1.3 Requirements for Using the Sample Programs

The ODM user schema must be configured with or upgraded to Oracle9i release 2 (9.2.0.1).

Patch 9.2.0.2 must be applied.

The ODM and ODM_MTR accounts must be unlocked if they are locked.

You must provide the host name, TCP/IP port, and Oracle SID of the database to which you want to connect. You also need to know the password for the ODM user on that database. Contact your database administrator if you do not know these values.

A.2 ODM Sample Programs Summary

Most of the sample programs require one or more input data tables. See Section A.4 for more information about the data.

The sample programs, except for the short sample programs, use property files to specify values that control program execution. Each program has at least one property file. There is also one special property file, Sample_Global.property, that is used to specify the characteristics of the environment in which the programs run. See Section A.5 for more information.

After ODM is installed on your system, the sample programs, property files, and scripts are in the directory $ORACLE_HOME/dm/demo/sample (UNIX) or %ORACLE_HOME%\dm\demo\sample (Windows).

The rest of this section lists the ODM sample programs, arranged according to the ODM features that they illustrate. For detailed information about a program, see the comments in the sample program and in its property file.

A.2.1 Basic ODM Usage

The following sample programs are the programs that are discussed in detail in Chapter 3:


Note:

If you execute Sample_NaiveBayesBuild.java and then execute Sample_NaiveBayesBuild_short.java or Sample_NaiveBayesApply_short.java, you must change buildtablename to a new name. Otherwise, you get a unique constraint error because the model name, MFS name, and Mining Task name are identical in both programs.


  1. Sample_NaiveBayesBuild_short.java
    • Property file: This program does not have a property file.
    • Data: census_2d_build_unbinned
  2. Sample_NaiveBayesApply_short.java
    • Property file: This program does not have a property file.
    • Data: census_2d_apply_unbinned

Neither of these sample programs uses a property file, not even Sample_Global.property.

A.2.2 Adaptive Bayes Network Models

The following sample programs illustrate building an Adaptive Bayes Network Model, calculating lift for the model and testing it, and applying the model:

  1. Sample_AdaptiveBayesNetworkBuild.java
    • Property file: Sample_AdaptiveBayesNetworkBuild.property
    • Data: census_2d_build_binned
  2. Sample_AdaptiveBayesNetworkLiftAndTest.java
    • Property file:
      Sample_AdaptiveBayesNetworkLiftAndTest.property
    • Data: census_2d_test_binned
  3. Sample_AdaptiveBayesNetworkApply.java
    • Property file: Sample_AdaptiveBayesNetworkApply.property
    • Data: census_2d_apply_binned

A.2.3 Naive Bayes Models

The following programs illustrate building a Naive Bayes Model, calculating lift for the model and testing it, applying the model, and cross validating the model:


Note:

If you execute Sample_NaiveBayesBuild.java and then execute Sample_NaiveBayesBuild_short.java or Sample_NaiveBayesApply_short.java, you must change buildtablename to a new name. Otherwise, you get a unique constraint error because the model name, MFS name, and Mining Task name are identical in both programs.


  1. Sample_NaiveBayesBuild.java
    • Property file: Sample_NaiveBayesBuild.property
    • Data: census_2d_build_unbinned
  2. Sample_NaiveBayesLiftAndTest.java
    • Property file: Sample_NaiveBayesLiftAndTest.property
    • Data: census_2d_test_unbinned
  3. Sample_NaiveBayesApply.java
    • Property file: Sample_NaiveBayesApply.property
    • Data: census_2d_apply_unbinned
  4. Sample_NaiveBayesCrossValidate.java
    • Property file: Sample_NaiveBayesCrossValidate.property
    • Data: census_2d_build_unbinned

A.2.4 Model Seeker Usage

The following sample program illustrates how to use Model Seeker to identify a "best" model:

  1. Sample_ModelSeeker.java
    • Property file: Sample_ModelSeeker.property
    • Data: census_2d_build_unbinned and census_2d_test_unbinned

A.2.5 Clustering Models

The following sample programs illustrate building a clustering model, applying it, and inspecting clustering results:

  1. Sample_ClusteringBuild.java
    • Property file: Sample_ClusteringBuild.property
    • Data: eight_clouds_build_unbinned
  2. Sample_ClusteringApply.java
    • Property file: Sample_ClusteringApply.property
    • Data: eight_clouds_apply_unbinned
  3. Sample_Clustering_Results.java
    • Property file: Sample_Clustering_Results.property
    • Data: the name of a clustering model that has been built and applied

A.2.6 Association Rules Models

The following sample program illustrates building an Association Rules model:

  1. Sample_AssociationRules.java
    • Property file: depends on the format of the input data (see Section A.5.10)
    • Data: market_basket_2d_binned or market_basket_tx_binned

A.2.7 PMML Export and Import

The following sample programs illustrate importing and exporting PMML Models:

  1. Sample_PMML_Export.java
    • Property file: Sample_PMML_Export.property
    • Data: no input data is required
  2. Sample_PMML_Import.java
    • Property file: Sample_PMML_Import.property
    • Data: no input data is required

A.2.8 Attribute Importance Model Build and Use

The following sample programs illustrate how to build an attribute importance model and use the results to build another model:

  1. Sample_AttributeImportanceBuild.java
    • Property file: Sample_AttributeImportanceBuild.property
    • Data: magazine_2d_build_binned
  2. Sample_AttributeImportanceUsage.java
    • Property file: Sample_AttributeImportanceUsage.property
    • Data: magazine_2d_build_binned and magazine_2d_test_binned

A.2.9 Discretization

The following sample programs show how to discretize (bin) data by creating a bin boundaries table and how to use the bin boundaries table:

  1. Sample_Discretization_CreateBinBoundaryTables.java
    • Property file:
      Sample_Discretization_CreateBinBoundaryTables.property
    • Data: census_2d_build_unbinned
  2. Sample_Discretization_UseBinBoundaryTables.java
    • Property file:
      Sample_Discretization_UseBinBoundaryTables.property
    • Data: census_2d_apply_unbinned

A.3 Using the ODM Sample Programs

After ODM is installed on your system, the sample programs, property files, and scripts are in the directory $ORACLE_HOME/dm/demo/sample (UNIX) or %ORACLE_HOME%\dm\demo\sample (Windows); the data used by the sample programs is in the directory $ORACLE_HOME/dm/demo/data (UNIX) or %ORACLE_HOME%\dm\demo\data (Windows). The data required by the sample programs is also installed in the ODM_MTR schema.

First, copy all of the sample files into a new directory so that the original files will remain intact.

Next, if necessary, connect to the database as the user ODM and execute the following command to start the task monitor, which schedules ODM mining tasks:

exec odm_start_monitor

The monitor must be started once in the life of a database installation; it is not harmful to start the monitor if it is already running. If a sample program is executed and hangs at the beginning of a data mining task, then the monitor is probably not running.

Property files are used to specify common characteristics of the execution environment and to control execution of the individual programs. The Sample_Global.property file must be edited to point to your database before any sample programs can be executed. The program-specific property files provide parameter settings for each of the sample programs; every parameter has a default setting, so the sample programs can be run without editing the program-specific property files. Each property file is discussed in Section A.5.

Scripts are included with the sample programs to compile and execute them. See Section A.6 for details.

The sample programs must be executed in the proper order. For a given model type, the sample build program must be executed before test, apply, or PMML export can be executed. For discretization, CreateBinBoundaryTables must be executed before UseBinBoundaryTables. The script that executes the sample programs supports a parameter that executes all of the programs in the correct order.

The sample programs illustrate the ODM API classes and methods used to perform various data mining tasks, and display the input required for each task as well as typical results.

Possible phases in exercising the sample code might be:

  1. Compile and run a sequence of sample programs using all default values in the property files. To compile one or all of the sample programs, see Section A.6.1; to execute one of the sample programs or all of them in order, see Section A.6.2.
  2. For each program, make note of the input values, look at the source code to observe which ODM methods accomplish each piece of the process, and note the results.
  3. Edit a property file to modify one or more parameters; re-execute the program and note changes in the results. Modifying the binning scheme requires editing the source code in Sample_Discretization_CreateBinBoundaryTables.java, which must then be re-compiled before execution.
  4. Create new sample tables for building and testing a model from data that is not part of the supplied sample data. This will illustrate the data preparation that is required in order to implement a data mining solution within an existing application.


    Note:

    The sample programs for Naive Bayes and Adaptive Bayes Network models require that any record to which you apply the models has either an integer or string attribute. You cannot apply the models to records that have a continuous numeric attribute.


A.4 Data Used by the Sample Programs

Each of the algorithms employed by ODM requires data that is discrete (binned) and numerical. Data for the ODM programs can be binned in several ways: it can be supplied already binned, binned explicitly using the Discretization sample programs, or binned automatically by the ODM algorithms.

For some of the sample programs, there is a choice of format for the input data, and binned as well as unbinned versions of the input data are supplied. The Discretization programs can be used to apply a customized binning scheme to unbinned data.

ODM can accept input data in either "nontransactional" format, that is, one row per case, or "transactional" format, that is, multiple rows per case. The input for the Association Rules sample program, the "Market Basket" data, is available in either format.

The data used to test a model must be distinct from the data that was used to build that model, but it is valid in a development setting to apply the model to the same data that was used for testing.

The data used by the sample programs is in the directory $ORACLE_HOME/dm/demo/data (UNIX) or %ORACLE_HOME%\dm\demo\data (Windows). The data is also installed in the ODM_MTR schema.

Table A-1 summarizes the data that is included with the sample programs.

Table A-1 ODM Sample Programs Data

Tables: CENSUS_2D_BUILD_BINNED, CENSUS_2D_BUILD_UNBINNED, CENSUS_2D_TEST_BINNED, CENSUS_2D_TEST_UNBINNED, CENSUS_2D_APPLY_BINNED, CENSUS_2D_APPLY_UNBINNED
Description: The Census data is derived from information from the U.S. Census Bureau. The target attribute is CLASS, which represents salary level (0 = low salary and 1 = high salary). The Census data is used in all of the sample programs except in those illustrating clustering, attribute importance, and association rules.

Tables: EIGHT_CLOUDS_APPLY_UNBINNED, EIGHT_CLOUDS_BUILD_UNBINNED
Description: The Eight Clouds data is artificially generated for the sole purpose of illustrating Clustering. It is designed to produce eight partially overlapping clusters, which makes clustering this dataset nontrivial.

Tables: MAGAZINE_2D_BUILD_BINNED, MAGAZINE_2D_TEST_BINNED
Description: The Magazine data is derived from an actual marketing data set concerning a magazine subscription campaign. The target is MR_MAG, which represents Purchase (= 1) or No Purchase (= 0). This data is used in the Attribute Importance sample program.

Tables: MARKET_BASKET_2D_BINNED, MARKET_BASKET_TX_BINNED
Description: The Market Basket data represents shopping sessions in a grocery store. In the 2D (nontransactional) version, each row represents a shopping session and has a value 1 in the column for a product found in the check-out basket. This data is used in the Association Rules sample program.

A.5 Property Files for the ODM Sample Programs

After ODM is installed on your system, the sample programs, property files, and scripts are in the directory $ORACLE_HOME/dm/demo/sample (UNIX) or %ORACLE_HOME%\dm\demo\sample (Windows).

Most sample programs require two property files, Sample_Global.property and a property file specific to the sample program. The two short sample programs, Sample_NaiveBayesBuild_short.java and Sample_NaiveBayesApply_short.java, do not require any property files.

A.5.1 Sample_Global.property

The information in this file is used to make the connection to the database when a sample program is executed and to specify the location of the input and output tables.

A.5.1.1 Database and Schemas

During the installation of Oracle9i, the database name and port number were established. You must specify the URL for the database in miningServer.url and the password for the ODM user in miningServer.password. The database URL is a string that specifies the type of JDBC driver used to connect to the database and the database details. ODM supports the JDBC thin driver, which requires a database URL of the following form:

"jdbc:oracle:thin:@<host_name>:<port_number>:<sid>"

The schemas (and user names) ODM and ODM_MTR were created during the installation, and during the Password Management phase, each schema user was assigned a password. Ensure that the password for ODM is entered in miningServer.password.

For an example of how to edit the global property file, see step 5 in Section A.6.1.

The ODM schema is used internally by the ODM API programs; the ODM_MTR schema contains the sample input tables and some of the output tables. Normally you enter ODM_MTR for the values of both inputDataSchemaName and outputSchemaName.

A.5.1.2 Cleanup Section

Each sample program property file has a Cleanup Section. Since the sample programs, by default, re-use the names of objects created during execution, the default action is to delete the objects created during any previous execution of that program. You can choose to change the setting so that cleanup is prevented, but if you do, you must change all object names or risk program failure. You also have the option of cleaning up only, and not otherwise executing the program.

A.5.1.3 Tasks

Each distinct data mining operation is a task that is queued for execution. A task name is required to identify the operation, and each sample program has a default task name assigned in the parameter miningTaskName.
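
For example, a program-specific property file might contain an entry like the following; the task name shown is purely illustrative, and each sample property file supplies its own default:

# Name under which this mining task is queued (illustrative value)
miningTaskName=nbBuildTask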

A.5.2 Sample_Discretization_CreateBinBoundaryTables.property

ODM algorithms require that input data be discretized (binned) before model building, testing, computing lift, and applying (scoring). You can either bin the data using appropriate Java methods or you can let the ODM algorithms automatically bin data. For a detailed discussion of discretization, see Section 1.8.

A.5.2.1 Sample Discretization Create Bin Boundary Input

If you use as input the default table census_2d_build_unbinned, then the transactionalData parameters are ignored. If you name a transactional table as discretizationData.tableName, then you must change discretizationData.type to transactional and enter the three column names of the table in the transactionalData parameters.

A.5.2.2 Sample Discretization Create Bin Boundary Output

This program creates separate binning definitions for categorical and numerical data and stores these definitions in the ODM_MTR schema in tables named in the parameters discretization.discretizationNumericTableName and discretization.discretizationCategoricalTableName.
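
For reference, the relevant entries in Sample_Discretization_CreateBinBoundaryTables.property might look like the following sketch for the default nontransactional input; the output table names and the exact spelling of the type value are illustrative, so check the property file shipped with your installation.

# Input: the default unbinned census build table (nontransactional format)
discretizationData.tableName=census_2d_build_unbinned
discretizationData.type=nonTransactional
# Output: bin boundary tables created in the ODM_MTR schema (names are illustrative)
discretization.discretizationNumericTableName=census_num_bin_boundaries
discretization.discretizationCategoricalTableName=census_cat_bin_boundaries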

A.5.3 Sample_Discretization_UseBinBoundaryTables.property

Section A.5.2 describes how to create bin boundary tables. This section explains how to use those bin boundary tables to bin (discretize) the data.

A.5.3.1 Sample Discretization Use Bin Boundary Input

If you use as input the default table census_2d_apply_unbinned, then the transactionalData parameters are ignored. If you name a transactional table as discretizationData.tableName, then you must change discretizationData.type to transactional and enter the three column names of the table in the transactionalData parameters.

The input table must have the same description as the table used in generating the Bin Boundary tables named in the input parameters discretization.discretizationNumericTableName and discretization.discretizationCategoricalTableName.

A.5.3.2 Sample Discretization Use Bin Boundary Output

The attribute values in the input data table are binned according to the rules in the Bin Boundary tables and the results are found in the view named in the parameter discretization.discretizedViewName in the ODM_MTR schema.

If the parameter discretization.openEndedNumericalDiscretization is set to true, then the highest and lowest bins are open-ended. That is, for example, instead of a top Age range of 90-100, the range will be "greater than 90".
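
The corresponding entries in Sample_Discretization_UseBinBoundaryTables.property might look like the following sketch; the view name and bin boundary table names are illustrative and must match the tables produced by the create program.

# Input: the default unbinned census apply table (nontransactional format)
discretizationData.tableName=census_2d_apply_unbinned
discretizationData.type=nonTransactional
# Bin boundary tables produced by Sample_Discretization_CreateBinBoundaryTables
discretization.discretizationNumericTableName=census_num_bin_boundaries
discretization.discretizationCategoricalTableName=census_cat_bin_boundaries
# View of the binned data, with open-ended highest and lowest bins
discretization.discretizedViewName=census_apply_binned_view
discretization.openEndedNumericalDiscretization=true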

A.5.4 Sample_NaiveBayesBuild.property

If you use as input the default table census_2d_build_unbinned, then the transactionalData parameters are ignored. If you name a transactional table as discretizationData.tableName, then you must change discretizationData.type to transactional and enter the three column names of the table in the transactionalData parameters.

In addition, several parameters describe how the model is to be built; these settings are held in the object named in classificationFunctionSettings.miningSettingsName.

The setting dataPrepStatus indicates whether automatic binning will be used (unprepared) or whether the data has been previously binned (discretized).

The setting supplementalAttributes lists those attributes, in a comma-separated list, that could be used to identify individuals in a report, but cannot be used in building the model because they would eliminate the generality needed for the production of meaningful rules.

The setting targetAttributeName identifies the attribute value that the model will predict.

The predictions of the model are based on probabilities, which are based on the number of times particular values occur in the Build input data. Setting the Thresholds establishes how much data is gathered; in particular, raising the threshold eliminates "rarer" cases, making the rules tables smaller and the build process faster, but possibly making the model less accurate.

The parameter naiveBayesOutput.modelName assigns a name to the resultant model so that it can be uniquely identified in the testing and applying programs.
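
Taken together, the parameters described in this section might appear in Sample_NaiveBayesBuild.property roughly as in the sketch below; the settings name, supplemental attribute, and model name values are illustrative, and the threshold parameter names are omitted because they are documented in the property file itself.

# Automatic binning is requested because the input data is unbinned
dataPrepStatus=unprepared
# Attribute to predict; CLASS is the salary-level target in the census data
targetAttributeName=class
# Identifier-like attributes excluded from model building (illustrative value)
supplementalAttributes=person_id
# Names under which the settings and the resulting model are stored (illustrative values)
classificationFunctionSettings.miningSettingsName=nbSampleSettings
naiveBayesOutput.modelName=Sample_NB_Model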

A.5.5 Sample_NaiveBayesLiftAndTest.property

There are two tasks included in Lift and Test.

The parameter liftAndTest.modelName contains the name of the model produced by the Sample_NaiveBayesBuild program.

If you use as input the default table census_2d_test_unbinned, then the transactionalData parameters are ignored. If you name a transactional table as discretizationData.tableName, then you must change discretizationData.type to transactional and enter the three column names of the table in the transactionalData parameters.

The parameters for Lift are

The parameters computeLift.resultName and test.resultName give the names of the objects containing the test results.
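
A sketch of the corresponding entries in Sample_NaiveBayesLiftAndTest.property follows; the result names are illustrative, and the model name must match the naiveBayesOutput.modelName used in the build program.

# Model produced by Sample_NaiveBayesBuild
liftAndTest.modelName=Sample_NB_Model
# Objects that will hold the lift and test results (illustrative names)
computeLift.resultName=nbLiftResult
test.resultName=nbTestResult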

A.5.6 Sample_NaiveBayesCrossValidate.property

The Sample_NaiveBayesCrossValidate program builds a Naive Bayes model and tests it by simulating an iterative process in which a model is built omitting one record of the input data, then the model is applied to that one record as a test. When this procedure has been completed for each distinct record in the input data, an aggregate accuracy figure and confusion matrix are calculated. This process is an effective test when only a small number of records is available for model building.

The input parameters are the same as those for Sample_NaiveBayesBuild and Sample_NaiveBayesLiftAndTest and the results are the same as the results of Sample_NaiveBayesLiftAndTest, except that there are no Lift results.

A.5.7 Sample_NaiveBayesApply.property

The Naive Bayes model can be applied to new data to produce a scored list. The input data must be in the same format as the data used to build the model (attribute names and data types, binning scheme, dataset type - transactional or nontransactional). The input can be a table or a single record; in the case of a record, the column names must be in upper case and the model must have been built using automatic binning.

There are two output options for the format and contents of the resultant table: multipleScoring or targetProbability.

The choice of multipleScoring gives not only the predicted value and the probability (confidence) of the prediction, but also the option to list other possible target values and their probabilities. For example, if the target values are Low, Medium, and High, then the output could be a single row with the prediction High and confidence .75, or three separate rows in the output table containing High/.75, Low/.15, and Medium/.10.

The choice targetProbability gives a result for each target value, and in addition to the information produced by multipleScoring, displays the ranking of each predicted value.

The parameter applyOutputOption is either multipleScoring or targetProbability, as explained above. The parameter sourceAttributeNames is a comma-separated list of the attribute names from the input table to be included in the output; sourceAttributeAliases gives a display name for each attribute.

For both multipleScoring and targetProbability, predictionColumnName and probabilityColumnName give the names of the columns in the result table containing the predicted value and the confidence. The parameter numberOfPredictions is an integer from 1 to the number of distinct target values, and indicates how many rows will be produced for each input record. The parameter topToBottom is true if you want the predictions sorted in descending order by probability and false if you want the predictions sorted in ascending order.

The parameter rankAttribute is the column name for rank in the result table.

The distinct possible prediction values are listed in targetValues (with data type targetDataType), and the display names for the predictions are listed in targetDisplayNames.
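
The following sketch collects the apply parameters described above as they might appear in Sample_NaiveBayesApply.property; the attribute lists, column names, and target values are illustrative and assume the census CLASS target with the values 0 and 1.

# Format of the output table: multipleScoring or targetProbability
applyOutputOption=multipleScoring
# Input attributes copied to the output, with display aliases (illustrative)
sourceAttributeNames=person_id
sourceAttributeAliases=ID
# Output columns for the predicted value and its confidence
predictionColumnName=prediction
probabilityColumnName=probability
# Return only the top prediction, sorted in descending order of probability
numberOfPredictions=1
topToBottom=true
# Used with targetProbability: column holding the rank of each prediction
rankAttribute=rank
# Distinct target values, their data type, and their display names
targetValues=0,1
targetDataType=int
targetDisplayNames=LowSalary,HighSalary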

A.5.8 Sample_AttributeImportanceBuild.property

The parameter dataPrepStatus indicates whether automatic binning should be applied to the input data (unprepared) or if the data is already binned (discretized).

The target attribute is named in targetAttributeName.

The resultant table contains a list of the attributes ranked by their influence in predicting the target value.
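
For example, because the magazine input table is already binned, Sample_AttributeImportanceBuild.property might contain entries like the following (the values reflect the magazine data described in Table A-1):

# The magazine data is supplied in binned (discretized) form
dataPrepStatus=discretized
# Attribute to rank the predictors against: Purchase (1) or No Purchase (0)
targetAttributeName=MR_MAG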

A.5.9 Sample_AttributeImportanceUsage.property

This program executes the Attribute Importance program on the input data, then uses one of five methods available to create a table containing a subset of the original attributes. It then builds two Naive Bayes models, one using the full attribute set (NB1), the other using the reduced set as input (NB2), and displays the two test results.

The threshold and target parameters are set as for Naive Bayes Build. The accuracy parameter is ignored in the sample code.

The attributeSelection parameter has an integer value from 1 to 5, as follows:

  1. Specify 1 to select attributes whose Importance is above or below (inclusive) a given value.

    Example: attributes with importance value above 0.01:

    threshold = 0.01, aboveThreshold = true
    
    
  2. Specify 2 to select attributes whose Importance is between two given values

    Example: attributes with importance value between 0.01 and 0.02:

    lowerBound = 0.01, upperBound = 0.02
    
    
  3. Specify 3 to select attributes by rank from a list sorted in descending order by Importance, above or below a given rank

    Example: the first five entries from the list:

    threshold = 5, aboveThreshold = true
    
    
  4. Specify 4 to select attributes by rank from a list sorted in descending order by Importance, between two given ranks

    Example: attributes with rank between 3 and 6:

    lowerBound = 3, upperBound = 6
    
    
  5. Specify 5 to select the highest N% or lowest N% of attributes from a list sorted in descending order by Importance

    Example: the first 10% of attributes from the list

    percentage = 0.1, aboveThreshold = true
    

A.5.10 Sample_AssociationRules Property Files

The Sample_AssociationRules program takes one of two property files, depending on the format of the input data table.

The buildData.type parameter is set appropriately to transactional or nonTransactional, depending on which property file is used, and in the transactional case, the column headings must be set. Otherwise the two property files have the same parameters.
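
For instance, the transactional property file might contain entries such as the following sketch; the table-name parameter shown is an assumption, the input table comes from Table A-1, and the column-heading parameters required for transactional data are not shown.

# Transactional format: multiple rows per shopping session
buildData.type=transactional
# Market basket input table (parameter name is an assumption)
buildData.tableName=market_basket_tx_binned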

The function settings parameters are as follows:

A.5.11 Sample_ModelSeeker.property

The Model Seeker sample program creates several Naive Bayes (NB) and Adaptive Bayes Network (ABN) models and compares them by calculating a value called the "Figure of Merit" based on the test results of the models. The model with the best Figure of Merit is saved; the other models are discarded, and only their settings and test results are saved.

Build and Test input data tables are identified, with parameter settings as for the Naive Bayes sample programs.

The AlgorithmSetting parameters specify the model types to be built and the input parameters for each. Setting_1 specifies one NB model and Setting_4 specifies one ABN model; the parameters are as described for the property files of those model types. Settings 2, 3, and 5 specify multiple models, and a list of values is entered for some parameters. For NB, the setting crossProduct indicates that models are built with every possible combination of parameter values, whereas parallelPairs indicates that a model is built using the first value in each list, another model is built using the second value in each list, and so on.

The function settings include dataPreparationStatus (discretized = binned, unprepared = unbinned), targetAttributeName, targetAttributeType (categorical, numerical), and supplementalAttributes (a comma-separated list of attributes to be ignored during model build).

For the purposes of calculating Lift, one target value is designated as "positive". The value, data type, and name used in reports are established in the positiveCategoryTarget parameters.

The parameter modelSeeker.weight assigns a value used in calculating the Figure of Merit. The default value of 1 makes no distinction in cost between types of errors in prediction: false-positive (a prediction of positive when the actual value is negative) and false-negative (a prediction of negative when the actual value is positive). An example is a model to predict that a customer is likely to buy Product X in a telemarketing campaign: the cost of a false-positive is small (the cost of a telephone call) but the cost of a false-negative is high (the revenue for a lost sale). In this case a weight greater than 1 would be set (by experiment). In a case in which the cost of false-positive is higher than false-negative, a weight between 0 and 1 would be used.

The parameter modelSeeker.liftQuantiles sets the number of quantiles displayed in the lift table. For large test data tables, more finely grained results can be observed by setting the number of lift quantiles to 100.
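
A minimal sketch of how these two parameters might be set in Sample_ModelSeeker.property; the weight shown is the default described above, and the lift quantile value is illustrative.

# Equal cost for false-positive and false-negative prediction errors
modelSeeker.weight=1
# Number of quantiles in the lift table; set to 100 for finer-grained results
modelSeeker.liftQuantiles=10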

A.5.12 Sample_ClusteringBuild.property

There are two clustering algorithms available in ODM: k-means and O-Cluster.

For k-means, you define the (maximum) number of clusters, whereas O-Cluster determines the number of clusters as part of the algorithm. The default input table is an artificially generated set of data designed to illustrate the functionality by forming eight distinct clusters.

The clustering parameters include:

A.5.13 Sample_ClusteringApply.property

The Clustering Apply program takes the rules generated by Clustering Build and applies them to a single record or to a table to identify the cluster membership of each record. If the input is a single record, then the clustering model must have been built using automatic binning.

The parameter apply.type indicates whether the input is a record or a table. The modelName is the clusteringOutput.modelName from the Clustering Build program. The miningApplyOutput.ApplyOutputOption indicates the format of the results table, as follows:

A.5.14 Sample_Clustering_Results.property

Sample_Clustering_Results.java illustrates how to get information about the results of applying a clustering model; it returns the following kinds of information:

Sample_Clustering_Results.property requires one input value: the name of the clustering model built with Sample_ClusteringBuild.java and applied using Sample_ClusteringApply.java.

A.5.15 Sample_AdaptiveBayesNetworkBuild.property

An Adaptive Bayes Network can be thought of as a series of small trees that can be combined to produce a set of rules describing how the model makes predictions. These small trees are called Network Features. Each level of a network feature describes one attribute, and the number of branches at each node is the number of distinct values of that attribute. Thus, features can quickly become complex; there are several parameters in Adaptive Bayes Network that control the complexity and therefore the build time:

Other parameters have the same meaning as for Naive Bayes as described in Section A.5.4.

A.5.16 Other Sample_AdaptiveBayesNetwork Property Files

The parameters in the property files Sample_AdaptiveBayesNetworkLiftAndTest.property and Sample_AdaptiveBayesNetworkApply.property are identical to those of the corresponding Naive Bayes property files.

A.5.17 Sample PMML Import and Export Property

ODM provides programs to export a model into a table and to import a model from a table in a format conforming to the emerging Predictive Model Markup Language (PMML) standard. This standard allows a model created by one vendor's data mining utility to be used by another vendor's utility for scoring.

The parameters specify the model to be imported/exported as well as the schema, table, and column containing the model specifications.

A.6 Compiling and Executing ODM Sample Programs

This section provides a brief description of how to compile and execute the ODM sample programs. You can compile and execute a single sample program or all of the sample programs at once.

After ODM is installed on your system, the sample programs, property files, and scripts are in the directory $ORACLE_HOME/dm/demo/sample (UNIX) or %ORACLE_HOME%\dm\demo\sample (Windows).

Each sample program first checks whether it has run previously; if it has, the program cleans up the environment before running again. You can therefore execute these programs more than once without getting errors.

A.6.1 Compiling the Sample Programs

Follow these steps to compile the sample programs:

  1. Set your ORACLE_HOME environment variable.
  2. Ensure that you have installed JDK 1.3.1 or above. JDK 1.3.1 may also be available in ORACLE_HOME with your installation. Set your JAVA_HOME environment variable; it should point to your installed JDK directory or the one available in ORACLE_HOME.
  3. On UNIX, you must set your CLASSPATH environment variable so that it includes the following Oracle9i Java Archive files:
    $ORACLE_HOME/jdbc/lib/classes12.jar
    $ORACLE_HOME/lib/xmlparserv2.jar
    $ORACLE_HOME/rdbms/jlib/jmscommon.jar
    $ORACLE_HOME/rdbms/jlib/aqapi.jar
    $ORACLE_HOME/rdbms/jlib/xsu12.jar
    $ORACLE_HOME/dm/lib/odmapi.jar
    
    

    On Windows, your Classpath system variable must include

    %ORACLE_HOME%\jdbc\lib\classes12.jar
    %ORACLE_HOME%\lib\xmlparserv2.jar
    %ORACLE_HOME%\rdbms\jlib\jmscommon.jar
    %ORACLE_HOME%\rdbms\jlib\aqapi.jar
    %ORACLE_HOME%\rdbms\jlib\xsu12.jar
    %ORACLE_HOME%\dm\lib\odmapi.jar
    
    

    If you use a database character set that is not US7ASCII, WE8DEC, WE8ISO8859P1, or UTF8, you must also include the following in your CLASSPATH:

    $ORACLE_HOME/jdbc/lib/nls_charset12.zip (on UNIX)
    %ORACLE_HOME%\jdbc\lib\nls_charset12.zip (on Windows)

    
    
  4. Before you compile the short programs Sample_NaiveBayesBuild_short.java and Sample_NaiveBayesApply_short.java, you must edit the programs to specify the database URL (DB_URL) and the password for the ODM user. The database URL is a string that specifies the type of JDBC driver used and database details. ODM supports the JDBC thin driver which requires a database URL in the following form:
    "jdbc:oracle:thin:@<host_name>:<port_number>:<sid>"
    
    

    To specify the data mining server, substitute appropriate values for DB_URL and ODM password in the following line in both short sample programs:

    dms = new DataMiningServer("DB_URL", "odm", "ODM password");
    
    

    The location access data (the name and schema of the input table) has already been specified in both programs for your convenience.

  5. For all other sample programs except for the short sample programs, you must edit the Sample_Global.property file to specify the database URL, the ODM user name, and the password for the ODM user. Replace the strings MyHost, MyPort, MySid, MyName, and MyPW with the appropriate values for your system. MyName refers to the ODM user and must be replaced with odm. MyPW is the password specified during ODM configuration or while unlocking the ODM user account. By default, the password is odm.

    For example:

    miningServer.url=jdbc:oracle:thin:@odmserver.company.com:1521:orcl
    miningServer.userName=odm
    miningServer.password=odm
    inputDataSchemaName=odm_mtr
    outputSchemaName=odm_mtr
    
    
  6. You can compile each ODM sample program or all of them at once by running one of the provided scripts.

    To compile a specific sample program, execute one of the following scripts as shown:

    On UNIX platforms:

    /usr/bin/sh compileSampleCode.sh <filename>
    
    

    For example, to compile Sample_ModelSeeker.java:

    /usr/bin/sh compileSampleCode.sh Sample_ModelSeeker.java
    
    

    On Windows platforms:

    compileSampleCode.bat <filename>
    
    

    For example, to compile Sample_ModelSeeker.java:

    compileSampleCode.bat Sample_ModelSeeker.java
    
    

    To compile all of the programs, use one of the scripts with the parameter all:

    On UNIX platforms:

    /usr/bin/sh compileSampleCode.sh all
    
    

    On Windows platforms:

    compileSampleCode.bat all
    

A.6.2 Executing the Sample Programs

  1. Before you execute a sample program, log in to the ODM user schema in the database and type the following command to turn on the ODM task monitor:
    exec odm_start_monitor
    
    

    Generally, you will not have to start the monitor again unless you stop it manually or the job associated with the task monitor becomes broken. If you do not start the task monitor, any data mining tasks pending execution will hang.

  2. Each of the sample programs uses Sample_Global.property and a program-specific property file. For example, Sample_ModelSeeker.java requires Sample_ModelSeeker.property. To execute a specific sample program use one of the provided scripts. If you do not specify a property file, the script assumes the default property file for the specified program. Note that Sample_AssociationRules.java has a choice of two distinct property files; the desired property file for this sample program must be specified explicitly.

    The short sample programs do not require property files. To execute them, use the script without specifying any property files.

    To execute a specific sample program, execute one of the scripts as follows:

    On UNIX platforms:

    /usr/bin/sh executeSampleCode.sh <classname> [<property file>]
    
    

    For example:

    /usr/bin/sh executeSampleCode.sh Sample_ModelSeeker
    /usr/bin/sh executeSampleCode.sh Sample_ModelSeeker myFile.property
    
    

    On Windows platforms:

    executeSampleCode.bat <classname> [<property file>]
    
    

    For example:

    executeSampleCode.bat Sample_ModelSeeker
    executeSampleCode.bat Sample_ModelSeeker myFile.property
    
    
  3. The ODM sample programs must be executed in the correct order. For example:
    • For a given model type, the sample build program must be executed before test, apply, or PMML export can be executed.
    • For discretization, Sample_Discretization_CreateBinBoundaryTables must be executed before Sample_Discretization_UseBinBoundaryTables can be executed.
    • You must execute Sample_NaiveBayesBuild_short before Sample_NaiveBayesApply_short can be executed.

    To execute all of the programs in the correct order (except for Sample_NaiveBayesBuild_short.java and Sample_NaiveBayesApply_short.java), use one of the execution scripts with the parameter all:

    On UNIX platforms:

    /usr/bin/sh executeSampleCode.sh all
    
    

    On Windows platforms:

    executeSampleCode.bat all
    
    

    Note:

    If you use the all parameter, you must use the default names for the program-specific property files. The Association Rules sample program is executed twice, once with each of the two property files described in Section A.5.10.