Oracle® Data Mining Concepts 11g Release 1 (11.1) Part Number B28129-01


This chapter describes Non-Negative Matrix Factorization, the unsupervised algorithm used by Oracle Data Mining for feature extraction.
Non-Negative Matrix Factorization (NMF) is a feature extraction algorithm that decomposes multivariate data by creating a user-defined number of features, which results in a reduced representation of the original data.
Note:
Non-Negative Matrix Factorization (NMF) is described in the paper "Learning the Parts of Objects by Non-Negative Matrix Factorization" by D. D. Lee and H. S. Seung in Nature (401, pages 788-791, 1999).
NMF decomposes a data matrix V into the product of two lower rank matrices W and H so that V is approximately equal to W times H. NMF uses an iterative procedure to modify the initial values of W and H so that the product approaches V. The procedure terminates when the approximation error converges or the specified number of iterations is reached.
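The iterative procedure described above can be sketched in plain Python. This is only an illustration, not Oracle Data Mining's implementation: it assumes the multiplicative update rules for squared error that are commonly associated with Lee and Seung's work, and the names `nmf`, `max_iter`, `tol`, and `seed` are hypothetical. It stops when the change in approximation error falls below `tol` or when `max_iter` iterations have run, mirroring the two termination conditions named in the text.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, max_iter=500, tol=1e-6, seed=0, eps=1e-12):
    """Factor a non-negative n x m matrix V into W (n x k) and H (k x m).

    Assumption: multiplicative updates for squared error; the source
    does not say which update rule the product uses.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random starting values for W and H.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    history = [squared_error(v, w, h)]
    for _ in range(max_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
        history.append(squared_error(v, w, h))
        # Stop once the approximation error has converged.
        if abs(history[-2] - history[-1]) < tol:
            break
    return w, h, history

# Example: a small non-negative matrix reduced to k = 2 features.
V = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
W, H, errors = nmf(V, k=2)
print(len(errors) - 1, "iterations, final squared error:", errors[-1])
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative as long as V and the starting values are non-negative.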
Each feature is a linear combination of the original attribute set; the coefficients of these linear combinations are non-negative.
During model apply, an NMF model maps the original data into the new set of attributes (features) discovered by the model.
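One way to picture the apply step is to hold the feature matrix fixed and solve only for the non-negative feature values of a new row. The sketch below is an assumption made for illustration: the source does not describe how Oracle Data Mining scores new data, and `apply_nmf`, `row`, and `h` are hypothetical names. Here `h` holds one feature per row, with one non-negative coefficient per original attribute.

```python
def apply_nmf(row, h, n_iter=200, eps=1e-12):
    """Map one data row (length m) to k non-negative feature values.

    Assumption: H (k x m) is held fixed and the feature values w are
    found with multiplicative updates that reduce ||row - w H||^2.
    """
    k = len(h)
    # row * H^T: how strongly the row aligns with each feature.
    vht = [sum(x * c for x, c in zip(row, h[a])) for a in range(k)]
    # H * H^T: overlap between features.
    hht = [[sum(x * y for x, y in zip(h[a], h[b])) for b in range(k)]
           for a in range(k)]
    w = [1.0] * k
    for _ in range(n_iter):
        whht = [sum(w[b] * hht[b][a] for b in range(k)) for a in range(k)]
        w = [w[a] * vht[a] / (whht[a] + eps) for a in range(k)]
    return w

# Two features over three attributes; the row is exactly 2*f1 + 3*f2.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(apply_nmf([2.0, 3.0, 3.0], features))
```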
Text mining involves extracting information from unstructured data. Typically, text data is high-dimensional and sparse. Unsupervised algorithms like Principal Components Analysis (PCA), Singular Value Decomposition (SVD), and NMF involve factoring the document-term matrix based on different constraints. One widely used approach for text mining is latent semantic analysis.
NMF focuses on reducing dimensionality. By comparing the vectors for two adjoining segments of text in a high-dimensional semantic space, NMF provides a characterization of the degree of semantic relatedness between the segments. NMF is less complex than PCA and can be applied to sparse data. NMF-based latent semantic analysis is an attractive alternative to SVD approaches due to the additive non-negative nature of the solution and the reduced computational complexity and resource requirements.
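Comparing the vectors of two text segments can be illustrated with cosine similarity over their feature values. The text does not name a specific measure, so cosine similarity is an assumption here, and the function name and sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical feature values for two adjoining text segments.
segment_a = [0.8, 0.1, 0.0]
segment_b = [0.6, 0.2, 0.1]
print(cosine_similarity(segment_a, segment_b))
```

Because NMF feature values are non-negative, the result falls between 0 (no shared features) and 1 (same direction).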
The presence of outliers can significantly impact NMF models. Use a clipping transformation before you bin or normalize the table to avoid the problems caused by outliers.
NMF may benefit from normalization.
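Outliers combined with min-max normalization cause poor matrix factorization, because a single extreme value sets the range and compresses the remaining values into a narrow band. To improve the matrix factorization, you need to decrease the error tolerance, which in turn leads to longer build times.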
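The effect of an outlier on min-max normalization, and the benefit of clipping first, can be seen in a small Python sketch. It does not use any Oracle transformation API; the helper names, the sample values, and the clipping bounds are hypothetical and chosen by hand for illustration.

```python
def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

def clip(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [min(max(x, lower), upper) for x in values]

raw = [1.0, 2.0, 3.0, 4.0, 1000.0]  # the last value is an outlier

# Without clipping, the outlier squeezes the other values toward 0.
squeezed = min_max(raw)

# Clipping first (bounds assumed here) keeps the values spread out.
spread = min_max(clip(raw, 1.0, 5.0))

print(squeezed)
print(spread)
```

In practice the clipping bounds would usually be derived from the data, for example from percentiles, rather than fixed by hand.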