Description: Compute kernel-based eigenvectors for a set of training examples.
Usage: kernel-pca [options] -train <filename>
Input:
- -train <filename> - An RDB file of training examples. The first column contains labels, and the remaining columns contain real-valued features.
Output: An RDB matrix in which each column corresponds to an eigenvector. Eigenvectors are normalized so that the dot product of the eigenvector with itself equals the reciprocal of the corresponding eigenvalue. In the output, the eigenvectors are sorted by increasing magnitude.
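The normalization described above can be illustrated with a short NumPy sketch. This is not the program's own code: the function name `scaled_eigenvectors` and the toy feature matrix are hypothetical, and the `eigenthresh` argument only mimics the role of the -eigenthresh option. It starts from unit-length eigenvectors and divides each by the square root of its eigenvalue, so that the dot product of each column with itself equals 1/eigenvalue.

```python
import numpy as np

def scaled_eigenvectors(kernel, eigenthresh=0.0):
    """Return (eigenvalues, eigenvectors); each column a satisfies a.a = 1/eigenvalue."""
    # eigh handles symmetric matrices and returns unit-length eigenvectors as columns.
    eigenvalues, unit_vectors = np.linalg.eigh(kernel)
    # Keep only eigenvalues above the threshold; the scaling needs positive values.
    keep = eigenvalues > eigenthresh
    eigenvalues = eigenvalues[keep]
    unit_vectors = unit_vectors[:, keep]
    # Dividing a unit vector by sqrt(eigenvalue) gives a self dot product of 1/eigenvalue.
    return eigenvalues, unit_vectors / np.sqrt(eigenvalues)

# Hypothetical toy data: three examples with two features, dot-product base kernel.
features = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
kernel = features @ features.T
eigenvalues, eigenvectors = scaled_eigenvectors(kernel, eigenthresh=1e-9)
self_products = np.sum(eigenvectors * eigenvectors, axis=0)
```

Here `self_products * eigenvalues` is 1 for every retained column, up to floating-point error.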
Options:
The following six options modify the base kernel function. The operations occur in the order listed below.
- -matrix - By default, the base kernel function is a dot product. This option allows that function to be replaced by an arbitrary function. Read a kernel matrix, rather than training set examples, from the file specified by '-train'. The matrix is an n+1 by n+1 RDB matrix, where n is the number of training examples. The first row and column contain data labels. The matrix entry for row x, column y, contains the kernel value K(x,y).
- -zeromean - Subtract from each element in the input data the mean of the elements in that row, giving the row a mean of zero.
- -varone - Divide each element in the input data by the standard deviation of the elements in that row, giving the row a variance of one.
- -adddiag <value> - Add the given value to the diagonal of the kernel matrix.
- -normalize - Normalize the kernel matrix by dividing K(x,y) by sqrt(K(x,x) * K(y,y)).
- -constant <value> - Add a given constant to the kernel. The default constant is 1.
- -coefficient <value> - Multiply the kernel by a given coefficient. The default coefficient is 1.
- -power <value> - Raise the kernel to a given power. The default power is 1.
- -radial - Convert the kernel to a radial basis function. If K is the base kernel, this option creates a kernel of the form exp[-D(x,y)^2 / (2 w^2)], where w is the width of the kernel (see below) and D(x,y) is the distance between x and y, defined as D(x,y) = sqrt[K(x,x)^2 - 2 K(x,y) + K(y,y)^2].
- -widthfactor <value> - The width w of the radial basis kernel is set using a heuristic: it is the median of the distance from each training point to the nearest training point. This option specifies a multiplicative factor to be applied to that width.
- -nocenter - PCA requires a centered matrix, in which the sum of each column is zero. This centering operation can be performed in kernel space, and is done by default. The -nocenter option disables this operation. This option is only useful in conjunction with the -kernelout option, to produce an intermediate matrix.
- -numeigens <value> - Include in the output at most the specified number of eigenvectors (subject to the next constraint). By default, all are included.
- -eigenthresh <value> - Include in the output only eigenvectors whose corresponding eigenvalues are greater than the specified value. Default value is zero (all eigenvectors).
- -eigenvalues <file> - Create a file with the given name and store the eigenvalues there as a space-separated array of numbers.
- -noformatline - Usually, RDB formatted files contain column width information on the second line of the file. With this option, the program does not expect a format line in the input file and does not produce a format line in the output file.
- -kernelout - Compute and print the kernel matrix to stdout. Do not compute the eigenvectors.
- -notime - Do not include timing information in the output header.
- -verbose 1|2|3|4|5 - Set the verbosity level of the output to stderr. The default level is 2.
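
As a rough illustration of several of the kernel operations listed above, the following NumPy sketch applies the -normalize formula, then the -constant, -coefficient and -power steps, and finally the kernel-space centering that -nocenter disables. The function names and toy data are hypothetical. The combined form (coefficient * (K + constant))^power is an assumption based on the listed order of the options, and the centering formula is the standard one for kernel matrices rather than something confirmed from the program's source.

```python
import numpy as np

def normalize_kernel(kernel):
    """Divide K(x,y) by sqrt(K(x,x) * K(y,y)), as described for -normalize."""
    scale = np.sqrt(np.diag(kernel))
    return kernel / np.outer(scale, scale)

def polynomial_kernel(kernel, constant=1.0, coefficient=1.0, power=1.0):
    """Add a constant, multiply by a coefficient, then raise to a power.

    The order of operations is an assumption taken from the option listing.
    """
    return (coefficient * (kernel + constant)) ** power

def center_kernel(kernel):
    """Standard kernel-space centering: K - JK - KJ + JKJ, where J has entries 1/n."""
    n = kernel.shape[0]
    j = np.full((n, n), 1.0 / n)
    return kernel - j @ kernel - kernel @ j + j @ kernel @ j

# Hypothetical toy data: three examples with two features, dot-product base kernel.
features = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
base = features @ features.T
normalized = normalize_kernel(base)
transformed = polynomial_kernel(normalized, constant=1.0, coefficient=1.0, power=2.0)
centered = center_kernel(transformed)
```

After normalization every diagonal entry is 1, and after centering each column of the matrix sums to zero, which matches the centering requirement stated for -nocenter.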