Description: Train a support vector machine using a simple iterative update procedure first described by Jaakkola, Diekhans and Haussler.
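For intuition, the procedure performs coordinate-wise ascent on the SVM dual objective, repeatedly updating one example weight at a time until the objective stops changing. The following Python sketch illustrates an update of this flavor; it is a simplified illustration under our own naming and stopping rule, not the program's actual implementation.

    import numpy as np

    def jdh_train(K, y, threshold=1e-6, max_iter=10000, C=np.inf):
        """Coordinate ascent on the SVM dual.  K is an n x n kernel matrix,
        y is a numpy array of +1/-1 labels, C bounds the weight magnitudes."""
        n = len(y)
        alpha = np.zeros(n)                        # per-example weights
        prev_obj = -np.inf
        for _ in range(max_iter):
            for i in range(n):
                f_i = np.dot(alpha * y, K[:, i])   # discriminant of example i
                # step to the coordinate-wise optimum, clipped to [0, C]
                alpha[i] = min(max(alpha[i] + (1.0 - y[i] * f_i) / K[i, i], 0.0), C)
            # dual objective; stop when its change drops below the threshold
            obj = alpha.sum() - 0.5 * np.dot(alpha * y, K @ (alpha * y))
            if abs(obj - prev_obj) < threshold:
                break
            prev_obj = obj
        return alpha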
Usage:
gist-train-svm [options] -train <train filename> -class <class filename>
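For example, assuming the results are written to standard output (the file names here are invented):

    gist-train-svm -train train.txt -class class.txt > train.weights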
Input:
- <train filename> - a tab-delimited file of training examples. The first column contains labels, and the remaining columns contain real-valued features. Missing values are not allowed.
- <class filename> - a multi-column, tab-delimited file of training set labels. This file must contain exactly the same number of lines as the training data file. The first column contains labels, which must appear in the same order as in the training data file. The second and subsequent columns contain binary classifications (1 for positive examples, -1 or 0 for negatives). The classification column used from this file is the first one by default; subsequent columns can be used by invoking the -useclassnumber option described below.
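For concreteness, a minimal (invented) pair of input files might look like the following, with tab-separated columns:

    train.txt:
    gene1	0.52	-1.20	0.12
    gene2	-0.33	0.87	1.05
    gene3	1.41	0.04	-0.66

    class.txt:
    gene1	1
    gene2	-1
    gene3	1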
Output: A five-column, tab-delimited file. The first two columns are identical to the classification file that was provided as input. Column three contains learned weights for the SVM, each multiplied by the corresponding label. Columns four and five contain the predicted classification and the corresponding discriminant value. This output file is suitable for input to classify.
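Continuing the invented example above, and with invented values shown only to illustrate the column layout (label, given classification, weight times label, predicted classification, discriminant):

    gene1	1	1.8342	1	1.0415
    gene2	-1	-0.0000	-1	-0.9127
    gene3	1	0.4519	1	0.8823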
Options:
- -useclassnumber <value> - If the class file contains multiple classes, use the class indicated by this number. The first column of class labels is column 1. If this option is omitted, the first column of classifications is used.
- -initial <file> - Initialize the weights to the given values. The weights should appear in column 3 of the file. Output files produced by this program may be used to initialize the weights.
- -hyperplane <file> - Print to the given file the coordinates of the hyperplane. This option is only allowed when the kernel is linear; i.e., when the -radial, -power, -coefficient and -matrix options are not specified. The resulting file can be used as input to the gist-fast-classify program.
- -holdout <percent> - Add two additional columns to the output, which will contain the predicted classification and corresponding discriminant values computed via hold-one-out cross-validation. The specified <percent> determines what percentage of the training set will be randomly selected for hold-one-out cross-validation. For the remaining, non-held-out examples, the final two columns will contain the value "NaN". Note that the weights reported when this switch is used are the weights obtained from training on the entire data set.
- -zeromeanrow - Subtract from each element in the input data the mean of the elements in that row, giving the row a mean of zero.
- -varone - Divide each element in the input data by the standard deviation of the elements in that row, giving the row a variance of one.
- -matrix - By default, the base kernel function is a dot product. This option allows that function to be replaced by an arbitrary function supplied by the user (for many commonly used kernels, see the options listed below). If supplied, the software reads kernel values, rather than raw feature data, from the file specified by -train. The matrix must be an n+1 by n+1 tab-delimited matrix, where n is the number of training examples. The first row and column contain data labels. The matrix entry for row x, column y, contains the kernel value K(x,y). Note that if the kernel matrix was generated by gist-train-svm (and then possibly modified), you should also set "-constant 0 -nonormalize" when using -matrix, to avoid applying these transformations to the kernel matrix a second time. Note also that special options must be invoked during later classification with classify when the SVM was trained using the -matrix option; these include the -selftrain and -selftest options. See the documentation for classify for details.
The following four options control feature selection, which is only available in conjunction with hold-one-out cross-validation. In order to perform feature selection on distinct training and test sets, you must first use fselect to select a feature subset.
- -fselect fisher|ttest|welch|mannwhitney|sam|tnom - Specify the metric used to evaluate individual features. See the documentation for fselect for more information.
- -fthreshtype percent|number|value - Select different means of setting the feature selection threshold. The "percent" option chooses the top n% of the features. The "number" option chooses the top n features. The "value" option chooses features that score above n. The default setting is "percent".
- -fthreshold <value> - Set the threshold for feature selection. The default setting depends upon the threshold type: for "percent" and "number", the default is 10; for "value" it is 1. This threshold is not to be confused with the SVM stopping criterion, which is set by -threshold.
- -fscores <file> - Write to the given file a matrix containing the computed quality scores for each feature. Each row corresponds to one feature. The first column contains the feature name, and the second column contains the Fisher score, the t-test score, or the negative log2 of the t-test p-value. If the -holdout option is specified, additional columns are included, one for each held-out example.
The following eight options modify the base kernel function. The operations occur in the order listed below; a sketch illustrating the whole sequence appears after the options list.
- -adddiag <value> - Add the given value to the diagonal of the training kernel matrix. This option effects a 2-norm soft margin and should therefore not be used in conjunction with the -posconstraint and -negconstraint options. The default value is 0.
- -nonormalize - Do not normalize the kernel matrix. By default, the matrix is normalized by dividing K(x,y) by sqrt(K(x,x) * K(y,y)).
- -constant <value> - Add a given constant to the kernel. The default constant is 10.
- -coefficient <value> - Multiply the kernel by a given coefficient. The default coefficient is 1.
- -power <value> - Raise the kernel to a given power. The default power is 1.
- -radial - Convert the kernel to a radial basis function. If K is the base kernel, this option creates a kernel of the form exp[-D(x,y)^2 / (2 w^2)], where w is the width of the kernel (see below) and D(x,y) is the distance between x and y, defined as D(x,y) = sqrt[K(x,x) - 2 K(x,y) + K(y,y)].
- -widthfactor <value> - The width w of the radial basis kernel is set using a heuristic: it is the median of the distance from each positive training point to the nearest negative training point. This option specifies a multiplicative factor to be applied to that width. The default is a width factor of 1.
- -width <value> - Directly set the width w of the radial basis kernel. If set, this option overrides the -widthfactor option.
- -diagfactor <value> - Add to the diagonal of the kernel matrix (n+/N) * m * k, where n+ is the number of positive training examples if the current example is positive (and similarly for negative training examples), N is the total number of training examples, m is the median value of the diagonal of the kernel matrix, and k is the value specified here. This option effects a 2-norm soft margin and should therefore not be used in conjunction with the -posconstraint and -negconstraint options. The default diagonal factor is 0.1. Note that the diagonal factor is not applied if the -kernelout option is set.
- -posconstraint <value> - Set an explicit upper bound on the magnitude of the weights for positive training examples. By default, the magnitude is unconstrained. Note that this option (and the next) should be used in combination with a -diagfactor of 0.
- -negconstraint <value> - Set an explicit upper bound on the magnitude of the weights for negative training examples. By default, the magnitude is unconstrained.
- -rdb - Allow the program to read and create RDB formatted files, which contain an additional format line after the first line of text.
- -kernelout - Compute and print the kernel matrix to stdout. Do not compute the weights and do not add to the kernel diagonal.
- -threshold <value> - Set the convergence threshold. Training halts when the objective function changes by less than this amount. Default is 0.000001. Note that lowering the threshold also increases the precision with which weights are reported by the program. Note that this threshold has nothing to do with feature selection: the feature selection threshold is set with -fthreshold.
- -maxiter <value> - Set the maximum number of iterations for the optimization routine. Default is no limit. If -maxiter and -maxtime are both set, the optimization will stop when either limit is reached. If a limit is reached, the program quits without producing any output.
- -maxtime <seconds> - Set the maximum time in seconds for the optimization routine. This limit applies only to a single SVM optimization, not to the total running time of hold-one-out cross-validation. Default is no limit. If -maxiter and -maxtime are both set, the optimization will stop when either limit is reached. If a limit is reached, the program quits without producing any output.
- -seed <value> - Set the seed for the random number generator. By default the seed is set from the clock.
- -notime - Do not include timing information in the output header.
- -precision <value> - Number of digits after the decimal point in the output file. By default, this value is set to the maximum of 4 and the magnitude of the log of the convergence threshold plus 2, so a smaller threshold yields more reported digits.
- -verbose 1|2|3|4|5 - Set the verbosity level of the output to stderr. The default level is 2.
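To make the order of the kernel operations concrete, the following Python sketch strings together the transformations exactly as the formulas above describe them (normalization, constant, coefficient, power, radial conversion with the width heuristic, and the diagonal factor). It is an illustration of this documentation, not the program's source; labels are assumed to be +1/-1.

    import numpy as np

    def transform_kernel(K, y, constant=10.0, coefficient=1.0, power=1.0,
                         radial=False, widthfactor=1.0, width=None,
                         diagfactor=0.1, normalize=True):
        """Apply the kernel options in the order listed above.
        K is an n x n base kernel matrix; y is a numpy array of +1/-1 labels."""
        K = K.astype(float).copy()
        if normalize:
            # K(x,y) / sqrt(K(x,x) * K(y,y)); guard against zero diagonals,
            # cf. the zero-valued diagonal warning in the next section
            d = np.sqrt(np.diag(K))
            d[d == 0] = 1e-10
            K = K / np.outer(d, d)
        K = (K + constant) * coefficient      # -constant, then -coefficient
        K = K ** power                        # -power
        if radial:
            # D(x,y)^2 = K(x,x) - 2 K(x,y) + K(y,y)
            d2 = np.add.outer(np.diag(K), np.diag(K)) - 2.0 * K
            d2 = np.maximum(d2, 0.0)
            if width is None:
                # heuristic: median over positives of the distance to the
                # nearest negative, scaled by the width factor
                dist = np.sqrt(d2[y == 1][:, y == -1])
                width = widthfactor * np.median(dist.min(axis=1))
            K = np.exp(-d2 / (2.0 * width ** 2))
        # -diagfactor: add (n+/N) * m * k to each diagonal entry, where n+ is
        # the count of examples sharing the current example's label
        m = np.median(np.diag(K))
        N = len(y)
        same = np.where(y == 1, (y == 1).sum(), (y == -1).sum())
        K[np.diag_indices(N)] += same / N * m * diagfactor
        return K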
Warning messages:
Warning: Zero-valued diagonal kernel element:row 0 = 2.99937, col 163 = 0.
This warning is issued when normalization of the kernel causes a divide-by-zero error. The message above indicates that the (unnormalized) 163rd diagonal kernel matrix entry (i.e., at row 163 and column 163) has a value of zero. Therefore, when normalizing the kernel value in row 0, column 163, Gist attempts to divide by zero, which is not possible. Gist circumvents the problem by dividing instead by a very small number (0.0000000001), but the warning is issued to let you know about the problem. Note that in order to avoid producing too many warnings, the warning is issued only once, even though the divide-by-zero happens many times (in this case, it will also happen in row 1, 2, etc.). Typically, this error is caused by a row of all zeroes in the input data.
Warning: possible underflow in radial_kernel
This warning indicates that the exponentiation carried out by the radial basis function produced a value smaller than the computer can represent. This can occur when your data contains many features and many large values. A possible solution is to scale your data, for example by using a small -coefficient value. Note that the warning is issued only once, even if the underflow occurs many times.
Warning: Terminating after failure to converge in 10000 iterations.
Warning: Terminating after failure to converge in 10000 seconds.
These warnings are issued if the computation time exceeds the limits imposed by the -maxiter and -maxtime options.
Warning: Using 1-norm and 2-norm soft margins simultaneously.
This warning is issued when the soft margin is specified in two ways, both by constraining the magnitude of the weights (using -posconstraint and -negconstraint) and by adding to the diagonal of the kernel matrix (using -diagfactor). The optimization will still work, but it is somewhat odd to use both types of margin at once. Note that, by default, the 2-norm soft margin is used.
Bugs:
- The program does not verify that the labels in the class file match the labels in the data file.
- Tie breaking is not actually random, in the sense that each run will give the same results: the decision depends only on the original order of the examples in the input files.