gist-rfe

Description:

Perform recursive feature elimination (RFE), following
I. Guyon, J. Weston, S. Barnhill and V. Vapnik. "Gene selection for cancer classification using support vector machines." Machine Learning. 46(1-3):389-422, 2002.
This is an algorithm for selecting a subset of features for a particular learning task. The basic algorithm is the following:

1. Initialize the data set to contain all features.
2. Train an SVM on the data set.
3. Rank features according to c_i = (w_i)².
4. Eliminate the lower-ranked 50% of the features.
5. If more than one feature remains, return to step 2.
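
The loop described above can be sketched in a few lines of Python. This is an illustration only, not the gist-rfe implementation: the function names (train_linear_svm, rfe) and their parameters are hypothetical, and the trainer is a simple stand-in (batch subgradient descent on the regularized hinge loss, no bias term) rather than gist-train-svm. The sketch stops once a single feature remains, as in the final step of the algorithm.

```python
def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Stand-in trainer (assumption, not gist-train-svm): batch subgradient
    descent on the L2-regularized hinge loss. Returns the weight vector w."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rfe(X, y, reduce_percent=50, max_iter=None, train=train_linear_svm):
    """Recursive feature elimination over the columns of X.

    Returns (surviving feature indices, history). history holds one list per
    iteration of (feature index, iteration, score c_i, discarded?) tuples."""
    active = list(range(len(X[0])))          # all features to start with
    history = []
    iteration = 0
    while len(active) > 1 and (max_iter is None or iteration < max_iter):
        iteration += 1
        X_active = [[row[j] for j in active] for row in X]
        w = train(X_active, y)               # train on the current feature set
        scores = [wj * wj for wj in w]       # c_i = (w_i)^2
        # Drop the lowest-scoring features, always at least one.
        n_drop = max(1, int(len(active) * reduce_percent / 100))
        lowest_first = sorted(range(len(active)), key=lambda k: scores[k])
        dropped = set(lowest_first[:n_drop])
        history.append([(active[k], iteration, scores[k], k in dropped)
                        for k in range(len(active))])
        active = [active[k] for k in range(len(active)) if k not in dropped]
    return active, history


# Toy data: only feature 0 carries the label; the others are near zero.
X = [[ 2.0,  0.1, -0.1, 0.0],
     [ 1.5, -0.1,  0.1, 0.0],
     [-2.0,  0.1,  0.1, 0.0],
     [-1.5, -0.1, -0.1, 0.0]]
y = [1, 1, -1, -1]

selected, history = rfe(X, y)
print("selected features:", selected)
print("iterations run:", len(history))
```

Passing a different callable as train swaps in another linear learner without changing the elimination loop; reduce_percent and max_iter play the roles of the -reduce and -rfe-iter options described under Options.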

When using this algorithm, beware of incurring a selection bias. For details, see
C. Ambroise and G. J. McLachlan. "Selection bias in gene extraction on the basis of microarray gene-expression data." PNAS. 99:6562-6566, 2002.

Usage: gist-rfe [options] <train data> <train labels>

Inputs:

train data - a data file suitable for input to compute-weights
train labels - the corresponding label file

Output:

score-svm-results

-test

Options:

-test <data> <labels> - Evaluate performance with respect to an external test set.
-rfe-iter <value> - Perform at most <value> iterations of RFE. By default, the algorithm continues until all features are eliminated.
-reduce <percent> - Percent of the features in the training set to eliminate each round. At least one feature will always be removed. Default = 50.
-features <file> - File in which to print the features selected at each iteration. The output is in four columns: feature name, iteration, score (c_i) and a binary value indicating whether the feature was discarded in this iteration. If the filename contains a %, then multiple output files will be created, replacing % with the iteration number.
-weights <file> - File in which to store weights. The same use of % applies.
-predicts <file> - File in which to store predictions. The same use of % applies.
In addition, any option that is valid for gist-train-svm may also be given to gist-rfe.
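
The % convention shared by -features, -weights and -predicts, and the four-column layout of the -features file, can be illustrated with a short Python sketch. It is not taken from the gist-rfe source: the helper names (expand_name, format_rows) are hypothetical, and both the tab separator and the replacement of every % in the name (rather than only the first) are assumptions.

```python
def expand_name(template, iteration):
    """Replace % in an output file name with the iteration number.
    Assumption: every % is replaced; a name without % is returned unchanged."""
    return template.replace("%", str(iteration))


def format_rows(rows):
    """Render (feature name, iteration, score, discarded) tuples as text with
    one feature per line. Assumption: columns are tab-separated and the
    discarded flag is written as 1 or 0."""
    return "".join(
        f"{name}\t{iteration}\t{score:g}\t{int(discarded)}\n"
        for name, iteration, score, discarded in rows
    )


# Hypothetical feature names and scores for a single iteration.
rows = [("gene_a", 1, 0.25, True), ("gene_b", 1, 4.0, False)]
print(expand_name("features-%.txt", 1))
print(format_rows(rows), end="")
```

With a template such as features-%.txt, each iteration gets its own file; without a %, a single file name is used throughout.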

Calls: gist-train-svm, gist-classify, gist-score-svm