Usage: gist-sigmoid <train labels> <train predictions> <test predictions>
Description:
Fit a sigmoid function to the discriminant values produced by an SVM, and use the sigmoid to compute probabilities. This program is based upon pseudocode given in "Probabilistic outputs for support vector machines and comparison to regularized likelihood methods" by Platt.
Typically, to use Gist, you begin with a training set of data, a corresponding set of labels, and a test set of unannotated data that you would like to classify. In order to use gist-sigmoid, follow these steps:
- Divide your set of labeled training data into an SVM training set and a sigmoid training set. The sigmoid training set can be smaller than the SVM training set; for example, you might randomly extract 10% of your training set to be used in sigmoid training.
- Run compute-weights on the SVM training set.
- Run classify on the sigmoid training set, using the weights generated from the SVM training set.
- Run classify on the unannotated set of test data, also using the weights generated from the SVM training set. This is the data set for which you would like to get probability estimates.
- Finally, run gist-sigmoid using the sigmoid training set predictions generated by classify, the true labels of the sigmoid training set, and the predictions for the unannotated test set.
It is important that the three data sets -- SVM training set, sigmoid training set, and unannotated test set -- be disjoint. Otherwise, the probability estimates that you obtain from gist-sigmoid will be skewed.
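The first step, dividing the labeled data into disjoint SVM and sigmoid training sets, might be sketched in Python as follows. This is only an illustration and not part of Gist: the function name split_labeled_examples, the 10% default, and the fixed random seed are all assumptions made for the example.

```python
import random


def split_labeled_examples(example_ids, sigmoid_fraction=0.1, seed=0):
    """Randomly split example IDs into disjoint SVM and sigmoid training sets.

    Hypothetical helper, not a Gist program. Returns (svm_ids, sigmoid_ids).
    """
    ids = list(example_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    # Hold out at least one example for sigmoid training.
    n_sigmoid = max(1, round(len(ids) * sigmoid_fraction))
    sigmoid_ids = ids[:n_sigmoid]
    svm_ids = ids[n_sigmoid:]
    return svm_ids, sigmoid_ids
```

Because both lists are slices of one shuffled list, no example can appear in both sets. The unannotated test set must still be kept separate from both.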
Inputs:
Output:
The program prints to standard output a version of the test predictions file with an additional column. This column contains probabilities corresponding to each of the given discriminant values. The parameters of the sigmoid (A and B) are included in an additional line at the end of the header. The formula for converting a discriminant value X into a probability is 1 / (1 + exp(A * X + B)). Also, the second column of the predictions file contains binary class predictions based upon the probabilities (using a threshold of 50%), rather than based upon the discriminants.
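As a rough illustration of the formula above, the following Python sketch converts a discriminant value into a probability and then into a class prediction. The function names, the use of 1 and -1 as class labels, and the treatment of a probability of exactly 50% as positive are assumptions for the example, not documented behavior of gist-sigmoid. The branch on the sign of A * X + B only avoids overflow in exp for large arguments; both branches compute the same quantity.

```python
import math


def sigmoid_probability(x, a, b):
    """Return 1 / (1 + exp(a * x + b)), evaluated without overflow."""
    z = a * x + b
    if z >= 0:
        # Algebraically equal to 1 / (1 + exp(z)), but exp(-z) cannot overflow.
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def predicted_class(probability, threshold=0.5):
    """Binary prediction from a probability (tie handling is an assumption)."""
    return 1 if probability >= threshold else -1
```

With this form of the sigmoid, a fitted A is normally negative, so that larger discriminant values map to larger probabilities.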
Options:
-algorithm [platt|lin]
Selecting the lin option activates an alternative optimization routine.
This is based upon pseudocode provided by Lin and colleagues (Lin H,
Lin C, Weng R. "A note on Platt's probabilistic outputs for support
vector machines", Technical report, Department of Computer Science and
Information Engineering, National Taiwan University, 2003). This code
was supplied by Michael E. Matheny (mmatheny@dsg.harvard.edu).
Calls: none