[MarkLogic Dev General] Saving trained 'supports' SVM classifier
lists at hubmed.org
Fri Dec 12 08:58:02 PST 2008
2008/12/11 Mary Holstege <mary.holstege at marklogic.com>:
> On Thu, 11 Dec 2008 09:13:43 -0800, Alf Eaton <lists at hubmed.org> wrote:
>> I've been trying to use the SVM classifier (MarkLogic 4.0-1) to
>> classify a set of documents, but ran into a problem when trying to
>> save a trained 'supports' classifier between runs. The problem seems
>> to be that the saved classifier identifies documents in the training
>> set using a temporary ID, which is no longer valid when the
>> classification of the test set is performed. With the 'weights'
>> classifier it works fine.
>> Here's the error message:
>> Invalid classifier specification element:
>> -- Invalid classifier specification element: document id
>> 6196220549445471859 not found
>> I've attached a PHP script that contains the actual XQuery queries
>> used, in case that's helpful.
> This is expected behaviour.
> The documentation (for cts:train) says this, although it doesn't
> perhaps stress the implications:
> "The support vector representation of the classifier includes a supports
> node that has <class/> children for each class. Here the class elements
> contain a list of doc elements which identify the specific training nodes
> using an internal key. This internal key is valid across queries only for
> nodes in the database."
> What this means is that if your training and classification are happening in
> different queries (which is generally the case, although it need not be),
> then you have to put the training set in the database if you are using the
> "supports" form of the classifier. If you are using the "weights" form of
> the classifier you won't have this issue. And if you perform the training
> and the classification in the same query, you also won't have a problem.
Thanks Mary, I hadn't noticed that in the documentation.
When you say "you have to put the training set in the database", what
does that involve, specifically? I was storing the training set in the
same way as the classifier (saving the set of documents at a specific
URI), but maybe it needs to be stored differently.
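
For comparison, here is a minimal, untested sketch of the case described above that should work: the training nodes are read from the database with fn:doc, and training and classification happen in one query. The URIs and class names are hypothetical. The option elements and the fourth (training nodes) argument to cts:classify are my reading of the cts documentation and should be checked against it.

```xquery
xquery version "1.0-ml";

(: Hypothetical URIs: the training documents are assumed to already be
   stored in the database, so the nodes passed to cts:train are
   database nodes rather than nodes constructed in memory. :)
let $training := (
  fn:doc("/training/a.xml"),
  fn:doc("/training/b.xml")
)

(: One label per training node, in the same order as $training.
   The class names are made up for illustration. :)
let $labels := (
  <cts:label><cts:class name="relevant"/></cts:label>,
  <cts:label><cts:class name="irrelevant"/></cts:label>
)

(: Ask for the 'supports' form of the classifier. :)
let $classifier :=
  cts:train($training, $labels,
    <options xmlns="cts:train">
      <classifier-type>supports</classifier-type>
    </options>)

(: Classify in the same query, passing the same training nodes again,
   since the supports form refers back to them. :)
return
  cts:classify(fn:doc("/test/c.xml"), $classifier,
    <options xmlns="cts:classify"/>, $training)
```

A real training set would need many more documents than two. If the classifier is saved and reused in a later query, the assumption is that the same fn:doc nodes would be passed to cts:classify there as well.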