[MarkLogic Dev General] prevent CPF updated doc from getting into CPF

Mary Holstege mary.holstege at marklogic.com
Sun Mar 18 09:52:52 PDT 2012


On Sun, 18 Mar 2012 01:30:30 -0700, Mihir Das <mihir.kumar999 at gmail.com> wrote:

> Thanks Mary for quick response.
>
> I wanted to stop a doc that was updated by CPF from being reprocessed. I just want some indicator that CPF will recognize automatically so that it skips reprocessing (e.g. cpf:document-set-state=final).
>
> Now I am wondering: if I have CPF configured, does that mean I won't be able to run CORB/bulk content reprocessing EVER!! Every doc CORB updates will automatically get queued in CPF and go through the entire CPF process, and we also have a MAX-TASK queue limit of about 100k.
>
> I think there has to be some way to stop the updated doc from getting into the CPF queue.
>
> Or is there any way to perform (node-replace, set-processing-status, set-state) on a doc without getting a multiple-update-conflict?


If a document is updated in a CPF domain, CPF will respond to it.
This is by design: CPF exists to guarantee that changes to
documents in the domain get processed.

Setting the state won't prevent reprocessing, because the state
is just an application label; it has no special meaning to CPF.

Setting the status on the document being processed directly
will fail, because the status is CPF's way of controlling
processing, and CPF automatically sets the status and state
as it advances the document through the rules you specified.

What you can do is add special rules to cut off
processing under some circumstances (see, for example,
the rule in the Conversion Processing pipeline that
suppresses additional processing of TOCs). You will
still get the status trigger, so you need some
application-level property to control whether or
not processing is required.
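To make the cutoff idea concrete, here is a minimal Python model of it. It is not CPF code: in MarkLogic the condition would be an XQuery module referenced from the pipeline definition. The property name `app:skip-processing` and the functions `needs_processing`, `on_updated`, and `reconvert` are hypothetical. What it illustrates is that the rule on the updated status still fires on every update, but a condition on an application-level property decides whether the action actually runs.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical application-level property; not a CPF built-in.
SKIP_PROP = "app:skip-processing"


@dataclass
class Doc:
    uri: str
    properties: dict = field(default_factory=dict)
    state: str = "initial"


def needs_processing(doc: Doc) -> bool:
    """Condition checked before the action: skip if the app flag is set."""
    return not doc.properties.get(SKIP_PROP, False)


def on_updated(doc: Doc, action: Callable[[Doc], None]) -> str:
    """The updated-status rule always fires; the condition gates the action."""
    if needs_processing(doc):
        action(doc)
        return "processed"
    return "skipped"


def reconvert(doc: Doc) -> None:
    # Stand-in for whatever the pipeline action would normally do.
    doc.state = "converted"


# A document touched by a bulk job that set the flag is left alone.
bulk_fix = Doc("/docs/a.xml", {SKIP_PROP: True})
print(on_updated(bulk_fix, reconvert), bulk_fix.state)  # skipped initial

# A normal update still goes through the action.
edited = Doc("/docs/b.xml")
print(on_updated(edited, reconvert), edited.state)  # processed converted
```

The bulk job is then responsible for setting the flag, and whatever logic clears it decides when normal processing resumes.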

Another approach is to move the document out of the
CPF domain when you are done with it. If you use
collection-based domains instead of directory-based
domains, this is a matter of changing the document's
collections as your final step of processing. In your
case I don't think this will work, because you want
some updates to take effect, but not all.
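A minimal Python model of a collection-scoped domain, assuming a hypothetical domain collection named `cpf-inbox` and a hypothetical `processed` collection. It is not MarkLogic code; it only shows why removing the domain collection in the last step means later updates no longer trigger CPF. It is all-or-nothing, which is why it doesn't fit a case where only some updates should be processed.

```python
from dataclasses import dataclass, field

# Hypothetical collection names; the domain's scope is assumed to be this collection.
DOMAIN_COLLECTION = "cpf-inbox"
DONE_COLLECTION = "processed"


@dataclass
class Doc:
    uri: str
    collections: set = field(default_factory=set)


def would_trigger_cpf(doc: Doc) -> bool:
    """An update is picked up only while the document is in the domain's collection."""
    return DOMAIN_COLLECTION in doc.collections


def finish_processing(doc: Doc) -> None:
    """Final pipeline step in this model: move the document out of the domain."""
    doc.collections.discard(DOMAIN_COLLECTION)
    doc.collections.add(DONE_COLLECTION)


doc = Doc("/docs/a.xml", {DOMAIN_COLLECTION})
print(would_trigger_cpf(doc))  # True
finish_processing(doc)
print(would_trigger_cpf(doc))  # False
```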

So really, your only option is what I outlined
before: you want some updates to take effect and
not others, so you need to specify the condition
under which those updates take effect and attach
it as a condition on the state transition in your
pipeline, or on the updated status transition if
you want to cut off further processing earlier.

As for the max tasks limit, you need to make
sure you throttle Corb so that it doesn't overrun
CPF's task queue. You can also raise the limit.
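A toy Python sketch of throttling, assuming each Corb update spawns exactly one CPF task and that the driver can observe the task queue length. `TaskQueue`, `throttled_update`, and the `high_water` threshold are hypothetical; no Corb option or MarkLogic API is used. In a real driver you would poll the queue and sleep; here `drain` simulates CPF catching up so the example runs deterministically.

```python
from collections import deque


class TaskQueue:
    """Toy stand-in for the task server queue with a fixed capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = deque()
        self.total = 0  # tasks ever enqueued

    def enqueue(self, uri: str) -> None:
        if len(self.pending) >= self.capacity:
            raise OverflowError("task queue full")
        self.pending.append(uri)
        self.total += 1

    def drain(self, n: int) -> None:
        """Simulate CPF finishing up to n queued tasks."""
        for _ in range(min(n, len(self.pending))):
            self.pending.popleft()


def throttled_update(uris, queue, batch_size, high_water, work_per_tick):
    """Submit updates in batches, backing off while the queue is near high_water."""
    if not (0 < batch_size <= high_water <= queue.capacity) or work_per_tick <= 0:
        raise ValueError("need 0 < batch_size <= high_water <= capacity and work_per_tick > 0")
    uris = list(uris)
    i = ticks = peak = 0
    while i < len(uris):
        if len(queue.pending) + batch_size > high_water:
            queue.drain(work_per_tick)  # a real driver would sleep and re-poll here
            ticks += 1
            continue
        for uri in uris[i:i + batch_size]:
            queue.enqueue(uri)  # each update spawns one CPF task in this model
        i += batch_size
        peak = max(peak, len(queue.pending))
    return ticks, peak


q = TaskQueue(capacity=100)
ticks, peak = throttled_update((f"/docs/{n}.xml" for n in range(1000)), q,
                               batch_size=50, high_water=100, work_per_tick=25)
print(q.total, peak <= q.capacity)  # 1000 True
```

Keeping `high_water` below the real queue limit leaves headroom for tasks that other applications queue at the same time.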

//Mary

