XQSync

To get started using XQSync, try the tutorial.

Running XQSync

The entry point is the main method in the com.marklogic.ps.xqsync.XQSync class. It takes zero or more property files as its arguments. System properties override file-based properties, and properties found in later files may override properties specified in earlier files on the command line. See src/xqsync.sh for a sample shell script.
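
The precedence rules above can be illustrated with a minimal Java sketch. This is not XQSync's actual implementation; the class name PropertyPrecedenceSketch and the merge method are hypothetical, and the sketch only assumes that later files overwrite earlier ones and that system properties are applied last.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical illustration of the documented precedence, not XQSync source code.
class PropertyPrecedenceSketch {

    // Merge file-based properties in command-line order, then apply system
    // properties on top so that they win over any file-based value.
    static Properties merge(List<Properties> fileProperties, Properties systemProperties) {
        Properties merged = new Properties();
        for (Properties fromFile : fileProperties) {
            merged.putAll(fromFile); // later files overwrite earlier ones
        }
        merged.putAll(systemProperties); // system properties are applied last
        return merged;
    }

    public static void main(String[] args) {
        Properties first = new Properties();
        first.setProperty("THREADS", "1");
        first.setProperty("LOG_LEVEL", "INFO");

        Properties second = new Properties();
        second.setProperty("THREADS", "4");

        Properties system = new Properties();
        system.setProperty("LOG_LEVEL", "FINE");

        Properties merged = merge(List.of(first, second), system);
        System.out.println("THREADS=" + merged.getProperty("THREADS"));
        System.out.println("LOG_LEVEL=" + merged.getProperty("LOG_LEVEL"));
    }
}
```

In this sketch THREADS comes from the second file and LOG_LEVEL from the system properties, which matches the ordering described above.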

Note: XQSync needs a lot of heap space for large synchronization tasks. Be prepared to increase the Java VM heap space limit, using -Xmx.
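
As a rough aid for sizing the heap, the sketch below applies the rule of thumb given in the QUEUE_SIZE notes (roughly 1 kB per queue entry) and prints it next to the heap limit of the running JVM. The class name QueueHeapEstimateSketch and its method are hypothetical, and the estimate covers only the queue, not the documents being transferred.

```java
// Hypothetical helper, not part of XQSync.
class QueueHeapEstimateSketch {

    // Assumption from the QUEUE_SIZE notes: roughly 1 kB of heap per queue entry.
    static final long BYTES_PER_ENTRY = 1_000L;

    // Estimated heap consumed by a full synchronization queue, in bytes.
    static long estimateQueueBytes(long queueSize) {
        return queueSize * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long queueSize = 100_000L; // the documented default QUEUE_SIZE
        long estimate = estimateQueueBytes(queueSize);
        long maxHeap = Runtime.getRuntime().maxMemory(); // reflects the -Xmx setting
        System.out.printf("Estimated queue footprint: %d MB%n", estimate / 1_000_000L);
        System.out.printf("Current JVM heap limit:    %d MB%n", maxHeap / 1_000_000L);
    }
}
```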

Required libraries:

Required properties:

Available properties:

Property default value notes
ALLOW_EMPTY_METADATA false If true, missing metadata files in INPUT_PACKAGE will be ignored.
COPY_PERMISSIONS true If true, all document permissions are copied.
COPY_PROPERTIES true If true, all document properties are copied.
FATAL_ERRORS true If true, all exceptions are fatal. If false, exceptions will still be logged, but in most cases XQSync will proceed.
INPUT_CONNECTION_STRING null Input documents will come from this XCC connection. By default, every document in the input database will be transferred. To change this behavior, use one of the related properties:
  • INPUT_COLLECTION_URI
  • INPUT_DIRECTORY_URI
  • INPUT_DOCUMENT_URIS
  • INPUT_QUERY
NB - to list all input documents, or to list a collection or a directory, XQSync uses cts:uris(). If the document URI lexicon is not available, it will fall back to a slower technique.
INPUT_COLLECTION_URI null In combination with INPUT_CONNECTION_STRING, all documents in the named collection(s) will be transferred. If whitespace is present, INPUT_COLLECTION_URI will be treated as a whitespace-delimited sequence; e.g., INPUT_COLLECTION_URI=a b would transfer all documents in either collection a or collection b.
INPUT_DIRECTORY_URI null In combination with INPUT_CONNECTION_STRING, all documents in the named directory will be transferred. If whitespace is present, INPUT_DIRECTORY_URI will be treated as a whitespace-delimited sequence; e.g., INPUT_DIRECTORY_URI=a/ b/ would transfer all documents whose URIs begin with a/ or b/.
INPUT_DOCUMENT_URIS null In combination with INPUT_CONNECTION_STRING, all documents named by the (whitespace-delimited) uris will be transferred.
INPUT_QUERY null In combination with INPUT_CONNECTION_STRING, all uris returned by the query will be transferred. This sample query would transfer the first 100 documents, in document order:
for $i in doc()[1 to 100] return xdmp:node-uri($i)
If the document URI lexicon is enabled, this could be written as:
cts:uris('', 'document')[1 to 100]
to transfer the first 100 documents, sorted by document URI.
INPUT_QUERY_CACHABLE false In combination with INPUT_CONNECTION_STRING, the query which fetches the input document URIs will instruct XCC to cache or to stream the URIs. If set to true, no documents will sync until all URIs have been fetched. This is usually undesirable, so false is the default.
INPUT_QUERY_BUFFER_BYTES 0 In combination with INPUT_CONNECTION_STRING, the query which fetches the input document URIs will use this buffer size. The value 0 will cause XCC to use its default size.
INPUT_PACKAGE null Input documents will come from this zip file path. If the path is a directory, any "*.zip" children will be used.
INPUT_START_POSITION null Use the numeric value of this property as the starting position for the sequence of input documents.
INPUT_TIMESTAMP null If not null, and INPUT_CONNECTION_STRING is set, then all input queries will use this timestamp. The special value #AUTO will cause the first request timestamp to be used for the entire synchronization.
LOG_LEVEL INFO The java.util.logging.Level at which to log.
LOG_HANDLER CONSOLE,FILE The java.util.logging handlers with which to log.
OUTPUT_COLLECTIONS null Output documents will be added to one or more collection URIs. Collection URIs may be delimited by whitespace, commas, or colons.
OUTPUT_CONNECTION_STRING null Documents will be written to this XCC connection.
OUTPUT_DELETE_COLLECTION null In combination with INPUT_COLLECTION_URI and OUTPUT_CONNECTION_STRING, delete the INPUT_COLLECTION_URI on the OUTPUT_CONNECTION_STRING, before beginning synchronization.
OUTPUT_FORESTS null Permitted output forest names.
OUTPUT_PACKAGE null Output documents will be written to this zip file path.
QUEUE_SIZE 100,000 Maximum size of the synchronization queue, to limit memory consumption by XQSync. You may wish to use a smaller value if you encounter OutOfMemoryError. You may wish to use a larger value if using many threads and loading very small documents. If you use a large value, you may also need something like -Xmx4096m to increase the Java heap size. Plan for roughly 1 GB per 1 million queue entries (i.e., 1 kB per entry).
READ_PERMISSION_ROLES null Names of any roles to attach to output documents.
REPAIR_INPUT_XML false Should MarkLogic Server try to repair malformed input XML?
SKIP_EXISTING false If true, documents that already exist at the OUTPUT_CONNECTION_STRING target are not overwritten. This only affects operations when OUTPUT_CONNECTION_STRING is defined. If false, or if targeting an OUTPUT_PACKAGE, then all documents will be overwritten.
THREADS 1 Number of worker threads to spawn.
URI_PREFIX null String to prepend to all output uris.
URI_SUFFIX null String to append to all output uris.
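
Several properties in the table accept delimited lists: INPUT_COLLECTION_URI, INPUT_DIRECTORY_URI, and INPUT_DOCUMENT_URIS are whitespace-delimited, while OUTPUT_COLLECTIONS may be delimited by whitespace, commas, or colons. The Java sketch below shows one way such values could be split. It is an illustration only; the class and method names are hypothetical, and XQSync's own parsing may differ in details such as how empty values are handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of the documented delimiters, not XQSync source code.
class DelimitedPropertySketch {

    // Split a whitespace-delimited value, e.g. INPUT_COLLECTION_URI=a b
    static List<String> splitOnWhitespace(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Split an OUTPUT_COLLECTIONS-style value on whitespace, commas, or colons.
    // Because colons are delimiters here, a URI containing ':' is split as well.
    static List<String> splitCollections(String value) {
        if (value == null) {
            return Collections.emptyList();
        }
        return Arrays.stream(value.split("[\\s,:]+"))
                .filter(part -> !part.isEmpty()) // drop the empty token left by leading delimiters
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitOnWhitespace("a/ b/"));
        System.out.println(splitCollections("x,y:z w"));
    }
}
```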