To get started using XQSync, try the tutorial.
The entry point is the main method in the com.marklogic.ps.xqsync.XQSync class. It takes zero or more property file paths as its arguments. System properties override file-based properties, and properties found in later files may override properties specified in files given earlier on the command line. See src/xqsync.sh for a sample shell script.
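The precedence rules above can be sketched in a few lines of Java. This is only an illustration of the described ordering, not XQSync's actual loading code: the class name PropertyPrecedenceSketch and the merge helper are hypothetical, and the property files are passed as in-memory strings to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: NOT part of XQSync, only illustrates the override order.
public class PropertyPrecedenceSketch {

    /**
     * Merges property sources in the order described above:
     * later files override earlier files, and system properties
     * override anything that came from a file.
     */
    public static Properties merge(Properties systemProps, String... fileContents) {
        Properties merged = new Properties();
        for (String content : fileContents) {
            Properties fromFile = new Properties();
            try {
                fromFile.load(new StringReader(content));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // A later file replaces any value set by an earlier one.
            merged.putAll(fromFile);
        }
        // System properties are applied last, so they win.
        merged.putAll(systemProps);
        return merged;
    }

    public static void main(String[] args) {
        Properties sys = new Properties();
        sys.setProperty("THREADS", "8");
        Properties merged = merge(sys,
                "THREADS=2\nCOPY_QUALITY=true\n",   // first file on the command line
                "THREADS=4\nCOPY_QUALITY=false\n"); // second file on the command line
        System.out.println(merged.getProperty("THREADS"));      // prints 8
        System.out.println(merged.getProperty("COPY_QUALITY")); // prints false
    }
}
```

In this sketch THREADS comes from the system property, while COPY_QUALITY comes from the second file because no system property overrides it.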
Note: XQSync needs a lot of heap space for large synchronization tasks. Be prepared to increase the Java VM heap space limit, using -Xmx.
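To confirm that a -Xmx setting actually took effect, the JVM can report its own heap limit. The following is a minimal sketch using the standard Runtime API; the class name HeapCheckSketch is hypothetical and is not part of XQSync.

```java
// Hypothetical helper: NOT part of XQSync, just reports the JVM heap limit.
public class HeapCheckSketch {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: "
                + maxHeapMegabytes() + " MB");
    }
}
```

Running it with, for example, java -Xmx2048m HeapCheckSketch should report a value close to 2048; the exact figure can be slightly lower depending on the JVM.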
INPUT_PACKAGE,
INPUT_CONNECTION_STRING
OUTPUT_PACKAGE,
OUTPUT_CONNECTION_STRING
| Property | Default value | Notes |
|---|---|---|
| ALLOW_EMPTY_METADATA | false | If true, missing metadata files in INPUT_PACKAGE will be ignored. |
| COPY_COLLECTIONS | true | If true, all document collections are copied. |
| COPY_PERMISSIONS | true | If true, all document permissions are copied. |
| COPY_PROPERTIES | true | If true, document properties are copied. When targeting an output connection that has CPF enabled, it is a good idea to disable this setting. |
| COPY_QUALITY | true | If true, document quality is copied. |
| FATAL_ERRORS | true | If true, all exceptions are fatal. If false, exceptions will still be logged, but in most cases XQSync will proceed. |
| INPUT_BATCH_SIZE | 1 | Process documents in batches of N documents. When exporting many small documents from an input database, increasing this setting can improve performance. Note that the right setting will vary according to document size: if the batch size is too large, poor performance or errors may result. |
| INPUT_CONNECTION_STRING | null | Input documents will come from this XCC connection. By default, every document in the input database will be transferred. To change this behavior, use one of the related properties, such as INPUT_COLLECTION_URI, INPUT_DIRECTORY_URI, INPUT_DOCUMENT_URIS, or INPUT_QUERY. Where possible, XQSync lists document URIs using cts:uris(). If the document URI lexicon is not available, it will fall back to a slower technique. If the connection string uses the xccs:// scheme, XQSync will attempt to use SSL for server communications. This requires MarkLogic Server 4.1 or later. |
| INPUT_COLLECTION_URI | null | In combination with INPUT_CONNECTION_STRING, all documents in the named collection(s) will be transferred. If whitespace is present, INPUT_COLLECTION_URI will be treated as a whitespace-delimited sequence; e.g., INPUT_COLLECTION_URI=a b would transfer all documents in either collection a or collection b. |
| INPUT_DIRECTORY_URI | null | In combination with INPUT_CONNECTION_STRING, all documents in the named directory will be transferred. If whitespace is present, INPUT_DIRECTORY_URI will be treated as a whitespace-delimited sequence; e.g., INPUT_DIRECTORY_URI=a/ b/ would transfer all documents whose URIs begin with a/ or b/. |
| INPUT_DOCUMENT_URIS | null | In combination with INPUT_CONNECTION_STRING, all documents named by the (whitespace-delimited) URIs will be transferred. |
| INPUT_MODULE_URI | null |
In combination with INPUT_CONNECTION_STRING, names an XQuery module that is applied to the input documents. Here is a simple example module, which recursively transforms the input document to lower-case all element names.
xquery version "0.9-ml"
define variable $URI as xs:string external
define function lc($list as node()*)
as node()*
{
  for $n in $list
  return typeswitch($n)
    case document-node() return document { lc($n/node()) }
    case element() return element {
      expanded-QName(namespace-uri($n), lower-case(local-name($n)))
    } {
      $n/@*, lc($n/node())
    }
    default return $n
}
lc(doc($URI))
|
| INPUT_PACKAGE | null | Input documents will come from this zip file path. If the path is a directory, any "*.zip" children will be used. |
| INPUT_QUERY | null | In combination with INPUT_CONNECTION_STRING, this query selects the input documents to transfer. If the query contains any repeated semicolons (";;"), it will be split into multiple queries and run separately. This permits faster start-up with complex queries. |
| INPUT_QUERY_CACHABLE | false | In combination with INPUT_CONNECTION_STRING, controls whether the query that fetches the input document URIs instructs XCC to cache or to stream the URIs. If set to true, no documents will sync until all URIs have been fetched. This is usually undesirable, so false is the default. |
| INPUT_QUERY_BUFFER_BYTES | 0 | In combination with INPUT_CONNECTION_STRING, the query that fetches the input document URIs will use this buffer size. The value 0 will cause XCC to use its default size. |
| INPUT_START_POSITION | null | Use the numeric value of this property as the starting position for the sequence of input documents. |
| INPUT_TIMESTAMP | null | If not null, and INPUT_CONNECTION_STRING is set, then all input queries will use this timestamp. The special value #AUTO will cause the first request timestamp to be used for the entire synchronization. |
| INPUT_RESULT_BUFFER_SIZE | 0 | In combination with INPUT_CONNECTION_STRING, the query that fetches each input document and its metadata will use this buffer size. The value 0 will cause XCC to use its default size. |
| LOG_LEVEL | INFO | java.util.logging.Level at which to log. |
| LOG_HANDLER | CONSOLE,FILE | java.util.logging log handlers with which to log. |
| OUTPUT_COLLECTIONS | null | Output documents will be added to one or more collection URIs. Collection URIs may be delimited by whitespace, commas, or colons. |
| OUTPUT_CONNECTION_STRING | null | Documents will be written to this XCC connection. If the connection string uses the xccs:// scheme, XQSync will attempt to use SSL for server communications. This requires MarkLogic Server 4.1 or later. |
| OUTPUT_DELETE_COLLECTION | null | In combination with INPUT_COLLECTION_URI and OUTPUT_CONNECTION_STRING, delete the collection(s) named by INPUT_COLLECTION_URI from the OUTPUT_CONNECTION_STRING database before beginning synchronization. |
| OUTPUT_FILTER_FORMATS | null | The specified list of document types will not be copied to output. Example: OUTPUT_FILTER_FORMATS=binary() Example: OUTPUT_FILTER_FORMATS=text(),xml |
| OUTPUT_FORESTS | null | Permitted output forest names. |
| OUTPUT_PACKAGE | null | Output documents will be written to this zip file path. |
| QUEUE_SIZE | 100,000 | Maximum size of the synchronization queue, to limit memory consumption by XQSync. You may wish to use a smaller value if you encounter OutOfMemoryError. You may wish to use a larger value if using many threads and loading very small documents. If you use a large value, you may also need something like -Xmx4096m to increase the Java heap size. Plan for roughly 1 GB per 1 million queue entries (i.e., 1 kB per entry). |
| READ_PERMISSION_ROLES | null | Names of any roles to attach to output documents. |
| REPAIR_INPUT_XML | false | Should MarkLogic Server try to repair malformed input XML? |
| SESSION_READER_CLASS | com.marklogic.ps.xqsync.SessionReader | Class to be used for new session reader instances. This is an experimental feature, allowing plug-in of any subclass of the default SessionReader class. A sample subclass is provided as com.marklogic.ps.tests.SessionReaderTest. |
| SKIP_EXISTING | false | If true, documents that already exist in the OUTPUT_CONNECTION_STRING database are not overwritten. This only affects operations when OUTPUT_CONNECTION_STRING is defined. If false, or if targeting an OUTPUT_PACKAGE, then all documents will be overwritten. |
| THREADS | 1 | Number of worker threads to spawn. |
| URI_PREFIX | null | String to prepend to all output URIs. |
| URI_SUFFIX | null | String to append to all output URIs. |
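
Several of the properties above carry list-like values or imply some arithmetic. The following Java sketch shows one plausible reading of the delimiter rules and of the QUEUE_SIZE heap estimate. It is an assumption-laden illustration rather than XQSync's own parsing code: the class PropertyValueSketch and all of its method names are hypothetical, and the 1000-bytes-per-entry constant is taken from the rough planning figure in the QUEUE_SIZE notes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: NOT part of XQSync; illustrates the documented value formats.
public class PropertyValueSketch {

    /** Whitespace-delimited values, as described for INPUT_COLLECTION_URI,
     *  INPUT_DIRECTORY_URI and INPUT_DOCUMENT_URIS. */
    public static List<String> splitOnWhitespace(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    /** OUTPUT_COLLECTIONS may be delimited by whitespace, commas, or colons. */
    public static List<String> splitCollections(String value) {
        return Arrays.asList(value.trim().split("[\\s,:]+"));
    }

    /** INPUT_QUERY is split into separate queries at repeated semicolons (";;").
     *  Trimming and dropping empty pieces is an assumption of this sketch. */
    public static List<String> splitQueries(String query) {
        return Arrays.stream(query.split(";;"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Rough heap estimate for the queue: about 1 kB (1000 bytes here) per entry. */
    public static long estimateQueueBytes(long queueSize) {
        return queueSize * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(splitOnWhitespace("a b"));             // prints [a, b]
        System.out.println(splitCollections("x,y z:w"));          // prints [x, y, z, w]
        System.out.println(splitQueries("query-one;; query-two")); // prints [query-one, query-two]
        System.out.println(estimateQueueBytes(100_000L));         // prints 100000000
    }
}
```

Under this estimate the default QUEUE_SIZE of 100,000 corresponds to roughly 100 MB of heap for the queue, and 1 million entries to roughly 1 GB, matching the planning figure in the table.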