MarkLogic Connector for Hadoop 1.1-3

com.marklogic.mapreduce
Interface MarkLogicConstants

All Known Implementing Classes:
ContentOutputFormat, ContentWriter, DocumentInputFormat, DocumentReader, KeyValueInputFormat, KeyValueOutputFormat, KeyValueReader, KeyValueWriter, MarkLogicInputFormat, MarkLogicOutputFormat, MarkLogicRecordReader, MarkLogicRecordWriter, NodeInputFormat, NodeOutputFormat, NodeReader, NodeWriter, PropertyOutputFormat, PropertyWriter, ValueInputFormat, ValueReader

public interface MarkLogicConstants

Configuration property names and other constants used in the package. Use these property names in your Hadoop configuration to set MarkLogic-specific properties. Properties may be set either in a Hadoop configuration file or programmatically.

Use the mapreduce.marklogic.input.* properties when using MarkLogic Server as an input source. Use the mapreduce.marklogic.output.* properties when using MarkLogic Server to store your results.
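
As a rough illustration of the configuration-file route, the sketch below renders a map of property names and values as Hadoop-style property elements. The class and method names (ConfigFileSketch, toConfigXml) and the host and port values are hypothetical placeholders; only the property names come from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the connector: renders name/value pairs
// as a Hadoop-style <configuration> document.
final class ConfigFileSketch {

    // Escape the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toConfigXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(" <property>\n")
              .append("   <name>").append(escape(e.getKey())).append("</name>\n")
              .append("   <value>").append(escape(e.getValue())).append("</value>\n")
              .append(" </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.marklogic.input.host", "localhost");   // placeholder value
        props.put("mapreduce.marklogic.input.port", "8000");        // placeholder value
        props.put("mapreduce.marklogic.output.host", "localhost");  // placeholder value
        System.out.print(toConfigXml(props));
    }
}
```

When setting properties programmatically, the same names would be passed to whatever configuration object the job uses instead of being written to a file.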


Field Summary
static String ADVANCED_MODE
          Value string of advanced mode for input.mode.
static String BASIC_MODE
          Value string of basic mode for input.mode.
static String BATCH_SIZE
          The config property name ("mapreduce.marklogic.output.batchsize") which, if set, indicates the number of records in one request.
static String BIND_SPLIT_RANGE
          The config property name ("mapreduce.marklogic.input.bindsplitrange") which, if set to true, specifies that the input query declares and references external variables "splitstart" and "splitend" under the namespace "http://marklogic.com/hadoop".
static String CONTENT_TYPE
          The config property name ("mapreduce.marklogic.output.content.type") which, if set, indicates type of content to be inserted when using ContentOutputFormat.
static int DEFAULT_BATCH_SIZE
          Default batch size.
static String DEFAULT_CONTENT_TYPE
          Default content type.
static long DEFAULT_MAX_SPLIT_SIZE
          The default maximum split size for input splits, used if input.maxsplitsize is not specified.
static String DEFAULT_OUTPUT_CONTENT_ENCODING
          Default output content encoding
static String DEFAULT_OUTPUT_XML_REPAIR_LEVEL
          Default output XML repair level
static String DEFAULT_PROPERTY_OPERATION_TYPE
          Default property operation type.
static String DOCUMENT_SELECTOR
          The config property name ("mapreduce.marklogic.input.documentselector") which, if set, specifies the document selection portion of the path expression used to retrieve data from the server.
static String INDENTED
          The config property name ("mapreduce.marklogic.input.indented") which, if set, specifies whether data retrieved from MarkLogic is formatted with indentation.
static String INPUT_DATABASE_NAME
          Not yet implemented.
static String INPUT_HOST
          The config property name ("mapreduce.marklogic.input.host") which, if set, specifies the MarkLogic Server host to use for input operations.
static String INPUT_KEY_CLASS
          The config property name ("mapreduce.marklogic.input.keyclass") which, if set, specifies the name of the class of the map input keys for KeyValueInputFormat.
static String INPUT_LEXICON_FUNCTION_CLASS
          The config property name ("mapreduce.marklogic.input.lexiconfunctionclass") which, if set, specifies the name of the class implementing LexiconFunction which will be used to generate input.
static String INPUT_MODE
          The config property name ("mapreduce.marklogic.input.mode") which, if set, specifies whether to use basic or advanced input query mode.
static String INPUT_PASSWORD
          The config property name ("mapreduce.marklogic.input.password") which, if set, specifies the cleartext password to use for authentication with input.username.
static String INPUT_PORT
          The config property name ("mapreduce.marklogic.input.port") which, if set, specifies the port number of the input XDBC server on the MarkLogic Server host specified by the input.host property.
static String INPUT_QUERY
          The config property name ("mapreduce.marklogic.input.query") which, if set, specifies the query used to retrieve input records from MarkLogic Server.
static String INPUT_SSL_OPTIONS_CLASS
          The config property name ("mapreduce.marklogic.input.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if input.usessl is set to true.
static String INPUT_USE_SSL
          The config property name ("mapreduce.marklogic.input.usessl") which, if set, specifies whether the connection to the input server is SSL enabled; false is assumed if not set.
static String INPUT_USERNAME
          The config property name ("mapreduce.marklogic.input.username") which, if set, specifies the MarkLogic Server user name under which input queries and operations run.
static String INPUT_VALUE_CLASS
          The config property name ("mapreduce.marklogic.input.valueclass") which, if set, specifies the name of the class of the map input value for KeyValueInputFormat, ValueInputFormat and DocumentInputFormat.
static String MAX_SPLIT_SIZE
          The config property name ("mapreduce.marklogic.input.maxsplitsize") which, if set, specifies the maximum number of fragments per input split.
static String MR_NAMESPACE
          The namespace ("http://marklogic.com/hadoop") in which the split range external variables are defined.
static String NODE_OPERATION_TYPE
          The config property name ("mapreduce.marklogic.output.node.optype") which, if set, indicates what node operation to perform during output.
static String OUTPUT_CLEAN_DIR
          The config property name ("mapreduce.marklogic.output.content.cleandir") which, if set, indicates whether or not to remove the output directory.
static String OUTPUT_COLLECTION
          The config property name ("mapreduce.marklogic.output.content.collection") which, if set, specifies a comma-separated list of collections to which generated output documents are added.
static String OUTPUT_CONTENT_ENCODING
          The config property name ("mapreduce.marklogic.output.content.encoding") which, if set, specifies the charset encoding to be used by the server when loading this document.
static String OUTPUT_CONTENT_LANGUAGE
          The config property name ("mapreduce.marklogic.output.content.language") which, if set, specifies the language name to associate with inserted documents.
static String OUTPUT_CONTENT_NAMESPACE
          The config property name ("mapreduce.marklogic.output.content.namespace") which, if set, specifies the namespace to associate with inserted documents.
static String OUTPUT_DIRECTORY
          The config property name ("mapreduce.marklogic.output.content.directory") which, if set, specifies the MarkLogic Server database directory where output documents are created.
static String OUTPUT_FAST_LOAD
          The config property name ("mapreduce.marklogic.output.content.fastload") which, if set, indicates whether or not to use the fast load mode to load content into MarkLogic.
static String OUTPUT_FOREST_HOST
          Internal use only.
static String OUTPUT_HOST
          The config property name ("mapreduce.marklogic.output.host") which, if set, specifies the MarkLogic Server host to use for output operations.
static String OUTPUT_KEY_TYPE
          The config property name ("mapreduce.marklogic.output.keytype") which, if set, specifies the data type of the output keys for KeyValueOutputFormat.
static String OUTPUT_KEY_VARNAME
          Value string of the output key external variable name.
static String OUTPUT_NAMESPACE
          The config property name ("mapreduce.marklogic.output.node.namespace") which, if set, indicates the namespace used for output.
static String OUTPUT_PASSWORD
          The config property name ("mapreduce.marklogic.output.password") which, if set, specifies the cleartext password to use for authentication with output.username.
static String OUTPUT_PERMISSION
          The config property name ("mapreduce.marklogic.output.content.permission") which, if set, specifies a comma-separated list of role-capability pairs to associate with created output documents.
static String OUTPUT_PORT
          The config property name ("mapreduce.marklogic.output.port") which, if set, specifies the port number of the output MarkLogic Server specified by the output.host property.
static String OUTPUT_PROPERTY_ALWAYS_CREATE
          The config property name ("mapreduce.marklogic.output.property.alwayscreate") which, if set to true, causes PropertyOutputFormat to create document properties for reduce output key-value pairs even when no document exists with the target URI.
static String OUTPUT_QUALITY
          The config property name ("mapreduce.marklogic.output.content.quality") which, if set, specifies the document quality for created output documents.
static String OUTPUT_QUERY
          The config property name ("mapreduce.marklogic.output.query") which, if set, specifies the statement to execute against MarkLogic Server.
static String OUTPUT_SSL_OPTIONS_CLASS
          The config property name ("mapreduce.marklogic.output.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if output.usessl is set to true.
static String OUTPUT_STREAMING
          The config property name ("mapreduce.marklogic.output.content.streaming") which, if set, specifies whether to use streaming to insert content.
static String OUTPUT_TOLERATE_ERRORS
          The config property name ("mapreduce.marklogic.output.content.tolerateerrors") which, if set, specifies whether to tolerate insertion errors and make sure all successful inserts are committed.
static String OUTPUT_USE_SSL
          The config property name ("mapreduce.marklogic.output.usessl") which, if set, specifies whether the connection to the output server is SSL enabled; false is assumed if not set.
static String OUTPUT_USERNAME
          The config property name ("mapreduce.marklogic.output.username") which, if set, specifies the MarkLogic Server user name under which output operations run.
static String OUTPUT_VALUE_TYPE
          The config property name ("mapreduce.marklogic.output.valuetype") which, if set, specifies the data type of the map output value for KeyValueOutputFormat.
static String OUTPUT_VALUE_VARNAME
          Value string of the output value external variable name.
static String OUTPUT_XML_REPAIR_LEVEL
          The config property name ("mapreduce.marklogic.output.content.repairlevel") which, if set, specifies the document repair level for this options object.
static String PATH_NAMESPACE
          The config property name ("mapreduce.marklogic.input.namespace") which, if set, specifies a list of namespaces to use when evaluating the path expression constructed from the input.documentselector and input.subdocumentexpr properties.
static String PROPERTY_OPERATION_TYPE
          The config property name ("mapreduce.marklogic.output.property.optype") which, if set, indicates what property operation to perform during output when using PropertyOutputFormat.
static String RECORD_TO_FRAGMENT_RATIO
          The config property name ("mapreduce.marklogic.input.recordtofragmentratio") which, if set, specifies the ratio of the number of retrieved records to the number of accessed fragments.
static String SPLIT_END_VARNAME
          Use this external variable name ("splitend") in your advanced mode input query to access the end value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true.
static String SPLIT_QUERY
          The config property name ("mapreduce.marklogic.input.splitquery") which, if set, specifies the query MarkLogic Server uses to generate input splits.
static String SPLIT_START_VARNAME
          Use this external variable name ("splitstart") in your advanced mode input query to access the start value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true.
static String SUBDOCUMENT_EXPRESSION
          The config property name ("mapreduce.marklogic.input.subdocumentexpr") which, if set, specifies the path expression used to retrieve sub-document records from the server.
static String TXN_SIZE
          The config property name ("mapreduce.marklogic.output.transactionsize") which, if set, indicates the number of requests in one transaction.
 

Field Detail

INPUT_USERNAME

static final String INPUT_USERNAME
The config property name ("mapreduce.marklogic.input.username") which, if set, specifies the MarkLogic Server user name under which input queries and operations run. Required if using MarkLogic Server for input.

See Also:
Constant Field Values

INPUT_PASSWORD

static final String INPUT_PASSWORD
The config property name ("mapreduce.marklogic.input.password") which, if set, specifies the cleartext password to use for authentication with input.username. Required if using MarkLogic Server for input.

See Also:
Constant Field Values

INPUT_HOST

static final String INPUT_HOST
The config property name ("mapreduce.marklogic.input.host") which, if set, specifies the MarkLogic Server host to use for input operations. Required if using MarkLogic Server for input.

See Also:
Constant Field Values

INPUT_PORT

static final String INPUT_PORT
The config property name ("mapreduce.marklogic.input.port") which, if set, specifies the port number of the input XDBC server on the MarkLogic Server host specified by the input.host property. Required if using MarkLogic Server for input.

NOTE: Within a cluster, all nodes supplying MapReduce input data must use the same XDBC server port number.
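
The four input connection properties above are each marked as required. A minimal sketch of a pre-flight check follows, assuming the job's settings are available as a plain map; the class RequiredInputPropsSketch and the placeholder values are hypothetical and not part of the connector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check, not part of the connector.
final class RequiredInputPropsSketch {

    // The connection properties this page marks as required for input.
    static final String[] REQUIRED = {
        "mapreduce.marklogic.input.username",
        "mapreduce.marklogic.input.password",
        "mapreduce.marklogic.input.host",
        "mapreduce.marklogic.input.port"
    };

    // Returns the required property names that are absent or empty.
    static List<String> missing(Map<String, String> props) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = props.get(name);
            if (value == null || value.isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("mapreduce.marklogic.input.host", "localhost"); // placeholder value
        props.put("mapreduce.marklogic.input.port", "8000");      // placeholder value
        System.out.println("Missing: " + missing(props));
    }
}
```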

See Also:
Constant Field Values

INPUT_USE_SSL

static final String INPUT_USE_SSL
The config property name ("mapreduce.marklogic.input.usessl") which, if set, specifies whether the connection to the input server is SSL enabled; false is assumed if not set.

See Also:
Constant Field Values

INPUT_SSL_OPTIONS_CLASS

static final String INPUT_SSL_OPTIONS_CLASS
The config property name ("mapreduce.marklogic.input.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if input.usessl is set to true.

See Also:
Constant Field Values

DOCUMENT_SELECTOR

static final String DOCUMENT_SELECTOR
The config property name ("mapreduce.marklogic.input.documentselector") which, if set, specifies the document selection portion of the path expression used to retrieve data from the server. Only used if using MarkLogic Server for input in basic mode.

The XQuery path expression step given in this property must select a sequence of document nodes. To further refine the input selection to nodes or values within the documents, use input.subdocumentexpr. If this property is not set, fn:collection() is used. For more information, see the overview.

This property is only usable when basic mode is specified with the input.mode property. If more powerful input customization is needed, use advanced mode and specify a complete input query with the input.query property.

The path expression step given in this property must be searchable. A searchable expression is one which can be optimized using indexes. See the Query and Performance Tuning Guide for more information on searchable path expressions.

The following selects all documents:

 <property>
   <name>mapreduce.marklogic.input.documentselector</name>
   <value>fn:collection()</value>
 </property>
 

See Also:
Constant Field Values

SUBDOCUMENT_EXPRESSION

static final String SUBDOCUMENT_EXPRESSION
The config property name ("mapreduce.marklogic.input.subdocumentexpr") which, if set, specifies the path expression used to retrieve sub-document records from the server. Used only if using MarkLogic Server for input in basic mode. If not set, the document nodes selected by the document selector are used.

The XQuery path expression step given in this property should select a sequence of nodes or atomic values from the set of documents selected by the path step given in the input.documentselector property. For more information, see the overview.

This property is only usable when basic mode is specified with the input.mode property. If more powerful input customization is needed, use advanced mode and specify a complete input query with the input.query property.

The following selects all wp:a elements that have an href attribute, from all documents:

 <property>
   <name>mapreduce.marklogic.input.documentselector</name>
   <value>fn:collection()</value>
 </property>
 <property>
   <name>mapreduce.marklogic.input.subdocumentexpr</name>
   <value>//wp:a[@href]</value>
 </property>
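
How the connector combines the two properties is not spelled out on this page. The sketch below assumes plain concatenation of the document selector and the sub-document expression, with fn:collection() as the documented fallback when no selector is set. The class PathExpressionSketch is hypothetical, and the real connector may build the expression differently.

```java
// Hypothetical illustration, not part of the connector.
final class PathExpressionSketch {

    // Assumption: the final path is the selector followed directly by the
    // sub-document expression.
    static String compose(String documentSelector, String subDocumentExpr) {
        String selector = (documentSelector == null || documentSelector.isEmpty())
                ? "fn:collection()"   // documented default when the selector is unset
                : documentSelector;
        String sub = (subDocumentExpr == null) ? "" : subDocumentExpr;
        return selector + sub;
    }

    public static void main(String[] args) {
        System.out.println(compose("fn:collection()", "//wp:a[@href]"));
        System.out.println(compose(null, null));
    }
}
```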
 

See Also:
Constant Field Values

INPUT_LEXICON_FUNCTION_CLASS

static final String INPUT_LEXICON_FUNCTION_CLASS
The config property name ("mapreduce.marklogic.input.lexiconfunctionclass") which, if set, specifies the name of the class implementing LexiconFunction which will be used to generate input.

See Also:
Constant Field Values

PATH_NAMESPACE

static final String PATH_NAMESPACE
The config property name ("mapreduce.marklogic.input.namespace") which, if set, specifies a list of namespaces to use when evaluating the path expression constructed from the input.documentselector and input.subdocumentexpr properties.

Specify the namespaces as comma separated alias-URI pairs. For example:

 <property>
   <name>mapreduce.marklogic.input.namespace</name>
   <value>wp, "http://www.mediawiki.org.xml/export-0.4/"</value>
 </property>
 

If a namespace URI includes a comma, you must set this property programmatically, rather than in a config file.
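
A minimal sketch of building the property value from alias-URI pairs follows. It mirrors the quoting used in the example above (whether the quotes are required is not stated here) and reports when a URI contains a comma, the case in which a config file cannot be used. The class PathNamespaceSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the connector.
final class PathNamespaceSketch {

    // Joins alias/URI pairs in the form: alias, "uri", alias, "uri"
    static String toValue(Map<String, String> aliasToUri) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> e : aliasToUri.entrySet()) {
            joiner.add(e.getKey());
            joiner.add("\"" + e.getValue() + "\"");
        }
        return joiner.toString();
    }

    // True if any URI contains a comma, so the value cannot go in a config file.
    static boolean needsProgrammaticSetup(Map<String, String> aliasToUri) {
        for (String uri : aliasToUri.values()) {
            if (uri.contains(",")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> ns = new LinkedHashMap<>();
        ns.put("wp", "http://www.mediawiki.org.xml/export-0.4/");
        System.out.println(toValue(ns));
        System.out.println(needsProgrammaticSetup(ns));
    }
}
```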

See Also:
Constant Field Values

SPLIT_QUERY

static final String SPLIT_QUERY
The config property name ("mapreduce.marklogic.input.splitquery") which, if set, specifies the query MarkLogic Server uses to generate input splits. This property is required (and only usable) in advanced mode; see the input.mode property for details.

The split query must return a sequence of (forest id, record count, hostname) tuples. The host name and forest id identify the forest associated with the split. The count is an estimate of the number of key-value pairs in the split.

The default split query used in basic input mode computes a rough estimate based on the number of documents in the database.

See Also:
Constant Field Values

MAX_SPLIT_SIZE

static final String MAX_SPLIT_SIZE
The config property name ("mapreduce.marklogic.input.maxsplitsize") which, if set, specifies the maximum number of fragments per input split. Optional. Default: 50000L. The default should be suitable for most applications.
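
As back-of-the-envelope arithmetic, the sketch below estimates how many input splits a forest would produce, assuming the forest's estimated fragment count is simply cut into chunks no larger than the maximum split size. That assumption and the class SplitCountSketch are illustrative; the connector's actual splitting logic may differ.

```java
// Hypothetical illustration, not part of the connector.
final class SplitCountSketch {

    static final long DEFAULT_MAX_SPLIT_SIZE = 50000L; // default stated above

    // Ceiling division: the number of chunks of at most maxSplitSize
    // fragments needed to cover fragmentCount fragments.
    static long splitsFor(long fragmentCount, long maxSplitSize) {
        if (fragmentCount < 0 || maxSplitSize <= 0) {
            throw new IllegalArgumentException("count must be >= 0 and max split size > 0");
        }
        return (fragmentCount + maxSplitSize - 1) / maxSplitSize;
    }

    public static void main(String[] args) {
        // A forest estimated at 120,000 fragments with the default split size.
        System.out.println(splitsFor(120000L, DEFAULT_MAX_SPLIT_SIZE));
    }
}
```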

See Also:
Constant Field Values

INPUT_DATABASE_NAME

static final String INPUT_DATABASE_NAME
Not yet Implemented.

The config property name ("mapreduce.marklogic.input.databasename") which, if set, specifies the name of the MarkLogic Server database from which to create input splits.

See Also:
Constant Field Values

INPUT_KEY_CLASS

static final String INPUT_KEY_CLASS
The config property name ("mapreduce.marklogic.input.keyclass") which, if set, specifies the name of the class of the map input keys for KeyValueInputFormat. Optional. Default: Text.

See Also:
Constant Field Values

INPUT_VALUE_CLASS

static final String INPUT_VALUE_CLASS
The config property name ("mapreduce.marklogic.input.valueclass") which, if set, specifies the name of the class of the map input value for KeyValueInputFormat, ValueInputFormat and DocumentInputFormat. Optional. Default: Text for KeyValueInputFormat and ValueInputFormat, MarkLogicDocument for DocumentInputFormat.

See Also:
Constant Field Values

INPUT_MODE

static final String INPUT_MODE
The config property name ("mapreduce.marklogic.input.mode") which, if set, specifies whether to use basic or advanced input query mode. Allowable values are basic and advanced. Optional. Default: basic.

Only basic mode is supported at this time.

Basic mode enables use of the input.documentselector, input.subdocumentexpr, and input.namespace properties. Advanced mode enables use of the input.query and input.splitquery properties.

See Also:
Constant Field Values

BASIC_MODE

static final String BASIC_MODE
Value string of basic mode for input.mode.

See Also:
Constant Field Values

ADVANCED_MODE

static final String ADVANCED_MODE
Value string of advanced mode for input.mode.

See Also:
Constant Field Values

INPUT_QUERY

static final String INPUT_QUERY
The config property name ("mapreduce.marklogic.input.query") which, if set, specifies the query used to retrieve input records from MarkLogic Server. This property is required when advanced is specified in the input.mode property.

The value of this property must be a fully formed query, suitable for evaluation by xdmp:eval, and must return a sequence. The items in the sequence depend on the InputFormat subclass configured for the job. For details, see "Advanced Input Mode" in the Hadoop MapReduce Connector Developer's Guide.

See Also:
Constant Field Values

BIND_SPLIT_RANGE

static final String BIND_SPLIT_RANGE
The config property name ("mapreduce.marklogic.input.bindsplitrange") which, if set to true, specifies that the input query declares and references external variables "splitstart" and "splitend" under the namespace "http://marklogic.com/hadoop". The connector binds to these variables with the start and end of an input split instead of constraining the query with the split range.

For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.

See Also:
Constant Field Values

MR_NAMESPACE

static final String MR_NAMESPACE
The namespace ("http://marklogic.com/hadoop") in which the split range external variables are defined.

The split range variables "splitstart" and "splitend" are in this namespace when using advanced input mode and "mapreduce.marklogic.input.bindsplitrange" is true. Declare a namespace prefix for this namespace in your input query and qualify references to "splitstart" and "splitend" by the prefix. For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.
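
The sketch below assembles an advanced mode input query that declares both split range variables in this namespace. The prefix mlmr, the xs:integer type, and the query body are assumptions chosen for illustration; the class SplitRangeQuerySketch is hypothetical. Advanced mode also requires a split query, which is not shown.

```java
// Hypothetical illustration, not part of the connector.
final class SplitRangeQuerySketch {

    static final String MR_NAMESPACE = "http://marklogic.com/hadoop";

    // Builds an XQuery string that declares the two external variables in the
    // connector namespace and references them through the declared prefix.
    static String inputQuery() {
        return String.join("\n",
            "xquery version \"1.0-ml\";",
            "declare namespace mlmr = \"" + MR_NAMESPACE + "\";",
            // Assumption: the variables are typed as xs:integer.
            "declare variable $mlmr:splitstart as xs:integer external;",
            "declare variable $mlmr:splitend as xs:integer external;",
            // Illustrative body only: restrict the result to the split range.
            "(fn:collection())[$mlmr:splitstart to $mlmr:splitend]");
    }

    public static void main(String[] args) {
        // The resulting string would be the value of mapreduce.marklogic.input.query,
        // with mapreduce.marklogic.input.bindsplitrange set to true.
        System.out.println(inputQuery());
    }
}
```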

See Also:
Constant Field Values

SPLIT_START_VARNAME

static final String SPLIT_START_VARNAME
Use this external variable name ("splitstart") in your advanced mode input query to access the start value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true.

The variable must be declared and referenced in the namespace "http://marklogic.com/hadoop". For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.

See Also:
Constant Field Values

SPLIT_END_VARNAME

static final String SPLIT_END_VARNAME
Use this external variable name ("splitend") in your advanced mode input query to access the end value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true.

The variable must be declared and referenced in the namespace "http://marklogic.com/hadoop". For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.

See Also:
Constant Field Values

RECORD_TO_FRAGMENT_RATIO

static final String RECORD_TO_FRAGMENT_RATIO
The config property name ("mapreduce.marklogic.input.recordtofragmentratio") which, if set, specifies the ratio of the number of retrieved records to the number of accessed fragments. Optional. Default: 1.0 (one record per fragment) for documents, 100 for nodes and values.

The record-to-fragment ratio is used to estimate job progress.
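
One plausible way such a ratio could feed a progress estimate is sketched below: the expected record count for a split is taken as the fragment count times the ratio, and progress is the fraction of that already read. The formula and the class ProgressSketch are assumptions for illustration, not the connector's documented computation.

```java
// Hypothetical illustration, not part of the connector.
final class ProgressSketch {

    // Assumption: expected records = fragments in the split * ratio.
    // The result is capped at 1.0 because the ratio is only an estimate.
    static float progress(long recordsRead, long fragmentsInSplit, float ratio) {
        if (fragmentsInSplit <= 0 || ratio <= 0) {
            return 0f;
        }
        double expectedRecords = fragmentsInSplit * (double) ratio;
        return (float) Math.min(1.0, recordsRead / expectedRecords);
    }

    public static void main(String[] args) {
        // Documents: default ratio of 1.0 (one record per fragment).
        System.out.println(progress(50, 100, 1.0f));
        // Nodes and values: default ratio of 100.
        System.out.println(progress(2500, 100, 100f));
    }
}
```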

See Also:
Constant Field Values

INDENTED

static final String INDENTED
The config property name ("mapreduce.marklogic.input.indented") which, if set, specifies whether data retrieved from MarkLogic is formatted with indentation. Optional. Valid values: TRUE, FALSE, SERVERDEFAULT. Default: false.

See Also:
Constant Field Values

OUTPUT_USERNAME

static final String OUTPUT_USERNAME
The config property name ("mapreduce.marklogic.output.username") which, if set, specifies the MarkLogic Server user name under which output operations run. Required if using MarkLogic Server for output.

See Also:
Constant Field Values

OUTPUT_PASSWORD

static final String OUTPUT_PASSWORD
The config property name ("mapreduce.marklogic.output.password") which, if set, specifies the cleartext password to use for authentication with output.username. Required if using MarkLogic Server for output.

See Also:
Constant Field Values

OUTPUT_HOST

static final String OUTPUT_HOST
The config property name ("mapreduce.marklogic.output.host") which, if set, specifies the MarkLogic Server host to use for output operations. Required if using MarkLogic Server for output.

See Also:
Constant Field Values

OUTPUT_FOREST_HOST

static final String OUTPUT_FOREST_HOST
Internal use only.

See Also:
Constant Field Values

OUTPUT_PORT

static final String OUTPUT_PORT
The config property name ("mapreduce.marklogic.output.port") which, if set, specifies the port number of the output MarkLogic Server specified by the output.host property. Required if using MarkLogic Server for output.

See Also:
Constant Field Values

OUTPUT_USE_SSL

static final String OUTPUT_USE_SSL
The config property name ("mapreduce.marklogic.output.usessl") which, if set, specifies whether the connection to the output server is SSL enabled; false is assumed if not set.

See Also:
Constant Field Values

OUTPUT_SSL_OPTIONS_CLASS

static final String OUTPUT_SSL_OPTIONS_CLASS
The config property name ("mapreduce.marklogic.output.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if output.usessl is set to true.

See Also:
Constant Field Values

OUTPUT_DIRECTORY

static final String OUTPUT_DIRECTORY
The config property name ("mapreduce.marklogic.output.content.directory") which, if set, specifies the MarkLogic Server database directory where output documents are created.

If output.content.cleandir is false (the default), an error occurs if the directory already exists. If output.content.cleandir is true, the directory is removed as part of the job submission process.

See Also:
Constant Field Values

OUTPUT_CONTENT_ENCODING

static final String OUTPUT_CONTENT_ENCODING
The config property name ("mapreduce.marklogic.output.content.encoding") which, if set, specifies the charset encoding to be used by the server when loading this document. The encoding provided will be passed to the server at document load time and must be a name that it recognizes. The document byte stream will be transcoded to UTF-8 for storage.

See Also:
Constant Field Values

DEFAULT_OUTPUT_CONTENT_ENCODING

static final String DEFAULT_OUTPUT_CONTENT_ENCODING
Default output content encoding

See Also:
Constant Field Values

OUTPUT_COLLECTION

static final String OUTPUT_COLLECTION
The config property name ("mapreduce.marklogic.output.content.collection") which, if set, specifies a comma-separated list of collections to which generated output documents are added. Optional. Relevant only when using MarkLogic Server for output with ContentOutputFormat.

Example:

 <property>
   <name>mapreduce.marklogic.output.content.collection</name>
   <value>latest,top10</value>
 </property>
 

See Also:
Constant Field Values

OUTPUT_PERMISSION

static final String OUTPUT_PERMISSION
The config property name ("mapreduce.marklogic.output.content.permission") which, if set, specifies a comma-separated list of role-capability pairs to associate with created output documents. Optional. If not set, the default permissions for output.username are used. Relevant only when using MarkLogic Server for output with ContentOutputFormat.

Example:

 <property>
   <name>mapreduce.marklogic.output.content.permission</name>
   <value>dls-user,update,dls-user,read</value>
 </property>
 

See "URI Privileges and Permissions on Documents" in the Understanding and Using Security Guide for more information about roles and capabilities.

If a role name in the property value contains an embedded comma, you must set this property in your code, rather than in a configuration file.
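
A minimal sketch of building the comma-separated value from role-capability pairs follows. It rejects role names that contain a comma, since the flat comma-separated form shown in the example cannot carry them. The class PermissionValueSketch is hypothetical; the roles and capabilities are taken from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the connector.
final class PermissionValueSketch {

    // Joins {role, capability} pairs into "role,capability,role,capability".
    static String toValue(String[][] pairs) {
        List<String> parts = new ArrayList<>();
        for (String[] pair : pairs) {
            if (pair.length != 2) {
                throw new IllegalArgumentException("expected {role, capability}");
            }
            if (pair[0].contains(",")) {
                throw new IllegalArgumentException(
                    "role name contains a comma and cannot be used in this form: " + pair[0]);
            }
            parts.add(pair[0]);
            parts.add(pair[1]);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        String[][] pairs = { {"dls-user", "update"}, {"dls-user", "read"} };
        System.out.println(toValue(pairs));
    }
}
```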

See Also:
Constant Field Values

OUTPUT_QUALITY

static final String OUTPUT_QUALITY
The config property name ("mapreduce.marklogic.output.content.quality") which, if set, specifies the document quality for created output documents. Optional. Relevant only when using MarkLogic Server for output with ContentOutputFormat.

Quality affects the search relevance of a document. The value must be a positive or negative integer. For more information about document quality, see "Relevance Scores: Understanding and Customizing" in the Search Developer's Guide.

See Also:
Constant Field Values

OUTPUT_STREAMING

static final String OUTPUT_STREAMING
The config property name ("mapreduce.marklogic.output.content.streaming") which, if set, specifies whether to use streaming to insert content. When streaming is set to true, the content will not be fully buffered in memory, hence will consume less memory but will disable auto-retry if there is a problem inserting the content.

See Also:
Constant Field Values

OUTPUT_CLEAN_DIR

static final String OUTPUT_CLEAN_DIR
The config property name ("mapreduce.marklogic.output.content.cleandir") which, if set, indicates whether or not to remove the output directory. Only applicable to ContentOutputFormat. Default: false.

When set to true, the output directory specified by the output.content.directory property is removed. When set to false, an exception is thrown if the output content directory already exists.
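
The rule above reduces to a small decision table, sketched here. The class CleanDirSketch and its Action enum are hypothetical names used only to make the three outcomes explicit.

```java
// Hypothetical illustration, not part of the connector.
final class CleanDirSketch {

    enum Action { PROCEED, REMOVE_EXISTING, FAIL }

    // directoryExists: whether the output content directory is already present.
    // cleanDir: the value of mapreduce.marklogic.output.content.cleandir.
    static Action decide(boolean directoryExists, boolean cleanDir) {
        if (!directoryExists) {
            return Action.PROCEED;              // nothing to remove, nothing to reject
        }
        return cleanDir ? Action.REMOVE_EXISTING // true: the directory is removed
                        : Action.FAIL;           // false: an exception is thrown
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));
        System.out.println(decide(true, true));
        System.out.println(decide(false, false));
    }
}
```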

See Also:
Constant Field Values

OUTPUT_FAST_LOAD

static final String OUTPUT_FAST_LOAD
The config property name ("mapreduce.marklogic.output.content.fastload") which, if set, indicates whether or not to use the fast load mode to load content into MarkLogic. Default: false.

Setting it to true when the documents to be loaded already exist may cause an XDMP-DBDUPURI error if the original documents were inserted when the database had a different forest count. The fast load mode will always be used if "mapreduce.marklogic.output.content.directory" is set.
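
The interaction between the two properties can be sketched as a single boolean expression. The class FastLoadSketch is hypothetical, and treating an empty directory value as unset is an assumption.

```java
// Hypothetical illustration, not part of the connector.
final class FastLoadSketch {

    // fastLoadProperty: the value of mapreduce.marklogic.output.content.fastload.
    // outputDirectory: the value of mapreduce.marklogic.output.content.directory,
    // or null if unset (an empty string is treated as unset here by assumption).
    static boolean effectiveFastLoad(boolean fastLoadProperty, String outputDirectory) {
        boolean directorySet = outputDirectory != null && !outputDirectory.isEmpty();
        return fastLoadProperty || directorySet;
    }

    public static void main(String[] args) {
        System.out.println(effectiveFastLoad(false, "/results/")); // placeholder directory
        System.out.println(effectiveFastLoad(false, null));
    }
}
```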

See Also:
Constant Field Values

NODE_OPERATION_TYPE

static final String NODE_OPERATION_TYPE
The config property name ("mapreduce.marklogic.output.node.optype") which, if set, indicates what node operation to perform during output. Required if using MarkLogic Server for output with NodeOutputFormat. Valid choices: INSERT_BEFORE, INSERT_AFTER, INSERT_CHILD, REPLACE.

See Also:
NodeOpType, NodeOutputFormat, Constant Field Values

OUTPUT_PROPERTY_ALWAYS_CREATE

static final String OUTPUT_PROPERTY_ALWAYS_CREATE
The config property name ("mapreduce.marklogic.output.property.alwayscreate") which, if set to true, causes PropertyOutputFormat to create document properties for reduce output key-value pairs even when no document exists with the target URI. Default: false.

By default, PropertyOutputFormat does not create a property for a document URI unless the document already exists.

See Also:
Constant Field Values

OUTPUT_NAMESPACE

static final String OUTPUT_NAMESPACE
The config property name ("mapreduce.marklogic.output.node.namespace") which, if set, indicates the namespace used for output. This is used only in NodeOutputFormat, and is used for resolving element names in the node path.

See Also:
Constant Field Values

DEFAULT_MAX_SPLIT_SIZE

static final long DEFAULT_MAX_SPLIT_SIZE
The default maximum split size for input splits, used if input.maxsplitsize is not specified.

See Also:
Constant Field Values

PROPERTY_OPERATION_TYPE

static final String PROPERTY_OPERATION_TYPE
The config property name ("mapreduce.marklogic.output.property.optype") which, if set, indicates what property operation to perform during output when using PropertyOutputFormat. Ignored if not using PropertyOutputFormat. Optional. Valid choices: SET_PROPERTY, ADD_PROPERTY. Default: SET_PROPERTY.

See Also:
PropertyOpType, PropertyOutputFormat, PropertyWriter, Constant Field Values

DEFAULT_PROPERTY_OPERATION_TYPE

static final String DEFAULT_PROPERTY_OPERATION_TYPE
Default property operation type.

See Also:
Constant Field Values

CONTENT_TYPE

static final String CONTENT_TYPE
The config property name ("mapreduce.marklogic.output.content.type") which, if set, indicates the type of content to be inserted when using ContentOutputFormat. Optional. Valid choices: XML, TEXT, BINARY, MIXED, UNKNOWN. Default: XML.

See Also:
Constant Field Values

OUTPUT_KEY_TYPE

static final String OUTPUT_KEY_TYPE
The config property name ("mapreduce.marklogic.output.keytype") which, if set, specifies the data type of the output keys for KeyValueOutputFormat. Optional. Default: xs:string.

See Also:
Constant Field Values

OUTPUT_VALUE_TYPE

static final String OUTPUT_VALUE_TYPE
The config property name ("mapreduce.marklogic.output.valuetype") which, if set, specifies the data type of the output values for KeyValueOutputFormat. Optional. Default: xs:string.

See Also:
Constant Field Values

OUTPUT_QUERY

static final String OUTPUT_QUERY
The config property name ("mapreduce.marklogic.output.query") which, if set, specifies the statement to execute against MarkLogic Server. This property is required for KeyValueOutputFormat.

The statement may declare and reference two external variables, "key" and "value", under the namespace "http://marklogic.com/hadoop". The connector binds them to the output key and value, using the user-specified data types.

See Also:
Constant Field Values
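
The shape of an output statement that uses the "key" and "value" external variables might look like the following sketch. It builds the statement as a Java string and stores it, together with the key and value types, in a java.util.Properties object that stands in for a Hadoop configuration (an assumption, so that the example runs without Hadoop). The class name, the helper names, the mlmr prefix, and the body of the statement (a document insert that treats the key as a URI) are illustrative choices, not something this interface prescribes.

```java
import java.util.Properties;

class OutputQuerySketch {
    // Property names and namespace copied from the field descriptions above.
    static final String OUTPUT_QUERY = "mapreduce.marklogic.output.query";
    static final String OUTPUT_KEY_TYPE = "mapreduce.marklogic.output.keytype";
    static final String OUTPUT_VALUE_TYPE = "mapreduce.marklogic.output.valuetype";
    static final String HADOOP_NS = "http://marklogic.com/hadoop";

    // Hypothetical helper: declares the two external variables with the given
    // types, then inserts one document per key-value pair.
    static String buildStatement(String keyType, String valueType) {
        return String.join("\n",
            "xquery version \"1.0-ml\";",
            "declare namespace mlmr = \"" + HADOOP_NS + "\";",
            "declare variable $mlmr:key as " + keyType + " external;",
            "declare variable $mlmr:value as " + valueType + " external;",
            "xdmp:document-insert($mlmr:key, <value>{$mlmr:value}</value>)");
    }

    // Hypothetical helper; Properties stands in for a Hadoop configuration.
    static Properties configure(String keyType, String valueType) {
        Properties conf = new Properties();
        conf.setProperty(OUTPUT_KEY_TYPE, keyType);
        conf.setProperty(OUTPUT_VALUE_TYPE, valueType);
        conf.setProperty(OUTPUT_QUERY, buildStatement(keyType, valueType));
        return conf;
    }

    public static void main(String[] args) {
        // xs:string is the documented default for both key and value types.
        Properties conf = configure("xs:string", "xs:string");
        System.out.println(conf.getProperty(OUTPUT_QUERY));
    }
}
```

Keeping the declared variable types in step with the keytype and valuetype properties, as the helper does, is a design choice of this sketch, since the connector binds the variables in the user-specified data types.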

OUTPUT_KEY_VARNAME

static final String OUTPUT_KEY_VARNAME
Value string of the output key external variable name.

See Also:
Constant Field Values

OUTPUT_CONTENT_LANGUAGE

static final String OUTPUT_CONTENT_LANGUAGE
The config property name ("mapreduce.marklogic.output.content.language") which, if set, specifies the language name to associate with inserted documents. A value of en indicates that the document is in English. The default is null, which means the server default is used.

See Also:
Constant Field Values

OUTPUT_CONTENT_NAMESPACE

static final String OUTPUT_CONTENT_NAMESPACE
The config property name ("mapreduce.marklogic.output.content.namespace") which, if set, specifies the namespace to associate with inserted documents. The default is null, which indicates that the default namespace should be used.

See Also:
Constant Field Values

OUTPUT_VALUE_VARNAME

static final String OUTPUT_VALUE_VARNAME
Value string of the output value external variable name.

See Also:
Constant Field Values

OUTPUT_XML_REPAIR_LEVEL

static final String OUTPUT_XML_REPAIR_LEVEL
The config property name ("mapreduce.marklogic.output.content.repairlevel") which, if set, specifies the XML repair level to apply to inserted documents.

See Also:
Constant Field Values

OUTPUT_TOLERATE_ERRORS

static final String OUTPUT_TOLERATE_ERRORS
The config property name ("mapreduce.marklogic.output.content.tolerateerrors") which, if set, specifies whether to tolerate insertion errors and make sure all successful inserts are committed.

See Also:
Constant Field Values

DEFAULT_OUTPUT_XML_REPAIR_LEVEL

static final String DEFAULT_OUTPUT_XML_REPAIR_LEVEL
Default output XML repair level.

See Also:
Constant Field Values

DEFAULT_CONTENT_TYPE

static final String DEFAULT_CONTENT_TYPE
Default content type.

See Also:
Constant Field Values

BATCH_SIZE

static final String BATCH_SIZE
The config property name ("mapreduce.marklogic.output.batchsize") which, if set, indicates the number of records in one request. Optional. Currently applies only to ContentOutputFormat.

See Also:
Constant Field Values

DEFAULT_BATCH_SIZE

static final int DEFAULT_BATCH_SIZE
Default batch size.

See Also:
Constant Field Values

TXN_SIZE

static final String TXN_SIZE
The config property name ("mapreduce.marklogic.output.transactionsize") which, if set, indicates the number of requests in one transaction. Optional.

See Also:
Constant Field Values
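
Because the batch size counts records per request and the transaction size counts requests per transaction, their product bounds the number of records committed in a single transaction. The sketch below works through that arithmetic. It assumes java.util.Properties as a stand-in for a Hadoop configuration object. The class name, the helper names, and the numeric values are hypothetical and are not the connector's defaults.

```java
import java.util.Properties;

class BatchTxnSketch {
    // Property names copied from the field descriptions in this interface.
    static final String BATCH_SIZE = "mapreduce.marklogic.output.batchsize";
    static final String TXN_SIZE = "mapreduce.marklogic.output.transactionsize";

    // Hypothetical helper; Properties stands in for a Hadoop configuration.
    static Properties configure(int batchSize, int txnSize) {
        if (batchSize < 1 || txnSize < 1) {
            throw new IllegalArgumentException("Sizes must be positive");
        }
        Properties conf = new Properties();
        conf.setProperty(BATCH_SIZE, Integer.toString(batchSize));
        conf.setProperty(TXN_SIZE, Integer.toString(txnSize));
        return conf;
    }

    // Upper bound on records per transaction:
    // (records per request) x (requests per transaction).
    // Expects both properties to be set, as configure() does.
    static long recordsPerTransaction(Properties conf) {
        long batch = Long.parseLong(conf.getProperty(BATCH_SIZE));
        long txn = Long.parseLong(conf.getProperty(TXN_SIZE));
        return Math.multiplyExact(batch, txn);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Properties conf = configure(100, 10);
        System.out.println(recordsPerTransaction(conf));
    }
}
```

With the illustrative values, each transaction holds at most 100 x 10 = 1000 records. The final transaction of a task may hold fewer.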

Copyright © 2013 MarkLogic Corporation. All Rights Reserved.

Complete online documentation for MarkLogic Server, XQuery and related components may be found at developer.marklogic.com