[MarkLogic Dev General] MLCP input query

Ganesh Vaideeswaran Ganesh.Vaideeswaran at marklogic.com
Sun Jan 31 20:10:11 PST 2016


David,

Curious... what input are you offering in this thread? Is it that they should write their own layer to get the URIs and pass them to mlcp?

Ganesh

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of David Lee
Sent: Sunday, January 31, 2016 6:27 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MLCP input query

Since mlcp.bat (and .sh) is simply a small front end to 'java ...', essentially this (taken from an 8.0 release):

java -cp "%classpath%" -DCONTENTPUMP_HOME="%LIB_HOME%" -Dfile.encoding="UTF-8" %JVM_OPTS% com.marklogic.contentpump.ContentPump %*

----

There will always be issues with quoting and escaping, not just in the user's invocation but inside the .bat or .sh file itself and further 'up the call chain' on the user's side, even with simple values. This gets tedious and error-prone very fast.
Until MLCP natively provides an alternate syntax (as you noted, XQSync does, and many common programs eventually arrive at the same problem and solution), it should be a fairly simple and perhaps enjoyable project for someone to write a small front end, in whatever language they like, that adds these additional parameters and then calls java with everything properly escaped.

A JVM-based language could do so by directly calling com.marklogic.contentpump.ContentPump.main, passing in the String[] without any escaping or quoting needed.
A language that can do an 'exec' could call 'java.exe' and pass the arguments similarly. Some OSes (like Windows) and some languages do require some quoting to do this correctly, but much less than going through multiple layers of shell-interpreter expansion.
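A minimal sketch of the 'exec' approach in Python. It mirrors the 'java ...' line from mlcp.bat quoted above and hands each argument to the process as a separate list element, so the XML query needs no shell quoting. The classpath and CONTENTPUMP_HOME paths are hypothetical placeholders for wherever mlcp is installed, and the function names are illustrative; this is not an official MarkLogic tool.

```python
import subprocess
from typing import List, Sequence

# Main class as it appears in the mlcp.bat excerpt above (8.0 release).
CONTENTPUMP_MAIN = "com.marklogic.contentpump.ContentPump"


def build_java_argv(classpath: str, contentpump_home: str,
                    mlcp_args: Sequence[str],
                    jvm_opts: Sequence[str] = ()) -> List[str]:
    """Rebuild the mlcp.bat 'java ...' line as a list of arguments.

    Each element after the main class should reach ContentPump.main as
    exactly one String, so an XML -query_filter needs no shell quoting.
    """
    return [
        "java", "-cp", classpath,
        f"-DCONTENTPUMP_HOME={contentpump_home}",
        "-Dfile.encoding=UTF-8",
        *jvm_opts,
        CONTENTPUMP_MAIN,
        *mlcp_args,
    ]


def run_mlcp(argv: Sequence[str]) -> int:
    """Launch the JVM directly (no shell=True) and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


if __name__ == "__main__":
    query = ('<cts:directory-query depth="infinity" '
             'xmlns:cts="http://marklogic.com/cts">'
             '<cts:uri>/Default/samplestack/rest-api/options/</cts:uri>'
             '</cts:directory-query>')
    argv = build_java_argv(
        classpath="/opt/mlcp/lib/*",       # hypothetical install location
        contentpump_home="/opt/mlcp/lib",  # hypothetical
        mlcp_args=[
            "copy",
            "-input_host", "localhost", "-input_port", "5767",
            "-input_username", "admin", "-input_password", "admin",
            "-output_host", "localhost", "-output_port", "5767",
            "-output_username", "admin", "-output_password", "admin",
            "-input_database", "samplestack-modules",
            "-output_database", "Documents",
            "-query_filter", query,
        ])
    print(argv)  # inspect first; call run_mlcp(argv) to actually launch
```

Because no shell is involved, the query string is written exactly as it should arrive, with plain double quotes inside it.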

In this specific case, since the URIs are the result of a query, it would require a round trip to get the list of URIs (of unknown size) and then re-invoke mlcp; exactly where and how that is processed could make a big difference. Avoiding multiple invocations of the JVM might lead one to a JVM-based implementation, which could conceivably do a front-end Map/Reduce of its own and call multiple mlcp instances from the same process in different threads.
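One possible shape for the second half of that round trip, assuming the URI list has already been fetched (for example by running cts:uris through some other client; that step is not shown): split the list into batches and turn each batch into a cts:document-query filter for a separate mlcp run. The <cts:document-query><cts:uri> layout is an assumption about how MarkLogic serializes that query, so compare it with what your server produces before relying on it. The batch size and the URIs in the demo are arbitrary placeholders.

```python
from typing import Iterable, Iterator, List
from xml.sax.saxutils import escape

CTS_NS = "http://marklogic.com/cts"


def document_query_xml(uris: Iterable[str]) -> str:
    """Assumed XML serialization of cts:document-query for the given URIs."""
    # escape() turns &, < and > inside a URI into XML entities.
    body = "".join(f"<cts:uri>{escape(uri)}</cts:uri>" for uri in uris)
    return (f'<cts:document-query xmlns:cts="{CTS_NS}">'
            f"{body}</cts:document-query>")


def batched(uris: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URIs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


if __name__ == "__main__":
    # Hypothetical URIs standing in for the result of the first round trip.
    uris = [f"/docs/{n}.xml" for n in range(5)]
    for batch in batched(uris, 2):        # 2 is an arbitrary batch size
        print(document_query_xml(batch))  # one -query_filter per mlcp run
```

Each printed filter would still cost one JVM start if passed to mlcp.bat, which is the overhead an in-process, JVM-based front end would avoid.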

I suspect existing tools could provide much of this 'out of the box', like XProc, Gradle, xmlsh, Jython, JRuby, Node, etc.

Note that even reading parameters 'directly' from a file still requires some kind of parsing, quoting, escaping, and character-set/encoding transcoding, but usually much less than a general-purpose command processor.
It's not uncommon for programmers to try to avoid making up yet another syntax by using a 'standard' format for config files (like JSON, XML, INI, YAML, CSV) ... sometimes the config files can be as complicated to create correctly as an sh or cmd.exe batch script.
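A sketch of the config-file idea using JSON. The layout (a "command" string plus an "options" object whose keys are mlcp option names without the leading dash) is made up for illustration, and rendering booleans as true/false is an assumption. The resulting list could be handed to a launcher like the exec sketch rather than to a shell. Note that the JSON text still has to escape the inner double quotes of the XML query, which is exactly the trade-off described above.

```python
import json
from typing import Any, Dict, List


def options_to_args(command: str, options: Dict[str, Any]) -> List[str]:
    """Turn {"input_host": "localhost", ...} into ["copy", "-input_host", ...]."""
    args = [command]
    for name, value in options.items():
        args.append(f"-{name}")
        if isinstance(value, bool):
            # Assumption: boolean options are written as true/false.
            args.append("true" if value else "false")
        else:
            args.append(str(value))
    return args


def load_mlcp_config(text: str) -> List[str]:
    """Parse a hypothetical JSON config: {"command": ..., "options": {...}}."""
    config = json.loads(text)
    return options_to_args(config["command"], config.get("options", {}))


if __name__ == "__main__":
    sample = '''{
      "command": "copy",
      "options": {
        "input_host": "localhost",
        "input_port": 5767,
        "input_database": "samplestack-modules",
        "output_database": "Documents",
        "query_filter": "<cts:directory-query depth=\\"infinity\\" xmlns:cts=\\"http://marklogic.com/cts\\"><cts:uri>/Default/samplestack/rest-api/options/</cts:uri></cts:directory-query>"
      }
    }'''
    print(load_mlcp_config(sample))
```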






From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Prasanth N V R
Sent: Sunday, January 31, 2016 5:18 PM
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] MLCP input query

Thanks Peter.

It worked finally!
mlcp.bat copy -input_host localhost -input_port 5767 -input_username admin -input_password admin ^
-output_host localhost -output_port 5767 -output_username admin -output_password admin ^
-input_database samplestack-modules ^
-output_database Documents ^
-query_filter "<cts:directory-query depth=\"infinity\" xmlns:cts=\"http://marklogic.com/cts\"><cts:uri>/Default/samplestack/rest-api/options/</cts:uri></cts:directory-query>"

Thanks all for the timely help!

Kind feedback:
In the mlcp -query_filter, wrapping the query in double quotes and escaping every double quote inside it would be tedious for real-world (large) queries.
So MLCP could provide an additional option, similar to XQSync's INPUT_QUERY, that takes only URIs as input and processes them in parallel threads.
It would be really great if this were implemented in MLCP.

Thanks,
Prasanth

On Thu, Jan 28, 2016 at 4:26 AM, Peter Kester <Peter.Kester at marklogic.com> wrote:
This will also work on the command line:
-query_filter "<cts:directory-query depth=\"infinity\" xmlns:cts=\"http://marklogic.com/cts\"><cts:uri>/Default/samplestack/rest-api/options/</cts:uri></cts:directory-query>"

You need to put double quotes around the cts query and escape all double quotes inside it with a backslash.
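Rather than escaping by hand, the quoted form can be generated. This sketch uses Python's subprocess.list2cmdline, an undocumented helper in CPython's standard library that applies the Microsoft C runtime quoting rules (wrap arguments containing spaces in double quotes, write inner double quotes as \"). It assumes the receiving program, here the Java launcher, parses its command line with those rules. cmd.exe itself does not treat \" as an escape, so output pasted into a .bat file may still need care with characters like < and >. The windows_quote name is illustrative.

```python
import subprocess
from typing import Iterable


def windows_quote(args: Iterable[str]) -> str:
    """Join args using the MS C runtime quoting rules (backslash-escaped quotes)."""
    # list2cmdline is undocumented but present in CPython's subprocess module.
    return subprocess.list2cmdline(list(args))


if __name__ == "__main__":
    query = ('<cts:directory-query depth="infinity" '
             'xmlns:cts="http://marklogic.com/cts">'
             '<cts:uri>/Default/samplestack/rest-api/options/</cts:uri>'
             '</cts:directory-query>')
    print(windows_quote(["-query_filter", query]))
```

For the thread's directory query, the printed line has the same quoting as the working command line above.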




From: <general-bounces at developer.marklogic.com> on behalf of Prasanth N V R
Reply-To: MarkLogic Developer Discussion
Date: Thursday 28 January 2016 00:16
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MLCP input query

I just tried the same.
mlcp.bat copy -input_host localhost -input_port 5767 -input_username admin -input_password admin ^
-output_host localhost -output_port 5767 -output_username admin -output_password admin ^
-input_database Documents ^
-output_database Documents ^
-query_filter 'cts:element-value-query(xs:QName("Type"),"TuchtrechtelijkeInstantie")'

But still getting the error.
ERROR mapreduce.MarkLogicInputFormat: com.marklogic.xcc.exceptions.XQueryException: XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error, unexpected QName_, expecting Rpar_ [Session: user=admin, cb=Documents [ContentSource: user=admin, cb=Documents

Please help me to solve this.

Thanks,
Prasanth

On Tue, Jan 26, 2016 at 12:34 PM, Peter Kester <Peter.Kester at marklogic.com> wrote:
You could try this:
-query_filter 'cts:element-value-query(xs:QName("Type"),"TuchtrechtelijkeInstantie")'
This will be expanded to
cts:search(/,cts:element-value-query(xs:QName("Type"),"TuchtrechtelijkeInstantie"))

Regards,

Peter


From: <general-bounces at developer.marklogic.com> on behalf of Prasanth N V R
Reply-To: MarkLogic Developer Discussion
Date: Tuesday 26 January 2016 14:25
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MLCP input query

Thanks Peter.

-directory_filter works for me. It is able to copy documents based on this.

In general, I am curious to know how to send the URIs as an INPUT_QUERY, similar to how we do it in XQSync.

In XQSync, the INPUT_QUERY is given as cts:uris((),(),cts:element-value-query(xs:QName("bucket"),"samplequestion")).
This takes all the URIs matching the query and processes them.

But I do not know how to achieve this in MLCP.

A working example would be great.

On Tue, Jan 26, 2016 at 3:16 AM, Peter Kester <Peter.Kester at marklogic.com> wrote:
Hi Prasanth,

Just checked the docs:

You need to quote the cts:query.
So, according to the doc, your query_filter needs to be:
-query_filter 'cts:directory-query("/Default/samplestack/rest-api/options/","infinity")'
But you would probably be better off with the -directory_filter option if you just want to select documents from a given directory inside ML.
Check this section of the documentation: https://docs.marklogic.com/guide/mlcp/export#id_47556

HTH

Peter


From: <general-bounces at developer.marklogic.com> on behalf of Prasanth N V R
Reply-To: MarkLogic Developer Discussion
Date: Tuesday 26 January 2016 04:10
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MLCP input query

Thanks for the reply Peter.

I tried giving it as a cts:query (in XML format).
-query_filter '<cts:directory-query depth="infinity" xmlns:cts="http://marklogic.com/cts"><cts:uri>/Default/samplestack/rest-api/options/</cts:uri></cts:directory-query>'

But it throws an error: "< was unexpected at this time."

In general, I am curious to know how to send the URIs as an INPUT_QUERY, similar to how we do it in XQSync.

In XQSync, the INPUT_QUERY is given as cts:uris((),(),cts:directory-query("/Default/samplestack/rest-api/options/","infinity")).
This takes all the URIs matching the query and processes them.

But I do not know how to achieve this in MLCP.

Thanks,
Prasanth

On Mon, Jan 25, 2016 at 2:58 AM, Peter Kester <Peter.Kester at marklogic.com> wrote:
Hi Prasanth,

Try making that cts:directory-query an XML representation.
Like this:
-query_filter <cts:directory-query depth="infinity" xmlns:cts="http://marklogic.com/cts">
<cts:uri>
/Default/samplestack/rest-api/options/
</cts:uri>
</cts:directory-query>

Doc says:

-query_filter string

Specifies a query to apply when selecting documents for export. The argument must be the XML serialization of a cts:query or JSON serialization of a cts.query. Only documents matching the query are considered for export; false positives are possible. For details, see Controlling What is Exported, Copied, or Extracted <https://docs.marklogic.com/guide/mlcp/export#id_47556>.
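Since the option wants the XML serialization of a cts:query, it can be generated instead of typed. A sketch with Python's xml.etree.ElementTree; the element and attribute names follow the cts:directory-query XML used in this thread, while the function name and default depth are illustrative choices. The output still has to be passed as a single argument (for example through an argument list rather than a shell) or quoted for the shell.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

CTS_NS = "http://marklogic.com/cts"
ET.register_namespace("cts", CTS_NS)  # serialize with the cts: prefix


def directory_query_xml(uris: Iterable[str], depth: str = "infinity") -> str:
    """Build a cts:directory-query XML serialization; ElementTree escapes text."""
    root = ET.Element(f"{{{CTS_NS}}}directory-query", {"depth": depth})
    for uri in uris:
        ET.SubElement(root, f"{{{CTS_NS}}}uri").text = uri
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(directory_query_xml(["/Default/samplestack/rest-api/options/"]))
```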

HTH.

Peter

From: <general-bounces at developer.marklogic.com> on behalf of Prasanth N V R
Reply-To: MarkLogic Developer Discussion
Date: Monday 25 January 2016 04:59
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MLCP input query

Thanks for your reply, Christopher.

Yes, I tried that too, but I am getting an error.

mlcp.bat copy -input_host localhost -input_port 5767 -input_username admin -input_password admin ^
-output_host localhost -output_port 5767 -output_username admin -output_password admin ^
-input_database samplestack-modules ^
-output_database Documents ^
-query_filter cts:directory-query("/Default/samplestack/rest-api/options/","infinity")

ERROR mapreduce.MarkLogicInputFormat: com.marklogic.xcc.exceptions.XQueryException: XDMP-DOCROOTTEXT: xdmp:unquote("cts:directory-query(/Default/samplestack/rest-api/options/,infin...") -- Invalid root text "cts:directory-query(/Default/samplestack/rest-api/options/,infinity)" at  line 1
 [Session: user=admin, cb=samplestack-modules [ContentSource: user=admin, cb=samplestack-modules

Thanks,
Prasanth

On Sun, Jan 24, 2016 at 7:27 PM, Christopher Hamlin <cbhamlin at gmail.com> wrote:
It looks to me like you are sending in a search, not a query.

So maybe try

-query_filter cts:directory-query("/Default/samplestack/rest-api/options/","infinity")

instead.


On Sun, Jan 24, 2016 at 7:22 PM, Prasanth N V R <prasanth.nvr04 at gmail.com> wrote:
Hi,

I am trying to copy documents from one DB to another DB using MLCP.

Here is my command(running in Windows)
mlcp.bat copy -input_host localhost -input_port 5767 -input_username admin -input_password admin ^
-output_host localhost -output_port 5767 -output_username admin -output_password admin ^
-input_database samplestack-modules ^
-output_database Documents ^
-query_filter cts:search(doc(),cts:directory-query("/Default/samplestack/rest-api/options/","infinity"))

But I am getting an error when I execute the above command.
ERROR mapreduce.MarkLogicInputFormat: com.marklogic.xcc.exceptions.XQueryException: XDMP-DOCROOTTEXT: xdmp:unquote("cts:search(doc(),cts:directory-query(/Default/samplestack/rest-a...") -- Invalid root text "cts:search(doc(),cts:directory-query(/Default/samplestack/rest-api/options/,infinity))" at  line 1
 [Session: user=admin, cb=samplestack-modules [ContentSource: user=admin, cb=samplestack-modules


How can I pass a query in command line to select matching documents?

Thanks,
Prasanth

_______________________________________________
General mailing list
General at developer.marklogic.com<mailto:General at developer.marklogic.com>
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20160201/ba5ea611/attachment-0001.html 

