[MarkLogic Dev General] mlcp.sh help with filtering to ingest only XML files in zip files

Morales-Martin, Kristina kmorales-martin at cas.org
Mon Jul 13 08:42:35 PDT 2015


Dear all,

We need help ingesting a directory of many zip files, each containing many XML files.

We are using mlcp (MarkLogic Content Pump) out of the box to import content as-is from a directory of zip files.

In particular, we are using these options:
-mode local \
-input_file_path [a directory that has zip files, each zip file has a heterogeneous mix of .xml and other binary files] \
-input_compressed true \
-input_file_pattern '.*.xml' \
-output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
...

Can anyone help with the -input_file_pattern option? Our intent is to load only the .xml files in the zip files in the directory.
We do not want to load any other files. For some reason, -input_file_pattern does not appear to be filtering them out.
If you have encountered this non-filtering behavior, what did you do to make it work?
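
To help narrow this down, here is a minimal Python sketch (not mlcp itself) that lists which entries of a zip file a given pattern would select. It rests on several assumptions: that mlcp's Java regular expressions behave like Python's `re` for patterns this simple, that the pattern must match the whole entry name, and that the pattern is applied to entry names at all rather than to the zip file names, which is not confirmed here. The helper names `matching_entries` and `build_sample_zip` and the sample entry names are hypothetical.

```python
import io
import re
import zipfile


def matching_entries(zip_bytes, pattern):
    """Return the entry names in a zip archive whose full name matches pattern.

    Full-name matching is an assumption about how the filter is applied.
    """
    regex = re.compile(pattern)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if regex.fullmatch(name)]


def build_sample_zip(names):
    """Build a small in-memory zip with one-byte entries (hypothetical test data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"x")
    return buffer.getvalue()


# Hypothetical mix of XML and non-XML entries.
sample = build_sample_zip(
    ["a/doc1.xml", "a/image.png", "a/notes_xml", "b/doc2.XML"]
)

# The pattern from the options above: the second dot is unescaped,
# so it matches any character, not only a literal ".".
loose = matching_entries(sample, r".*.xml")

# Same pattern with the dot escaped, so only a literal ".xml" suffix matches.
strict = matching_entries(sample, r".*\.xml")

print("loose: ", loose)
print("strict:", strict)
```

In this sketch the unescaped dot lets a name such as `a/notes_xml` through, while the escaped form does not; neither form matches the upper-case `.XML` suffix. That alone would not explain binary files being loaded, so it may be worth checking what the pattern is actually compared against.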

Thank you,
Kristina Morales-Martin
Sr. Technical Information Specialist, e-Content Operations
CAS, a division of the American Chemical Society
2540 Olentangy River Road
Columbus, OH 43202
Phone: 614-447-3600, ext. 2322
Fax: 614-447-3827
www.cas.org


