[MarkLogic Dev General] mlcp error

Jakob Fix jakob.fix at gmail.com
Thu Oct 31 16:01:53 PDT 2013


Hi,

We've run into something we think might be a bug in the most recent
version of mlcp. We exported a database containing XML documents and lots
of binary documents, then imported the exported data into another
database. During the second step, the import into the new database, the
error below appeared (the line with "Archive damaged ...").
Apparently, mlcp stores XML and binary documents in separate zip files,
and each binary document gets its own metadata entry. In our case, the
export created two zip files containing the binaries. For some reason,
the binary file and its metadata file for one document ended up in
different zip files, as shown below:

20131031140432+0100-000001-BINARY.zip ==> RO-GE_DTC.pdf.metadata

20131031140432+0100-000002-BINARY.zip ==> RO-GE_DTC.pdf

This seems to have caused the error below. The PDF file was indeed not
loaded into the database.

Moving the PDF file and its metadata file into the same binary zip file
made the import run without errors.
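
A quick way to spot this situation before importing is to check every zip
file in the export directory for entries whose partner is missing from the
same zip. The Python sketch below assumes only what the listing above
shows: that a document's metadata entry carries the document's name plus a
".metadata" suffix. The names find_orphans and METADATA_SUFFIX are
hypothetical and not part of mlcp, and the check has not been validated
against other mlcp versions.

```python
import sys
import zipfile
from pathlib import Path

# Assumption: metadata entries are named "<document entry>.metadata",
# as seen in the zip listing in this post.
METADATA_SUFFIX = ".metadata"


def find_orphans(archive_dir):
    """Return (zip file name, entry name) pairs whose partner entry
    (content for a metadata entry, or metadata for a content entry)
    is not present in the same zip file."""
    orphans = []
    for zip_path in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            names = set(zf.namelist())
        for name in sorted(names):
            if name.endswith(METADATA_SUFFIX):
                partner = name[: -len(METADATA_SUFFIX)]
            else:
                partner = name + METADATA_SUFFIX
            if partner not in names:
                orphans.append((zip_path.name, name))
    return orphans


if __name__ == "__main__":
    export_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for zip_name, entry in find_orphans(export_dir):
        print(f"{zip_name}: {entry} has no partner entry in this zip")
```

Run against the export directory (db-prod-20131031 in our case), this
should report both halves of a split pair, each under the zip file it
ended up in.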

Thanks,
Jakob.



marklogic-contentpump-1.0.3\bin\mlcp.bat EXPORT -host 192.168.56.90 -port
50000 -username abc -password abc -output_type archive -output_file_path
db-prod-20131031

marklogic-contentpump-1.0.3\bin\mlcp.bat IMPORT -host 192.168.56.90 -port
40100 -username abc -password abc -input_file_path db-prod-20131031
-input_file_type archive

13/10/31 14:09:54 INFO contentpump.LocalJobRunner: Content type: XML
13/10/31 14:09:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/10/31 14:09:54 INFO input.FileInputFormat: Total input paths to process : 3
13/10/31 14:09:55 ERROR contentpump.ArchiveRecordReader: Archive damaged: no/incorrect metadata for /content/assets/agreements/RO-GE_DTC.pdf in /D:/Projects/EOI/deployment/mlcp/eoi-db-prod-20131032/20131031140432+0100-000002-BINARY.zip
13/10/31 14:09:55 ERROR contentpump.LocalJobRunner: Error running task: attempt__0000_m_000001_0
java.lang.NullPointerException
        at com.marklogic.contentpump.DatabaseContentWriter.write(DatabaseContentWriter.java:231)
        at com.marklogic.contentpump.DatabaseContentWriter.write(DatabaseContentWriter.java:58)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at com.marklogic.contentpump.DocumentMapper.map(DocumentMapper.java:46)
        at com.marklogic.contentpump.DocumentMapper.map(DocumentMapper.java:32)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:375)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
13/10/31 14:09:56 INFO contentpump.LocalJobRunner:  completed 0%
13/10/31 14:14:42 INFO contentpump.LocalJobRunner:  completed 33%
13/10/31 14:15:41 WARN contentpump.DatabaseContentWriter: SEC-PERMDENIED: Permission denied
13/10/31 14:18:27 INFO contentpump.LocalJobRunner:  completed 66%
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats:
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 20230
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: Total execution time: 512 sec