[MarkLogic Dev General] Converting MS Office documents
Pete.Aven at marklogic.com
Fri Mar 27 03:50:29 PDT 2015
That should work. I just tried on 8.0-1.1 on Windows and got the expected results.
If you're using CPF, confirm that you have the following pipelines enabled:
Status Change Handling
Office OpenXML Extract
For Office 2007 and later (docs ending with a .docx, .pptx, or .xlsx extension), the file is a zipped package of XML parts, so you can work with the native OpenXML format directly once the Office OpenXML Extract pipeline has extracted the contents.
Once inserted, the original doc will be saved in MarkLogic as:
/myDoc/UtilizationReport.xlsx //the original doc
Once this original doc is processed by Office OpenXML Extract, you should see the extracted parts in MarkLogic as well:
/myDoc/UtilizationReport_xlsx_parts //with a bunch of .xml here in SpreadsheetML format
The CPF state on the .xlsx will be: http://marklogic.com/states/extracted
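As a rough aid for checking the results described above, here is a small plain-JavaScript sketch. Both function names are hypothetical and not part of any MarkLogic API. The naming rule (swap the dot before the extension for an underscore, then append "_parts") is only inferred from the example path in this message, so treat it as an assumption rather than documented behavior.

```javascript
// Hypothetical helpers for sanity-checking CPF results; not a MarkLogic API.
// Assumption: the parts directory name is the document URI with the "." before
// the extension replaced by "_" and "_parts" appended, as in the example above.
const EXTRACTED_STATE = 'http://marklogic.com/states/extracted';

function expectedPartsDirectory(uri) {
  const slash = uri.lastIndexOf('/');
  const dot = uri.lastIndexOf('.');
  // Reject URIs whose last path segment has no extension (or is a dotfile).
  if (dot <= slash + 1) {
    throw new Error('URI has no file extension: ' + uri);
  }
  return uri.slice(0, dot) + '_' + uri.slice(dot + 1) + '_parts';
}

// True when the CPF state read from the document's properties is "extracted".
function isExtracted(cpfState) {
  return cpfState === EXTRACTED_STATE;
}

console.log(expectedPartsDirectory('/myDoc/UtilizationReport.xlsx'));
```

You would compare the derived path with what you actually see in the database, and pass the state value you read from the document's properties to isExtracted.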
If you already have those two pipelines enabled, you may want to disable the others to see whether you then get the expected results. That ensures no pipelines are conflicting with each other in their attempts to process the document.
Hope this helps,
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Javier Lizarraga
Sent: Thursday, March 26, 2015 7:51 PM
To: General at developer.marklogic.com
Subject: [MarkLogic Dev General] Converting MS Office documents
I want to load an MS Excel file named filename.xlsx into a MarkLogic database (using ML8), and I want to be able to access the contents of the Excel document.
I enabled triggers for the database, then installed and enabled Content Processing. I followed the ML document below:
"uri" : "/myDoc/UtilizationReport.xlsx",
"permissions" : xdmp.defaultPermissions()
When I load my UtilizationReport.xlsx file I can see the associated properties in Query Console:
<?xml version="1.0" encoding="UTF-8"?>
The load appears to have succeeded, but I do not see any associated documents other than the UtilizationReport.xlsx file itself.
I was expecting to see:
UtilizationReport.xlsx (Original Document)
A Directory called UtilizationReport_xlsx_Parts
I don't see any errors. Any help would be greatly appreciated.