[MarkLogic Dev General] Require suggestions to load and search worddocs

venkatesh.sheshgiri at wipro.com venkatesh.sheshgiri at wipro.com
Wed Jun 13 05:32:24 PDT 2007

Hi Sorabh,
You need a CPF (Content Processing Framework) license key to
convert any MS Office or PDF document to XML.
You can then write your search query (check out the cts:search API)
against the schema of the resulting XML (normally it is DocBook).
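
As a rough illustration of such a query, here is a minimal sketch. It
assumes the converted XML is already in the database; the search word
"invoice" is only a placeholder, and the query is not tied to any
particular schema.

```xquery
(: Find every document containing the word "invoice" (a placeholder
   term) and return the URI of each match. :)
for $hit in cts:search(fn:doc(), cts:word-query("invoice"))
return xdmp:node-uri($hit)
```

Replacing fn:doc() with a path expression into the converted XML would
restrict the search to particular elements.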
Venkatesh M S


From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Sukhendra
Sent: Wednesday, June 13, 2007 5:52 PM
To: general at developer.marklogic.com
Cc: Sorabh Jerath
Subject: [MarkLogic Dev General] Require suggestions to load and search



I am familiarizing myself with Mark Logic Server and XQuery.

I have to store (load) Word documents in the server.

I want to search these documents for particular keywords.


I would appreciate suggestions on the best way to load and search
these documents in MarkLogic Server.


Going through chapter 11 of the developer guide, I found three formats:
XML, binary and text. I used xdmp:document-load to load the .doc files.
If I use xml or text in the <format> option of xdmp:document-load, an
error is generated stating that my document is not in UTF-8 format,
while it works fine with the binary format. In my opinion, a Word
document stored in binary format cannot be searched efficiently.
xdmp:document-load does not seem to automatically convert the
document from any other type to XML format. Is there any function that
does this?
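
For reference, the binary load that works looks roughly like the sketch
below. The filesystem path and the database URI are placeholders, not
the actual values I used.

```xquery
(: Load a Word file from the filesystem as a binary document.
   Both the source path and the target URI are hypothetical. :)
xdmp:document-load("/tmp/report.doc",
  <options xmlns="xdmp:document-load">
    <uri>/docs/report.doc</uri>
    <format>binary</format>
  </options>)
```

Changing <format> to xml or text in this call is what produces the
UTF-8 error.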


I found the xdmp:word-convert function, which converts a Word document
to XHTML. If I need to store the .doc files as XHTML for better search
performance, do I have to convert them first and then store them in
the server?
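
What I have in mind is something like the sketch below. It is untested
and rests on assumptions: the paths and URIs are placeholders, and I am
assuming that the first node returned by xdmp:word-convert is a manifest
of the generated parts and that the second is the converted XHTML.

```xquery
(: Read the Word file from the filesystem (hypothetical path). :)
let $doc := xdmp:document-get("/tmp/report.doc")
(: Convert it. The result is a sequence of nodes. :)
let $results := xdmp:word-convert($doc, "report.doc")
(: Assumption: node 1 is the parts manifest, node 2 is the XHTML. :)
let $xhtml := $results[2]
(: Store the XHTML under a hypothetical URI so it can be searched. :)
return xdmp:document-insert("/docs/report.xhtml", $xhtml)
```

Any other parts in the result (CSS, images) would need to be inserted
separately if they are wanted.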



Sukhendra Rai

