[MarkLogic Dev General] Require suggestions to load and search word docs

Sukhendra Rai sukhendra.rai at globallogic.com
Wed Jun 13 05:22:02 PDT 2007



I am familiarizing myself with MarkLogic Server and XQuery. 

I have to load (store) Word documents in the server. 

I want to search these documents for particular keywords. 


I would appreciate suggestions on the best way to load and search
these documents in MarkLogic Server.


Going through chapter 11 of the developer guide, I found three formats:
XML, binary and text. I used xdmp:document-load to load the .doc files.
If I use XML or text in the <format> parameter of xdmp:document-load, an
error is generated stating that my document is not in UTF-8 format,
while it works fine with the binary format. In my opinion, a Word
document stored in binary format cannot be searched efficiently.
xdmp:document-load does not seem to automatically convert the document
from any other type to XML format. Is there any function that does?


I found the xdmp:word-convert function, which converts a Word document
to XHTML. If I need to store the .doc files as XHTML for better search
performance, do I need to convert them first and then store them in the
server?
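
For illustration, the convert-then-store approach in question might look
like the sketch below. It is only a sketch under stated assumptions: the
file path and the target URI are hypothetical, and it assumes that the
XHTML part can be picked out of the nodes returned by xdmp:word-convert
by looking for an html root element.

```xquery
xquery version "1.0-ml";

(: Hypothetical file path and document URI, for illustration only. :)
let $path := "/tmp/report.doc"
let $uri  := "/docs/report.xhtml"

(: Read the Word file from disk as a binary node. :)
let $word := xdmp:document-get($path,
  <options xmlns="xdmp:document-get">
    <format>binary</format>
  </options>)

(: xdmp:word-convert returns several nodes: a parts manifest first,
   followed by the converted XHTML and any other generated parts. :)
let $results := xdmp:word-convert($word, "report.doc")

(: Assumption: the converted document is the first node with an
   html element at or below its root. :)
let $xhtml := ($results/descendant-or-self::*:html)[1]

(: Store the XHTML so its text is indexed for word searches. :)
return xdmp:document-insert($uri, $xhtml)
```

Once the XHTML is stored, a keyword search could be run in a separate
request with something like cts:search(fn:doc(), cts:word-query("keyword")).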



Sukhendra Rai

