[MarkLogic Dev General] Require suggestions to load and search word docs

Sukhendra Rai sukhendra.rai at globallogic.com
Wed Jun 13 05:22:02 PDT 2007


Hi,

 

I am familiarizing myself with MarkLogic Server and XQuery.

I need to load Word documents into the server and then search them for
particular keywords.

I would appreciate suggestions on the best way to load and search these
documents in MarkLogic Server.

 

Going through chapter 11 of the developer guide, I found three document
formats: XML, binary, and text. I used xdmp:document-load to load the
.doc files. If I specify xml or text in the <format> option of
xdmp:document-load, an error is generated stating that my document is
not in UTF-8 format, while loading with the binary format works fine.
In my opinion, a Word document stored in binary format cannot be
searched efficiently. xdmp:document-load does not seem to convert the
document automatically from another type to XML. Is there a function
that does this?
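
For reference, a minimal sketch of the binary load described above. The
file path and document URI are hypothetical placeholders, not the actual
values used; the <format> element is the option mentioned in the
paragraph above.

```xquery
(: Sketch only: load a Word file as a binary document.
   The path and URI below are hypothetical placeholders. :)
xdmp:document-load("c:\docs\sample.doc",
  <options xmlns="xdmp:document-load">
    <uri>/docs/sample.doc</uri>
    <format>binary</format>
  </options>)
```

Changing <format> to xml or text in this call is what produces the
UTF-8 error, since a .doc file is not UTF-8 encoded text.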

 

I found the xdmp:word-convert function, which converts a Word document
to XHTML. If I need to store the .doc files as XHTML for better search
performance, do I need to convert them first and then store the result
in the server?

 

Thanks,

Sukhendra Rai
