[MarkLogic Dev General] using XQuery on Word documents
Yves Dolce
yvesdolc at hotmail.com
Tue Dec 11 13:10:05 PST 2007
Right, Danny.
First I enabled the URI Lexicon and fired the query you suggested. It did not work, but other queries did!
Thanks for the help!!
FWIW, the error I got is related to date format, I guess:
XDMP-COMPARE: <Date>2008-10-01T00:00:00</Date> gt xs:date("2008-01-01") -- Items not comparable: xs:date("2008-01-01") lt xdt:untypedAtomic("2008-10-01T00:00:00")

> --------------------------------------------------
> From: "Yves Dolce (hotmail.com)" <YvesDolc at hotmail.com>
> Sent: Monday, December 10, 2007 4:31 PM
> To: "General Mark Logic Developer Discussion"
> <general at developer.marklogic.com>
> Subject: Re: [MarkLogic Dev General] using XQuery on Word documents
>
> > Thanks Danny. Just by looking at the syntax, I'm pretty sure this is what I
> > want. I'll try this tomorrow and will confirm. Thanks again.
> >
> > --------------------------------------------------
> > From: "Danny Sokolsky" <dsokolsky at marklogic.com>
> > Sent: Monday, December 10, 2007 4:06 PM
> > To: "General Mark Logic Developer Discussion"
> > <general at developer.marklogic.com>
> > Subject: RE: [MarkLogic Dev General] using XQuery on Word documents
> >
> >> Hi Yves,
> >>
> >> To do this efficiently, it is very helpful to have a URI lexicon. A URI
> >> lexicon gives you very fast access to the URI of every document in the
> >> database. You enable the URI lexicon in the Admin Interface database
> >> config page for your database.
> >>
> >> Once you have the URI lexicon created (and reindexing has completed), you
> >> can do something like this to get what you want:
> >>
> >> for $x in cts:uris()
> >> where fn:ends-with($x, ".docx") and
> >> xdmp:zip-get(doc($x), "customXml/item1.xml")/Customer/Date
> >> gt xs:date("2008-01-01")
> >> return
> >> $x
> >>
> >> If you do it without the URI lexicon, you will probably need to do it in
> >> batches, because to get the URIs you need to first fetch the document and
> >> then do xdmp:node-uri to find its URI. This can effectively attempt to
> >> put the entire database in memory, and you therefore would probably need
> >> to do it in batches without the URI lexicon.
> >>
> >> If you have a lot of docx's in your database, you still probably want to
> >> do this in batches.
> >>
> >> Is this what you were looking for?
> >>
> >> -Danny
> >>
> >> From: general-bounces at developer.marklogic.com
> >> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Yves Dolce
> >> Sent: Monday, December 10, 2007 2:27 PM
> >> To: general at developer.marklogic.com
> >> Subject: [MarkLogic Dev General] using XQuery on Word documents
> >>
> >> This is a question that will have a simple answer. If only I knew more
> >> about XQuery...
> >>
> >> If I run the following line in CQ:
> >> xdmp:zip-get(doc("Contract.docx"), "customXml/item1.xml")
> >>
> >> I get:
> >> <Customer>
> >> <Date>2008-11-15T00:00:00</Date>
> >> <CompanyName>Bebop Corporation</CompanyName>
> >> <FirstName>Erick</FirstName>
> >> <LastName>Trojan</LastName>
> >> <SSN>1111-22-3333</SSN>
> >> <Address>Av. Revolucion 841, DF, CP 03910, Mexico</Address>
> >> <ContactTitle>Test Manager</ContactTitle>
> >> <Phone>+52 (55) 6666-66666</Phone>
> >> </Customer>
> >>
> >> How should I express a query that essentially says: for each docx file in
> >> the DB, get me its customXml/item1.xml part, if it has one, and the
> >> <Date> element in it is greater than 1/1/2008.
> >>
> >> Does my question make sense? Thanks!
> >> _______________________________________________
> >> General mailing list
> >> General at developer.marklogic.com
> >> http://xqzone.com/mailman/listinfo/general
> >>
> > _______________________________________________
> > General mailing list
> > General at developer.marklogic.com
> > http://xqzone.com/mailman/listinfo/general
> >
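For reference, the XDMP-COMPARE error reported at the top of this message appears to come from comparing an untyped <Date> value in xs:dateTime form ("2008-10-01T00:00:00") against an xs:date. The following is only a sketch of one possible fix and was not confirmed in this thread: it casts the element to xs:dateTime and compares it with an xs:dateTime literal. The variable names $uri and $date are illustrative, and it assumes each item1.xml holds at most one Customer/Date element.

```xquery
(: Sketch only, assuming the URI lexicon is enabled so cts:uris() works. :)
for $uri in cts:uris()[fn:ends-with(., ".docx")]
(: Assumption: at most one Customer/Date per part; [1] guards the cast
   against multiple matches. How xdmp:zip-get behaves when a .docx has no
   customXml/item1.xml part is not covered in this thread, so verify it. :)
let $date := xdmp:zip-get(fn:doc($uri), "customXml/item1.xml")/Customer/Date[1]
(: Cast the untyped value to xs:dateTime so both operands of gt have the
   same type; an empty $date makes the comparison false rather than an error. :)
where xs:dateTime($date) gt xs:dateTime("2008-01-01T00:00:00")
return $uri
```

With a large number of .docx files this would still likely need to run in batches, as Danny notes above.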