[MarkLogic Dev General] document format
Mike Sokolov
sokolov at ifactory.com
Mon Mar 31 10:56:20 PST 2008
Which very cleverly relies on ($doc/node() instance of text()) failing
for documents with multiple root nodes (maybe a bit too clever as an
example, but works...)
thanks!
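
For anyone reading this in the archive, here is a minimal sketch of why that works, combined with the missing-document check suggested in the quoted reply. It assumes standard XQuery 1.0 syntax plus MarkLogic's binary() node test; local:format is a hypothetical helper name, not an existing API, and the in-memory documents only stand in for stored ones. In the older dialect used elsewhere in this thread, the function would be written with "define function" instead.

```xquery
(: Sketch only. local:format is a hypothetical helper, not a MarkLogic API. :)
declare function local:format($doc as document-node()?) as xs:string
{
  (: doc($uri) returns the empty sequence when the URI does not exist :)
  if (empty($doc)) then 'missing'
  (: binary() is a MarkLogic-specific node test :)
  else if ($doc/binary()) then 'binary'
  (: text() with no occurrence indicator matches exactly one item, so this
     is true only when the document's sole child is a text node :)
  else if ($doc/node() instance of text()) then 'text'
  else 'xml'
};

(: In-memory documents standing in for stored ones (an assumption). :)
let $text-only := document { text { "plain text" } }
let $bom-then-root := document { text { "&#xFEFF;" }, <root/> }
return (
  local:format($text-only),      (: 'text' :)
  local:format($bom-then-root),  (: 'xml': two children, so instance of text() is false :)
  local:format(())               (: 'missing' :)
)
```

The BOM case comes out as 'xml' because $doc/node() is then a two-item sequence, which cannot match the single-item type text().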
Michael Blakeley wrote:
> Yes, that's correct. Here's another take on the XQuery:
>
> let $doc := doc($uri)
> return
> if ($doc/binary()) then 'binary'
> else if ($doc/node() instance of text()) then 'text'
> else 'xml'
>
> Note that this test will treat empty documents (i.e., the URI does
> not exist) as xml. You could test for that case via empty(), or just
> 'if ($doc)...'
>
> -- Mike
>
> Mike Sokolov wrote:
>> OK I am pursuing a solution along those general lines. Just out of
>> curiosity though: does this mean that internally there is no
>> distinction between xml documents and text documents and binary
>> documents? It sounds as if text documents are simply documents that
>> happen to have a single text node (and same for binary) - is that right?
>>
>> -Mike
>>
>> Danny Sokolsky wrote:
>>> Mike,
>>>
>>> I think your approach is the right idea, only it needs a little more
>>> logic to be more robust. If you took the last() instead of the first
>>> in your node-kind test, that might work most of the time (or more
>>> often):
>>>
>>> node-kind(doc($uri)/node()[last()])
>>>
>>> Here is a similar idea using the instance of operator, performing a
>>> little logic to make a best-guess at the type:
>>>
>>> define function doctype($x as node()) as element()
>>> {
>>>   <node>
>>>     <uri>{xdmp:node-uri($x)}</uri>
>>>     <type>{
>>>       if ( $x/node() instance of binary() )
>>>       then "binary node"
>>>       else if ( $x/node() instance of element() )
>>>       then "XML node"
>>>       else if ( $x/node() instance of text() )
>>>       then "text node"
>>>       else "not sure"
>>>     }</type>
>>>   </node>
>>> }
>>>
>>> for $x in doc()[1 to 100]
>>> return doctype($x)
>>>
>>> I have not found any of my documents that return "not sure" here, but I
>>> can imagine that you might be able to construct one.
>>>
>>> -Danny
>>>
>>> -----Original Message-----
>>> From: general-bounces at developer.marklogic.com
>>> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Mike
>>> Sokolov
>>> Sent: Monday, March 31, 2008 10:34 AM
>>> To: General Mark Logic Developer Discussion
>>> Subject: [MarkLogic Dev General] document format
>>>
>>> I have been trying to come up with a way to determine the "format"
>>> of a document in MarkLogic. The only api call that seems directly
>>> related is xdmp:document-uri-format, but this seems to operate on
>>> the uri without any reference to the contents of a document.
>>> Instead, I tried testing:
>>>
>>> node-kind(doc($uri)/node()[1])
>>>
>>> but we just found an XML document for which this returns "text" -
>>> apparently it has a BOM at the start, so the document node has two
>>> child nodes: one text (containing the BOM) and one element (the root
>>> element).
>>>
>>> Presumably there could be comments there too and processing
>>> instructions, so this strategy is clearly flawed.
>>>
>>> Does anybody have a good way to determine whether a document in Mark
>>> Logic is an XML document, a text document or a binary document?
>>>
>>> -Mike
>>>
>>> _______________________________________________
>>> General mailing list
>>> General at developer.marklogic.com
>>> http://xqzone.com/mailman/listinfo/general