[MarkLogic Dev General] Treating elements as byte strings

Lee, David dlee at epocrates.com
Thu Oct 7 12:18:59 PDT 2010


Asking for the "size of XML" is like asking for the "size of a 3x5
photo".
It's basically meaningless until you pin down one specific
serialization.
I don't know enough about your spec to say whether it's worth going
back and arguing against it.
But I can say that if you expect XML to make any kind of round trip
from text to (???) and back to text, you cannot guarantee byte-wise
equivalence, even when semantic identity is preserved.  Many things are
semantically equivalent but not textually equivalent, at many different
layers (attribute order and quote style, whitespace inside tags, where
namespace declarations appear, character references versus literal
characters, <a></a> versus <a/>), and conforming XML processors
generally only need to respect the semantics.
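To make the point concrete, here is a minimal MarkLogic XQuery sketch (the two input strings are made up for illustration): both parse to equivalent trees, yet their lexical forms, and therefore their lengths, differ.

```xquery
xquery version "1.0-ml";

(: Two hypothetical serializations of the same element. They differ in
   attribute order, quote style, whitespace and empty-element form. :)
let $a := '<item id="1" type="x"></item>'
let $b := "<item type='x'  id='1'/>"
return (
  (: true: attribute order and empty-element form are not significant :)
  fn:deep-equal(xdmp:unquote($a)/item, xdmp:unquote($b)/item),
  (: but the raw character counts are not the same :)
  fn:string-length($a),
  fn:string-length($b)
)
```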

If you need the text stored and retrieved unchanged, then I suggest
storing (a copy of) the XML as a text document.
That does limit the application, though: for example, if you edit the
document you will need to regenerate the text form yourself, and then
all bets are off.
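A minimal sketch of that approach in MarkLogic XQuery, assuming the raw markup is still available as a string when it arrives; the URIs /data/item-1.xml and /data/item-1.txt are hypothetical. The parsed copy stays queryable, while the text copy is the one to measure or hand back verbatim.

```xquery
xquery version "1.0-ml";

(: Hypothetical raw input, still in its original lexical form. :)
let $raw := '<ns:xml xmlns:ns="namespace">hi</ns:xml>'
return (
  (: Parsed copy: searchable, but serialized afresh whenever it is read. :)
  xdmp:document-insert("/data/item-1.xml", xdmp:unquote($raw)),
  (: Text copy: a text node stored as a text document, characters unchanged. :)
  xdmp:document-insert("/data/item-1.txt", text { $raw })
)
```

Reading fn:string(fn:doc("/data/item-1.txt")) later should return the original string. Any edit to the XML copy still means regenerating the text copy, with the caveats above.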


----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
dlee at epocrates.com
812-482-5224

-----Original Message-----
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Karl Erisman
Sent: Thursday, October 07, 2010 2:27 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Treating elements as byte strings

I would like to take an element node and treat part of it as a string
in the same way it was originally declared (lexical equivalence, not
just semantic equivalence).  Here is an example that does NOT do what
I want:

declare namespace ns="namespace";
let $elem := <xml><ns:xml>hi</ns:xml></xml>
return xdmp:quote($elem/*)

=> <ns:xml xmlns:ns="namespace">hi</ns:xml>

This returns a string representing semantically equivalent XML, but it
differs lexically from the original.

After $elem is stored as an element node, only its tree structure is
stored, correct?  So the only way for me to do what I'm describing
would be for *me* to save the string form of the element at the time
it is declared.  Is this correct?
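If that reading is right, one option is to measure the string before it is parsed and carry the result along with the parsed content. A minimal sketch in MarkLogic XQuery; local:receive and the size attribute are hypothetical names, not part of any MarkLogic API.

```xquery
xquery version "1.0-ml";

(: Hypothetical entry point: measure the lexical form first, because once
   xdmp:unquote has parsed it, the original text is no longer recoverable. :)
declare function local:receive($raw as xs:string) as element(received)
{
  <received size="{fn:string-length($raw)}">{
    xdmp:unquote($raw)/node()
  }</received>
};

local:receive('<ns:xml xmlns:ns="namespace">hi</ns:xml>')
```

Downstream modules would then read the size attribute instead of re-serializing the element.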

BTW: As background, the reason I need to do this is to comply with a
spec that requires computing the "size" of incoming data, which may or
may not be XML (and the "size" is specific to the way the XML is
declared -- it is lexically significant).  The data is sent as part of
a larger XML element, and by the time it arrives at the module
responsible for checking the size, it has already been parsed into XML.  This is fine
for text nodes (fn:string-length gives the "size"), but not for
element nodes.  If my understanding is correct, I'll need to make
modifications to lower-level modules so the original XML is available.
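One further caveat: fn:string-length counts characters (code points), not bytes. If the spec's "size" means bytes, and assuming UTF-8 is the encoding in question (the spec may define it differently), the byte count has to be derived from the code points. A minimal sketch; local:utf8-length is a hypothetical helper name.

```xquery
xquery version "1.0-ml";

(: Number of bytes needed to encode $s as UTF-8, computed per code point:
   1 byte below U+0080, 2 below U+0800, 3 below U+10000, otherwise 4. :)
declare function local:utf8-length($s as xs:string) as xs:integer
{
  fn:sum(
    for $cp in fn:string-to-codepoints($s)
    return
      if ($cp lt 128) then 1
      else if ($cp lt 2048) then 2
      else if ($cp lt 65536) then 3
      else 4,
    0)
};

let $raw := '<p>café</p>'
return (fn:string-length($raw), local:utf8-length($raw))
```

For '<p>café</p>' the two numbers differ by one, because é needs two bytes in UTF-8.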

Thanks,
Karl
_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general