[MarkLogic Dev General] Importing xml with unpredictable encoding
Geert Josten
Geert.Josten at daidalos.nl
Wed Mar 25 13:37:11 PST 2009
Hi Danny,
Are there ways to pre-read the document as a string or binary (from XQuery), get the encoding from the XML declaration using straightforward functions, and use that as the value of the encoding option in a call to xdmp:document-get, so that the document is read with the correct encoding?
I could pre-parse the files outside MarkLogic Server, or rely on things like MLJAM, but I would prefer not to.
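For reference, the kind of pre-parse outside the server that I mean could look roughly like the Python sketch below. It is only an illustration under some assumptions: the function name sniff_xml_encoding is made up, it expects the first few hundred bytes of the file, it only handles ASCII-compatible encodings plus a simple byte order mark check, and it falls back to UTF-8 when no encoding is declared. The result would then be passed as the encoding option to xdmp:document-get.

```python
import re

# Matches an XML declaration on raw bytes and captures the encoding name.
# Works on bytes so that no decoding is needed before the encoding is known.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding declared at the start of an XML document.

    `head` is the first chunk of the file (a few hundred bytes is enough).
    Hypothetical helper: it covers ASCII-compatible encodings and a basic
    byte order mark check, not the full detection rules of the XML spec.
    """
    if head.startswith(b"\xef\xbb\xbf"):
        # UTF-8 byte order mark: skip it and still look at the declaration.
        head = head[3:]
    elif head.startswith((b"\xff\xfe", b"\xfe\xff")):
        # UTF-16 byte order mark: the declaration is not ASCII-readable.
        return "UTF-16"
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else default


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'
    print(sniff_xml_encoding(sample))
```

Reading only the head of the file keeps this cheap for large documents, and working on bytes avoids the bad codepoint problem that shows up when the content is first decoded with the wrong encoding.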
Has it been considered to support the XML declaration for this purpose, for instance when xdmp:document-get is called without an explicit encoding option? If not, would you be willing to consider such an addition? I really think it would add value.
Kind regards,
Geert
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of
> Danny Sokolsky
> Sent: Wednesday, March 25, 2009 16:43
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Importing xml with
> unpredictable encoding
>
> Hi Geert,
>
> You can specify the encoding with the <encoding> option to
> xdmp:document-get or xdmp:document-load. You do have to know
> the encoding though--it will not use an encoding in a header
> of the document on its own, and will default to UTF-8.
>
> -Danny
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of
> Geert Josten
> Sent: Wednesday, March 25, 2009 6:07 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Importing xml with
> unpredictable encoding
>
> Hi,
>
> Is it correct that the MarkLogic built-in functions
> xdmp:document-load and xdmp:document-get do not respect the
> encoding specification in the XML declaration? They expect
> UTF-8 by default and otherwise try to consume the file with
> the encoding specified in the options. Is there a way to
> take the encoding in the XML declaration into account?
>
> I tried using something like xdmp:filesystem-file and then (rather
> ugly) parsing the string with string functions, but it
> chokes with the message that the string contains a bad
> codepoint (SVC-BAD: ... -- Bad CodepointIterator::_next).
>
> Any ideas?
>
> Kind regards,
> Geert
>
>
> Drs. G.P.H. Josten
> Consultant
>
>
> http://www.daidalos.nl/
> Daidalos BV
> Source of Innovation
> Hoekeindsehof 1-4
> 2665 JZ Bleiswijk
> Tel.: +31 (0) 10 850 1200
> Fax: +31 (0) 10 850 1199
> http://www.daidalos.nl/
> KvK 27164984
> The information sent in or with this email message
> originates from Daidalos BV and is intended solely for the
> addressee. If you have received this message unintentionally,
> we ask you to delete it. No rights can be derived from
> this message.
>
>
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general
>