[MarkLogic Dev General] Is xdmp:unquote appropriate for handling accented characters?

Kari Cowan KCowan at alm.com
Thu Feb 9 09:00:06 PST 2017


With that, QConsole displays it correctly:
<title>VOIR DIRE: Pokémon Drive?</title>

But on the RSS/XML page – It shows an error:
XML Parsing Error: not well-formed
VOIR DIRE: Pokmon Drive?
---------------------^

I am setting the encoding to UTF-8, if that is relevant:
<?xml version="1.0" encoding="UTF-8"?>
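A likely explanation, though nothing in this thread confirms it: the RSS page declares UTF-8, but the bytes actually delivered encode é as the single ISO-8859-1 byte 0xE9, which is not valid UTF-8, so the browser's parser stops at that character. The MarkLogic code can't run standalone, so this is a language-neutral sketch in Python using only the standard-library parser. The Latin-1 byte stream is an assumption standing in for whatever the real page sends.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>'
TITLE = "VOIR DIRE: Pokémon Drive?"


def build_doc(text_encoding: str) -> bytes:
    """Same UTF-8 declaration every time; only the bytes of the title change."""
    return DECLARATION + b"<title>" + TITLE.encode(text_encoding) + b"</title>"


def is_well_formed(doc: bytes) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# é as UTF-8 is the two bytes C3 A9, which matches the declaration.
print(is_well_formed(build_doc("utf-8")))    # True
# ASSUMPTION: é sent as ISO-8859-1 is the single byte E9, invalid in UTF-8.
print(is_well_formed(build_doc("latin-1")))  # False
```

If saving the raw response and looking for C3 A9 versus E9 points to a mismatch like this, the fix probably belongs in whatever layer serializes the response, not in the title text itself.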

I can prevent that error if I do fn:escape-html-uri($Str)
Then it returns as Pok%C3%A9mon

So clearly I am misusing fn:escape-html-uri(); it’s not intended for this purpose, but I am just trying to keep the XML from being malformed.
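For reference, fn:escape-html-uri percent-encodes every character outside printable ASCII (code points 32 to 126) as the %HH form of its UTF-8 bytes. The output parses only because it is pure ASCII; the text is no longer “Pokémon”, and every consumer has to decode it again. A rough Python imitation, assuming urllib.parse.quote with all printable ASCII marked safe behaves the same way for this input (it is not MarkLogic's implementation):

```python
from urllib.parse import quote, unquote

# ASSUMPTION: treating every printable ASCII character (32-126) as "safe"
# approximates fn:escape-html-uri; everything else becomes %HH UTF-8 bytes.
PRINTABLE_ASCII = "".join(chr(c) for c in range(32, 127))


def escape_html_uri_like(text: str) -> str:
    return quote(text, safe=PRINTABLE_ASCII)


title = "VOIR DIRE: Pokémon Drive?"
escaped = escape_html_uri_like(title)
print(escaped)           # VOIR DIRE: Pok%C3%A9mon Drive?

# The front end has to reverse the escaping before display.
print(unquote(escaped))  # VOIR DIRE: Pokémon Drive?
```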

I can’t seem to make it display without the XML Parsing Error.

Any other suggestions I might try?


From: <general-bounces at developer.marklogic.com> on behalf of Indrajeet Verma <indrajeet.verma at gmail.com>
Reply-To: MarkLogic <general at developer.marklogic.com>
Date: Wednesday, February 8, 2017 at 10:27 PM
To: MarkLogic <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Is xdml:unquote appropriate for handling accent characters?

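declare namespace xhtml = "http://www.w3.org/1999/xhtml";

(: The "do" prefix must be bound to a namespace elsewhere in the module. :)
(: xdmp:tidy returns (status node, tidied document); [2] selects the document. :)
declare function do:makeXMLsafe( $Str as xs:string ) {
  let $xhtml-node := xdmp:tidy($Str,
      <options xmlns="xdmp:tidy"><output-xhtml>yes</output-xhtml></options>
    )[2]/xhtml:html/xhtml:body/node()
  return $xhtml-node
};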
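As I read this suggestion, the useful effect of xdmp:tidy here is that the HTML-flavoured string comes back as real XHTML nodes, so a named entity like &eacute; in the input ends up as the character é in the node text. The Python sketch below imitates only that one step with html.unescape and the standard XML serializer; it is an analogy to illustrate the idea, not a replacement for xdmp:tidy, and the input string is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical input: source text that still carries an HTML named entity.
raw = "VOIR DIRE: Pok&eacute;mon Drive?"

# Resolve HTML named entities (&eacute;, &egrave;, ...) to real characters.
text = html.unescape(raw)

# Build the element from plain text; the serializer escapes &, < and >
# itself, so the result is well-formed XML.
title = ET.Element("title")
title.text = text
xml_bytes = ET.tostring(title, encoding="unicode").encode("utf-8")

print(text)  # VOIR DIRE: Pokémon Drive?
print(xml_bytes.decode("utf-8"))
```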

On Thu, Feb 9, 2017 at 3:55 AM, Kari Cowan <KCowan at alm.com> wrote:
Thanks Indy – how is that meant to flow, something like this?

declare namespace xhtml = "http://www.w3.org/1999/xhtml";

declare function do:makeXMLsafe( $Str as xs:string ) {
 let $Str:=fn:escape-html-uri($Str)
let $Str:=xdmp:tidy($Str, <options xmlns="xdmp:tidy"><output-xhtml>yes</output-xhtml>
                  </options>)[2]/xhtml:html/xhtml:body/node()
 return $Str
};

From: <general-bounces at developer.marklogic.com> on behalf of Indrajeet Verma <indrajeet.verma at gmail.com>
Reply-To: MarkLogic <general at developer.marklogic.com>
Date: Wednesday, February 8, 2017 at 10:28 AM
To: MarkLogic <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Is xdml:unquote appropriate for handling accent characters?

See if this works for you.

declare namespace xhtml = "http://www.w3.org/1999/xhtml";
xdmp:tidy($Str, <options xmlns="xdmp:tidy"><output-xhtml>yes</output-xhtml>
                  </options>)[2]/xhtml:html/xhtml:body/node()

Regards,
Indy

On Wed, Feb 8, 2017 at 11:40 PM, Kari Cowan <KCowan at alm.com> wrote:
I guess I can make it palatable with the function I added below, then have the front end decode it. When I pulled the actual document source, even though ‘Pokémon’ displayed in QConsole, it was actually encoded as &egrave;.

declare function do:makeXMLsafe( $Str as xs:string ) {
 let $Str:=fn:escape-html-uri($Str)
 return $Str
};

>> changes ‘Pokémon’ to ‘Pok%C3%A9mon’

Is there any better way to deal with it?
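One option that keeps the output ASCII-only without URI-escaping the text: XML-escape the string, then write each non-ASCII character as a numeric character reference (&#233; for é). Any XML parser turns the reference back into the character, so the front end has nothing to undo. By contrast, HTML named entities such as &egrave; (which is è; é is &eacute;) are not predefined in XML, so a document containing one without a DTD that declares it is not well-formed. Sketched in Python with the standard library; the helper to_ascii_xml is hypothetical, not a MarkLogic function.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ascii_xml(text: str) -> str:
    """Hypothetical helper: XML-escape &, <, > and turn non-ASCII into &#NNN;."""
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


def is_well_formed(doc: str) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


title = "VOIR DIRE: Pokémon Drive?"
safe = to_ascii_xml(title)
print(safe)  # VOIR DIRE: Pok&#233;mon Drive?

# The parser restores é from the numeric reference.
print(ET.fromstring(f"<title>{safe}</title>").text)  # VOIR DIRE: Pokémon Drive?

# &egrave; is an HTML entity, undefined in plain XML, so this is rejected.
print(is_well_formed("<title>Pok&egrave;mon</title>"))  # False
```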


From: <general-bounces at developer.marklogic.com> on behalf of Kari Cowan <KCowan at alm.com>
Reply-To: MarkLogic <general at developer.marklogic.com>
Date: Tuesday, February 7, 2017 at 2:34 PM
To: MarkLogic <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Is xdml:unquote appropriate for handling accent characters?

(Note: Outlook stripped out the unknown character below; in the <title> node it was “Pok?mon”.)


From: Kari Cowan <KCowan at alm.com>
Date: Tuesday, February 7, 2017 at 2:31 PM
To: MarkLogic <general at developer.marklogic.com>
Subject: Is xdml:unquote appropriate for handling accent characters?

The doc contains a node whose text includes an accented character (é), for example:

<HEADLINE>VOIR DIRE: Pokémon Drive?</HEADLINE>

I tried to handle it with:
let $theTitle:=xdmp:unquote($theTitle, "", ("repair-full"))

But I still get XML output containing an unknown character:

<title>VOIR DIRE: Pokmon Drive?</title>

>> XML Parsing Error: not well-formed

Anyone have a tip they can share on how to handle it?

_______________________________________________
General mailing list
General at developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general

