[MarkLogic Dev General] Problem while using xdmp:tidy

Rachit Rampal rachit.rampal at nagarro.com
Mon Jan 11 20:43:52 PST 2016


I have a piece of content in my legacy database that is neither valid HTML nor valid XML. Since cleaning up the legacy data would be difficult, I want to tidy it in MarkLogic (version 8.0-3) using xdmp:tidy.
The content looks like:
          [cid:image002.png at 01D14CC7.BC7F7E90]
Please find attached the query I'm executing in ML QConsole to tidy it.

The problem is that the response I get after applying tidy is not valid XML (verified with an XML validator). Also, when I try to insert a document with the resulting XML body via POSTMAN or RESTClient, it throws an error saying 'MALFORMED BODY | Invalid Processing Instruction names'.

Response XML:
          [cid:image001.png at 01D14D20.C3737130]

My expectation is that MarkLogic's tidy functionality should refuse to tidy this type of content and throw an error instead, which it currently does not do. If I got an error from tidy itself, I would have this bad data removed from the legacy database.
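
The expected behaviour could be approximated by a sketch along the following lines. It is untested and rests on assumptions: that serializing the tidied node with xdmp:quote and reparsing it with xdmp:unquote in "repair-none" mode fails for content like the above, and that the function name local:tidy-strict, the error name TIDY-NOTWELLFORMED, and the sample input are purely illustrative.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: tidy a string, then require the result to survive a
   strict serialize-and-reparse round trip. Raises an error otherwise. :)
declare function local:tidy-strict($raw as xs:string) as document-node()+
{
  (: xdmp:tidy returns a status node first and the cleaned node second. :)
  let $cleaned := xdmp:tidy($raw)[2]
  return
    try {
      (: xdmp:eager forces evaluation inside the try block, so a parse
         failure is caught here rather than surfacing later.
         "repair-none" asks the parser not to repair anything silently. :)
      xdmp:eager(
        xdmp:unquote(xdmp:quote($cleaned), (), ("format-xml", "repair-none"))
      )
    } catch ($e) {
      (: Assumed error name; the original error is passed along as data. :)
      fn:error(
        xs:QName("TIDY-NOTWELLFORMED"),
        "xdmp:tidy output could not be reparsed as well-formed XML",
        $e
      )
    }
};

(: Illustrative input only, not the real legacy content. :)
local:tidy-strict("<p>some legacy <b>content</p>")
```

With a check like this, records that raise the error could be flagged for removal from the legacy database instead of being inserted.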

Please help me get through this problem, or suggest a workaround to resolve it.

Things Tried So Far
I have tried the various options listed for xdmp:tidy, but they didn't help much. I also looked into processing instructions but couldn't find a way through, as the content doesn't look like a valid PI either.

Kind Regards,
Rachit Rampal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20160112/4ad79347/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 3927 bytes
Desc: image002.png
Url : http://developer.marklogic.com/pipermail/general/attachments/20160112/4ad79347/attachment-0002.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 2623 bytes
Desc: image001.png
Url : http://developer.marklogic.com/pipermail/general/attachments/20160112/4ad79347/attachment-0003.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tidy-query.xqy
Type: application/octet-stream
Size: 402 bytes
Desc: tidy-query.xqy
Url : http://developer.marklogic.com/pipermail/general/attachments/20160112/4ad79347/attachment-0001.obj 
