[MarkLogic Dev General] loading XML documents with DTDs

Michael Blakeley michael.blakeley at marklogic.com
Tue Jun 19 10:08:08 PDT 2007


Alan,

I'd recommend starting with xdmp:document-load() - 
http://developer.marklogic.com/pubs/3.2/apidocs/UpdateBuiltins.html#document-load
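As a rough sketch of what such a call might look like for the sample below: the filesystem path and the target URI are hypothetical placeholders, and the <encoding> option is an assumption you should confirm against the API documentation for your release. The server stores content as UTF-8, so declaring the source encoding at load time is what drives the conversion.

```xquery
(: Sketch only: the source path and target URI are made-up placeholders. :)
(: Assumes this release's xdmp:document-load accepts an <encoding> option. :)
xdmp:document-load(
  "/space/incoming/sample.xml",
  <options xmlns="xdmp:document-load">
    <uri>/content/sample.xml</uri>
    <encoding>iso-8859-1</encoding>
    <format>xml</format>
  </options>
)
```

Note that an XML parser treats CDATA content as plain text, so markup embedded in a CDATA section would normally be stored as escaped text rather than as elements. If you need it as real markup, that probably calls for a separate step.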

You might also be interested in 
http://developer.marklogic.com/howto/tutorials/2006-06-recordloader.xqy

-- Mike

Alan Darnell wrote:
> I have a number of XML documents (sample below) that are not UTF-8 
> encoded and that reference an external DTD and a rendering 
> stylesheet.  What's the best way to get these documents into MarkLogic 
> so that:
> 
> - the encoding is changed to UTF-8
> - any entities in the DTD are resolved to UTF-8 encoded characters
> - any CDATA sections are removed with the content left intact, including 
> markup embedded in the CDATA content
> 
> Do I need to pre-process the files before loading, or can MarkLogic 
> handle these kinds of conversions as part of the load functions?
> 
> Also, does anyone know of any good strategies for converting math in TeX 
> format to MathML?
> 
> Thanks,
> 
> Alan
> 
> Alan Darnell
> University of Toronto
> 
> 
> 
> <?xml version="1.0" encoding="iso-8859-1"?>
> <?xml-stylesheet type="text/xsl" href="file://batchgate1\StyleS\bpg40.xsl"?>
> <!DOCTYPE content PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 4.0//EN" 
> "\\Batchgate1\bpgdtd\4-0\bpg4-0.dtd">
> <content dtdver="4.0" docfmt="xml">


