[MarkLogic Dev General] UTF-8 BOM bombs
Dominic Beesley
dominic at brahms.demon.co.uk
Thu Nov 15 06:08:43 PST 2007
Hello,
I'm using the Xcc.Net library to upload documents to an ML server and am
having trouble with UTF-8 documents that have a BOM (byte order mark) at
the beginning of the file.
My code is:
Stream s = new FileStream(directory + mi.filename, FileMode.Open,
    FileAccess.Read);
ContentCreateOptions co = new ContentCreateOptions();
co.Format = DocumentFormat.Format.XML;
co.RepairLevel = DocumentRepairLevel.Level.NONE;
Content con = ContentFactory.NewContent(mi.uri, s, co);
ses.InsertContent(con);
When ML loads an XML document that starts with a BOM, it stores it as a
text document, even though co.Format has been set to XML. Is there a way of
making ML skip the BOM (or, even better, obey it), or will I have to test
for the BOM myself and skip those bytes in the stream?
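For reference, the "test and skip" fallback could be sketched roughly as
below. BomSkipper and SkipUtf8Bom are hypothetical names, not part of
Xcc.Net. The sketch assumes the stream is seekable (a FileStream is) and
that the library starts reading from the stream's current position rather
than rewinding it; that second point is an assumption, not something
confirmed here.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Xcc.Net.
static class BomSkipper
{
    // Leaves a seekable stream positioned just past a leading UTF-8 BOM
    // (bytes EF BB BF) if one is present; otherwise restores the original
    // position. Returns true when a BOM was found and skipped.
    public static bool SkipUtf8Bom(Stream s)
    {
        if (!s.CanSeek)
            throw new ArgumentException("Stream must be seekable", "s");

        long start = s.Position;
        byte[] head = new byte[3];
        int read = 0;

        // Read may return fewer bytes than requested, so loop until we
        // have three bytes or hit end of stream.
        while (read < head.Length)
        {
            int n = s.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }

        bool hasBom = read == 3
            && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;

        s.Position = hasBom ? start + 3 : start;
        return hasBom;
    }
}
```

With this in place, the call would go between opening the stream and
creating the content, i.e. BomSkipper.SkipUtf8Bom(s); just before
ContentFactory.NewContent(mi.uri, s, co);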
Thanks
Dom