[MarkLogic Dev General] Re: XDBC creation / access

Michael Blakeley michael.blakeley at marklogic.com
Wed Dec 3 16:02:00 PST 2008


No, it only means that the input encoding will not be detected
automatically; you set it explicitly instead.
From http://developer.marklogic.com/svn/recordloader/trunk/README.html:

> INPUT_ENCODING UTF-8 The Java Charset encoding (codepage) to use for all
> input XML. If unset, RecordLoader will use null, which will default to
> the default Locale's character encoding.
> Note that MarkLogic Server must receive all XML as UTF-8, so the output
> encoding is always UTF-8.
> Example: if the input XML is encoded as windows-1252, use
> INPUT_ENCODING=Cp1252 to ensure correct conversion. 
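
As a rough illustration of what that setting implies, here is a minimal
sketch (not RecordLoader's actual code) that decodes input bytes through an
explicitly chosen Java charset and re-encodes them as UTF-8. The class and
method names are hypothetical; only the java.io and java.nio.charset calls
are standard. It assumes a null encoding falls back to the JVM default
charset, as the README describes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class InputEncodingSketch {

    // Build a decoder for the named encoding; null means "use the JVM default".
    static CharsetDecoder decoderFor(String inputEncoding) {
        Charset charset = (inputEncoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(inputEncoding);
        // REPORT makes bad input fail loudly instead of being silently replaced.
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // Decode raw bytes to a String using the given input encoding.
    static String decode(byte[] raw, String inputEncoding) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), decoderFor(inputEncoding))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convert bytes in the given input encoding to UTF-8 bytes.
    static byte[] toUtf8(byte[] raw, String inputEncoding) {
        return decode(raw, inputEncoding).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE9 is e-acute in windows-1252; in UTF-8 it becomes two bytes.
        byte[] cp1252 = new byte[] { 'c', 'a', 'f', (byte) 0xE9 };
        byte[] utf8 = toUtf8(cp1252, "Cp1252");
        System.out.println(cp1252.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The same kind of decoder is what gets handed to the InputStreamReader in
the Loader.java snippet quoted below, so the parser only ever sees
already-decoded characters.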

-- Mike

Dave Pawson wrote:
> 2008/12/1 Dave Pawson <dave.pawson at gmail.com>:
>> I note in Loader.java
>>
>>  try {
>>      xpp = config.getXppFactory().newPullParser();
>>      xpp.setInput(new InputStreamReader(input, decoder));
>>      // TODO feature isn't supported by xpp3 - look at xpp5?
>>      // xpp.setFeature(XmlPullParser.FEATURE_DETECT_ENCODING, true);
>>      // TODO feature isn't supported by xpp3 - look at xpp5?
>>      // xpp.setFeature(XmlPullParser.FEATURE_PROCESS_DOCDECL, true);
>>      xpp.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, true);
>>  } catch (XmlPullParserException e) {
>>      throw new FatalException(e);
>>  }
>>
>>
>> Does that mean this code only supports utf-8 encodings?
> 
> I note that the implementation recommended at http://www.xmlpull.org/
> (http://www.extreme.indiana.edu/dist/java-repository/xpp3/distributions/?M=A)
> also has no mention of encoding.
> 
> http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParserFactory.html#setFeature(java.lang.String,%20boolean)
> has setFeature(), but
> http://www.extreme.indiana.edu/viewcvs/~checkout~/XPP3/java/src/java/api/org/xmlpull/v1/XmlPullParser.java
> has no mention of encoding.
> 
> The whole emphasis seems to be on speed rather than completeness.
> 
> Is the implication that utf-8 is the only encoding usable via this
> interface... or MarkLogic?
> 
> regards
> 
> 


