[Corona] extractContent

Ryan Grimm grimm at xqdev.com
Mon Dec 12 13:02:28 PST 2011


I've thought a bit about the inconsistencies between different file formats as well.  Clark is right: this is a side effect of the way ISYS is doing things, but I'd love to do better.  I contemplated running any piece of metadata that has "date" or "time" in its name through the date parsing library in Corona to try to get xs:dateTime values for each of them.  I haven't done so yet because I can't guarantee that all of them will be parseable.  But perhaps the best way to figure out what the landscape looks like is to start trying and see what happens.
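
To make the idea concrete, here is a rough sketch of that normalization pass. It is written in Python purely for illustration and is not Corona's date parsing library; the names to_xs_datetime, looks_like_date_field and CANDIDATE_FORMATS are hypothetical, and the only layouts it knows are the two that appear in Scott's examples. Anything it can't parse is returned as None so the original value can be left alone.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of layouts to try. Only the two formats seen in this
# thread are covered; real ISYS output may well need more entries.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%SZ",  # e.g. 2011-11-28T18:25:00Z (Word lastSavedDate)
    "%Y/%m/%d %H:%M:%SZ",  # e.g. 2011/09/20 00:03:43Z (PDF modDate)
)


def looks_like_date_field(name: str) -> bool:
    """Return True if the metadata name contains 'date' or 'time'."""
    lowered = name.lower()
    return "date" in lowered or "time" in lowered


def to_xs_datetime(raw: str) -> Optional[str]:
    """Return the value in xs:dateTime lexical form, or None if unparseable."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
        return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
    return None  # caller keeps the original string untouched
```

With this, to_xs_datetime("2011/09/20 00:03:43Z") yields "2011-09-20T00:03:43Z", and a value in no known layout yields None, which is the "can't guarantee they'll all parse" case.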

Thoughts?

--Ryan


On Dec 12, 2011, at 12:55 PM, Clark Richey wrote:

> This isn't Corona; it's the ISYS filters that are transforming the binary content. This won't cause an error, as the elements aren't typed per se. However, if you wanted to apply a dateTime range index then yes, you might need to normalize the values.
> 
> Sent from my iPhone
> 
> On Dec 12, 2011, at 15:42, "Scott Conroy" <conroys at avalonconsult.com> wrote:
> 
>> Can someone verify that the extractContent capability for binary files
>> is a bit hit-or-miss when it comes to element naming and type?
>> 
>> For example, for PDFs, I get:
>> 
>> <corona:modDate>2011/09/20 00:03:43Z</corona:modDate>
>> 
>> For a Word file, I get:
>> 
>> <corona:lastSavedDate>2011-11-28T18:25:00Z</corona:lastSavedDate>
>> 
>> I believe the first of those two examples will error with an invalid
>> cast as xs:date, though I didn't check.
>> 
>> I imagine the transform issue is because of the underlying library
>> rather than Corona itself.
>> _______________________________________________
>> Corona mailing list
>> Corona at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/corona


