[MarkLogic Dev General] Type safe data and referencing questions

Mark Waschkowski mwaschkowski at gmail.com
Fri Feb 15 04:34:15 PST 2008


Hey Ryan,

Ah, thanks man, that's the ticket! Didn't know about the castable, um,
feature/function.

I modified to:

  if($x/age castable as xs:int) then xs:int($x/age) else $x/age

which sorts the entries that have no age element to the bottom, and appears
to be the same as:

  if($x/age castable as xs:int) then xs:int($x/age) else ()

which is exactly what I'm looking for.
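A minimal sketch of the final form, assuming a standalone XQuery 1.0 processor. The `$contacts` sample and the `local:age-key` helper are hypothetical stand-ins for `collection('Contacts')/*` and the inline `if` expression. It adds `empty greatest` so that contacts with no usable age sort last in ascending order without relying on the processor's default placement of empty keys, and `stable` so that ties keep their input order. The two variants above agree only when the age element is absent. If an age element exists but is not numeric (the hypothetical "unknown" below), `else $x/age` returns the element itself, which the order by atomizes to an xs:string key alongside xs:int keys. Under the XQuery 1.0 rules that mix would typically raise a type error (XPTY0004), whereas `else ()` simply treats the value as missing.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for collection('Contacts')/* :)
declare variable $contacts as element(contact)* := (
  <contact><name>Ann</name><age>42</age></contact>,
  <contact><name>Bob</name></contact>,
  <contact><name>Cy</name><age>unknown</age></contact>,
  <contact><name>Di</name><age>7</age></contact>
);

(: Hypothetical helper. Sort key: the age as xs:int when it casts cleanly,
   otherwise the empty sequence, so missing, non-numeric, and repeated
   age elements are all treated as "no age". :)
declare function local:age-key($x as element()) as xs:int? {
  if ($x/age castable as xs:int) then xs:int($x/age) else ()
};

for $x in $contacts
stable order by local:age-key($x) ascending empty greatest
return string($x/name)
(: returns ("Di", "Ann", "Bob", "Cy") :)
```

Switching to `empty least` (or ordering `descending`) moves the contacts without an age to the top instead.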

Thanks again,

Mark

On Thu, Feb 14, 2008 at 5:25 PM, Ryan Grimm <grimm at xqdev.com> wrote:

> Hey Mark,
>
> You can simply test to see if $x/age is castable as an xs:int:
>
> for $x in collection('Contacts')/*
> order by if($x/age castable as xs:int) then xs:int($x/age) else 0
> return $x
>
> This also safeguards you a bit in case the age element contains
> something that isn't an int.  You might need to change the default
> value in the else to something other than 0, but that seemed like a
> good guess.
>
> --Ryan
>
>
> On Feb 14, 2008, at 2:15 PM, Mark Waschkowski wrote:
>
> > Hi Danny,
> >
> > One quick follow-up: I need to do a cast in the order by, but the
> > cast fails if the value is missing. What's the simplest way to do
> > what I've done below? It doesn't look ideal to me.
> >
> > for $x in collection('Contacts')/*
> > order by  (for $x in $x where exists($x/age) return xs:int($x/age))
> > return $x
> >
> > Thanks!
> >
> > Mark
> >
> > On Thu, Jan 17, 2008 at 8:32 PM, Danny Sokolsky <dsokolsky at marklogic.com> wrote:
> > Hi Mark,
> >
> > It is true that it would take extra time to cast one or two million
> > times in a query. But it will take time to do anything that many
> > times in a query. The trick is to write the query in such a way that
> > it does this fast. Range indexes are a good tool for this, in
> > combination with the order by optimizations. For example, suppose you
> > want to find the 10 latest dates from an element named stringdate,
> > such as:
> >
> > <stringdate>2008-12-02</stringdate>
> >
> > Then you can write a query like the following:
> >
> > (for $x in //stringdate order by xs:date($x) descending return $x)[1 to 10]
> >
> > Without a range index, it will need to find all of the stringdates and
> > cast them all to dates in the order by clause. For a ballpark estimate,
> > on my laptop with 1,000,000 stringdate elements, this takes about 13
> > seconds. Not bad considering it has to order 1 million items.
> >
> > Now if I add a date range index for this element, the same query takes
> > about 0.3 seconds, for a speedup of about 40x.  That is because the
> > range index optimized the sort in the order by clause, and we just
> > returned the first 10 of them.  For details about the order by
> > optimizations, see the Query Performance and Tuning book (
> > http://developer.marklogic.com/pubs/3.2/books/performance.pdf).
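A minimal sketch of the stringdate query that can be tried outside a database, assuming a standalone XQuery 1.0 processor. The `$sample` values and the `local:latest` helper are hypothetical, and the limit is a parameter (2 in the call below) so the effect is visible on a small sample. The `[1 to 10]` predicate works in MarkLogic, but a strictly XQuery 1.0 processor may reject a predicate that evaluates to several numbers (error FORG0006), so the sketch uses `[position() le $n]`. The `where ... castable as xs:date` guard is an added assumption for data that may contain malformed values; whether MarkLogic can still apply the range-index order by optimization to the guarded form is not something this sketch shows, and the profiler mentioned below is the place to check. Timings on a sample this small say nothing about the range-index speedup described above.

```xquery
xquery version "1.0";

(: Hypothetical sample standing in for the //stringdate elements in a database :)
declare variable $sample as element(stringdate)* := (
  <stringdate>2008-12-02</stringdate>,
  <stringdate>2007-01-15</stringdate>,
  <stringdate>not-a-date</stringdate>,
  <stringdate>2008-03-30</stringdate>,
  <stringdate>2006-07-04</stringdate>
);

(: Hypothetical helper: return the $n latest dates. The where clause skips
   values that would otherwise make xs:date() raise a cast error. :)
declare function local:latest($dates as element(stringdate)*, $n as xs:integer)
    as element(stringdate)* {
  (for $x in $dates
   where $x castable as xs:date
   order by xs:date($x) descending
   return $x)[position() le $n]
};

local:latest($sample, 2)
(: returns <stringdate>2008-12-02</stringdate>, <stringdate>2008-03-30</stringdate> :)
```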
> >
> > Another useful tool is the profile button in cq.  It shows you where
> > your query is spending time processing.
> >
> > My recommendation is to try some tests with range indexes and order by
> > optimizations and see how it works.  It is quite easy to generate some
> > dummy data for these tests.
> >
> > I'm not 100% sure I answered your question, but hopefully it will lead
> > you in the direction of what you are trying to accomplish.
> >
> > -Danny
> >
> > -----Original Message-----
> > From: general-bounces at developer.marklogic.com
> > [mailto:general-bounces at developer.marklogic.com] On Behalf Of Mark Waschkowski
> > Sent: Thursday, January 17, 2008 11:38 AM
> > To: General Mark Logic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] Type safe data and referencing questions
> >
> > OK great, thanks for the information Danny.
> >
> > I'm a bit concerned about the type safety issue (#1), not because I'm
> > worried about the data being stored correctly, but because a
> > conversion might have to be carried out many, many times during an
> > evaluation. I may be repeating the question here, but do you have any
> > idea how the above use case would perform with 1M+ rows of data? It
> > seems to me that converting some date text 2M+ times (twice per record
> > in this case) would have an adverse effect on a query, no? Likewise
> > when ordering a larger data set by date?
> >
> > Really appreciate the feedback.
> >
> > Mark
> >
> > On Jan 14, 2008 8:12 PM, Danny Sokolsky <dsokolsky at marklogic.com> wrote:
> > > Hi Mark,
> > >
> > > I will take a stab at your questions.
> > >
> > > 1) You do not need a schema to use typed data. With a schema, Mark Logic
> > > treats an element or attribute as its defined type without an
> > > explicit cast, but you can always add an explicit cast (like the
> > > use-case example) to make sure XQuery treats a value as a certain type
> > > (with or without a schema). The schema just makes that a little easier.
> > > There might be some performance advantage to using a schema, but I don't
> > > think it will be that big. It is worth trying, though, as this might
> > > depend somewhat on your content. The real performance advantage will
> > > come from creating range indexes on elements or attributes you will use
> > > in comparisons. Schemas can also help ensure that your data is in the
> > > correct format when you load it, as Mark Logic will throw an exception
> > > if it cannot cast content in an element or attribute to the type
> > > specified in the schema.
> > >
> > > 2) You could put the referencing information in the properties document.
> > > The default conversion application in CPF does this, for example, to
> > > keep track of the original documents and various converted documents.
> > >
> > > 3) There are no foreign key constraints built in. I think any best
> > > practices would depend on what you are trying to do. Two approaches
> > > that tend to work well are to a) put the constraining items in the same
> > > document and/or b) use the properties document corresponding to a
> > > document to store information about what is in the document.
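On point 1, a minimal sketch of why the use-case query quoted further down works without a schema, assuming a standalone XQuery 1.0 processor. The `$items` sample and the `local:count-in-range` helper are hypothetical, modelled loosely on the use-case data. In a general comparison such as `end_date >= xs:date("1999-03-01")`, an untyped end_date value is cast to the type of the other operand, here xs:date, so the explicit cast on the right-hand side is what gives the comparison date semantics. Comparing against a plain string literal instead would compare as strings, and a malformed end_date would raise a cast error (FORG0001) rather than being skipped.

```xquery
xquery version "1.0";

(: Hypothetical schemaless sample; end_date values are untyped text :)
declare variable $items as element(item_tuple)* := (
  <item_tuple><itemno>1</itemno><end_date>1999-02-20</end_date></item_tuple>,
  <item_tuple><itemno>2</itemno><end_date>1999-03-15</end_date></item_tuple>,
  <item_tuple><itemno>3</itemno><end_date>1999-03-31</end_date></item_tuple>,
  <item_tuple><itemno>4</itemno><end_date>1999-04-01</end_date></item_tuple>
);

(: Hypothetical helper. Each untyped end_date is cast to xs:date because the
   other operand is an xs:date, so these are date comparisons even though
   no schema is loaded. :)
declare function local:count-in-range($tuples as element(item_tuple)*,
                                      $from as xs:date, $to as xs:date)
    as xs:integer {
  count($tuples[end_date >= $from and end_date <= $to])
};

<item_count>{
  local:count-in-range($items, xs:date("1999-03-01"), xs:date("1999-03-31"))
}</item_count>
(: returns <item_count>2</item_count> :)
```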
> > >
> > > -Danny
> > >
> > >
> > > -----Original Message-----
> > > From: general-bounces at developer.marklogic.com
> > > [mailto:general-bounces at developer.marklogic.com] On Behalf Of Mark Waschkowski
> > > Sent: Monday, January 14, 2008 1:25 PM
> > > To: general at developer.marklogic.com
> > > Subject: [MarkLogic Dev General] Type safe data and referencing questions
> > >
> > > Hi,
> > >
> > > I have been using MarkLogic for a while now and haven't seen answers to
> > > the questions below yet. Does anyone know of an answer or two?
> > >
> > > 1) Type-safe data - I'm concerned with retrieval of typed data,
> > > especially for date information. The only way to store typed data is
> > > through the use of a schema, right? I can't specify the type of data on
> > > a per-element basis, correct? e.g.
> > > <person> <birthday xs:date>01-01-1970</birthday></person>
> > >
> > > As well, I noticed the below query in the use case examples:
> > >
> > >  let $item := doc("items.xml")//item_tuple
> > >               [end_date >= xs:date("1999-03-01")
> > >                and
> > >                end_date <= xs:date("1999-03-31")]
> > >  return
> > >  <item_count>
> > >  {
> > >    count($item)
> > >  }
> > >  </item_count>
> > >
> > > Is there a schema behind the loaded data, or are the examples not
> > > type-safe? Should I just not worry about type safety and convert the
> > > data values to the type I need when querying? If so, won't that be a
> > > performance issue?
> > >
> > > 2) Referencing - what is the best-practice approach (if there is one)
> > > for referencing documents together?
> > > e.g. Document A and Document B should both refer to Document C
> > >
> > > 3) Foreign key constraints - is this supported at all in some fashion?
> > > If not, any approaches to suggest?
> > >
> > > Thanks in advance for any and all suggestions!
> > >
> > > Mark
> > > _______________________________________________
> > > General mailing list
> > > General at developer.marklogic.com
> > > http://xqzone.com/mailman/listinfo/general