[MarkLogic Dev General] fields where the contents is a URL

Grobstein, Spike Spike.Grobstein at wmg.com
Fri Nov 6 13:32:57 PST 2009


My goal for this is to do a query and get back a sequence of all values
that are in that field. I have a lot of documents that contain:

 

<param type="profile_url">http://www.blah.com/profile/path</param>

 

I was running a query that requested all documents within a date range
(we've got elements that contain a datestamp) for a specific site
(e.g., facebook), then pulling the unique values from the above field,
but I ran into speed and memory usage issues: I kept getting the
cachefull exception.

 

I really need to be able to get a list of values that are in that field
so I can create an index page and also do faster queries from that.

 

Fields were working great until I tried to use them on URLs.

 

Any other suggestions?

 

 

 

...spike

 

________________________________

From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Frank
Rubino
Sent: Thursday, November 05, 2009 2:01 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] fields where the contents is a URL

 

Spike-

I think you should look at a different way to index the url. For
instance, can you set up a range index with a scalar type anyURI?
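
To illustrate why a value-based index suits this case better than a
word-based one, here is a conceptual Python sketch (not MarkLogic code).
It assumes a range index behaves roughly like a sorted map from whole
values to the documents that contain them. The class name `ValueIndex`,
the document URIs, and the second URL are hypothetical.

```python
from bisect import insort
from collections import defaultdict


class ValueIndex:
    """Toy stand-in for a value (range) index: whole values, kept sorted."""

    def __init__(self):
        self._values = []               # sorted distinct values
        self._docs = defaultdict(set)   # value -> document URIs

    def add(self, doc_uri, value):
        if value not in self._docs:
            insort(self._values, value)  # keep the list of values ordered
        self._docs[value].add(doc_uri)

    def values(self):
        """Return all distinct values without opening any document."""
        return list(self._values)

    def docs_with(self, value):
        """Exact-match lookup on the complete, unsplit value."""
        return sorted(self._docs.get(value, ()))


index = ValueIndex()
index.add("/doc/1.xml", "http://www.blah.com/profile/path")
index.add("/doc/2.xml", "http://www.blah.com/profile/path")
index.add("/doc/3.xml", "http://www.blah.com/profile/other")  # hypothetical URL

distinct = index.values()
matches = index.docs_with("http://www.blah.com/profile/path")
```

In this sketch each URL is stored as one value, so both the list of
distinct URLs and a lookup by full URL come straight from the index.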

 

Frank

 

On Nov 5, 2009, at 11:56 AM, Grobstein, Spike wrote:





I've got a Field configured in my database that I want to do
field-words() queries against, but the content of the element is a URL.
It seems that when I do searches, the field's value is the URL split
at every symbol. For example:

 

http://www.facebook.com/Seal?sid=01cfb667e33bd4a46d3460853fbf3fe7&ref=search

 

is translated into the following field words:

*	http
*	www
*	facebook
*	com
*	Seal
*	sid
*	01cfb667e33bd4a46d3460853fbf3fe7
*	ref
*	search
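
The splitting shown in the list above can be reproduced with a rough
sketch. This is Python, not MarkLogic code, and the regular expression
is only an assumed approximation of how a word tokenizer treats
punctuation; the function name `approximate_field_words` is
hypothetical.

```python
import re


def approximate_field_words(text):
    """Split text into runs of ASCII letters and digits, dropping punctuation.

    Assumption: this only approximates word tokenization; it is not
    MarkLogic's actual tokenizer.
    """
    return re.findall(r"[A-Za-z0-9]+", text)


url = ("http://www.facebook.com/Seal"
       "?sid=01cfb667e33bd4a46d3460853fbf3fe7&ref=search")
words = approximate_field_words(url)
print(words)
```

Because the full URL never survives as a single token under this kind
of splitting, a word-level lookup has no single entry for the complete
URL.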

 

Is there a way around this? Should I not be using Fields?

 

I need to be able to do queries based on the full URL.

 

Thanks!

 

 

...spike

Spike Grobstein

_______________________________________________
General mailing list
General at developer.marklogic.com
http://xqzone.com/mailman/listinfo/general

 
