[MarkLogic Dev General] How to count items in the database by multiple file extensions?

Hartwig, Brent (CL Tech Sv) Brent.Hartwig at cengage.com
Wed Nov 26 05:23:06 PST 2008


Thanks, Danny. I look forward to the simpler route. Have a good holiday.

-Brent

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sokolsky
Sent: Tuesday, November 25, 2008 5:43 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] How to count items in the database by multiple file extensions?

That seems reasonable to me, as you have to do 1 million x 12 matches.  Assuming everything in your database has a URI starting with a "/" though, the directory-query does not buy you anything (as it matches all documents).  If you were able to constrain that directory query more, you might make things faster yet.
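As a rough illustration of constraining the directory query, here is a minimal sketch. The directory "/images/" is hypothetical (it does not come from this thread); the idea is only that a narrower directory-query gives cts:uri-match fewer URIs to consider than "/" does. Like the other queries in this thread, it assumes the URI lexicon is enabled.

```xquery
xquery version "1.0-ml";

(: Sketch only. "/images/" is a hypothetical directory used for
   illustration; substitute a directory that actually holds the files. :)
let $dir-query := cts:directory-query("/images/", "infinity")
return
  <count>{ fn:count(cts:uri-match("*.jpg", "document", $dir-query)) }</count>
```

Whether this is faster in practice depends on how the URIs are laid out, so it is worth timing against the unconstrained version.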

Also, if you are using 4.0 with function mapping, you can simplify this code a little.  Something like:

xquery version "1.0-ml";
let $ext := ("*.jpg", "*.gif", "*.ico", "*.jpeg", "*.tiff", "*.tif",
   "*.png", "*.dcr", "*.pcd", "*.bmp", "*.mpeg", "*.mpg")
return <count>{ fn:count(cts:uri-match($ext)) }</count>

(: note that both the cts:uri-match and the fn:count
   will use function mapping to iterate over the sequence :)
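For readers not using function mapping, the same count can be written with an explicit loop. This is a sketch of a rough equivalent, on the assumption that function mapping expands the cts:uri-match call once per pattern; the pattern list is shortened here only to keep the example brief.

```xquery
xquery version "1.0-ml";

(: Explicit iteration: call cts:uri-match once per pattern and count
   the concatenated results. Pattern list shortened for illustration. :)
let $ext := ("*.jpg", "*.gif", "*.png")
return
  <count>{
    fn:count(
      for $pattern in $ext
      return cts:uri-match($pattern)
    )
  }</count>
```

Because each pattern matches a distinct suffix, no URI should be returned by more than one pattern, so the total is not inflated by duplicates.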

-Danny



From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Hartwig, Brent (CL Tech Sv)
Sent: Tuesday, November 25, 2008 2:03 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] How to count items in the database by multiple file extensions?

Here's a way which took less than 45 seconds to execute:

let $dir := "/"

let $dir-level := "infinity"

let $dir-query := cts:directory-query($dir, $dir-level)

let $uris-1 := cts:uri-match("*.jpg", "document", $dir-query)
let $uris-2 := cts:uri-match("*.gif", "document", $dir-query)
let $uris-3 := cts:uri-match("*.ico", "document", $dir-query)
let $uris-4 := cts:uri-match("*.jpeg", "document", $dir-query)
let $uris-5 := cts:uri-match("*.tiff", "document", $dir-query)
let $uris-6 := cts:uri-match("*.tif", "document", $dir-query)
let $uris-7 := cts:uri-match("*.png", "document", $dir-query)
let $uris-8 := cts:uri-match("*.dcr", "document", $dir-query)
let $uris-9 := cts:uri-match("*.pcd", "document", $dir-query)
let $uris-10 := cts:uri-match("*.bmp", "document", $dir-query)
let $uris-11 := cts:uri-match("*.mpeg", "document", $dir-query)
let $uris-12 := cts:uri-match("*.mpg", "document", $dir-query)

return
(
<count>{
  fn:count($uris-1) + fn:count($uris-2) + fn:count($uris-3) +
  fn:count($uris-4) + fn:count($uris-5) + fn:count($uris-6) +
  fn:count($uris-7) + fn:count($uris-8) + fn:count($uris-9) +
  fn:count($uris-10) + fn:count($uris-11) + fn:count($uris-12)
}</count>
)

________________________________________
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Hartwig, Brent (CL Tech Sv)
Sent: Tuesday, November 25, 2008 12:54 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] How to count items in the database by multiple file extensions?

Hello, I'm looking for an efficient means to count the number of files of a particular type, where each type is defined by multiple file extensions. For example, how do you determine the total number of audio/video files in your MarkLogic database? Same question but for images.

Ideally, I'd like to ask an admin to run a single XQuery from the CQ interface, but I fear a timeout as we have over 1 million files.

I'll offer the following for illustrative purposes. It doesn't run. Please don't laugh loud enough for me to hear ;) Thank you in advance for your time and thoughts.

define function get-extension($uri as xs:string) {
        fn:tokenize($uri, "\.")[fn:last()]
}

let $uris-docs := cts:uri-match("*.*", "document", cts:directory-query("/", "infinity"))

(: Trying to iterate through this large collection once, setting multiple vars. :)
for $uri in $uris-docs
   let $ext := get-extension($uri)
   let $uris-av :=
   (
        return
        if ((fn:compare($ext, "avi") = 0) or (fn:compare($ext, "mov") = 0)
         or (fn:compare($ext, "mp3") = 0) or (fn:compare($ext, "wav") = 0)
         or (fn:compare($ext, "wma") = 0) or (fn:compare($ext, "wmv") = 0)
         or (fn:compare($ext, "ra") = 0) or (fn:compare($ext, "fla") = 0)
         or (fn:compare($ext, "flv") = 0) or (fn:compare($ext, "swf") = 0))
        then $uri else ()
   )
   let $uris-img :=
   (
        return
        if ((fn:compare($ext, "ico") = 0) or (fn:compare($ext, "gif") = 0)
         or (fn:compare($ext, "jpeg") = 0) or (fn:compare($ext, "jpg") = 0)
         or (fn:compare($ext, "mpeg") = 0) or (fn:compare($ext, "mpg") = 0)
         or (fn:compare($ext, "tiff") = 0) or (fn:compare($ext, "tif") = 0)
         or (fn:compare($ext, "png") = 0) or (fn:compare($ext, "bmp") = 0)
         or (fn:compare($ext, "dcr") = 0) or (fn:compare($ext, "pcd") = 0))
        then $uri else ()
   )

return
(
<count-av>{count($uris-av)}</count-av>
<count-img>{count($uris-img)}</count-img>
)
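For comparison with the illustrative code above, here is a minimal sketch that produces the two counts by running one cts:uri-match per extension rather than iterating over every URI. The helper local:count-by-ext and the wrapping <counts> element are hypothetical names introduced for this example; the extension lists are the ones given above. It assumes MarkLogic 4.0 ("1.0-ml" syntax) with the URI lexicon enabled.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: sums the number of URIs matching "*.<ext>"
   for each extension passed in. fn:sum returns 0 for an empty list. :)
declare function local:count-by-ext($exts as xs:string*) as xs:integer
{
  fn:sum(
    for $ext in $exts
    return fn:count(cts:uri-match(fn:concat("*.", $ext), "document"))
  )
};

<counts>
  <count-av>{
    local:count-by-ext(("avi", "mov", "mp3", "wav", "wma",
                        "wmv", "ra", "fla", "flv", "swf"))
  }</count-av>
  <count-img>{
    local:count-by-ext(("ico", "gif", "jpeg", "jpg", "mpeg", "mpg",
                        "tiff", "tif", "png", "bmp", "dcr", "pcd"))
  }</count-img>
</counts>
```

This trades a single pass over all URIs for 22 lexicon lookups, so the running time for a database of this size would need to be measured rather than assumed.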

-Brent