[MarkLogic Dev General] Processing Large Documents?
Geert Josten
geert.josten at dayon.nl
Fri Feb 24 23:58:33 PST 2012
To my knowledge, wrapping an xs:hexBinary value in a binary { } constructor
is the way to create a real binary node, so your approach should already
work. Did you check?
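A quick way to check in Query Console is sketched below. The hex string is made-up test data, not taken from your dump, and I am assuming xdmp:node-kind and the binary() node test behave the way I remember:

```xquery
xquery version "1.0-ml";

(: "48656C6C6F" is the hex encoding of the ASCII bytes for "Hello";
   it is illustrative data only. :)
let $blob := binary { xs:hexBinary("48656C6C6F") }
return (
  (: should report the node kind as "binary" if the constructor worked :)
  xdmp:node-kind($blob),
  (: should be true for a real binary node :)
  $blob instance of binary()
)
```

If both checks look right, the same constructor applied to your file_blob field should give you a node that is stored as a binary document on insert.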
A small optimization would be to make batches of, let’s say, about 100
records, build a map:map for each batch, and pass that to a spawned task that
inserts all 100. Right now you are queuing a new task server task for every
record. The task server queue has a limit, and inserting in batches of 100
documents usually runs faster.
Here is a bit of sample ‘transaction’ code I copied from collector-feed.xqy (
https://github.com/marklogic/infostudio-plugins/blob/master/collectors/collector-feed.xqy
):
let $entries := …
let $entry-count := count($entries)
let $transaction-size := 100
let $total-transactions := ceiling($entry-count div $transaction-size)

(: create transactions by breaking the document set into maps;
   each map's documents are saved to the db in their own transaction :)
let $transactions :=
    for $i at $index in 1 to $total-transactions
    let $map := map:map()
    let $start := (($i - 1) * $transaction-size) + 1
    let $finish := min((($start - 1 + $transaction-size), $entry-count))
    let $put :=
        for $entry in ($entries)[$start to $finish]
        let $id := fn:concat(fn:string($entry/atom:id), ".xml")
        return map:put($map, $id, $entry)
    return $map

(: the callback function for ingest :)
let $ingestion :=
    for $transaction at $index in $transactions
    return
        infodev:transaction($transaction, $ticket-id,
            xdmp:function(xs:QName("feed:process-file")),
            $policy-deltas, $index, (), ())
Replace $entries with $table/row, change let $id to use your URIs, and
replace the infodev:transaction call with your own xdmp:spawn. The spawned
module should take an entire map, loop over its keys, and do an insert for
each key/value pair in the map.
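As a rough sketch of that last step, assuming a map:map can be passed to xdmp:spawn as an external variable: the module name insert-batch.xqy, the variable name BATCH, and the URI pattern are made up for illustration, and the module path must resolve against your own modules root. The spawned module could look like this:

```xquery
xquery version "1.0-ml";

(: insert-batch.xqy (hypothetical name): runs on the task server.
   $BATCH maps document URIs to the nodes to insert. :)
declare variable $BATCH as map:map external;

(: one insert per key; all inserts in this task share one transaction :)
for $uri in map:keys($BATCH)
return xdmp:document-insert($uri, map:get($BATCH, $uri))
```

And the caller, with generated rows standing in for $table/row:

```xquery
xquery version "1.0-ml";

(: Illustrative input only; in the real load this would be $table/row. :)
let $rows :=
  for $n in 1 to 250
  return <row><field name="id">{ $n }</field></row>
let $batch-size := 100
let $batch-count := xs:integer(fn:ceiling(fn:count($rows) div $batch-size))
for $i in 1 to $batch-count
let $start := ($i - 1) * $batch-size + 1
let $batch := map:map()
(: fill the map with up to $batch-size uri/document pairs :)
let $_ :=
  for $row in fn:subsequence($rows, $start, $batch-size)
  let $uri := fn:concat('/example/row/id-', $row/field[@name = 'id'], '.xml')
  return map:put($batch, $uri, $row)
(: one task per batch instead of one task per record :)
return xdmp:spawn('insert-batch.xqy', (xs:QName('BATCH'), $batch))
```

With 250 rows and a batch size of 100 this queues three tasks (100, 100, and 50 documents) rather than 250. For the blob case you would put both the record and its binary into the same map under their respective URIs.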
Cheers,
Geert
*From:* general-bounces at developer.marklogic.com [mailto:
general-bounces at developer.marklogic.com] *On behalf of* Todd Gochenour
*Sent:* Saturday, February 25, 2012 7:10
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?
It's time for me to pick this project up now that the work week has
passed.
I'm attempting to implement Michael Blakeley's recommendation to move the
SQL blob content into its own document as part of this initial load/chunk
phase. Here's how I see the strategy. As I iterate through each record in
a table, when there is an element with the attribute
xsi:type="xs:hexBinary", I want to extract this data, generate a new
document, replace the original element with a reference to this new
document, and then spawn two 'document-insert.xqy' operations, one for the
original document and one for the binary document.
These are my current issues. I haven't figured out how to convert the
hexBinary into binary so that when I fetch the document I get the correct
format. I probably need to be setting mime type. The @xsi:type attribute
isn't part of the table_data, so I can't trigger blob processing based upon
this attribute. I'm currently only processing elements called file_blob.
My current working copy now looks like:
(: query console :)
xquery version "1.0-ml";
for $table in
  xdmp:document-get('C:\Users\servicelogix\slx\us_co_slx.xml')/*/*/table_data
let $table-name := $table/@name/string()
let $database-name := $table/../@name/string()
for $row in $table/row
let $record-uri :=
  concat('/',$database-name,'/',$table-name,'/id-',$row/field[@name='id'])
let $file-uri :=
  concat('/',$database-name,'/',$table-name,'/file-',$row/field[@name='id'])
let $blob :=
  if ($row/field[@name='file_blob'][1])
  then binary { xs:hexBinary($row/field[@name='file_blob'][1]) }
  else ()
(: field names that start with a digit get a '_' prefix so they are valid
   element names; number(x) = number(x) is false only when x is NaN :)
let $record := element { $table-name } {
  $row/field[text() and not(@name='file_blob')]/element
    { if (number(substring(@name,1,1)) = number(substring(@name,1,1)))
      then concat('_',@name) else @name }
    { text() },
  if ($blob) then element file_uri { $file-uri } else ()
}
return (
  if ($record) then
    xdmp:spawn('document-insert.xqy',
      (xs:QName('URI'), $record-uri, xs:QName('NEW'), $record))
  else (),
  if ($blob) then
    xdmp:spawn('document-insert.xqy',
      (xs:QName('URI'), $file-uri, xs:QName('NEW'), $blob))
  else ()
)