MarkLogic Server
XQUERY API DOCUMENTATION
3.1
This page was generated November 5, 2007, 6:03 PM
XQuery Built-In and Modules Function Reference

Module: PDF Conversion

The PDF module is part of the conversion processing pipeline. Its functions manipulate the XHTML produced by converting PDF documents.

To use the PDF module as part of your own XQuery module, include the following line in your XQuery prolog:

import module namespace pdf = "http://marklogic.com/cpf/pdf" at "/MarkLogic/conversion/pdf.xqy"

Ensure that the PDF module is loaded into the same modules database as the importing module.

The library namespace prefix pdf is not predefined in the server.

Function Summary
pdf:clean : Clean up any conversion artifacts or other infelicities.
pdf:get-toc : Fetch the linked TOC, if any.
pdf:insert-toc-headers : Locate TOC anchors and make them properly refer to headers at the appropriate level.
pdf:make-toc : Clean and normalize the TOC produced by raw conversion.
Function Detail
pdf:clean(
$doc as node()?,
$toc as element()?
) as node()?
Summary:

Clean up any conversion artifacts or other infelicities.

Parameters:
$doc : The XHTML produced by conversion of a PDF document.
$toc : The TOC produced by conversion of a PDF document.

Example:
  import module namespace pdf = "http://marklogic.com/cpf/pdf" 
		  at "/MarkLogic/conversion/pdf.xqy"

  pdf:clean(fn:doc("my_pdf.xhtml"), pdf:get-toc("my_pdf.xhtml"))
  

pdf:get-toc(
$uri as xs:string
) as element()?
Summary:

Fetch the linked TOC, if any.

Parameters:
$uri : The URI of the converted PDF document.

Example:
  import module namespace pdf = "http://marklogic.com/cpf/pdf" 
		  at "/MarkLogic/conversion/pdf.xqy"

  pdf:get-toc("my_pdf.xhtml")
  

pdf:insert-toc-headers(
$doc as node()?,
$toc as element()?
) as node()?
Summary:

Locate TOC anchors and make them properly refer to headers at the appropriate level. Returns the transformed document.

Parameters:
$doc : The cleaned XHTML produced by PDF conversion.
$toc : The normalized TOC.

Example:
  import module namespace pdf = "http://marklogic.com/cpf/pdf" 
		  at "/MarkLogic/conversion/pdf.xqy"

  xdmp:document-insert( "myfile.xhtml", 
         pdf:insert-toc-headers( doc("myfile.xhtml"), 
                                 pdf:get-toc("myfile.xhtml") )
  )
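
The parameter description above asks for cleaned XHTML, so one plausible arrangement runs pdf:clean before pdf:insert-toc-headers. The following is a minimal sketch under that assumption, not a statement of how the conversion pipeline itself orders these calls. The URI "my_pdf.xhtml" is a placeholder, and the fn:exists guard is an illustrative choice that skips the insert when there is nothing to store.

```xquery
(: Sketch only: the URI is a placeholder and the call order is an
   assumption inferred from the parameter descriptions. :)
import module namespace pdf = "http://marklogic.com/cpf/pdf"
    at "/MarkLogic/conversion/pdf.xqy"

let $uri := "my_pdf.xhtml"
(: Fetch the TOC linked to the converted document; may be empty. :)
let $toc := pdf:get-toc($uri)
(: Clean up conversion artifacts in the converted XHTML. :)
let $cleaned := pdf:clean(fn:doc($uri), $toc)
(: Make the TOC anchors refer to headers at the appropriate level. :)
let $linked := pdf:insert-toc-headers($cleaned, $toc)
return
  (: Only write back when a document was actually produced. :)
  if (fn:exists($linked))
  then xdmp:document-insert($uri, $linked)
  else ()
```

Binding each intermediate result to its own variable keeps the steps easy to inspect one at a time while experimenting.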
  

pdf:make-toc(
$toc as element()?
) as element()?
Summary:

Clean and normalize the TOC produced by raw conversion.

Parameters:
$toc : The raw TOC element.

Example:
  import module namespace pdf = "http://marklogic.com/cpf/pdf" 
		  at "/MarkLogic/conversion/pdf.xqy"

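  (: Convert the PDF; the first result is the manifest, the rest are the parts. :)
  let $results := 
     xdmp:pdf-convert( xdmp:get("/myfiles/myfile.pdf"), "myfile.pdf" )
  let $manifest := $results[1]
  (: Select the part whose manifest entry names the raw TOC. :)
  let $toc := 
      for $doc at $index in $results[2 to last()]
      let $name := string($manifest/*[$index])
      (: Escape the dot so that it matches a literal "." :)
      where fn:matches( $name, "toc\.xml" )
      return $doc
  return pdf:make-toc( $toc )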