This page was generated August 8, 2011, 2:24 AM.
XQuery Built-In and Modules Function Reference

Module: XHTML Conversion

The XHTML module is part of the conversion processing pipeline. Its functions manipulate XHTML documents during conversion processing.

To use the XHTML module as part of your own XQuery module, include the following line in your XQuery prolog:

import module namespace xhtml = "http://marklogic.com/cpf/xhtml" at "/MarkLogic/conversion/xhtml.xqy"

You will need to ensure that the XHTML module is loaded into the same modules database as the importing module.

The library namespace prefix xhtml is not predefined in the server.
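The import line above has no trailing semicolon, which matches the 0.9-ml dialect used by the examples on this page. In a module written in the 1.0-ml dialect, every prolog declaration must end with a semicolon. A minimal sketch of that form follows; the inline document is placeholder input only.

```xquery
xquery version "1.0-ml";

(: In 1.0-ml, each prolog declaration is terminated by a semicolon :)
import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
    at "/MarkLogic/conversion/xhtml.xqy";

(: The xhtml prefix is usable only because of the import above;
   the server does not predefine it :)
xhtml:clean(
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body><p>Hello</p></body>
  </html>)
```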

Function Summary
xhtml:add-lists Infer numbered or bulleted lists and insert appropriate markup.
xhtml:clean Clean up the XHTML: pruning empty spans, consolidating adjacent spans, etc.
xhtml:restructure Turn an XHTML document with a flat structure into one with a nested div structure based on its header elements.
Function Detail
xhtml:add-lists(
$doc as node()?
) as node()?
Summary:

Infer numbered or bulleted lists and insert appropriate markup. Running xhtml:restructure first is highly recommended, because it improves both the accuracy and the performance of list inference. This function also assumes that indentation styling is already present on the paragraphs of the original input.

Parameters:
$doc : The source XHTML.

Example:
  xquery version "0.9-ml"
  default element namespace = "http://www.w3.org/1999/xhtml"

  import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
      at "/MarkLogic/conversion/xhtml.xqy"

  let $raw :=
     <html>
       <head><title>Example</title></head>
       <body>
          <div class="mlsection1">
            <h1>Section header</h1>
            <p>1. First paragraph.</p>
            <p>2. Second paragraph.</p>
            <p>a. Sub-topic 1.</p>
            <p>b. Sub-topic 2.</p>
            <p>3. Third paragraph</p>
            <div class="mlsection2">
              <h2>Subheader</h2>
              <p>1. Sub paragraph.</p>
            </div>
          </div>
       </body>
     </html>
  return xhtml:add-lists( $raw )

  Returns:
     <html>
       <head><title>Example</title></head>
       <body>
          <div class="mlsection1">
            <h1>Section header</h1>
            <ol style="list-style-type: none; margin-left: 0pt">
              <li>1. First paragraph.</li>
              <li>2. Second paragraph.</li>
              <ol style="list-style-type: none; margin-left: 0pt">
                <li>a. Sub-topic 1.</li>
                <li>b. Sub-topic 2.</li>
              </ol>
              <li>3. Third paragraph</li>
            </ol>
            <div class="mlsection2">
              <h2>Subheader</h2>
              <p>1. Sub paragraph.</p>
            </div>
          </div>
       </body>
     </html>
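The summary for xhtml:add-lists recommends restructuring before inferring lists. A minimal sketch of that ordering, written in 1.0-ml and assuming a flat, illustrative input in which headers and numbered paragraphs are all direct children of body. No output is shown, because the list markup produced depends on the module's heuristics and on the indentation styling of the input paragraphs.

```xquery
xquery version "1.0-ml";
declare default element namespace "http://www.w3.org/1999/xhtml";

import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
    at "/MarkLogic/conversion/xhtml.xqy";

(: Illustrative flat input: headers and numbered paragraphs are all
   direct children of body :)
let $flat :=
  <html>
    <head><title>Example</title></head>
    <body>
      <h1>Section header</h1>
      <p>1. First paragraph.</p>
      <p>2. Second paragraph.</p>
      <h2>Subheader</h2>
      <p>1. Sub paragraph.</p>
    </body>
  </html>
(: Step 1: wrap each header and the content that follows it in mlsection divs :)
let $structured := xhtml:restructure($flat)
(: Step 2: infer list markup within the restructured document :)
return xhtml:add-lists($structured)
```

Chaining the calls this way keeps each step independent, so either one can be swapped out or inspected on its own while debugging a conversion.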
  

xhtml:clean(
$doc as node()?
) as node()?
Summary:

Clean up the XHTML: pruning empty spans, consolidating adjacent spans, etc.

Parameters:
$doc : The source XHTML.

Example:
  xquery version "0.9-ml"
  import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
      at "/MarkLogic/conversion/xhtml.xqy"

  xhtml:clean(fn:doc("my.xhtml"))
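The example above depends on a stored document named my.xhtml. The self-contained sketch below instead applies xhtml:clean to illustrative inline markup containing one empty span and two adjacent spans, and reports the span counts before and after cleaning. The input is an assumption made for illustration, and the exact cleaned markup depends on the module's implementation, so no output is shown.

```xquery
xquery version "1.0-ml";
declare default element namespace "http://www.w3.org/1999/xhtml";

import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
    at "/MarkLogic/conversion/xhtml.xqy";

(: Illustrative converter-style markup: one empty span and two adjacent spans :)
let $raw :=
  <html>
    <head><title>Example</title></head>
    <body>
      <p>Some <span/>text with <span>split</span><span> runs</span>.</p>
    </body>
  </html>
let $cleaned := xhtml:clean($raw)
(: Report span counts before and after cleaning, plus the cleaned paragraph :)
return
  <report>
    <before>{ fn:count($raw//span) }</before>
    <after>{ fn:count($cleaned//span) }</after>
    { $cleaned//p }
  </report>
```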
  

xhtml:restructure(
$doc as node()?
) as node()?
Summary:

Turn an XHTML document with a flat structure into one with a nested div structure based on its header elements.

Parameters:
$doc : The source XHTML.

Example:
  xquery version "0.9-ml"
  default element namespace = "http://www.w3.org/1999/xhtml"

  import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
      at "/MarkLogic/conversion/xhtml.xqy"

  let $unstructured :=
     <html>
       <head><title>Example</title></head>
       <body>
          <h1>First section</h1>
          <p>First paragraph.</p>
          <p>Second paragraph.</p>
          <h2>Subheader</h2>
          <p>Sub paragraph.</p>
          <h1>Second section</h1>
          <p>Last paragraph.</p>
       </body>
     </html>
  return xhtml:restructure( $unstructured )

  Returns:
    <html>
      <head><title>Example</title></head>
      <body>
         <div class="mlsection1">
           <h1>First section</h1>
           <p>First paragraph.</p>
           <p>Second paragraph.</p>
           <div class="mlsection2">
             <h2>Subheader</h2>
             <p>Sub paragraph.</p>
           </div>
         </div>
         <div class="mlsection1">
           <h1>Second section</h1>
           <p>Last paragraph.</p>
         </div>
      </body>
    </html>
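Because xhtml:restructure records each section's level in the class of its wrapping div, section-level queries reduce to simple path expressions. A sketch in 1.0-ml, reusing the input from the example above; the mlsection1 class name is taken from the documented output, and the result noted in the final comment assumes that output.

```xquery
xquery version "1.0-ml";
declare default element namespace "http://www.w3.org/1999/xhtml";

import module namespace xhtml = "http://marklogic.com/cpf/xhtml"
    at "/MarkLogic/conversion/xhtml.xqy";

let $structured := xhtml:restructure(
  <html>
    <head><title>Example</title></head>
    <body>
      <h1>First section</h1>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
      <h2>Subheader</h2>
      <p>Sub paragraph.</p>
      <h1>Second section</h1>
      <p>Last paragraph.</p>
    </body>
  </html>)
(: Top-level sections are the divs with class "mlsection1";
   return the text of each section's h1 header :)
for $section in $structured//div[@class = "mlsection1"]
return fn:string($section/h1)
(: Given the documented output above: "First section", "Second section" :)
```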