xdmp:excel-convert(
   $doc as node(),
   $filename as xs:string,
   [$options as node()]
) as node()*

Summary:
Converts a Microsoft Excel document to XHTML. Returns several nodes,
including a parts node, the converted document xml node, and any
other document parts (for example, css files and images). The first
node is the parts node, which contains a manifest of all of the parts
generated as a result of the conversion.

Parameters:
$doc:
Microsoft Office Excel document to convert to XHTML, as a binary node().

$filename:
The root for the name of the converted files and directories. If the
specified filename includes an extension, then the extension is appended
to the root with an underscore. The directory for other parts of the
conversion (images, for example) has the string "_parts" appended to the
root. For example, if you specify a filename of "myFile.xls", the
generated names will be "myFile_xls.xhtml" for the xml node and
"myFile_xls_parts" for the directory containing any other parts
generated by the conversion (images, css files, and so on).

$options (optional):
Options element for this conversion. The options element must be in the
xdmp:excel-convert namespace. The default value is (). In
addition to the options shown below, you can specify xdmp:tidy
options by entering the tidy option elements in the xdmp:tidy
namespace.
Options include:
<tidy> - Specify true to run tidy on the document and false not to run
tidy. If you run tidy, you can also specify an xdmp:tidy options node.
<sheetID> - An integer specifying which sheet of the input Excel document
to convert. If this option is not set, all sheets are converted.
<compact> - Specify true to produce "compact" HTML, that is, without style
information. The default is false.
<print-area-only> - Specify true to convert only the print area of the
sheet.
<page-by-page> - Specify true to produce one document for each sheet. The
default is false.
Sample Options Node:
- The following is a sample options node which specifies that tidy is
used to clean the generated html, specifies to use the tidy "clean"
option, and specifies to only convert sheet 2 of the document:
<options xmlns="xdmp:excel-convert"
         xmlns:tidy="xdmp:tidy">
  <tidy>true</tidy>
  <tidy:clean>yes</tidy:clean>
  <sheetID>2</sheetID>
</options>

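As a minimal sketch, an options node like the sample above can be passed as the third argument. The file name "myFile.xls" is the same placeholder used in the reference example, and the variable names are illustrative only.

```xquery
xquery version "1.0-ml";

(: "myFile.xls" is a placeholder path; substitute a real Excel file. :)
let $options :=
  <options xmlns="xdmp:excel-convert">
    <tidy>true</tidy>
    <sheetID>2</sheetID>
    <compact>true</compact>
  </options>
let $results :=
  xdmp:excel-convert(xdmp:document-get("myFile.xls"), "myFile.xls", $options)
(: The first returned node is the parts node (the manifest). :)
return $results[1]
```

Returning only the manifest is a convenient way to check which parts a given set of options produces before processing the parts themselves.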
Usage Notes:
The convert functions return several nodes. The first node is a manifest
containing the various parts of the conversion. Typically there will be
an xml part, a css part, and some image parts. Each part is returned as
a separate node in the order shown in the manifest.
Therefore, given the following manifest:
<parts>
  <part>myFile_xls.xhtml</part>
  <part>myFile_xls_parts/conv.css</part>
  <part>myFile_xls_parts/toc.xml</part>
</parts>
the first node of the returned sequence is the manifest, the second is the
"myFile_xls.xhtml" node, the third is the "myFile_xls_parts/conv.css" node,
and the fourth is the "myFile_xls_parts/toc.xml" node.
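Because the parts are returned in manifest order, each part name can be paired with its node by position, for instance to store every part in the database. The following is a sketch under stated assumptions rather than a prescribed workflow: the "/converted/" URI prefix is hypothetical, and the wildcard *:part is used because the namespace of the manifest elements is not stated here.

```xquery
xquery version "1.0-ml";

let $results :=
  xdmp:excel-convert(xdmp:document-get("myFile.xls"), "myFile.xls")
let $manifest := $results[1]
(: Part i in the manifest corresponds to node i + 1 in the results,
   because the manifest itself occupies position 1. :)
for $part at $i in $manifest//*:part
(: "/converted/" is an assumed URI prefix, not something the converter requires. :)
let $uri := fn:concat("/converted/", fn:string($part))
return xdmp:document-insert($uri, $results[$i + 1])
```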

Example:
let $results := xdmp:excel-convert(
                  xdmp:document-get("myFile.xls"),
                  "myFile.xls" ),
    $manifest := $results[1]
return
  $results[2 to last()]
=> all of the converted nodes

xdmp:pdf-convert(
   $doc as node(),
   $filename as xs:string,
   [$options as node()]
) as node()*

Summary:
Converts a PDF file to XHTML. Returns several nodes,
including a parts node, the converted document xml node, and any
other document parts (for example, css files and images). The first
node is the parts node, which contains a manifest of all of the parts
generated as a result of the conversion.

Parameters:
$doc:
PDF document to convert to XHTML, as a binary node().

$filename:
The root for the name of the converted files and directories. If the
specified filename includes an extension, then the extension is appended
to the root with an underscore. The directory for other parts of the
conversion (images, for example) has the string "_parts" appended to the
root. For example, if you specify a filename of "myFile.pdf", the
generated names will be "myFile_pdf.xhtml" for the xml node and
"myFile_pdf_parts" for the directory containing any other parts
generated by the conversion (images, css files, and so on).

$options (optional):
Options element for this conversion. The options element must be in the
xdmp:pdf-convert namespace. The default value is (). In
addition to the options shown below, you can specify xdmp:tidy
options by entering the tidy option elements in the xdmp:tidy
namespace.
Options include:
<tidy>
- Default value:
true
Specify true to run tidy on the document and
false not to run tidy. If you run tidy, you can also
specify any xdmp:tidy options. Any tidy option
elements must be in the xdmp:tidy namespace.
<config>
- The configuration file for the conversion. You can specify an
absolute path or a relative path. A relative path is resolved against
the <install_dir>/Converters/cvtpdf directory.
The default configuration file is named PDFtoHTML.cfg;
it produces a single reflowed XHTML document with CSS styling. Setting
this parameter may override the remaining options.
<page-by-page>
- Default value:
false
Specify true to select a different default configuration
file that produces one XHTML document per page with absolute positioning.
The default paged configuration file is named
PDFtoXHTML_pages.cfg.
If a specific configuration file is selected with the config
option, the page-by-page option has no effect.
<page-start-id>
- Default value:
0
The index of the first page to convert. Page indices start at zero.
<page-end-id>
- Default value:
-1
The index of the last page to convert. Page indices start at zero.
The default is -1, meaning to convert through the last page of the
document.
<synth-bookmarks>
- Default value:
true
Enable/disable the converter's internal font-based TOC inferences.
<image-output>
- Default value:
true
Enable/disable extraction and conversion of images.
<text-output>
- Default value:
true
Enable/disable extraction of text.
<zones>
- Default value:
false
Enable/disable zone controls. Using true produces better
results when the PDF is annotated; using false produces
better results in non-annotated tables.
<ignore-text>
- Default value:
true
Enable/disable extraction of text from images. Documents consisting of
scanned pages can only have text extracted if this parameter is set to
true; however, diagrams with embedded text labels may
be less palatable. For page-by-page conversion, poor results from
reflowing text and graphical elements within a diagram are less of a
concern, so the value false will probably be the better choice.
<remove-overprint>
- Default value:
false
Enable/disable removal of text overlays. Setting this parameter to
true can sometimes clean up messy results stemming from
reflowing of text that was not visible in the original PDF because it
was covered by something else.
<illustrations>
- Default value:
true
Enable/disable extraction of illustrations. Setting this parameter to
false can sometimes clean up messy results stemming from
minor and unnecessary graphical ornaments.
<image-quality>
- Default value:
75
Determines the quality of extracted and converted images: smaller values
mean smaller image sizes (in bytes) but lossier rendering. The maximum is
100.
<page-start>
- Default value: (empty)
Boilerplate text inserted at the start of every page. Any XML markup
must be escaped. For example: &lt;p&gt;PAGE START&lt;/p&gt;
<page-end>
- Default value: (empty)
Boilerplate text inserted at the end of every page. XML markup must be
escaped.
<document-start>
- Default value: (empty)
Boilerplate text inserted at the start of every document. XML markup
must be escaped.
<document-end>
- Default value: (empty)
Boilerplate text inserted at the end of every document. XML markup must
be escaped.
Sample Options Node:
- The following is a sample options node which specifies that tidy is
used to clean the generated html, specifies to use the tidy "clean" option,
and specifies a particular configuration file to use for the conversion:
<options xmlns="xdmp:pdf-convert"
         xmlns:tidy="xdmp:tidy">
  <tidy>true</tidy>
  <tidy:clean>yes</tidy:clean>
  <config>c:\myConfigFile.cfg</config>
</options>

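The page-range options described above can be combined in a single options node. The sketch below, which assumes the placeholder file "myFile.pdf", converts only pages with indices 0 through 2 (the first three pages, since indices start at zero) and skips image extraction.

```xquery
xquery version "1.0-ml";

(: "myFile.pdf" is a placeholder path; substitute a real PDF file. :)
let $options :=
  <options xmlns="xdmp:pdf-convert">
    <page-start-id>0</page-start-id>
    <page-end-id>2</page-end-id>
    <image-output>false</image-output>
  </options>
let $results :=
  xdmp:pdf-convert(xdmp:document-get("myFile.pdf"), "myFile.pdf", $options)
(: Return the manifest first, followed by every converted part. :)
return ($results[1], $results[2 to last()])
```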
Usage Notes:
The convert functions return several nodes. The first node is a manifest
containing the various parts of the conversion. Typically there will be
an xml part, a css part, and some image parts. Each part is returned as
a separate node in the order shown in the manifest.
Therefore, given the following manifest:
<parts>
  <part>myFile_pdf.xhtml</part>
  <part>myFile_pdf_parts/conv.css</part>
  <part>myFile_pdf_parts/toc.xml</part>
</parts>
the first node of the returned sequence is the manifest, the second is the
"myFile_pdf.xhtml" node, the third is the "myFile_pdf_parts/conv.css" node,
and the fourth is the "myFile_pdf_parts/toc.xml" node.

Example:
let $results := xdmp:pdf-convert(
                  xdmp:document-get("myFile.pdf"),
                  "myFile.pdf" ),
    $manifest := $results[1]
return
  $results[2 to last()]
=> all of the converted nodes

xdmp:powerpoint-convert(
   $doc as node(),
   $filename as xs:string,
   [$options as node()]
) as node()*

Summary:
Converts a Microsoft PowerPoint document to XHTML. Returns several nodes,
including a parts node, the converted document xml node, and any
other document parts (for example, css files and images). The first
node is the parts node, which contains a manifest of all of the parts
generated as a result of the conversion.

Parameters:
$doc:
Microsoft PowerPoint document to convert to XHTML, as a binary node().

$filename:
The root for the name of the converted files and directories. If the
specified filename includes an extension, then the extension is appended
to the root with an underscore. The directory for other parts of the
conversion (images, for example) has the string "_parts" appended to the
root. For example, if you specify a filename of "myFile.ppt", the
generated names will be "myFile_ppt.xhtml" for the xml node and
"myFile_ppt_parts" for the directory containing any other parts
generated by the conversion (images, css files, and so on).

$options (optional):
Options element for this conversion. The options element must be in the
xdmp:powerpoint-convert namespace. The default value is (). In
addition to the options shown below, you can specify xdmp:tidy
options by entering the tidy option elements in the xdmp:tidy
namespace.
Options include:
<tidy> - Specify true to run tidy on the document and false not to run
tidy. If you run tidy, you can also specify an xdmp:tidy options node.
<compact> - Specify true to produce "compact" HTML, that is, without style
information. The default is false.
<slideID> - An integer specifying which slide of the input PowerPoint
document to convert. If this option is not set, all slides are converted.
<page-by-page> - Specify true to produce one document for each slide. The
default is false.
<speaker-notes> - Specify true to include speaker notes in the output. The
default is false.
Sample Options Node:
- The following is a sample options node which specifies that tidy is
used to clean the generated html, specifies to use the tidy "clean"
option, and specifies to only convert the second slide of the document:
<options xmlns="xdmp:powerpoint-convert"
         xmlns:tidy="xdmp:tidy">
  <tidy>true</tidy>
  <tidy:clean>yes</tidy:clean>
  <slideID>2</slideID>
</options>

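As an illustration of the slide-oriented options, the sketch below requests one document per slide with speaker notes included, then lists the part names from the manifest. It assumes the placeholder file "myFile.ppt" and uses the wildcard *:part because the namespace of the manifest elements is not stated here.

```xquery
xquery version "1.0-ml";

(: "myFile.ppt" is a placeholder path; substitute a real PowerPoint file. :)
let $options :=
  <options xmlns="xdmp:powerpoint-convert">
    <page-by-page>true</page-by-page>
    <speaker-notes>true</speaker-notes>
  </options>
let $results :=
  xdmp:powerpoint-convert(xdmp:document-get("myFile.ppt"), "myFile.ppt", $options)
(: List the name of every generated part, in manifest order. :)
return $results[1]//*:part/fn:string(.)
```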
Usage Notes:
The convert functions return several nodes. The first node is a manifest
containing the various parts of the conversion. Typically there will be
an xml part, a css part, and some image parts. Each part is returned as
a separate node in the order shown in the manifest.
Therefore, given the following manifest:
<parts>
  <part>myFile_ppt.xhtml</part>
  <part>myFile_ppt_parts/conv.css</part>
  <part>myFile_ppt_parts/toc.xml</part>
</parts>
the first node of the returned sequence is the manifest, the second is the
"myFile_ppt.xhtml" node, the third is the "myFile_ppt_parts/conv.css" node,
and the fourth is the "myFile_ppt_parts/toc.xml" node.

Example:
let $results := xdmp:powerpoint-convert(
                  xdmp:document-get("myFile.ppt"),
                  "myFile.ppt" ),
    $manifest := $results[1]
return
  $results[2 to last()]
=> all of the converted nodes

xdmp:tidy(
   $doc as xs:string,
   [$options as node()]
) as node()+

Summary:
Runs tidy on the specified HTML document to convert the document to
well-formed and clean XHTML.

Parameters:
$doc:
A string representing the HTML document you want to tidy.

$options (optional):
The options node for this operation. The node for the tidy options
must be in the xdmp:tidy namespace. The default value is ().
The options are based on the open source HTML Tidy configuration options,
available at http://tidy.sourceforge.net/docs/quickref.html.
Most of the tidy options are available through xdmp:tidy
with the following exceptions:
- The character encoding for the output is always UTF-8.
- The filesystem options which allow you to specify where to save output
are not supported (although there are many ways to achieve this through
functions such as xdmp:save).
- The output is always XHTML.
Options include:
HTML, XHTML, and XML Options
<add-xml-decl>
- Default Value:
no
Description: This option specifies if Tidy should add the XML
declaration when outputting XML or XHTML. Note that if the input
already includes an <?xml ... ?> declaration then
this option will be ignored.
<add-xml-space>
- Default Value:
no
Description: This option specifies if Tidy should add
xml:space="preserve" to elements such as <PRE>,
<STYLE> and <SCRIPT> when generating XML. This is needed if
the whitespace in such elements is to be parsed appropriately without
having access to the DTD.
<alt-text>
- Default Value: n/a
Description: This option specifies the default "alt=" text Tidy uses
for <IMG> attributes. This feature is dangerous as it suppresses
further accessibility warnings. You are responsible for making your
documents accessible to people who can not see the images!
<assume-xml-procins>
- Default Value:
no
Description: This option specifies if Tidy should change the parsing
of processing instructions to require ?> as the terminator rather
than >. This option is automatically set if the input is in XML.
<bare>
- Default Value:
no
Description: This option specifies if Tidy should strip Microsoft specific
HTML from Word 2000 documents, and output spaces rather than
non-breaking spaces where they exist in the input.
<clean>
- Default Value:
no
Description: This option specifies if Tidy should strip out surplus
presentational tags and attributes replacing them by style rules and
structural markup as appropriate. It works well on the HTML saved by
Microsoft Office products.
<css-prefix>
- Default Value: n/a
Description: This option specifies the prefix that Tidy uses for styles
rules. By default, "c" will be used.
<doctype>
- Default Value:
auto
Possible Values: auto, omit, strict,
loose, transitional, or user-specified fpi string
Description:
This option specifies the DOCTYPE declaration generated by Tidy.
If set to omit the output won't contain a DOCTYPE declaration.
If set to auto (the default) Tidy will use an educated
guess based upon the contents of the document. If set to
strict,
Tidy will set the DOCTYPE to the strict DTD. If set to loose,
the DOCTYPE is set to the loose (transitional) DTD. Alternatively, you can
supply a string for the formal public identifier (FPI). For example:
doctype: "-//ACME//DTD HTML 3.14159//EN"
If you specify the FPI for an XHTML document, Tidy will set the
system identifier to the empty string. Tidy leaves the DOCTYPE for
generic XML documents unchanged. Specifying a doctype of omit
implies that the numeric-entities option is set to yes.
<drop-empty-paras>
- Default Value:
yes
Description:
This option specifies if Tidy should discard empty paragraphs. If
set to no, empty paragraphs are replaced by a pair of <BR>
elements as HTML4 precludes empty paragraphs.
<drop-font-tags>
- Default Value:
no
Description:
This option specifies if Tidy should discard <FONT>
and <CENTER> tags without creating the corresponding
style rules. This option can be set independently of the clean option.
<drop-proprietary-attributes>
- Default Value:
no
Description:
This option specifies if Tidy should strip out proprietary attributes,
such as MS data binding attributes.
<enclose-block-text>
- Default Value:
no
Description:
This option specifies if Tidy should insert a <P> element to enclose
any text it finds in any element that allows mixed content for HTML
transitional but not HTML strict.
<enclose-text>
- Default Value:
no
Description:
This option specifies if Tidy should enclose any text it finds in
the body element within a <P> element. This is useful when you want
to take existing HTML and use it with a style sheet.
<escape-cdata>
- Default Value:
no
Description:
This option specifies if Tidy should convert <![CDATA[]]>
sections to normal text.
<fix-backslash>
- Default Value:
yes
Description:
This option specifies if Tidy should replace backslash characters
"\" in URLs by forward slashes "/".
<fix-bad-comments>
- Default Value:
yes
Description:
This option specifies if Tidy should replace unexpected hyphens
with "=" characters when it comes across adjacent hyphens. The
default is yes. This option is provided for users of Cold Fusion
which uses the comment syntax: <!--- --->
<fix-uri>
- Default Value:
yes
Description:
This option specifies if Tidy should check attribute values that carry
URIs for illegal characters and if such are found, escape them as
HTML 4 recommends.
<hide-comments>
- Default Value:
no
Description:
This option specifies if Tidy should omit comments from the output.
<hide-endtags>
- Default Value:
no
Description:
This option specifies if Tidy should omit optional end-tags when
generating the pretty printed markup. This option is ignored if
you are outputting to XML.
<indent-cdata>
- Default Value:
no
Description:
This option specifies if Tidy should indent <![CDATA[]]>
sections.
<input-xml>
- Default Value:
no
Description:
This option specifies if Tidy should use the XML parser rather than
the error correcting HTML parser.
<join-classes>
- Default Value:
no
Description:
This option specifies if Tidy should combine class names to generate a
single new class name, if multiple class assignments are detected on
an element.
<join-styles>
- Default Value:
yes
Description:
This option specifies if Tidy should combine styles to generate a
single new style, if multiple style values are detected on an element.
<literal-attributes>
- Default Value:
no
Description:
This option specifies if Tidy should ensure that whitespace characters
within attribute values are passed through unchanged.
<logical-emphasis>
- Default Value:
no
Description:
This option specifies if Tidy should replace any occurrence of <I>
by <EM> and any occurrence of <B> by <STRONG>. In both
cases, the attributes are preserved unchanged. This option can be set
independently of the clean and drop-font-tags options.
<lower-literals>
- Default Value:
yes
Description:
This option specifies if Tidy should convert the value of an attribute
that takes a list of predefined values to lower case. This is required for
XHTML documents.
<merge-divs>
- Default Value:
yes
Description:
Can be used to modify behavior of setting the clean option
to yes. This option specifies if Tidy should merge
nested <div> such as
<div><div>...</div></div>.
<ncr>
- Default Value:
yes
Description:
This option specifies if Tidy should allow numeric character
references.
<new-blocklevel-tags>
- Default Value: none
Description:
This option specifies new block-level tags. This option takes a space or
comma separated list of tag names. Unless you declare new tags, Tidy will
refuse to generate a tidied file if the input includes previously unknown
tags. Note you can't change the content model for elements such
as <TABLE>, <UL>, <OL> and <DL>.
<new-empty-tags>
- Default Value: none
Description:
This option specifies new empty inline tags. This option takes a space
or comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Remember to also declare empty tags as either inline or
blocklevel.
<new-inline-tags>
- Default Value: none
Description:
This option specifies new non-empty inline tags. This option takes a
space or comma separated list of tag names. Unless you declare new tags,
Tidy will refuse to generate a tidied file if the input includes
previously unknown tags.
<new-pre-tags>
- Default Value: none
Description:
This option specifies new tags that are to be processed in exactly the
same way as HTML's <PRE> element. This option takes a space or
comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Note you can not as yet add new CDATA elements (similar
to <SCRIPT>).
<numeric-entities>
- Default Value:
no
Description:
This option specifies if Tidy should output entities other than the
built-in HTML entities (&amp;, &lt;, &gt; and &quot;) in the numeric
rather than the named entity form.
<output-html>
- Default Value:
no
Description:
This option specifies if Tidy should generate pretty printed output,
writing it as HTML.
<output-xhtml>
- Default Value:
yes
Description:
This option specifies if Tidy should generate pretty printed output,
writing it as extensible HTML. This option causes Tidy to set the
DOCTYPE and default namespace as appropriate to XHTML. If a DOCTYPE or
namespace is given they will be checked for consistency with the content
of the document. In the case of an inconsistency, the corrected values
will appear in the output. For XHTML, entities can be written as named
or numeric entities according to the setting of the
numeric-entities option. The original case of tags and
attributes will be preserved, regardless of other options.
<output-xml>
- Default Value:
yes
Description:
This option specifies if Tidy should pretty print output, writing it as
well-formed XML. Any entities not defined in XML 1.0 will be written
as numeric entities to allow them to be parsed by a XML parser. The
original case of tags and attributes will be preserved, regardless
of other options.
<quote-ampersand>
- Default Value:
yes
Description:
This option specifies if Tidy should output unadorned & characters
as &amp;.
<quote-marks>
- Default Value:
no
Description:
This option specifies if Tidy should output " characters as &quot; as
is preferred by some editing environments. The apostrophe character '
is written out as &#39; since many web browsers don't yet
support &apos;.
<quote-nbsp>
- Default Value:
yes
Description:
This option specifies if Tidy should output non-breaking space characters
as entities, rather than as the Unicode character value 160 (decimal).
<repeated-attributes>
- Default Value:
keep-last
Possible Values: keep-first, keep-last
Description:
This option specifies if Tidy should keep the first or last attribute,
if an attribute is repeated (for example, if a tag has two
align attributes).
<replace-color>
- Default Value:
no
Description:
This option specifies if Tidy should replace numeric values in color
attributes by HTML/XHTML color names where defined, e.g. replace
"#ffffff" with "white".
<show-body-only>
- Default Value:
no
Description:
This option specifies if Tidy should print only the contents of the body
tag as an HTML fragment. Useful for incorporating existing whole pages
as a portion of another page.
<uppercase-attributes>
- Default Value:
no
Description:
This option specifies if Tidy should output attribute names in upper case.
The default is no, which results in lower case attribute names, except
for XML input, where the original case is preserved.
<uppercase-tags>
- Default Value:
no
Description:
This option specifies if Tidy should output tag names in upper case.
The default is no, which results in lower case tag names, except for
XML input, where the original case is preserved.
<word-2000>
- Default Value:
no
Description:
This option specifies if Tidy should go to great pains to strip out all
the surplus stuff Microsoft Word 2000 inserts when you save Word
documents as "Web pages". Doesn't handle embedded images or VML.
Diagnostic Options
<accessibility-check>
- Default Value: 0
Possible Values: 0, 1, 2, or 3
Description:
This option specifies what level of accessibility checking, if any,
that Tidy should do. Level 0 is equivalent to Tidy Classic's
accessibility checking. For more information on Tidy's accessibility
checking, see the web site for the
Adaptive Technology Resource Centre at the University of Toronto.
<show-errors>
- Default Value:
6
Possible Values: Any integer.
Description:
This option specifies the number Tidy uses to determine if further
errors should be shown. If set to 0, then no errors are shown.
<show-warnings>
- Default Value:
yes
Description:
This option specifies if Tidy should show warnings. Setting it to no can
be useful when a few errors are hidden among many warning messages.
Pretty Print Options
<break-before-br>
- Default Value:
no
Description:
This option specifies if Tidy should output a line break before each
<BR> element.
<indent>
- Default Value:
no
Possible Values: no, yes, auto
Description:
This option specifies if Tidy should indent block-level tags. If set
to auto, this option causes Tidy to decide whether or not
to indent the content of tags such as TITLE, H1-H6, LI, TD, or P
depending on whether or not the content includes a block-level element.
You are advised to avoid setting indent to yes as this
can expose layout bugs in some browsers.
<indent-attributes>
- Default Value:
no
Description:
This option specifies if Tidy should begin each attribute on a new
line.
<indent-spaces>
- Default Value:
2
Possible Values: Any integer.
Description:
This option specifies the number of spaces Tidy uses to indent content,
when indentation is enabled.
<markup>
- Default Value:
yes
Description:
This option specifies if Tidy should generate a pretty printed version
of the markup. Note that Tidy won't generate a pretty printed version
if it finds significant errors (see force-output).
<punctuation-wrap>
- Default Value:
no
Description:
This option specifies if Tidy should line wrap after some Unicode or
Chinese punctuation characters.
<split>
- Default Value:
no
Description:
This option specifies if Tidy should create a sequence of slides from
the input, splitting the markup prior to each successive <H2>.
The slides are written to "slide001.html", "slide002.html" etc.
<tab-size>
- Default Value: 8
Possible Values: Any integer.
Description:
This option specifies the number of columns that Tidy uses between
successive tab stops. It is used to map tabs to spaces when reading
the input. Tidy never outputs tabs.
<vertical-space>
- Default Value:
no
Description:
This option specifies if Tidy should add some empty lines for
readability.
<wrap>
- Default Value: 68
Possible Values: Any integer.
Description:
This option specifies the right margin Tidy uses for line wrapping.
Tidy tries to wrap lines so that they do not exceed this length.
Set wrap to zero if you want to disable line wrapping.
<wrap-asp>
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained within
ASP pseudo elements, which look as follows:
<% ... %>.
<wrap-attributes>
- Default Value:
no
Description:
This option specifies if Tidy should line wrap attribute values,
for easier editing. This option can be set independently of
wrap-script-literals.
<wrap-jste>
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained within
JSTE pseudo elements, which look as follows:
<# ... #>.
<wrap-php>
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained within
PHP pseudo elements, which look as follows:
<?php ... ?>.
<wrap-script-literals>
- Default Value:
no
Description:
This option specifies if Tidy should line wrap string literals that
appear in script attributes. Tidy wraps long script string literals
by inserting a backslash character before the line break.
<wrap-sections>
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained
within <![ ... ]> section tags.
Miscellaneous Options
<force-output>
- Default Value:
no
Description:
This option specifies if Tidy should produce output even if errors
are encountered. Use this option with care - if Tidy reports an error,
this means Tidy was not able to, or is not sure how to, fix the
error, so the resulting output may not be what you expect.
<keep-time>
- Default Value:
no
Description:
This option specifies if Tidy should keep the original modification
time of files that Tidy modifies in place. The default is no. Setting
the option to yes allows you to tidy files without causing these files
to be uploaded to a web server when using a tool such as SiteCopy.
Note this feature is not supported on some platforms.
<quiet>
- Default Value:
no
Description:
This option specifies if Tidy should suppress the summary of the
numbers of errors and warnings, and the welcome or
informational messages.
<tidy-mark>
- Default Value:
yes
Description:
This option specifies if Tidy should add a meta element to the
document head to indicate that the document has been tidied.
Tidy won't add a meta element if one is already present.

Example:
let $html := "
<htm>
<h1>This is a heading 1
<p>This is paragraph tag
"
return
  xdmp:tidy($html, <options xmlns="xdmp:tidy">
                   </options>)
=> a tidy-status node with any errors and warnings and
   an html node containing the clean and well-formed XHTML.
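The sketch below passes a few of the options listed above and then separates the two result nodes. It assumes the result order given in the example (status node first, XHTML node second); the input string and variable names are illustrative only.

```xquery
xquery version "1.0-ml";

(: Deliberately sloppy input: unclosed h1 and p elements. :)
let $html := "<html><body><h1>Heading<p>Unclosed paragraph</body></html>"
let $tidied :=
  xdmp:tidy($html,
    <options xmlns="xdmp:tidy">
      <clean>yes</clean>
      <indent>auto</indent>
      <tidy-mark>no</tidy-mark>
    </options>)
return (
  $tidied[1],  (: assumed to be the tidy-status node with errors and warnings :)
  $tidied[2]   (: assumed to be the cleaned, well-formed XHTML :)
)
```

Inspecting the status node before using the XHTML is a reasonable precaution, since Tidy may decline to produce output when it reports errors unless force-output is set.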

xdmp:word-convert(
   $doc as node(),
   $filename as xs:string,
   [$options as node()]
) as node()*

Summary:
Converts a Microsoft Word document to XHTML. Returns several nodes,
including a parts node, the converted document xml node, and any
other document parts (for example, css files and images). The first
node is the parts node, which contains a manifest of all of the parts
generated as a result of the conversion.

Parameters:
$doc:
Microsoft Word document to convert to XHTML, as a binary node().

$filename:
The root for the name of the converted files and directories. If the
specified filename includes an extension, then the extension is appended
to the root with an underscore. The directory for other parts of the
conversion (images, for example) has the string "_parts" appended to the
root. For example, if you specify a filename of "myFile.doc", the
generated names will be "myFile_doc.xhtml" for the xml node and
"myFile_doc_parts" for the directory containing any other parts
generated by the conversion (images, css files, and so on).

$options (optional):
Options element for this conversion. The options element must be in the
xdmp:word-convert namespace. The default value is (). In
addition to the options shown below, you can specify xdmp:tidy
options by entering the tidy option elements in the xdmp:tidy
namespace.
Options include:
<tidy> - Specify true to run tidy on the document and false not to run
tidy. If you run tidy, you can also specify any xdmp:tidy options. Any tidy
option elements must be in the xdmp:tidy namespace.
<compact> - Specify true to produce "compact" HTML, that is, without style
information. The default is false.
Sample Options Node:
- The following is a sample options node which specifies that tidy is
used to clean the generated html and specifies to use the tidy "clean"
option for the conversion:
<options xmlns="xdmp:word-convert"
         xmlns:tidy="xdmp:tidy">
  <tidy>true</tidy>
  <tidy:clean>yes</tidy:clean>
</options>

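The naming rule described for $filename can be sketched in plain XQuery. The helper local:converted-names is hypothetical and only mirrors the documented rule; it is not the converter's own implementation, and its behavior for names without an extension (passing the name through unchanged) is an assumption.

```xquery
xquery version "1.0";

(: Hypothetical helper that mirrors the documented naming rule only. :)
declare function local:converted-names($filename as xs:string) as xs:string+
{
  (: Replace the final ".ext" with "_ext". A name with no extension is
     left unchanged, which is an assumption, not documented behavior. :)
  let $root := fn:replace($filename, "\.([^./]+)$", "_$1")
  return (fn:concat($root, ".xhtml"), fn:concat($root, "_parts"))
};

(: For "myFile.doc" this yields "myFile_doc.xhtml" and "myFile_doc_parts". :)
local:converted-names("myFile.doc")
```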
Usage Notes:
The convert functions return several nodes. The first node is a manifest
containing the various parts of the conversion. Typically there will be
an xml part, a css part, and some image parts. Each part is returned as
a separate node in the order shown in the manifest.
Therefore, given the following manifest:
<parts>
  <part>myFile_doc.xhtml</part>
  <part>myFile_doc_parts/conv.css</part>
  <part>myFile_doc_parts/toc.xml</part>
</parts>
the first node of the returned sequence is the manifest, the second is the
"myFile_doc.xhtml" node, the third is the "myFile_doc_parts/conv.css" node,
and the fourth is the "myFile_doc_parts/toc.xml" node.
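The positional correspondence between manifest entries and returned nodes also makes it possible to pick out particular parts by name. As a sketch, assuming the placeholder file "myFile.doc" and using the wildcard *:part because the manifest namespace is not stated here, the following returns only the css parts.

```xquery
xquery version "1.0-ml";

let $results :=
  xdmp:word-convert(xdmp:document-get("myFile.doc"), "myFile.doc")
(: Part names in manifest order; name i belongs to node i + 1. :)
let $names := $results[1]//*:part/fn:string(.)
for $name at $i in $names
where fn:ends-with($name, ".css")
return $results[$i + 1]
```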

Example:
let $results := xdmp:word-convert(
                  xdmp:document-get("myFile.doc"),
                  "myFile.doc" ),
    $manifest := $results[1]
return
  $results[2 to last()]
=> all of the converted nodes