[MarkLogic Dev General] How to convert pdf / doc files to xml when storing into marklogic - reg.,

Danny Sokolsky Danny.Sokolsky at marklogic.com
Tue Apr 28 10:29:26 PDT 2009


Hi Santhosh,

If you look in the CPF documentation (http://developer.marklogic.com/pubs/4.0/books/cpf.pdf), chapter 9 describes the default conversion option.

You cannot just specify an XSD file and have the conversion make your document conform to that schema. You have to write code that tells it how to transform the document.
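For illustration, here is a minimal sketch of what "writing code to transform it" can look like. It uses Python's standard library rather than XQuery, and it assumes a made-up XHTML fragment as input and a hypothetical target vocabulary (`document`, `title`, `para`). It is not MarkLogic's converter output or API. It only maps elements; validating the result against your own XSD would be a separate step.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from XHTML element names to elements in a target
# vocabulary you define yourself (for example, one described by your XSD).
TAG_MAP = {"h1": "title", "h2": "section-title", "p": "para"}


def xhtml_to_custom(xhtml: str, root_name: str = "document") -> ET.Element:
    """Rebuild the direct children of the XHTML <body> in a custom structure.

    Elements not listed in TAG_MAP are skipped along with their text.
    """
    source = ET.fromstring(xhtml)
    body = source.find(f"{{{XHTML_NS}}}body")
    target = ET.Element(root_name)
    if body is None:
        return target
    for child in body:
        local = child.tag.split("}", 1)[-1]  # drop the namespace prefix
        new_tag = TAG_MAP.get(local)
        if new_tag is None:
            continue
        out = ET.SubElement(target, new_tag)
        out.text = "".join(child.itertext()).strip()
    return target


# Made-up sample input, standing in for converted XHTML.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Report</title></head>
  <body>
    <h1>Quarterly Report</h1>
    <p>Revenue grew.</p>
    <div>ignored</div>
  </body>
</html>"""

print(ET.tostring(xhtml_to_custom(sample), encoding="unicode"))
# prints <document><title>Quarterly Report</title><para>Revenue grew.</para></document>
```

The mapping table is the part you would write to match your own schema; anything the table does not mention is dropped rather than guessed at.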

If you install the default conversion option (even without the conversion license), you can still convert HTML documents (to XHTML and simplified DocBook formats). That will give you a pretty good idea of the output it creates for Word and PDF documents.

-Danny

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Santhosh Raj
Sent: Sunday, April 26, 2009 10:13 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] How to convert pdf / doc files to xml when storing into marklogic - reg.,


Hi Danny,

Thanks for your reply; I understand your point. What I need to know is, if I am using the conversion license:

1) How/where can I specify an XSD that the generated (converted) XML file should follow?

2) Under what name will the XML file be stored?

3) Please send me some sample files that you have converted (i.e., PDF/DOC, XHTML, and XML files):
        Original PDF/DOC file
        Generated XHTML file
        Generated DocBook XML file

Thanks in advance.


Santhosh Rajasekaran


From: Danny Sokolsky <Danny.Sokolsky at marklogic.com>
Sent by: general-bounces at developer.marklogic.com
Date: 04/24/2009 09:32 PM
Please respond to: General Mark Logic Developer Discussion <general at developer.marklogic.com>
To: General Mark Logic Developer Discussion <general at developer.marklogic.com>
cc:
Subject: RE: [MarkLogic Dev General] How to convert pdf / doc files to xml when storing into marklogic - reg.,

Hi Santhosh,

You need a conversion license to run the PDF or Office conversion built-in XQuery functions; it is not included in the community license. Once you have the license, the conversion built-ins convert the binary files (PDF, Word, and so on) to XHTML, and then a CPF process cleans up the XML and produces DocBook. If you want to transform it at that point into some other structure, you can write code to perform that transformation.

-Danny

From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Santhosh Raj
Sent: Friday, April 24, 2009 5:03 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] How to convert pdf / doc files to xml when storing into marklogic - reg.,


Hi all,

I have only the community version of MarkLogic Server. In the community version we can't convert PDF/DOC files to XML format; they are stored as binary files.

1) If we want to convert a DOC/PDF file to XML while storing it in MarkLogic, what do we need to do?

2) Can we specify an XSD (schema file) for the conversion? If not, and MarkLogic itself uses a schema, which schema does it use?

It would be very useful if you could send me a sample DOC/PDF file, the schema file, the converted XML file, and the steps to follow to convert a PDF/DOC file to XML.

Thanks and Regards,
Santhosh Rajasekaran
Tata Consultancy Services
Mailto: santhosh.raj at tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty.        IT Services
                      Business Solutions
                      Outsourcing
____________________________________________
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you

 _______________________________________________
General mailing list
General at developer.marklogic.com
http://xqzone.com/mailman/listinfo/general
