[MarkLogic Dev General] How to convert pdf / doc files to xml when storing into marklogic - reg.,
Danny Sokolsky
Danny.Sokolsky at marklogic.com
Tue Apr 28 10:29:26 PDT 2009
Hi Santhosh,
If you look in the CPF documentation (http://developer.marklogic.com/pubs/4.0/books/cpf.pdf), chapter 9 describes the default conversion option.
You cannot just specify an xsd file and have it transform your document to that schema. You have to write code to tell it how to transform it.
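To illustrate what "writing code to tell it how to transform it" might look like, here is a minimal sketch. In MarkLogic this would normally be written in XQuery or XSLT; Python is used here only to show the idea. The `TAG_MAP` mapping, the `<article>`/`<title>`/`<para>` target element names, and the sample input are all hypothetical and are not taken from the conversion output or from any real schema.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from XHTML element names to element names
# in whatever target schema you want the result to follow.
TAG_MAP = {"h1": "title", "p": "para"}


def xhtml_to_custom(xhtml_text: str) -> ET.Element:
    """Map selected XHTML body elements onto a hypothetical <article> structure."""
    root = ET.fromstring(xhtml_text)
    article = ET.Element("article")
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return article
    for child in body:
        # Strip the namespace to get the local element name.
        local = child.tag.split("}", 1)[-1]
        target = TAG_MAP.get(local)
        if target is None:
            continue  # Elements with no mapping are skipped in this sketch.
        out = ET.SubElement(article, target)
        # Flatten inline markup (e.g. <b>) into plain text.
        out.text = "".join(child.itertext()).strip()
    return article


# Hypothetical sample input, standing in for converted XHTML.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <h1>Report</h1>
    <p>First <b>paragraph</b>.</p>
    <div>ignored</div>
  </body>
</html>"""

if __name__ == "__main__":
    print(ET.tostring(xhtml_to_custom(SAMPLE), encoding="unicode"))
```

The schema itself plays no part in the transformation: the mapping rules are what produce the target structure, and the xsd would only be used afterwards to check that the result is valid.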
If you install the default conversion option (even without the conversion license), you can still convert html documents (to xhtml and simplified docbook formats). That will give you a pretty good idea of the output it creates with word and pdf documents.
-Danny
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Santhosh Raj
Sent: Sunday, April 26, 2009 10:13 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] How to convert pdf / doc files to xml when storying into marklogic - reg.,
Hi Danny,
Thanks for your reply; I understand your point. What I need to know is, if I am using a conversion license:
1) How/where can I specify an xsd that the generated (converted) xml file should follow?
2) Under what name will the xml file be stored?
3) Please send me some sample files that you have converted (i.e., pdf/doc, xhtml, xml files):
Original pdf/doc file
Generated xhtml file
Generated docbook xml file.
Thanks in advance.
Santhosh Rajasekaran
Danny Sokolsky <Danny.Sokolsky at marklogic.com>
Sent by: general-bounces at developer.marklogic.com
04/24/2009 09:32 PM
Please respond to
General Mark Logic Developer Discussion <general at developer.marklogic.com>
To
General Mark Logic Developer Discussion <general at developer.marklogic.com>
cc
Subject
RE: [MarkLogic Dev General] How to convert pdf / doc files to xml when storying into marklogic - reg.,
Hi Santhosh,
You need to have a conversion license to run the pdf or office conversion built-in XQuery functions; it is not included in the community license. Once you have the license, the conversion built-ins convert the binary files (pdf, word, and so on) to XHTML, and then there is a CPF process to clean up the XML and produce docbook. If you wanted to transform it at that point to some other structure, you could write some code to perform that transformation.
-Danny
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Santhosh Raj
Sent: Friday, April 24, 2009 5:03 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] How to convert pdf / doc files to xml when storying into marklogic - reg.,
Hi all,
I have only the community version of MarkLogic Server. In the community version we can't convert pdf/doc files to xml format; they are stored as binary files.
1) If we want to convert a doc/pdf file to xml while storing it in MarkLogic, what should we do?
2) Can we specify an xsd (schema file) for the conversion? If not, and MarkLogic itself uses a schema, which schema does it use?
It would be very useful if you could give me a sample doc/pdf, the schema file, the converted xml file, and the steps to follow to convert a pdf/doc file to xml.
Thanks and Regards,
Santhosh Rajasekaran
Tata Consultancy Services
Mailto: santhosh.raj at tcs.com
Website: http://www.tcs.com<http://www.tcs.com/>
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you
_______________________________________________
General mailing list
General at developer.marklogic.com
http://xqzone.com/mailman/listinfo/general