[MarkLogic Dev General] Can node libraries be installed server-side?

Erik Hennum Erik.Hennum at marklogic.com
Tue Jul 14 06:31:09 PDT 2015


Hi, Will:

Glad to hear you had success with it -- thanks for working through the hiccups.

In the next release, require may change to work out of the box for a *.js library (that is, a module with the application/javascript mime type).

You have to use an absolute path to the library in the Modules database because there's no equivalent to a global or local node_modules directory at present. The db.config.extlibs.write() API gives you that absolute path for free, but it is opinionated (rooted at /ext). Writing directly to the Modules database is possible but a sharp tool.  An arbitrary write could conflict with configuration managed by the REST API (such as transforms, resource extensions, and so on).
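
A minimal sketch of the path rule above, in plain JavaScript. The helper name extLibUri is hypothetical (it is not part of any MarkLogic API), and the sketch assumes that a library written through db.config.extlibs.write() ends up under the /ext root, as described above.

```javascript
// Hypothetical helper (not a MarkLogic API): compute the absolute Modules
// database URI for a library, assuming the /ext root described above.
function extLibUri(path) {
  // Strip leading slashes so 'lodash.sjs' and '/lodash.sjs' behave the same.
  const relative = String(path).replace(/^\/+/, '');
  // Avoid doubling the root if the caller already included it.
  return relative.startsWith('ext/') ? '/' + relative : '/ext/' + relative;
}

const uri = extLibUri('lodash.sjs');
console.log(uri); // prints "/ext/lodash.sjs"

// A server-side module would then require the library by that absolute path:
//   var _ = require('/ext/lodash.sjs');
```

Because there is no node_modules lookup, the string passed to require has to be this full URI rather than a bare package name.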

Anyway, please keep the feedback coming.


Erik Hennum

________________________________
From: general-bounces at developer.marklogic.com [general-bounces at developer.marklogic.com] on behalf of Will Lawrence [will.lawrence at gmail.com]
Sent: Monday, July 13, 2015 8:48 PM
To: general at developer.marklogic.com
Subject: Re: [MarkLogic Dev General] Can node libraries be installed server-side?

Thanks, Erik.

It helped me get into the right frame of mind for thinking critically about where certain ingestion logic should reside. And thanks for digging into the example of node-xlsx and pointing out that it's an async wrapper built on an underlying sync library. I definitely looked at the binary extract for xlsx and the Open Office pipeline, but these seem to allow only coarse-grained text searches. I need to be able to create indexes and run fine-grained queries on the data. Plus, xlsx has the nasty behavior of putting any repeated strings into a separate sharedStrings.xml file, and there didn't seem to be any MarkLogic server-side solution to remedy this. I also need to automate, or at least control, the shredding process from an external tier as much as possible because there will be a lot of different sets of xlsx files. I'm thinking of massaging the xlsx into JSON, sending it to MarkLogic, and using CPF to split each "row" into a document, since the transform function can't do an xdmp.documentInsert().
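
For readers unfamiliar with the sharedStrings.xml indirection: in the xlsx format, a cell marked t="s" stores an index into the shared strings table rather than the string itself. A rough sketch of resolving those indexes on the external tier and producing one JSON object per row might look like the following. The cell shape ({ t, v }), the function names, and the sample data are assumptions for illustration, not the API of node-xlsx or js-xlsx.

```javascript
// Resolve one cell. The { t, v } shape is an assumption modeled on the
// <c t="s"><v>0</v></c> markup found in a sheet's XML.
function resolveCell(cell, sharedStrings) {
  if (cell.t === 's') {
    // Shared string: v is an index into the shared strings table.
    return sharedStrings[Number(cell.v)];
  }
  if (cell.t === 'n' || cell.t === undefined) {
    // Cells without a type attribute are numeric.
    return Number(cell.v);
  }
  // Other cell types are passed through untouched in this sketch.
  return cell.v;
}

// Treat the first row as the header and turn each later row into an object.
function rowsToObjects(rows, sharedStrings) {
  const resolved = rows.map(row => row.map(cell => resolveCell(cell, sharedStrings)));
  const [header, ...data] = resolved;
  return data.map(values =>
    Object.fromEntries(header.map((name, i) => [name, values[i]])));
}

// Hypothetical sample data: "widget" is stored once and referenced twice.
const sharedStrings = ['name', 'qty', 'widget'];
const rows = [
  [{ t: 's', v: '0' }, { t: 's', v: '1' }],
  [{ t: 's', v: '2' }, { v: '3' }],
  [{ t: 's', v: '2' }, { v: '5' }]
];

console.log(JSON.stringify(rowsToObjects(rows, sharedStrings)));
// prints [{"name":"widget","qty":3},{"name":"widget","qty":5}]
```

Each resulting object could then be sent to MarkLogic as its own JSON document or as one array to be split later.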

Ok, back to the node/npm/JavaScript libraries. Here's a knowledgebase page<https://help.marklogic.com/knowledgebase/article/View/222/0/server-side-javascript-implementation-and-module-reuse> I just came across that expands on the points you pretty much nailed. I've also included my troubleshooting steps for requiring a library server side, using 'lodash.js' as the example.

I tried to send lodash.js to the modules database and then use it in a transform with a `require("lodash.js")` statement, but it failed with:

"message": "JS-JAVASCRIPT: var _ = require('lodash.js'); -- Error running JavaScript request: XDMP-NOEXECUTE: Document is not of executable mimetype. URI: lodash.js

So, I needed to write it as lodash.sjs and require("lodash.sjs"). But then this failed with:

"message": "JS-JAVASCRIPT: var _ = require('lodash.sjs'); -- Error running JavaScript request: XDMP-MODNOTFOUND: Module lodash.sjs not found

To fix this, I sent it with the uri "/lodash.sjs" and used it with require("/lodash.sjs").

Note: I used contentType: "application/vnd.marklogic-javascript" when sending lodash.sjs to the server, and used the Node.js Client API's modulesDb.documents.write instead of the more specialized db.config.extlibs.write because I couldn't get the transform's require statement to work. Plus, the former feels like it gives more flexibility without having to learn a special set of write and read calls. Maybe my perspective will change on this with time.
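
The working combination described above can be summarized as a document descriptor. This is only a sketch: sjsModuleDescriptor is a hypothetical helper, its two checks simply mirror the XDMP-MODNOTFOUND and XDMP-NOEXECUTE errors quoted earlier, and the commented-out write call assumes a Node.js client (modulesDb) already connected to the Modules database.

```javascript
// Hypothetical helper: build the descriptor for a server-side JavaScript library.
function sjsModuleDescriptor(uri, source) {
  // A relative URI such as 'lodash.sjs' led to XDMP-MODNOTFOUND above.
  if (!uri.startsWith('/')) {
    throw new Error('Module URI must be absolute: ' + uri);
  }
  // A .js extension led to XDMP-NOEXECUTE above.
  if (!uri.endsWith('.sjs')) {
    throw new Error('Module URI must end in .sjs: ' + uri);
  }
  return {
    uri: uri,
    contentType: 'application/vnd.marklogic-javascript',
    content: source
  };
}

// Placeholder source text; in practice this would be the library's contents.
const descriptor = sjsModuleDescriptor('/lodash.sjs', 'module.exports = {};');
console.log(descriptor.uri, descriptor.contentType);

// With a client connected to the Modules database (an assumption here):
//   modulesDb.documents.write(descriptor).result();
// and in the transform:
//   var _ = require('/lodash.sjs');
```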


Regards,

Will

------------------------------

Message: 2
Date: Mon, 13 Jul 2015 02:55:54 +0000
From: Erik Hennum <Erik.Hennum at marklogic.com>
Subject: Re: [MarkLogic Dev General] Can node libraries be installed
        server-side?
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Message-ID:
        <DFDF2FD50BF5AA42ADAF93FF2E3CA185070EAF9D at EXCHG10-BE01.marklogic.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi, Will:

There are some significant differences between Node.js and MarkLogic as a JavaScript runtime environment (even though both make use of v8).

First and foremost, Node.js emphasizes asynchronous IO.  As a transactional database, MarkLogic emphasizes synchronous IO.  You can execute asynchronous actions in MarkLogic (via the task server), but when you call xdmp.documentInsert(), the call blocks until the insert succeeds or fails.

Stepping back, the tier where you implement an action is not arbitrary.  In the database, it's best to write short actions (similar to stored procedures) for query expansion, query composition, inbound or outbound data transformation, and so on.  The middle tier is great for information bus operations, business logic, and so on.

With that perspective, the libraries that make sense to use as dependencies for server-side JavaScript actions are those that finish synchronous actions quickly.

For that reason, in this particular case, my guess would be that js-xlsx (the core library wrapped by node-xlsx) might be a better fit for server-side processing than node-xlsx (which adds asynchronous IO conveniences that would not work in the server).

At present, you would need to either modify the mimetypes configuration to identify *.js as an extension for server-side JavaScript (so the server knows that it's not static JavaScript to send to the client) or rename the library extension to sjs.

You could put the library in the modules database as described in:

    http://docs.marklogic.com/guide/rest-dev/extensions#id_55309

Then, require the library in your transform or main module.

The speculations about package management for such dependencies are very interesting.

By the way, the server can extract metadata from spreadsheets without installing an external library:

    http://docs.marklogic.com/guide/search-dev/binary-document-metadata#id_74790


Hoping that helps,



Erik Hennum

------------------------------

Message: 1
Date: Sun, 12 Jul 2015 22:19:41 -0400
From: Will Lawrence <will.lawrence at gmail.com>
Subject: [MarkLogic Dev General] Can node libraries be installed
        server-side?
To: general at developer.marklogic.com
Message-ID:
        <CAGEHXqseoL3dqoGoBk-T6FZe6fx-M8DhyLbnLw3LC6T1c0mBpQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I tried but couldn't find any examples or guidance for using node libraries
within .sjs files on the MarkLogic server. How could we use, for example,
the npm module 'node-xlsx' in a transform?

It would be great to be able to leverage the power of the npm and node
micro-library ecosystem within .sjs files.

Perhaps there could be an .npmrc file controlled via the MarkLogic admin to
specify if the server is allowed to talk to registry.npmjs.com<http://registry.npmjs.com/> or an
enterprise npm registry or none at all. Then, a REST API could be exposed to
write dependencies to MarkLogic's package.json that would automatically
do an 'npm install' so that when an .sjs file is installed, it can execute
the line:

    spreadsheetShredder = require('node-xlsx');

Regards,
Will

--
William Lawrence
703-873-7035