[MarkLogic Dev General] General question on bi-temporal data

Anthony Coates anthony.coates at db.com
Thu Mar 12 03:43:05 PDT 2015


Classification: Public

Adrian, I couldn't quite follow your description, but I do think late-arriving data is a problem. Let me try to state it differently. When I use 'Inf' here, I really mean MarkLogic's "infinity", which is the last day of 9999.

Day-0: EOD position calculated

Day-1: Day-0 EOD position inserted into DB
  => (Day-0 EOD) system range = Day-1 to Inf, valid range = Day-0 to Inf

Day-2: EOD position calculated but not inserted for some reason

Day-3: EOD position calculated

Day-4: Day-3 EOD position inserted into DB
  => (Day-0 EOD) system range = Day-1 to Day-4, valid range = Day-0 to Inf
  => (Day-0 EOD) system range = Day-4 to Inf, valid range = Day-0 to Day-3
  => (Day-3 EOD) system range = Day-4 to Inf, valid range = Day-3 to Inf

Day-5: Day-2 EOD position inserted into DB
  => (Day-0 EOD) system range = Day-1 to Day-4, valid range = Day-0 to Inf
  => (Day-0 EOD) system range = Day-4 to Day-5, valid range = Day-0 to Day-3
  => (Day-3 EOD) system range = Day-4 to Inf, valid range = Day-3 to Inf
  => (Day-0 EOD) system range = Day-5 to Inf, valid range = Day-0 to Day-2
  => (Day-2 EOD) system range = Day-5 to Inf, valid range = Day-2 to Day-3

(that's off the top of my head; I didn't try this directly, but I have played with the bi-temporal API somewhat)

The point is that adding a historic entry works, but you need to query with a system time of "now" to get the latest view of the history.
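To make the splitting in the walkthrough above concrete, here is a toy, in-memory Python model. It is not MarkLogic's API: the `Version` record, the `insert` and `as_of` functions, and the integer day numbers standing in for timestamps are all hypothetical. The assumed rule is that an insert closes (in system time) every current version whose valid range overlaps the new one, then re-inserts the non-overlapping remainders as new current versions. Under that rule the Day-5 state above only comes out if the late Day-2 position is inserted with a valid end of Day-3; with a valid end of Inf, the Day-3 version would be closed as well, which matches what Adrian reports in his message.

```python
from dataclasses import dataclass, replace
from typing import List

INF = float("inf")  # stands in for MarkLogic's "infinity" (last day of 9999)


@dataclass(frozen=True)
class Version:
    doc: str            # which EOD position this version carries
    sys_start: float    # system time at which this version was recorded
    sys_end: float      # system time at which it was superseded (INF = current)
    valid_start: float  # first day the position applies to
    valid_end: float    # day the position stops applying (exclusive)


def insert(store: List[Version], doc: str, valid_start: float,
           valid_end: float, now: float) -> None:
    """Hypothetical bitemporal insert: close overlapping current versions
    in system time and re-insert their non-overlapping valid-time parts."""
    result = []
    for v in store:
        overlaps = v.valid_start < valid_end and valid_start < v.valid_end
        if v.sys_end != INF or not overlaps:
            result.append(v)  # history, or a current version left untouched
            continue
        result.append(replace(v, sys_end=now))  # superseded as of "now"
        if v.valid_start < valid_start:  # keep the part before the new period
            result.append(replace(v, sys_start=now, valid_end=valid_start))
        if valid_end < v.valid_end:  # keep the part after the new period
            result.append(replace(v, sys_start=now, valid_start=valid_end))
    result.append(Version(doc, now, INF, valid_start, valid_end))
    store[:] = result


def as_of(store: List[Version], sys_time: float) -> List[Version]:
    """Versions that were current at sys_time, ordered by valid start."""
    current = [v for v in store if v.sys_start <= sys_time < v.sys_end]
    return sorted(current, key=lambda v: v.valid_start)


store: List[Version] = []
insert(store, "Day-0 EOD", 0, INF, now=1)
insert(store, "Day-3 EOD", 3, INF, now=4)
insert(store, "Day-2 EOD", 2, 3, now=5)  # late arrival, bounded by Day-3
print([v.doc for v in as_of(store, 4)])  # ['Day-0 EOD', 'Day-3 EOD']
print([v.doc for v in as_of(store, 5)])  # ['Day-0 EOD', 'Day-2 EOD', 'Day-3 EOD']
```

Querying at a system time of Day-4 returns the view from before the back-fill, while querying at Day-5 ("now") shows all three positions, which is the point made above.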

Cheers, Tony.

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Adrian Creegan
Sent: 09 March 2015 14:27
To: general at developer.marklogic.com
Subject: [MarkLogic Dev General] General question on bi-temporal data

Hi all,
I am new to MarkLogic, so apologies if this is the wrong forum or if I have fundamentally misunderstood this feature.


At the moment I am evaluating MarkLogic as a data store for financial market data. Generally, the records are end-of-day positions keyed on an instrument identifier, with the most recent position remaining active until a new one is received.
I think this use case is covered in the Temporal Developer's Guide, with the valid end date set to infinity.
In this scenario, the resulting document splits have their valid start and end dates updated so that they form a continuum up to infinity, with only one version valid at any point in time.


e.g.
Add:
day-1 { validStart: 'day-1', validEnd: 'infinity' } @t1
day-3 { validStart: 'day-3', validEnd: 'infinity' } @t2


Becomes:
day-1 { systemStart: 't2', systemEnd: 'infinity', validStart: 'day-1', validEnd: 'day-3' }
day-1 { systemStart: 't1', systemEnd: 't2', validStart: 'day-1', validEnd: 'infinity' }
current - day-3 { systemStart: 't2', systemEnd: 'infinity', validStart: 'day-3', validEnd: 'infinity' }


So if I query with a system time between t1 and t2, I get just the single day-1 record, valid to infinity, and if I query with a system time of t2 or later, I get a time series of two values (as expected).
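The system-time query described above can be sketched as a filter over the three stored versions, encoding t1 = 1, t2 = 2 and days as integers (all hypothetical encodings; this is not MarkLogic's query API). The sketch also checks the "continuum" property described earlier: the versions current at a given system time should join end-to-start with no gaps or overlaps, ending at infinity.

```python
from typing import List, NamedTuple

INF = float("inf")  # MarkLogic's "infinity"


class Version(NamedTuple):
    doc: str
    sys_start: float
    sys_end: float
    valid_start: float
    valid_end: float


T1, T2 = 1, 2  # hypothetical system timestamps for the two inserts

# The stored versions after adding day-1 @t1 and day-3 @t2.
versions = [
    Version("day-1", T1, T2, 1, INF),   # original day-1, superseded at t2
    Version("day-1", T2, INF, 1, 3),    # day-1 split, now ending at day-3
    Version("day-3", T2, INF, 3, INF),  # current day-3
]


def as_of(versions: List[Version], sys_time: float) -> List[Version]:
    """Versions visible at sys_time, ordered by valid start."""
    visible = [v for v in versions if v.sys_start <= sys_time < v.sys_end]
    return sorted(visible, key=lambda v: v.valid_start)


def is_continuum(versions: List[Version]) -> bool:
    """Expects versions sorted by valid start (as as_of returns them).
    True if the valid ranges join end-to-start and the last is open-ended."""
    if not versions or versions[-1].valid_end != INF:
        return False
    return all(a.valid_end == b.valid_start
               for a, b in zip(versions, versions[1:]))


for t in (T1, T2):
    snapshot = as_of(versions, t)
    print(t, [v.doc for v in snapshot], is_continuum(snapshot))
# 1 ['day-1'] True
# 2 ['day-1', 'day-3'] True
```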


However, if a document is inserted out of time order, it does get inserted into what I would consider to be the valid time continuum:


Add:
day-2 { validStart: 'day-2', validEnd: 'infinity' } @t3


Becomes:
day-3 { systemStart: 't2', systemEnd: 't3', validStart: 'day-3', validEnd: 'infinity' }
day-1 { systemStart: 't3', systemEnd: 'infinity', validStart: 'day-1', validEnd: 'day-2' }
day-1 { systemStart: 't2', systemEnd: 't3', validStart: 'day-1', validEnd: 'day-3' }
day-1 { systemStart: 't1', systemEnd: 't2', validStart: 'day-1', validEnd: 'infinity' }


current - day-2 { systemStart: 't3', systemEnd: 'infinity', validStart: 'day-2', validEnd: 'infinity' }


For us, Day 2 is not the most recent data in a transactional sense; Day 3 should be. (We insert Day 2 with validEnd infinity because we don't know whether there is Day 3 data in the store unless we query for it.)
And if we query for all records at the following system times:
@t1 we get 1 (day-1 - expected)
@t2 we get 2 (day-1 & day-3 - expected)
@t3 we get 2 (day-1 & day-2 - not expected - we need to have day-1, day-2 and day-3).
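One way to avoid ending the day-2 record at infinity is to look up the current versions first and end the late record at the next known valid start. This is a minimal sketch, assuming the current versions can be fetched as (validStart, validEnd) pairs with days encoded as integers; `valid_end_for` is a hypothetical helper, not part of MarkLogic. It also assumes a single writer: a concurrent insert between the lookup and the insert could make the chosen end stale.

```python
from typing import Iterable, Tuple

INF = float("inf")  # MarkLogic's "infinity"


def valid_end_for(current: Iterable[Tuple[float, float]],
                  valid_start: float) -> float:
    """Valid end for a late-arriving record: the earliest valid start of a
    current version that begins after valid_start, else INF."""
    later = [start for start, _end in current if start > valid_start]
    return min(later, default=INF)


# Current versions at t2 in the example: day-1 -> day-3, day-3 -> infinity.
current_at_t2 = [(1, 3), (3, INF)]
print(valid_end_for(current_at_t2, 2))  # 3    (day-2 ends where day-3 starts)
print(valid_end_for(current_at_t2, 4))  # inf  (nothing later is known yet)
```

If day-2 is then inserted over [day-2, day-3), only the day-1 version overlaps it, so under the splitting behaviour shown above a query at t3 should return day-1, day-2 and day-3.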


This is a problem for us as we cannot guarantee that we will receive feed data on a given day and we can't guarantee that a position will be in a data feed on successive days.
Normally we have to proceed with the previous values and back-flush the data if and when we get it, in order to fill in the time series.
Sometimes, especially when a system is being commissioned, we also need to back-flush historic data while we are processing current end-of-day data (again to build up the time series).
And there is also a similar use case when adjustments are retroactively applied to data, for example at the close of regulatory reporting periods.
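In a bitemporal store, a back-flush changes what is known about the past without erasing the earlier view. The sketch below answers "which position applied on valid day d, as known at system time t?" over the five versions in Tony's Day-5 state (days as integers, INF for infinity; `position_on` is a hypothetical helper, not a MarkLogic function). Before the back-flush, Day-2 carries forward the Day-0 position; afterwards it resolves to the Day-2 position.

```python
from typing import List, NamedTuple, Optional

INF = float("inf")  # MarkLogic's "infinity"


class Version(NamedTuple):
    doc: str
    sys_start: float
    sys_end: float
    valid_start: float
    valid_end: float


# The five versions from Tony's Day-5 state (system/valid ranges in days).
versions: List[Version] = [
    Version("Day-0 EOD", 1, 4, 0, INF),
    Version("Day-0 EOD", 4, 5, 0, 3),
    Version("Day-3 EOD", 4, INF, 3, INF),
    Version("Day-0 EOD", 5, INF, 0, 2),
    Version("Day-2 EOD", 5, INF, 2, 3),
]


def position_on(versions: List[Version], valid_day: float,
                known_at: float) -> Optional[str]:
    """Position in effect on valid_day, as recorded at system time known_at."""
    for v in versions:
        if (v.sys_start <= known_at < v.sys_end
                and v.valid_start <= valid_day < v.valid_end):
            return v.doc
    return None  # nothing recorded for that day at that system time


print(position_on(versions, 2, known_at=4))  # Day-0 EOD (carried forward)
print(position_on(versions, 2, known_at=5))  # Day-2 EOD (after back-flush)
```

The same lookup applies to end-of-period adjustments: the pre-adjustment figure remains retrievable by querying with a system time before the adjustment was recorded.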


So I suppose my questions are:


Does MarkLogic allow for back-flushing data in the use cases described above (delayed-feed back-flushing, back-flushing historic data, and end-of-period adjustments)?
Are there restrictions on the order of data insertion when trying to maintain a time continuum as above, or can it be done?
Or is my conceptual model of bi-temporal data completely wrong, and do I need to approach this differently?


Thanking you in advance,
Adrian.





_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general



