[MarkLogic Dev General] Suggestions for data masking

Joel Wilson Gunasekaran joelwilson.gunasekaran at gmail.com
Tue Mar 24 10:42:25 PDT 2015


Thanks Geert and David for your valuable suggestions. 

Will check out Flexrep and CPF. 

> On Mar 24, 2015, at 2:17 AM, David Lee <David.Lee at marklogic.com> wrote:
> 
> Unless your documents' PI data is separated into different documents, you are going to need to do a custom transformation on each document, the details of which are very case specific (fill in SS#s with '???'? remove last names? remove entire sections or replace them with sample data?). Having worked in the medical and commerce worlds, I know that getting this right, and making it clearly auditable, is crucial.
> Also consider whether you need to maintain any document properties or metadata (properties objects including mod dates, collections, permissions, DLS data, etc.),
> and whether these are copied as-is or modified.
> 
> That refines the question into parts
> 1) Selecting the document subset to copy 
> 2) Transforming the document content itself (*prior* to leaving the 'trust zone')
> 3) Select/copy/filter the document metadata
> 4) Extract from the source DB 
> 5) -- possibly package for secure, reliable or easy travel to the down sites, encrypt?
> 6) -- Copy the data
> ... Now reverse the process on the target site.
> 
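Step 2 in the list above (transforming the content before it leaves the trust zone) could be sketched roughly as follows. This is only an illustration in Python, not MarkLogic code: the rule table MASK_RULES, the replacement strings, and the function names mask_text and mask_document are all hypothetical, and the regular expressions are rough approximations rather than a vetted PII policy. On-server you would express the same idea in a CPF action or an MLCP transform.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical masking rules (assumptions for illustration only).
# Order matters: the SSN rule runs first so that the looser phone-like
# rule does not swallow SSNs.
MASK_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "???-??-????"),           # US SSN
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "masked@example.com"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "000-000-0000"),          # phone-like
]


def mask_text(text):
    """Apply every masking rule, in order, to one string."""
    for pattern, replacement in MASK_RULES:
        text = pattern.sub(replacement, text)
    return text


def mask_document(xml_string):
    """Mask element text, tail text, and attribute values of one XML document."""
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        if elem.text:
            elem.text = mask_text(elem.text)
        if elem.tail:
            elem.tail = mask_text(elem.tail)
        for name, value in list(elem.attrib.items()):
            elem.set(name, mask_text(value))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<person id="p1"><email>joe@example.org</email>'
        "<phone>+1 812-555-0100</phone><ssn>123-45-6789</ssn></person>"
    )
    print(mask_document(sample))
```

Pattern-based masking like this only catches values that look like PII; fields such as last names need element-specific rules, which is why the details end up being case specific.
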
> You can do all this ad hoc, once maybe.
> Getting this reliable, scriptable, auditable, and never screwing it up is harder.
> 
> Geert's suggestion of FlexRep seems ideal for this, as it can accomplish all of these.
> 
> MLCP by itself can do quite a bit - but it may be hard to put all the pieces together.
> 
> Another way is to make a temporary DB and use CPF or your own code to do all the data transformation on-server (steps 1-4), then use any number of ways to copy the data (mlcp, replication, database export/import).
> 
> Or ... if you prefer offline tools (say you like xproc or xmlsh or other non-server products) you could dump the DB to local files, clean them in place, 
> then copy them over and reverse it.
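For the offline route, the "clean them in place" step might look something like the sketch below. It assumes the DB has already been dumped to a directory of XML files; the directory layout, the function names clean_in_place and mask_emails, and the email-only masking rule are all hypothetical placeholders for whatever masking you actually need. Each file is rewritten through a temporary file and os.replace, so an interrupted run should not leave a half-written document behind.

```python
import os
import re
import tempfile
from pathlib import Path

# Illustrative rule only: a rough email pattern, not a complete PII policy.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text):
    """Replace anything that looks like an email address."""
    return EMAIL.sub("masked@example.com", text)


def clean_in_place(export_dir, mask=mask_emails, pattern="*.xml"):
    """Mask every matching file under export_dir; return the paths that changed."""
    changed = []
    for path in sorted(Path(export_dir).rglob(pattern)):
        original = path.read_text(encoding="utf-8")
        cleaned = mask(original)
        if cleaned == original:
            continue
        # Write to a temp file in the same directory, then swap it in.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(cleaned)
        os.replace(tmp_name, path)
        changed.append(str(path))
    return changed
```

The returned list of changed paths can be kept as a simple record of what was touched during a refresh.
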
> 
> FlexRep is looking really good though  ... 
> 
> 
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> dlee at marklogic.com
> Phone: +1 812-482-5224
> Cell:  +1 812-630-7622
> www.marklogic.com
> 
> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Geert Josten
> Sent: Tuesday, March 24, 2015 2:00 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Suggestions for data masking
> 
> Hi Joel,
> 
> I haven't dealt with this personally, but could ask around. I guess though there are numerous ways to go about this, depending on the exact needs. The two that come to mind first:
> 
> You could create a permanent solution using Flexible Replication, which builds on top of CPF:
> http://docs.marklogic.com/guide/flexrep/rep_intro#id_62963
> 
> You could also use the MLCP copy feature together with an MLCP transform.
> 
> You already mentioned triggers and scheduled tasks, but MLCP will load faster, I think. CPF uses triggers underneath.
> 
> Kind regards,
> Geert
> 
> On 3/24/15, 2:12 AM, "Joel Wilson Gunasekaran"
> <joelwilson.gunasekaran at gmail.com> wrote:
> 
>> Hi,
>> 
>> Once in a while, we refresh the dataset in lower environments with 
>> production data for testing purposes.
>> We have a requirement to mask all PII (personally identifiable
>> information) data like email id, phone number, etc. in lower 
>> environments like DEV, QA.
>> 
>> We were thinking about having a one-time script that does the masking, 
>> which can be run when we do the data refresh.
>> In addition to this, we also want an automated process that does this, 
>> like either a scheduled task or a trigger, to avoid any sensitive data 
>> being left unmasked accidentally.
>> 
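The automated safety net described above could start as a periodic scan that flags documents which still contain PII-looking values. The sketch below is Python for illustration only; the pattern table, the ALLOWED set of known mask values, and the function name find_unmasked are hypothetical, and documents are passed in as a plain mapping of URI to text rather than read from MarkLogic. It reports counts per document rather than the matched values, so the audit output does not itself leak PII.

```python
import re

# Rough, illustrative patterns (assumptions, not a vetted PII policy).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# Values the masking step is assumed to write; these are not findings.
ALLOWED = {"masked@example.com", "000-000-0000"}


def find_unmasked(documents):
    """Return (uri, kind, count) for each document that still looks unmasked.

    `documents` maps a document URI to its serialized text.
    """
    findings = []
    for uri, text in documents.items():
        for kind, pattern in PII_PATTERNS.items():
            count = sum(
                1 for m in pattern.finditer(text) if m.group(0) not in ALLOWED
            )
            if count:
                findings.append((uri, kind, count))
    return findings
```

Run from a scheduled job in the lower environment, a non-empty result is a signal that a refresh skipped the masking step.
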
>> Can you please let me know if you have had to deal with similar cases 
>> and any suggestions?
>> 
>> Thanks
>> Joel
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
> 