Saturday, 16 February 2013

Data Masking in TDM

In my previous posts, I explained the challenges in Production Cloning.  One of the major challenges in the Production Cloning approach is Data Security.  This post focuses on the solution for Data Security: Data Masking.

As already explained, Data Masking is the process of masking the sensitive fields in a data set.  The whole objective of data masking is to ensure that no sensitive data is leaked into non-production regions like the Dev and Testing regions.

What are the sensitive fields that need to be masked?  That basically depends on the project needs.  But some of the generic fields that need to be masked are:

  • Personal information like First Names, Last Names, Email IDs, DOB, Phone & Fax numbers, Social Security Numbers (SSN), National Insurance Numbers and other national unique identifiers.
  • In Banking, Financial Services & Insurance industry - Bank Balances, Account numbers, Credit card numbers, Policy numbers, etc.
  • In Healthcare industry - PHI attributes like Medical record numbers, Member IDs, etc.

This list is by no means exhaustive, but it gives a fair idea of how many fields are sensitive in nature and need to be handled with care.  A lapse in masking any of these fields might have a big impact on the Organization as a whole.

Challenges in Data Masking

  • Individual data has a meaning
    • Many of the sensitive fields have a meaning attached to them.  For example, a Credit Card number containing 16 digits can't just be masked to some random 16 digit number.  The masked value needs to follow the same algorithm that a real card number follows.  The following picture depicts the formats of a few sensitive fields and their meanings.

  • Data Integrity
    • One of the biggest challenges in data masking is maintaining data integrity.  As we have already seen, tables can be related to each other, so masking applied to a primary key must be cascaded to the corresponding foreign keys as well.
  • Cross-DB relationships
    • To make things more complicated, data relationships can exist across databases (for example, a federated relationship).
  • Multiple data sources
    • This is another challenge.  The data can come from different kinds of sources, not just databases.  It can be in flat files, XML files, EDI files and delimited files.
A good masking solution should be able to overcome these challenges.
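
To make the first two challenges a bit more concrete, here is a minimal Python sketch.  It is only an illustration under some assumptions of mine, not how any particular TDM product works: the key MASKING_KEY, the function names and the choice to keep the first 6 digits of the card number are all hypothetical.  It uses the Luhn checksum, which real card numbers satisfy, so that the masked number still "looks valid", and it derives the masked digits from a keyed hash so that the same input always gets the same masked value.  That second property is what lets a value used as a key in one table still match the same value in another table (or another database) after masking.

```python
import hashlib
import hmac

# Hypothetical secret used only for this sketch; a real setup would store it securely.
MASKING_KEY = b"demo-masking-key"


def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a digit string that lacks its check digit."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def is_luhn_valid(number: str) -> bool:
    """Check that the last digit of `number` is the correct Luhn check digit."""
    return number.isdigit() and luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card_number: str) -> str:
    """Deterministically replace a card number with a Luhn-valid one of equal length.

    Assumption for this sketch: the first 6 digits are kept as they are and
    only the remaining digits are replaced.
    """
    prefix = card_number[:6]
    middle_length = len(card_number) - len(prefix) - 1  # leave room for the check digit
    digest = hmac.new(MASKING_KEY, card_number.encode(), hashlib.sha256).hexdigest()
    # Turn hex characters of the keyed hash into decimal digits.
    middle = "".join(str(int(char, 16) % 10) for char in digest[:middle_length])
    payload = prefix + middle
    return payload + luhn_check_digit(payload)


if __name__ == "__main__":
    # The same card number appears in two "tables"; both get the same masked value.
    customers = [{"customer_id": 1, "card": "4111111111111111"}]
    payments = [{"payment_id": 77, "card": "4111111111111111"}]
    masked_customer = mask_card_number(customers[0]["card"])
    masked_payment = mask_card_number(payments[0]["card"])
    print(is_luhn_valid(masked_customer))
    print(masked_customer == masked_payment)
```

Because the masked value depends only on the input and the key, the same function can be applied to every table or file that carries the field, without keeping a lookup table of old and new values.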

Types of Masking
  • Static or In-DB Masking
    • This is a technique where the data masking operates on data at rest in a database or in files.  In this technique, the production data is typically first dumped into a temporary region called the Staging Database.  Once the data is loaded into the Staging Database, it is masked in that same database and then the masked data is loaded into the Test Database.
  • Dynamic or In-Memory Masking
    • This is a technique where data masking happens in memory rather than in a database.  In this technique, the masking happens while loading the data from the Production region to the Test region.
The picture below depicts the process of Static and Dynamic Masking.
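
The difference between the two approaches can also be sketched in code.  The following Python example is a rough illustration using in-memory SQLite databases to stand in for the Production, Staging and Test regions.  The customers table, the mask_row rule and the function names are hypothetical and chosen only for this sketch.

```python
import sqlite3

SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
SELECT = "SELECT id, name, email FROM customers ORDER BY id"
INSERT = "INSERT INTO customers VALUES (?, ?, ?)"


def make_production() -> sqlite3.Connection:
    """Create a stand-in Production database with two sample rows."""
    prod = sqlite3.connect(":memory:")
    prod.execute(SCHEMA)
    prod.executemany(INSERT, [
        (1, "Alice Smith", "alice@example.com"),
        (2, "Bob Jones", "bob@example.com"),
    ])
    return prod


def mask_row(row):
    """Hypothetical masking rule: replace name and email, keep the id."""
    customer_id, _name, _email = row
    return (customer_id, f"Customer {customer_id}",
            f"customer{customer_id}@example.invalid")


def static_masking(prod: sqlite3.Connection, test: sqlite3.Connection) -> None:
    """Static: dump to Staging, mask in place there, then load into Test."""
    staging = sqlite3.connect(":memory:")
    staging.execute(SCHEMA)
    # Step 1: copy the production data into Staging unchanged.
    staging.executemany(INSERT, prod.execute(SELECT))
    # Step 2: mask the data at rest inside the Staging database.
    for row in staging.execute(SELECT).fetchall():
        masked = mask_row(row)
        staging.execute(
            "UPDATE customers SET name = ?, email = ? WHERE id = ?",
            (masked[1], masked[2], masked[0]),
        )
    # Step 3: load the already masked data into Test.
    test.execute(SCHEMA)
    test.executemany(INSERT, staging.execute(SELECT))
    staging.close()


def dynamic_masking(prod: sqlite3.Connection, test: sqlite3.Connection) -> None:
    """Dynamic: mask each row in memory while it moves from Production to Test."""
    test.execute(SCHEMA)
    test.executemany(INSERT, (mask_row(row) for row in prod.execute(SELECT)))


if __name__ == "__main__":
    prod = make_production()
    test_static = sqlite3.connect(":memory:")
    test_dynamic = sqlite3.connect(":memory:")
    static_masking(prod, test_static)
    dynamic_masking(prod, test_dynamic)
    print(test_static.execute(SELECT).fetchall())
    print(test_dynamic.execute(SELECT).fetchall())
```

In this sketch both routes end with the same masked rows in the Test database.  The difference is where the unmasked data lives on the way: the static route keeps a full unmasked copy in Staging for a while, whereas the dynamic route never stores unmasked data outside Production.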

I hope the information in this post was useful.  In a future post, I will go into more detail about the different data masking techniques.

About the Author

Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing.  He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant.  He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing.  He blogs at Test Data Management Blog & Agile Blog.  Connect with him on Google+


1 comment:

  1. Thanks a lot, all your posts are very simple and down to earth!..