Wednesday 27 February 2013

Commonly Used Data Masking Techniques - TDM

In my previous posts I discussed Data Subset and Data Masking.  In this post, I will discuss the data masking techniques that are widely used.  The list is by no means exhaustive, but it should give a general idea of the techniques that are available.

  • Random Substitution
    • In this technique, the value to be masked is replaced with a random value.  Depending on the nature of the random value, the technique can be further categorized into
      • Random Numbers
      • Random Dates
      • Random Seed Values (values picked at random from a seed list), for example
        • Names
        • Addresses
        • SSNs
        • Credit Card Numbers
        • Telephone numbers
        • And a lot more
      • Random Alphanumerics
  • Algorithmic Substitution
    • Even when random substitution is used, the values of certain fields must still satisfy a specific algorithm or format.  For example, a Credit Card number needs to pass the mod-10 (Luhn) check, and an SSN should be exactly 9 digits in the format AAA-GG-RRRR
      • A -> Area number within the US
      • G -> Group number
      • R -> Serial number (a random number in the masked value)
  • Sequence
    • This technique replaces the values with a generated sequence of data, for example consecutive numbers.
  • Selective Mask
    • This technique masks only a selected portion of the data.  For example, altering only the domain name of an email ID.
  • Nulling
    • This technique replaces the column values with NULL in the database.
  • Blurring
    • This technique adds a random variance to the existing values.  It is mostly used on numeric fields to produce variations of the same data.  For example, each salary can be replaced with a value between 80% and 120% of its current value.
  • Custom Rules / Expressions
    • Certain fields are more complicated to mask than others.  For those fields, custom rules / expressions might be needed to satisfy the requirements.  For example, a bank might use the following rule for a customer account number: BBB-LLLLLL-AAAA
      • B -> Bank Unique Code
      • L -> Location / Branch code of the bank
      • A -> Account number
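
As a rough illustration, several of the simpler techniques listed above can be sketched in a few lines of Python.  Every function name, the "example.com" domain, the "CUST" prefix and the use of plain random digits for the account number parts are assumptions made for this sketch; they are not taken from any particular masking tool.

```python
import random
from itertools import count


def mask_email_domain(email, new_domain="example.com"):
    """Selective mask: keep the local part, replace only the domain."""
    local, _, _ = email.partition("@")
    return f"{local}@{new_domain}"


def null_value(_value):
    """Nulling: every value becomes None (NULL once written to the database)."""
    return None


def blur(value, low=0.80, high=1.20, rng=random):
    """Blurring: scale a number by a random factor between low and high."""
    return round(value * rng.uniform(low, high), 2)


def sequence(start=1, prefix="CUST"):
    """Sequence: yield CUST000001, CUST000002, and so on."""
    for n in count(start):
        yield f"{prefix}{n:06d}"


def mask_account(rng=random):
    """Custom rule: build a BBB-LLLLLL-AAAA value.

    Assumption: each part is filled with random digits.  A real rule would
    more likely draw the bank and branch parts from lists of valid codes.
    """
    bank = f"{rng.randrange(1000):03d}"
    branch = f"{rng.randrange(10**6):06d}"
    account = f"{rng.randrange(10**4):04d}"
    return f"{bank}-{branch}-{account}"
```

Passing a seeded random.Random instance as rng makes the output repeatable, which can help when the same masked data set has to be regenerated.
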
The technique of generating meaningful values for masked data is known as Intelligent masking.  This technique is widely used in today's data masking solutions.
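
A minimal sketch of what generating meaningful values can look like, in Python: a random Credit Card number that still passes the mod-10 (Luhn) check, a name picked from a seed list, and a value in the AAA-GG-RRRR format.  The function names and the seed list are hypothetical, and the SSN-like generator only reproduces the format; it does not apply any other validity rules.

```python
import random

# Hypothetical seed list; a real one would be much larger.
SEED_FIRST_NAMES = ["Alice", "Bruno", "Chitra", "Deepak", "Elena"]


def luhn_check_digit(partial):
    """Return the digit that makes partial + digit pass the mod-10 check."""
    total = 0
    # Walk right to left.  The rightmost digit of `partial` sits next to
    # the check digit, so it is the first one to be doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number):
    """Check a full digit string (check digit included) against mod-10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def random_card_number(rng=random, length=16):
    """Random digits plus a computed check digit."""
    partial = "".join(str(rng.randrange(10)) for _ in range(length - 1))
    return partial + luhn_check_digit(partial)


def random_name(rng=random):
    """Random seed value: pick a replacement name from the seed list."""
    return rng.choice(SEED_FIRST_NAMES)


def random_ssn_like(rng=random):
    """Format only: three, two and four random digits joined by hyphens."""
    return (f"{rng.randrange(1000):03d}-"
            f"{rng.randrange(100):02d}-"
            f"{rng.randrange(10000):04d}")
```

The check digit is computed rather than found by trial and error, so every generated card number passes the same validation that an application under test is likely to run.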


Hope this post was informative.  Please feel free to comment.  Thanks for the read.





About the Author

Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry, focusing on Product Development, R&D, Test Data Management and Automation Testing.  He has architected a TDM product from scratch and currently leads the TDM Product Development team at Cognizant.  He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing.  He blogs at Test Data Management Blog & Agile Blog.  Connect with him on Google+

12 comments:

  1. Your posts are very good and concise!

  2. For a list of data masking functions built into a tool for it, see:
    http://www.iri.com/products/fieldshield/technical-details
    For TDM generally, under http://tdminsights.blogspot.com/, please note IRI RowGen (which produces referentially correct test data without having to mask production data), and planning a test data environment here: http://www.iri.com/blog/test-data/tdm-primer/

  5. Great blog on Test Data Management. The refreshing or provisioning of test environments from production can be tedious and take a lot of clock time (not to mention machine resources). I find that anything to automate the process on distributed platforms and especially on mainframe applications is key is delivering timely, production quality test data to QA and Test so they can meet their project timelines. BCV5 for mainframe and XDM for distributed (and mainframe) is used to speed delivery and automate test data provisioning. You can find more info at www.esaigroup.com
