Saturday, 2 March 2013

Test Data Refresh in TDM

In my previous posts, I covered Data Subset and Data Masking.  In this post, we will focus on Test Data Refresh.

So what is Test Data Refresh?  It is the process of loading / refreshing the Test Database with the latest data from the Production database or any other data source.

What are the Challenges in Test Data Refresh?
  • Diverse data targets
    • Much like the Production data sources, Test data targets can span multiple databases and different file systems.  The data needs to be kept in sync across these targets.
  • Test DB can already contain data
    • If the system already exists, the Test Database will already contain older data.  Care should be taken to overwrite that data without compromising data integrity.
  • Test DB downtime
    • During a test data refresh, it might be necessary to bring down the test environment so that the refresh can run.
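
One way to check the "in sync" requirement from the first challenge is to compare a fingerprint of the same table on each target after a refresh. The sketch below is only an illustration: it uses in-memory SQLite databases as stand-ins for real targets, and the `customer` table, the helper names and the sample rows are all assumptions made for the example.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row count, SHA-256 of the rows ordered by key_column).

    table and key_column are trusted identifiers in this sketch,
    not user input, so they are interpolated directly.
    """
    rows = conn.execute(
        f"SELECT * FROM {table} ORDER BY {key_column}"
    ).fetchall()
    digest = hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
    return len(rows), digest


def make_target(rows):
    """Create a hypothetical test target holding a small customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()
    return conn


target_a = make_target([(1, "Ann"), (2, "Bob")])
target_b = make_target([(2, "Bob"), (1, "Ann")])  # same data, different load order
target_c = make_target([(1, "Ann")])              # one row was not loaded

in_sync_ab = (table_fingerprint(target_a, "customer", "id")
              == table_fingerprint(target_b, "customer", "id"))
in_sync_ac = (table_fingerprint(target_a, "customer", "id")
              == table_fingerprint(target_c, "customer", "id"))
```

Ordering by the key makes the fingerprint independent of load order, so only real differences in content are reported. For large tables, a count plus a per-column checksum computed inside the database would probably be a better fit than fetching every row.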

Types of Refresh
  • From Scratch
    • In this approach, the database schema is created and then the data is loaded into it.  This is useful when a Test database or a new Test environment is being created for the first time.
  • Complete Refresh
    • In this approach, the assumption is that the Test DB already contains data.  This data is wiped out completely and the new data is loaded.
  • Partial Refresh
    • In this approach, only a subset of the data in the Test DB is refreshed rather than all of it.  This is useful when only one or a few modules of the application/product are to be tested.
  • Incremental refresh
    • In this approach, the existing data in the Test Database is left untouched and the new data is appended to it.  This method is useful when the old data cannot be removed for some reason.
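
The four refresh types above can be sketched against a single table. This is a minimal illustration, not how any particular TDM tool works: it assumes an in-memory SQLite database, a hypothetical `customer` table with a `module` column used to scope the partial refresh, and made-up sample rows.

```python
import sqlite3

SCHEMA = "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, module TEXT)"
INSERT = "INSERT INTO customer VALUES (?, ?, ?)"


def from_scratch(conn, rows):
    """Create the schema first, then load the data."""
    conn.execute(SCHEMA)
    conn.executemany(INSERT, rows)
    conn.commit()


def complete_refresh(conn, rows):
    """Wipe all existing data and load the new data in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customer")
        conn.executemany(INSERT, rows)


def partial_refresh(conn, rows, module):
    """Replace only the rows that belong to one module."""
    with conn:
        conn.execute("DELETE FROM customer WHERE module = ?", (module,))
        conn.executemany(INSERT, [r for r in rows if r[2] == module])


def incremental_refresh(conn, rows):
    """Append new rows; rows whose id already exists are left untouched."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO customer VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
from_scratch(conn, [(1, "Ann", "billing"), (2, "Bob", "orders")])
complete_refresh(conn, [(1, "Ann", "billing"), (3, "Cy", "orders")])
partial_refresh(conn, [(4, "Dee", "orders"), (5, "Eve", "billing")], "orders")
incremental_refresh(conn, [(4, "Dee", "orders"), (6, "Fay", "billing")])

ids = [row[0] for row in conn.execute("SELECT id FROM customer ORDER BY id")]
```

Running the delete and the load inside one transaction means a failed load rolls back to the old data rather than leaving the Test DB half empty, which is one way to address the data integrity concern raised earlier.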

Hope the post was informative.  Thanks for the read.  Comments are welcome.

About the Author

Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing.  He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant.  He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing.  He blogs at Test Data Management Blog & Agile Blog.  Connect with him on Google+


  1. This is a good blog series on test data management, which is a topic that I am challenged with. I am working to set up test data for an interrelated set of databases, with customers split between and duplicated in the databases. Would a TDM control database that includes selected customer ids and accounts be a good way to control the extract and subset of data from multiple data sources? I plan to experiment with this.

  2. Thanks David. From what I understand of your landscape, you have customer ids split between multiple databases (let's say 1-100 in Database A and 50-150 in Database B, with the repeat customers in the range 50-100 duplicated in both databases).

    For this situation, a control or staging database holding the selected customer ids would be a good starting point. It can act as the subset criterion for the other related sets of tables.

    One more question: are you trying to do this using an industry-standard TDM tool or an in-house process?
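
    The control-database idea discussed in this thread could look roughly like the sketch below. Everything in it is an assumption for illustration: the in-memory SQLite databases, the `customer` and `selected_customer` tables, the helper names, and the id ranges borrowed from the example above.

```python
import sqlite3


def make_source(rows):
    """Create a hypothetical source database with a customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()
    return conn


def extract_subset(source, selected_ids):
    """Pull only the customers whose ids are listed in the control set."""
    placeholders = ",".join("?" for _ in selected_ids)
    query = (
        f"SELECT id, name FROM customer "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return source.execute(query, list(selected_ids)).fetchall()


# Database A holds customers 1-100, Database B holds 50-150.
db_a = make_source([(i, f"cust{i}") for i in range(1, 101)])
db_b = make_source([(i, f"cust{i}") for i in range(50, 151)])

# The control database stores the selected customer ids.
control = sqlite3.connect(":memory:")
control.execute("CREATE TABLE selected_customer (id INTEGER PRIMARY KEY)")
control.executemany(
    "INSERT INTO selected_customer VALUES (?)", [(10,), (75,), (120,)]
)
selected = [
    row[0]
    for row in control.execute("SELECT id FROM selected_customer ORDER BY id")
]

from_a = extract_subset(db_a, selected)
from_b = extract_subset(db_b, selected)

# Merge the extracts, keeping one copy of customers duplicated in both sources.
subset = {}
for cust_id, name in from_a + from_b:
    subset.setdefault(cust_id, name)
```

    Here customer 75 exists in both sources but appears once in the merged subset. Which copy should win when the duplicates differ is a design decision that this sketch does not address.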