In my previous post, I discussed the challenges of the production cloning approach. In this post, we will focus on a solution to those challenges: the data subset process, also called data sub-setting.
Data sub-setting is the process of slicing out a part of the production database and loading it into the test database. For example, instead of cloning a 50 TB production database, you create a subset that holds only 50 GB of data and load that into the test database. Let's assume a retail application with a Customers table holding 10 million customers, an Orders table holding 100 million orders, and other transaction tables holding another 100 million rows. The subset process shrinks each of these tables to a reasonable size while keeping the related rows together, as depicted in the picture below.
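To make the idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is only an illustration under stated assumptions, not how any particular sub-setting tool works: the two-table schema, the function names build_production and create_subset, and the "keep every Nth customer" rule are all hypothetical. The point it shows is that the subset starts from a driving table (customers) and then pulls only the orders that belong to the selected customers, so no order ends up without its customer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
"""


def build_production(n_customers=1000, orders_per_customer=10):
    """Create a toy in-memory 'production' database (hypothetical schema)."""
    prod = sqlite3.connect(":memory:")
    prod.executescript(SCHEMA)
    prod.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, n_customers + 1)],
    )
    prod.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [
            (c, 10.0 * k)
            for c in range(1, n_customers + 1)
            for k in range(1, orders_per_customer + 1)
        ],
    )
    prod.commit()
    return prod


def create_subset(prod, keep_every=100):
    """Copy every `keep_every`-th customer and only their orders into a new DB.

    The sampling rule (id % keep_every = 0) is an assumption made for this
    sketch; a real subset would use business criteria such as region or date.
    """
    test = sqlite3.connect(":memory:")
    test.executescript(SCHEMA)

    # Step 1: pick the slice of the driving table.
    customers = prod.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0", (keep_every,)
    ).fetchall()
    test.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", customers)

    # Step 2: follow the foreign key so child rows match the chosen parents.
    orders = prod.execute(
        "SELECT o.id, o.customer_id, o.amount "
        "FROM orders AS o JOIN customers AS c ON c.id = o.customer_id "
        "WHERE c.id % ? = 0",
        (keep_every,),
    ).fetchall()
    test.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)", orders
    )
    test.commit()
    return test


if __name__ == "__main__":
    prod_db = build_production()
    test_db = create_subset(prod_db, keep_every=100)
    for table in ("customers", "orders"):
        count = test_db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(table, count)
```

In a real system the same pattern repeats for every table that references the driving table, directly or indirectly, which is where most of the effort in a subset process tends to go.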
Advantages of data sub-setting