
Wednesday, 13 March 2013

Test Data Life Cycle

In the previous posts, I explained the various concepts surrounding test data creation and maintenance, namely Data Subset, Data Masking, Test Data Ageing, Test Data Refresh, Data Archive and Gold Copy.  In this post, I will focus on the life cycle of Test Data.

So what is meant by a life cycle?  A life cycle is the series of stages that a product, service or artifact goes through before reaching its end of life.  Accordingly, the Test Data Life Cycle describes the stages that test data passes through before it reaches its end of life or, alternatively, starts the cycle over again.

Similar to a test life cycle or a software development life cycle, Test Data goes through the following phases:

Requirement Gathering & Analysis

This is pretty straightforward.  In this phase, the test data requirements pertaining to the test requirements are gathered.  They are categorized under the following heads:

  • Pain Areas
  • Data Sources
  • Data Security/Masking
  • Data Volume requirements
  • Data Archival requirements
  • Test Data Refresh considerations
  • Gold Copy considerations

This phase is typically carried out in the form of a TDM assessment or Test Data Assessment.  Since that topic requires separate attention, I will dedicate a blog post to it.


Planning & Design

Thursday, 7 March 2013

Data Archive in Test Data Management (TDM)

In the previous posts, I explained Data Subset, Data Masking, Test Data Ageing and Test Data Refresh.  In this post, we will focus on Data Archival and how important it is to the process of Test Data Management.

What does Data Archival typically mean?
  • Size Management
    • You want an efficient mechanism for managing database size.  Over time a database grows, and it needs to be actively managed.
  • Archival of older data
    • Older data can be moved to cheaper, low-footprint storage and retrieved later whenever it is needed (see the sketch just after this list).
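
To make the second point concrete, here is a minimal SQL sketch, assuming a hypothetical Orders table with an order_date column and an identically structured Orders_Archive table (the names and cut-off date are only for illustration):

-- Move orders older than the cut-off date into the archive table...
INSERT INTO Orders_Archive
SELECT *
FROM   Orders
WHERE  order_date < DATE '2011-01-01';

-- ...then remove them from the live table to keep its size in check.
DELETE FROM Orders
WHERE  order_date < DATE '2011-01-01';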

Types of Archive Mechanisms:

Wednesday, 20 February 2013

Implementation Approaches to Data Sub-setting

In one of my previous posts, I described the process of Data Subset.  In this post we will focus on the implementation approaches to data sub-setting.

There are three broad approaches you can take to implement sub-setting.

SQL Query based approach

In this approach, we use SQL queries to fetch a subset of the production data and load it into the target environment.  Let's assume you have two tables in production from which you need to create a small subset.  The following shows the relationship between the Customers and Orders tables, which are related through the custid field.



The picture also shows sample data within those tables.  To subset this, we first decide on a sampling condition.  Let's assume we will pull out only the customers whose ids are odd numbers.  A simple query will do the trick; the following is the query for the Customers table.
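
Since the screenshot of the query is not reproduced here, the following is a minimal sketch of what it would look like, assuming the Customers and Orders tables and the custid column shown above (the MOD syntax may vary slightly by database):

-- Subset of Customers: keep only the customers with odd ids.
SELECT *
FROM   Customers
WHERE  MOD(custid, 2) = 1;

-- The Orders subset uses the same condition, so every selected order
-- still points to a customer that is present in the subset.
SELECT *
FROM   Orders
WHERE  MOD(custid, 2) = 1;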


Friday, 15 February 2013

Data Subset in TDM

In my previous post, I discussed the Challenges in Production Cloning approach.  In this post, we will focus on its solution, the Data Subset process / Data Sub-setting.

Data Subset is the process of slicing off a part of the Production Database and loading it into the Test Database.  For example, instead of cloning a 50 TB production database, you create a subset containing only 50 GB worth of data and load that into the Test Database.  Let's assume a retail application has a Customers table with 10 million customers, an Orders table with 100 million orders, and other transaction tables holding another 100 million records; the subset process tries to shrink these sizes down to reasonable limits, as depicted in the picture below.

Advantages of data sub-setting

Wednesday, 13 February 2013

Challenges in Production Cloning approach

In my previous articles, I have already discussed the topics "How to create Test Data" and "Top 3 Challenges in using Production data in Test Environments".  In this post we will focus on the challenges we face in the Production Cloning approach and how to overcome them.

1.  Infrastructure


Even though it is highly recommended to have the Test Environment along the same lines as Production, it is not always feasible to test under those real-world conditions.  Performance, load and stress tests should ideally mimic the Production database exactly, but the expensive infrastructure this requires can be overkill for Functional Testing.  Cloning, however, may force you into production-like infrastructure, which translates into higher costs for the customer.

2.  High Storage Costs


Another major challenge associated with Production Cloning is that all the production data needs to be stored in the testing region.  If the production data is 50 TB (terabytes), the Test Database also needs to hold 50 TB of data, so storage has to be provisioned for all of it.  And with the databases being backed up regularly, that means even higher storage costs for the customer.


Top 3 Challenges in using Production data in Test Environments

In my previous post "How to create Test Data", I explained the concept of creating test data directly from the production data.  In this post we will concentrate on the Top 3 challenges in using the Production data for testing purposes.

Data Security

This is by far the most crucial challenge of using Production data in Test Environments.  Production data can contain a lot of sensitive information.  Even though the data sets in the Production database are rich, the very act of using production data carries a lot of risk.  For example, if you are testing an application for a bank, production data will contain real customer information such as names, addresses, account numbers, balances and credit card numbers.  As you can see, using this data for testing exposes the bank to huge security risks.  So how do we overcome this?  The answer is Data Masking.

Data Masking is the process of masking the sensitive fields within the complete data set.  Please read my upcoming post on Data Masking and the techniques used for it for more details.  The following figure depicts the data security challenge and the approaches.

Data Security Challenge
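
To give a flavour of what masking can look like at the database level, here is a minimal SQL sketch, assuming a hypothetical Customers table with cust_name and credit_card_no columns (the column names are illustrative and string-function syntax varies by database); the techniques themselves are covered in the dedicated Data Masking post:

-- Replace real names with generated values (substitution) and hide all
-- but the last four digits of the card number (partial masking), so the
-- data stays realistic but no longer identifies a real customer.
UPDATE Customers
SET    cust_name      = 'Customer ' || custid,
       credit_card_no = 'XXXX-XXXX-XXXX-' || SUBSTR(credit_card_no, -4);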