Test Data Management


Tuesday 19 March 2013

9 Reasons why TDM is critical to a project's success

In my previous posts, I explained the building blocks of Test Data Management (TDM), namely Data Subset, Data Masking, Data Archive, Test Data Refresh and Gold Copy. You might also want to read all the articles from the table of contents. In this post, I will try to explain why TDM is critical to a project's success.

Your test data determines the quality of testing

No matter how good your testing processes are, if the test data used is not correct or not of adequate quality, the entire product's quality will be affected.

Your test data should be highly secure

It is absolutely mandatory that your test data does not contain unmasked production data. If the data is not secure enough, there is every chance that a data breach might happen, which can cost the organization dearly.

Test data needs to be as close to real time as possible

Test data not only needs to be of good quality, it should also be as close to real-world production data as possible. Why? Simply because we do not want to build a system/application/product for 6 months and then have it fail in production just because there was not enough realistic data to test with.


Wednesday 13 March 2013

Test Data Life Cycle

In the previous posts, I explained the various concepts surrounding test data creation and maintenance, namely Data Subset, Data Masking, Test Data Ageing, Test Data Refresh, Data Archive and Gold Copy. In this post, I will focus on the life cycle of test data.

So what is meant by a life cycle? A life cycle is the set of stages that a product/service/artifact goes through before reaching its end of life. So the Test Data Life Cycle describes the stages that test data goes through in order to reach its end of life or, alternatively, to start a recurring cycle.

Similar to a test life cycle or a software development life cycle, test data goes through the following phases:

Requirement Gathering & Analysis

This is pretty straightforward. In this phase, the test data requirements pertaining to the test requirements are gathered. They are categorized under the following heads:

Pain Areas
Data Sources
Data Security/Masking
Data Volume requirements
Data Archival requirements
Test Data Refresh considerations
Gold Copy considerations

This phase is typically carried out in the form of a TDM assessment or Test Data Assessment. Since that topic requires separate attention, I will dedicate a blog post to it.

Planning & Design


Saturday 9 March 2013

Gold Copy in Test Data Management (TDM)

In the previous posts, we discussed Data Subset, Data Masking, Test Data Ageing, Data Archive in TDM and Test Data Refresh. In this post, we will focus on what a Gold Copy is in TDM.

So what is meant by Gold Copy?

A Gold Copy is the baseline version of the test data that can be reused for future releases. For example, suppose you are loading your test database from the production database for the first time. You can save that copy as a baseline from which future test data refreshes can be made. The following picture depicts this concept of Gold Copy.
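To make the idea concrete, here is a minimal Python sketch that uses the standard `sqlite3` module's backup API to save a baseline once and later restore the test database from it. In-memory SQLite databases stand in for the production, gold copy and test databases, and the function names `take_gold_copy` and `refresh_from_gold` are hypothetical; a real TDM setup would work with separate, much larger database servers.

```python
import sqlite3

def take_gold_copy(source: sqlite3.Connection) -> sqlite3.Connection:
    """Snapshot the source database into a separate 'gold copy' database."""
    gold = sqlite3.connect(":memory:")
    source.backup(gold)  # copies every table and row into the gold copy
    return gold

def refresh_from_gold(gold: sqlite3.Connection, test_db: sqlite3.Connection) -> None:
    """Overwrite the test database with the baseline held in the gold copy."""
    gold.backup(test_db)

# Hypothetical first load from production.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (custid INTEGER PRIMARY KEY, name TEXT)")
prod.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
prod.commit()

gold = take_gold_copy(prod)

test_db = sqlite3.connect(":memory:")
refresh_from_gold(gold, test_db)

# Testing changes the test data...
test_db.execute("DELETE FROM customers WHERE custid = 1")
test_db.commit()

# ...and a later refresh restores the baseline without touching production.
refresh_from_gold(gold, test_db)
print(test_db.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```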

Storage


Thursday 7 March 2013

Data Archive in Test Data Management (TDM)

In the previous posts, I explained Data Subset, Data Masking, Test Data Ageing and Test Data Refresh. In this post, we will focus on the topic of Data Archival and how important it is to the process of Test Data Management.

What does Data Archival typically mean?

Size Management

You want an efficient mechanism for managing database size. Over time, a database grows, and you need to actively manage its size.

Archival of older data

Older data can be archived to a storage area that takes up less disk space and retrieved later whenever needed.
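A minimal sketch of moving older data out of an active table, assuming a hypothetical `orders` table with an ISO-formatted `order_date` column. A second table in the same SQLite database stands in for the lower-cost archive area; the copy and the delete run in one transaction, so rows are neither lost nor duplicated if the job fails partway through.

```python
import sqlite3

def archive_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff` (ISO date) into orders_archive.

    Returns the number of rows moved. Both statements share one transaction:
    `with conn` commits on success and rolls back on any exception.
    """
    with conn:
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER PRIMARY KEY, order_date TEXT)")
conn.execute("CREATE TABLE orders_archive (orderid INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2010-05-01"), (2, "2011-07-15"), (3, "2013-01-20")],
)
conn.commit()

moved = archive_older_than(conn, "2012-01-01")
print(moved)  # 2
```

ISO dates (YYYY-MM-DD) compare correctly as plain strings, which is why the cutoff can be passed as text here.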

Types of Archive Mechanisms:


Saturday 2 March 2013

Test Data Refresh in TDM

In my previous posts, I explained Data Subset and Data Masking. In this post, we will focus on the topic of Test Data Refresh.

So what is Test Data Refresh? It is the process of loading / refreshing the Test Database with the latest data from the Production database or any other data source.

What are the Challenges in Test Data Refresh?

Diverse data targets

Much like production data sources, test data targets can be spread across different databases and file systems. The data needs to stay in sync across all of these targets.

Test DB can already contain data

If it is an existing system, the Test Database will already contain older data. Care must be taken to overwrite that data without compromising data integrity.

Test DB downtime

During a test data refresh, it might be necessary to bring down the test environment in order to accommodate the refresh.
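The "Test DB can already contain data" challenge can be illustrated with a simple full refresh of one table, sketched below in Python with `sqlite3`. This is only one possible refresh strategy, assumed here for illustration: the existing test rows are deleted and the latest production rows are loaded inside a single transaction, so a failed refresh rolls back to the old data instead of leaving the table half-overwritten. The table, column and function names are hypothetical.

```python
import sqlite3

def refresh_table(prod: sqlite3.Connection, test: sqlite3.Connection,
                  table: str, columns: list[str]) -> int:
    """Full refresh of one test table from production; returns rows loaded.

    `table` and `columns` must come from trusted configuration, because
    SQL identifiers cannot be bound as query parameters.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = prod.execute(f"SELECT {col_list} FROM {table}").fetchall()
    # Delete and reload in one transaction: on any error the old test
    # data is restored rather than left half-overwritten.
    with test:
        test.execute(f"DELETE FROM {table}")
        test.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
        )
    return len(rows)

schema = "CREATE TABLE customers (custid INTEGER PRIMARY KEY, name TEXT)"
prod = sqlite3.connect(":memory:")
test = sqlite3.connect(":memory:")
prod.execute(schema)
test.execute(schema)
prod.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Chen")])
test.execute("INSERT INTO customers VALUES (1, 'Stale name')")  # older test data
test.commit()

loaded = refresh_table(prod, test, "customers", ["custid", "name"])
print(loaded)  # 3
```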

Types of Refresh


Wednesday 27 February 2013

Commonly Used Data Masking Techniques - TDM

In my previous posts, I discussed Data Subset and Data Masking. In this post, I will discuss widely used data masking techniques. The list is by no means exhaustive, but it provides a general idea of the techniques that are available.

Random Substitution

In this technique, the value to be masked is replaced or substituted with a random value. Depending on the nature of the random value, this technique can be further categorized into:

Random Numbers
Random Dates
Random Seed Values, for example:
    Names
    Addresses
    Social Security Numbers (SSNs)
    Credit Card Numbers
    Telephone Numbers
    and many more

Random Alphanumerics
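The sketch below shows one way each category above could be implemented in Python. The seed list, field names and value ranges are hypothetical, and "Random Seed Values" is read here as picking a replacement from a predefined list of realistic values. The random generator is given a fixed seed so that repeated masking runs produce the same output; whether that is desirable depends on the project.

```python
import random
import string
from datetime import date, timedelta

rng = random.Random(42)  # fixed seed: masking runs are reproducible

# Hypothetical seed list; a real tool would use much larger dictionaries.
SEED_NAMES = ["John Smith", "Priya Iyer", "Maria Garcia", "Wei Chen"]

def random_number(digits: int) -> str:
    """Random Numbers: a numeric string with the same number of digits."""
    return "".join(rng.choice(string.digits) for _ in range(digits))

def random_date(start: date, end: date) -> date:
    """Random Dates: a date drawn uniformly from [start, end]."""
    return start + timedelta(days=rng.randint(0, (end - start).days))

def random_seed_value(seeds: list[str]) -> str:
    """Random Seed Values: pick a replacement from a list of realistic values."""
    return rng.choice(seeds)

def random_alphanumeric(length: int) -> str:
    """Random Alphanumerics: uppercase letters and digits."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))

record = {"name": "Rajesh Kumar", "phone": "9876543210",
          "dob": date(1980, 4, 12), "policy": "AB12CD34"}
masked = {
    "name": random_seed_value(SEED_NAMES),
    "phone": random_number(len(record["phone"])),
    "dob": random_date(date(1950, 1, 1), date(2000, 12, 31)),
    "policy": random_alphanumeric(len(record["policy"])),
}
print(masked)
```

Keeping the length and character class of each original value means masked data still passes the application's format validations.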


Friday 22 February 2013

Techniques for Data Subset

In my previous posts, I explained Data Subset in TDM and Implementation Approaches to Data Sub-setting. In this post, I will explain some basic techniques for creating a data subset.

First N records

This is a pretty simple technique, wherein the first N records of a table are retrieved from the Production database. This can be achieved using a simple SQL query (SQL Server syntax) such as:

SELECT TOP 10000 * FROM DBO.CUSTOMERS
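`TOP` is SQL Server syntax; SQLite, PostgreSQL and MySQL use `LIMIT` instead. Without an `ORDER BY`, which rows count as the "first" N is left to the database engine, so ordering by a key makes the subset repeatable. A minimal Python/SQLite sketch, assuming a hypothetical `customers` table keyed by `custid`:

```python
import sqlite3

def first_n(conn: sqlite3.Connection, n: int) -> list[tuple]:
    """Return the first n customers, ordered by key so the subset is repeatable."""
    # SQLite uses LIMIT where SQL Server uses TOP.
    return conn.execute(
        "SELECT * FROM customers ORDER BY custid LIMIT ?", (n,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (custid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 101)])

subset = first_n(conn, 10)
print(len(subset), subset[0])  # 10 (1, 'Customer 1')
```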

Based on a filter criteria

Here, the subset conditions are based on simple filter criteria such as Age > 50, City = London, etc. This is easier to implement when the subset requirements are less complicated. An example query for this technique would be:

SELECT * FROM DBO.CUSTOMERS WHERE AGE > 50 AND CITY = 'LONDON'
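The same filter can be run from application code with the values bound as parameters rather than pasted into the SQL string. A minimal sketch using Python's `sqlite3` with hypothetical sample rows. Note that SQLite's `=` comparison on text is case-sensitive by default, whereas many SQL Server databases use a case-insensitive collation, so here the city value must match the stored case.

```python
import sqlite3

def filter_subset(conn: sqlite3.Connection, min_age: int, city: str) -> list[tuple]:
    """Rows matching a simple filter criterion; values are bound as parameters."""
    return conn.execute(
        "SELECT custid, name, age, city FROM customers "
        "WHERE age > ? AND city = ? ORDER BY custid",
        (min_age, city),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers "
             "(custid INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Ann", 62, "London"), (2, "Ben", 45, "London"),
     (3, "Cara", 70, "Paris"), (4, "Dev", 55, "London")],
)

subset = filter_subset(conn, 50, "London")
print([row[0] for row in subset])  # [1, 4]
```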

Based on a complex SQL query

Sometimes the subset requirements can be more complicated. They might involve dependencies across multiple tables. What that means is


Wednesday 20 February 2013

Implementation Approaches to Data Sub-setting

In one of my previous posts, I described the process of Data Subset. In this post, we will focus on the implementation approaches to data sub-setting.

There are 3 broad categories in which you can implement sub-setting.

SQL Query based approach

In this approach, we use SQL queries to fetch a subset of the production data and load it into the target environment. Let's assume you have 2 tables in production from which you need to create a small subset. The following shows the relationship between the tables Customers and Orders, which are related through the custid field.

The picture also shows the sample data within those tables. To subset this data, we first choose a subset condition. Let's assume we pull out only the customers whose ids are odd numbers. A simple query does the trick. The following is the query for the Customers table.
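A minimal sketch of this approach using Python's `sqlite3`, assuming the Customers/Orders layout described above with hypothetical sample rows. The odd-id condition selects the customers, and the orders are selected through the same condition, so every order copied into the target still has its customer. In-memory databases stand in for the production and test environments.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (custid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (orderid INTEGER PRIMARY KEY,
                     custid INTEGER REFERENCES customers(custid),
                     amount REAL);
"""

prod = sqlite3.connect(":memory:")
prod.executescript(SCHEMA + """
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cara'), (4, 'Dev');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.5),
                          (13, 3, 60.0), (14, 4, 12.0);
""")

# Subset condition from the post: only customers whose ids are odd.
CUSTOMER_SUBSET = "SELECT custid, name FROM customers WHERE custid % 2 = 1"
# Orders follow their customers, so no order in the subset is orphaned.
ORDER_SUBSET = ("SELECT orderid, custid, amount FROM orders "
                "WHERE custid IN (SELECT custid FROM customers WHERE custid % 2 = 1)")

test = sqlite3.connect(":memory:")
test.executescript(SCHEMA)
with test:  # parents first, so the foreign key holds even if enforced
    test.executemany("INSERT INTO customers VALUES (?, ?)", prod.execute(CUSTOMER_SUBSET))
    test.executemany("INSERT INTO orders VALUES (?, ?, ?)", prod.execute(ORDER_SUBSET))

orphans = test.execute(
    "SELECT COUNT(*) FROM orders WHERE custid NOT IN (SELECT custid FROM customers)"
).fetchone()[0]
print(orphans)  # 0
```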


Tuesday 19 February 2013

Top smells that indicate that your project needs TDM

In my previous posts, I explained the basics of Test Data Creation, Challenges in Production Cloning, Data Subset and Data Masking. In this post, we will focus on something slightly different.

Almost every problem has symptoms, which in modern Agile parlance we call smells. This post focuses on the typical smells that indicate that your project needs Test Data Management (TDM).

Testers spend more time preparing test data than testing the application

This is probably the number one symptom or smell that warrants a TDM process and solution in place.

Testers depend a lot on Business Analysts to provide the required test data

This is also one of the top symptoms of the need for TDM: testers depend heavily on Business Analysts for their test data.


Saturday 16 February 2013

Data Masking in TDM

In my previous posts, I explained the Challenges in Production Cloning. One of the major challenges in the Production Cloning approach is Data Security. This post will focus on the solution for Data Security: Data Masking.

As already explained, Data Masking is the process of masking the sensitive fields from the complete data set. The whole objective of data masking is to ensure that no sensitive data is leaked into non-production regions like the Dev and Testing regions.

What are the sensitive fields that need to be masked? That depends on the project's needs, but some generic fields that need to be masked are:

Personal information like First Names, Last Names, Email IDs, DOB, Phone & Fax numbers, Social Security Numbers (SSNs), National Insurance Numbers and other national unique identifiers.
In the Banking, Financial Services & Insurance industry - Bank Balances, Account numbers, Credit card numbers, Policy numbers, etc.
In the Healthcare industry - PHI (Protected Health Information) attributes like Medical record numbers, Member IDs, etc.

This list is by no means exhaustive, but it gives a fair idea of how many fields are sensitive in nature and need to be handled with care. Any lapse in masking any of these fields can have a big impact on the organization as a whole.
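Because a single missed field is enough to cause a leak, it can help to drive masking from an explicit list of sensitive fields and then check the output. The Python sketch below is a hypothetical illustration: the field names are assumptions, and masking uses a fixed placeholder purely for brevity (random substitution, covered in the Data Masking Techniques post, would be more realistic). `find_leaks` reports any sensitive field whose original value survived masking.

```python
# Hypothetical per-project list of sensitive columns.
SENSITIVE_FIELDS = {"first_name", "last_name", "email", "dob", "ssn",
                    "account_number", "card_number"}

def mask_record(record: dict, placeholder: str = "MASKED") -> dict:
    """Replace every sensitive field with a fixed placeholder."""
    return {k: (placeholder if k in SENSITIVE_FIELDS else v)
            for k, v in record.items()}

def find_leaks(originals: list[dict], masked: list[dict]) -> list[str]:
    """List sensitive fields whose original value survived masking."""
    leaks = []
    for i, (orig, out) in enumerate(zip(originals, masked)):
        for field in sorted(SENSITIVE_FIELDS.intersection(orig)):
            if out.get(field) == orig[field]:
                leaks.append(f"row {i}: {field}")
    return leaks

rows = [{"custid": 1, "first_name": "Ann", "email": "ann@example.com"}]
masked = [mask_record(r) for r in rows]
print(find_leaks(rows, masked))  # []

# A masking job that forgot the email column would be flagged:
incomplete = [{**masked[0], "email": rows[0]["email"]}]
print(find_leaks(rows, incomplete))  # ['row 0: email']
```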

Challenges in Data Masking


Friday 15 February 2013

Data Subset in TDM

In my previous post, I discussed the Challenges in the Production Cloning approach. In this post, we will focus on a solution to them: the Data Subset process, or Data Sub-setting.

Data subset is the process of slicing a part of the Production Database and loading it into the Test Database. For example, instead of cloning a 50 TB production database, create a subset that is only 50 GB worth of data and load it into the Test Database. Let's assume that in a retail application you have a Customers table with 10 million customers, an Orders table with 100 million orders, and other transaction tables with 100 million rows. Our subset process will try to shrink these tables to reasonable sizes, as depicted in the picture below.
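Putting numbers on this example: in decimal units (1 TB = 1000 GB), a 50 GB subset of a 50 TB database is 0.1% of the original. If, purely as an assumption, every table shrank by the same ratio, the row counts would come out as in this small Python calculation; real subset criteria are usually not proportional like this.

```python
# Figures from the example, in decimal units (1 TB = 1000 GB).
PROD_BYTES = 50 * 10**12    # 50 TB production database
SUBSET_BYTES = 50 * 10**9   # 50 GB subset

print(SUBSET_BYTES / PROD_BYTES)  # 0.001, i.e. 0.1% of production

# Assumption: each table shrinks in proportion to the overall size.
tables = {"Customers": 10_000_000, "Orders": 100_000_000}
targets = {name: rows * SUBSET_BYTES // PROD_BYTES for name, rows in tables.items()}
print(targets)  # {'Customers': 10000, 'Orders': 100000}
```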

Advantages of data sub-setting


Wednesday 13 February 2013

Challenges in Production Cloning approach

In my previous articles, I have already discussed the topics "How to create Test Data" and "Top 3 Challenges in using Production data in Test Environments". In this post we will focus on the challenges that we face in Production Cloning approach and how to overcome those challenges.

1. Infrastructure

Even though it is highly recommended to have the Test Environment along the same lines as Production, it is not always feasible to test under those real-world conditions. Performance / load / stress tests should ideally run against an environment that exactly mimics the Production database, but such expensive infrastructure may be overkill for Functional Testing. Cloning, however, might force you to provide production-like infrastructure, which translates into higher costs for the customer.

2. High Storage Costs

Another major challenge associated with Production Cloning is that all the production data needs to be stored in the testing region. Assuming the production data is 50 TB (terabytes), the Test Database also needs to hold 50 TB of data, and storage has to be provisioned for all of it. Since databases are also backed up regularly, that means even higher storage costs for the customer.


Top 3 Challenges in using Production data in Test Environments

In my previous post "How to create Test Data", I explained the concept of creating test data directly from the production data. In this post we will concentrate on the Top 3 challenges in using the Production data for testing purposes.

Data Security

This is by far the most crucial challenge of using Production data in Test Environments. Production data can contain a lot of sensitive information. Even though production data sets are rich, using them directly involves a lot of risk. For example, if you are testing an application for a bank, production data will contain real customer information like Names, Addresses, Account Numbers, Balances, Credit Card Numbers, etc. Using this data for testing exposes the bank to huge security risks. So how do we overcome this? The answer is Data Masking.

Data Masking is the process of masking the sensitive fields in the complete data set. Please read my upcoming posts on Data Masking and the Techniques used for Data Masking for more details. The following figure depicts the data security challenge and the approaches.

Data Security Challenge


Saturday 9 February 2013

TDM Topics to be covered in this blog

Hello all,

The intention of this blog is to share my insight and knowledge in the area of Test Data Management. I am looking forward to writing a few posts on the following topics, which I will write whenever I get some free time. Thanks.

How to Create Test Data
Top 3 Challenges in using Production data in Test Environments
Challenges in Production Cloning Approach
Data Subset in TDM
Data Masking in TDM
Top smells that indicate that your project needs TDM
Implementation approaches to Data Sub-setting
Techniques for Data Subset
Commonly used Data Masking Techniques - TDM
Test Data Refresh in TDM
Test Data Ageing in TDM
Data Archive in TDM
Gold Copy in Test Data Management
Test Data Life Cycle
What is Test Data Management?
Technical Challenges in Test Data Management
Non-Technical Challenges in Test Data Management
Synthetic Data Generation
Is Test Data Management same as ETL?
Tools for TDM - COTS or In-house?
Test Data Management Challenges
Test Data Management Strategy
Test Data Management (TDM) Best Practices
Test Data Management Tools
Aligning TDM with Testing process

Regards
Rajaraman R

Saturday 2 February 2013

How to create Test Data?

Let's assume you have a very basic testing need: you need around 50 customers created in your system in order to test it. Let's assume it is a web-based application, although the concept applies to any technology/application. You have a customer creation screen as shown below.

So how do you create the test data you need?

Basically there are 3 approaches to do it:

Manual approach
Functional Automation Approach
Database Approach

Manual Approach:

In the manual approach, you manually enter the data in the screens to create a customer, and repeat this for all 50 customers. Needless to say, doing this manually takes a long time.

The time taken for the example application would be:
For 1 Customer = 1 min.
For 50 Customers = 50 mins.

Functional Automation Approach:

In the automated approach, you automate the user interface (UI) for creating the data, which speeds up the process of creating the required test data. In our example, we would automate the web-based UI using an automation tool such as QTP, RFT, Selenium, etc., and then data-drive those tests to create the data that we require.
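"Data driving" the tests means the automation script reads its inputs from a data sheet instead of having them hard-coded. The Python sketch below generates such a sheet for 50 customers and loops over it. The field names are hypothetical, and `create_customer_via_ui` is a placeholder standing in for the recorded UI steps (for example a Selenium script); here it only records each row it receives.

```python
import csv
import io

def build_customer_rows(count: int) -> list[dict]:
    """One row per customer the UI script should create (hypothetical fields)."""
    return [
        {"first_name": f"Test{i}", "last_name": "Customer",
         "email": f"test{i}@example.com"}
        for i in range(1, count + 1)
    ]

def write_data_sheet(rows: list[dict]) -> str:
    """Serialize the rows as CSV, the kind of data sheet an automation tool reads."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

created = []

def create_customer_via_ui(row: dict) -> None:
    """Placeholder: a real script would fill in the customer creation screen here."""
    created.append(row["email"])

rows = build_customer_rows(50)
sheet = write_data_sheet(rows)
for row in csv.DictReader(io.StringIO(sheet)):
    create_customer_via_ui(row)
print(len(created))  # 50
```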

The time taken for the example application would be:
For 1 Customer = 10 seconds
For 50 Customers = 500 seconds (about 8 mins)

Database Approach:

In all probability, you already have plenty of real customer information in your production database. So our job is to query the right set of customers from the production database and load them into the test database. Simple. The data is then ready to be used for testing.

In our example application, since it's a pretty straightforward requirement, we fetch the first 50 rows from the Customers table in Production and insert those rows into the Customers table in the Test Database. The workflow is depicted below.

The time taken for the example application would be:

For 50 Customers = 60 seconds = 1 min (Just an example)

NOTE: The above example assumes that the back end is a Microsoft SQL Server database and hence the "SELECT TOP 50" query.
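A minimal Python sketch of the same workflow, using SQLite instead of SQL Server. An attached in-memory database stands in for the test database, so a single `INSERT ... SELECT` copies the rows, and `ORDER BY` with `LIMIT 50` plays the role of `SELECT TOP 50`. The table layout and row counts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                    # plays the production DB ("main")
conn.execute("ATTACH DATABASE ':memory:' AS testdb")  # plays the test DB
for schema in ("main", "testdb"):
    conn.execute(f"CREATE TABLE {schema}.customers "
                 "(custid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 1001)])

# Copy the first 50 production customers into the test database in one statement.
with conn:
    conn.execute("""
        INSERT INTO testdb.customers
        SELECT * FROM main.customers ORDER BY custid LIMIT 50
    """)

print(conn.execute("SELECT COUNT(*) FROM testdb.customers").fetchone()[0])  # 50
```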

As you can see, the database approach is much faster than the other two approaches. The effort savings are enormous for real-world test data requirements, where data volumes are much higher.
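The illustrative timings above can be compared directly. This small Python calculation uses only the figures quoted in the example for 50 customers:

```python
CUSTOMERS = 50
manual_s = CUSTOMERS * 60       # manual entry: 1 min per customer
automation_s = CUSTOMERS * 10   # UI automation: 10 s per customer
database_s = 60                 # database approach: about 1 min for all 50

print(manual_s, automation_s, database_s)                          # 3000 500 60
print(manual_s / database_s, round(automation_s / database_s, 1))  # 50.0 8.3
```

With these example figures, the database approach is about 50 times faster than manual entry and about 8 times faster than UI automation.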

This methodology of creating test data directly from production data forms the cornerstone and building block of the concept called Test Data Management. Of course, we are dealing with real production data, so we need to secure it before loading it into the Test Database, but we will deal with those topics in a separate post.

Hope the information was useful in giving a basic idea about Test Data creation. I welcome your comments. Cheers.

About the Author

Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at Test Data Management Blog & Agile Blog. Connect with him on Google+
