Showing posts with label TDM Blog. Show all posts

Tuesday 19 March 2013

9 Reasons why TDM is critical to a project's success

In my previous posts, I explained the building blocks of Test Data Management (TDM), namely Data Subset, Data Masking, Data Archive, Test Data Refresh and Gold Copy.  Alternatively, you might want to read all the articles from the table of contents.  In this post, I will explain why TDM is critical to a project's success.

  • Your test data determines the quality of testing
    • No matter how good your testing processes are, if the test data is wrong or of inadequate quality, the quality of the entire product will suffer.
  • Your test data should be highly secure
    • It is mandatory that your test data does not contain production data that has not been masked.  If the data is not secure enough, a data breach can happen, which can cost the organization dearly.
  • Test data needs to be as close to real time as possible
    • Test data not only needs to be of good quality, it should also be as close to real time data / production data as possible.  Why? The simple reason is that we do not want to build a system, application or product for 6 months and then see it fail in production just because there was not adequate real time data to test with.

Wednesday 13 March 2013

Test Data Life Cycle

In the previous posts, I explained the various concepts surrounding test data creation and maintenance, namely Data Subset, Data Masking, Test Data Ageing, Test Data Refresh, Data Archive and Gold Copy.  In this post, I will focus on the life cycle of Test Data.

So what is meant by a life cycle?  A life cycle is the series of stages that a product, service or artifact goes through before reaching its end of life.  A Test Data Life Cycle therefore describes the stages that test data goes through before it reaches its end of life or, alternatively, starts a recurring life cycle.

Similar to a test life cycle or a software development life cycle, test data goes through the following phases.  They are

Requirement Gathering & Analysis

This is pretty straightforward.  In this phase, the test data requirements that follow from the test requirements are gathered.  They are categorized under various heads:

  • Pain Areas
  • Data Sources
  • Data Security/Masking
  • Data Volume requirements
  • Data Archival requirements
  • Test Data Refresh considerations
  • Gold Copy considerations

This phase is typically carried out in the form of a TDM assessment or Test Data Assessment.  Since that topic requires separate attention, I will dedicate a blog post to it.


Planning & Design

Saturday 9 March 2013

Gold Copy in Test Data Management (TDM)

In the previous posts, we discussed Data Subset, Data Masking, Test Data Ageing, Data Archive in TDM and Test Data Refresh.  In this post, we will focus on what a Gold Copy is in TDM.

So what is meant by Gold Copy?

This is the baseline version of the data that can be used for future releases.  For example, suppose you are loading your test database from the production database for the first time.  In this case, you can save that copy as a baseline from which future test data refreshes can be made.  The following picture depicts this concept of Gold Copy.



Gold Copy - Basics
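
The baseline idea can be illustrated with a minimal sketch.  It assumes in-memory SQLite databases standing in for the production database, the gold copy and the test database, and the `customers` table and the `copy_database` helper are hypothetical names used only for illustration.  It is not a full TDM implementation; in practice the gold copy would typically also be subsetted and masked before being saved.

```python
import sqlite3


def copy_database(source, target):
    """Copy the full contents of source into target, replacing what target held."""
    source.backup(target)


# Hypothetical stand-in for the production database.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
production.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "London"), (2, "Leeds")]
)
production.commit()

gold_copy = sqlite3.connect(":memory:")
test_db = sqlite3.connect(":memory:")

# First load: save the data once as the baseline (the gold copy).
copy_database(production, gold_copy)
# Populate the test database from the gold copy.
copy_database(gold_copy, test_db)

# Testing changes the test database...
test_db.execute("DELETE FROM customers WHERE id = 1")
test_db.commit()

# ...and a later refresh restores it from the baseline, not from production.
copy_database(gold_copy, test_db)
rows = test_db.execute("SELECT id, city FROM customers ORDER BY id").fetchall()
```

The point of the sketch is that every refresh after the first load reads from the saved baseline, so the test database always returns to a known state.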

Storage

Monday 4 March 2013

What is Test Data Ageing in TDM?

In my previous posts, I explained Data Subset and Data Masking in TDM.  In this post, we will focus on Test Data Ageing.

This is useful for time-based testing.  Let's assume you create a customer and that customer takes 48 hours to be activated.  What if you have to test a scenario that occurs after those 48 hours?  Will you wait 48 hours for that scenario to happen?  The answer is no.  Then how do you handle this scenario?

There are basically 2 approaches by which we can do this:

  • Tamper with the system dates
    • Although it is possible in some cases to tamper with the system dates and continue testing, this method fails if the date is generated by a database server or an application server instead of the client.
  • Tamper with the dates in the backend
    • This is the most viable and practical solution for such scenarios.  In this approach, we modify the date at the backend so that it reflects the new date.  But care should be taken to ensure that data integrity and data semantics are not lost.
This method of modifying the date according to the needs of the scenario is known as Test Data Ageing.  Depending on the scenario that needs to be tested, we can either back-date or front-date the given date.
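
As a rough sketch of the backend approach, the snippet below back-dates a customer record by 48 hours.  It assumes an in-memory SQLite database with a hypothetical `customers` table and `created_at` column, and `age_created_at` is an illustrative helper, not part of any tool.  In a real schema, any dependent date columns would need to be shifted by the same offset to preserve data integrity.

```python
import sqlite3
from datetime import datetime, timedelta


def age_created_at(conn, customer_id, hours):
    """Shift a customer's created_at by the given number of hours.

    Negative hours back-date the record; positive hours front-date it.
    """
    row = conn.execute(
        "SELECT created_at FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    aged = datetime.fromisoformat(row[0]) + timedelta(hours=hours)
    with conn:  # commit the update as a single transaction
        conn.execute(
            "UPDATE customers SET created_at = ? WHERE id = ?",
            (aged.isoformat(), customer_id),
        )
    return aged


# Hypothetical test table with one freshly created customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Test Customer', '2013-03-04T10:00:00')")

# Back-date by 48 hours so the post-activation scenario can be tested right away.
aged = age_created_at(conn, 1, hours=-48)
stored = conn.execute("SELECT created_at FROM customers WHERE id = 1").fetchone()[0]
```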


Challenges

Saturday 2 March 2013

Test Data Refresh in TDM

In my previous posts, I explained about Data Subset and Data Masking.  In this post we will focus on the topic of Test Data Refresh.

So what is Test Data Refresh?  It is the process of loading / refreshing the Test Database with the latest data from the Production database or any other data source.


What are the Challenges in Test Data Refresh?
  • Diverse data targets
    • Much like the Production data sources, test data targets can be spread across databases and different file systems.  The data needs to be kept in sync across these targets.
  • Test DB can already contain data
    • In an existing system, the Test Database will already contain older data.  Steps should be taken to overwrite that data carefully without losing data integrity.
  • Test DB downtime
    • During a test data refresh, it might be necessary to bring down the test environment in order to accommodate the refresh.

Types of Refresh

Friday 22 February 2013

Techniques for Data Subset

In my previous posts, I explained Data Subset in TDM & Implementation Approaches to Data Sub-setting.  In this post, I will explain some basic techniques with which we can create a data subset.

  • First N records
    • This is a pretty simple technique in which the first N records of a table are retrieved from the Production database.  Note that without an ORDER BY clause, which rows count as the "first" N is not guaranteed.  This can be achieved using a simple SQL query such as
      • SELECT TOP 10000 * FROM DBO.CUSTOMERS
  • Based on filter criteria
    • Here the subset conditions are based on simple filter criteria such as Age > 50, City = London, etc.  This is easier to implement when the subset requirements are less complicated.  An example query for this technique would be
      • SELECT * FROM DBO.CUSTOMERS WHERE AGE > 50 AND CITY = 'LONDON'
  • Based on a complex SQL query
    • Sometimes the subset requirements can be more complicated.  It might involve dependencies across multiple tables.  What that means is
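
The first two techniques above can be tried out with a small sketch.  It assumes an in-memory SQLite database with a hypothetical `customers` table and made-up rows, and `first_n` and `by_filter` are illustrative helper names.  SQLite uses `LIMIT` rather than SQL Server's `TOP`, so the queries differ slightly from the ones shown in the list.

```python
import sqlite3

# Hypothetical stand-in for the Production CUSTOMERS table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 62, "LONDON"), (2, 35, "LONDON"), (3, 71, "PARIS"), (4, 55, "LONDON")],
)


def first_n(conn, n):
    """First N records; ORDER BY makes 'first' well defined."""
    return conn.execute(
        "SELECT * FROM customers ORDER BY id LIMIT ?", (n,)
    ).fetchall()


def by_filter(conn, min_age, city):
    """Subset based on simple filter criteria (age and city)."""
    return conn.execute(
        "SELECT * FROM customers WHERE age > ? AND city = ? ORDER BY id",
        (min_age, city),
    ).fetchall()


subset_first = first_n(conn, 2)
subset_filtered = by_filter(conn, 50, "LONDON")
```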

Tuesday 19 February 2013

Top smells that indicate that your project needs TDM

In my previous posts, I explained the basics of Test Data Creation, Challenges in Production Cloning, Data Subset and Data Masking.  In this post, we will strike a slightly different note.

Invariably, every problem has symptoms, which we call smells in these modern Agile days.  This post focuses on the typical smells that indicate that your project needs Test Data Management (TDM).

  • Testers spend more time preparing test data than testing the application
    • This is probably the number one symptom or smell that warrants putting a TDM process and solution in place.
  • Testers depend heavily on Business Analysts to provide the required test data
    • This is also one of the top symptoms of the need for TDM.  Testers depend heavily on the Business Analysts for test data.

Saturday 16 February 2013

Data Masking in TDM

In my previous posts, I explained the Challenges in Production Cloning.  One of the major challenges in the Production Cloning approach is Data Security.  This post will focus on the solution for Data Security: Data Masking.


As already explained, Data Masking is the process of masking the sensitive fields in the complete data set.  The whole objective of data masking is to ensure that no sensitive data is leaked into non-production regions like the Dev and Testing regions.



What are the sensitive fields that need to be masked?  That basically depends on the project's needs.  But some of the generic fields that need to be masked are:


  • Personal information like first names, last names, email IDs, DOB, phone & fax numbers, SSNs, National Insurance numbers and other national unique identifiers.
  • In Banking, Financial Services & Insurance industry - Bank Balances, Account numbers, Credit card numbers, Policy numbers, etc.
  • In Healthcare industry - PHI attributes like Medical record numbers, Member IDs, etc.


This list is by no means exhaustive, but it gives a fair idea of how many fields are sensitive in nature and need to be handled with care.  A lapse in masking any of these fields might have a big impact on the organization as a whole.
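
To make the idea concrete, here is a minimal sketch of one possible masking technique: replacing each sensitive value with a substitute derived from a salted hash.  The field names, the `SENSITIVE_FIELDS` set, the salt and the helper functions are all assumptions made for illustration; a real project would choose its fields and masking rules from its own requirements.

```python
import hashlib

# Hypothetical set of fields treated as sensitive in this example.
SENSITIVE_FIELDS = {"first_name", "last_name", "email", "ssn"}


def mask_value(field, value, salt="demo-salt"):
    """Replace a sensitive value with a substitute derived from a salted hash."""
    digest = hashlib.sha256(f"{salt}:{field}:{value}".encode("utf-8")).hexdigest()
    if field == "email":
        # Keep the value looking like an email address.
        return f"user_{digest[:8]}@example.com"
    if field == "ssn":
        # Keep the NNN-NN-NNNN shape so format validations still pass.
        digits = str(int(digest[:12], 16)).zfill(9)[:9]
        return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
    return f"{field}_{digest[:8]}"


def mask_record(record):
    """Mask only the sensitive fields; leave all other fields untouched."""
    return {
        key: mask_value(key, value) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


# Made-up record used purely for demonstration.
record = {
    "first_name": "Alice",
    "last_name": "Smith",
    "email": "alice.smith@corp.test",
    "ssn": "123-45-6789",
    "city": "London",
}
masked = mask_record(record)
```

Because the substitute depends only on the salt, the field and the original value, the same input always maps to the same masked output, which helps keep related records consistent after masking.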

Challenges in Data Masking