<div><span style="font-family: Georgia, Times New Roman, serif; font-size: large;"><b>9 Reasons why TDM is critical to a project's success</b></span> <span style="font-family: Georgia, Times New Roman, serif;">(posted 2013-03-19)</span></div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous posts, I explained the building blocks of Test Data Management (TDM), namely <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a>, <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>, <a href="http://tdminsights.blogspot.in/2013/03/data-archive-in-test-data-management-tdm.html" target="_blank">Data Archive</a>, <a href="http://tdminsights.blogspot.in/2013/03/test-data-refresh-in-tdm.html" target="_blank">Test Data Refresh</a> and <a href="http://tdminsights.blogspot.in/2013/03/gold-copy-in-test-data-management-tdm.html" target="_blank">Gold Copy</a>. You can also read all the articles from the <a href="http://tdminsights.blogspot.in/p/table-of-contents_14.html" target="_blank">table of contents</a>. In this post, I will explain why TDM is critical to a project's success.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Your test data determines the quality of testing</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">No matter how good your testing processes are, if the test data is incorrect or of inadequate quality, the quality of the entire product will suffer.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Your test data should be highly secure</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">It is absolutely mandatory that your test data does not contain unmasked production data. If the data is not adequately secured, a data breach becomes a real risk, and a breach can cost the organization dearly.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Test data needs to be as close to production data as possible</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Test data must not only be of good quality; it should also be as close to real production data as possible. Why? Simply because we do not want to build a system, application or product for six months only to see it fail in production because there was not enough realistic data to test with.
<a name='more'></a></span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Lowers test data creation time, which reduces overall test execution time</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is fairly self-explanatory: the less time spent creating test data, the shorter the overall test execution time.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Testers can focus on testing rather than test data creation</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The main goal of automating the test data management process is to let testers focus on the actual testing rather than worrying about how the data is created and the technicalities surrounding it. This keeps the team focused on the job at hand (the actual testing), so that it can be done more effectively.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Speeds up time to market of applications</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Faster, more effective test data creation leads to faster, more effective testing, which in turn shortens the application's time to market. Because this repeats with every release, the benefit compounds release on release.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Increases efficiency of the process by reducing data-related defects</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Because the test data is accurate, data-related defects drop significantly, which increases the efficiency of the process.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>You can manage lower volumes of test data sets more efficiently</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Managing lower data volumes is almost always easier and more cost-effective than managing higher ones. The maintenance costs associated with large volumes grow over time and drive up operational costs.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>The process remains the same even as the team grows</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is a critical point: you do not need to reinvent the wheel when the team is ramped up. The same process can be followed, or extended, as the team grows.</span></li>
</ul>
</ul>
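<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The second point above, never letting unmasked production data into a test environment, is easier to picture with an example. Below is a minimal Python sketch of one common technique, deterministic masking with a keyed hash (HMAC-SHA256). It is not the method of any particular TDM product, and the key, the column names and the MASK_ prefix are hypothetical placeholders. Because the same input always produces the same token, masked values still match across tables that share them, and without the secret key the tokens cannot be reproduced by hashing guessed values.</span></div>

```python
import hashlib
import hmac

# Hypothetical masking key: in practice it would come from a secrets store,
# never from source code.
MASKING_KEY = b"replace-with-a-secret-key"

# Hypothetical sensitive columns of an illustrative customer table.
SENSITIVE_COLUMNS = {"name", "email", "ssn"}


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a sensitive value with a keyed, irreversible token.

    HMAC-SHA256 is deterministic, so the same input always yields the same
    token, which keeps joins between masked tables consistent.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "MASK_" + digest[:12]


def mask_row(row: dict) -> dict:
    """Return a new row with every sensitive column masked; the input is untouched."""
    return {
        column: mask_value(str(value)) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    production_row = {"id": 42, "name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789"}
    print(mask_row(production_row))
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Hashed tokens do not preserve the original format (a masked SSN no longer looks like an SSN), so where an application validates formats, a format-preserving masking technique would be needed instead.</span></div>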
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In this post, we discussed why Test Data Management is critical to a project's success. Quite often, test data management is not a priority for projects, and this hurts their efficiency in the long run. I hope this post helps show why it is so critical. Do you feel there are more points to add? Please feel free to comment. Thanks for reading.</span></div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i><b>About the Author</b></i></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" style="color: #888888; text-decoration: none;" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/" style="color: #888888; text-decoration: none;">Agile Blog</a>. Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
<div><span style="font-family: Georgia, Times New Roman, serif; font-size: large;"><b>Test Data Life Cycle</b></span> <span style="font-family: Georgia, Times New Roman, serif;">(posted 2013-03-13)</span></div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In the previous posts, I explained the various concepts surrounding test data creation and maintenance, namely <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a>, <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>, <a href="http://tdminsights.blogspot.in/2013/03/what-is-test-data-ageing-in-tdm.html" target="_blank">Test Data Ageing</a>, <a href="http://tdminsights.blogspot.in/2013/03/test-data-refresh-in-tdm.html" target="_blank">Test Data Refresh</a>, <a href="http://tdminsights.blogspot.in/2013/03/data-archive-in-test-data-management-tdm.html" target="_blank">Data Archive</a> and <a href="http://tdminsights.blogspot.in/2013/03/gold-copy-in-test-data-management-tdm.html" target="_blank">Gold Copy</a>. In this post, I will focus on the life cycle of test data.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">So what is meant by a life cycle? A life cycle is the series of stages that a product, service or artifact goes through before reaching its end of life. A Test Data Life Cycle, then, describes the stages that test data passes through before it either reaches its end of life or starts the cycle again.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Similar to a test life cycle or a software development life cycle, test data goes through the following phases:</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Requirement Gathering & Analysis</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This is fairly straightforward. In this phase, the test data requirements corresponding to the test requirements are gathered and categorized under the following heads:</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
</div>
<ul>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Pain Areas</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Data Sources</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Data Security/Masking</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Data Volume requirements</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Data Archival requirements</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Test Data Refresh considerations</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Gold Copy considerations</span></li>
</ul>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This phase is typically carried out in the form of a TDM assessment or Test Data Assessment. Since that topic requires separate attention, I will dedicate a blog post to it.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Planning & Design</span><br />
<a name='more'></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">As the name indicates, in this phase an appropriate solution is designed, based on the requirement analysis, to address the various test data pain areas. After weighing the scale of the problem against the feasible solutions, a suitable test data process is proposed, and we need to choose between an in-house solution, a commercial product, or a combination of both. An effort estimate for the entire project is also prepared in this phase. Finally, a test data plan (or strategy) is developed that sets the direction the project will take and the approaches that will be followed to solve its test data problems, either through process improvements or through an automated solution.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Test Data Creation</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In this phase, the solution is developed according to the test data strategy, and test data is created using techniques that depend on the project's test data requirements. This can be a combination of manual and automated techniques, where automation might rely on in-house tools or commercial products. The data can be refreshed from production, generated from scratch, or produced with a hybrid of the two. The output of this phase is the actual test data required for the project.</span></div>
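<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">For the "generation from scratch" option, a minimal Python sketch might look like the following. The customer entity, its fields and the value pools are hypothetical. The idea it illustrates is that seeding a dedicated random generator makes the generated data set reproducible, so the same test data can be recreated on demand rather than stored.</span></div>

```python
import random
from datetime import date, timedelta

# Hypothetical value pools for an illustrative "customer" entity.
FIRST_NAMES = ["Asha", "Ben", "Chen", "Diego", "Elena"]
CITIES = ["Chennai", "Pune", "London", "Austin"]
BIRTH_RANGE_START = date(1950, 1, 1)
BIRTH_RANGE_DAYS = 365 * 50  # roughly fifty years of possible birth dates


def generate_customers(count: int, seed: int = 2013) -> list:
    """Generate `count` synthetic customer rows from scratch.

    A dedicated, seeded random.Random instance makes the output reproducible
    and leaves the global random state untouched.
    """
    rng = random.Random(seed)
    customers = []
    for customer_id in range(1, count + 1):
        customers.append({
            "id": customer_id,
            "first_name": rng.choice(FIRST_NAMES),
            "city": rng.choice(CITIES),
            "birth_date": BIRTH_RANGE_START + timedelta(days=rng.randint(0, BIRTH_RANGE_DAYS)),
        })
    return customers


if __name__ == "__main__":
    for row in generate_customers(3):
        print(row)
```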
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Test Data Validation</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In this phase, the created test data is validated against the business requirements. This can be done manually by Business Analysts or, when the volumes are very high, with automated tools.</span></div>
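<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">When volumes are too high to check by hand, automated validation often takes the form of a set of rules evaluated against every row. The Python sketch below assumes a hypothetical customer record and three illustrative business rules; real rules would come from the business requirements gathered in the first phase.</span></div>

```python
import re
from typing import Callable, Dict, List, Tuple

# Deliberately simple e-mail shape check: something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical business rules for an illustrative customer record.
RULES: Dict[str, Callable[[dict], bool]] = {
    "id is present": lambda row: row.get("id") is not None,
    "age is between 18 and 120": lambda row: isinstance(row.get("age"), int) and 18 <= row["age"] <= 120,
    "email is well formed": lambda row: EMAIL_PATTERN.match(str(row.get("email") or "")) is not None,
}


def validate(rows: List[dict]) -> List[Tuple[int, str]]:
    """Return (row index, rule name) for every rule that a row violates."""
    failures = []
    for index, row in enumerate(rows):
        for rule_name, check in RULES.items():
            if not check(row):
                failures.append((index, rule_name))
    return failures


if __name__ == "__main__":
    sample = [
        {"id": 1, "age": 34, "email": "asha@example.com"},
        {"id": None, "age": 15, "email": "not-an-email"},
    ]
    for index, rule in validate(sample):
        print(f"row {index}: failed '{rule}'")
```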
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Test Data Maintenance</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This phase is similar to a test maintenance phase: changes in the tests may lead to requests for changes in the test data, so the entire life cycle is followed again to maintain the test data. Maintenance might include creating a Gold Copy for future use, archiving data for size management, updating the Gold Copy, and restoring older data for testing.</span></div>
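<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The phases, and the way maintenance feeds back into a new cycle, can be modelled as a simple state machine. This Python sketch is only an illustration of the cycle described in this post: the enum and transition table are hypothetical names, and they assume that each phase hands over to the next and that maintenance restarts the cycle at requirement gathering.</span></div>

```python
from enum import Enum


class Phase(Enum):
    """Phases of the Test Data Life Cycle described in this post."""
    REQUIREMENTS = "Requirement Gathering & Analysis"
    PLANNING = "Planning & Design"
    CREATION = "Test Data Creation"
    VALIDATION = "Test Data Validation"
    MAINTENANCE = "Test Data Maintenance"


# Assumed transitions: each phase hands over to the next one, and a change
# request raised during maintenance starts the cycle again at requirements.
NEXT_PHASE = {
    Phase.REQUIREMENTS: Phase.PLANNING,
    Phase.PLANNING: Phase.CREATION,
    Phase.CREATION: Phase.VALIDATION,
    Phase.VALIDATION: Phase.MAINTENANCE,
    Phase.MAINTENANCE: Phase.REQUIREMENTS,
}


def walk(start: Phase, steps: int) -> list:
    """Return the phases visited from `start` over `steps` transitions, including `start`."""
    phases = [start]
    for _ in range(steps):
        phases.append(NEXT_PHASE[phases[-1]])
    return phases


if __name__ == "__main__":
    for phase in walk(Phase.REQUIREMENTS, 5):
        print(phase.value)
```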
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The following figure depicts the Test Data Life Cycle.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpP1Bz0zB-x6fJV4uGLsVez-fDF6nwIH5B0WxJC0KJRIEzr5QdTTwZ-F7pWKFJlUBCw11y95DEKmt3hxQIYK1jHoa5WJbbcxQeSHK2Uutxj6H1qYQr6WaHH7AaPiMwivW2y85qYpLrdb0/s1600/Test+Data+Life+Cycle.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-left: 1em; text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpP1Bz0zB-x6fJV4uGLsVez-fDF6nwIH5B0WxJC0KJRIEzr5QdTTwZ-F7pWKFJlUBCw11y95DEKmt3hxQIYK1jHoa5WJbbcxQeSHK2Uutxj6H1qYQr6WaHH7AaPiMwivW2y85qYpLrdb0/s400/Test+Data+Life+Cycle.png" width="400" /></span></a></div>
<div style="clear: both;"></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope this gave you an idea of how test data flows through its life cycle and that you found it informative. Thanks for reading. Comments are welcome!</span></div>
<br /></div>
<div><span style="font-family: Georgia, Times New Roman, serif; font-size: large;"><b>Gold Copy in Test Data Management (TDM)</b></span> <span style="font-family: Georgia, Times New Roman, serif;">(posted 2013-03-09)</span></div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In the previous posts, we discussed <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a>, <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>, <a href="http://tdminsights.blogspot.in/2013/03/what-is-test-data-ageing-in-tdm.html" target="_blank">Test Data Ageing</a>, <a href="http://tdminsights.blogspot.in/2013/03/data-archive-in-test-data-management-tdm.html" target="_blank">Data Archive in TDM</a> and <a href="http://tdminsights.blogspot.in/2013/03/test-data-refresh-in-tdm.html" target="_blank">Test Data Refresh</a>. In this post, we will focus on what a Gold Copy is in TDM.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><span style="font-size: large;">So what is meant by Gold Copy?</span></span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;">A Gold Copy is a baseline version of the data that can be reused in future releases. For example, suppose you are loading your test database from the production database for the first time. You can save that copy as a baseline from which future test data refreshes can be made. The following picture depicts the concept of a Gold Copy.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKLXXA_ukKxfqK4BhngDT2B-LcKy5ertRnc_HgKeMQBNjZXYaf_Il7gg45xrU8ux3yRfqFt3SPh9yXwIwo-hrUT212rcEPfHGowDsOfXUe3mCMKyfLH0Uwwetq2TRlg-bxBZmig7OcIhE/s1600/Gold+Copy+-+Basics.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: justify;"><img alt="Gold Copy - Basics" border="0" height="246" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKLXXA_ukKxfqK4BhngDT2B-LcKy5ertRnc_HgKeMQBNjZXYaf_Il7gg45xrU8ux3yRfqFt3SPh9yXwIwo-hrUT212rcEPfHGowDsOfXUe3mCMKyfLH0Uwwetq2TRlg-bxBZmig7OcIhE/s400/Gold+Copy+-+Basics.png" title="Gold Copy" width="400" /></a></div>
<div style="clear: both;"></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Storage</span><br />
<a name='more'></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;">A Gold Copy can be stored either in databases or in archived file systems. The things to consider before storing these copies are the convenience of storage and the time taken to retrieve the data.</span></div>
<br />
<span style="font-family: Georgia, 'Times New Roman', serif; font-size: large; text-align: justify;">Benefits</span><br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Serves as a starting point/baseline for further processing</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This version serves as a baseline for all future test data refreshes.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Reusable</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The Gold Copy can be reused across multiple environments: if there is more than one environment, all of them can be refreshed from the Gold Copy.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Saves critical extract time from production</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">One of the critical challenges in Test Data Refresh is that the test database can only be refreshed by connecting to the production database. With a Gold Copy approach, you can greatly reduce this extract time, and you do so without disturbing the production database.</span></li>
</ul>
</ul>
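<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The refresh pattern behind these benefits can be sketched with Python's built-in sqlite3 module, whose Connection.backup method copies a whole database. SQLite merely stands in for whatever database a project actually uses, and the function names, table, file path and environment names (SIT, UAT) are hypothetical. The source is read once to create the gold copy; every later refresh of a test environment reads only from the gold copy, so the source is not touched again.</span></div>

```python
import sqlite3


def create_gold_copy(source: sqlite3.Connection, gold_path: str) -> None:
    """Save a one-time baseline (gold) copy of the source database to a file."""
    gold = sqlite3.connect(gold_path)
    try:
        source.backup(gold)  # copies every page of the source's main database
    finally:
        gold.close()


def refresh_from_gold_copy(gold_path: str, test_env: sqlite3.Connection) -> None:
    """Overwrite a test environment with the gold copy; the source is not read."""
    gold = sqlite3.connect(gold_path)
    try:
        gold.backup(test_env)  # replaces the existing contents of test_env
    finally:
        gold.close()


if __name__ == "__main__":
    import os
    import tempfile

    # Stand-in for the first load from production (hypothetical table).
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "MASK_a"), (2, "MASK_b")])
    source.commit()

    with tempfile.TemporaryDirectory() as workdir:
        gold_path = os.path.join(workdir, "gold_copy.db")
        create_gold_copy(source, gold_path)

        # Two hypothetical test environments refreshed from the same baseline.
        for name in ("SIT", "UAT"):
            env = sqlite3.connect(":memory:")
            refresh_from_gold_copy(gold_path, env)
            count = env.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
            print(name, count)
            env.close()
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Because the backup overwrites the destination, each refresh starts every environment from exactly the same baseline, whatever state the tests left it in.</span></div>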
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: large;">
<span style="font-family: Georgia, Times New Roman, serif;">Challenges</span></span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Storage Space</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A Gold Copy needs to be stored for future reference. If the data volumes are very high, the storage required grows proportionately, and so do your storage costs.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Difficult to create</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">It is quite difficult to come up with a Gold Copy, as the data might come from multiple sources and there might be a need to consolidate it into a common repository.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Need to maintain it</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A Gold Copy must not only be created but also maintained: it needs to be updated with the latest data and with any fresh data that was created. This is another challenge associated with a Gold Copy.</span></li>
</ul>
</ul>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Do you think there are any other challenges, or do you have an alternate viewpoint? Please feel free to comment. Thanks for reading.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com11tag:blogger.com,1999:blog-2871229621692677758.post-18258236574458241992013-03-07T11:15:00.003+05:302013-04-03T13:30:43.257+05:30Data Archive in Test Data Management (TDM)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In previous posts, I explained <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html">Data Subset</a>, <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>, <a href="http://tdminsights.blogspot.in/2013/03/what-is-test-data-ageing-in-tdm.html" target="_blank">Test Data Ageing</a> and <a href="http://tdminsights.blogspot.in/2013/03/test-data-refresh-in-tdm.html" target="_blank">Test Data Refresh</a>. In this post, we will focus on Data Archival and why it is important to the process of Test Data Management.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">What does Data Archival typically mean?</span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Size Management</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A database grows over time, so you need an efficient mechanism to actively manage its size.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Archival of older data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Older data can be archived to storage that takes up less disk space, and retrieved later whenever it is needed.</span></li>
</ul>
</ul>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Types of Archive Mechanisms</span><br />
<a name='more'></a></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Live Archive</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This mechanism archives live production data without disturbing the production database. It is typically used for production databases that hold a large volume of transaction data, not all of which is needed for routine processing. The data is archived, preferably to another database, so that the production database does not grow beyond manageable limits.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>File Based Archive</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In this mechanism, the data to be archived is converted into a proprietary file format (text or, preferably, binary) and stored for future use.</span></li>
</ul>
</ul>
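<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">As a rough illustration of a file-based archive (and of the compression challenge discussed next), the Python sketch below moves rows that match a condition out of a live table into a gzip-compressed JSON-lines file, and can restore them later. It is a minimal sketch assuming an SQLite database; the function names, the JSON-lines layout and gzip as the compression scheme are illustrative assumptions, not the format of any particular archival product.</span></div>

```python
import gzip
import json
import sqlite3


def archive_to_file(conn, table, where, path):
    """Move rows matching `where` from `table` into a gzip-compressed
    JSON-lines file. `table` and `where` are assumed to come from trusted
    configuration, since they are interpolated into the SQL text."""
    cur = conn.execute(f"SELECT * FROM {table} WHERE {where}")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    # Write the archive first, so rows are deleted only once they are on disk.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(dict(zip(columns, row))) + "\n")
    with conn:  # commit the delete on success, roll back on error
        conn.execute(f"DELETE FROM {table} WHERE {where}")
    return len(rows)


def restore_from_file(conn, table, path):
    """Re-insert every archived row from the compressed file into `table`."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as f, conn:
        for line in f:
            record = json.loads(line)
            cols = ", ".join(record)
            marks = ", ".join("?" for _ in record)
            conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                         list(record.values()))
            count += 1
    return count
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">For example, calling archive_to_file(conn, "orders", "order_date &lt; '2012-01-01'", "orders_2011.jsonl.gz") on a hypothetical orders table would move the pre-2012 orders out of the live table into a compressed file.</span></div>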
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Challenges in Data Archive</span></div>
</div>
<div>
<br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Security</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">Since data from the database is archived and stored in files, utmost care should be taken to secure the file contents.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Compression</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">Since one of the objectives is to reduce storage costs, effective compression algorithms need to be used to compress the data files.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Data sources</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">The data can come from multiple data sources and may need to be restored to multiple data targets. This must be taken into account when designing a data archival solution or choosing a commercial one.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Data Relationships (Yet again)</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">We generally archive only a portion of the data, depending on the need. Hence effective subsetting techniques should be used to preserve data integrity, and that integrity must also be preserved when the data is restored.</span></li>
</ul>
</ul>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Of these challenges, security does not apply to TDM because we are dealing with test data here. The other challenges, however, do apply.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
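<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">To illustrate the data relationships challenge, the sketch below selects a subset of parent rows (customers) together with every child row (order) that references them, and checks a set of rows for orphaned children before a restore. The record layout (id, customer_id, region) and both helper names are hypothetical, chosen only to show the idea.</span></div>

```python
def subset_with_children(customers, orders, keep):
    """Pick the customers for which keep(customer) is true, plus every order
    that references one of them, so the archived subset stays consistent."""
    chosen = [c for c in customers if keep(c)]
    chosen_ids = {c["id"] for c in chosen}
    children = [o for o in orders if o["customer_id"] in chosen_ids]
    return chosen, children


def orphaned_children(customers, orders):
    """Return the orders whose customer_id has no matching customer.
    An empty list means the parent/child relationship is intact."""
    ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in ids]
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Running the orphan check on both the archived subset and the restored data is one simple way to confirm that integrity survived the round trip.</span></div>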
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><span style="font-size: large;">Data Archive in the Scope of Test Data Management</span></span></div>
</div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Maintenance of test data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Archival is typically used to maintain test data across a series of releases.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Archival of older release data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">You can archive the test data of older releases so that it remains intact for future use.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Archival of multiple environment's test data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">If there are multiple test environments, the total test database size grows in proportion to the number of environments. In this case, archiving the data can save a lot of disk space.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Restore whenever necessary</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">An archive should be easily restorable. </span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Release/Build/Cycle wise snapshots for easy restore</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Snapshots can be maintained for each release, build or cycle of the project. This is useful for production support, where an older environment may be needed to test a production support release.</span></li>
</ul>
</ul>
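<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Release-wise snapshots can be sketched with the backup method of Python's built-in sqlite3 module: each snapshot is a full copy of the test database stored under a release label, and restoring copies it back over the working database. This is a minimal sketch assuming SQLite; the SnapshotCatalog class is hypothetical and keeps snapshots in memory purely for illustration, whereas a real project would persist them to files or separate databases.</span></div>

```python
import sqlite3


class SnapshotCatalog:
    """Hypothetical catalog of database snapshots keyed by release/build/cycle."""

    def __init__(self):
        self._snapshots = {}  # label -> sqlite3.Connection holding a full copy

    def take(self, label, conn):
        """Copy the whole working database into a new snapshot."""
        snap = sqlite3.connect(":memory:")
        conn.backup(snap)  # online backup: conn is the source, snap the target
        self._snapshots[label] = snap

    def restore(self, label, conn):
        """Overwrite the working database with the snapshot for `label`."""
        self._snapshots[label].backup(conn)

    def labels(self):
        return sorted(self._snapshots)
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Any pending changes on the working connection should be committed before calling take or restore.</span></div>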
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In summary, we have seen what data archival is, its challenges, and its application in Test Data Management. I hope you found the post useful.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Do you see any other challenges with data archival in the scope of Test Data Management? Please feel free to comment.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<br />
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i><b>About the Author</b></i></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" style="color: #888888; text-decoration: none;" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/" style="color: #888888; text-decoration: none;">Agile Blog</a>. </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com6tag:blogger.com,1999:blog-2871229621692677758.post-45521315403942083512013-03-04T01:02:00.001+05:302013-04-03T13:31:03.224+05:30What is Test Data Ageing in TDM?<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In previous posts, I explained Data Subset and Data Masking in TDM. In this post, we will focus on Test Data Ageing.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;">Test Data Ageing is useful for time-based testing. Suppose you create a customer, and the customer is activated only after 48 hours. What if you have to test the scenario that occurs after those 48 hours? Will you wait 48 hours for that scenario to happen? Of course not. So how do you handle this scenario?</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">There are basically two approaches:</span></div>
<br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Tamper with the system dates</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Although in some cases it is possible to change the system date and continue testing, this method fails if the date is generated by a database server or an application server rather than by the client.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Tamper with the dates in the backend</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is usually the most viable and practical solution for such scenarios. In this approach, we modify the stored dates in the backend so that they reflect the required date. However, care should be taken to preserve both data integrity and data semantics.</span></li>
</ul>
</ul>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This method of modifying dates according to the needs of a scenario is known as Test Data Ageing. </span><span style="font-family: Georgia, 'Times New Roman', serif;">Depending on the scenario to be tested, we can either back-date or forward-date a given date.</span></div>
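<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">A minimal sketch of backend ageing, assuming the dates are stored in SQLite as 'YYYY-MM-DD HH:MM:SS' text: the function below shifts a date column by a number of days using SQLite's built-in date modifiers, where a negative value back-dates and a positive value forward-dates. In the 48-hour activation example, back-dating a customer's creation date by 2 days makes that customer look old enough for the post-activation scenario. The function, table and column names are hypothetical.</span></div>

```python
import sqlite3


def age_column(conn, table, column, days, where="1=1"):
    """Shift a timestamp column by `days`: negative back-dates, positive
    forward-dates. Identifiers are assumed to be trusted, since they are
    interpolated into the SQL text; the day offset is passed as a parameter."""
    modifier = f"{days:+d} days"  # e.g. "-2 days" or "+30 days"
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"UPDATE {table} SET {column} = datetime({column}, ?) WHERE {where}",
            (modifier,),
        )
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">For example, age_column(conn, "customers", "created_at", -2, "id = 1") back-dates a single customer by two days. On its own this ignores related fields, which is exactly the challenge described below.</span></div>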
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Challenges</span><br />
<a name='more'></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;"><br /></span></div>
<div style="text-align: justify;">
</div>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Relationship among other date fields</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">More than one date field may be related. Assume there are two date fields, D1 and D2: if D1 is aged forward by 30 days, then D2 also needs to be aged by 30 days for the data as a whole to remain correct.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Relationship with non-date fields</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">Relationships might also exist between a date field and a non-date field. For example, a newly created customer might have two fields: a Creation Date and a Status flag indicating whether the customer is Pending or Activated. If the creation date is aged, the status field should also be modified accordingly.</span></li>
</ul>
</ul>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The following pictures depict the challenges in both scenarios.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisrmhqJlE3q7eHysL7saEvT3bmril0Y3bVDDw8Et6KX88DrgEYNviGCTxuYkh7X6PelJ70V8dGlmPmspjThRRbofiE5VKhxDdPmoo12UlKofrgFpSghWUH92WCHSRxO_RWHhfG04eoEQM/s1600/Aging+-+Scenario+1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: justify;"><img border="0" height="235" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisrmhqJlE3q7eHysL7saEvT3bmril0Y3bVDDw8Et6KX88DrgEYNviGCTxuYkh7X6PelJ70V8dGlmPmspjThRRbofiE5VKhxDdPmoo12UlKofrgFpSghWUH92WCHSRxO_RWHhfG04eoEQM/s400/Aging+-+Scenario+1.png" width="400" /></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqSW-QGFbB6Hkg8STY5gQ1Cjdce4rna-vqQYlf2ks4TFmI-FLoU2jx6_ZbPUTqd60C3EGdAM479PUgGvWfNQEn_5ITMFq-Wsx-7YxcWmZgGtpWjgfOiCcO7aF7LfiyxN47UfX3PBAfPg0/s1600/Aging+-+Scenario+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="236" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqSW-QGFbB6Hkg8STY5gQ1Cjdce4rna-vqQYlf2ks4TFmI-FLoU2jx6_ZbPUTqd60C3EGdAM479PUgGvWfNQEn_5ITMFq-Wsx-7YxcWmZgGtpWjgfOiCcO7aF7LfiyxN47UfX3PBAfPg0/s400/Aging+-+Scenario+2.png" width="400" /></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">These challenges can be overcome by defining rules that specify the relationships between the different data fields and the values that need to be set when a date is aged.</span></div>
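<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Such rules could be expressed in code roughly as follows. This sketch assumes a hypothetical Customer record with a creation date (D1), a related review date (D2) and a status flag. Rule 1 moves D2 by the same amount as D1, and rule 2 recomputes the status from the 48-hour activation delay used in the earlier example. The field names and the activation rule are assumptions for illustration.</span></div>

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

# Assumed business rule, taken from the 48-hour activation example.
ACTIVATION_DELAY = timedelta(hours=48)


@dataclass(frozen=True)
class Customer:
    created_at: datetime  # D1
    review_due: datetime  # D2, a hypothetical date related to D1
    status: str           # "Pending" or "Activated"


def age_customer(customer, delta, now):
    """Age D1 by `delta` and apply the relationship rules to the other fields."""
    created = customer.created_at + delta
    return replace(
        customer,
        created_at=created,
        # Rule 1: a related date field moves by the same delta as D1.
        review_due=customer.review_due + delta,
        # Rule 2: the status flag must agree with the aged creation date.
        status="Activated" if now - created >= ACTIVATION_DELAY else "Pending",
    )
```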
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">There are also different techniques for ageing the data:</span></div>
<div style="text-align: justify;">
</div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">General Calendar</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">This is based on normal calendar days. It is straightforward and is the most widely used ageing technique.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">Business Calendar</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">This is based on special business rules. For example, to test a payroll processing system, you might need test data that falls exactly on a pay day. In India, pay day is typically the last working day of the month, whereas in the US it is commonly biweekly. Rules can be formed from such differences, and the appropriate data ageing applied.</span></li>
</ul>
</ul>
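<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">A business calendar rule can be sketched with Python's standard calendar and datetime modules. The functions below compute the last working day of a month (the India example) and a biweekly pay-day schedule (the US example), and forward-date a given date to the next pay day. This is a sketch only: it treats Monday to Friday as working days, ignores public holidays, and the function names are hypothetical.</span></div>

```python
import calendar
from datetime import date, timedelta


def last_working_day(year, month):
    """Last Monday-to-Friday date of the month (public holidays ignored)."""
    d = date(year, month, calendar.monthrange(year, month)[1])
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def biweekly_paydays(first_payday, until):
    """Every 14th day starting at first_payday, up to and including `until`."""
    days = []
    d = first_payday
    while d <= until:
        days.append(d)
        d += timedelta(days=14)
    return days


def age_to_next_payday(d, paydays):
    """Forward-date `d` to the first pay day on or after it (None if none)."""
    return next((p for p in sorted(paydays) if p >= d), None)
```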
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope this post was informative. Thanks for reading. Are there any other ways in which Data Ageing can be performed? I would like to hear your comments.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
</div>
<div style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i><b>About the Author</b></i></div>
<div style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br /></div>
<div style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" style="color: #888888; text-decoration: none;" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/" style="color: #888888; text-decoration: none;">Agile Blog</a>.</i><i style="background-color: white;"> </i><i style="background-color: white;">Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com6tag:blogger.com,1999:blog-2871229621692677758.post-88158217598028433292013-03-02T00:39:00.001+05:302013-04-03T13:31:20.809+05:30Test Data Refresh in TDM<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous posts, I explained <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a> and <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>. In this post, we will focus on Test Data Refresh.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">So what is Test Data Refresh? It is the process of loading or refreshing the test database with the latest data from the production database or any other data source.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">What are the Challenges in Test Data Refresh?</span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Diverse data targets</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Much like production data sources, test data targets can span multiple databases and file systems, and the data needs to be kept in sync across all of these targets.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Test DB can already contain data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In an existing system, the test database will already contain older data, so it must be overwritten carefully without losing data integrity.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Test DB downtime</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">During a test data refresh, it might be necessary to bring down the test environment to accommodate the refresh.</span></li>
</ul>
</ul>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif; font-size: large;">Types of Refresh</span><br />
<a name='more'></a></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>From Scratch</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In this approach, the database schema is created and then the data is loaded into it. This is useful when creating a test database for the first time or setting up a new test environment.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Complete Refresh</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In this approach, the test DB is assumed to already contain data. This data is wiped out completely and the new data is loaded.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Partial Refresh</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In this approach, only a subset of the data in the test DB is refreshed rather than all of it. This is useful when only one or a few modules of the application/product are to be tested.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Incremental refresh</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In this approach, the existing data in the test database is left untouched and the new data is appended to it. This method is useful when the old data cannot be removed for some reason.</span></li>
</ul>
</ul>
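<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The complete, partial and incremental refresh types can be sketched in Python with sqlite3 by copying one table from a source connection to a test connection. This is a minimal sketch assuming both databases already share the same schema and the table has a primary key, which INSERT OR IGNORE relies on in the incremental case. The function names are hypothetical, and each refresh runs in a single transaction so that a failure leaves the test data unchanged.</span></div>

```python
import sqlite3


def _copy_rows(src, tgt, table, where="1=1", verb="INSERT"):
    """Copy rows matching `where` from src to tgt. Table names and filters
    are assumed to be trusted, since they are interpolated into the SQL."""
    cur = src.execute(f"SELECT * FROM {table} WHERE {where}")
    marks = ", ".join("?" * len(cur.description))  # one placeholder per column
    tgt.executemany(f"{verb} INTO {table} VALUES ({marks})", cur.fetchall())


def complete_refresh(src, tgt, table):
    """Wipe the test table and reload it entirely from the source."""
    with tgt:  # single transaction: commit on success, roll back on error
        tgt.execute(f"DELETE FROM {table}")
        _copy_rows(src, tgt, table)


def partial_refresh(src, tgt, table, where):
    """Refresh only the subset of rows matching `where` (e.g. one module)."""
    with tgt:
        tgt.execute(f"DELETE FROM {table} WHERE {where}")
        _copy_rows(src, tgt, table, where)


def incremental_refresh(src, tgt, table):
    """Append source rows whose primary key is not already in the test table,
    leaving the existing test rows untouched."""
    with tgt:
        _copy_rows(src, tgt, table, verb="INSERT OR IGNORE")
```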
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope the post was informative. Thanks for reading. Comments are welcome.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<i style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;"><b>About the Author</b></i><br />
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" style="color: #888888; text-decoration: initial;" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/" style="color: #888888; text-decoration: initial;">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com3tag:blogger.com,1999:blog-2871229621692677758.post-24863306284427234812013-02-27T10:47:00.003+05:302013-04-03T13:31:40.111+05:30Commonly Used Data Masking Techniques - TDM<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous posts, I discussed <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a> and <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>. In this post, I will discuss widely used data masking techniques. The list is by no means exhaustive, but it provides a general idea of the techniques that are available.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Random Substitution</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In this technique, the value to be masked is replaced with a random value. Depending on the nature of the random value, this technique can be further categorized into:</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Random Numbers</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Random Dates</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Random Seed Values, for example:</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Names</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Addresses</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">SSN Numbers</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Credit Card Numbers</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Telephone numbers</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">And a lot more</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Random Alphanumerics<a name='more'></a></span></li>
</ul>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Algorithmic Substitution</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Even when random substitution is used, certain fields must follow specific algorithms or formats. For example, a credit card number must satisfy the mod-10 (Luhn) check, and an SSN must have exactly 9 digits in the format AAA-GG-RRRR, where:</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A -> Area number (historically assigned by geographic location within the US)</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">G -> Group number</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">R -> Serial number</span></li>
</ul>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Sequence</span></li>
<ul>
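<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This technique replaces the original values with a generated sequence of values, for example, incrementing customer codes such as CUST0001, CUST0002 and so on.</span></li>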
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Selective Mask</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This technique masks only a selected portion of the data, for example, altering only the domain name of an email address.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Nulling</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This technique replaces the column's values with NULL in the database.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Blurring</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This technique adds a random variance to the existing values. It is mostly used for numeric fields to produce variations of the same data, for example, replacing each salary with a random value between 80% and 120% of its current value.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Custom Rules / Expressions</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Certain fields are more complicated to mask than others and may need custom rules or expressions. For example, a bank might use the following format for its customer account numbers: BBB-LLLLLL-AAAA, where:</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">B -> Bank Unique Code</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">L -> Location / Branch code of the bank</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A -> Account number</span></li>
</ul>
</ul>
</ul>
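<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">To make the simpler techniques above concrete, here is a minimal Python sketch that masks one in-memory record with random substitution (numbers, dates, seed values, alphanumerics), sequence, selective mask, nulling and blurring. It is an illustration under assumptions, not how any particular masking tool works: the record layout, the seed list FIRST_NAMES, the helper names and the masked.example domain are all hypothetical, and a real solution would apply such rules column by column across whole tables.</span></div>

```python
import random
import string
from datetime import date, timedelta

# Hypothetical seed list used for random seed-value substitution.
FIRST_NAMES = ["Alice", "Bala", "Carlos", "Deepa", "Erik"]


def random_number(low: int, high: int, rng: random.Random) -> int:
    """Random numbers: a random integer between low and high (inclusive)."""
    return rng.randint(low, high)


def random_date(start: date, end: date, rng: random.Random) -> date:
    """Random dates: a random date between start and end (inclusive)."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def random_alphanumeric(length: int, rng: random.Random) -> str:
    """Random alphanumerics: uppercase letters and digits."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def sequence(prefix: str, start: int = 1, width: int = 4):
    """Sequence: yields PREFIX0001, PREFIX0002, ... (zero-padded to width digits)."""
    n = start
    while True:
        yield f"{prefix}{n:0{width}d}"
        n += 1


def mask_email_domain(email: str, new_domain: str = "masked.example") -> str:
    """Selective mask: keep the part before '@' and replace only the domain."""
    local, _, _ = email.partition("@")
    return f"{local}@{new_domain}"


def blur(value: float, rng: random.Random, low: float = 0.8, high: float = 1.2) -> float:
    """Blurring: multiply by a random factor between low and high (80%-120% by default)."""
    return round(value * rng.uniform(low, high), 2)


def mask_record(record: dict, ids, rng: random.Random) -> dict:
    """Apply one technique per column of a hypothetical customer record."""
    return {
        "cust_code": next(ids),                                   # sequence
        "first_name": rng.choice(FIRST_NAMES),                    # random seed value
        "loyalty_points": random_number(0, 5000, rng),            # random number
        "birth_date": random_date(date(1950, 1, 1), date(2000, 12, 31), rng),  # random date
        "ref_no": random_alphanumeric(8, rng),                    # random alphanumerics
        "email": mask_email_domain(record["email"]),              # selective mask
        "salary": blur(record["salary"], rng),                    # blurring
        "notes": None,                                            # nulling
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    ids = sequence("CUST")
    original = {"email": "john.smith@corp.com", "salary": 50000.0, "notes": "VIP"}
    print(mask_record(original, ids, rng))
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Passing a seeded random.Random makes the masked output repeatable between runs, which helps when the same test data must be regenerated; leave the seed out when repeatability is not wanted.</span></div>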
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The technique of generating meaningful values for masked data is known as intelligent masking, and it is widely used in today's data masking solutions.</span></div>
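<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The algorithmic substitution and custom rule examples can be sketched in Python as follows. This is a minimal illustration, not the method of any particular product: it generates random card numbers that pass the mod-10 (Luhn) check, random SSNs in the AAA-GG-RRRR layout that avoid values the US Social Security Administration never issues (area 000, 666 or 900-999, group 00, serial 0000), and masked account numbers in the BBB-LLLLLL-AAAA layout that keep the bank and branch codes and randomize only the account part. The function names, the 16-digit length and the default card prefix "4" are assumptions.</span></div>

```python
import random


def luhn_check_digit(payload: str) -> str:
    """Return the mod-10 (Luhn) check digit to append to a string of digits."""
    total = 0
    # Walking right to left, the digit next to the (future) check digit is
    # doubled, then every second digit after it; doubled values above 9 lose 9.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_luhn_valid(number: str) -> bool:
    """True if the last digit is the correct Luhn check digit for the rest."""
    return len(number) > 1 and number.isdigit() and luhn_check_digit(number[:-1]) == number[-1]


def random_card_number(rng: random.Random, prefix: str = "4", length: int = 16) -> str:
    """Algorithmic substitution: random digits plus a valid Luhn check digit."""
    body = prefix + "".join(str(rng.randint(0, 9)) for _ in range(length - len(prefix) - 1))
    return body + luhn_check_digit(body)


def random_ssn(rng: random.Random) -> str:
    """Algorithmic substitution: AAA-GG-RRRR, avoiding never-issued values."""
    area = rng.randint(1, 899)      # area 000 and 900-999 are never issued
    while area == 666:              # neither is area 666
        area = rng.randint(1, 899)
    group = rng.randint(1, 99)      # group 00 is never issued
    serial = rng.randint(1, 9999)   # serial 0000 is never issued
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_account(account: str, rng: random.Random) -> str:
    """Custom rule for BBB-LLLLLL-AAAA: keep bank and branch codes, randomize the account part."""
    bank, branch, acct = account.split("-")
    if (len(bank), len(branch), len(acct)) != (3, 6, 4):
        raise ValueError(f"not in BBB-LLLLLL-AAAA format: {account!r}")
    return f"{bank}-{branch}-{rng.randint(0, 9999):04d}"


if __name__ == "__main__":
    rng = random.Random(7)
    card = random_card_number(rng)
    print(card, is_luhn_valid(card))
    print(random_ssn(rng))
    print(mask_account("123-456789-0001", rng))
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Format-valid values can still coincide with real card numbers or SSNs by chance, so a production masking rule may need an additional check on the generated values.</span></div>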
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;">Hope this post was informative. Please feel free to comment. Thanks for the read.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<br />
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<br />
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i><b>About the Author</b></i></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" style="color: #888888; text-decoration: initial;" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/" style="color: #888888; text-decoration: initial;">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com12tag:blogger.com,1999:blog-2871229621692677758.post-30693815102175072512013-02-22T10:02:00.003+05:302013-04-03T13:32:06.205+05:30Techniques for Data Subset <div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous posts, I explained <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset in TDM</a> and <a href="http://tdminsights.blogspot.in/2013/02/implementation-approaches-to-data-sub.html" target="_blank">Implementation Approaches to Data Sub-setting</a>. In this post, I will explain some basic techniques for creating a data subset.</span></div>
<div style="text-align: justify;">
<br /></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>First N records</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is a pretty simple technique in which the first N records of a table are retrieved from the production database. It can be achieved with a simple SQL query such as the one below. Note that without an ORDER BY clause, SQL Server does not guarantee which N rows are returned.</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><i>SELECT TOP 10000 * FROM DBO.CUSTOMERS</i></span></li>
</ul>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Based on filter criteria</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Here, the subset conditions are based on simple filter criteria such as Age > 50, City = London, etc. This technique is easy to implement when the subset requirements are not complicated. An example query would be:</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><i>SELECT * FROM DBO.CUSTOMERS WHERE AGE > 50 AND CITY = 'LONDON'</i></span></li>
</ul>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Based on a complex SQL query</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Sometimes the subset requirements are more complicated and involve dependencies across multiple tables, which in SQL terms <a name='more'></a>means multi-level queries and complex joins. In such cases, the subset is based on a complex SQL query. For example, "fetch all the customers who placed orders during the Christmas season" translates into a query that involves both the Customers and the Orders tables, such as:</span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><i>SELECT * FROM CUSTOMERS WHERE CUSTID IN (SELECT DISTINCT ORD_CUSTID FROM ORDERS WHERE ORDER_DATE >= '20121201' AND ORDER_DATE < '20130101')</i></span></li>
</ul>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Sampling / Distribution based</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Sampling is the process of picking a sample of the records, for example, every 5000th record, or picking records based on the distribution of certain fields, for example, 10% of the records with London as the city, 50% with Chennai and 40% with Mumbai. This is useful when specific sets of test data are needed in larger quantities. There is no straightforward SQL query for this technique, which makes it more complicated than the others.</span></li>
</ul>
</ul>
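<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Since sampling has no single straightforward SQL query, here is a minimal Python sketch of both variants described above: systematic sampling (every Nth record) and distribution-based sampling by city. The rows are plain dictionaries standing in for a query result, and the function names and data are hypothetical. Inside SQL Server, similar logic could be written with a window function such as ROW_NUMBER() OVER (PARTITION BY city ORDER BY NEWID()), which is not shown here.</span></div>

```python
import random
from collections import defaultdict


def every_nth(rows: list, n: int) -> list:
    """Systematic sampling: keep the nth, 2nth, 3nth, ... row."""
    return rows[n - 1::n]


def sample_by_distribution(rows: list, field: str, shares: dict, total: int,
                           rng: random.Random) -> list:
    """Distribution-based sampling.

    shares maps a value of `field` to its fraction of the subset, e.g.
    {"London": 0.1, "Chennai": 0.5, "Mumbai": 0.4}; `total` is the subset size.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)

    subset = []
    for value, share in shares.items():
        wanted = round(total * share)
        pool = groups.get(value, [])
        if wanted > len(pool):
            raise ValueError(f"need {wanted} rows with {field}={value!r}, found {len(pool)}")
        subset.extend(rng.sample(pool, wanted))  # random pick within each group
    return subset


if __name__ == "__main__":
    # Hypothetical source rows standing in for a Customers query result.
    cities = ["London"] * 50 + ["Chennai"] * 100 + ["Mumbai"] * 100
    rows = [{"custid": i, "city": c} for i, c in enumerate(cities, start=1)]
    print([r["custid"] for r in every_nth(rows, 50)])
    shares = {"London": 0.1, "Chennai": 0.5, "Mumbai": 0.4}
    subset = sample_by_distribution(rows, "city", shares, 100, random.Random(1))
    print({c: sum(r["city"] == c for r in subset) for c in shares})
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">For the every-5000th-record example, every_nth(rows, 5000) would be used. Because each city's count is rounded separately, the group sizes may not add up exactly to the requested total for every mix of shares.</span></div>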
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Note that the condition applied to the parent table must be cascaded down to the child tables as well. This is a critical requirement for any data subset, as it ensures data integrity across all the tables.</span></div>
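<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The cascading rule lends itself to automation. The following minimal Python sketch, assuming a hypothetical three-level schema (customers, orders, order_items) and a hand-written foreign-key map, derives every table's subset query from the single condition applied to the root table by nesting IN subqueries. It uses the standard-library sqlite3 module only so the example runs; a real tool would typically read the relationships from the database catalog, and the condition string is treated as trusted input.</span></div>

```python
import sqlite3

# Hypothetical foreign-key map: child table -> (child column, parent table, parent column).
RELATIONS = {
    "orders": ("order_custid", "customers", "custid"),
    "order_items": ("item_orderid", "orders", "orderid"),
}


def subset_query(table: str, root_table: str, root_condition: str, columns: str = "*") -> str:
    """Build a table's subset query by cascading the root condition down the FK chain."""
    if table == root_table:
        return f"SELECT {columns} FROM {table} WHERE {root_condition}"
    child_col, parent, parent_col = RELATIONS[table]
    # The parent's own subset, reduced to its key column, filters the child.
    parent_keys = subset_query(parent, root_table, root_condition, columns=parent_col)
    return f"SELECT {columns} FROM {table} WHERE {child_col} IN ({parent_keys})"


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with hypothetical rows: one order and one item per customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (custid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (orderid INTEGER PRIMARY KEY, order_custid INTEGER);
        CREATE TABLE order_items (itemid INTEGER PRIMARY KEY, item_orderid INTEGER);
    """)
    with conn:
        conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"C{i}") for i in range(1, 7)])
        conn.executemany("INSERT INTO orders VALUES (?, ?)", [(100 + i, i) for i in range(1, 7)])
        conn.executemany("INSERT INTO order_items VALUES (?, ?)", [(1000 + i, 100 + i) for i in range(1, 7)])
    return conn


if __name__ == "__main__":
    condition = "custid % 2 = 1"  # change it once; every child query follows
    conn = build_demo_db()
    for table in ("customers", "orders", "order_items"):
        sql = subset_query(table, "customers", condition)
        print(sql)
        print(conn.execute(sql).fetchall())
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Because each child query is generated from the root condition, changing that condition once updates every child query, which avoids one of the maintenance problems of hand-written subset scripts.</span></div>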
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b><i>NOTE: The examples assume SQL Server as the RDBMS. The concepts are similar for other RDBMSs, but the syntax may vary slightly (for example, MySQL and PostgreSQL use LIMIT instead of TOP).</i></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope this post was informative. Thanks for reading. Comments are welcome; please feel free to add other techniques or debate the ones mentioned above. Cheers.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<i><b>About the Author</b></i></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" style="color: #888888; text-decoration: initial;" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/" style="color: #888888; text-decoration: initial;">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com1tag:blogger.com,1999:blog-2871229621692677758.post-24716831476452058532013-02-20T11:40:00.003+05:302013-04-03T13:32:30.205+05:30Implementation Approaches to Data Sub-setting<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In one of my previous posts, I described the process of <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a>. In this post, we will focus on the approaches to implementing data sub-setting.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">There are three broad approaches to implementing sub-setting.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">SQL Query based approach</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In this approach, we use SQL queries to fetch a subset of the production data and load it into the target environment. Let's assume you have two tables in production from which you need to create a small subset. The following diagram shows the Customers and Orders tables, which are related through the custid field. </span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdbBng6nwZ_62FVrAoqfO7wBIew-oM58Xe1Ib9hoTehMrkcsmxFtXkxo9_pPhVv-SE8n20PCghxqixXscTuLmpWx2SWt9MlkVS7RiAukK6_bSNRbtsI_QB6i-2hAo1WRt20UdjU-4ECto/s1600/ERD+++Data.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdbBng6nwZ_62FVrAoqfO7wBIew-oM58Xe1Ib9hoTehMrkcsmxFtXkxo9_pPhVv-SE8n20PCghxqixXscTuLmpWx2SWt9MlkVS7RiAukK6_bSNRbtsI_QB6i-2hAo1WRt20UdjU-4ECto/s400/ERD+++Data.png" width="400" /></span></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"></span><br />
<span style="font-family: Georgia, 'Times New Roman', serif;">The picture also shows the sample data in those tables, which we now need to subset. First, we decide on a subset condition. Let's assume we pull out only the customers whose IDs are odd numbers. A simple query does the trick; the following is the query for the Customers table.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;"></span><br />
<a name='more'></a></div>
<br />
<div style="text-align: justify;">
<i><span style="font-family: Georgia, Times New Roman, serif;">SELECT * FROM [TDMMock].[dbo].[Customers] where custid % 2 = 1</span></i></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This query returns the customers whose IDs are odd numbers, so in our case custids 1, 3, 5, 7 and 9 will be in the result set. The catch is that the Orders table must also contain only the orders belonging to custids 1, 3, 5, 7 and 9. How do we do that? SQL again comes to the rescue. The following is the query for the Orders table.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<br />
<div style="text-align: justify;">
<i><span style="font-family: Georgia, Times New Roman, serif;">SELECT * FROM [TDMMock].[dbo].[Orders]</span></i></div>
<div style="text-align: justify;">
<i><span style="font-family: Georgia, Times New Roman, serif;">WHERE order_custid in</span></i></div>
<div style="text-align: justify;">
<i><span style="font-family: Georgia, Times New Roman, serif;">(SELECT custid FROM [TDMMock].[dbo].[Customers] WHERE custid % 2 = 1)</span></i></div>
<br />
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This simple query fetches only the orders that satisfy the parent customer's subset condition. The same concept applies no matter how many tables are involved, and it</span><span style="font-family: Georgia, 'Times New Roman', serif;"> forms the basic building block of any data subset solution or algorithm. </span><span style="font-family: Georgia, 'Times New Roman', serif;">The following picture shows the queries and the corresponding result sets. </span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTIV6f1PIwJPbMB64vI11R25VRLVnMcZJpliS-gUADBEvUKga3k6muwOYboEuXGCG2eEliWxtxPE2g9xXB5kox6NAPGlREZbDiEhsTmjWp1DQAbvBf947zcj2BM2dc03o1uZ_pZKgfxAE/s1600/Subset+-+Results.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="193" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTIV6f1PIwJPbMB64vI11R25VRLVnMcZJpliS-gUADBEvUKga3k6muwOYboEuXGCG2eEliWxtxPE2g9xXB5kox6NAPGlREZbDiEhsTmjWp1DQAbvBf947zcj2BM2dc03o1uZ_pZKgfxAE/s400/Subset+-+Results.png" width="400" /></span></a></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">As you can see, the data in the Customers and Orders tables is in sync, containing only custids 1, 3, 5, 7 and 9. This is a valid subset.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The bottom line: </span></div>
<br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">We now have a valid subset of both the Customers and Orders tables</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Data integrity is intact, as the foreign key values are still in sync</span></li>
</ul>
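<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The Customers and Orders example can be reproduced and checked with a short script. The following minimal Python sketch runs the two queries above (without the SQL Server three-part names) against an in-memory SQLite database that holds hypothetical sample rows, not the data from the picture, and then counts orphaned orders with a LEFT JOIN, which is one common way to verify that a child subset is still in sync with its parent.</span></div>

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical Customers/Orders rows: custids 1-10, one order each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (custid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, order_custid INTEGER);
    """)
    with conn:
        conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                         [(i, f"Customer {i}") for i in range(1, 11)])
        conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                         [(100 + i, i) for i in range(1, 11)])
    return conn


# The two subset queries from this post, minus the SQL Server three-part names.
CUSTOMER_SUBSET = "SELECT * FROM Customers WHERE custid % 2 = 1"
ORDER_SUBSET = ("SELECT * FROM Orders WHERE order_custid IN "
                "(SELECT custid FROM Customers WHERE custid % 2 = 1)")


def count_orphans(conn: sqlite3.Connection, customers_sql: str, orders_sql: str) -> int:
    """Count subset orders whose customer is missing from the customer subset."""
    sql = f"""
        SELECT COUNT(*)
        FROM ({orders_sql}) AS o
        LEFT JOIN ({customers_sql}) AS c ON c.custid = o.order_custid
        WHERE c.custid IS NULL
    """
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    conn = build_sample_db()
    print("orphans in the cascaded subset:",
          count_orphans(conn, CUSTOMER_SUBSET, ORDER_SUBSET))
    print("orphans if Orders is copied unfiltered:",
          count_orphans(conn, CUSTOMER_SUBSET, "SELECT * FROM Orders"))
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The second count shows what happens when the Orders table is copied without cascading the parent condition: the orders of the even-numbered customers lose their parent rows.</span></div>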
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Thus, with simple SQL queries (and, in a real-world project, more complex ones), we can do effective sub-setting. However, this approach has a few technical challenges, which I will probably detail in another post.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<b><span style="font-family: Georgia, Times New Roman, serif;">Pros:</span></b></div>
<div style="text-align: justify;">
</div>
<ol>
<li><span style="font-family: Georgia, Times New Roman, serif;">Quick to build, especially for people with database knowledge</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Easy for DBAs to understand</span></li>
</ol>
<div style="text-align: justify;">
<b><span style="font-family: Georgia, Times New Roman, serif;">Cons:</span></b></div>
<div style="text-align: justify;">
</div>
<ol>
<li><span style="font-family: Georgia, Times New Roman, serif;">Difficult to maintain as requirements change. A change in the subset criteria of a parent table affects all the dependent child queries as well.</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Requires knowledge of the data model and a good working knowledge of SQL, especially in medium to large projects.</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Becomes more complicated with multiple data sources.</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">The queries need to be optimized; otherwise, they can hurt performance.</span></li>
</ol>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Custom solution approach</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In this approach, a custom solution is built to suit the project's needs. The technology depends on the project requirements and the team's expertise. For example, a Java or .NET based UI can be built on top of the SQL queries so that users need not worry about how the queries are constructed.</span></div>
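<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">At its core, a custom sub-setting solution runs the subset queries against the source and loads the results into the target in an order that respects the foreign keys. The following minimal Python sketch shows only that core, under several assumptions: both databases are SQLite (via the standard-library sqlite3 module) with identical schemas, and the plan is a hypothetical hand-written list of (table, query) pairs ordered parents first. A UI like the one described above would sit on top of logic such as extract_subset.</span></div>

```python
import sqlite3

SCHEMA = """
CREATE TABLE Customers (custid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Orders (
    orderid INTEGER PRIMARY KEY,
    order_custid INTEGER NOT NULL REFERENCES Customers(custid)
);
"""

# Hypothetical subset plan: tables listed parents first, each with its subset query.
PLAN = [
    ("Customers", "SELECT * FROM Customers WHERE custid % 2 = 1"),
    ("Orders", "SELECT * FROM Orders WHERE order_custid IN "
               "(SELECT custid FROM Customers WHERE custid % 2 = 1)"),
]


def make_db(with_data: bool) -> sqlite3.Connection:
    """Create an in-memory database; SQLite enforces FKs only when the pragma is on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    if with_data:
        with conn:
            conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                             [(i, f"Customer {i}") for i in range(1, 11)])
            conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                             [(100 + i, i) for i in range(1, 11)])
    return conn


def extract_subset(source: sqlite3.Connection, target: sqlite3.Connection, plan) -> dict:
    """Run each subset query on the source and load its rows into the target.

    Tables are loaded in plan order inside one transaction, so a failure
    (e.g. a foreign-key violation) rolls back everything loaded so far.
    """
    loaded = {}
    with target:  # commit on success, roll back on error
        for table, query in plan:
            cursor = source.execute(query)
            rows = cursor.fetchall()
            if rows:
                placeholders = ", ".join("?" for _ in cursor.description)
                target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
            loaded[table] = len(rows)
    return loaded


if __name__ == "__main__":
    source, target = make_db(with_data=True), make_db(with_data=False)
    print(extract_subset(source, target, PLAN))
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Loading the tables children first makes the target reject the Orders rows with a foreign-key error, which is why the plan order matters; since the whole load runs in one transaction, such a failure leaves the target unchanged.</span></div>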
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<b><span style="font-family: Georgia, Times New Roman, serif;">Pros:</span></b></div>
<div style="text-align: justify;">
</div>
<ol>
<li><span style="font-family: Georgia, Times New Roman, serif;">Can be maintained like a regular development project</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Can be cost-effective for small to medium-sized projects</span></li>
</ol>
<div style="text-align: justify;">
<b><span style="font-family: Georgia, Times New Roman, serif;">Cons:</span></b></div>
<div style="text-align: justify;">
</div>
<ol>
<li><span style="font-family: Georgia, Times New Roman, serif;">Takes time to build, depending on the nature of the project</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Needs both database and development knowledge</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Ongoing costs of maintaining the solution</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Optimization and performance need attention; otherwise, data load times can suffer.</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Becomes more complicated with multiple data sources</span></li>
</ol>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Commercial Tool based approach</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Maintaining SQL scripts or a custom solution has its own drawbacks. Hence, rather than reinventing the wheel, the subset feature of a commercial TDM tool can be used for the project's sub-setting needs. There are many such tools on the market, and tools such as Informatica ILM TDM, IBM InfoSphere Optim and GridTools DataMaker are the market leaders in this space. They are optimized for high performance, are feature rich and support multiple data sources.</span></div>
<br />
<div style="text-align: justify;">
<b><span style="font-family: Georgia, Times New Roman, serif;">Pros:</span></b></div>
<div style="text-align: justify;">
<ol>
<li><span style="font-family: Georgia, Times New Roman, serif;">Scalable, robust and proven solutions</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Support for a large number of data sources and platforms</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Feature rich</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Support model available</span></li>
</ol>
</div>
<div style="text-align: justify;">
<b><span style="font-family: Georgia, Times New Roman, serif;">Cons:</span></b></div>
<div style="text-align: justify;">
</div>
<ol>
<li><span style="font-family: Georgia, Times New Roman, serif;">Usually associated with high costs</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Learning curves can be steep for some of the tools.</span></li>
</ol>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">As we saw, there are different implementation approaches, and each has its own pros and cons. All of these factors need to be taken into consideration before deciding on an approach.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Thanks for reading. I hope the information in this post was useful. Comments are welcome. If you have any questions or alternative approaches, please feel free to add a comment.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<i><b>About the Author</b></i></div>
<div style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<br /></div>
<div style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/">Agile Blog</a>.</i><i style="background-color: white;"> </i><i style="background-color: white;">Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com3tag:blogger.com,1999:blog-2871229621692677758.post-37439255282587579052013-02-19T00:30:00.001+05:302013-04-03T13:32:41.026+05:30Top smells that indicate that your project needs TDM<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous posts, I explained the basics of <a href="http://tdminsights.blogspot.in/2013/02/how-to-create-test-data.html" target="_blank">Test Data Creation</a>, <a href="http://tdminsights.blogspot.in/2013/02/challenges-in-production-cloning.html" target="_blank">Challenges in Production Cloning</a>, <a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset</a> and <a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking</a>. In this post, we will take a slightly different direction.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Almost every problem has symptoms, which in today's Agile parlance we call smells. This post focuses on the typical smells that indicate your project needs Test Data Management (TDM).</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Testers spend more time preparing test data than testing the application</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is probably the number one smell that warrants putting a TDM process and solution in place.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Testers depend heavily on Business Analysts to provide the required test data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is another top symptom of the need for TDM: testers rely heavily on the Business Analysts for test data.<a name='more'></a></span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Testing deadlines have slipped more than once due to delays in test data refresh</b></span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Many false defects due to data-related issues</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Many defects get raised and are then rejected with invalid data as the reason. This is a clear alarm that data-related false defects are growing and need to be eliminated.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Testers often complain that creating test data is a complicated and very time-consuming process</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is another top symptom that the test data creation process needs attention.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Test database is as voluminous as the Production database and it hinders performance</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is a sure shot indication that the project needs test data attention</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>There is no reuse of test data</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">For every release, the same process and many repetitive steps have to be followed to create test data, indicating a lack of test data reuse. This is another critical indication.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>There is heavy dependency on upstream systems for the test data to be created.</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">This is a critical indication. Significant delays can occur while waiting for another system to provide the test data.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>As the project grows, people complain that it is getting difficult to manage the test data.</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">There can be too many points of contact for getting the test data, too many data sources, test data scattered all over the place, etc. Combined, these form a clear indication that the project needs TDM.</span></li>
</ul>
</ul>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope the information in this post was useful. Thanks for reading.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;">
<br />
<div>
<i><b>About the Author</b></i></div>
<div>
<br /></div>
<div>
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com1tag:blogger.com,1999:blog-2871229621692677758.post-8319297315399130142013-02-16T22:31:00.000+05:302013-05-04T23:34:07.530+05:30Data Masking in TDM<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous post, I explained the <a href="http://tdminsights.blogspot.in/2013/02/challenges-in-production-cloning.html" target="_blank">Challenges in Production Cloning</a>. One of the major challenges in the Production Cloning approach is Data Security. This post focuses on the solution for Data Security: Data Masking.</span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;">As already explained, Data Masking is the process of masking the sensitive fields in the complete data set. The objective of data masking is to ensure that no sensitive data leaks into non-production regions like the Dev and Testing regions.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span></div>
<br />
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, 'Times New Roman', serif;">Which sensitive fields need to be masked? That depends on the project's needs, but some of the common fields that need to be masked are:</span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
</div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Personal information such as First Names, Last Names, Email IDs, Dates of Birth, Phone and Fax numbers, Social Security Numbers (SSNs), National Insurance Numbers and other national unique identifiers.</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In the Banking, Financial Services and Insurance industry: Bank Balances, Account numbers, Credit card numbers, Policy numbers, etc.</span></li>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">In the Healthcare industry: PHI (Protected Health Information) attributes such as Medical record numbers, Member IDs, etc.</span></li>
</ul>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This list is by no means exhaustive, but it gives a fair idea of how many fields are sensitive in nature and need to be handled with care. A lapse in masking any of these fields can have a big impact on the organization as a whole.</span></div>
</div>
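<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">As a rough illustration of how fields like these might be flagged before masking, here is a minimal Python sketch that matches column names against a few regular expressions. The patterns, categories and column names are hypothetical assumptions; name matching alone produces false positives and misses, so real projects usually combine it with profiling of the actual data values.</span></div>

```python
import re

# Hypothetical name patterns for the sensitive categories listed above.
# Real projects would tune these to their own schema naming conventions.
SENSITIVE_PATTERNS = {
    "personal": re.compile(r"(first|last)_?name|email|dob|birth|phone|fax|ssn|nino", re.I),
    "financial": re.compile(r"account_?(no|num)|card_?(no|num)|balance|policy_?(no|num)", re.I),
    "health": re.compile(r"mrn|medical_?record|member_?id", re.I),
}


def classify_columns(columns):
    """Return {column: category} for every column whose name matches a pattern."""
    flagged = {}
    for column in columns:
        for category, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(column):
                flagged[column] = category
                break  # the first matching category wins
    return flagged


if __name__ == "__main__":
    # Hypothetical column names from a customers table.
    cols = ["customer_id", "first_name", "email", "card_number", "city", "member_id"]
    for col, cat in classify_columns(cols).items():
        print(f"{col}: {cat}")
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">With the sample columns, first_name and email are flagged as personal, card_number as financial and member_id as health, while customer_id and city are left alone.</span></div>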
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
<div>
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Challenges in Data Masking</span></div>
<br />
<a name='more'></a><br />
<br />
<ul style="text-align: left;">
<li><span style="font-family: Georgia, Times New Roman, serif;"><b>Individual data has a meaning</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Many of the sensitive fields have a meaning attached to them. For ex. a 16-digit Credit Card number can't simply be masked to some random 16-digit number; the masked value still needs to follow the card numbering rules, including the Luhn check digit. The following picture depicts the format and meaning of a few sensitive fields.</span></li>
</ul>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiE2TCwwraOndOpVYemy37ZBUKFHMw_YskkhFnoM0UUeLBjYb590LBF_5xn4csQ1ga01kspW_8wDvJnGWNrL_1medMA5FYs9isy-vQMX838ix5f9e54-oJE1zXN_K_r6wCZakxf9aKtXyI/s1600/Masking+Fields+logic.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="288" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiE2TCwwraOndOpVYemy37ZBUKFHMw_YskkhFnoM0UUeLBjYb590LBF_5xn4csQ1ga01kspW_8wDvJnGWNrL_1medMA5FYs9isy-vQMX838ix5f9e54-oJE1zXN_K_r6wCZakxf9aKtXyI/s400/Masking+Fields+logic.png" width="400" /></span></a></div>
<div style="clear: both;"><br /></div>
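<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Payment card numbers end in a Luhn check digit computed from the other digits, so a masked card number should be regenerated rather than just randomized. The Python sketch below is one possible approach, assuming a 16-digit number and that keeping the first 6 digits (the issuer prefix) is acceptable for testing; the function names and the keep-6 choice are illustrative assumptions, not a standard.</span></div>

```python
import random


def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of digits (without the check digit)."""
    total = 0
    # Walk from the rightmost payload digit; every second digit, starting with the
    # rightmost one, is doubled because the check digit will be appended after it.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_luhn_valid(number: str) -> bool:
    """True if the full number (including its check digit) passes the Luhn check."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card: str, keep_prefix: int = 6, rng=None) -> str:
    """Keep the prefix, randomize the middle digits and recompute the check digit."""
    rng = rng or random.Random()
    prefix = card[:keep_prefix]
    middle_len = len(card) - keep_prefix - 1  # leave room for the check digit
    middle = "".join(str(rng.randrange(10)) for _ in range(middle_len))
    payload = prefix + middle
    return payload + luhn_check_digit(payload)


if __name__ == "__main__":
    original = "4111111111111111"  # a widely used Luhn-valid test number
    masked = mask_card_number(original, rng=random.Random(42))
    print(masked, is_luhn_valid(masked))
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Because the middle digits are random, the same card masks differently on every run; projects that need repeatable results would replace the random generator with a deterministic one.</span></div>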
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Data Integrity</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">One of the biggest challenges in data masking is maintaining data integrity. As we have already seen, tables can be related to each other, so when a primary key is masked, the same masked value must be cascaded to every foreign key that references it.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Cross-DB relationships</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">To make things more complicated, data relationships can exist across databases (for ex. federated relationships).</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Multiple data sources</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The data does not always come from databases; it can also reside in flat files, XML files, EDI files and delimited files.</span></li>
</ul>
</ul>
<div>
<span style="font-family: Georgia, Times New Roman, serif;">A good masking solution should be able to overcome these challenges.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
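<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">One common way to address the integrity challenges above (not necessarily the one the author's product uses) is deterministic masking: the same input always produces the same masked output, so a masked primary key still matches its foreign keys, even when they live in another database or in a delimited file. The Python sketch below uses a keyed HMAC from the standard library; the secret key, the table layouts and the pipe-delimited feed are hypothetical assumptions.</span></div>

```python
import csv
import hashlib
import hmac
import io

# Hypothetical secret; keep it outside the test environment so masked values
# cannot be reversed by hashing guessed inputs.
MASKING_KEY = b"replace-with-a-secret-key"


def mask_id(value: str, prefix: str = "C") -> str:
    """Deterministically map an identifier to a fixed-length masked one."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:10].upper()


# Rows from two related tables (these could sit in two different databases).
customers = [{"customer_id": "1001", "name": "Alice"}]
orders = [{"order_id": "9001", "customer_id": "1001", "amount": "25.00"}]

masked_customers = [{**c, "customer_id": mask_id(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# The same key also appears in a hypothetical pipe-delimited vendor feed.
feed = "customer_id|segment\n1001|GOLD\n"
reader = csv.DictReader(io.StringIO(feed), delimiter="|")
masked_feed = [{**row, "customer_id": mask_id(row["customer_id"])} for row in reader]

# The parent key, the foreign key and the file key still line up after masking.
assert masked_customers[0]["customer_id"] == masked_orders[0]["customer_id"]
assert masked_orders[0]["customer_id"] == masked_feed[0]["customer_id"]
print(masked_customers[0]["customer_id"])
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Truncating the digest keeps masked keys short but makes collisions possible on very large tables, so a real implementation would need to verify that masked keys stay unique.</span></div>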
<div>
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Types of Masking</span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Static or In-DB Masking</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">In this technique, masking operates on data at rest in a database or in files. Typically, the production data is first copied into a temporary region called the Staging Database. Once the data is loaded into the Staging Database, it is masked in place there, and the masked data is then loaded into the Test Database.</span></li>
</ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;"><b>Dynamic or In-Memory Masking</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">In this technique, masking happens in memory rather than in a database: the data is masked in flight while it is being loaded from the Production region to the Test region.</span></li>
</ul>
</ul>
<div>
<span style="font-family: Georgia, Times New Roman, serif;">The picture below depicts the Static and Dynamic Masking processes.</span></div>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTBv4S5nvUj_c1wft7g1nlkm5CZJsx0A52gTwca8M17tJVXVkw7AoO499dQ90dvL-S1smg72M9joR_xad_nzWEF9lqeOB0yGK0TUo3BydWxiJZioWSIVsJGBn_ObJr8wQTKHIUoAzNX4I/s1600/Static+&+Dynamic+Masking.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTBv4S5nvUj_c1wft7g1nlkm5CZJsx0A52gTwca8M17tJVXVkw7AoO499dQ90dvL-S1smg72M9joR_xad_nzWEF9lqeOB0yGK0TUo3BydWxiJZioWSIVsJGBn_ObJr8wQTKHIUoAzNX4I/s400/Static+&+Dynamic+Masking.png" width="400" /></a></div>
<div style="clear: both;"><br /></div>
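<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">To make the difference between the two flows concrete, here is a minimal Python sketch in which in-memory SQLite databases stand in for the Production, Staging and Test regions. The table, the masking rule (replacing real emails with synthetic ones) and the function names are hypothetical; real masking tools work against production DBMSs and at far larger scale.</span></div>

```python
import sqlite3


def mask_email(customer_id: int) -> str:
    """Hypothetical masking rule: replace the real email with a synthetic one."""
    return f"user{customer_id}@example.com"


def make_db(rows=()):
    """Create an in-memory database with a customers table and optional rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return db


def static_masking(prod, staging, test):
    """Copy to staging, mask the data at rest there, then load it into test."""
    rows = prod.execute("SELECT id, email FROM customers").fetchall()
    staging.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    # Register the Python rule as a SQL function so the UPDATE runs inside the DB.
    staging.create_function("mask_email", 1, mask_email)
    staging.execute("UPDATE customers SET email = mask_email(id)")
    masked = staging.execute("SELECT id, email FROM customers").fetchall()
    test.executemany("INSERT INTO customers VALUES (?, ?)", masked)


def dynamic_masking(prod, test):
    """Mask each row in memory while streaming it from production to test."""
    cursor = prod.execute("SELECT id, email FROM customers")
    masked_rows = ((cid, mask_email(cid)) for cid, _email in cursor)
    test.executemany("INSERT INTO customers VALUES (?, ?)", masked_rows)


if __name__ == "__main__":
    prod = make_db([(1, "alice@realbank.com"), (2, "bob@realbank.com")])
    test_static, test_dynamic = make_db(), make_db()
    static_masking(prod, make_db(), test_static)
    dynamic_masking(prod, test_dynamic)
    print(test_static.execute("SELECT * FROM customers").fetchall())
    print(test_dynamic.execute("SELECT * FROM customers").fetchall())
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Both paths end with the same masked rows in the Test region; the static path needs the extra staging copy, while the dynamic path never writes unmasked data anywhere outside Production.</span></div>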
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope the information in this post was useful. In a future post, I will go into more detail about the different data masking techniques.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
</div>
<br />
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br />
<div>
<i><b>About the Author</b></i></div>
<div>
<br /></div>
<div>
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
<br />
<span style="font-family: Georgia, Times New Roman, serif;"> </span></div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com4tag:blogger.com,1999:blog-2871229621692677758.post-91598383205571018952013-02-15T11:13:00.001+05:302013-04-03T13:33:16.402+05:30Data Subset in TDM<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous post, I discussed the <a href="http://tdminsights.blogspot.in/2013/02/challenges-in-production-cloning.html" target="_blank">Challenges in Production Cloning approach</a>. In this post, we will focus on its solution, the Data Subset process / Data Sub-setting.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Data subset is the process of slicing out a part of the Production Database and loading it into the Test Database. For ex. instead of cloning a 50 TB production database, you create a subset holding only 50 GB worth of data and load it into the Test Database. Let's assume that a retail application has a Customers table with 10 million customers, an Orders table with 100 million orders and other transaction tables holding 100 million rows. Our subset process will shrink these tables to reasonable sizes, as depicted in the picture below.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuiyfh33notEgxHCsY2DhhUpQoaUujN_N2rdWE-xG2LaXqn3nkaPnU3TPYS250G3nlanlGJNcDPEgPooutlQhyUdlTH2hccrEdu8S2yWbERvKFDLzw6Pb5ENfyVHi67qNz7bSvGIoZvpg/s1600/Subset+-+Basics.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuiyfh33notEgxHCsY2DhhUpQoaUujN_N2rdWE-xG2LaXqn3nkaPnU3TPYS250G3nlanlGJNcDPEgPooutlQhyUdlTH2hccrEdu8S2yWbERvKFDLzw6Pb5ENfyVHi67qNz7bSvGIoZvpg/s400/Subset+-+Basics.png" width="400" /></span></a></div>
<div style="clear: both;"><br /></div>
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">Advantages of data sub-setting</span><br />
<br />
<a name='more'></a><br />
<br />
<ol style="text-align: left;">
<li><span style="font-family: Georgia, Times New Roman, serif;">Reduced storage costs</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Reduced load time/Test data refresh time</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Reduced Infrastructure costs</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Massive savings across multiple environments</span></li>
</ol>
<div>
<span style="font-family: Georgia, Times New Roman, serif;">The picture below depicts the advantages of the data subset process:</span></div>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio-g5_IathMVW0CxFyw81A5m85YVZB9TZKaIpJKoNO5qB1tTRFngrY4yPur_VvBXMCd9OgOJ6KlYqYf7XJbya33TqLQYWcS7icJp_aJc_vUgGe59oXrwfaAP_I8XgJ0COgJm06OUtBp6U/s1600/Subset+-+Pict+overview+-+savings.png" imageanchor="1" style="clear: left; margin-bottom: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="318" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio-g5_IathMVW0CxFyw81A5m85YVZB9TZKaIpJKoNO5qB1tTRFngrY4yPur_VvBXMCd9OgOJ6KlYqYf7XJbya33TqLQYWcS7icJp_aJc_vUgGe59oXrwfaAP_I8XgJ0COgJm06OUtBp6U/s400/Subset+-+Pict+overview+-+savings.png" width="400" /></span></a></div>
<div style="text-align: center;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<b><i>Note:</i></b> The above picture assumes that we are loading the subset of production data into 3 different regions (DEV, UAT and QA1).<br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
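<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The savings in the picture come from simple multiplication. Using the figures from this post (a 50 TB production database and a 50 GB subset) and the 3 regions from the note, a quick Python calculation shows the difference. It assumes decimal units (1 TB = 1000 GB) and ignores backups, indexes and logs.</span></div>

```python
PRODUCTION_GB = 50 * 1000  # 50 TB production database, in decimal GB
SUBSET_GB = 50             # 50 GB subset, as in the example above
REGIONS = ["DEV", "UAT", "QA1"]


def total_storage_gb(per_region_gb: float, regions: list) -> float:
    """Storage needed when every region holds its own copy of the data."""
    return per_region_gb * len(regions)


clone_total = total_storage_gb(PRODUCTION_GB, REGIONS)
subset_total = total_storage_gb(SUBSET_GB, REGIONS)
print(f"Full clones: {clone_total / 1000:.0f} TB")
print(f"Subsets:     {subset_total:.0f} GB")
print(f"Reduction:   {clone_total / subset_total:.0f}x")
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">This prints 150 TB for full clones versus 150 GB for subsets, a 1000x reduction; every extra region adds another 50 TB in the cloning case but only 50 GB with subsets.</span></div>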
<span style="font-family: Georgia, 'Times New Roman', serif;"><span style="font-size: large;">Challenges in Data Subset</span></span><br />
<br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Referential Integrity</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">A real-world database involves a lot of referential integrity between tables. Let's assume we are fetching only 100K Customers; then we need to fetch the orders of only those 100K customers, not all the orders in the production database. In short, the criteria applied to a parent table must be cascaded down to all of its child tables.</span></li>
</ul>
</ul>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Cross DB (Federated) relationships</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">In the previous point, I explained the challenge of maintaining the data integrity of the subset. What's worse, data relationships can span databases (for ex. customer data can be in an Oracle database while order data is in SQL Server). These are called Federated relationships, and they are even more complicated to handle.</span></li>
</ul>
</ul>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;"><b>Data Relationships across multiple sources</b></span></li>
<ul>
<li style="text-align: justify;"><span style="font-family: Georgia, 'Times New Roman', serif;">If the previous two points were not enough, remember that data can come from multiple sources, and so can the relationships. For ex. a vendor might provide a data feed as a flat file, or a Mainframe system might produce fixed-length output, and the data in these files might be related to data residing in an Oracle database. Handling all these relationships can be quite tricky.</span></li>
</ul>
</ul>
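<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Here is a minimal Python sketch of the cascading rule described above, applied to the federated case: the criterion is applied to the parent table first, and the selected keys are then used to fetch only the matching child rows from a separate database. Two in-memory SQLite connections stand in for, say, the Oracle and SQL Server databases; the schema, the region criterion and the function names are hypothetical assumptions.</span></div>

```python
import sqlite3


def build_sources():
    """Create two separate 'production' databases with a cross-DB relationship."""
    customers_db = sqlite3.connect(":memory:")
    customers_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    customers_db.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "NORTH"), (2, "SOUTH"), (3, "NORTH")],
    )
    orders_db = sqlite3.connect(":memory:")
    orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    orders_db.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(10, 1), (11, 2), (12, 3), (13, 3)],
    )
    return customers_db, orders_db


def subset(customers_db, orders_db, region):
    """Apply the criterion to the parent table, then cascade it to the child table."""
    parents = customers_db.execute(
        "SELECT id, region FROM customers WHERE region = ?", (region,)
    ).fetchall()
    parent_ids = [row[0] for row in parents]
    if not parent_ids:
        return parents, []
    # The foreign key lives in another database, so no SQL join is possible;
    # pass the selected parent keys across as bound parameters instead.
    placeholders = ",".join("?" for _ in parent_ids)
    children = orders_db.execute(
        f"SELECT id, customer_id FROM orders WHERE customer_id IN ({placeholders})",
        parent_ids,
    ).fetchall()
    return parents, children


if __name__ == "__main__":
    cust_db, ord_db = build_sources()
    customers, orders = subset(cust_db, ord_db, "NORTH")
    print(customers)
    print(orders)
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Only the orders of the NORTH customers (1 and 3) are carried over, so no order in the subset points to a missing customer. With millions of parent keys, a real implementation would batch the keys or stage them in a temporary table, since databases limit the number of bound parameters per statement.</span></div>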
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">As described in this post, the Subset process can save the organization a lot of cost, but it comes with many challenges. How do we tackle those challenges? What approach can we follow for Data Sub-setting? I will try to throw more light on these questions in another detailed post.</span><br />
<br />
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px; text-align: justify;">
<br />
<div>
<i><b>About the Author</b></i></div>
<div>
<br /></div>
<div>
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com3tag:blogger.com,1999:blog-2871229621692677758.post-15692365182556267732013-02-13T19:18:00.000+05:302013-04-03T13:33:29.458+05:30Challenges in Production Cloning approach<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In my previous articles, I have already discussed the topics "<a href="http://tdminsights.blogspot.in/2013/02/how-to-create-test-data.html" target="_blank">How to create Test Data</a>" and "<a href="http://tdminsights.blogspot.in/2013/02/top-3-challenges-in-using-production.html" target="_blank">Top 3 Challenges in using Production data in Test Environments</a>". In this post we will focus on the challenges that we face in the Production Cloning approach and how to overcome them.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">1. Infrastructure</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Even though it is highly recommended to keep the Test Environment along the same lines as Production, it is not always feasible to test under those real-world conditions. Performance, load and stress tests should mimic the Production database exactly, but such expensive infrastructure might be overkill for Functional Testing. Cloning, however, forces you to have production-like infrastructure, which translates into higher costs for the customer.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">2. High Storage Costs</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Another major challenge associated with Production Cloning is that all the production data needs to be stored in the testing region. If the production data is 50 TB (terabytes), the Test Database also needs to hold 50 TB of data, and storage has to be provisioned for all of it. Since databases are backed up regularly, the backups add even more, which means higher storage costs for the customer.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"></span><br />
<a name='more'></a><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif; font-size: large;">3. Increased Load Time</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">Loading 50 TB of data from Production into the Test database will obviously take longer than loading any smaller amount of data. You will also be locking the Test Database during the entire load operation to avoid deadlocks and contention, so the faster the data gets loaded, the better. Sometimes, due to delays in the development schedule, the product or application reaches the testers' desks quite late. In such cases, a long load time can leave the team with no time to test all the features.</span></div>
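<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">A back-of-the-envelope estimate shows why volume matters so much here. The Python sketch below assumes a constant sustained load rate of 500 MB/s; that figure is purely hypothetical, since real throughput depends on hardware, network, indexes and constraints. Under a constant-rate assumption, load time simply scales with the volume of data, compared here with a 50 GB copy.</span></div>

```python
def load_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to load size_gb at a constant throughput (decimal units)."""
    seconds = size_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


THROUGHPUT_MB_S = 500  # hypothetical sustained load rate

full_clone = load_hours(50 * 1000, THROUGHPUT_MB_S)  # 50 TB production copy
smaller_copy = load_hours(50, THROUGHPUT_MB_S)       # 50 GB for comparison
print(f"Full clone:   {full_clone:.1f} hours")
print(f"50 GB copy:   {smaller_copy * 60:.1f} minutes")
```

<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">At that assumed rate, the full 50 TB copy takes about 27.8 hours, versus about 1.7 minutes for 50 GB, and the Test Database stays locked for the whole duration.</span></div>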
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-size: large;"><span style="font-family: Georgia, Times New Roman, serif;">4. Multiple Test Environments</span></span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">You might have several Test Environments to load data into, for ex. separate environments for the QA, UAT, Performance and Dev teams. One of the largest banks had around 8000 Test environments to work with. Every additional environment needs its own infrastructure, load time and storage, so the associated costs multiply directly with the number of environments.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">So how do we overcome these challenges? The answer lies in a technique simply named Data Subset. In this technique, instead of fetching all the data from Production, we fetch a well-defined, structurally valid subset of the Production database and load it into the Test Database. For ex. in the same banking scenario, we might take only a portion of the production database (let's say, the transactions of customers created in the last 5 years instead of all the transactions). That way we effectively reduce the volume of data in the Test Database. Please read my detailed post on Data Subset for more information.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
<div style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;">
<br />
<div style="font-size: 13px;">
<i><b>About the Author</b></i></div>
<div style="font-size: 13px;">
<br /></div>
<div style="font-size: 13px;">
<i>Rajaraman Raghuraman has nearly 8 years of experience in the Information Technology industry focusing on Product Development, R&D, Test Data Management and Automation Testing. He has architected a TDM product from scratch and currently leads the TDM Product Development team in Cognizant. He is passionate about Agile Methodologies and is a huge fan of Agile Development and Agile Testing. He blogs at </i><i><a href="http://tdminsights.blogspot.in/" target="_blank">Test Data Management Blog</a> & </i><i><a href="http://agiledevtest.blogspot.in/">Agile Blog</a>.</i><i> </i><i>Connect with him on <a href="https://plus.google.com/100297834218867757772?rel=author" target="_blank">Google+</a></i></div>
</div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com4tag:blogger.com,1999:blog-2871229621692677758.post-86748363738769781872013-02-13T01:59:00.000+05:302013-04-03T13:33:56.394+05:30Top 3 Challenges in using Production data in Test Environments<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
In my previous post "<a href="http://tdminsights.blogspot.in/2013/02/how-to-create-test-data.html" target="_blank">How to create Test Data</a>", I explained the concept of creating test data directly from the production data. In this post we will concentrate on the Top 3 challenges in using the Production data for testing purposes.</div>
<div style="text-align: justify;">
<br /></div>
<span style="font-size: large;">Data Security</span><br />
<br />
<div style="text-align: justify;">
This is by far the most crucial challenge of using production data in test environments, because production data can contain a lot of sensitive information. Even though the data sets in the production database are rich, using them as-is carries a lot of risk. For example, if you are testing an application for a bank, production data will contain real customer information like Names, Addresses, Account Numbers, Balances and Credit Card Numbers. Using this data for testing exposes the bank to serious security risks. So how do we overcome this? The answer is Data Masking.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Data Masking is the process of masking (for example, replacing or scrambling) the sensitive fields in the data set so that the real values are not exposed. Please read my future post on Data Masking and the techniques used for it for more details. The following figure depicts the data security challenge and the approaches.</div>
<div style="text-align: justify;">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga1O1mThaAvZJmd13QT_8k9PV9Oeokv58BGUn2gc7Uwc-cqc6iGGV94mfDladEf1wuWz7A4rd60UFQxeC-yr5EgXRFGrQKbNAONy8wDv-Hj6Ya1jgm8caBptvYJ16X2zytcF9o8wYo8X8/s1600/Masking+-+Basics1.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="245" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga1O1mThaAvZJmd13QT_8k9PV9Oeokv58BGUn2gc7Uwc-cqc6iGGV94mfDladEf1wuWz7A4rd60UFQxeC-yr5EgXRFGrQKbNAONy8wDv-Hj6Ya1jgm8caBptvYJ16X2zytcF9o8wYo8X8/s400/Masking+-+Basics1.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Data Security Challenge</td></tr>
</tbody></table>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
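<div style="text-align: justify;">To make the idea concrete, here is a minimal masking sketch in Python. The field names, the key and the masking rules are assumptions for illustration, not a prescribed technique: card numbers keep only their last four digits, while names and account numbers are replaced with a keyed hash (HMAC-SHA256), so the same real value always maps to the same masked value and relationships between tables survive. Which fields count as sensitive (balances, for instance) is a per-project decision, and production-grade masking usually substitutes realistic-looking values rather than hash strings.</div>

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_card_number(card: str) -> str:
    """Replace every digit except the last four with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card)
    seen = 0
    out = []
    for ch in card:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def pseudonymize(value: str, prefix: str) -> str:
    """Deterministic substitute: the same input always yields the same output,
    so the masked value still matches across tables and data sources."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:10]}"


def mask_customer(row: dict) -> dict:
    """Return a copy of a customer row with the sensitive fields masked."""
    masked = dict(row)
    masked["name"] = pseudonymize(row["name"], "Customer")
    masked["account_number"] = pseudonymize(row["account_number"], "ACC")
    masked["card_number"] = mask_card_number(row["card_number"])
    return masked


if __name__ == "__main__":
    print(mask_card_number("4111-1111-1111-1234"))  # XXXX-XXXX-XXXX-1234
```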
<br />
<a name='more'></a><br />
<br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">Data Volumes</span><br />
<br />
<div style="text-align: justify;">
Another of the biggest challenges in using production data for testing is that the data volumes we deal with are huge. Taking the example of a bank, the database will contain a huge number of customer records and also records of all the transactions those customers have made. Even a very simple case of 100K customers making an average of 5 transactions per month generates about 500K transaction records per month, or about 6 million per year. Production data will contain transactions right from the inception of the bank. Just imagine the scale of data that needs to be loaded into the Test Region if all of it is to be moved. </div>
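<div style="text-align: justify;">The arithmetic behind these numbers, as a quick sketch. It assumes the customer count and transaction rate stay constant (real banks grow), and the 20-year history is purely a hypothetical figure, since the actual age of the bank is not specified.</div>

```python
def transaction_volume(customers: int, txns_per_customer_per_month: int, months: int) -> int:
    """Number of transaction rows produced over a period, assuming a constant rate."""
    return customers * txns_per_customer_per_month * months


per_month = transaction_volume(100_000, 5, 1)
per_year = transaction_volume(100_000, 5, 12)
# Hypothetical history length: the post only says "since the inception of the bank".
over_20_years = transaction_volume(100_000, 5, 12 * 20)

print(f"{per_month:,} per month, {per_year:,} per year, {over_20_years:,} over 20 years")
# 500,000 per month, 6,000,000 per year, 120,000,000 over 20 years
```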
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This method of moving all of the production data into the test region is called <b>Production Cloning</b>, and it has several disadvantages such as increased load time and increased disk costs. The post "<a href="http://tdminsights.blogspot.in/2013/02/challenges-in-production-cloning.html" target="_blank">Challenges in Production Cloning Approach</a>" describes these challenges in detail. So how do we overcome this challenge? The answer is Data Subset / Data Sampling, where you load only a subset or portion of the production data into the Test database. Please read my future post on Data Subset / Data Sampling for more details.</div>
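<div style="text-align: justify;">Data sampling can be sketched in a similar spirit: pick a random but reproducible fraction of customers and keep only their transactions, so that no transaction in the test database refers to a customer that was not loaded. In-memory lists stand in for tables here, and the 1% fraction and fixed seed are illustrative assumptions.</div>

```python
import random


def sample_customers(customers, transactions, fraction, seed=42):
    """Randomly sample a fraction of customers and keep only their transactions.

    customers:    list of dicts with an "id" key
    transactions: list of dicts with a "customer_id" key
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not customers:
        return [], []
    rng = random.Random(seed)  # fixed seed: same input gives the same sample
    k = max(1, round(len(customers) * fraction))
    sampled = rng.sample(customers, k)
    sampled_ids = {c["id"] for c in sampled}
    # Dropping transactions of unsampled customers keeps the sample consistent.
    kept = [t for t in transactions if t["customer_id"] in sampled_ids]
    return sampled, kept


if __name__ == "__main__":
    customers = [{"id": i} for i in range(1, 1001)]
    transactions = [{"id": i * 10 + j, "customer_id": i}
                    for i in range(1, 1001) for j in range(3)]
    sampled, kept = sample_customers(customers, transactions, fraction=0.01)
    print(len(sampled), len(kept))  # 10 30
```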
<div style="text-align: justify;">
<br /></div>
<span style="font-size: large;">Data Sources</span><br />
<span style="font-size: large;"><br /></span>
<br />
<div style="text-align: justify;">
Another major challenge is the variety of data sources. In a real-world enterprise application, for example, the data could come from multiple sources: RDBMSs like Oracle, DB2, SQL Server, Sybase and Informix, file sources such as Excel, flat files, mainframe delimited files and EDI files, and also sources such as web services. Worse, there will be relationships between the data flowing to and from these sources. Hence, while loading production data into the test region, utmost care should be taken to maintain data relationships and data integrity. Please read my future post on data relationships and their effects on the TDM approach.</div>
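<div style="text-align: justify;">As a small illustration of a relationship that spans two sources, the sketch below checks a hypothetical transactions flat file (CSV) against a customers table in a database and reports the "orphan" transactions whose customer was not loaded. The file layout and table are assumptions; a real check would cover every relationship between the sources involved.</div>

```python
import csv
import io
import sqlite3


def find_orphan_transactions(db: sqlite3.Connection, transactions_file) -> list:
    """Return rows from a transactions flat file whose customer_id has no
    matching row in the customers table of the database."""
    known_ids = {row[0] for row in db.execute("SELECT id FROM customers")}
    reader = csv.DictReader(transactions_file)
    return [row for row in reader if int(row["customer_id"]) not in known_ids]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "A"), (2, "B")])
    flat_file = io.StringIO("txn_id,customer_id,amount\n100,1,50.0\n101,3,20.0\n")
    print(find_orphan_transactions(db, flat_file))
    # [{'txn_id': '101', 'customer_id': '3', 'amount': '20.0'}]
```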
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br />
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com6tag:blogger.com,1999:blog-2871229621692677758.post-87256578074395195422013-02-09T23:58:00.001+05:302013-03-14T12:08:14.672+05:30TDM Topics to be covered in this blog<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Georgia, "Times New Roman", serif;">Hello all,</span><br />
<span style="font-family: Georgia, "Times New Roman", serif;"><br /></span>
<span style="font-family: Georgia, "Times New Roman", serif;">The intention of this blog is to share my insights and knowledge in the area of Test Data Management. </span><span style="font-family: Georgia, 'Times New Roman', serif;">I am looking forward to writing a few posts on the following topics, and I will write them whenever I get some free time. Thanks.</span><br />
<span style="font-family: Georgia, 'Times New Roman', serif;"><br /></span>
<br />
<ul style="text-align: left;">
<li><span style="font-family: Georgia, 'Times New Roman', serif;"><a href="http://tdminsights.blogspot.in/2013/02/how-to-create-test-data.html" target="_blank">How to Create Test Data</a></span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;"><a href="http://tdminsights.blogspot.in/2013/02/top-3-challenges-in-using-production.html" target="_blank">Top 3 Challenges in using Production data in Test Environments</a></span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;"><a href="http://tdminsights.blogspot.in/2013/02/challenges-in-production-cloning.html" target="_blank">Challenges in Production Cloning Approach</a></span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://tdminsights.blogspot.in/2013/02/data-subset-in-tdm.html" target="_blank">Data Subset in TDM</a></span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;"><a href="http://tdminsights.blogspot.in/2013/02/data-masking-in-tdm.html" target="_blank">Data Masking in TDM</a></span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://tdminsights.blogspot.in/2013/02/top-smells-that-indicate-that-your.html" target="_blank">Top smells that indicate that your project needs TDM</a></span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://tdminsights.blogspot.in/2013/02/implementation-approaches-to-data-sub.html" target="_blank">Implementation approaches to Data Sub-setting</a></span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://tdminsights.blogspot.in/2013/02/techniques-for-data-subset.html" target="_blank">Techniques for Data Subset</a></span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;"><a href="http://tdminsights.blogspot.in/2013/02/commonly-used-data-masking-techniques.html" target="_blank">Commonly used Data Masking Techniques - TDM</a></span></li>
<li><a href="http://tdminsights.blogspot.in/2013/03/test-data-refresh-in-tdm.html" target="_blank"><span style="font-family: Georgia, Times New Roman, serif;">Test Data Refresh in TDM</span></a></li>
<li><a href="http://tdminsights.blogspot.in/2013/03/what-is-test-data-ageing-in-tdm.html" target="_blank"><span style="font-family: Georgia, Times New Roman, serif;">Test Data Ageing in TDM</span></a></li>
<li><a href="http://tdminsights.blogspot.in/2013/03/data-archive-in-test-data-management-tdm.html" target="_blank"><span style="font-family: Georgia, Times New Roman, serif;">Data Archive in TDM</span></a></li>
<li><a href="http://tdminsights.blogspot.in/2013/03/gold-copy-in-test-data-management-tdm.html" target="_blank"><span style="font-family: Georgia, Times New Roman, serif;">Gold Copy in Test Data Management</span></a></li>
<li><a href="http://tdminsights.blogspot.in/2013/03/test-data-life-cycle.html" target="_blank"><span style="font-family: Georgia, Times New Roman, serif;">Test Data Life Cycle</span></a></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">What is Test Data Management?</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Technical Challenges in Test Data Management</span></li>
<li><span style="font-family: Georgia, 'Times New Roman', serif;">Non-Technical Challenges in Test Data Management</span></li>
<li><span style="font-family: Georgia;">Synthetic Data Generation</span></li>
<li><span style="font-family: Georgia;">Is Test Data Management same as ETL?</span></li>
<li><span style="font-family: Georgia;">Tools for TDM - COTS or In-house?</span></li>
<li><span style="font-family: Georgia;">Test Data Management Challenges</span></li>
<li><span style="font-family: Georgia;">Test Data Management Strategy</span></li>
<li><span style="font-family: Georgia;">Test Data Management (TDM) Best Practices</span></li>
<li><span style="font-family: Georgia;">Test Data Management Tools</span></li>
<li><span style="font-family: Georgia;">Aligning TDM with Testing process</span></li>
</ul>
<br />
<div style="text-align: left;">
<span style="font-family: Georgia, "Times New Roman", serif;">Regards</span><br />
<span style="font-family: Georgia, "Times New Roman", serif;">Rajaraman R</span></div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com2tag:blogger.com,1999:blog-2871229621692677758.post-71924037460753314962013-02-02T20:03:00.000+05:302013-04-03T13:34:22.057+05:30How to create Test Data?<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif; text-align: left;">Let's assume you have a very basic testing need: you need around 50 customers created in your system to test it. Let's also assume it is a web-based application, although the concept applies to any technology or application. So you have a customer creation screen as shown below.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5ncFdYCGqq3KA5ZvQ0wBbguiqetEuEoySi-ruBYDE1pHsXZm4MPNbwQYzDekBTEt44qt7thUDxE5GGOvXyvSctD__bJHtxXAuH_IIpFHKNqRs7aGnvRCfhYuiyT18aioZCSQ61BJTqOc/s1600/Customer+Creation+Form.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="291" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5ncFdYCGqq3KA5ZvQ0wBbguiqetEuEoySi-ruBYDE1pHsXZm4MPNbwQYzDekBTEt44qt7thUDxE5GGOvXyvSctD__bJHtxXAuH_IIpFHKNqRs7aGnvRCfhYuiyT18aioZCSQ61BJTqOc/s400/Customer+Creation+Form.png" width="400" /></span></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">So how do you create the test data you need?</span><br />
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<span style="font-family: Georgia, Times New Roman, serif;">Basically, there are 3 approaches:</span><br />
<div>
<ul style="text-align: left;">
<li><span style="font-family: Georgia, Times New Roman, serif;">Manual Approach</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Functional Automation Approach</span></li>
<li><span style="font-family: Georgia, Times New Roman, serif;">Database Approach</span></li>
</ul>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
</div>
</div>
<div>
<b><span style="font-family: Georgia, Times New Roman, serif;">Manual Approach:</span></b></div>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In the manual approach, you would manually enter the data into the screens to create a customer, and repeat this for each of the 50 customers. Needless to say, doing this by hand takes a lot of time.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif;">The time taken for the example application would be:</span><br />
<span style="font-family: Georgia, 'Times New Roman', serif;">For 1 Customer = 1 min.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;">For 50 Customers = 50 mins.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<b><span style="font-family: Georgia, Times New Roman, serif;">Functional Automation Approach:</span></b></div>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In the automated approach, you automate the user interface (UI) used for creating the data, which speeds up the process of creating the required test data. In our example, we would automate the web-based UI using an automation tool such as QTP, RFT or Selenium, and then data-drive those tests to create the data we require.</span></div>
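<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">The "data-drive" part can be sketched independently of the automation tool. In the sketch below the form fields are hypothetical (the actual customer creation screen may differ), and <b>create_customer</b> is a hypothetical placeholder for the tool-specific steps, for example a Selenium script that fills in and submits the form for one row.</span></div>

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical form fields; the real customer creation screen may differ.
FIELDS = ["first_name", "last_name", "email", "city"]


def generate_customer_rows(n: int) -> List[Dict[str, str]]:
    """Produce n synthetic customer records to drive the UI tests."""
    return [
        {
            "first_name": f"Test{i}",
            "last_name": "Customer",
            "email": f"test.customer{i}@example.com",
            "city": "Testville",
        }
        for i in range(1, n + 1)
    ]


def write_data_file(rows: List[Dict[str, str]], fh) -> None:
    """Write the rows as a CSV data file, one customer per line."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def run_data_driven(fh, create_customer: Callable[[Dict[str, str]], None]) -> int:
    """Call create_customer once per data row and return how many ran.

    create_customer is a hypothetical hook; in practice it would drive the UI.
    """
    count = 0
    for row in csv.DictReader(fh):
        create_customer(row)
        count += 1
    return count


if __name__ == "__main__":
    data_file = io.StringIO()
    write_data_file(generate_customer_rows(50), data_file)
    data_file.seek(0)
    print(run_data_driven(data_file, lambda row: None))  # 50
```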
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<span style="font-family: Georgia, Times New Roman, serif;">The time taken for the example application would be:</span><br />
<span style="font-family: Georgia, 'Times New Roman', serif;">For 1 Customer = 10 seconds</span><br />
<span style="font-family: Georgia, Times New Roman, serif;">For 50 Customers = 500 seconds = about 8 mins (8 min 20 s).</span></div>
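<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">Both estimates reduce to the same simple calculation, sketched here with the post's illustrative per-customer times (1 minute manually, 10 seconds automated) and assuming a constant time per customer. Note that 500 seconds is 8 minutes 20 seconds.</span></div>

```python
def total_minutes(seconds_per_customer: float, customers: int) -> float:
    """Total creation time in minutes, assuming a constant time per customer."""
    return seconds_per_customer * customers / 60


manual = total_minutes(60, 50)     # 1 min per customer
automated = total_minutes(10, 50)  # 10 s per customer

print(f"manual: {manual:.1f} min, automated: {automated:.1f} min")
# manual: 50.0 min, automated: 8.3 min
```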
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<b><span style="font-family: Georgia, Times New Roman, serif;">Database Approach:</span></b></div>
<div>
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In all probability, you will already have plenty of real customer information in your production database. So our job is to query the right set of customers from the production database and load them into the test database. Simple: the data is then ready to be used for testing. </span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">In our example application, since it's a pretty straightforward requirement, we would fetch the first 50 rows from the Customers table in Production and insert those rows into the Customers table in the Test Database. The workflow is depicted below.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifkNe6H_Q1BRyWNQf0y2hvrUe52u2RTPk5rfwYDUAw1orZG_5ebuTywl5ICOF4ANs0TwFXk5yoYZgVrqeNYO1P-A5KXAzJG5p4uw1-70sLYS9YoSSUet10L0IR8KXthbJNbak9FFAuNIk/s1600/Prod2Test.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Georgia, Times New Roman, serif;"><img border="0" height="76" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifkNe6H_Q1BRyWNQf0y2hvrUe52u2RTPk5rfwYDUAw1orZG_5ebuTywl5ICOF4ANs0TwFXk5yoYZgVrqeNYO1P-A5KXAzJG5p4uw1-70sLYS9YoSSUet10L0IR8KXthbJNbak9FFAuNIk/s400/Prod2Test.jpg" width="400" /></span></a></div>
<div class="separator" style="clear: both; text-align: left;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">The time taken for the example application would be:</span></div>
<div style="text-align: justify;">
<i><span style="font-family: Georgia, Times New Roman, serif;">For 50 Customers = 60 seconds = 1 min (Just an example)</span></i></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><b>NOTE:</b> The above example assumes that the back end is a Microsoft SQL Server database and hence the "<b>SELECT TOP 50</b>" query.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
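<div style="text-align: justify;"><span style="font-family: Georgia, Times New Roman, serif;">A minimal sketch of the database approach, using Python's built-in <b>sqlite3</b> module with two in-memory databases standing in for Production and Test (the column names are hypothetical). SQLite has no <b>TOP</b> keyword, so the sketch uses <b>LIMIT</b> instead, and it adds an <b>ORDER BY</b> because "the first 50 rows" is otherwise not guaranteed to be the same set each time. In practice the data would also need to be secured (masked) before loading.</span></div>

```python
import sqlite3

# Hypothetical Customers table layout used only for this sketch.
SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for a database with a Customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def copy_first_customers(prod: sqlite3.Connection, test: sqlite3.Connection,
                         n: int = 50) -> int:
    """Copy the first n customers (ordered by id) from prod into test.

    SQL Server would write this as SELECT TOP 50 ...; SQLite uses LIMIT.
    ORDER BY makes "first" well defined; without it any n rows may come back.
    """
    rows = prod.execute(
        "SELECT id, name, city FROM customers ORDER BY id LIMIT ?", (n,)
    ).fetchall()
    test.executemany("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", rows)
    test.commit()
    return len(rows)


if __name__ == "__main__":
    prod, test = make_db(), make_db()
    prod.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(i, f"Customer {i}", "Sample City") for i in range(1, 201)],
    )
    print(copy_first_customers(prod, test))  # 50
```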
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">As you can see, the database approach is much faster than the other two approaches. The effort savings are enormous for real-world test data requirements, where data volumes are much higher. </span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">This method of creating test data directly from production data forms the cornerstone of the concept called Test Data Management. Of course, we are dealing with real data, so we need to secure it before loading it into the Test Database, but we will deal with those topics in a separate post.</span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Georgia, Times New Roman, serif;">I hope this post has given you a basic idea of test data creation. I welcome your comments. Cheers.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<br />
</div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/15762019168119375421noreply@blogger.com8