Showing posts with label data subset approach. Show all posts
Showing posts with label data subset approach. Show all posts

Friday 22 February 2013

Techniques for Data Subset

In my previous posts, I explained about Data Subset in TDM & Implementation Approaches to Data Sub-setting.  In this post, I will explain some basic techniques with the help of which we can do a data subset.

  • First N records
    • This is a pretty simple technique, wherein the first N records of a table is retrieved from the Production database.  This can be achieved using a simple SQL Query such as
      • SELECT TOP 10000 * FROM DBO.CUSTOMERS
  • Based on a filter criteria
    • This is a scenario where the subset conditions can be based on a simple filter criteria like Age > 50, City = London, etc.  This is easier to implement in cases where the subset requirements are less complicated.  An example query for such technique would be
      • SELECT * FROM DBO.CUSTOMERS WHERE AGE > 50 AND CITY = 'LONDON'
  • Based on a complex SQL query
    • Sometimes the subset requirements can be more complicated.  It might involve dependencies across multiple tables.  What that means is

Wednesday 20 February 2013

Implementation Approaches to Data Sub-setting

In one of my previous post, I described about the process of Data Subset.  In this post we will focus on the implementation approaches to data sub-setting.

There are 3 broad categories in which you can implement sub-setting.

SQL Query based approach

In this approach, we will use SQL queries to fetch the subset of the production data and load them it into the target environment.  Lets assume you have 2 tables in your production from which you need to create a small subset.  The following shows the relationship of the tables Customers and Orders where they are related through the custid field.  



The picture also shows the sample data within those tables.  So we need to subset this.  We find out a sample condition.  Lets assume we will pull out only the customer ids which are odd numbers.  A simple query will do this trick.  The following will be the query for the Customers table.