QA City

Most Popular

All About Testing Data Warehousing Applications

By SiliconIndia   |  
Print Email

Bangalore: Businesses today are concentrating on assembling and analyzing data to make informed decisions. Take a look at a few points to keep in mind when testing ETL (Extract, Transform and Load) applications. These points are applicable to any organization that is attempting to improve an existing data warehouse.

Data must be completely loaded:

This basically means all the data must be fully loaded into the data warehouse. This includes validating all records, all fields and the full contents of each field are loaded. Strategies to consider include:

1. Record counts between source data, data loaded to the warehouse and rejected records could be compared.
2. Unique values of key fields between source data and data loaded to the warehouse could also be compared. This is a valuable technique that points out a variety of possible data errors without doing a full validation on all fields.

Data Transformation:

Ensuring that data has transformed correctly based on business rules can be the most challenging part of testing an ETL application and it involves a lot of transformation logic. The straightforward technique is to pick some sample records and "stare and compare" to validate data transformations manually. But this requires manual testing steps and testers who comprehend the ETL logic. A combination of automated data profiling and automated data movement validations is a better long-term strategy.

Quality of Data:

We can define Data Quality as how the ETL system handles data rejection, substitution, correction and notification without modifying data. Usually, data quality rules are defined during design. These may include rejecting the record if a certain decimal field has nonnumeric data or substituting null if a certain decimal field has nonnumeric data.

Depending on these rules, scenarios to test might include null key values, duplicate records in source data and invalid data types in fields. Go through the detailed test scenarios with business users and technical designers to ensure that all are kept in the loop.


As the volume of data in a data warehouse grows, ETL load times can be expected to increase and performance of queries might degrade. This could be navigated by having a solid technical architecture and good ETL design. The aim of the performance testing is to point out any potential weaknesses in the ETL design, such as reading a file multiple times or creating unnecessary intermediate files.

User-Acceptance Testing

The main reason behind deploying a data warehouse application is to make data available to business users as they know the data best. So their participation in the testing effort is vital for the success of a data warehouse implementation. User-acceptance testing (UAT) typically focuses on data loaded to the data warehouse and any views that have been created on top of the tables, not the intricacies of how the ETL application functions. You could consider the following strategies:

1. Make use of data that is either from production or as close to production data as possible. Users might find issues once they see the "real" data which usually leads to design changes.
2. It is also important that users fully understand how the views have been created. It will be necessary to test the database views.
3.  System test team must be able to support users during UAT. The users would most probably have queries about how the data is populated and need to understand details of how the ETL works.

Regression Testing

Regression testing is basically the revalidation of existing functionality with each fresh release of code. When building test cases, keep in mind that they will likely be executed multiple times as new releases are created several times due to defect fixes, enhancements or upstream systems changes. An efficient strategy to retest basic functionality is storing source data sets and results from successful runs of the code and contrasting new test results with previous runs. When doing a regression test, it is more effective to compare results to a previous execution than to do a whole data validation from scratch.

Taking these considerations into account during the design and testing portions of building a data warehouse will ensure that a quality product is produced and prevent costly mistakes.

Don't Miss
Experts on QA
Swaid Qadir Bhat
Sr System Architect
Virtusa Corporation
Subhash  Motwani
Prasad Rao Pasam
Ayaskanta  Mohanty
Managing Director
TATWA Technologies
Rajesh  Dagar
Software Architect
Connect Icon Pvt Ltd
Yasar  Khuthub
Software QA Manager
Azure IT Solutions
Sunil  Bhat
Project Management
HCL Infosystems Limi
Sharad  Agarwal
Team Lead
Write your comment now
This report is the result of the largest public-private sector rese...
For those not familiar with the Coverity Scan™ service, i...