What is ETL Testing?

ETL testing is a process used to verify that data has been accurately moved from a source to a destination after undergoing business transformation. This process also includes verifying data at any intermediate steps between the source and destination. The acronym ETL stands for Extract-Transform-Load.


Data Warehouse Testing

Data Warehouse Testing is a procedure used to guarantee the integrity, accuracy, trustworthiness, and consistency of data stored inside a data warehouse. This testing method is employed to ensure that the integrated data within the data warehouse is dependable enough for a business to base their decisions on.

What is ETL?

Extract-Transform-Load (ETL) is a process used to load data from its source system into a data warehouse. Data is extracted from an OLTP database, converted to the required data warehouse schema, and then loaded into the data warehouse database. Data warehouses may also incorporate data from non-OLTP systems, such as text files, legacy systems, and spreadsheets.

An example of ETL in action is a retail store with different departments, such as sales, marketing, and logistics, each managing customer information differently. For instance, the sales department may store customer data by name, while the marketing department stores it by customer ID. If the store needs to view a customer’s purchase history or determine which products were purchased due to specific marketing campaigns, it can be done with ETL. By transforming the different data sets into a unified structure, ETL can provide the needed information. Finally, Business Intelligence (BI) tools can be used to derive meaningful insights and reports from the resulting data.

This ETL testing tutorial provides a roadmap for the ETL testing process and covers various ETL testing concepts. The process flow is as follows: data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse. Data from non-OLTP sources, such as text files, legacy systems, and spreadsheets, can also be incorporated into the data warehouse. Finally, BI tools can be used to generate meaningful insights and reports from the resulting data.


How to Extract Relevant Data for a Data Warehouse

Data warehouses require the extraction of relevant data from various sources. This can be done by gathering information from databases, flat files, and other sources. Once the data is extracted, it can be transformed into a Data Warehouse format to make it easier to access.

Data Transformation for Data Warehousing

Data transformation is the process of converting data from its existing format into one that is more suitable for a data warehouse environment. This process involves building keys and cleansing the data to ensure accuracy. Additionally, meta-data can be created to help diagnose source system problems and improve data quality.
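
As a minimal sketch of what this step can look like, the following Python function cleanses two hypothetical fields and builds a surrogate key. The field names and cleansing rules are illustrative assumptions, not part of any particular ETL tool:

```python
def transform(records):
    """Cleanse raw source rows and assign surrogate keys (illustrative rules)."""
    transformed = []
    for i, rec in enumerate(records, start=1):
        transformed.append({
            "customer_key": i,                      # surrogate key built during transform
            "name": rec["name"].strip().title(),    # cleanse: trim whitespace, normalize case
            "email": rec["email"].strip().lower(),  # cleanse: lowercase the email address
        })
    return transformed

raw = [{"name": "  alice SMITH ", "email": "Alice@Example.COM"}]
print(transform(raw))
# [{'customer_key': 1, 'name': 'Alice Smith', 'email': 'alice@example.com'}]
```

In a real pipeline the same rules would typically be recorded as metadata, so that testers can verify each cleansing rule against the mapping document.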

Loading Data into a Data Warehouse

Data loading is the process of taking the transformed data and loading it into a data warehouse. This can involve building aggregates and summarizing the data to improve query performance. Once the data is loaded, it can be accessed and used by end users.
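
The aggregate-building mentioned above can be sketched with an in-memory SQLite database; the table names and data here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.5)])

# Build a summary aggregate during the load to speed up later queries.
conn.execute("""CREATE TABLE agg_sales AS
                SELECT region, SUM(amount) AS total, COUNT(*) AS n
                FROM fact_sales GROUP BY region""")

rows = conn.execute("SELECT region, total FROM agg_sales ORDER BY region").fetchall()
print(rows)  # [('east', 15.0), ('west', 7.5)]
```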

ETL Testing Process

ETL testing is performed in five stages:

  1. Identifying Data Sources and Requirements: The first step is to identify the data sources and the requirements of the ETL process. This includes understanding the source data and how it must be transformed into the desired format; identifying the sources and their associated requirements up front is essential for verifying the accuracy of the ETL process.
  2. Data Acquisition: Once the data sources and requirements have been identified, the next step is to acquire the data, through either manual or automated extraction. The acquired data must then be evaluated to confirm that it is complete and accurate.
  3. Implementing Business Logic and Dimensional Modeling: After the data is acquired and evaluated, the business logic and dimensional models are implemented. This involves creating the required data models and mapping the data to the appropriate business logic; the models must be designed to preserve the accuracy and completeness of the data.
  4. Building and Populating Data: With the data models and business logic in place, the ETL scripts are created and run to populate the target database. The scripts must load the data accurately and consistently.
  5. Building Reports: The final step is to build the reports that allow users to view and analyze the data. The reports must present the data clearly and accurately.

ETL Test Scenarios and Test Cases

  1. Production Validation Testing: Production Validation Testing, also known as “Table balancing” or “production reconciliation”, is a type of ETL testing that is done on data as it is being moved into production systems. This type of testing is done to ensure that the data in production systems is in the correct order and can be used to support business decisions. Automation and management capabilities are provided by Informatica Data Validation Option to ensure that production systems are not compromised by the data.
  2. Source to Target Testing (Validation Testing): Source to Target Testing (Validation Testing) is carried out to validate whether the data values transformed are the expected data values. This type of testing is used to check the consistency of the data being moved from the source to the target.
  3. Application Upgrades: Application Upgrades testing is used to check whether the data extracted from an older application or repository is exactly the same as the data in the new application or repository. Tests of this type can be automatically generated, saving substantial test development time.
  4. Metadata Testing: Metadata testing includes testing of data type check, data length check, and index/constraint check. This type of testing is used to check the structure and content of the data being moved.

  5. Data Completeness Testing: Data Completeness Testing verifies that all the expected data is loaded into the target from the source. Checks include comparing and validating counts, aggregates, and actual data between the source and target for columns with simple or no transformation.
  6. Data Accuracy Testing: Data Accuracy Testing ensures that the data is accurately loaded and transformed as expected.
  7. Data Transformation Testing: Data Transformation Testing checks whether the data is being correctly transformed from source to target.
  8. Data Quality Testing: Data Quality Testing includes syntax and reference tests. Syntax Tests are used to report dirty data, based on invalid characters, character patterns, incorrect upper or lower case order, etc. Reference Tests are used to check the data according to the data model. For example, Customer ID. Data quality testing also includes number check, date check, precision check, data check, null check, etc.
  9. Incremental ETL Testing: Incremental ETL testing is done to check the data integrity of old and new data with the addition of new data. This type of testing is used to verify that the inserts and updates are getting processed as expected during incremental ETL process.
  10. GUI/Navigation Testing: GUI/Navigation Testing is done to check the navigation or GUI aspects of the front end reports. This type of testing is used to ensure that the user interface is functioning correctly.
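
Several of the scenarios above, completeness testing in particular, reduce to comparing counts and aggregates between source and target. A minimal sketch using an in-memory SQLite database (table names and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE tgt (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])
conn.executemany("INSERT INTO tgt VALUES (?, ?)", [(1, 10.0), (2, 20.0)])  # one row lost

# Compare record counts and a simple aggregate between source and target.
src_count, src_sum = conn.execute("SELECT COUNT(*), SUM(amount) FROM src").fetchone()
tgt_count, tgt_sum = conn.execute("SELECT COUNT(*), SUM(amount) FROM tgt").fetchone()

print(src_count - tgt_count)  # 1 -> one record missing from the target
print(src_sum - tgt_sum)      # 30.0 -> the missing record's amount
```

A non-zero difference in either metric flags a completeness failure that a tester would then trace back through the load.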

How to Create ETL Test Case

ETL testing is a process used in the information management industry to ensure that data is correctly loaded from a source to a destination, and that it is accurately transformed between these two points. It also involves verifying the data at various stages as it moves from source to destination.

Essential Documents for ETL Testing

There are two documents that are essential for successful ETL testing:

1. ETL Mapping Sheets: ETL mapping sheets provide all the necessary information regarding source and destination tables, including all columns and look-ups in reference tables. ETL testers need to be familiar with SQL queries as ETL testing may involve writing complex queries with multiple joins to validate data at any stage of the ETL process. ETL mapping sheets provide a great help for writing these verification queries.
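
As an example of the kind of join-based verification query a mapping sheet supports, the following checks that every fact row's lookup key resolves in a reference table; all table and column names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT);
CREATE TABLE fact_orders (order_id INTEGER, customer_key INTEGER);
INSERT INTO dim_customer VALUES (1, 'C001'), (2, 'C002');
INSERT INTO fact_orders VALUES (100, 1), (101, 2), (102, 99);  -- 99 has no lookup row
""")

# A LEFT JOIN plus an IS NULL filter finds fact rows whose lookup fails.
orphans = conn.execute("""
    SELECT f.order_id FROM fact_orders f
    LEFT JOIN dim_customer d ON f.customer_key = d.customer_key
    WHERE d.customer_key IS NULL
""").fetchall()
print(orphans)  # [(102,)] -> order 102 references a customer that does not exist
```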

2. Database Schemas of Source and Target: It is important to keep the database schemas of source and target handy in order to verify any detail in the mapping sheets.

ETL Test Scenarios and Test Cases

Mapping doc validation: Documentation validation is a process in which a mapping document is checked to ensure that it accurately reflects the corresponding ETL (Extract, Transform, Load) information. Change logs should be maintained in the mapping document to ensure that any changes made to the ETL information can be tracked and documented. Additionally, the mapping document should detail any rules or processes that are used to transform the data from the source system to the target system.

Validation: To validate the source and target table structure against the corresponding mapping document, the following checks must be performed:
  • Ensure the source and target data types are the same.
  • Verify that the lengths of the data types in both the source and target are equal.
  • Confirm that data field types and formats are specified.
  • Confirm that the source data type length is not less than the target data type length.
  • Validate the names of columns in the table against the mapping document.
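
A simple way to automate structural checks like these is to compare the target schema against the expectations recorded in the mapping document. A sketch in Python, with made-up column metadata:

```python
# Expected (type, length) per column, as recorded in a hypothetical mapping document.
mapping_doc = {"name": ("TEXT", 50), "email": ("TEXT", 100)}
# Actual target schema, e.g. as read from the database catalog.
target_schema = {"name": ("TEXT", 50), "email": ("TEXT", 80)}

issues = []
for col, (exp_type, exp_len) in mapping_doc.items():
    if col not in target_schema:
        issues.append(f"missing column: {col}")
        continue
    got_type, got_len = target_schema[col]
    if got_type != exp_type:                       # data type check
        issues.append(f"{col}: type {got_type} != {exp_type}")
    if got_len < exp_len:                          # length check (truncation risk)
        issues.append(f"{col}: length {got_len} < {exp_len}")

print(issues)  # ['email: length 80 < 100']
```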

Constraint Validation: Ensure that the constraints specified for a given table are met.

Data consistency issues: Data consistency issues can arise from mismatches in data types and lengths for an attribute though their semantic definition may be the same, or from misuse of integrity constraints.

Completeness Issues: To ensure completeness, it is important to:
  • Compare record counts between the source and target.
  • Check for any rejected records.
  • Ensure that all expected data is loaded into the target table.
  • Check that data is not truncated in the target table’s columns.
  • Perform boundary value analysis.
  • Compare unique values of key fields between the data loaded into the warehouse and the source data.

Correctness Issues: Correctness issues include data that is misspelled or inaccurately recorded, null values, non-unique values, and data that is out of the specified range.

Data Quality: Data Quality checks need to be conducted to ensure the accuracy of the data. This includes number checks to validate values, date checks to ensure they follow the correct format and are consistent across all records, precision checks, data checks, and null checks.

Null Validate: Verify that no null values are present in the columns where “Not Null” has been specified.

Duplicate Check: To ensure that there are no duplicate values in the target, duplicate checks must be performed to validate any unique keys, primary keys, and any other columns that must be unique as per the business requirements. Additionally, this check must ensure that no duplicates exist in the combined values of multiple columns extracted from the source.
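
A duplicate check is commonly expressed as a GROUP BY over the columns whose combination must be unique, keeping any group with more than one row. A sketch with SQLite and illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt (customer_id TEXT, order_date TEXT)")
conn.executemany("INSERT INTO tgt VALUES (?, ?)",
                 [("C001", "2024-01-01"),
                  ("C001", "2024-01-01"),   # duplicate of the combined key
                  ("C002", "2024-01-02")])

# Any group with COUNT(*) > 1 is a duplicate of the combined column values.
dupes = conn.execute("""
    SELECT customer_id, order_date, COUNT(*) AS n
    FROM tgt GROUP BY customer_id, order_date
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('C001', '2024-01-01', 2)]
```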

Date Validation: Date validation is used in many areas of ETL development, such as to determine the row creation date, identify active records from both an ETL development and business requirements perspective, and sometimes to generate updates and inserts based on the date values.

Complete Data Validation: To validate data completely between a source and target table, a “source minus target” and a “target minus source” query can be used to identify mismatching rows; any rows returned represent differences that must be investigated. Matching rows can be identified with an intersect statement: if the count returned by the intersect is less than the count of the source or target table, mismatched or duplicate rows exist.
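
The minus and intersect queries described above can be sketched with SQLite’s EXCEPT and INTERSECT operators (table names and rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER, val TEXT);
CREATE TABLE tgt (id INTEGER, val TEXT);
INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO tgt VALUES (1, 'a'), (2, 'B'), (3, 'c');  -- row 2 was altered in flight
""")

src_minus_tgt = conn.execute("SELECT * FROM src EXCEPT SELECT * FROM tgt").fetchall()
tgt_minus_src = conn.execute("SELECT * FROM tgt EXCEPT SELECT * FROM src").fetchall()
matched = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM src INTERSECT SELECT * FROM tgt)"
).fetchone()[0]

print(src_minus_tgt)  # [(2, 'b')] -> row missing or altered in the target
print(tgt_minus_src)  # [(2, 'B')] -> unexpected row in the target
print(matched)        # 2 -> fewer than the 3 source rows, so a mismatch exists
```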

Data Cleanness: Unnecessary columns should be removed from the data prior to loading into the staging area.

Types of ETL Bugs

Types of bugs and their descriptions:
  • User interface/cosmetic bugs: related to the GUI of the application, such as font style, font size, colors, alignment, spelling mistakes, and navigation.
  • Boundary Value Analysis (BVA) bugs: minimum and maximum values are not handled correctly.
  • Equivalence Class Partitioning (ECP) bugs: valid and invalid input classes are not handled correctly.
  • Input/Output bugs: valid values are not accepted, or invalid values are accepted.
  • Calculation bugs: mathematical errors; the final output is wrong.
  • Load condition bugs: the system does not allow multiple users or the customer-expected load.
  • Race condition bugs: the system crashes or hangs, or cannot run on client platforms.
  • Version control bugs: no logo matching or no version information available; these usually occur during regression testing.
  • H/W bugs: a device does not respond to the application.
  • Help source bugs: mistakes in the help documents.

Difference between Database Testing and ETL Testing

Database Testing:
  • Database testing focuses on the accuracy of data stored in the database.
  • It is done to ensure that data integrity is maintained and that data is properly stored in and retrieved from the database.
  • It includes verifying data types, validating the data against business rules, etc.

ETL Testing:
  • ETL testing is a type of software testing that ensures that data is correctly transferred from the source system to the destination system.
  • It is done to ensure that data is loaded correctly and that data accuracy is maintained.
  • It includes verifying the data mapping, validating the transformations, testing the data quality, etc.

Responsibilities of an ETL Tester

The main responsibilities of an ETL tester fall into three areas: the stage table (SFS or MFS), the business transformation logic applied, and the loading of the target table from the stage file or table after the transformation is applied.

Testing ETL Software: ETL testers are in charge of testing the ETL software they are using. This involves testing components of the data warehouse, executing backend data-driven tests, and creating, designing, and executing test cases, test plans, and test harness.

Identifying and Resolving Issues: ETL testers are also responsible for identifying potential issues with the system and providing solutions. They review and approve requirements and design specifications, and test data transfers and flat files. Additionally, they should be able to write SQL queries for various scenarios, such as count tests.

Overall, the responsibilities of an ETL tester include testing and verifying software, identifying and resolving issues, and approving requirements and design specifications. They must also be able to write SQL queries and test data transfers and flat files.

Performance Testing in ETL

Performance Testing in ETL: What and Why?

Performance Testing in ETL is a testing technique used to ensure that an ETL system can handle the load of multiple users and transactions. Its primary goal is to optimize and improve session performance by identifying and eliminating performance bottlenecks, so that potential issues are resolved before they become a problem.

Tools Used for Performance Testing/Tuning:

One of the best tools used for Performance Testing/Tuning is Informatica. This software offers powerful session and workflow management capabilities, allowing developers to ensure that their ETL processes are running smoothly. It also provides detailed performance reports, allowing users to identify and address any issues quickly.

Automation of ETL Testing

Why Automate ETL Testing? ETL testing is a complex process that requires considerable time and effort to complete manually. Manual approaches to ETL testing, such as SQL scripting or “eyeballing” of data, are labor-intensive and prone to human error. Automation of ETL testing offers a way to reduce costs, improve test coverage, and increase the defect detection ratio in both development and production environments.

Advantages of Automating ETL Testing: Automating ETL testing can provide several advantages over manual approaches, including cost savings, improved test coverage, and increased defect detection ratio. Automation also reduces the time required to complete tests, making it faster and more efficient than manual methods. Additionally, automation of ETL testing helps to reduce the potential for human error.

Informatica for Automated ETL Testing: Informatica is one of the leading tools used for automated ETL testing. It offers an intuitive user interface and comprehensive suite of testing tools designed to help organizations improve their testing process. Informatica’s tools provide comprehensive coverage of ETL testing scenarios, from data validation to data integrity checks, ensuring that the data is processed correctly and efficiently. Additionally, Informatica’s tools are designed to be easy to use and can be integrated with existing frameworks and processes to ensure smooth and successful data transfer.

Best Practices for ETL Testing

Ensuring Data Transformation Accuracy: When data is loaded into the data warehouse, it is important to ensure that it is transformed correctly, without any data loss or truncation. This is a key component of ETL testing.

Rejecting and Replacing Invalid Data: ETL applications should appropriately reject and replace invalid data with default values, and report any errors that occur. This is an important part of testing an ETL system.

Testing Scalability and Performance: It is important to ensure that data is loaded into the data warehouse within the prescribed and expected time frames in order to confirm the scalability and performance of the ETL system.

Creating Unit Tests: Creating unit tests is essential for verifying that individual transformation methods behave as intended. All unit tests should use appropriate coverage techniques and should strive for one assertion per test case. Additionally, tests should be created that target exceptions.
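
As a sketch of these practices, the following Python unittest suite exercises a hypothetical date-transformation helper, with one assertion per test case and a test that targets the exception path:

```python
import unittest

def to_iso_date(raw):
    """Hypothetical transform under test: 'DD/MM/YYYY' -> 'YYYY-MM-DD'."""
    day, month, year = raw.split("/")
    if not (len(day) == 2 and len(month) == 2 and len(year) == 4):
        raise ValueError(f"bad date: {raw!r}")
    return f"{year}-{month}-{day}"

class TestToIsoDate(unittest.TestCase):
    def test_converts_valid_date(self):          # one assertion per test case
        self.assertEqual(to_iso_date("31/12/2024"), "2024-12-31")

    def test_rejects_malformed_date(self):       # a test that targets the exception
        with self.assertRaises(ValueError):
            to_iso_date("2024-12-31")

# Run the suite programmatically so the sketch is self-contained.
suite = unittest.TestLoader().loadTestsFromTestCase(TestToIsoDate)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```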
