Friday, April 10, 2015

What is ETL?



ETL is a process to extract data from different types of data sources, transform it into a structure that’s more appropriate for reporting and analysis and finally load it into the database.

ETL is short for Extract, Transform, Load, three database functions that are combined into one tool to pull data out of one database and place it into another database.

The three major steps in ETL:-

1.  Extract is the process of reading data from a database. After this step data will be available in a Staging Area (SA), usually with the same structure as the source.

2.  Transform is the process of converting the extracted data from its previous form into the form it needs to be so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.Once the data is available in the Staging Area, it is on one platform and one database. So we can easily join tables, filter and sort the data using specific attributes, pivot to another structure and make business calculations. In this step of the ETL process, we can check on data quality and cleans the data if necessary.

 Transforming the data may involve the following tasks:
  •  Applying business rules (e.g., calculating new measures and dimensions like Margin from Revenue and Cost ),
  • Cleaning (e.g., mapping all NULL values to 0 or empty string "" , removing all the special characters from currency etc.),
  • Filtering (e.g., filtering data only for some of the countries, filtering only required columns)
  • Joining data from multiple sources or tables using Lookups, transposing rows and columns.
3.   Load is the process of writing the data into the target database.Finally, data is loaded into a data warehouse, usually into fact and dimension tables. From there the data can be combined, aggregated and loaded into datamarts or cubes as is deemed necessary. The business user analysis uses the transformed data with BI instruments like data visualization software, dashboards, OLAP tools and reporting tools.

No comments:

Post a Comment