Tuesday, April 21, 2015

How to split a multi valued attributes column into individual rows using tNormalize ?

How to split a multi valued attributes column into individual rows using tNormalize ?

In today's post will see how to split a multi valued attributes column into individual rows, this is useful when data is not organized properly, tNormalize helps to improve data quality which makes ease the data update.

tNormalize: This component split a record having comma separated values into multiple records. 

This is the Input Supplier File

Supplier_ID;Supplier_Name;Supplier_Products
101;Nick Pvt Ltd;Desktop,Laptop,Mobiles
102;Shiny Shipping;WiFi Routers,PC,Networking Wires
103;Michael Pvt Ltd;RAM,Hard-Disk,Servers
104;Rose Marry Ship Pvt Ltd;Pen Drive,DDR Memory,Tablet
105;Mark Pvt Ltd;Watch,Refrigerator,CD


Firstly Create metadata for Input Delimited Supplier file as I have shown in my earlier post how to create metadata form csv file. 
             

Drop the metadata from Metadata > File Delimited > normalize and select tFileInputDelimited then
drop the following components from the Palette to the design workspace: tNormalize, tLogRow.
Connect the components using Row > Main connections.



Open the component properties of the tNormalize component click on Sync columns to propagate the metadata.
Select Supplier_Products in the Column to normalize drop down. Here we have selected Supplier_Products as we want to normalize the values from this column. And fill the Item Separator as ",".

Open the Basic Settings of tlogrow component properties click on Table so that result should appear in Table format.

Then Run the job.
Your Result should be like this.


Here all the rows split into multiple records.We have 5 rows in input it has been split into 15 rows  as output.All the comma separated values of Supplier_Products are split into multiple records where each output record contain only one product.

2 comments:

  1. Thanks for sharing. To watch a demo example, then you can visit my YouTube channel -
    Talend ETL - How does tNormalize Component work?
    https://youtu.be/_cIhMVTNx90

    ReplyDelete
  2. Please post how to schedule a Talend job in TAC also TAC related posts.

    ReplyDelete