Wednesday, April 29, 2015

How to use tExtractDelimitedFields component in Talend

In the earlier post of tNormalize I have shown you how to split multivalued columns values to multiple records. Here In tExtractDelimitedFields  component I will show you, how to extract value of single column into multiple columns or how to generate multiple columns from single column values.

This is Suppliers Input File

101;Nick Pvt Ltd;Desktop,Laptop,Mobiles
102;Shiny Shipping;WiFi Routers,PC,Networking Wires
103;Michael Pvt Ltd;RAM,Hard-Disk,Servers
104;Rose Marry Ship Pvt Ltd;Pen Drive,DDR Memory,Tablet

105;Mark Pvt Ltd;Watch,Refrigerator,CD

In Supplier_Products column there are comma separated values now these values will be split into multiple columns.

So firstly create a new job from Job Designs > Create Job.
  • Drag the schema of Suppliers.csv from Metadata > File Delimited > Suppliers.csv and drop it to the design work space and select tfileInputDelimited option from pop window.This Suppliers.csv file is been taken as an input file.

Or you can simply drag this component from the palette and double click on it to open the component properties and click [...] next to the File Name field to specify the path where you have created your input file.
  • Then drag and drop the following components from the palette into the design workspace:-tExtractDelimitedFields and tLogRow.
  • Connect each component by right clicking and select Row > Main.

  •  Open the component properties of tExtractDelimitedFields to view the Basic Settings 
  1. Select Suppliers_Products in "Field to split" field.
  2. Type the "Field separator" as ","
  3. Click on "Edit Schema" Button.

In the output panel of the tExtractDelimitedFields_1(Output) , click the (+) button to add three columns for the output schemaHere we want to split the Suppliers_Product column into three columns in the output Product_1,Product_2 and Product_3.

In the Basic settings of tLogRow components, select Table (print values in cells of a table) to show result in table format.

Atlast Run the Job.

In the you can see that all the five rows are executed successfully and Suppliers_Product column is split into multiple columns named as  Product_1,Product_2 and Product_3.

1 comment:

  1. name;cntry;qual
    im getting my result this way after trying the same steps

    | tLogRow_1 |
    |name |cntry|qual |qul1|qul2|
    |rahul |uk |mba,bca,ma|mba |bca |
    |manu |us |mba,bca,ma|mba |bca |
    |sanath|sg |mba,bca,ma|mba |bca |