Wednesday, April 29, 2015

How to use tSplitRow component in Talend

In my previous post of tnormalize I have shown you how to split a multi valued attributes column into individual rows in this comma separated values are split into multiple rows but in 
tSplitRow it splits one row into multiple rows.

This is Student Input File

Student_ID ,Institution,Address,City,Country,Course 1,Course 2,Course 3
101,NIIT,722 Bur Oak Avenue ,Berlin,Germany,CS,EC,EE
102,AIIM,11 Collaroy  2093,Medrid,Spain,MBA,BBA,BCA
103,SRIT,223 Wellington  5011,London,UK,BSC,ME,MTECH

So firstly create a new job from Job Designs > Create Job.
Drag the schema of Student.csv from Metadata > File Delimited > Studentsplitrow.csv and drop it to the design work space and select tfileInputDelimited option from pop window.This Student.csv file is been taken as an input file.
Or you can simply drag this component from the palette and double click on it to open the component properties and click [...] next to the File Name field to specify the path where you have created your Student.csv file.
Then drag and drop the following components from the palette into the design workspace:-tSplitRow,tLogRow.
Connect each component by right clicking and select Row > Main.


Open the component properties of tSplitRow here we want different record for every Courses for every Student.So we will drag Institution_ID,Institution_Name from Student(Input-Main) to tSplitRow_1(Output) and add Institution_Address and Institution_Courses fields as output to tSplitRow component.


In the column mapping table of tSplitRow click the (+) sign to add mapping from source to target record. If the source input file has 3 records and 3 mappings are defined in the Columns Mapping table in tSplitRow component then the output will provide 9 records.  


  • On the first parameter line, write row1.Institution_ID in Institution_ID field. Type row1.Institution_Name in the Institution_Name field and row1.Address+" "+row1.City+" "+row1.Country in the Institution_Address field.Type row1.Course_1 in Institution_Courses field.
  • On the second parameter line, write row1.Institution_ID in Institution_ID field. Type row1.Institution_Name in the Institution_Name field and row1.Address+" "+row1.City+" "+row1.Country in the Institution_Address field.Type row1.Course_2 in Institution_Courses field.
  • On the third parameter line, write row1.Institution_ID in Institution_ID field. Type row1.Institution_Name in the Institution_Name field and row1.Address+" "+row1.City+" "+row1.Country in the Institution_Address field.Type row1.Course_3 in Institution_Courses field.
Here in Institution_Address field I have concatenated Address,City and Country Column together.
In the Basic settings of tLogRow components, select Table (print values in cells of a table) to show result in table format.

  • Atlast Run the Job.


Starting job how_to_use_tSplitRow_in_Talend at 13:54 29/04/2015.

[statistics] connecting to socket on port 3988
[statistics] connected
.--------------+----------------+----------------------------------+-------------------.
|                                      tLogRow_1                                       |
|=-------------+----------------+----------------------------------+------------------=|
|Institution_ID|Institution_Name|Institution_Address               |Institution_Courses|
|=-------------+----------------+----------------------------------+------------------=|
|101           |NIIT            |722 Bur Oak Avenue  Berlin Germany|CS                 |
|101           |NIIT            |722 Bur Oak Avenue  Berlin Germany|EC                 |
|101           |NIIT            |722 Bur Oak Avenue  Berlin Germany|EE                 |
|102           |AIIM            |11 Collaroy  2093 Medrid Spain    |MBA                |
|102           |AIIM            |11 Collaroy  2093 Medrid Spain    |BBA                |
|102           |AIIM            |11 Collaroy  2093 Medrid Spain    |BCA                |
|103           |SRIT            |223 Wellington  5011 London UK    |BSC                |
|103           |SRIT            |223 Wellington  5011 London UK    |ME                 |
|103           |SRIT            |223 Wellington  5011 London UK    |MTECH              |
'--------------+----------------+----------------------------------+-------------------'

[statistics] disconnected
Job how_to_use_tSplitRow_in_Talend ended at 13:54 29/04/2015. [exit code=0]

Here you can see that 3 records are split into 9 records each Institution has multiple courses.


No comments:

Post a Comment