Wednesday, April 29, 2015

How to use tExtractRegexFields component in Talend

The tExtractRegexFields component extracts multiple columns from a single column using a regular expression.

This is the student input file (Student.csv):

Student_ID;Student_Email;Student_Age
101;megha@yahoo.com;21
102;leena@gmail.com;34
103;shailja@hotmail.in;22
104;anupama@twitter.org;19
105;ayushi@facebook.in;24

First, create a new job from Job Designs > Create Job.
  • Drag the schema of Student.csv from Metadata > File Delimited > Student.csv, drop it onto the design workspace and select the tFileInputDelimited option from the pop-up window. Student.csv is used as the input file.

Alternatively, drag the tFileInputDelimited component from the palette, double-click it to open the component properties and click [...] next to the File Name field to specify the path of your input file.
  • Then drag and drop the following components from the palette onto the design workspace: tExtractRegexFields and tLogRow.
  • Connect the components in sequence by right-clicking each one and selecting Row > Main.




  • Open the component properties of tExtractRegexFields to view the Basic settings.
  1. Select Student_Email in the "Field to split" field.
  2. Type the regular expression "([a-z]*)@([a-z]*)\\.([a-z]*)" in the "Regex" field. It matches the three parts of the Student_Email column: Name, Domain and ID. The dot is escaped as \\. so that it matches a literal "." rather than any character.
  3. Click the "Edit Schema" button.



In the schema editor, copy Student_ID and Student_Age unchanged from tFileInputDelimited_1 (Input - Main) to the output panel of tExtractRegexFields_1 (Output). Then click the (+) button to add three columns to the output schema: Name, Domain and ID. These columns hold the three parts into which the Student_Email column is split.
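To see how the capture groups map to the Name, Domain and ID columns, here is a minimal plain-Java sketch of the same regular expression applied to the sample rows. It is only an illustration, not the code Talend generates: the class name EmailRegexSketch and the method extract are made up for this example, the dot is escaped as \\. so it matches only a literal dot, and the sketch assumes the whole value must match the pattern, which may differ from how the component itself applies the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Talend or of the generated job.
class EmailRegexSketch {
    // Same three capture groups as in the Regex field, with the dot escaped.
    private static final Pattern EMAIL =
            Pattern.compile("([a-z]*)@([a-z]*)\\.([a-z]*)");

    // Returns {Name, Domain, ID}, or null when the value does not match.
    // Assumption: the whole value has to match the pattern.
    static String[] extract(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Sample rows from Student.csv (header line omitted).
        String[] rows = {
            "101;megha@yahoo.com;21",
            "102;leena@gmail.com;34",
            "103;shailja@hotmail.in;22",
            "104;anupama@twitter.org;19",
            "105;ayushi@facebook.in;24"
        };
        for (String row : rows) {
            String[] fields = row.split(";");   // Student_ID, Student_Email, Student_Age
            String[] parts = extract(fields[1]);
            if (parts == null) {
                continue;                       // skip values that do not match
            }
            // Student_ID, Student_Age, then the three extracted columns.
            System.out.println(String.join("|",
                    fields[0], fields[2], parts[0], parts[1], parts[2]));
        }
    }
}
```

For the first row, group 1 gives megha (Name), group 2 gives yahoo (Domain) and group 3 gives com (ID), which is the same split the three new schema columns receive.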


In the Basic settings of the tLogRow component, select Table (print values in cells of a table) to show the result in table format.

Finally, run the job.



In the result you can see that all five rows are processed successfully and the Student_Email column is split into the Name, Domain and ID columns.
