Tuesday, June 23, 2015

How to use UnpivotRow component in Talend !!

In this post I will show you how to convert columns to multiple rows.  Step by Step post will help you to follow the steps to convert columns to rows in case if you get the input data as per the below
format.



This is Institution Input File:-
Intitution_ID;Intitution_Name;Address;City;Country;Course_1;Course_2;Course_3
101;NIIT;722 Bur Oak Avenue;Berlin;Germany;CS;EE;EC
102;AIIM;11 Collaroy 2093;Medrid;Spain;MBA;BBA;BCA
103;SRIT;223 Wellington 5011;London;UK;BSC;ME;MTECH

There is a specific talend component inbuilt for this purpose which will convert the columns to rows based on the key ID. Just follow the steps and you will be able to achieve your desired output.

Search for the tUnpivotRow component in the pallete area and Drag and drop the following components from the palette tFileInputDelimited,tUnpivotRow, and tLogRow.



Open the component properties of tUnpivotRow .
Click on Edit schema button, columns will be same as in the input file but in tUnpivotRow_1(Output) there will be one pivot key and pivot value column.And we want one more column to appear in output i.e Institution_ID so add it by clicking + button.


In the Row keys field if you want Institution_ID as a key i.e ID should be displayed for all the institutions so add it by clicking on + button.


When you Run the job excluding Institution_ID columns all other columns will be multiplied by 3 rows i.e there will be 3 rows * 7 columns = 21 rows will appear in the output.

Starting job how_to_use_unpivot at 14:41 21/05/2015.

Now after running the job you should be able to see the desired output in the screen as per below format, for the proper view you can choose the "table(print values in cells of a table)" which will print all the rows in proper table format.


[statistics] connecting to socket on port 3454
[statistics] connected
.----------------+--------------------+---------------------------.
|                               tLogRow_1                                       |
|=---------------+--------------------+-------------------------=|
|pivot_key            |pivot_value                |Institution_ID  |
|=-------------------+-------------------------+----------------=|
|Institution_Name|NIIT                            |101                 |
|Address               |722 Bur Oak Avenue  |101                 |
|City                     |Berlin                          |101                 |
|Country               |Germany                     |101                 |
|Course_1             |CS                               |101                 |
|Course_2             |EC                               |101                 |
|Course_3             |EE                               |101                 |
|Institution_Name|AIIM                           |102                 |
|Address               |11 Collaroy  2093       |102                 |
|City                     |Medrid                         |102                |
|Country               |Spain                           |102                |
|Course_1             |MBA                           |102                |
|Course_2             |BBA                            |102                |
|Course_3             |BCA                            |102                |
|Institution_Name|SRIT                            |103                |
|Address               |223 Wellington  5011  |103               |
|City                     |London                         |103               |
|Country               |UK                               |103               |
|Course_1             |BSC                             |103               |
|Course_2             |ME                               |103               |
|Course_3             |MTECH                       |103               |
'---------------------+--------------------------+----------------'
[statistics] disconnected
Job how_to_use_unpivot ended at 14:41 21/05/2015. [exit code=0]

Monday, June 22, 2015

How to convert rows to columns using tPivotToColumnDelimited component in Talend !!

In this post I will show you how to convert rows to columns using tPivotToColumnsDelimited . 
It requires at least three columns in the input schema: the Pivot column, the Aggregation column and one or more Group keys.

This is Institution Input File:-

Intitution_ID;Intitution_Name;Intitution_Address;Intitution_Course;Institution_CourseName
101;IT;722 Bur Oak Avenue  Berlin Germany;Course1;CS
101;IT;722 Bur Oak Avenue  Berlin Germany;Course2;EE
101;IT;722 Bur Oak Avenue  Berlin Germany;Course3;EC
102;AIIM;11 Collaroy  2093 Medrid Spain;Course1;BCOM
102;AIIM;11 Collaroy  2093 Medrid Spain;Course2;BBA
102;AIIM;11 Collaroy  2093 Medrid Spain;Course3;BCA          
103;SRIT;223 Wellington  5011 London UK;Course1;BSC
103;SRIT;223 Wellington  5011 London UK;Course2;ME
103;SRIT;223 Wellington  5011 London UK;Course3;MTECH

Drag and drop the components from the palette and connect each of them as shown in the blow screenshot.

Configurations setting of the tPivotToColumnsDelimited  component :
Pivot Column =”Type”
Aggregation column=”Value”
Aggregation Function =”last”
Group by “ID” and “Name” column.
Rest of the configuration is for output file, where our output will be transferred. to read output file we can use either delimited component but for quick review I`ll use tFileInputFullRow.
Add tFileInputFullRow below the tFixedFlowInput component and connect with “On Sub Job Ok” trigger. and provide previously created file path and rest of the details.
add tLogRow and connect to tFileInputFullRow component and execute the job you will get above out put on console.
Final Job Design.


Open the component properties of tPivotToColumnDelimited.

Pivot Column – In order to convert rows to columns, we need to identify a one column which need to be converted to multiple columns based on Aggregate column. 
In this we want to convert Institution_Course to multiple column so select Institution_Course in Pivot column field.

Aggregate column  - Aggregation column is the column from source data on which aggregation is to be applied with specific function.
Aggregation column is Institution_CourseName here we want aggregate the course name.

Aggregation function – which type of aggregation is to be applied on input data. If no aggregation function is applied, then you can select “First”. Aggregation functions available are Sum, Count, Min, Max, First, Last. Select the Aggregation function first.

Group by – You need to provide a group by column name, this is the column based on which pivot columns are created. Select Institution_ID, Institution_Name, Institution_Address.These columns we want to be grouped by pivot column.

Give the File Name path where you want to store the data.

Open the component properties of tFileInputFullRow.
Provide the File Name path where you want to store the data.

Click on Edit schema button this schema will be shown as below.


In tLogRow component select Table(print value in cells of table) option so that result will appear in table format.

Run the job you will see all the Institution_CourseName are grouped together with Institution_ID, Institution_Name, Institution_Address.
We have 9 rows in input and 3 rows as output .

Starting job how_to_tpivottocolumn at 15:16 21/05/2015.

[statistics] connecting to socket on port 4080
[statistics] connected
.--------------------------------------------------------------------------------------.
|                            tLogRow_1                                                                    |
|=-----------------------------------------------------------------------------------=|
|line                                                                                                             |
|=-----------------------------------------------------------------------------------=|
|101;IT;722 Bur Oak Avenue  Berlin Germany;CS;EE;EC                       |
|102;AIIM;11 Collaroy  2093 Medrid Spain;BCOM  ;BBA;BCA            |
|103;SRIT;223 Wellington  5011 London UK;BSC ;ME;MTECH           |
'--------------------------------------------------------------------------------------'
[statistics] disconnected
Job how_to_tpivottocolumn ended at 15:16 21/05/2015. [exit code=0]

Wednesday, June 17, 2015

How to Combine Excel Workbook together With tUnite Component in Talend !!

In the scenaio we are combining several excel workbook or sheets inside the workbook by using tUnite component .
There are two xlsx files one contain the data of France countries with 2 rows and another file contains the data of Mexico countries with 3 rows .tUnite component will merge both the files and get the data of both the countries with output containing 5 rows.

This job will read both the files from tFileInputExcel.
Combine the files with tUnite component.
tUnite component is used to merge data from various sources, based on a common schema. 
For the tUnite component you need to match the schema.

This is the Excel workbook consisting of different sheets France.xlsx,Mexico.xlsx.



This is France data:--

Customer_ID|Customer_First_Name|Customer_Last_Name|Customer_Address|Customer_City| Customer_PostalCode|Customer_Country|RegisterDate
7|Justin|Ash|90 Ruapehu Road   Ohakune |Strasbourg|67000|France|1/13/2010
9|David|French|Williamstown Road|Marseille|45668|France|3/16/2000

This is Mexico data:--

Customer_ID|Customer_First_Name|Customer_Last_Name|Customer_Address|Customer_City| Customer_PostalCode|Customer_Country|RegisterDate
2|William|Hinkle|543 ANJOU ANJOU H1K 2P4|Mexico|71450|Mexico|10/4/2007
3|Nick|Petrakis|40  Toronto M8Z 3Z7|Mexico|48577|Mexico|3/16/2000
13|George|Loomis|16 Martel   Beloeil|Mexico|43256|Mexico|1/13/2010

Drag and drop the tFileInputExcel ,tUnite and tLogRow component from the palette.
Connect each component as shown in the below screenshot.

Open the component properties of France[tFileInputExcel_1] and Mexico[tFileInputExcel_2].In both tFileInputExcel do the same process.
Tick the box of Read excel 2007 file format.
In the File name/Stream tab provide the path of the file where you have stored the file.
To apply header to each sheet tick the box of Affect each sheet(header&footer).
Tick the checkbox of Die on error so that if there is any error while processing the file then it will stop the job.



Open the component properties of tUnite .
Here provide the same schema of France country .Simply drag and drop the schema.
In the Basic settings view of tLogRow, select the Basic option or tou can select Table option to display properly the output values.

Run the job.
Your output will look like below containing the data of both the countries France and Mexico with total 5 rows.

Starting job tunite at 13:51 17/06/2015.

[statistics] connecting to socket on port 3662
[statistics] connected
Customer_ID Customer_First_Name Customer_Last_Name Customer_Address Customer_City Customer_PostalCode Customer_Country RegisterDate
7|Justin|Ash|90 Ruapehu Road   Ohakune |Strasbourg|67000|France|1/13/2010
9|David|French|Williamstown Road|Marseille|45668|France|3/16/2000
2|William|Hinkle|543 ANJOU ANJOU H1K 2P4|Mexico|71450|Mexico|10/4/2007
3|Nick|Petrakis|40  Toronto M8Z 3Z7|Mexico|48577|Mexico|3/16/2000
13|George|Loomis|16 Martel   Beloeil|Mexico|43256|Mexico|1/13/2010
[statistics] disconnected
Job tunite ended at 13:51 17/06/2015. [exit code=0]

Tuesday, June 16, 2015

How to resolve Memory Error in Talend !

When dealing with large amounts of data, there is often a issue between performance and memory usage, so it is likely that at some point in your Talend career, you will encounter a problem which is memory related.

Increasing the memory allocated to a Job

If you have enough memory and yet your job is failing, then it is worth increasing the amount of memory available to the job you are running. You can do this by changing the value of the Java Xmx setting.

This setting is available via the Advanced Settings option from the Run tab, as shown in the below screenshot. Simply tick the box for Use specific JVM arguments, and change the value to suit your needs.

There are two types of memory allocation :--
-Xms :--When we will use this memory allocation this means that when we get started a job it is using this much memory .
-Xmx:--When we will use this memory allocation this means that the memory is the maximum memory, your job cannot use more than this memory space.



Double click on the argument -Xms256M , this screen will appear change the value and click OK button.Generally we are changing the values of Xmx maximum memory so that our job can use more memory space.

Note that you can also use G for gigabytes, for example, –Xmx3G.

If you are running the jobs in TAC(Talend Administration Control ) you need to assign the JVM parameter if you are going to process large amount of data.

You can also set "Store temp data" to true if you are using huge lookup data file to get the desired output. 

Monday, June 15, 2015

How to get reject data using tFilterRow in Talend !!

In a given scenario there is a Customer data belonging to different country we have to put each customer data with there country in different excel sheets and in one excel sheet with sub sheets.

This is Customers Data Input File:--
Customer_ID,Customer_First_Name,Customer_Last_Name,Customer_Address,Customer_City,Customer_PostalCode,Customer_Country,RegisterDate

1,Lee,Sime,722 Bur Oak Avenue ,Berlin,12232,Germany,1/13/2010
2,William,Hinkle,543 ANJOU ANJOU H1K 2P4,Mexico,71450,Mexico,10/4/2007
3,Nick,Petrakis,40  Toronto M8Z 3Z7,Mexico,48577,Mexico,3/16/2000
4,Jian,Paysnoe,11 Collaroy  2093,London,56008,UK,1/13/2010
5,Min,Parks,223 Wellington  5011,Berlin,32456,Germany,10/4/2007
6,Joseph,guo,324 Tman Street   Stafford Heights ,Mannheim,43567,Germany,3/16/2000
7,Justin,Ash,90 Ruapehu Road   Ohakune ,Strasbourg,67000,France,1/13/2010
8,Ming,Ho,32 Wilfred road,Madrid,33456,Spain,10/4/2007
9,David,French,Williamstown Road,Marseille,45668,France,3/16/2000
10,GORDON,baleri,East 7th Avenue   Vancouver Vancouver,Tsawassen,98977,Canada,1/13/2010
11,Aaron,Vanzin,431 algary Calgary ,London,56785,UK,10/4/2007
12,Gregory,wang,29 View Royal Ave   Victoria Victoria,Buenos Aires,54367,Argentina,3/16/2000
13,George,Loomis,16 Martel   Beloeil,Mexico,43256,Mexico,1/13/2010
14,JACOB,Jiang,13 Surrey Close,Bern,35843,Switzerland,10/4/2007

Sync Columns to propogate the metadata.

  • In the Conditions table, fill the filtering parameters.
  • In InputColumn, select Customer_Country, Function as Empty,Operator as Equals.
  • Or you can select other options from dropdown list such as greater than,lower based on your requirement in Operator field.


  • In the Value column, type "UK" to filter .

Open the component properties of UK(tFileOutputExcel).
Tick the check box of Write excel2007 file format and Include header.
In the File Name tab provide the path where you want to store the data of UK country.

And do the same process for other countries also.

Here in tFilterRow_4 we are taking three countries Germany Spain and Canada together so here we will use option Use advanced mode then click check box and type in the following regular expression that includes Country to be searched---

input_row.Customer_Country.equals("Germany") ||
input_row.Customer_Country.equals("Spain") || 
input_row.Customer_Country.equals("Canada")


Click on the tFileList component properties and then click on “Directory” tab select the directory that contains the excel files by clicking on the “…” button.For example I have all four files in one folder named as customer_reject so provide the path of this folder.

In the "Files" column write the Filemask such as "U*" , "Fran*","Mexico*","Germany*" this means that the filename which starts from this words that files will be only considered in tFileList .

Open the tFileInputEcxel_1 component properties and set Property type  to “Built In”.
Under “File name / Stream” tab type 
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))
Or you can type tfilelist then press ctrl + space.
Then select this option tFileList_1.CURRENT_FILEPATH.


Open tmap properties here input column with output column auto map is done each input column is connected to its output column.Click ok button.

Open the tFileInputEcxel_3 component properties and set Property type  to “Built In”.
Give the File Name path where you want to store all the countries together in excel sheet with sub sheets for eg, I have given the folder name country_all.

Under “Sheet Name” tab type 
StringHandling.EREPLACE(((String)globalMap.get("tFileList_1_CURRENT_FILE")),".xlsx","")

When you will Run the job firstly each country will be in your directory path and then from there tfilelist component will check eack country and will output each countries in one excel sheets named Country_all with sub countries.As you can see in screenshot below country_all.xlsx there are sub sheets named as France ,GermanySpainCanada ,Mexico,UK.


Friday, June 12, 2015

How to do Deploying and Scheduling in Talend !!

This post will help you to schedule the Talend jobs in case if you are not using Enterprise edition of Talend, because  enterprise Talend comes with TAC (Talend Administration Center) where you can schedule the job easily.

Below steps are to schedule the job if you using Open Source Talend. To schedule the job follow the below steps and at the end of this post you should be able to schedule the talend job.

1.Right-click on the job and select the option Build Job.


2.Click on Browse to navigate to the folder .

3.Ensure the Export type is Autonomous Job, and tick Extract the zip file.

4. In the options tick Shell launcher and Context Scripts.

The Shell launchers option allows you to specify if you want to create shell launch scripts. These are Unix and Windows style scripts for launching your Job. You may choose the style of scripts that you want to export.

If you select the Context scripts option, then Talend will export scripts for each of the Context that you have defined in your Job.
.
5. Click on Finish to compile the job.


6.Navigate to the compiledCode folder, and you will see a zip file and a directory for the compiled job.

Executing the job

Open a command window within Windows or a shell window in Unix.
.bat is for windows and .sh is for UNIXHere I am executing the job in windows.

Enter the command “<jobName>_run.bat”. The job shown here is how_to_use_tjava_component, thus  how_to_use_tjava_component_run.bat.




Now if you want to Schedule your Talend job daily ,weakly etc.Follow the Steps:--
Go to the Control Panel Settings in your system under Administrative Tools > Task Scheduler.

Click Create Basic Task on right side of screen.

 Write the Name and Description of the task.

 Then select how often you want to run your task . Then click Next button.

 Above you have selected daily option so task will occur every day.

 Select Start a program in Action Field.Click on Next button.

 Browse to the location of the batch file.Click on Next button.

 Here the Summary of your task is shown.Click on Finish button.

Tuesday, June 2, 2015

How to Download Files From FTP Server Using Talend !!

In this post I will show you how to extract files or get files from FTP server using Talend.
In the given scenario we assume that we don’t know the exact names of the files to be downloaded from the remote location and we just know the extensions type and the path of the incoming files .

As an ETL developer we know most of the time data comes in multiple format through FTP and we need to process the data to keep the business running. In this post will try to give an idea how we can process the data to achieve our desired data. I will be showing how to get the file from FTP and UnArchive the file. Once Archived file is available you can push that data into the database as per you required business rules.

Drag and drop the following component from the palette :-
tFTPConnection , tFTPGet , tFileList , tFileUnarchive.
Connect each component as shown in the screenshot below.



tFTPConnection creates a connection to your FTP server.
Open the component properties of tFTPConnection and fill the Host name , Port , Username and Password field.


You can also create the FTP connection under MetaData repository

Open the component properties of tFileGet and fill the Local Directory path where you want to copy the files and remember it should always be in double quotes "".
And fill the path of incoming files in the Remote Directory .
In the Files list using Filemask specify all the types of files you want to retrieve from the remote server by clicking + button.



In the tFileList component fill the Directory path and Filemask of the files same as you gave in the tFileGet component local directory path and filemask.


Now tFileList component is connected with Iterate link to tFileUnarchive component so that all the files one by one get archive.

Open the component properties of tFileUnarchive.
In the Archive File field provide the path of the tFileList.
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))

Or you can press ctrl + space key and select tFileList_1_CURRENT_FILEPATH.




Atlast when you Run the job all the files get retrieved in your local directory which you have given with mentioned extensions.