Wednesday, April 29, 2015

How to use tAggregateRow component in Talend


tAggregateRow receives a input and aggregates it based on one or more columns. 

This is Orders Input File :-

Order_ID Customer_ID Employee_ID Order_Date Shipper_ID
11248
1 1 41655 3
11248
2 1 41656 1
11248
4 1 41657 4
11249
9 3 41658 1
11249
10 8 41659 1
11250
11 2 41660 1
11250
13 6 41661 3
11250
6 7 41662 2
11251
7 6 41663 2
11251
3 4 41664 4
11251
2 4 41665 2
11252
11 7 41666 1
11252
7 3 41667 2
11252
5 4 41668 1
11253
8 4 41669 4

Shippers Lookup File

Shipper_ID Shipper_Name Phone
1
Shiny Shipping 4133694044
2
Rose Marry Ship Pvt 5703453631
3
Nick Ltd 3038141672
4
Michle Ltd 7274882222
 

So firstly create a new job from Job Designs > Create Job.
Drag the schema of Orders.csv from Metadata > File Delimited > Orders.csv and drop it to the design work space and select tfileInputDelimited option from pop window.This Orders.csv file is been taken as an input file.
Or you can simply drag this component from the palette and double click on it to open the component properties and click [...] next to the File Name field to specify the path where you have created your Orders.csv file.
Then drag and drop the following components from the palette into the design workspace:-tAggregateRow,tMapt and LogRow.
Connect each component by right clicking and select Row > Main.

Now you can see that I have used a lookup file named as Shippers.csv
Drag the schema of Shippers.csv from Metadata > File Delimited > Shippers.csv and drop it to the design work space. Connect Shippers.csv FileInputDelimited by right clicking and select Row > Main.


Double-click  (tAggregateRow component) to set the properties. Click Edit schema and define the output schema. You can add as many columns as you need to hold the set operations results in the output flow.Here we want two columns Order_ID and Shipper_ID so drag it from Orders (Input - Main) to tAggregareRow_1(Output) column.
·     
  
In this example, we will calculate the how many Order_ID  per Shipper_ID. 

  • In the Group By select Shipper_ID as Output column and Shipper_ID as Input column positionThe first column mentioned as output column in the Group By table is the main for tAggregateRow.
  • In the Operations Field select Order_ID as Output column,select count as function and Order_ID as Input column position.


Now in tMap settings drop Shippers_ID from row 2 to row 3 and drag and drop Order_ID,Shippers_ID from row 2 column to output column and Shippers_Name from row 3 to output column so that these columns can appear in our output.


In the Basic settings of tLogRow components, select Table (print values in cells of a table) to show result in table format.
Atlast Run the Job.



Here you can see that Shipper_ID 1 has 6 Order_ID each shipper is counting how many orders are there and there names are displayed.

No comments:

Post a Comment