Monday, May 11, 2015

How to Append Files Using the tFileList, tFileDelete and tJava Components in Talend !!

In my previous post I showed how to append files using the tFileList component. The drawback of that job is that every run appends the data again to the same file: the old data is still there and the new data is added after it, so the file ends up with more records than you wanted to append.

So in this post I will show you how to delete the old data first and then append the new data to the same file. That way, every time you run the job it first deletes the existing output and then appends the new data.
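To make the idea concrete outside Talend, here is a minimal plain-Java sketch of "delete first, then append". The class name DeleteThenAppendSketch and its two methods are hypothetical names chosen for illustration; they are not Talend APIs, and this is not the code Talend generates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Talend).
class DeleteThenAppendSketch {

    // Removes the output left by a previous run; does nothing if it is absent.
    static void resetOutput(Path output) throws IOException {
        Files.deleteIfExists(output);
    }

    // Appends the given lines to the output file, creating it when needed.
    static void appendLines(Path output, List<String> lines) throws IOException {
        Files.write(output, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path output = Files.createTempFile("Append_StudentList", ".csv");
        // Simulate two runs of the job against the same output file.
        for (int run = 1; run <= 2; run++) {
            resetOutput(output);
            appendLines(output, Arrays.asList("101;Sameer Chowdhary;1"));
            appendLines(output, Arrays.asList("110;Radha Singh;2"));
        }
        // Without resetOutput the file would hold 4 lines after the second run.
        System.out.println(Files.readAllLines(output).size()); // prints 2
    }
}
```

In the Talend job, the tFileDelete component plays the role of resetOutput and the Append option of tFileOutputDelimited plays the role of appendLines.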

The tFileList component iterates over the files or folders of a given directory and retrieves a set of files or folders based on a filemask pattern.

In the job below I have taken three input CSV files (with the same schema). They are as follows:

Student_List1

student_id;student_name;student_branchid
101;Sameer Chowdhary;1
102;Aditya Tiwari;1
103;Gaurav Tiwari;1
104;Shashi Singh;1
105;Yogesh Mishra;5
106;Ankit Gupta;8
107;Mohit Sharma;9
108;Rajesh Soni;8
109;Rohit Sinha;1

Student_List2

student_id;student_name;student_branchid
110;Radha Singh;2
111;Richa Swankar;9
112;Santosh Tiwari;4
113;Gaurav Tiwari;1
114;Mohammad Singh;1
115;Prachi Mishra;5
116;Duddu Gupta;3
117;Mahi Sharma;9
118;Renuka Soni;6
119;Swati Sinha;1

Student_List3

student_id;student_name;student_branchid
120;Ravi Mahlotra;7

1. Select the tFileDelete, tFileList, tFileInputDelimited, tJava (twice, for tJava_1 and tJava_2) and tFileOutputDelimited components from the Palette and drag them into the job design window.

2. Connect the tFileDelete component to tFileList: right-click the tFileDelete component, select Trigger > On SubJob Ok and drag the link to tFileList.
Connect each component as shown in the screenshot below:




3. Open the tFileDelete component properties and, in the “FileName” field, click the “…” button to select the file to delete. This is the path of the file into which you want to append all the files together. tFileList gives the file name as output in a global variable, which we can use as the file name in tFileDelete.


4. Open the tFileList component properties and, in the “Directory” field, click the “…” button to select the directory that contains the input CSV files. For example, I have all three files in one folder named Student_List.

In the "Files" column write the filemask, such as "student*". This means that only the files whose names start with "student" will be considered by tFileList.
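For readers who want to see what a filemask does, here is a rough plain-Java equivalent that lists the files of a directory matching a glob such as "student*". FileMaskSketch and listMatching are hypothetical names, and this is only an approximation of tFileList, not its actual implementation. Note that whether the match is case sensitive depends on the file system here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Talend).
class FileMaskSketch {

    // Returns the regular files in 'directory' whose names match 'fileMask',
    // sorted by path so that the iteration order is predictable.
    static List<Path> listMatching(Path directory, String fileMask) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, fileMask)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    matches.add(entry);
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: a temporary folder standing in for Student_List.
        Path dir = Files.createTempDirectory("Student_List");
        Files.createFile(dir.resolve("studentList1.csv"));
        Files.createFile(dir.resolve("studentList2.csv"));
        Files.createFile(dir.resolve("notes.txt"));
        for (Path file : listMatching(dir, "student*")) {
            System.out.println(file.getFileName()); // notes.txt is not listed
        }
    }
}
```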

5. Open the tJava_1 component properties and write in the "Code" section :--

context.FileName = ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"));

System.out.println(context.FileName);

Here I have created a context variable "FileName" that receives the path of the current file.
System.out.println prints the path of each file used in the job.
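The lookup above is an ordinary map read followed by a cast. The sketch below reproduces it in plain Java with a HashMap standing in for Talend's globalMap; GlobalMapSketch and currentFilePath are hypothetical names, and the sample path is made up. It also shows that the lookup returns null when tFileList has not stored a value yet, which is worth keeping in mind for components that run before tFileList.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tJava lookup, for illustration only.
class GlobalMapSketch {

    // Same key as in the tJava code above; returns null if the key is not set.
    static String currentFilePath(Map<String, Object> globalMap) {
        return (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    }

    public static void main(String[] args) {
        // In a real job Talend creates and fills this map; here we do it by hand.
        Map<String, Object> globalMap = new HashMap<>();

        // Before tFileList has iterated, nothing is stored under the key.
        System.out.println(currentFilePath(globalMap)); // prints null

        // Assumed sample value, as tFileList would store it for the current file.
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "C:/Student_List/studentList1.csv");
        System.out.println(currentFilePath(globalMap)); // prints C:/Student_List/studentList1.csv
    }
}
```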

6. Open the tFileInputDelimited component properties and set Property type to “Built-In”.
In the “File name / Stream” field type tfilelist, then press Ctrl + Space.
Select tFileList_1.CURRENT_FILEPATH.

7. Click on Edit Schema to provide the schema of the files. In the popup window add three columns (student_id, student_name and student_branchid) as shown in the screenshot below.

8. Open the component properties of tFileOutputDelimited:

Set the File Name by clicking the “…” button and choosing where you want to store your single CSV file.
Tick the Append check box to append the data to the existing file instead of creating a new file every time.

Note: When the job runs over multiple files, the data of all input files is appended to a single file because of this Append check box.
Check Include Header so that the single file is written with a header row.
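The combination of Append and Include Header can be pictured with the plain-Java sketch below. It assumes that the header should be written only when the output file is missing or empty, which is consistent with the single header row in the result file shown later in this post; I have not taken this from Talend's generated code. AppendWithHeaderSketch and append are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Talend).
class AppendWithHeaderSketch {

    // Appends 'rows' to 'output'. The header is written first only when the
    // file does not exist yet or is still empty (assumed behaviour).
    static void append(Path output, String header, List<String> rows) throws IOException {
        boolean needsHeader = Files.notExists(output) || Files.size(output) == 0;
        List<String> lines = new ArrayList<>();
        if (needsHeader) {
            lines.add(header);
        }
        lines.addAll(rows);
        Files.write(output, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path output = Files.createTempDirectory("out").resolve("Append_StudentList.csv");
        String header = "student_id;student_name;student_branchid";
        append(output, header, Arrays.asList("109;Rohit Sinha;1"));
        append(output, header, Arrays.asList("110;Radha Singh;2"));
        // One header line followed by the two data rows.
        for (String line : Files.readAllLines(output)) {
            System.out.println(line);
        }
    }
}
```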



9. Open the tJava_2 component properties and write in the "Code" section :--

context.NumberOfRows=((Integer)globalMap.get("tFileInputDelimited_1_NB_LINE"));
System.out.println(context.NumberOfRows);

Here I have created a context variable "NumberOfRows" that receives the number of lines read from the current input file.
System.out.println prints the number of rows of each file.
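To show what this row count corresponds to, here is a small plain-Java sketch that counts the data rows of a delimited file. It assumes one header line that is not counted and ignores empty lines; both are assumptions about the component settings, chosen so that the count matches the sample files above. RowCountSketch and countDataRows are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of Talend).
class RowCountSketch {

    // Counts the non-empty lines of 'csvFile' after skipping 'headerLines' lines.
    static int countDataRows(Path csvFile, int headerLines) throws IOException {
        int count = 0;
        int lineNumber = 0;
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber <= headerLines || line.isEmpty()) {
                    continue; // header or empty line: not a data row
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Same content as Student_List3 above: a header and one student.
        Path file = Files.createTempFile("studentList3", ".csv");
        Files.write(file, Arrays.asList(
                "student_id;student_name;student_branchid",
                "120;Ravi Mahlotra;7"), StandardCharsets.UTF_8);
        System.out.println(countDataRows(file, 1)); // prints 1
    }
}
```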

Make sure you have multiple CSV files in your directory.
10. Click the “Run” button.
Now you will see that all your files are processed one by one and loaded into the single file.


Append_StudentList.csv(in excel format)
student_id;student_name;student_branchid
101;Sameer Chowdhary;1
102;Aditya Tiwari;1
103;Gaurav Tiwari;1
104;Shashi Singh;1
105;Yogesh Mishra;5
106;Ankit Gupta;8
107;Mohit Sharma;9
108;Rajesh Soni;8
109;Rohit Sinha;1
110;Radha Singh;2
111;Richa Swankar;9
112;Santosh Tiwari;4
113;Gaurav Tiwari;1
114;Mohammad Singh;1
115;Prachi Mishra;5
116;Duddu Gupta;3
117;Mahi Sharma;9
118;Renuka Soni;6
119;Swati Sinha;1
120;Ravi Mahlotra;7

The following result is shown in your Run tab:
Starting job how_to_append_file_using_tFileList at 15:34 11/05/2015.

[statistics] connecting to socket on port 3408
[statistics] connected
C:\Users\mini\Documents\Talend_Tutorials\Student_List\studentList1.csv
9
C:\Users\mini\Documents\Talend_Tutorials\Student_List\studentList2.csv
10
C:\Users\mini\Documents\Talend_Tutorials\Student_List\studentList3.csv
1
[statistics] disconnected
Job how_to_append_file_using_tFileList ended at 15:34 11/05/2015. [exit code=0]

Here you can see that all the files are appended into a single file, and that the path and the number of rows of each file are displayed using the tJava components.

First file name studentList1.csv, number of rows 9
Second file name studentList2.csv, number of rows 10
Third file name studentList3.csv, number of rows 1
