Tuesday, September 12, 2017

Put File Mask on tFileList Component dynamically in Talend

How I can set file mask for tFilelist component in Talend that it recognize date automatically and it will download only data for desired date?

There are two ways of doing it.
1.    Create context variable and use this variable in file mask.
2.    Directly use TalendDate.getDate() or any other date function in file mask.
See both of them in component
1st Menthod,
·         Create context variable named with dateFilter as string type.
·         Assign value to context.dateFilter=TalendDate.getDate("yyyy-MM-dd");
·         Suppose you have file name as "ABC_2015-06-19.txt" then
·         In tFileList file mask use this variable as follows.
"ABC_"+context.dateFilter+".*"

2nd Menthod
·         In tFileList file mask use date function as follows.
"ABC_"+TalendDate.getDate("yyyy-MM-dd")+".*"


Above are the two best way, you can make changes in file mask per your file names.

LIKE function in Talend

For example there is a column order_payment_method , in which values in other format and you want to change in proper format , for that we have made some business rules as below.



StringHandling.LEN(row1.Order_Payment_Method) >  0 ? 

CommonDb.like(StringHandling.DOWNCASE(row1.Order_Payment_Method),  "%ccavenue-credit_card%")?"CCAvenue_CC":

CommonDb.like(StringHandling.DOWNCASE(row1.Order_Payment_Method),"%credit_card%")?"CreditCard":

CommonDb.like(StringHandling.DOWNCASE(row1.Order_Payment_Method),"%ccavenue%")?"CCAvenue": “N/A”


Here orders of the business rules are very important because if you have noticed there are two categories “credit_card” and “ccavenue-credit_card” so if you interchange the order it will change “ccavenue-credit_card” to CreditCard so you should put “ccavenue-credit_card” first then for “credit_card” same for others.

LIKE is better option if you are not sure what else will be coming with value otherwise you can use .equals() option , in above case you should better use .equals because those categories are fixed.


Best example for using like option is color, like if item name contains black then Black, and if contain “dusk black” then “Dusk Black”
Need to add commonDb code

CommonDb.like(StringHandling.DOWNCASE(main.Item_Name),"%dusk black%")?"Dusk Black":
CommonDb.like(StringHandling.DOWNCASE(main.Item_Name),"%deep black%")?"Deep Black":
CommonDb.like(StringHandling.DOWNCASE(main.Item_Name),"%black%")?"Black"


Generate rows by month for the month between Start Date and End Dates in Redshift



If you run the first only SELECT part which is for the sample data set you will see below dataset.


Sample Data Set :-

item_id start_date end_date monthly_revenue
242345 2/26/2016 7/26/2016        700



Desired Output by month :-

item_id       month  monthly_revenue
242345    2/26/2016 700
242345    3/26/2016 700
242345    4/26/2016 700
242345    5/26/2016 700
242345    6/26/2016 700
242345    7/26/2016 700


Redshift Query to get above data set :-



WITH SampleData AS (
    SELECT
       CAST(242345 AS INTEGER) AS Item_Id
       ,CAST('2016-02-26' AS DATE) AS Start_Date
       ,CAST('2016-07-26' AS DATE) AS End_Date
       ,CAST(700 AS INTEGER) AS Monthly_Revenue
)

,cteTally AS (
        SELECT 0 AS TallyNum
        UNION ALL
        SELECT 1
        UNION ALL
        SELECT 2
        UNION ALL
        SELECT 3
        UNION ALL
        SELECT 4
        UNION ALL
        SELECT 5
        UNION ALL
        SELECT 6
        UNION ALL
        SELECT 7
        UNION ALL
        SELECT 8
        UNION ALL
        SELECT 9
        UNION ALL
        SELECT 10
        UNION ALL
        SELECT 11
    )
    SELECT
        Item_ID,
        DATEADD(MONTH,   c.TallyNum,   t.Start_date) AS "Month",
        Monthly_Revenue
    FROM  SampleData t  INNER JOIN cteTally c
    ON DATEDIFF(MONTH,   t.Start_Date,   t.End_Date) >= c.TallyNum ;

Calculate Cumulative Sum in Redshift - Get sum of values based on previous row

Here in this post I will be explaining you about the query to get the cumulative sum of the rows based on the previous rows. many times you will need to get this for the reporting purpose.

Sample result set :-

itemweekstartcountryregionquantity
AAAA9/11/2017    INIndia
AAAA9/18/2017    INIndia
AAAA9/25/2017    INIndia
AAAA10/2/2017    INIndia2000
AAAA10/9/2017    INIndia3000
AAAA10/16/2017    INIndia
AAAA10/23/2017    INIndia
AAAA10/30/2017    INIndia

Expected Output Result :-


item weekstart country region quantity csum
AAAA 9/11/2017     IN India
AAAA 9/18/2017     IN India
AAAA 9/25/2017     IN India
AAAA 10/2/2017     IN India 2000 2000
AAAA 10/9/2017     IN India 3000 5000
AAAA 10/16/2017     IN India 5000
AAAA 10/23/2017     IN India 5000
AAAA 10/30/2017     IN India 5000



Below is the query which you can use to get the desired output as mentioned above.

Redshift Query :-

SELECT  ITEM,
                WEEKSTART,
                 COUNTRY,
                 REGION,
                 SUM(QUANTITY) OVER (PARTITION BY item ORDER BY item,weekstart,country ROWS UNBOUNDED PRECEDING) AS csum
FROM table_name
ORDER BY ITEM,WEEKSTART,COUNTRY