The difference between tAggregateRow and tAggregateSortedRow was an open question for me. I searched for possible answers but found nothing, either on the Talend Help Center or on Google, so I am writing this post.
This is the input we will use for the demonstration.
I know not all of these columns are required, but I still wanted to use them.
tAggregateRow: Receives a flow and aggregates it based on one or more columns. Each output line provides the aggregation key and the result of the relevant set operations (min, max, sum…).
Question 1: What is the maximum quantity per continent?
Step 1: Create a simple job with the input given above, a tAggregateRow and a tLogRow.
Step 2: Connect the input to tAggregateRow and apply the following settings.
- Add two columns, Quantity and Continent, to the output schema of the tAggregateRow component. Your final schema should look like the image below.
- Configure tAggregateRow as follows:
- In the “Group by” table, add Continent as both the input and the output column.
- In the “Operations” table, add the Quantity column as both the input and the output column, and select the max function in the Function column. See the image below for more details.
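To make the settings above concrete, here is a rough Java sketch of the computation they describe (group by Continent, max of Quantity). Java is used only because Talend jobs compile to Java; this is not the code Talend generates. The `HashAggregateDemo` and `Row` names and the seven sample rows are hypothetical, since the real input is shown only as an image.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of tAggregateRow with "Group by: Continent" and
// "Operations: max(Quantity)". Not Talend's generated code.
public class HashAggregateDemo {

    // One input row, reduced to the two columns the aggregation uses.
    static final class Row {
        final String continent;
        final int quantity;

        Row(String continent, int quantity) {
            this.continent = continent;
            this.quantity = quantity;
        }
    }

    // Made-up seven-row input, deliberately NOT sorted by continent.
    static final List<Row> SAMPLE = List.of(
            new Row("Asia", 5),
            new Row("Europe", 3),
            new Row("Asia", 9),
            new Row("Africa", 4),
            new Row("Europe", 7),
            new Row("Africa", 2),
            new Row("Asia", 1));

    // Keeps one running maximum per continent in a hash map, so the order
    // of the input rows does not matter.
    static Map<String, Integer> maxByContinent(List<Row> rows) {
        Map<String, Integer> result = new HashMap<>();
        for (Row r : rows) {
            result.merge(r.continent, r.quantity, Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so groups may print in any order.
        maxByContinent(SAMPLE).forEach((continent, max) ->
                System.out.println(continent + "|" + max));
    }
}
```

Because the sketch keeps every group in memory at once, it gets the right maxima (Asia 9, Europe 7, Africa 4) from unsorted input, but it makes no promise about the order in which the groups come out.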
We have done the basic setup, so now we can execute the job and get the output. No problem so far; the question was easy. tAggregateSortedRow is more complicated, however, because its official description says:
tAggregateSortedRow: tAggregateSortedRow receives a sorted flow and aggregates it based on one or more columns. Each output line provides the aggregation key and the result of the relevant set operations (min, max, sum…).
Let's see how it behaves with our example job.
Add another subjob with the same input and output, replacing tAggregateRow with tAggregateSortedRow. Use the same settings we used for tAggregateRow, except that we also set “Input Rows Count”=7 (we have only seven rows).
But the outputs are different; see the image below showing both outputs.
The outputs differ because the flow reaching tAggregateSortedRow is not sorted. This gives us our first difference:
tAggregateSortedRow works correctly only on sorted rows, whereas tAggregateRow performs the same operation without the rows being sorted.
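One plausible way to see why an unsorted flow breaks a sorted aggregator: if a component assumes sorted input, it only needs to keep the current group in memory and can emit that group as soon as the key changes. The Java sketch below is a hypothetical model of that idea, not Talend's implementation; `StreamingAggregateDemo` and the sample rows (the same made-up unsorted rows as before) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an aggregator that ASSUMES sorted input: it keeps a
// single running group and emits it whenever the key changes.
public class StreamingAggregateDemo {

    static final class Row {
        final String continent;
        final int quantity;

        Row(String continent, int quantity) {
            this.continent = continent;
            this.quantity = quantity;
        }
    }

    // Made-up seven-row input, NOT sorted by continent.
    static final List<Row> SAMPLE = List.of(
            new Row("Asia", 5),
            new Row("Europe", 3),
            new Row("Asia", 9),
            new Row("Africa", 4),
            new Row("Europe", 7),
            new Row("Africa", 2),
            new Row("Asia", 1));

    // Returns "continent|max" lines in the order they are emitted.
    static List<String> aggregateSorted(List<Row> rows) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int currentMax = 0;
        for (Row r : rows) {
            if (currentKey != null && !currentKey.equals(r.continent)) {
                out.add(currentKey + "|" + currentMax); // key changed: emit the previous group
                currentKey = null;
            }
            if (currentKey == null) {
                currentKey = r.continent; // start a new group
                currentMax = r.quantity;
            } else {
                currentMax = Math.max(currentMax, r.quantity);
            }
        }
        if (currentKey != null) {
            out.add(currentKey + "|" + currentMax); // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Asia|5, Europe|3, Asia|9, Africa|4, Europe|7, Africa|2, Asia|1]
        System.out.println(aggregateSorted(SAMPLE));
    }
}
```

Because the continent changes on every row of this made-up input, each continent is emitted several times with a partial maximum instead of once with the true maximum. This model is only meant to illustrate the mechanism; it is not a reproduction of the output in the screenshot.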
Step 2: Add tSortRow to the tAggregateSortedRow flow.
- Add a tSortRow component after the input and connect it between the input and tAggregateSortedRow using Main flows.
- Configure the tSortRow component as follows.
- Sync the columns using the Sync columns button.
- In the “Criteria” table, add one row with:
- Schema column=Continent
- sort num or alpha?=alpha
- Order asc or desc?=asc
- Now execute the same job; you will get the output below.
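Continuing the same hypothetical model, sorting the rows by continent first (alphabetical, ascending, as configured in tSortRow above) places each continent's rows next to each other. The key-change aggregator then emits exactly one line per continent, and those lines come out in sorted order. `SortedAggregateDemo` and the sample rows are, again, assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of tSortRow (Continent, alpha, asc) followed by an
// aggregator that assumes sorted input. Not Talend's generated code.
public class SortedAggregateDemo {

    static final class Row {
        final String continent;
        final int quantity;

        Row(String continent, int quantity) {
            this.continent = continent;
            this.quantity = quantity;
        }
    }

    // Made-up seven-row input, NOT sorted by continent.
    static final List<Row> SAMPLE = List.of(
            new Row("Asia", 5), new Row("Europe", 3), new Row("Asia", 9),
            new Row("Africa", 4), new Row("Europe", 7), new Row("Africa", 2),
            new Row("Asia", 1));

    // Plays the role of tSortRow: alphabetical, ascending on Continent.
    static List<Row> sortByContinent(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.continent));
        return sorted;
    }

    // Emits a group each time the key changes, plus the last group at the end.
    static List<String> aggregateSorted(List<Row> rows) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int currentMax = 0;
        for (Row r : rows) {
            if (currentKey != null && !currentKey.equals(r.continent)) {
                out.add(currentKey + "|" + currentMax);
                currentKey = null;
            }
            if (currentKey == null) {
                currentKey = r.continent;
                currentMax = r.quantity;
            } else {
                currentMax = Math.max(currentMax, r.quantity);
            }
        }
        if (currentKey != null) {
            out.add(currentKey + "|" + currentMax);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints Africa|4, Asia|9 and Europe|7, one per line, in that order.
        for (String line : aggregateSorted(sortByContinent(SAMPLE))) {
            System.out.println(line);
        }
    }
}
```

The maxima now agree with the hash-based sketch (Africa 4, Asia 9, Europe 7), and the output order follows the sort order of the input.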
Now the results match, but the rows appear in a different order.
This gives us our second difference:
tAggregateRow does not sort its result, whereas tAggregateSortedRow works on a sorted flow, which is why it produces its result in sorted order.
This is the final job design used for the demonstration.
We will now use the same job for further demonstration.
Step 3: Modify the tAggregateSortedRow settings.
- We are working with a fixed flow input, so we know how many rows are in the input flow. Change the setting to:
- “Input Rows Count”=0
- Execute the job; you will get the output below.
This gives us another difference:
tAggregateRow does not depend on the input row count, so we can use it without knowing how many rows it will receive, whereas tAggregateSortedRow requires the input row count in advance.
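Why would a sorted aggregator need a row count at all? One possible explanation, offered as an assumption rather than a description of Talend's generated code: if rows are pushed to the component one at a time and it is never told that the flow has ended, it needs the expected number of rows to know when to emit its last group. The Java model below is hypothetical; `RowCountAggregateDemo`, `accept`, `run` and `expectedRows` (a stand-in for “Input Rows Count”) are illustrative names, and the rows are made up and already sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical push-style model: rows arrive one at a time via accept() and
// the aggregator is never told "end of flow", so it relies on an expected
// row count to emit the final group. An assumption, not Talend's code.
public class RowCountAggregateDemo {

    private final int expectedRows; // stand-in for "Input Rows Count"
    private final List<String> out = new ArrayList<>();
    private int rowsSeen = 0;
    private String currentKey = null;
    private int currentMax = 0;

    RowCountAggregateDemo(int expectedRows) {
        this.expectedRows = expectedRows;
    }

    // Called once per incoming row; rows are assumed sorted by continent.
    void accept(String continent, int quantity) {
        rowsSeen++;
        if (currentKey != null && !currentKey.equals(continent)) {
            out.add(currentKey + "|" + currentMax); // key changed: emit previous group
            currentKey = null;
        }
        if (currentKey == null) {
            currentKey = continent;
            currentMax = quantity;
        } else {
            currentMax = Math.max(currentMax, quantity);
        }
        if (rowsSeen == expectedRows) {
            out.add(currentKey + "|" + currentMax); // count reached: emit final group
            currentKey = null;
        }
    }

    List<String> output() {
        return out;
    }

    // Feeds seven made-up, already-sorted rows through the model.
    static List<String> run(int expectedRows) {
        RowCountAggregateDemo agg = new RowCountAggregateDemo(expectedRows);
        String[] continents = {"Africa", "Africa", "Asia", "Asia", "Asia", "Europe", "Europe"};
        int[] quantities = {4, 2, 5, 9, 1, 3, 7};
        for (int i = 0; i < continents.length; i++) {
            agg.accept(continents[i], quantities[i]);
        }
        return agg.output();
    }

    public static void main(String[] args) {
        System.out.println(run(7)); // [Africa|4, Asia|9, Europe|7]
        System.out.println(run(0)); // [Africa|4, Asia|9]
    }
}
```

In this model, the correct count (7) yields all three groups, while a count of 0 is never reached, so the final group is never emitted. Compare this against the output your own job produces rather than treating the model as Talend's exact behavior.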
Apart from the differences above, I did not see any major differences between these components; they behave similarly.