Talend is a great code generator with 200+ connectors, which gives you the ability to move and transform data from one system to another. Talend suits mid-size organisations where you process a few MBs of data rather than GBs or TBs, because it lacks strong parallel processing, a generic schema load model, and batch-processing features. There are components and features that Talend claims provide parallel and batch processing, but they fall short at a certain scale. Anyway, we are not here to discuss Talend itself; instead we will discuss how to automate Talend job creation, rather than hand-building hundreds of jobs for hundreds of metadata definitions.
I am an ETL developer, and I was assigned the task of creating one such job: a generic data loader where the metadata is stored in SQL database tables. The job reads these tables to build the schema, apply transformations, and then load the data into SQL. In short: a dynamic schema in Talend.
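To make the metadata-driven idea concrete, here is a minimal Python sketch of reading a schema definition out of a metadata table. The table and column names (`load_metadata`, `target_table`, `col_name`, `col_type`, `col_order`) are hypothetical placeholders for illustration, not from any real system:

```python
import sqlite3

def fetch_schema(conn, table_name):
    """Read the column metadata for one target table, in load order."""
    rows = conn.execute(
        "SELECT col_name, col_type FROM load_metadata "
        "WHERE target_table = ? ORDER BY col_order",
        (table_name,),
    ).fetchall()
    return [{"name": name, "type": ctype} for name, ctype in rows]

# Demo with an in-memory metadata store standing in for the SQL database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE load_metadata "
    "(target_table TEXT, col_name TEXT, col_type TEXT, col_order INT)"
)
conn.executemany(
    "INSERT INTO load_metadata VALUES (?, ?, ?, ?)",
    [("customers", "id", "INTEGER", 1),
     ("customers", "name", "VARCHAR(50)", 2)],
)
print(fetch_schema(conn, "customers"))
# → [{'name': 'id', 'type': 'INTEGER'}, {'name': 'name', 'type': 'VARCHAR(50)'}]
```

A generator job (or script) would loop over every distinct `target_table` in such a store and emit one loader per table, instead of hand-building hundreds of jobs.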
I thought it was a great idea, and since Talend has a Dynamic Schema feature, it could be done in a few days. But once I started working on it, it became a nightmare, so I finally dropped the Dynamic Schema approach for the following reasons:
- The Reject connector will not work.
- You cannot apply custom transformations during the load.
- You have to apply transformations using SQL instead.
- Your file must have a header row.
- All fields are loaded with the String data type.
- You have to change data types on the SQL side.
- There is no escape character support.
- The SQL table must exist before the load starts.
- Log management will not work.
I reached out to Talend through the Help Center and the Talend Forge forums, and after a long search found a solution: not Dynamic Schema, but dynamic job creation using JobScript. Yes, JobScript. It is a JSON-like structure with nodes and child nodes, properties with values, components and settings, connections, contexts, routines, and much more. Everything in a job can be defined using JobScript.
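As a quick taste of what that structure looks like, here is a rough, heavily simplified sketch of a JobScript fragment: a component definition and a flow connection between two components. Exact keywords and required properties vary between Talend Studio versions, so treat this as illustrative only and compare it against a script exported from an existing job in your own Studio:

```
AddComponent {
	COMPONENT_DEFINITION {
		TYPE: "tFileInputDelimited",
		NAME: "tFileInputDelimited_1",
		POSITION: 128, 224
	}
}

AddConnection {
	TYPE: "FLOW",
	NAME: "row1",
	SOURCE: "tFileInputDelimited_1",
	TARGET: "tDBOutput_1"
}
```

Because this is plain text, a program can generate one such script per metadata definition and let Talend build the jobs from them.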
In the next post I will explain what JobScript is exactly, its basic structure, and the basics you need to create one.