tHTMLInput is our first Talend component. It lets you parse an HTML document from a URL.
Many components claim to parse HTML, but none of them provides CSS or jQuery-style custom selectors.
The tHTMLInput component lets you supply custom selectors and returns the results in tabular form, following the schema defined in the component.
It iterates over each element matched by the selector entered in the Parent Element text box, then extracts the information listed under that parent.
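To make the parent-then-children idea concrete, here is a minimal sketch in plain Python. It is not the component's code: the component accepts CSS/jQuery-style selectors, while this sketch swaps in the limited XPath subset of Python's standard `xml.etree.ElementTree`, so it only works on well-formed markup. The sample page, the `PARENT` selector, the `COLUMNS` mapping, and the `extract_rows` helper are all hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption, not real component input).
HTML = """
<html><body>
  <div class="post"><h2>First post</h2><span class="author">Ann</span></div>
  <div class="post"><h2>Second post</h2><span class="author">Bob</span></div>
</body></html>
"""

# Plays the role of the Parent Element text box (assumed selector).
PARENT = ".//div[@class='post']"

# Plays the role of the schema: column name -> selector relative to the parent.
COLUMNS = {
    "title": "h2",
    "author": "span[@class='author']",
}


def extract_rows(html, parent, columns):
    """Return one dict per parent element, with one key per schema column."""
    root = ET.fromstring(html.strip())
    rows = []
    for node in root.iterfind(parent):          # iterate over each parent match
        row = {}
        for name, selector in columns.items():  # look up each column under it
            child = node.find(selector)
            row[name] = child.text if child is not None else None
        rows.append(row)
    return rows


rows = extract_rows(HTML, PARENT, COLUMNS)
for row in rows:
    print(row)
```

Each parent match becomes one output row, and a column whose selector finds nothing is left empty rather than dropping the row. That is one reasonable choice for a sketch; the component's actual behaviour may differ.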
- Provides output in tabular format.
- You can customise selectors.
- Elements can be added, deleted, and edited through the UI table.
- Text extraction.
- Element/tag extraction.
- Link extraction.
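The three extraction modes in the list above can be illustrated with a short sketch. It uses Python's standard `xml.etree.ElementTree` as a stand-in, not the component's own selector engine, and the snippet and variable names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment used only for illustration.
SNIPPET = '<div class="item"><a href="https://example.com/a">Read <b>more</b></a></div>'

item = ET.fromstring(SNIPPET)
link = item.find("a")

# Text extraction: the visible text of the element, nested tags flattened.
text = "".join(link.itertext())

# Element/tag extraction: the element itself, serialised with its markup.
markup = ET.tostring(link, encoding="unicode")

# Link extraction: the value of the href attribute.
href = link.get("href")

print(text)
print(markup)
print(href)
```

The same matched element yields three different values depending on what is requested. This mirrors the choice you would make per column in the component, though how the component exposes that choice is not shown here.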
In the next version we will add the ability to parse HTML from files and from a String column.
tHTMLInput is listed on Talend Exchange, and you can download it from there. If you face any problem with the component, please refer to the official documentation page for more details.
Example Use case:
We have added a post that describes how to use the tHTMLInput component in Talend and shows how to write jsoup syntax to extract the required text from an HTML page.
Follow the "parse html in talend" post, and let us know if you face any problem.
For issues or help, use the contact us page and include a detailed description of the problem, which helps us understand it better. You may also post the issue on the Talend forum at https://www.talendforge.org/forum/. If you are still facing the issue, use the comment box below to send us your query and we will respond as soon as possible.