Back to Integrations
integration integration
integration

Integrate HTML Extract with 500+ apps and services

Unlock HTML Extract’s full potential with n8n, connecting it to similar Core Nodes apps and over 1000 other services. Create adaptable and scalable workflows between HTML Extract and your stack. All within a building experience you will love.

The HTML Extract integrations are replaced by the HTML integrations

The HTML node replaces the HTML Extract node from version 0.199.0 onwards. Check out the HTML node!

Popular ways to use HTML Extract integration

HTTP Request node

Extract post titles from a blog

This workflow uses n8n to extract the names of all the posts from the Hackernoon homepage.
sm-amudhan
amudhan

Get only new RSS with photo

At the end, add the service you need, for example Telegram ++You can only see the result when you run workflow.++ *Based on these answers: Latest RSS Feed -> Rocket.Chat for get only new post Rss to Twitter with Image for get image*
thevllad
Vlad Knyzhnyk
HTTP Request node
Merge node
+2

Parse Ycombinator news page

Extract data from a webpage (Ycombinator news page) and create a nice list using itemList node. It seems that current version in n8n (0.141.1) requires to extract each variable one by one. Hopefully in a futute it will be possible to create the table using just one itemList node. Another nice feature of the workflow is an automatically generated file name with the resulting table. Check out the "fileName" option of the Spreadsheet File node: "Ycombinator_news_{{new Date().toISOString().split('T', 1)[0]}}.{{$parameter[\"fileFormat\"]}}" The resulting table is saved as .xls file and delivered via email
eduard
Eduard
HTTP Request node
+6

Track changes of product prices

This workflow automatically tracks changes on specific websites, typically in e-commerce where you want to get information about price changes. Prerequisites Basic knowledge of HTML and JavaScript Nodes Execute Command nodes create a file named kopacky.json in the /data/ folder (Make sure that n8n has the permissions to make changes to the folder in your setup) and clean data. Cron node triggers the workflow at regular intervals (default is 15 minutes), depending on how often you want to crawl URLs of your watchers. Function Item node (Change me) adds the URL watchers. You can put as many URLs (watchers) as you want by changing the JavaScript code in the node. There are four properties for each watcher: |Property|Meaning| |-|-| |slug|Unique identifier for the watcher.| |link|URL of the website where you want to track changes.| |selector|CSS selector of the HTML tag, where your price is placed. You can use browser web tools to get a specific selector.| |currency|Currency code in which your price is set.| Function Item node (Init item) saves all required data from each watcher to the kopacky.json file. HTTP Request node fetches data from the website. HTML Extract node extracts the required information from the webpage. Send Email nodes (NotifyBetterPrice) send you an email when there is an issue with getting the price, and when a better price is available (this could happen if the website is down, your tracking product is not available anymore, or the owner of the website changed the selector or HTML). IF nodes filter the incoming data and route the workflow. Move Binary Data nodes convert the JSON file to binary data. Write Binary File nodes write the product prices in the file. NOTE: This is the first (beta) version of this workflow, so it could have some issues. For example, there is an issue with getting content of those websites, where the owner of the website blocks any calls from unknown foreign services - it's typical protection against crawlers.
stehos
sthosstudio
HTTP Request node
+8

Scrape and store data from multiple website pages

This workflow allows extracting data from multiple pages website. The workflow: 1) Starts in a country list at https://www.theswiftcodes.com/browse-by-country/. 2) Loads every country page (https://www.theswiftcodes.com/albania/) 3) Paginates every page in the country page. 4) Extracts data from the country page. 5) Saves data to MongoDB. 6) Paginates through all pages in all countries. It uses getWorkflowStaticData('global') method to recover the next page (saved from the previous page), and it goes ahead with all the pages. There is a first section where the countries list is recovered and extracted. Later, I try to read if a local cache page is available and I recover the cached page from the disk. Finally, I save data to MongoDB, and we paginate all the pages in the country and for all the countries. I have applied a cache system to save a visited page to n8n local disk. If I relaunch workflow, we check if a cache file exists to discard non-required requests to the webpage. If the data present in the website changes, you can apply a Cron node to check the website once per week. Finally, before inserting data in MongoDB, the best way to avoid duplicates is to check that swift_code (the primary value of the collection) doesn't exist. I recommend using a proxy for all requests to avoid IP blocks. A good solution for proxy plus IP rotation is scrapoxy.io. This workflow is perfect for small data requirements. If you need to scrape dynamic data, you can use a Headless browser or any other service. If you want to scrape huge lists of URIs, I recommend using Scrapy + Scrapoxy.
mcolomer
Miquel Colomer
HTTP Request node

Send trending "Show HN" to email

Triggers every day at 1pm Gets the current content from Hacker News Gets all the different submission items Extracts the rank, title and url Checks if it is a "Show HN" submission Combines the items into a simple email text Sends an email with the email text
jan
Jan Oberhauser

Over 3000 companies switch to n8n every single week

Connect HTML Extract with your company’s tech stack and create automation workflows

Last week I automated much of the back office work for a small design studio in less than 8hrs and I am still mind-blown about it.

n8n is a game-changer and should be known by all SMBs and even enterprise companies.

We're using the @n8n_io cloud for our internal automation tasks since the beta started. It's awesome! Also, support is super fast and always helpful. 🤗

in other news I installed @n8n_io tonight and holy moly it’s good

it’s compatible with EVERYTHING

Implement complex processes faster with n8n

red icon yellow icon red icon yellow icon