
Integrate HTML Extract with 500+ apps and services

Unlock HTML Extract’s full potential with n8n by connecting it to other Core Nodes apps and over 1000 more services. Create adaptable, scalable workflows between HTML Extract and your stack, all within a building experience you will love.

The HTML Extract integration has been replaced by the HTML integration

The HTML node replaces the HTML Extract node from version 0.199.0 onwards. Check out the HTML node!

Popular ways to use the HTML Extract integration

HTTP Request node

Send trending "Show HN" to email

Triggers every day at 1 pm, gets the current content from Hacker News, gets all the different submission items, extracts the rank, title and URL, checks whether each item is a "Show HN" submission, combines the items into a simple email text, and sends an email with that text.
Jan Oberhauser
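The "check for Show HN" and "combine into email text" steps of the workflow above can be sketched in plain JavaScript. This is a minimal sketch, not the workflow's actual code: it assumes each extracted item is an object with hypothetical rank, title and url fields, and treats a submission as "Show HN" when its title starts with "Show HN".

```javascript
// Sketch of the filter and email-text steps.
// Assumption: each extracted item looks like { rank, title, url };
// these field names are hypothetical.

/** Keep only submissions whose title starts with "Show HN". */
function filterShowHN(items) {
  return items.filter((item) => item.title.startsWith("Show HN"));
}

/** Combine the remaining items into one plain-text email body. */
function buildEmailText(items) {
  return items
    .map((item) => `${item.rank} ${item.title}\n${item.url}`)
    .join("\n\n");
}

const extracted = [
  { rank: "1.", title: "Show HN: A tiny web scraper", url: "https://example.com/scraper" },
  { rank: "2.", title: "Ask HN: How do you back up photos?", url: "https://example.com/ask" },
  { rank: "3.", title: "Show HN: My weekend project", url: "https://example.com/weekend" },
];

const body = buildEmailText(filterShowHN(extracted));
console.log(body);
```

Inside an n8n Function node, the items would arrive wrapped as objects with a json property, so the same logic would run on items.map((i) => i.json).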
HTTP Request node

Extract post titles from a blog

This workflow uses n8n to extract the names of all the posts from the Hackernoon homepage.
sm-amudhan
amudhan
HTTP Request node
Merge node
+2 more nodes

Parse Ycombinator news page

Extract data from a webpage (the Ycombinator news page) and create a clean list using the Item Lists node. The current version of n8n (0.141.1) seems to require extracting each variable one by one; hopefully, a future version will make it possible to create the table with a single Item Lists node. Another nice feature of this workflow is the automatically generated file name for the resulting table. Check out the "fileName" option of the Spreadsheet File node:
Ycombinator_news_{{new Date().toISOString().split('T', 1)[0]}}.{{$parameter["fileFormat"]}}
The resulting table is saved as an .xls file and delivered via email.
Eduard
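To see what the "fileName" expression in the Ycombinator workflow produces, it can be mirrored as a plain JavaScript function. This is a sketch for checking the naming scheme outside n8n; the function name buildFileName is hypothetical, and the fileFormat argument stands in for $parameter["fileFormat"].

```javascript
// Mirrors the n8n expression
//   Ycombinator_news_{{new Date().toISOString().split('T', 1)[0]}}.{{$parameter["fileFormat"]}}
// fileFormat stands in for $parameter["fileFormat"] (e.g. "xls").
function buildFileName(date, fileFormat) {
  // toISOString() always returns UTC, e.g. "2021-10-05T12:00:00.000Z".
  // split("T", 1) keeps only the part before the first "T": the YYYY-MM-DD date.
  const day = date.toISOString().split("T", 1)[0];
  return `Ycombinator_news_${day}.${fileFormat}`;
}

console.log(buildFileName(new Date("2021-10-05T12:00:00Z"), "xls"));
// → Ycombinator_news_2021-10-05.xls
```

Because toISOString() works in UTC, a run close to midnight can produce a date in the file name that differs from the local calendar date.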
HTTP Request node
+8 more nodes

Scrape and store data from multiple website pages

This workflow extracts data from a website that spans multiple pages. The workflow:
1) Starts at the country list at https://www.theswiftcodes.com/browse-by-country/.
2) Loads every country page (for example https://www.theswiftcodes.com/albania/).
3) Paginates through every page of each country.
4) Extracts data from each country page.
5) Saves the data to MongoDB.
6) Repeats the pagination for all pages in all countries.

It uses the getWorkflowStaticData('global') method to recover the next page (saved while processing the previous page) and continues until all pages are processed.

The first section retrieves and extracts the country list. Next, the workflow checks whether a cached copy of the page is available locally and, if so, reads it from disk. Finally, it saves the data to MongoDB and paginates through all pages of every country.

I have added a cache system that saves each visited page to the n8n local disk. When the workflow is relaunched, it checks whether a cache file exists so that unnecessary requests to the website are skipped. If the data on the website changes, you can use a Cron node to check the website once a week.

Before inserting data into MongoDB, the best way to avoid duplicates is to check that the swift_code (the primary value of the collection) doesn't already exist. I recommend using a proxy for all requests to avoid IP blocks; a good solution for proxy plus IP rotation is scrapoxy.io.

This workflow is well suited to small data requirements. If you need to scrape dynamic data, you can use a headless browser or another service. If you want to scrape huge lists of URIs, I recommend using Scrapy + Scrapoxy.
mcolomer
Miquel Colomer
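The duplicate check recommended in the SWIFT codes workflow can be sketched as a filter on swift_code. This is a minimal sketch under stated assumptions: the swift_code values already in the MongoDB collection are assumed to have been loaded into a list beforehand (the query is not shown), and the bank field is hypothetical. A Set gives fast lookups and also drops duplicates that appear within the same scraped batch.

```javascript
// Keep only records whose swift_code is not already stored.
// Assumption: existingCodes holds the swift_code values already in the
// MongoDB collection (loaded by a separate query, not shown here).
// The "bank" field is hypothetical.
function filterNewRecords(records, existingCodes) {
  const seen = new Set(existingCodes);
  const fresh = [];
  for (const record of records) {
    // Skip codes that are already stored or repeated earlier in this batch.
    if (seen.has(record.swift_code)) continue;
    seen.add(record.swift_code);
    fresh.push(record);
  }
  return fresh;
}

const scraped = [
  { swift_code: "AAAAALTX", bank: "Bank A" },
  { swift_code: "BBBBALTX", bank: "Bank B" },
  { swift_code: "AAAAALTX", bank: "Bank A" },
];
const newRecords = filterNewRecords(scraped, ["BBBBALTX"]);
console.log(newRecords.map((r) => r.swift_code)); // [ 'AAAAALTX' ]
```

A unique index on swift_code in MongoDB would enforce the same rule at the database level, which protects against duplicates even if two runs overlap.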

Get only new RSS with photo

At the end, add the service you need, for example Telegram. You can only see the result when you run the workflow. Based on these answers: "Latest RSS Feed -> Rocket.Chat" for getting only new posts, and "RSS to Twitter with Image" for getting the image.
thevllad
Vlad Knyzhnyk
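The two ideas combined in the RSS workflow, keeping only new posts and picking up each post's photo, can be sketched as follows. This is a sketch under assumptions, not the workflow's actual logic: items are assumed to carry hypothetical guid, title and HTML content fields, and the already-seen GUIDs are assumed to be persisted between runs (workflow static data is one place n8n can keep them). The regular expression is a swapped-in, rough substitute for a real HTML parser such as the HTML Extract node.

```javascript
// Assumptions (hypothetical): items have guid, title and HTML content
// fields, and seenGuids is persisted between workflow runs.

/** Return the items whose guid has not been seen before. */
function selectNewItems(items, seenGuids) {
  return items.filter((item) => !seenGuids.has(item.guid));
}

/**
 * Return the src of the first <img> tag in the HTML, or null.
 * A regular expression only handles simple markup; it is a rough
 * stand-in for a real HTML parser.
 */
function firstImageUrl(html) {
  const match = /<img[^>]+src=["']([^"']+)["']/i.exec(html || "");
  return match ? match[1] : null;
}

const seenGuids = new Set(["post-1"]);
const feed = [
  { guid: "post-1", title: "Old post", content: "<p>Already sent</p>" },
  {
    guid: "post-2",
    title: "New post",
    content: '<p><img src="https://example.com/photo.jpg" alt=""></p>',
  },
];

const fresh = selectNewItems(feed, seenGuids).map((item) => ({
  guid: item.guid,
  title: item.title,
  image: firstImageUrl(item.content),
}));

// Remember what was just processed so the next run skips it.
for (const item of fresh) seenGuids.add(item.guid);

console.log(fresh);
```

The resulting items (title plus image URL) are what the final service node, for example Telegram, would receive.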
HTTP Request node
Webhook node
Notion node

Add articles to a Notion list by accessing a Discord slash command

This workflow allows you to add articles to a Notion reading list by using a Discord slash command.

Prerequisites: a Notion account and credentials, and a reading list similar to this template; a Discord account and credentials, and a Discord slash command connected to n8n.

Nodes:
Webhook node triggers the workflow whenever the Discord slash command is issued.
IF node checks the type returned by Discord. If the type is not equal to 1, it returns true; otherwise, it returns false.
HTTP Request node makes an HTTP call to the link and gets the HTML of the webpage.
HTML Extract node extracts the title from the HTML, which the next node uses.
Notion node adds the link to your Notion reading list.
Set nodes set the reply values for Discord and register the Interaction Endpoint URL.
harshil1712
ghagrawal17
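The branching in the Discord workflow (the IF node on the interaction type, and the Set nodes building the reply) can be sketched in plain JavaScript. The type values come from the Discord interactions API: an incoming type 1 is a PING, answered with type 1 (PONG), and a slash command can be answered with type 4 (CHANNEL_MESSAGE_WITH_SOURCE). Everything else here is an assumption: the function names are hypothetical, the article link is assumed to be the command's first option, the regex title extraction is a rough stand-in for the HTML Extract node, and the request signature verification Discord requires is not shown.

```javascript
// Discord interaction types (Discord API):
//   incoming type 1 = PING, incoming type 2 = APPLICATION_COMMAND
//   reply type 1 = PONG, reply type 4 = CHANNEL_MESSAGE_WITH_SOURCE
function buildDiscordReply(interaction, savedTitle) {
  if (interaction.type === 1) {
    // Discord pings the endpoint when the Interactions Endpoint URL is registered.
    return { type: 1 };
  }
  return {
    type: 4,
    data: { content: `Added "${savedTitle}" to your reading list.` },
  };
}

// Hypothetical: the slash command's first option carries the article link.
function getArticleLink(interaction) {
  const options = (interaction.data && interaction.data.options) || [];
  return options.length > 0 ? options[0].value : null;
}

// Rough <title> extraction standing in for the HTML Extract node.
function extractTitle(html) {
  const match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

console.log(buildDiscordReply({ type: 1 })); // { type: 1 }

const command = {
  type: 2,
  data: { name: "add", options: [{ name: "url", value: "https://example.com/post" }] },
};
const link = getArticleLink(command);
const title = extractTitle("<html><head><title> My Post </title></head></html>");
const reply = buildDiscordReply(command, title);
console.log(link, JSON.stringify(reply));
```

Answering the PING first mirrors the IF node: only interactions whose type is not 1 continue to the HTTP Request, HTML Extract and Notion steps.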

Over 3000 companies switch to n8n every single week

Connect HTML Extract with your company’s tech stack and create automation workflows

Last week I automated much of the back office work for a small design studio in less than 8hrs and I am still mind-blown about it.

n8n is a game-changer and should be known by all SMBs and even enterprise companies.

We're using the @n8n_io cloud for our internal automation tasks since the beta started. It's awesome! Also, support is super fast and always helpful. 🤗

in other news I installed @n8n_io tonight and holy moly it’s good

it’s compatible with EVERYTHING

Need help setting up your HTML Extract integration?

Discover the latest recommendations from our community and join the discussions about the HTML Extract integration.
Dan Burykin
Jace Byers

Implement complex processes faster with n8n
