When working with n8n and AI workflows, it is critical to retrieve parseable data through established means. Not everything has to be AI. I wanted a simple way to extract an article from a web page so that I could pass the content to an LLM without it getting confused by marketing copy and unrelated links.
The Challenge #
I needed a clean way to get just the article content from websites. Luckily, Mozilla offers such a project, and there are a couple of other n8n node projects that do similar things. But at the time, none of those projects offered any guarantee that the published package was actually built from its source repo.
The Solution: Article Extract Node with Provenance #
This is why I decided to build my own: n8n-nodes-article-extract. It’s super basic. Not complicated in any way. But it does introduce the concept of provenance, which gives users the assurance of knowing what they are downloading without having to read the files in the package.
The node is simple by design:
- Does one thing well - extracts clean article content
- Easy to understand and use
- Integrates seamlessly with n8n workflows
Why Provenance Matters #
Without provenance, verifying that the published package matches the repo is the package consumer’s responsibility. You’d have to:
- Hope the package author is trustworthy
- Download and inspect the package code yourself
- Compare it manually to the repo code
- Just trust that nothing sneaky happened in the build process
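To make the manual comparison step concrete, here is a minimal sketch of what "compare it to the repo code" amounts to: hash every file on both sides and list the paths that differ. The `FileTree` type and the `diffTrees` helper are hypothetical names made up for illustration, and the file trees are plain in-memory objects rather than a real unpacked package.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory file tree: relative path -> file contents.
type FileTree = Record<string, string>;

// Hex-encoded SHA-256 of a file's contents.
function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns the sorted paths that are missing on one side or whose contents differ.
function diffTrees(published: FileTree, built: FileTree): string[] {
  const paths = Object.keys({ ...published, ...built });
  return paths
    .filter(
      (p) =>
        !(p in published) ||
        !(p in built) ||
        sha256(published[p]) !== sha256(built[p]),
    )
    .sort();
}
```

An empty result means the two trees match. Even then, you would still have to trust that your local build reproduces the author's build exactly, which is the gap provenance is meant to close.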
I wanted to make this easier and more secure. With provenance, you get cryptographic proof that what you’re downloading was actually built from the source code you can see. Without provenance, you’re just trusting that the person publishing the package is being honest.
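As a rough illustration of one piece of that proof, the sketch below checks that a downloaded tarball's SHA-512 digest matches a digest recorded in a provenance statement. The `ProvenanceStatement` shape is a simplified assumption modeled on in-toto style statements, and `tarballMatchesStatement` is a hypothetical helper, not part of npm or of this node.

```typescript
import { createHash } from "node:crypto";

// Simplified, assumed shape of a provenance statement: each subject
// names an artifact and records its SHA-512 digest as a hex string.
interface StatementSubject {
  name: string;
  digest: { sha512: string };
}

interface ProvenanceStatement {
  subject: StatementSubject[];
}

// True if the tarball bytes hash to a digest listed in the statement.
function tarballMatchesStatement(
  tarball: Uint8Array,
  statement: ProvenanceStatement,
): boolean {
  const actual = createHash("sha512").update(tarball).digest("hex");
  return statement.subject.some((s) => s.digest.sha512 === actual);
}
```

This only covers the digest comparison. A real check also has to verify the signature on the statement itself, which is what tooling such as `npm audit signatures` is designed to handle for you.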
How It Works #
I built the node on top of Mozilla’s Readability library - the same one that powers Firefox’s Reader View. It’s battle-tested and handles most websites pretty well.
The cool part is the provenance setup. When I publish a new version, the entire build process is verified and signed. You can check out the GitHub repo to see exactly how it works, or even contribute if you want to make it better!