The hard part of web scraping is extracting useful structured data (JSON) from downloaded HTML. It is tedious, it requires mastering CSS selectors, and when a target website updates its layout, the work has to be redone. Cheerio.js, a server-side JavaScript library, is one of the most popular tools for this task. ScrapeNinja is a web scraping API that uses Cheerio.js under the hood to extract useful data.
The tool on this page uses the latest advances in 🤖 agentic AI to generate JavaScript code for a Cheerio.js extractor with an LLM (large language model). The best part is that it evaluates the quality of the generated extractor and iterates on it until the result is good enough.
The generated extractor code can later be used in ScrapeNinja API calls and the ScrapeNinja sandbox to extract structured data from similar webpages.
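For illustration, a generated extractor might look roughly like the sketch below. It assumes a hypothetical product page: the selectors h1.product-title and .price and the returned field names are made up for this example and are not output of the tool. It also assumes that input is the page HTML as a string and that the cheerio argument exposes the standard cheerio.load() call.

```javascript
// Sketch of an extractor with the extract(input, cheerio) signature.
// Selectors and field names are hypothetical; adapt them to the target page.
function extract(input, cheerio) {
  // Parse the HTML string into a queryable document.
  const $ = cheerio.load(input);

  // Take the first match for each selector and strip surrounding whitespace.
  const title = $('h1.product-title').first().text().trim();
  const priceText = $('.price').first().text().trim();

  // Keep digits and the decimal point only, e.g. "$1,299.00" becomes 1299.
  const price = parseFloat(priceText.replace(/[^0-9.]/g, ''));

  // Return plain JSON-serializable values; use null for anything not found.
  return {
    title: title || null,
    price: Number.isNaN(price) ? null : price,
  };
}
```

Returning null for missing fields, rather than an empty string or NaN, makes it easier to spot a selector that stopped matching after a layout change.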
🧪 This is an experimental tool.
You can now take the generated extractor and test it in the ScrapeNinja Sandbox against multiple similar pages of the same website, then integrate it into your project through the ScrapeNinja API. Open the sandbox and paste the code into the Extractor field:
https://scrapeninja.net/scraper-sandbox
When pasting into the sandbox, remove the line export { extract }; and keep the function extract(input, cheerio) definition. Provide an example HTML in the input, run, and iterate on selectors if needed.
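If you prefer to check an extractor locally before or alongside the sandbox, the multi-page test can be sketched as a small helper. This is a hypothetical helper, not part of ScrapeNinja: checkExtractor, the samples array, and the report shape are all assumptions made for this example. It runs one extractor over several HTML samples and lists the fields that came back empty for each.

```javascript
// Hypothetical helper (not a ScrapeNinja API): run one extractor over
// several HTML samples and report which fields came back empty.
// `samples` is assumed to be an array of { name, html } objects.
function checkExtractor(extract, cheerio, samples) {
  return samples.map(({ name, html }) => {
    try {
      const result = extract(html, cheerio) || {};
      const keys = Object.keys(result);

      // A field counts as empty if it is null, undefined, '' or [].
      const emptyFields = keys.filter((key) => {
        const value = result[key];
        return value === null || value === undefined || value === '' ||
          (Array.isArray(value) && value.length === 0);
      });

      // A sample passes only if it produced fields and none are empty.
      const ok = keys.length > 0 && emptyFields.length === 0;
      return { name, ok, emptyFields };
    } catch (err) {
      // A thrown error usually means a selector chain hit missing markup.
      return { name, ok: false, error: err.message };
    }
  });
}
```

To use it with real pages, pass the Cheerio module (for example, the result of require('cheerio') after installing the cheerio package) as the second argument and load each sample's HTML from saved files. Samples that report empty fields point to selectors that do not generalize across pages.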