Web Scraping & Browser Automation

Headless browsers, scrapers, RPA, and AI-driven web automation.

Headless browsers

★ Playwright — the default. Multi-browser (Chromium, Firefox, WebKit), great auto-wait, locators, traces, codegen, scriptable across Node / Python / Java / .NET.
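Puppeteer — Chrome-first, with Firefox supported via WebDriver BiDi; older but still has a huge install base.
Selenium / WebDriver — mature; the choice when you need a real Selenium Grid or the WebDriver BiDi protocol.
Cypress — better known as a test framework; it can be used for scraping, though it is not designed for it.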
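
As a rough illustration of the Playwright entry above, the sketch below launches headless Chromium, opens a page, and reads text through a locator. The helper name `scrapeTexts` and its parameters are made up for this example, and it assumes the `playwright` package and its browser binaries are installed. It is a minimal starting point, not a hardened scraper.

```typescript
import { chromium } from "playwright";

// Hypothetical helper (not part of Playwright): open `url` and return the
// text of every element matching `selector`.
export async function scrapeTexts(url: string, selector: string): Promise<string[]> {
  const browser = await chromium.launch(); // headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });

    const items = page.locator(selector);
    // allTextContents() returns immediately, so wait for the first match
    // explicitly before reading the whole set.
    await items.first().waitFor({ state: "attached" });
    return await items.allTextContents();
  } finally {
    // Always release the browser process, even if navigation or waiting fails.
    await browser.close();
  }
}
```

A call such as `await scrapeTexts("https://example.com", "h1")` would return the heading texts of that page. Swapping `chromium` for `firefox` or `webkit` from the same package should be enough to try another engine.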

HTML parsers

★ Cheerio — jQuery-like server-side HTML parser; fast for static HTML scraping.
node-html-parser — among the fastest pure-JS parsers; less ergonomic than Cheerio.
linkedom — lightweight DOM in Node.
jsdom — full DOM implementation; heavier, more accurate.
Parsel — CSS selector library; works with any DOM.
htmlparser2 — fast streaming parser.
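
For static HTML, a parser alone is often enough. The sketch below uses Cheerio's jQuery-style API to pull links out of an HTML string. The helper `extractLinks` and the `Link` type are illustrative names, not part of Cheerio, and the example assumes the `cheerio` package is installed.

```typescript
import * as cheerio from "cheerio";

export interface Link {
  text: string;
  href: string;
}

// Hypothetical helper: collect links from a static HTML string and resolve
// relative hrefs against `baseUrl`.
export function extractLinks(html: string, baseUrl: string): Link[] {
  const $ = cheerio.load(html);
  const links: Link[] = [];

  $("a[href]").each((_, el) => {
    const href = $(el).attr("href");
    if (!href) return; // skip empty href attributes
    try {
      links.push({
        text: $(el).text().trim(),
        href: new URL(href, baseUrl).toString(),
      });
    } catch {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  });

  return links;
}
```

Because no browser is involved, this only sees markup present in the fetched HTML; content rendered by client-side JavaScript needs one of the headless browsers instead.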

Hosted browsers

★ Browserbase — managed Chrome instances built for AI agents and scraping; TS SDK; generous free tier. A popular recent pick.
Browserless — hosted Puppeteer; pay-per-second.
Steel.dev — open source + hosted; competitor to Browserbase.
Hyperbrowser — newer hosted browser API.
ScrapingBee, ScraperAPI, ZenRows — proxy + render services for anti-bot-heavy sites.
Apify — full scraping platform with marketplace of pre-built actors.

AI browser agents

★ Stagehand (Browserbase) — natural-language browser automation on top of Playwright; one of the most widely used AI scraping frameworks as of 2026.
browser-use — Python; LangChain-flavored browser agent.
WebVoyager — research agent; less commercial.
Skyvern, MultiOn, Adept ACT-1 — commercial browser agents.

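Crawling and extraction

★ Crawlee (Apify) — batteries-included crawling framework on top of Playwright / Puppeteer / Cheerio. Includes request queues, deduplication, proxy rotation, and fingerprinting.
node-crawler — older and simpler; fewer built-in features than Crawlee.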
@mozilla/readability — extracts the main article content from a DOM document (for example, one built with jsdom); pairs with any scraper.
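
The request queue and deduplication that a framework like Crawlee bundles can be pictured with a small in-memory sketch. This is not Crawlee's implementation or API: `normalizeUrl` and `CrawlQueue` are hypothetical names, and the normalization rules are an assumption chosen for illustration. It uses only the built-in `URL` class.

```typescript
// Hypothetical normalization: drop the fragment and a trailing slash on
// non-root paths, so trivially different URLs map to one dedup key.
export function normalizeUrl(raw: string): string {
  const url = new URL(raw); // also lowercases the scheme and host
  url.hash = "";
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

export class CrawlQueue {
  private readonly seen = new Set<string>();
  private readonly pending: string[] = [];

  // Returns true if the URL was new and got queued, false if it was a duplicate.
  enqueue(raw: string): boolean {
    const key = normalizeUrl(raw);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  // FIFO order, which gives a breadth-first crawl when links are enqueued
  // as each page is processed.
  next(): string | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }
}
```

A crawl loop would call `next()`, fetch and parse the page, then `enqueue()` every discovered link. A production framework adds persistence, retries, and concurrency limits on top of this idea.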