Web Dev Tools

Web Scraping & Browser Automation

Headless browsers, scrapers, RPA, and AI-driven web automation.

Headless browsers

  • Playwright — the default. Multi-browser (Chromium, Firefox, WebKit), great auto-wait, locators, traces, codegen, scriptable across Node / Python / Java / .NET.
  • Puppeteer — Chrome-first (Firefox is also supported via WebDriver BiDi); older, but still has a huge install base.
  • Selenium / WebDriver — when you need real Selenium grids or BiDi protocol; mature.
  • Cypress — better known as a test framework, but works for scraping.

HTML parsing (no browser, just HTML)

  • Cheerio — jQuery-like server-side HTML parser; fast for static HTML scraping.
  • node-html-parser — among the fastest pure-JS parsers; less ergonomic than Cheerio.
  • linkedom — lightweight DOM in Node.
  • jsdom — full DOM implementation; heavier, more accurate.
  • Parsel — CSS selector library; works with any DOM.
  • htmlparser2 — fast streaming parser.

Hosted browser automation

  • Browserbase — managed Chrome instances; built for AI agents and scraping; TS SDK; generous free tier. The most popular new pick.
  • Browserless — hosted Puppeteer; pay-per-second.
  • Steel.dev — open source + hosted; competitor to Browserbase.
  • Hyperbrowser — newer hosted browser API.
  • ScrapingBee, ScraperAPI, ZenRows — proxy + render services for anti-bot-heavy sites.
  • Apify — full scraping platform with marketplace of pre-built actors.

AI-driven browser agents

  • Stagehand (Browserbase) — natural-language browser automation on top of Playwright; the most popular AI scraping framework in 2026.
  • browser-use — Python; LangChain-flavored browser agent.
  • WebVoyager — research agent; less commercial.
  • Skyvern, MultiOn, Adept ACT-1 — commercial browser agents.

Crawl orchestration

  • Crawlee (Apify) — batteries-included crawling framework on top of Playwright / Puppeteer / Cheerio. Queues, dedup, proxies, fingerprinting.
  • node-crawler — older.
  • @mozilla/readability — extract main article content from HTML; pair with any scraper.
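
To make "queues, dedup" concrete, here is a minimal sketch of the bookkeeping a crawling framework handles for you. It is not Crawlee's API: normalizeUrl and Frontier are hypothetical names, and the normalization rules (drop the fragment, sort query parameters, keep only http/https) are assumptions a real crawler might tune.

```typescript
/** Canonicalize a URL so trivially different spellings dedupe to one key. */
function normalizeUrl(raw: string, base?: string): string | null {
  try {
    const url = new URL(raw, base);
    // Skip mailto:, javascript:, tel: and other non-web links.
    if (url.protocol !== "http:" && url.protocol !== "https:") return null;
    url.hash = "";            // fragments are never sent to the server
    url.searchParams.sort();  // ?b=2&a=1 and ?a=1&b=2 name the same page
    return url.toString();
  } catch {
    return null;              // unparseable link: ignore it
  }
}

/** FIFO crawl queue that refuses URLs it has already seen. */
class Frontier {
  private queue: string[] = [];
  private seen = new Set<string>();

  /** Enqueue a link (optionally relative to `base`); returns true if it was new. */
  add(raw: string, base?: string): boolean {
    const key = normalizeUrl(raw, base);
    if (key === null || this.seen.has(key)) return false;
    this.seen.add(key);
    this.queue.push(key);
    return true;
  }

  /** Next URL in breadth-first order, or undefined when the queue is drained. */
  next(): string | undefined {
    return this.queue.shift();
  }

  /** Number of URLs still waiting to be fetched. */
  get size(): number {
    return this.queue.length;
  }
}
```

A crawl loop would call next(), fetch the page, and pass every extracted link to add() with the page URL as base. Persistence, retries, and proxy rotation are the parts this sketch leaves out, and they are the main reason to reach for a framework instead.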

Anti-detection

  • Playwright Extra + stealth plugin — bot fingerprint hiding.
  • puppeteer-extra-plugin-stealth — same idea for Puppeteer.
  • Camoufox — privacy-hardened Firefox build for scraping.
  • Residential proxies — Bright Data, Oxylabs, IPRoyal, Decodo (formerly Smartproxy).

Be a good citizen

  • Respect robots.txt — many sites disallow crawling of some or all paths; check before scraping.
  • Rate-limit yourself (pacer, p-limit, etc.); treat each origin gently.
  • Identify your bot in User-Agent (MyBot/1.0 (+contact@example.com)).
  • Handle 429 with exponential backoff.
  • Cache aggressively — don't re-fetch the same page within a session.
  • Some scraping is illegal in some jurisdictions; this isn't legal advice.
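
Several of the habits above (per-origin pacing, an identifying User-Agent, backoff on 429, a session cache) fit in one small wrapper. The following is a sketch under stated assumptions, not a library API: SimpleResponse, Fetcher, backoffMs, and createPoliteFetcher are hypothetical names, the HTTP client is injected so the logic stays independent of any particular one, and Retry-After is assumed to arrive already parsed as a number of seconds.

```typescript
// Minimal response shape (assumption) so the sketch is not tied to one HTTP client.
interface SimpleResponse {
  status: number;
  body: string;
  retryAfterSeconds?: number; // from a Retry-After header, if the server sent one
}
type Fetcher = (url: string, headers: Record<string, string>) => Promise<SimpleResponse>;

const USER_AGENT = "MyBot/1.0 (+contact@example.com)";

/** Exponential backoff: baseMs * 2^attempt, capped at capMs. */
function backoffMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function createPoliteFetcher(
  fetcher: Fetcher,
  opts: { minGapMs?: number; maxRetries?: number; baseMs?: number } = {},
) {
  const { minGapMs = 1000, maxRetries = 4, baseMs = 1000 } = opts;
  const cache = new Map<string, string>();    // url -> body, for this session only
  const nextSlot = new Map<string, number>(); // origin -> earliest allowed request time

  return async function get(url: string): Promise<string> {
    const cached = cache.get(url);
    if (cached !== undefined) return cached;  // never re-fetch within a session

    const origin = new URL(url).origin;
    for (let attempt = 0; ; attempt++) {
      // Reserve the next time slot for this origin before waiting,
      // so concurrent callers queue up instead of firing together.
      const now = Date.now();
      const slot = Math.max(now, nextSlot.get(origin) ?? 0);
      nextSlot.set(origin, slot + minGapMs);
      if (slot > now) await sleep(slot - now);

      const res = await fetcher(url, { "User-Agent": USER_AGENT });
      if (res.status === 429 && attempt < maxRetries) {
        // Prefer the server's hint; otherwise back off exponentially.
        const delay = res.retryAfterSeconds !== undefined
          ? res.retryAfterSeconds * 1000
          : backoffMs(attempt, baseMs);
        await sleep(delay);
        continue;
      }
      if (res.status !== 200) throw new Error(`GET ${url} failed with status ${res.status}`);
      cache.set(url, res.body);
      return res.body;
    }
  };
}
```

In practice the Fetcher would wrap whatever client you already use, for example the built-in fetch in recent Node versions. Adding random jitter to the backoff delay is a common refinement when many workers share a target. The sketch does not check robots.txt; that still has to happen before a URL is requested.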

Pick this if…

  • Default browser automation: Playwright.
  • Static HTML, no JS rendering needed: Cheerio.
  • Hosted browsers / AI agents: Browserbase + Stagehand.
  • Heavy anti-bot target: ScrapingBee or ZenRows + residential proxies.
  • Multi-page crawler: Crawlee.
  • Article extraction: @mozilla/readability.
