Web Dev Tools

OCR & Computer Vision

Reading text from images, document AI, face & object detection.

OCR (text from images)

  • Tesseract.js — Tesseract compiled to WASM. Runs in the browser or Node. Best free option; quality depends heavily on image preprocessing.
  • MiniCPM-OCR / RapidOCR — newer ML-based OCR engines, often more accurate than Tesseract on real-world receipts and handwriting.
  • EasyOCR (Python) — great accuracy; call from Node via subprocess if needed.
  • PaddleOCR — strong on CJK languages.
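
The Tesseract.js entry notes that quality depends heavily on image preprocessing. One common step is binarization: convert to grayscale, then pick a black/white cutoff with Otsu's method. The sketch below is a minimal illustration, not part of any library listed here; the function names `toGrayscale`, `otsuThreshold`, and `binarize` are hypothetical, and the input is assumed to be an RGBA buffer laid out like a canvas `ImageData.data` array.

```typescript
// Convert an RGBA buffer (4 bytes per pixel) to one gray byte per pixel.
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Rec. 601 luma weights; alpha is ignored.
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Otsu's method: choose the threshold that maximizes the
// between-class variance of the dark and light pixel groups.
function otsuThreshold(gray: Uint8Array): number {
  const hist = new Uint32Array(256);
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0;
  let weightBg = 0;
  let bestVariance = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue; // no dark pixels yet
    const weightFg = gray.length - weightBg;
    if (weightFg === 0) break; // every pixel is already in the dark group
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const between = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (between > bestVariance) {
      bestVariance = between;
      best = t;
    }
  }
  return best;
}

// Pixels brighter than the threshold become white, the rest black.
function binarize(gray: Uint8Array, threshold: number): Uint8Array {
  const out = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) out[i] = gray[i] > threshold ? 255 : 0;
  return out;
}
```

In a browser the buffer would typically come from `getImageData(...).data` on a 2D canvas context, and the binarized result would be written back to the canvas before handing it to the OCR engine. A global threshold tends to struggle with uneven lighting, so treat this as a starting point only.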

Hosted Document AI / OCR services

  • Mindee — receipts, invoices, IDs, custom docs; free tier with developer credits.
  • AWS Textract — full doc AI; tables, forms, signatures.
  • Google Document AI — same niche, deeper models.
  • Azure Document Intelligence (formerly Form Recognizer) — competitive.
  • Klippa, Veryfi, Rossum, Unstructured.io — vertical OCR / extraction.
  • LLM-as-OCR — Claude / GPT-4o / Gemini Flash with vision give very accurate "read the receipt" results in 2026; often cheaper than dedicated OCR for low volumes.

Computer vision / models in the browser

  • MediaPipe (Google) — face / hand / pose / segmentation / object detection in the browser via WASM. The default for general CV.
  • TensorFlow.js — run TF / Keras models in the browser.
  • Transformers.js (Hugging Face) — run transformer models, including vision models, in the browser via ONNX Runtime Web, with optional WebGPU acceleration.
  • face-api.js — face detection / recognition; older but still works.
  • Human (Vladimir Mandic) — comprehensive person analysis (face, body, hand, gesture).
  • OpenCV.js — full OpenCV in WASM; for traditional CV (edges, contours, transforms).

Object detection / classification

  • YOLO via ONNX Runtime Web — onnxruntime-web runs ONNX YOLO models in the browser.
  • TensorFlow.js Coco-SSD — basic object detection out of the box.
  • MediaPipe Object Detector — newer, more accurate.
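
When a YOLO model is run through onnxruntime-web, the raw output often contains many overlapping candidate boxes, and whether suppression is built into the model depends on how it was exported. If it is not, the usual post-processing step is non-maximum suppression (NMS). The following is a minimal, class-agnostic sketch; the `Detection` shape, the function names, and the default thresholds are illustrative assumptions, not values taken from any specific model or library.

```typescript
// Hypothetical box format: corner coordinates plus a confidence score.
interface Detection {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
  score: number;
}

// Intersection over union of two axis-aligned boxes.
function iou(a: Detection, b: Detection): number {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = w * h;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: keep the highest-scoring box, drop any remaining box that
// overlaps an already kept box by more than iouThreshold.
function nonMaxSuppression(
  detections: Detection[],
  iouThreshold = 0.45, // assumed default, tune per model
  scoreThreshold = 0.25 // assumed default, tune per model
): Detection[] {
  const sorted = detections
    .filter((d) => d.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of sorted) {
    if (kept.every((k) => iou(k, d) <= iouThreshold)) kept.push(d);
  }
  return kept;
}
```

For multi-class models, a common variation is to run the same routine once per class so that boxes of different classes do not suppress each other.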

Image embeddings (for visual search / dedup)

  • CLIP via Transformers.js — compute image embeddings client-side.
  • DINOv2 via Transformers.js — better embeddings.
  • @xenova/transformers — npm package for Transformers.js v2 (v3 and later are published as @huggingface/transformers); convenient bundling.
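
Once each image has an embedding vector (from CLIP, DINOv2, or similar), visual search and dedup usually reduce to comparing vectors by cosine similarity. The sketch below assumes the embeddings have already been computed and are plain numeric arrays; `cosineSimilarity`, `findNearDuplicates`, and the 0.95 threshold are hypothetical names and values chosen for illustration, and a sensible threshold depends on the model and the data.

```typescript
// Cosine similarity of two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("embedding length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return index pairs (i < j) whose embeddings are at least `threshold` similar.
// This is a brute-force O(n^2) scan, fine for small collections only.
function findNearDuplicates(
  embeddings: ArrayLike<number>[],
  threshold = 0.95 // assumed cutoff, tune per model
): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (cosineSimilarity(embeddings[i], embeddings[j]) >= threshold) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```

For larger collections, an approximate nearest-neighbor index is the usual replacement for the pairwise scan.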

Background removal

  • @imgly/background-removal — runs in the browser with WebGPU (also see Image Editing).
  • rembg (Python) — server-side; or call Replicate / Fal.ai.

Pick this if…

  • Default OCR, free: Tesseract.js (with good preprocessing).
  • Production document parsing: Mindee (small) or AWS Textract / Google Document AI (scale).
  • Occasional, low-volume OCR: Claude / GPT-4o vision (very accurate, sometimes cheaper than dedicated OCR).
  • Browser face / hand / pose tracking: MediaPipe.
  • Run any HF model in the browser: Transformers.js.
  • Background removal client-side: @imgly/background-removal.
