OCR & Computer Vision

Reading text from images, document AI, face & object detection.

OCR (text from images)

★ Tesseract.js — Tesseract compiled to WASM. Runs in the browser or Node. Best free option; quality depends heavily on image preprocessing.
MiniCPM-OCR / RapidOCR — newer ML-based OCR engines, often more accurate than Tesseract on real-world receipts and handwriting.
EasyOCR (Python) — great accuracy; call from Node via subprocess if needed.
PaddleOCR — strong on CJK languages.
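
Since Tesseract.js output depends so much on preprocessing, a common first step is to convert the image to grayscale and binarize it. The sketch below is one possible approach and is not part of Tesseract.js. It assumes RGBA pixel data (for example `ImageData.data` from a canvas), picks the threshold with Otsu's method, and the function names `toGrayscale`, `otsuThreshold`, and `binarize` are hypothetical.

```typescript
// Convert RGBA pixels (4 bytes per pixel) to 8-bit grayscale using Rec. 601 luma weights.
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Otsu's method: pick the threshold that maximizes the between-class variance
// of the "dark" (<= threshold) and "bright" (> threshold) pixel groups.
function otsuThreshold(gray: Uint8Array): number {
  const hist = new Uint32Array(256); // zero-initialized histogram
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  const total = gray.length;
  let sumDark = 0;
  let weightDark = 0;
  let bestVariance = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightDark += hist[t];
    if (weightDark === 0) continue; // no dark pixels yet
    const weightBright = total - weightDark;
    if (weightBright === 0) break; // every pixel is already in the dark group
    sumDark += t * hist[t];
    const meanDark = sumDark / weightDark;
    const meanBright = (sumAll - sumDark) / weightBright;
    const diff = meanDark - meanBright;
    const variance = weightDark * weightBright * diff * diff;
    if (variance > bestVariance) {
      bestVariance = variance;
      best = t;
    }
  }
  return best;
}

// Map every pixel to pure black (0) or pure white (255).
function binarize(gray: Uint8Array, threshold: number): Uint8Array {
  const out = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) out[i] = gray[i] > threshold ? 255 : 0;
  return out;
}
```

The binarized buffer can be written back to a canvas and handed to the OCR engine. Whether global thresholding helps depends on the image; unevenly lit photos may need an adaptive threshold instead.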

Document AI (hosted APIs)

★ Mindee — receipts, invoices, IDs, custom docs; free tier with developer credits.
AWS Textract — full doc AI; tables, forms, signatures.
Google Document AI — same niche, deeper models.
Azure Document Intelligence (formerly Form Recognizer) — competitive with the two above.
Klippa, Veryfi, Rossum, Unstructured.io — vertical OCR / extraction.
LLM-as-OCR — Claude / GPT-4o / Gemini Flash with vision give very accurate "read the receipt" results in 2026; often cheaper than dedicated OCR for low volumes.
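
Whether a vision LLM really is cheaper than a dedicated OCR API depends on volume and on each provider's current pricing. The following is a back-of-the-envelope sketch only: the type and function names are hypothetical, and every number in the example is a made-up placeholder, not a quote from any vendor.

```typescript
// All prices are caller-supplied; nothing here reflects real vendor pricing.
interface LlmPricing {
  inputUsdPerMillionTokens: number;
  outputUsdPerMillionTokens: number;
}

// Estimated cost of having a vision LLM read one image.
function llmCostPerImage(p: LlmPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputUsdPerMillionTokens + outputTokens * p.outputUsdPerMillionTokens) / 1e6;
}

// Monthly cost of a dedicated OCR plan: flat fee plus a per-page charge beyond the included pages.
function dedicatedOcrMonthlyCost(
  pages: number,
  flatFeeUsd: number,
  includedPages: number,
  overageUsdPerPage: number
): number {
  const extraPages = Math.max(0, pages - includedPages);
  return flatFeeUsd + extraPages * overageUsdPerPage;
}

function cheaperOption(llmMonthlyUsd: number, ocrMonthlyUsd: number): "llm" | "dedicated-ocr" {
  return llmMonthlyUsd < ocrMonthlyUsd ? "llm" : "dedicated-ocr";
}

// Hypothetical placeholder numbers, for illustration only.
const placeholderPricing: LlmPricing = { inputUsdPerMillionTokens: 2, outputUsdPerMillionTokens: 10 };
const pagesPerMonth = 100;
const llmMonthly = llmCostPerImage(placeholderPricing, 1500, 500) * pagesPerMonth;
const ocrMonthly = dedicatedOcrMonthlyCost(pagesPerMonth, 20, 50, 0.1);
console.log(cheaperOption(llmMonthly, ocrMonthly));
```

Plugging in real token counts and the providers' published prices shows where the break-even volume lies for a given workload.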

Computer vision in the browser

★ MediaPipe (Google) — face / hand / pose / segmentation / object detection in the browser via WASM. The default for general CV.
TensorFlow.js — run TF / Keras models in the browser.
Transformers.js (Hugging Face) — run transformer models, including vision models, in the browser via ONNX Runtime Web (WASM by default, WebGPU optional).
face-api.js — face detection / recognition; older but still works.
Human (Vladimir Mandic) — comprehensive person analysis (face, body, hand, gesture).
OpenCV.js — full OpenCV in WASM; for traditional CV (edges, contours, transforms).

Object detection

YOLO via ONNX Runtime Web — onnxruntime-web runs ONNX YOLO models in the browser.
TensorFlow.js Coco-SSD — basic object detection out of the box.
MediaPipe Object Detector — newer than Coco-SSD and generally more accurate.
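
YOLO models exported to ONNX typically return many overlapping candidate boxes, so when the export does not already include non-maximum suppression (NMS), it has to be applied in JavaScript after inference. The sketch below shows a plain greedy, per-class NMS. The `Detection` shape and the 0.45 default IoU threshold are assumptions for illustration and are not part of onnxruntime-web.

```typescript
// Hypothetical detection shape: box corners in pixels, plus score and class.
interface Detection {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
  score: number;
  classId: number;
}

// Intersection over union of two axis-aligned boxes.
function iou(a: Detection, b: Detection): number {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const intersection = w * h;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  const union = areaA + areaB - intersection;
  return union > 0 ? intersection / union : 0;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only if
// it does not overlap an already-kept box of the same class too strongly.
function nonMaxSuppression(detections: Detection[], iouThreshold: number = 0.45): Detection[] {
  const sorted = detections.slice().sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const candidate of sorted) {
    const suppressed = kept.some(
      (k) => k.classId === candidate.classId && iou(k, candidate) > iouThreshold
    );
    if (!suppressed) kept.push(candidate);
  }
  return kept;
}
```

Decoding the raw output tensor into `Detection` objects is model-specific (the layout differs between YOLO versions and exports), so that step is deliberately left out here.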

Background removal

@imgly/background-removal — runs entirely in the browser via ONNX Runtime Web (WASM, with optional WebGPU acceleration); also see Image Editing.
rembg (Python) — server-side; or call Replicate / Fal.ai.
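
Whichever tool produces it, background removal typically comes down to a per-pixel foreground mask that is written into the image's alpha channel. As a rough illustration (this is not the API of @imgly/background-removal or rembg), the hypothetical `applyAlphaMask` below assumes RGBA pixels plus a single-channel mask with values from 0 (background) to 255 (foreground).

```typescript
// Return a copy of the RGBA image whose alpha channel is scaled by the mask.
// The input buffer is left untouched.
function applyAlphaMask(rgba: Uint8ClampedArray, mask: Uint8Array): Uint8ClampedArray {
  if (mask.length * 4 !== rgba.length) {
    throw new Error("mask must contain exactly one value per pixel");
  }
  const out = new Uint8ClampedArray(rgba); // copies the pixel data
  for (let i = 0; i < mask.length; i++) {
    const alphaIndex = i * 4 + 3;
    // Multiply the existing alpha by the mask value, normalized to 0..1.
    out[alphaIndex] = Math.round((rgba[alphaIndex] * mask[i]) / 255);
  }
  return out;
}
```

In a browser the result can be wrapped in an `ImageData` and drawn with `putImageData` to get a cutout on a transparent canvas.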

Recommendations

Default OCR, free: Tesseract.js (with good preprocessing).
Production document parsing: Mindee (small volumes) or AWS Textract / Google Document AI (at scale).
Occasional OCR: Claude / GPT-4o vision (very good, and sometimes cheaper than a dedicated service).
Browser face / hand / pose tracking: MediaPipe.
Run Hugging Face models (those with ONNX weights) in the browser: Transformers.js.
Background removal client-side: @imgly/background-removal.