OCR & Computer Vision
Reading text from images, document AI, face & object detection.
OCR (text from images)
- ★ Tesseract.js — Tesseract compiled to WASM. Runs in the browser or Node. Best free option; quality depends heavily on image preprocessing.
- MiniCPM-OCR / RapidOCR — newer ML-based OCR engines, often more accurate than Tesseract on real-world receipts and handwriting.
- EasyOCR (Python) — great accuracy; call from Node via subprocess if needed.
- PaddleOCR — strong on CJK languages.
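The Tesseract.js entry above notes that quality depends heavily on image preprocessing. As a rough illustration, here is a minimal sketch of one common step: grayscale conversion followed by Otsu thresholding. It assumes an RGBA byte buffer laid out like canvas `ImageData.data`. The helper names (`toGrayscale`, `otsuThreshold`, `binarize`) are hypothetical and are not part of Tesseract.js.

```typescript
// Illustrative helpers (assumptions, not a Tesseract.js API): binarize an RGBA
// buffer laid out like canvas ImageData.data (4 bytes per pixel).

/** Convert RGBA bytes to one luminance byte per pixel (Rec. 601 weights). */
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

/** Otsu's method: pick the threshold that maximizes between-class variance. */
function otsuThreshold(gray: Uint8Array): number {
  const hist: number[] = [];
  for (let t = 0; t < 256; t++) hist.push(0);
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0;
  let weightBg = 0;
  let bestVar = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue; // no background pixels yet
    const weightFg = gray.length - weightBg;
    if (weightFg === 0) break; // everything is background from here on
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const diff = meanBg - meanFg;
    const betweenVar = weightBg * weightFg * diff * diff;
    if (betweenVar > bestVar) {
      bestVar = betweenVar;
      best = t;
    }
  }
  return best;
}

/** Map pixels at or below the threshold to black (0), the rest to white (255). */
function binarize(gray: Uint8Array, threshold: number): Uint8Array {
  const out = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) out[i] = gray[i] <= threshold ? 0 : 255;
  return out;
}
```

In a browser, the binarized values would be written back into an `ImageData` and drawn to a canvas before the image is passed to the OCR engine. Whether thresholding helps depends on the input; uneven lighting usually calls for an adaptive method instead.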
Hosted Document AI / OCR services
- ★ Mindee — receipts, invoices, IDs, custom docs; free tier with developer credits.
- AWS Textract — full doc AI; tables, forms, signatures.
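- Google Document AI — same niche as Textract, with pretrained processors for specific document types.
- Azure Document Intelligence (formerly Form Recognizer) — comparable feature set, with prebuilt models for invoices, receipts, and IDs.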
- Klippa, Veryfi, Rossum, Unstructured.io — vertical OCR / extraction.
- LLM-as-OCR — Claude / GPT-4o / Gemini Flash with vision give very accurate "read the receipt" results in 2026; often cheaper than dedicated OCR for low volumes.
Computer vision / models in the browser
- ★ MediaPipe (Google) — face / hand / pose / segmentation / object detection in the browser via WASM. The default for general CV.
- TensorFlow.js — run TF / Keras models in the browser.
- Transformers.js (Hugging Face) — run transformer models, including vision models, in the browser on ONNX Runtime (WASM by default, WebGPU optional).
- face-api.js — face detection / recognition; older but still works.
- Human (Vladimir Mandic) — comprehensive person analysis (face, body, hand, gesture).
- OpenCV.js — full OpenCV in WASM; for traditional CV (edges, contours, transforms).
Object detection / classification
- YOLO via ONNX Runtime Web — onnxruntime-web runs ONNX YOLO models in the browser.
- TensorFlow.js Coco-SSD — basic object detection out of the box.
- MediaPipe Object Detector — newer than Coco-SSD and generally more accurate.
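When a YOLO model runs through ONNX Runtime Web, the raw output usually contains many overlapping candidate boxes. Unless non-maximum suppression (NMS) was baked into the exported graph, the caller has to filter them. The sketch below shows greedy NMS over already-decoded boxes. The `Box` shape and the function names are assumptions made for illustration, not part of onnxruntime-web, and decoding the model's output tensor into boxes is model-specific and not shown.

```typescript
// Hypothetical box shape: corner coordinates plus a confidence score.
interface Box {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
  score: number;
}

/** Intersection over union of two axis-aligned boxes. */
function iou(a: Box, b: Box): number {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = w * h;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

/**
 * Greedy NMS: visit boxes from highest to lowest score and keep a box only
 * if it does not overlap an already-kept box by more than iouThreshold.
 */
function nonMaxSuppression(boxes: Box[], iouThreshold = 0.5): Box[] {
  const sorted = boxes.slice().sort((a, b) => b.score - a.score);
  const kept: Box[] = [];
  for (const candidate of sorted) {
    const overlaps = kept.some((k) => iou(k, candidate) > iouThreshold);
    if (!overlaps) kept.push(candidate);
  }
  return kept;
}
```

For multi-class detectors, NMS is typically applied per class. The 0.5 default here is a common starting point rather than a recommended value.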
Image embeddings (for visual search / dedup)
- CLIP via Transformers.js — compute image embeddings client-side.
- DINOv2 via Transformers.js — often stronger for purely visual similarity; unlike CLIP, it has no text encoder.
- @xenova/transformers — the npm package for Transformers.js (newer releases are published as @huggingface/transformers).
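Once embeddings exist, visual search and dedup come down to comparing vectors. A minimal sketch, assuming each image has already been embedded into a plain `number[]` (for example by CLIP or DINOv2). The function names and the 0.95 threshold are illustrative assumptions, not values from any library.

```typescript
/** Cosine similarity between two equal-length embedding vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/**
 * Return index pairs [i, j] (i < j) whose embeddings are at least
 * `threshold` similar. Brute force, O(n^2): fine for small collections only.
 */
function findNearDuplicates(
  embeddings: number[][],
  threshold = 0.95 // hypothetical cutoff; tune on your own data
): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (cosineSimilarity(embeddings[i], embeddings[j]) >= threshold) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```

For larger collections, an approximate nearest-neighbor index would normally replace the pairwise loop.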
Background removal
- @imgly/background-removal — runs in the browser with WebGPU (also see Image Editing).
- rembg (Python) — server-side; or call Replicate / Fal.ai.
Pick this if…
- Default OCR, free: Tesseract.js (with good preprocessing).
- Production document parsing: Mindee (small volumes) or AWS Textract / Google Document AI (at scale).
- Occasional OCR, often cheaper at low volume: Claude / GPT-4o vision.
- Browser face / hand / pose tracking: MediaPipe.
- Run any HF model in the browser: Transformers.js.
- Background removal client-side: @imgly/background-removal.