OCR a scanned PDF
Make scans searchable or extract plain text — entirely in your browser.
Files never leave your device.
Searchable PDF: keep the visible look and add selectable text underneath. Cmd-F (Ctrl-F) finds words.
Text Extract: get a plain .txt file with the recognized words.
Free preview
OCR ran on page 1 only. Upgrade to Pro to OCR the entire document.
Get Pro: $5.99/mo
Have a Pro key?
OCR a PDF in your browser
How OCR works in your browser
OCR PDF runs Tesseract, a widely used open-source OCR engine, compiled to WebAssembly (via Tesseract.js) so it executes inside your browser. Each PDF page is rendered to a high-resolution image (roughly 300 DPI), then Tesseract scans the image for character shapes and assembles them back into text. Language packs download on first use (once per language, cached locally). Output comes in two modes: a searchable PDF (the original page images with an invisible, selectable text layer) or a plain text file containing just the recognized characters.
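The rendering step comes down to simple arithmetic: PDF page sizes are measured in points (72 per inch), so a target DPI implies a scale factor of dpi / 72. The sketch below is illustrative only; `renderSize` and its types are hypothetical names, not taken from the tool's source.

```typescript
// PDF page dimensions are measured in points; 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

interface PageSizePt { widthPt: number; heightPt: number; }
interface RenderSize { scale: number; widthPx: number; heightPx: number; }

// Hypothetical helper: pixel dimensions for rendering a page at a target DPI.
function renderSize(page: PageSizePt, dpi: number = 300): RenderSize {
  const scale = dpi / POINTS_PER_INCH;
  return {
    scale,
    widthPx: Math.round(page.widthPt * scale),
    heightPx: Math.round(page.heightPt * scale),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letter = renderSize({ widthPt: 612, heightPt: 792 });
console.log(letter.widthPx, letter.heightPx); // 2550 3300
```

At 300 DPI the scale factor is about 4.17, so each page becomes an image of several million pixels before recognition starts.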
When to use it
This is the tool you need when your PDF is a scan rather than a digitally generated document. Common cases: legal scans (signed contracts archived as images), older corporate paperwork digitized to fixed-resolution PDFs, photographed receipts and business cards, and books and magazines scanned for archival. After OCR, the document becomes searchable in any PDF reader, accessible to screen readers, and ready for downstream tools: feed it into PDF→Word for editing, or PDF→Text to pull out the recognized text directly.
The trade-offs
OCR accuracy lives or dies on input quality. Clean 300-DPI scans of typed text in a single supported language reach roughly 95% character accuracy. Accuracy drops on lower-DPI scans (200 DPI and below), photos taken at an angle or in poor light, handwritten text, mixed languages on one page, and unusual fonts. The supported languages are English, Spanish, Portuguese, and Polish; a document in any other language returns garbled output. The first run downloads the OCR engine and the chosen language pack (a few MB in total), so the initial pass is slower than later ones. The free tier processes only the first page of each file; Pro lifts that to full multi-page files up to 100 MB.
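To put 95% character accuracy in perspective, the arithmetic below estimates how many characters would need correction. The function names and the 2,000-character page are illustrative assumptions, and the word-level estimate assumes character errors are independent of each other, which real OCR errors are not.

```typescript
// Expected number of misrecognized characters at a given character accuracy.
function expectedCharErrors(charCount: number, charAccuracy: number): number {
  return Math.round(charCount * (1 - charAccuracy));
}

// Rough chance that a whole word comes out error-free, ASSUMING independent
// per-character errors (a simplification for illustration).
function wordAccuracy(charAccuracy: number, wordLength: number): number {
  return Math.pow(charAccuracy, wordLength);
}

// A hypothetical page of 2,000 characters at 95% character accuracy.
console.log(expectedCharErrors(2000, 0.95)); // 100
console.log(wordAccuracy(0.95, 5).toFixed(2)); // 0.77
```

Even at the best-case figure, a dense page can contain on the order of a hundred wrong characters, so proofread anything that will be quoted or filed.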
How it compares
Adobe Acrobat's built-in OCR is the gold standard for accuracy and language coverage, but it requires a paid subscription and (in the web version) uploads your file to Adobe's servers. Cloud OCR services like Google Vision and AWS Textract are accurate but send your document to a third party. Tesseract on the command line is free and local but requires installing dependencies and writing scripts. pdfmundo's OCR sits in the middle: Tesseract-grade accuracy without the install, without the upload, and without the subscription. For scanned legal, medical, or financial documents, the privacy story matters as much as the accuracy.
Common mistakes
Picking the wrong language is the most common pitfall: Tesseract returns dramatically worse results when it expects English on a Spanish document. The second is running OCR on a PDF that already has a text layer; the engine works on the rendered image, ignores the existing text, and may produce a worse version. The third is expecting handwriting recognition: Tesseract is built for printed text and will not reliably recognize cursive or casual handwriting. The fourth is low-DPI scans: a document scanned at 150 DPI for screen viewing yields noticeably worse results than one scanned at 300 DPI, so re-scan at a higher resolution if accuracy matters.
Workflow tips
Two practices consistently improve results. First, rotate scans upright before OCR: Tesseract is faster and more accurate on correctly oriented pages, and sideways scans are a common source of garbled output. Use Rotate PDF if your scanner produced rotated pages. Second, scan new documents at 300 DPI in grayscale rather than color; color does not improve accuracy and roughly triples file size. After OCR, run Compress PDF: the page images dominate the file size, and a 30-50% reduction is typical.
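The file-size figures above can be sanity-checked with back-of-the-envelope numbers. This sketch assumes uncompressed 8-bit samples (1 byte per pixel for grayscale, 3 for RGB); real scans are compressed, so actual ratios vary. The helper names and the 10 MB example file are hypothetical.

```typescript
// Uncompressed size of an image with 8 bits per channel.
function rawImageBytes(widthPx: number, heightPx: number, channels: 1 | 3): number {
  return widthPx * heightPx * channels;
}

// Resulting size range after a fractional reduction, e.g. 0.3 to 0.5 for "30-50%".
function compressedRange(bytes: number, minCut: number = 0.3, maxCut: number = 0.5): [number, number] {
  return [Math.round(bytes * (1 - maxCut)), Math.round(bytes * (1 - minCut))];
}

// A US Letter page at 300 DPI is 2550 x 3300 pixels.
const grayBytes = rawImageBytes(2550, 3300, 1);  // 8,415,000 bytes
const colorBytes = rawImageBytes(2550, 3300, 3); // 25,245,000 bytes
console.log(colorBytes / grayBytes); // 3

// A hypothetical 10 MB file after a 30-50% reduction: 5,000,000 to 7,000,000 bytes.
const [smallest, largest] = compressedRange(10000000);
console.log(smallest, largest);
```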
Frequently asked questions
- What does "searchable PDF" mean?
- The output looks identical to your scan, but Cmd-F (Ctrl-F on Windows and Linux) finds words. Tesseract.js detects text in the image and embeds it as an invisible layer beneath the visible image. Selecting and copying returns the recognized text.
- What's the difference between Searchable PDF and Text Extract modes?
- Searchable PDF preserves the visible look and adds a selectable text layer. Text Extract gives you a plain .txt download with just the recognized words.
- What languages are supported?
- English, Spanish, Portuguese, and Polish in v1. Pick the language that matches your document for best accuracy. German, French, and others are planned for v1.1, once traffic data shows which languages are in demand.
- How accurate is the OCR?
- Tesseract works best on clear, typed text at decent scan resolution. Skewed scans, low contrast, or handwritten text may produce gibberish. Try a higher-resolution scan or a different language if the result is poor.
- Why is the first run slow?
- On your first OCR run, your browser downloads the OCR engine (~3 MB) and the language pack (~1-2 MB per language). Both are cached, so later runs skip the download and start right away. Keep this tab in the foreground while OCR runs, because browsers throttle backgrounded tabs.
- What's the file size limit?
- 25 MB on the free plan, 100 MB on Pro. In practice, the limit is your device's memory: older mobile devices may struggle with very large scans.
- What's gated behind Pro?
- Free users get a one-page preview. Pro users can OCR the entire document, get a 100 MB file cap, and (when available) bulk operations and API access. Pricing: $1.99 day pass, $4.99 credits, $5.99/month Pro, or $49/year Pro annual.
- Do you upload my files?
- No. OCR runs entirely in your browser via WebAssembly. Your files and the recognized text stay on your device.
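As a rough illustration of why device memory, rather than file size, is the practical limit mentioned in the file size answer: a rendered page bitmap is far larger than the PDF on disk. The sketch assumes 4 bytes per pixel (RGBA, the layout of canvas pixel data) and uses a hypothetical helper name; the tool's actual memory use is not documented here.

```typescript
// Canvas pixel data stores red, green, blue, and alpha: 4 bytes per pixel.
const BYTES_PER_RGBA_PIXEL = 4;

// Hypothetical estimate of bitmap memory (in MiB) for pages held in memory at once.
function bitmapMemoryMiB(widthPx: number, heightPx: number, pagesInMemory: number = 1): number {
  const bytes = widthPx * heightPx * BYTES_PER_RGBA_PIXEL * pagesInMemory;
  return bytes / (1024 * 1024);
}

// One US Letter page rendered at 300 DPI (2550 x 3300 pixels).
console.log(bitmapMemoryMiB(2550, 3300).toFixed(1)); // 32.1
```

Under these assumptions a single page costs about 32 MiB as a bitmap, which adds up quickly on memory-constrained phones.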
More PDF tools
30+ tools — compress, merge, split, rotate, convert, sign, and more. All free to try.