Does this upload my files?

No. OCR runs entirely in your browser, so your file is never uploaded to a server and never leaves your device. We can't see it because we never receive it.

What does "searchable PDF" mean?

The output looks identical to your scan, but Cmd-F (Ctrl-F on Windows and Linux) finds words. Tesseract.js detects the text in each page image and embeds it as an invisible layer beneath the visible image, so selecting and copying returns the recognized text.

What's the difference between Searchable PDF and Text Extract modes?

Searchable PDF preserves the visible look and adds a selectable text layer. Text Extract gives you a plain .txt download with just the recognized words.

What languages are supported?

English, Spanish, Portuguese, and Polish in v1. Pick the language that matches your document for best accuracy. German, French, and others are planned for v1.1, once traffic data shows which languages are in demand.
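For developers building a similar language picker, the four v1 languages correspond to Tesseract's standard traineddata codes (eng, spa, por, pol). The sketch below is illustrative only: the `TESSERACT_CODES` map and the `toTesseractCode` helper are hypothetical names, not pdfmundo's actual code.

```typescript
// Hypothetical lookup from a UI language name to a Tesseract traineddata code.
// The codes themselves (eng, spa, por, pol) are Tesseract's standard identifiers.
const TESSERACT_CODES = new Map<string, string>([
  ["English", "eng"],
  ["Spanish", "spa"],
  ["Portuguese", "por"],
  ["Polish", "pol"],
]);

function toTesseractCode(language: string): string {
  const code = TESSERACT_CODES.get(language);
  if (code === undefined) {
    // Failing early avoids running OCR with the wrong language pack.
    throw new Error(`Unsupported language: ${language}`);
  }
  return code;
}
```

Rejecting unsupported languages up front is a design choice: an OCR pass with a mismatched language pack tends to return garbled text rather than an error.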

How accurate is the OCR?

Tesseract works best on clear, typed text scanned at around 300 DPI. Skewed scans, low contrast, or handwriting may produce gibberish. If the result is poor, try a higher-resolution scan or check that the selected language matches the document.

Why is the first run slow?

On your first OCR run, your browser downloads the OCR engine (~3 MB) and the language pack (~1-2 MB per language). Both are cached, so later runs skip the download and start right away. Keep this tab in the foreground while OCR runs, because browsers throttle background tabs.

What's the file size limit?

10 MB on the free plan, 25 MB on Pro. Below those caps, the practical limit is your device's memory: older mobile devices may struggle with very large scans.
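A minimal sketch of how a client-side size check for these caps might look. The `Plan` type, the `checkFileSize` helper, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions for illustration; the page does not say how the limit is measured.

```typescript
// Hypothetical plan names and caps, taken from the limits stated above.
type Plan = "free" | "pro";

// Assumption: 1 MB is counted as 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;
const MAX_MB: Record<Plan, number> = { free: 10, pro: 25 };

interface SizeCheck {
  ok: boolean;
  limitMb: number;
}

// Returns whether a file of the given size fits under the plan's cap.
function checkFileSize(sizeBytes: number, plan: Plan): SizeCheck {
  const limitMb = MAX_MB[plan];
  return { ok: sizeBytes <= limitMb * BYTES_PER_MB, limitMb };
}
```

In a browser, `sizeBytes` would typically come from `File.size` on the dropped file, so the check can run before any page is rendered.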

What's gated behind Pro?

Free users can OCR files up to 10 MB. Pro users get a 25 MB file cap and, when available, bulk operations and API access. Pricing: $1.99 day pass, $4.99 credits, $5.99/month Pro, or $49/year Pro annual.

Do you upload my files?

No. OCR runs entirely in your browser via WebAssembly. Your files and the recognized text stay on your device.

OCR a scanned PDF

Make scans searchable or extract plain text — entirely in your browser.
Files never leave your device.

OCR a PDF in your browser

How OCR works in your browser

OCR PDF runs Tesseract, a widely used open-source OCR engine, compiled to WebAssembly so it executes inside your browser. Each PDF page is rendered to a high-resolution image (roughly 300 DPI), then Tesseract scans the image for character shapes and assembles them into text. Language packs download on first use (once per language, then cached locally). Output comes in two modes: a searchable PDF (original page images with an invisible selectable text layer) or a plain text file containing just the recognized characters.
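To make the "roughly 300 DPI" step concrete: PDF page sizes are expressed in points, with 72 points per inch by default, so rendering at a target DPI amounts to scaling by DPI / 72. The sketch below shows that arithmetic; `renderSizeForDpi` is a hypothetical helper, and it assumes the renderer accepts a scale factor relative to 72 points per inch.

```typescript
// PDF default user space: 72 points per inch.
const POINTS_PER_INCH = 72;

interface RenderSize {
  scale: number;    // factor to apply to the page's point dimensions
  widthPx: number;  // rendered image width in pixels
  heightPx: number; // rendered image height in pixels
}

// Hypothetical helper: pixel dimensions for a page rendered at a target DPI.
function renderSizeForDpi(widthPt: number, heightPt: number, dpi = 300): RenderSize {
  if (widthPt <= 0 || heightPt <= 0 || dpi <= 0) {
    throw new Error("Page dimensions and DPI must be positive");
  }
  return {
    scale: dpi / POINTS_PER_INCH,
    widthPx: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    heightPx: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points, which renders to 2550 x 3300 pixels at 300 DPI.
const letter = renderSizeForDpi(612, 792);
console.log(`${letter.widthPx} x ${letter.heightPx}`);
```

A 2550 x 3300 image held as RGBA takes roughly 34 million bytes, which is why device memory matters more than file size on long scans.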

When to use it

This is the tool you need when your PDF is a scan rather than a digitally generated document. Common cases: legal scans (signed contracts archived as images), older corporate paperwork digitized to fixed-resolution PDFs, photographed receipts and business cards, and books and magazines scanned for archiving. After OCR, the document becomes searchable in any PDF reader, accessible to screen readers, and ready for downstream tools: feed it into PDF→Word for editing, or PDF→Text to pull out the recognized text directly.

The trade-offs

OCR accuracy lives or dies on input quality. Clean 300-DPI scans of typed text in a single supported language reach roughly 95% character accuracy. Accuracy drops on lower-DPI scans (200 DPI and below), photos taken at angles or in poor light, handwritten text, mixed languages on one page, or unusual fonts. The supported languages are English, Spanish, Portuguese, and Polish; a document in any other language comes back garbled. The first run downloads the chosen language pack (a few MB), so the initial pass is slower than later ones. The free tier handles files up to 10 MB; Pro lifts that to 25 MB.
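The page does not say how the 95% figure is measured. One common definition of character accuracy is 1 minus the edit distance between the recognized text and a hand-checked reference, divided by the reference length. The sketch below implements that definition as an assumption; `levenshtein` and `characterAccuracy` are illustrative names.

```typescript
// Edit distance (insertions, deletions, substitutions) between two strings,
// computed with a two-row dynamic programming table.
// Note: this compares UTF-16 code units, which is fine for most Latin text.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed definition: 1 - (edit distance / reference length), clamped at 0.
function characterAccuracy(reference: string, recognized: string): number {
  if (reference.length === 0) return recognized.length === 0 ? 1 : 0;
  return Math.max(0, 1 - levenshtein(reference, recognized) / reference.length);
}
```

Under this definition, 95% accuracy means about one wrong character in every twenty, so a 2,000-character page would still contain around 100 errors to proofread.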

How it compares

Adobe Acrobat's built-in OCR is the gold standard for accuracy and language coverage, but it requires a paid subscription and (in the web version) uploads your file to Adobe's servers. Cloud OCR services like Google Vision and AWS Textract are accurate but send your document to a third party. Tesseract on the command line is free and local but requires installing dependencies and writing scripts. pdfmundo's OCR sits in the middle: Tesseract-grade accuracy without the install, without the upload, and without the subscription. For scanned legal, medical, or financial documents, the privacy story matters as much as the accuracy.

Common mistakes

Picking the wrong language is the most common pitfall — Tesseract returns dramatically worse results when it expects English on a Spanish document. Second is running OCR on a PDF that already has a text layer; the engine works on the rendered image, ignores existing text, and may produce a worse version. Third: expecting handwriting recognition. Tesseract is built for printed text and will not recognize cursive or casual handwriting reliably. Fourth: low-DPI scans. A document scanned at 150 DPI for screen viewing yields noticeably worse results than 300 DPI; re-scan higher if accuracy matters.
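The low-DPI pitfall can be checked before running OCR. A scan's effective resolution is the embedded image's pixel width divided by the page width in inches (points / 72). The sketch below is a hypothetical helper under that assumption; the 200 DPI threshold mirrors the "200 DPI and below" figure mentioned earlier on this page, and is not a Tesseract setting.

```typescript
// Effective DPI of a scanned page: image pixels across the page width in inches.
// Assumes the page image spans the full page width.
function estimateScanDpi(imageWidthPx: number, pageWidthPt: number): number {
  if (imageWidthPx <= 0 || pageWidthPt <= 0) {
    throw new Error("Image width and page width must be positive");
  }
  const pageWidthInches = pageWidthPt / 72; // 72 points per inch
  return imageWidthPx / pageWidthInches;
}

// Hypothetical threshold check, using the 200 DPI figure as the cutoff.
function isLowDpi(dpi: number, threshold = 200): boolean {
  return dpi <= threshold;
}
```

For example, a 1275-pixel-wide image on a US Letter page (612 points wide) works out to 150 DPI, which this check would flag as worth re-scanning.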

Workflow tips

Two practices consistently improve results. First, rotate scans to upright before OCR: Tesseract is faster and more accurate on correctly oriented pages, and sideways scans are a common source of garbled output. Use Rotate PDF if your scanner produced rotated pages. Second, scan new documents at 300 DPI in grayscale rather than color; color does not improve accuracy and roughly triples file size. After OCR, run Compress PDF: the page images dominate the file weight, and a 30-50% reduction is typical.
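The file-size point follows from channel count: a color pixel stores three values (red, green, blue) where a grayscale pixel stores one. As an illustration only, the sketch below converts RGBA pixel data, such as a canvas `ImageData.data` buffer, to single-channel gray using the common Rec. 601 luma weights. The `rgbaToGray` name and the choice of weights are assumptions, not a description of what a scanner or pdfmundo does internally.

```typescript
// Convert RGBA pixel data (4 bytes per pixel) to grayscale (1 byte per pixel)
// using Rec. 601 luma weights. The alpha channel is ignored.
function rgbaToGray(rgba: Uint8ClampedArray): Uint8ClampedArray {
  if (rgba.length % 4 !== 0) {
    throw new Error("Expected RGBA data with a length divisible by 4");
  }
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0, p = 0; i < rgba.length; i += 4, p++) {
    gray[p] = Math.round(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
  }
  return gray;
}
```

Scanning in grayscale at the source remains the better option, since it avoids capturing the extra channels in the first place.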

OCR your PDF without uploading — your files never leave your device

Unlike most online tools, pdfmundo runs OCR on your PDF entirely in your browser. The file is never uploaded to a server, never stored, and never leaves your device, so contracts, financial records, and other sensitive documents stay private.
