Question 1

What kinds of changes can it find?

Accepted Answer

Text changes: additions, removals, and modifications, detected by extracting each page's text and running a Myers diff at the token level. Visual changes: image modifications, layout shifts, or any other pixel-level difference, detected by pixel comparison on pages where text extraction yields too little text for a reliable diff (fewer than 50 characters). The hybrid algorithm picks the method per page automatically: text diff where text exists, pixel diff where it doesn't. The summary panel classifies each page so you can see what kind of change happened where.
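The answer names the text algorithm, a Myers diff over tokens, without showing it. As a rough illustration, here is a compact version of the standard greedy Myers O(ND) algorithm with trace-based backtracking. It is a sketch, not pdfmundo's code: the names `Op`, `tokenize`, and `myersDiff` are hypothetical, and splitting tokens on whitespace is an assumption, since the FAQ doesn't say how the tool tokenizes.

```typescript
// Sketch of a token-level Myers diff (greedy O(ND) forward pass + backtracking).
// Names (Op, tokenize, myersDiff) are hypothetical, not pdfmundo's API.
type Op = { kind: "equal" | "insert" | "delete"; token: string };

// Assumption: tokens are whitespace-separated words.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Forward pass: for each edit count d, record the furthest-reaching x on each diagonal k.
function shortestEditTrace(a: string[], b: string[]): number[][] {
  const max = a.length + b.length;
  const offset = max; // shifts diagonal k (which can be negative) into array range
  const v: number[] = new Array(2 * max + 2).fill(0);
  const trace: number[][] = [];
  for (let d = 0; d <= max; d++) {
    trace.push(v.slice()); // snapshot of round d-1, used when backtracking
    for (let k = -d; k <= d; k += 2) {
      let x =
        k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])
          ? v[offset + k + 1] // step down: insert a token from b
          : v[offset + k - 1] + 1; // step right: delete a token from a
      let y = x - k;
      while (x < a.length && y < b.length && a[x] === b[y]) {
        x++; // follow the diagonal while tokens match
        y++;
      }
      v[offset + k] = x;
      if (x >= a.length && y >= b.length) return trace;
    }
  }
  return trace;
}

// Backward pass: walk the recorded rounds from the end to rebuild the edit script.
function myersDiff(a: string[], b: string[]): Op[] {
  const trace = shortestEditTrace(a, b);
  const offset = a.length + b.length;
  const ops: Op[] = [];
  let x = a.length;
  let y = b.length;
  for (let d = trace.length - 1; d >= 0; d--) {
    const v = trace[d];
    const k = x - y;
    const prevK =
      k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1]) ? k + 1 : k - 1;
    const prevX = v[offset + prevK];
    const prevY = prevX - prevK;
    while (x > prevX && y > prevY) {
      ops.push({ kind: "equal", token: a[x - 1] }); // matched tokens along the snake
      x--;
      y--;
    }
    if (d > 0) {
      if (x === prevX) ops.push({ kind: "insert", token: b[prevY] });
      else ops.push({ kind: "delete", token: a[prevX] });
    }
    x = prevX;
    y = prevY;
  }
  return ops.reverse();
}
```

With this representation a replaced word shows up as an adjacent delete and insert, which is one plausible way additions, removals, and modifications could be told apart for the summary panel.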

Question 2

Does it work for scanned PDFs?

Accepted Answer

Yes, via an image-diff fallback. For each page, the algorithm checks whether text extraction succeeded (50 or more characters). On scanned pages with no extractable text, it falls back to pixel-level canvas comparison automatically. The fallback also covers image-heavy layouts where text is rendered as embedded bitmaps. The trade-off: image diff catches every pixel difference (useful for redaction verification) but doesn't distinguish meaningful changes from noise such as font-rendering differences. Use the summary panel to focus on the pages with the highest change counts.
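The per-page choice and the pixel fallback might look roughly like this. Only the 50-character threshold comes from the FAQ; `pickMode`, `countChangedPixels`, and the `tolerance` parameter are hypothetical, and requiring both pages of a pair to have enough text is an assumption. The buffers are assumed to be RGBA pixel data as returned by `CanvasRenderingContext2D.getImageData(...).data`, after each page has been rendered to a canvas of the same size.

```typescript
// Sketch of the per-page method choice and a pixel comparison.
// Names are illustrative; only the 50-character threshold comes from the FAQ.
const MIN_TEXT_CHARS = 50;

type PageMode = "text" | "image";

// Assumptions: whitespace is trimmed before counting, and both pages in a pair
// need enough text for the text diff to be used.
function pickMode(textA: string, textB: string): PageMode {
  const hasText = (s: string): boolean => s.trim().length >= MIN_TEXT_CHARS;
  return hasText(textA) && hasText(textB) ? "text" : "image";
}

// Compares two RGBA buffers of equal size (the layout of ImageData.data:
// 4 bytes per pixel, row-major). Returns how many pixels differ in any channel
// by more than `tolerance`; tolerance 0 flags every difference.
function countChangedPixels(a: Uint8ClampedArray, b: Uint8ClampedArray, tolerance = 0): number {
  if (a.length !== b.length) {
    throw new Error("both pages must be rendered at the same canvas size");
  }
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        changed++;
        break; // count each pixel once, however many channels differ
      }
    }
  }
  return changed;
}
```

With `tolerance` at 0, every changed pixel counts, matching the behavior described above; a small positive tolerance is one way a tool could suppress anti-aliasing noise, at the cost of missing very faint edits.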

Question 3

What if the two PDFs have different page counts?

Accepted Answer

The result is classified as partial-comparison. Overlapping pages are compared normally; the extra pages on the longer side appear in the summary panel as unpaired. This is the most common case in revision review, because authors often add or remove pages between versions. The comparison still works for the overlapping pages, and the summary panel makes the structural change explicit. If the difference is large (one document much longer than the other), PDF to Text may be better suited to a content-only comparison.
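A minimal sketch of the pairing step, assuming pages are paired by position (page 1 with page 1, and so on); the FAQ doesn't state the pairing rule, and `planPages`, `pageRange`, and `PagePlan` are hypothetical names.

```typescript
// Sketch: pair pages by position and report the extras as unpaired.
// Names are illustrative; index-based pairing is an assumption.
type PagePlan = {
  pairs: Array<[number, number]>; // 1-based page numbers compared with each other
  unpairedA: number[]; // extra pages that exist only in document A
  unpairedB: number[]; // extra pages that exist only in document B
  partial: boolean; // true when page counts differ ("partial-comparison")
};

function pageRange(from: number, to: number): number[] {
  const pages: number[] = [];
  for (let p = from; p <= to; p++) pages.push(p);
  return pages;
}

function planPages(pageCountA: number, pageCountB: number): PagePlan {
  const overlap = Math.min(pageCountA, pageCountB);
  return {
    pairs: pageRange(1, overlap).map((p): [number, number] => [p, p]),
    unpairedA: pageRange(overlap + 1, pageCountA),
    unpairedB: pageRange(overlap + 1, pageCountB),
    partial: pageCountA !== pageCountB,
  };
}

const plan = planPages(12, 10);
// plan.pairs covers pages 1-10, plan.unpairedA is [11, 12], plan.partial is true
```

One consequence of positional pairing, if that is what the tool does: a page inserted in the middle shifts every later pair by one, so those pages also register as changed. That is one reason a large length mismatch may be better served by a content-only comparison.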

Question 4

Are my PDFs uploaded to your servers?

Accepted Answer

No. The entire comparison runs in your browser using pdf-lib and pdfjs, and both PDFs stay on your device. This matters especially for comparison: revision review often involves sensitive content (contracts, redacted documents, financial statements), where uploading to a third-party server adds risk exactly when you least want it. Server-based competitors require an upload; pdfmundo's comparison runs entirely in the browser with no upload.

Question 5

Why are some text-identical PDFs flagged as different?

Accepted Answer

Text in a PDF can encode visually identical characters as different Unicode sequences. The most common case is ligatures: the letters 'ff' can be encoded as two separate code points (U+0066 U+0066) or as the single ligature character 'ﬀ' (U+FB00). Both render identically on screen but compare differently as character sequences. Compare PDF Files applies NFKC Unicode normalization to the extracted text before diffing, which handles the ligature case automatically. Persistent false positives usually indicate different font subsetting between the two PDFs: the text looks identical but is encoded differently. The summary panel shows where the differences are, so you can inspect the relevant pages to verify.
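The ligature case can be reproduced with JavaScript's built-in `String.prototype.normalize`. The helper name `normalizeForDiff` is illustrative; the FAQ only says that NFKC is applied to extracted text before the diff.

```typescript
// Sketch: NFKC-normalize extracted text before tokenizing and diffing.
// String.prototype.normalize is standard JavaScript; the helper name is illustrative.
function normalizeForDiff(extracted: string): string {
  return extracted.normalize("NFKC");
}

const withLigature = "e\uFB00ect"; // "eﬀect" using the single U+FB00 ligature
const plain = "effect"; // two separate U+0066 code points

console.log(withLigature === plain); // false: the raw strings differ
console.log(normalizeForDiff(withLigature) === normalizeForDiff(plain)); // true
```

NFKC folds compatibility characters such as U+FB00 into their plain equivalents, so both strings reach the diff as the same six letters.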

Question 6

Can I download a diff report?

Accepted Answer

Not in v1. The visual side-by-side view and the inline summary panel cover the primary review use case. A downloadable diff report (an annotated PDF with highlighted regions) is on the v1.1 roadmap. Until then, screenshots of the side-by-side view or the summary panel cover most documentation needs.

Question 7

What's the maximum file size and page count?

Accepted Answer

25 MB per file and 50 pages per file. The page cap exists because comparison memory scales with page count: image-diff renders both pages to canvas at full resolution, which costs about 10 MB per page-pair. The 50-page cap keeps browser memory bounded for the worst case where all pages fall to image-diff. Most revision-review use cases fit within both limits; longer documents typically benefit from chunked comparison (compare specific page ranges in separate runs).
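The limits and the memory reasoning reduce to simple arithmetic: at about 10 MB per page-pair, the 50-page worst case where every pair falls back to image diff needs roughly 50 × 10 = 500 MB. The sketch below encodes that, plus one way to split a longer document into page ranges for separate runs. The function names are hypothetical, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```typescript
// Sketch of the stated limits and of splitting a long document into page ranges.
// Constants come from the FAQ; treating "MB" as 1024 * 1024 bytes is an assumption.
const MAX_FILE_BYTES = 25 * 1024 * 1024;
const MAX_PAGES = 50;
const APPROX_MB_PER_IMAGE_PAIR = 10; // rough per page-pair cost quoted in the FAQ

function withinLimits(fileBytes: number, pageCount: number): boolean {
  return fileBytes <= MAX_FILE_BYTES && pageCount <= MAX_PAGES;
}

// Worst case: every page pair falls back to image diff.
function worstCaseMemoryMB(pagePairs: number): number {
  return pagePairs * APPROX_MB_PER_IMAGE_PAIR;
}

// Split pages 1..totalPages into consecutive ranges of at most chunkSize pages,
// each intended to be compared in its own run.
function chunkPageRanges(totalPages: number, chunkSize = MAX_PAGES): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize - 1, totalPages)]);
  }
  return ranges;
}

console.log(worstCaseMemoryMB(MAX_PAGES)); // 500
console.log(chunkPageRanges(120)); // three ranges: 1-50, 51-100, 101-120
```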

Question 8

What if one of my PDFs is corrupted?

Accepted Answer

The comparison stops with a one-corrupted result and a direct link to Repair PDF. The recovery chain is compare → repair → re-compare: Repair PDF recovers what it can from the damaged file, then you bring the recovered file back to Compare. If the repaired file still won't parse, PDF to Text may be able to extract its plain-text content for external comparison as a last resort.
