Using OCR to Extract Text from PDFs

Optical Character Recognition (OCR) transforms scanned documents and image-based PDFs into searchable, selectable text. StackBloom PDF Suite's built-in OCR engine handles everything from old printed contracts to photographed receipts.

Step 1: Upload a scanned document or image PDF

Open PDF Suite from the StackBloom sidebar and navigate to your project, or create a new one. Upload the scanned document or image-based PDF you want to process. The file will appear in your project as an unprocessed document.

  • Supported formats: PDF, JPG, PNG, TIFF, BMP, WebP
  • Image-based PDFs store each page as a picture of the text, so nothing can be selected or searched until OCR adds a text layer
  • To tell whether a PDF is image-based, try selecting some text: if nothing can be selected, the document needs OCR
  • Multi-page documents are fully supported
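If you want to check files yourself before uploading, the two checks in this step can be sketched in Python. `detect_format` reads a file's leading bytes and matches them against the standard signatures for the formats listed above. `pages_without_text` applies the "try to select text" test to text you have already extracted per page with a PDF library of your choice (not shown). Both function names are hypothetical and are not part of PDF Suite, which performs its own checks on upload.

```python
from typing import List, Optional


def detect_format(header: bytes) -> Optional[str]:
    """Match a file's first bytes against standard signatures; None if unsupported."""
    if header.startswith(b"%PDF-"):
        return "PDF"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header[:4] in (b"II*\x00", b"MM\x00*"):  # little- and big-endian TIFF
        return "TIFF"
    if header.startswith(b"BM"):
        return "BMP"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":  # RIFF container, WebP form
        return "WebP"
    return None


def detect_file_format(path: str) -> Optional[str]:
    """Read just enough of the file to identify its format."""
    with open(path, "rb") as f:
        return detect_format(f.read(16))


def pages_without_text(page_texts: List[str]) -> List[int]:
    """1-based page numbers whose extracted text is empty or whitespace only.

    These are the pages where "try to select text" would find nothing,
    i.e. the pages that need OCR.
    """
    return [i for i, text in enumerate(page_texts, start=1) if not text.strip()]
```

For example, `detect_format(b"%PDF-1.7")` returns `"PDF"`, and `pages_without_text(["Invoice #1042", "", "  "])` returns `[2, 3]`. Checking page by page, rather than the whole file at once, also catches PDFs that mix typed pages with scanned ones.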

Step 2: Click the OCR button in the toolbar

Open the document in the editor and click the OCR button in the top toolbar (it looks like a text cursor over a document icon). This opens the OCR configuration panel where you can customize the processing settings before starting.

  • The OCR button is only active for documents that have not yet been processed
  • Previously OCR'd documents show a "Re-run OCR" option to update the text layer
  • OCR processing does not modify your original document file

Step 3: Select language and accuracy settings

Choose the language of the text in your document and the desired accuracy level. Setting the correct language significantly improves recognition quality, especially for documents with special characters or non-Latin scripts.

  • Over 100 languages are supported, covering Latin, Cyrillic, Arabic, Chinese, and Japanese scripts
  • Fast mode — quicker processing, suitable for clear, typed documents
  • Accurate mode — slower but better for handwriting or low-quality scans
  • Select multiple languages if the document contains text in more than one language
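The choices in this step amount to a small settings record. The Python sketch below models them under stated assumptions: the `OcrSettings` class, the `suggest_mode` helper, and the use of ISO 639-1 codes such as `"en"` for languages are all hypothetical and are not PDF Suite's API. It simply encodes the guidance above: choose at least one language, add more for mixed-language documents, and use Accurate mode for handwriting or poor scans.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the OCR panel settings; not a PDF Suite API.
MODES = ("fast", "accurate")


@dataclass
class OcrSettings:
    # Language codes (ISO 639-1 assumed); list several for mixed-language documents.
    languages: List[str] = field(default_factory=lambda: ["en"])
    mode: str = "fast"

    def __post_init__(self) -> None:
        # Reject settings the panel would not allow.
        if not self.languages:
            raise ValueError("Select at least one language.")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}, got {self.mode!r}")


def suggest_mode(handwritten: bool, low_quality_scan: bool) -> str:
    """Accurate for handwriting or low-quality scans, Fast for clear typed pages."""
    return "accurate" if handwritten or low_quality_scan else "fast"


# Example: a German/English contract that was scanned at low quality.
settings = OcrSettings(languages=["de", "en"], mode=suggest_mode(False, True))
```

Validating in `__post_init__` means an invalid combination fails as soon as it is built, much as the panel will not let you start OCR with no language selected.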

Step 4: Run OCR processing

Click Start OCR. A progress bar shows how many pages have been processed. Processing time depends on the number of pages and the selected accuracy mode — most documents complete within one to three minutes.

  • You'll receive an in-app notification when OCR finishes
  • Processing runs in the background so you can continue working
  • A page count and estimated time remaining are displayed during processing
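A progress readout like this one can be derived from just the page count and the elapsed time. The Python sketch below assumes every page takes about the same time to process, a simple linear estimate; PDF Suite's actual estimator is not documented here, and both function names are hypothetical.

```python
from typing import Optional


def progress_percent(pages_done: int, total_pages: int) -> float:
    """Share of pages processed so far, as a percentage."""
    return 100.0 * pages_done / total_pages


def estimate_remaining_seconds(
    pages_done: int, total_pages: int, elapsed_seconds: float
) -> Optional[float]:
    """Linear ETA: average seconds per finished page times pages still to go."""
    if pages_done <= 0:
        return None  # No page has finished yet, so there is no rate to extrapolate.
    remaining_pages = max(total_pages - pages_done, 0)
    seconds_per_page = elapsed_seconds / pages_done
    return seconds_per_page * remaining_pages
```

For a 40-page scan with 12 pages finished after 30 seconds, the average is 2.5 seconds per page, so `estimate_remaining_seconds(12, 40, 30.0)` returns `70.0` for the remaining 28 pages and `progress_percent(12, 40)` returns `30.0`. Because the estimate is an average, it shifts when some pages (dense text, handwriting) take longer than others.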

Step 5: Review and copy the extracted text

Once complete, the PDF becomes fully searchable and the extracted text appears in the Text Layer panel on the right side of the editor. You can select, copy, and edit the recognized text directly.

  • Use Ctrl/Cmd+F to search the document for specific words or phrases
  • Click Export Text to download the extracted content as a .txt or .docx file
  • Characters with low confidence scores are highlighted in yellow for easy review
  • Correct any recognition errors by clicking directly on the text in the layer panel
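The yellow highlighting comes down to comparing each recognized character's confidence against a cutoff and grouping consecutive low scorers into runs. This Python sketch assumes confidences on a 0 to 1 scale and an illustrative cutoff of 0.8 (some OCR engines report 0 to 100 instead), and it marks flagged runs with square brackets where the editor would use yellow. The function names and the threshold are hypothetical, not PDF Suite's.

```python
from typing import List, Optional, Tuple

# Each recognized character paired with its confidence (0 to 1 assumed).
Recognized = List[Tuple[str, float]]


def low_confidence_spans(chars: Recognized, threshold: float = 0.8) -> List[Tuple[int, int]]:
    """(start, end) index ranges of consecutive characters below the threshold."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (_, confidence) in enumerate(chars):
        if confidence < threshold:
            if start is None:
                start = i  # A new low-confidence run begins here.
        elif start is not None:
            spans.append((start, i))  # The run ended at the previous character.
            start = None
    if start is not None:
        spans.append((start, len(chars)))  # The run continued to the end.
    return spans


def mark_for_review(chars: Recognized, threshold: float = 0.8) -> str:
    """Plain text with each low-confidence run wrapped in [brackets]."""
    text = "".join(c for c, _ in chars)
    parts: List[str] = []
    previous_end = 0
    for start, end in low_confidence_spans(chars, threshold):
        parts.append(text[previous_end:start])
        parts.append("[" + text[start:end] + "]")
        previous_end = end
    parts.append(text[previous_end:])
    return "".join(parts)
```

For example, `mark_for_review([("I", 0.99), ("n", 0.97), ("v", 0.55), ("o", 0.60), ("i", 0.95), ("c", 0.98), ("e", 0.99)])` returns `"In[vo]ice"`, pointing you at the two characters most worth checking.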

💡 Tip: Higher resolution scans produce more accurate OCR results. For best results, scan documents at 300 DPI or higher. If you're photographing a document with a phone, ensure good lighting and hold the camera directly above the page to avoid distortion.
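To check whether an existing scan or phone photo meets the 300 DPI guideline, divide its pixel dimensions by the physical page size in inches. The Python sketch below does that for A4 (210 × 297 mm) and US Letter (8.5 × 11 in). It assumes the image covers exactly one full page with no margin, so a photo with background around the page has a lower effective DPI than this calculation reports. The function names are illustrative.

```python
from typing import Dict, Tuple

MM_PER_INCH = 25.4

# Page sizes in inches (width, height).
PAGE_SIZES_IN: Dict[str, Tuple[float, float]] = {
    "Letter": (8.5, 11.0),
    "A4": (210 / MM_PER_INCH, 297 / MM_PER_INCH),
}


def effective_dpi(width_px: int, height_px: int, page: str = "A4") -> float:
    """Lower of the horizontal and vertical resolutions for a full-page image."""
    width_in, height_in = PAGE_SIZES_IN[page]
    return min(width_px / width_in, height_px / height_in)


def pixels_needed(page: str = "A4", dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions a full-page image needs to reach the given DPI."""
    width_in, height_in = PAGE_SIZES_IN[page]
    return round(width_in * dpi), round(height_in * dpi)
```

`pixels_needed("A4")` returns `(2480, 3508)` and `pixels_needed("Letter")` returns `(2550, 3300)`. An image of a Letter page that is only 1275 × 1650 pixels gives `effective_dpi(1275, 1650, "Letter") == 150.0`, half the recommended resolution.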