Using OCR to Extract Text from PDFs
Optical Character Recognition (OCR) transforms scanned documents and image-based PDFs into searchable, selectable text. StackBloom PDF Suite's built-in OCR engine handles everything from old printed contracts to photographed receipts.
Step 1: Upload a scanned document or image PDF
Open PDF Suite from the StackBloom sidebar and navigate to your project, or create a new one. Upload the scanned document or image-based PDF you want to process. The file will appear in your project as an unprocessed document.
- Supported formats: PDF, JPG, PNG, TIFF, BMP, WebP
- Image PDFs store each page as a picture, so the text cannot be searched or selected until OCR adds a text layer
- To check whether a PDF is image-based, try selecting text in it; if nothing is selectable, the document needs OCR
- Multi-page documents are fully supported
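If you prepare files in bulk, the format list above can be checked before uploading. The following is a minimal Python sketch, not part of PDF Suite: the function name `is_supported_upload` is hypothetical, and treating `.jpeg` and `.tif` as aliases for JPG and TIFF is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
# ".jpeg" and ".tif" are assumed aliases and may not be accepted by the product.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".webp",
}


def is_supported_upload(filename: str) -> bool:
    """Return True if the file extension matches a supported format (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["contract_scan.PDF", "receipt.webp", "notes.docx"]:
        print(name, is_supported_upload(name))
```

The check looks only at the file name, so it cannot tell whether a PDF is image-based; use the text-selection test described above for that.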
Step 2: Click the OCR button in the toolbar
Open the document in the editor and click the OCR button in the top toolbar (it looks like a text cursor over a document icon). This opens the OCR configuration panel where you can customize the processing settings before starting.
- The OCR button is only active for documents that have not yet been processed
- Previously OCR'd documents show a "Re-run OCR" option to update the text layer
- OCR processing does not modify your original document file
Step 3: Select language and accuracy settings
Choose the language of the text in your document and the desired accuracy level. Setting the correct language significantly improves recognition quality, especially for documents with special characters or non-Latin scripts.
- Over 100 languages are supported across Latin, Cyrillic, Arabic, Chinese, and Japanese scripts
- Fast mode — quicker processing, suitable for clear, typed documents
- Accurate mode — slower but better for handwriting or low-quality scans
- Select multiple languages if the document contains text in more than one language
Step 4: Run OCR processing
Click Start OCR. A progress bar shows how many pages have been processed. Processing time depends on the number of pages and the selected accuracy mode — most documents complete within one to three minutes.
- You'll receive an in-app notification when OCR finishes
- Processing runs in the background so you can continue working
- A page count and estimated time remaining are displayed during processing
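For a sense of how an estimated time remaining can be derived from a page count, here is a minimal Python sketch using simple linear extrapolation. It is an illustration only: how PDF Suite actually computes its estimate is not documented here, and the function name `estimate_seconds_remaining` is hypothetical.

```python
from typing import Optional


def estimate_seconds_remaining(
    pages_done: int, total_pages: int, elapsed_seconds: float
) -> Optional[float]:
    """Extrapolate remaining time from the average time per finished page."""
    if pages_done <= 0 or total_pages <= 0:
        # No finished pages yet, so there is no rate to extrapolate from.
        return None
    remaining_pages = max(total_pages - pages_done, 0)
    seconds_per_page = elapsed_seconds / pages_done
    return remaining_pages * seconds_per_page


if __name__ == "__main__":
    # 12 of 40 pages finished after 30 seconds: 2.5 s per page, 28 pages left.
    print(estimate_seconds_remaining(12, 40, 30.0))  # 70.0
```

An estimate like this assumes every page takes about the same time, which may not hold when a document mixes clean typed pages with low-quality scans.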
Step 5: Review and copy the extracted text
Once processing is complete, the PDF is fully searchable and the extracted text appears in the Text Layer panel on the right side of the editor. You can select, copy, and edit the recognized text directly.
- Use Ctrl/Cmd+F to search the document for specific words or phrases
- Click Export Text to download the extracted content as a .txt or .docx file
- Low-confidence characters are highlighted in yellow for easy review
- Correct any recognition errors by clicking directly on the text in the layer panel
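The idea behind confidence-based review can be sketched in a few lines of Python. This is an assumption-heavy illustration, not PDF Suite's implementation: the per-character `(character, confidence)` pairs, the 0 to 1 score range, the 0.80 threshold, and the function name `low_confidence_positions` are all hypothetical.

```python
from typing import List, Tuple


def low_confidence_positions(
    chars: List[Tuple[str, float]], threshold: float = 0.80
) -> List[int]:
    """Return the indexes of characters whose confidence is below the threshold."""
    return [i for i, (_, confidence) in enumerate(chars) if confidence < threshold]


if __name__ == "__main__":
    # Hypothetical OCR output for the word "Invoice", where "o" and "i"
    # were recognized with low certainty (the "i" was misread as "1").
    recognized = [
        ("I", 0.99), ("n", 0.97), ("v", 0.95), ("o", 0.62),
        ("1", 0.41), ("c", 0.90), ("e", 0.98),
    ]
    print(low_confidence_positions(recognized))  # [3, 4]
```

Reviewing only the flagged positions is usually faster than proofreading the whole page, though a confidently wrong character would not be flagged.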
💡 Tip: Higher-resolution scans produce more accurate OCR output. For best results, scan documents at 300 DPI or higher. If you're photographing a document with a phone, ensure good lighting and hold the camera directly above the page to avoid distortion.
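To check whether an image meets the 300 DPI guideline, divide its pixel dimensions by the physical page size in inches. The Python sketch below shows the arithmetic; the function name `effective_dpi` is hypothetical, and it assumes the page fills the whole frame and is in the same orientation as the image.

```python
def effective_dpi(
    width_px: int, height_px: int, page_width_in: float, page_height_in: float
) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    if min(width_px, height_px, page_width_in, page_height_in) <= 0:
        raise ValueError("All dimensions must be positive")
    return min(width_px / page_width_in, height_px / page_height_in)


if __name__ == "__main__":
    # A US Letter page (8.5 x 11 in) captured at 2550 x 3300 pixels.
    print(effective_dpi(2550, 3300, 8.5, 11.0))  # 300.0
    # The same page captured at 1700 x 2200 pixels falls short of 300 DPI.
    print(effective_dpi(1700, 2200, 8.5, 11.0))  # 200.0
```

For phone photos the real figure is usually lower than this calculation suggests, because the page rarely fills the entire frame.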