Using OCR to Extract Text from PDFs

Step 1: Upload a scanned document or image PDF

Open PDF Suite from the StackBloom sidebar and navigate to your project, or create a new one. Upload the scanned document or image-based PDF you want to process. The file will appear in your project as an unprocessed document.

Supported formats: PDF, JPG, PNG, TIFF, BMP, WebP
Image PDFs store each page as a picture, so their text cannot be selected or searched until OCR adds a text layer
To check whether a PDF is image-based, try selecting text in it: if nothing can be selected, it needs OCR
Multi-page documents are fully supported
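
Before uploading a batch of files, it can help to filter out anything that is not in the supported list. The sketch below is a local pre-check and not part of PDF Suite. The function name is_supported and the alternate extension spellings (.jpeg, .tif) are assumptions.

```python
from pathlib import Path

# Extensions for the formats listed above. The alternate spellings
# .jpeg and .tif are assumptions; confirm them before relying on this.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp",
}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["contract.PDF", "receipt.webp", "notes.docx"]:
        print(name, is_supported(name))
```

The check is case-insensitive and looks only at the file name, so it cannot tell whether a PDF already contains selectable text.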

Step 2: Click the OCR button in the toolbar

Open the document in the editor and click the OCR button in the top toolbar (it looks like a text cursor over a document icon). This opens the OCR configuration panel where you can customize the processing settings before starting.

The OCR button is only active for documents that have not yet been processed
Previously OCR'd documents show a "Re-run OCR" option to update the text layer
OCR processing does not modify your original document file

Step 3: Select language and accuracy settings

Choose the language of the text in your document and the desired accuracy level. Setting the correct language significantly improves recognition quality, especially for documents with special characters or non-Latin scripts.

Over 100 languages are supported, covering Latin, Cyrillic, Arabic, Chinese, and Japanese scripts
Fast mode: quicker processing, suitable for clear, typed documents
Accurate mode: slower, but better for handwriting or low-quality scans
Select multiple languages if the document contains text in more than one language

Step 4: Run OCR processing

Click Start OCR. A progress bar shows how many pages have been processed. Processing time depends on the number of pages and the selected accuracy mode. Most documents complete within one to three minutes.

You'll receive an in-app notification when OCR finishes
Processing runs in the background so you can continue working
A page count and estimated time remaining are displayed during processing

Step 5: Review and copy the extracted text

Once processing is complete, the PDF becomes fully searchable and the extracted text appears in the Text Layer panel on the right side of the editor. You can select, copy, and edit the recognized text directly.

Use Ctrl/Cmd+F to search the document for specific words or phrases
Click Export Text to download the extracted content as a .txt or .docx file
Low-confidence characters are highlighted in yellow for easy review
Correct any recognition errors by clicking directly on the text in the Text Layer panel
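
The yellow highlighting can be thought of as a threshold applied to per-character confidence scores. The sketch below illustrates that idea only. The (character, confidence) pair format, the 0.80 threshold, and the function name are all hypothetical and do not describe PDF Suite's internals or any export format.

```python
from typing import List, Tuple

# Hypothetical threshold: characters scoring below it would be flagged.
LOW_CONFIDENCE_THRESHOLD = 0.80


def flag_low_confidence(
    chars: List[Tuple[str, float]],
    threshold: float = LOW_CONFIDENCE_THRESHOLD,
) -> List[int]:
    """Return the positions of characters whose confidence is below threshold."""
    return [i for i, (_, conf) in enumerate(chars) if conf < threshold]


if __name__ == "__main__":
    # Made-up recognition result for the word "Invoice".
    sample = [
        ("I", 0.62), ("n", 0.97), ("v", 0.95), ("o", 0.71),
        ("i", 0.99), ("c", 0.93), ("e", 0.98),
    ]
    for position in flag_low_confidence(sample):
        print(f"Review character {sample[position][0]!r} at position {position}")
```

Reviewing only the flagged positions is usually faster than proofreading the whole page, which is the point of the highlighting.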

💡 Tip: Higher resolution scans produce more accurate OCR results. For best results, scan documents at 300 DPI or higher. If you're photographing a document with a phone, ensure good lighting and hold the camera directly above the page to avoid distortion.
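
To check whether an existing scan meets the 300 DPI guideline, divide its pixel dimensions by the physical page size in inches. This is a minimal sketch with illustrative function names. You need to supply the page size yourself, since it is not always stored in the image.

```python
def effective_dpi(
    width_px: int, height_px: int, width_in: float, height_in: float
) -> float:
    """Return the lower of the horizontal and vertical pixel densities."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def meets_ocr_guideline(dpi: float, minimum: float = 300.0) -> bool:
    """Check a scan against the 300 DPI recommendation from the tip above."""
    return dpi >= minimum


if __name__ == "__main__":
    # US Letter page (8.5 x 11 in) captured at 2550 x 3300 pixels.
    dpi = effective_dpi(2550, 3300, 8.5, 11.0)
    print(dpi, meets_ocr_guideline(dpi))  # 300.0 True
```

The same page captured at 1275 x 1650 pixels works out to 150 DPI, which falls below the recommendation and is worth rescanning.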

Next Steps