Scan documents to PDF Part 2

Recognize text in scanned documents

You can use Acrobat to recognize text in previously scanned documents that have already been converted to PDF. Optical character recognition (OCR) software enables you to search, correct, and copy the text in a scanned PDF. OCR can be applied to a PDF only if the document was originally scanned at 72 dpi or higher.

Note

Scanning at 300 dpi produces the best text for conversion. At 150 dpi, OCR accuracy is slightly lower.
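As a rough illustration of the resolution guidance above, the following Python sketch estimates the effective resolution of a scanned page from its pixel width and physical width, then compares it with the 72 dpi minimum and the 300 dpi recommendation given in this article. The function names and the three labels are hypothetical and are not part of Acrobat.

```python
MIN_OCR_DPI = 72           # minimum resolution stated in this article
RECOMMENDED_OCR_DPI = 300  # resolution this article says gives the best text


def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Estimate scan resolution as pixels per inch across the page width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def ocr_readiness(dpi: float) -> str:
    """Classify a resolution against the article's thresholds (labels are illustrative)."""
    if dpi < MIN_OCR_DPI:
        return "below minimum"
    if dpi < RECOMMENDED_OCR_DPI:
        return "meets minimum"
    return "recommended"


# Example: a page 8.5 inches wide that was scanned to 2550 pixels across.
dpi = effective_dpi(2550, 8.5)
print(dpi, ocr_readiness(dpi))
```

A scan that lands in the middle tier can still be processed; according to the note above, accuracy at 150 dpi is only slightly lower than at 300 dpi.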

Recognize text in a single document

  1. Open the scanned PDF.
  2. Select All tools > Scan & OCR > In This File.

    The Recognize Text options are displayed in the pop-up dialog box.

  3. In the pop-up dialog box, select a page range and language for text recognition.

  4. Optionally, select Settings to open the Recognize Text dialog box and specify the options as needed.

  5. Select Recognize Text. Acrobat creates a text layer in your PDF that can be searched, or copied and pasted into a new document.

Recognize text in multiple documents

  1. Select All tools > Scan & OCR > In multiple files.

  2. In the Recognize Text dialog box, select Add Files, and then select Add Files, Add Folders, or Add Open Files. Then, select the files or folder. An Output Options dialog box appears.

  3. In the Output Options dialog box, specify a target folder for output files, and filename preferences. Select OK.

  4. In the Recognize Text - General Settings dialog box, specify the options and select OK.

    Acrobat creates a text layer in each PDF that can be searched, or copied and pasted into a new document.

Recognize Text - General Settings dialog box

Document Language: Specifies the language for the OCR engine to use to identify the characters.

Output (PDF Output Style): Determines the type of PDF to produce. All options require an input resolution of 72 dpi or higher (recommended). All formats apply OCR and font and page recognition to the text images and convert them to normal text.

Searchable Image: Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The selection for Downsample Images in this same dialog box determines whether the image is downsampled and to what extent.

Searchable Image (Exact): Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image.

Editable Text & Images: Synthesizes a new custom font that closely approximates the original, and preserves the page background using a low-resolution copy.

Downsample To: Decreases the number of pixels in color, grayscale, and monochrome images after OCR is complete. Choose the degree of downsampling to apply. Higher-numbered options do less downsampling, producing higher-resolution PDFs.
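To make the effect of downsampling concrete, here is a minimal Python sketch of the arithmetic involved: lowering the resolution scales each image dimension by the ratio of target to source resolution, so the pixel count falls with the square of that ratio. The function names, the sample page size, and the target values are assumptions chosen for illustration; they do not describe how Acrobat resamples images internally.

```python
def downsampled_size(width_px, height_px, source_dpi, target_dpi):
    """Return (width, height) in pixels after downsampling to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("resolutions must be positive")
    # Downsampling only removes pixels, so a target at or above the
    # source resolution leaves the image unchanged.
    if target_dpi >= source_dpi:
        return width_px, height_px
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


def pixels_retained(source_dpi, target_dpi):
    """Fraction of the original pixel count that remains after downsampling."""
    scale = min(1.0, target_dpi / source_dpi)
    return scale * scale


# Hypothetical 8.5 x 11 inch page scanned at 600 dpi (5100 x 6600 pixels).
page = (5100, 6600)
for target in (600, 300, 150):
    print(target, downsampled_size(*page, 600, target), pixels_retained(600, target))
```

In this sketch, halving the resolution keeps one quarter of the pixels, which is why a higher target value yields a larger, higher-resolution PDF.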

Correct OCR text in PDFs

When you run OCR on a scanned output, Acrobat analyzes bitmaps of text and substitutes words and characters for those bitmap areas. If the ideal substitution is uncertain, Acrobat marks the word as suspect. Suspects appear in the PDF as the original bitmap of the word, but the text is included on an invisible layer behind the bitmap of the word. This method makes the word searchable even though it's displayed as a bitmap.
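The behavior described above can be pictured with a small Python model. It is only a sketch: the confidence score, the 0.8 threshold, and every name in it are assumptions made for illustration, since this article does not say how Acrobat decides that a substitution is uncertain. What the model does reflect is the stated outcome: a suspect word is displayed as its original bitmap, yet its recognized text still sits on a hidden layer and remains searchable.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str          # best-guess text produced by OCR
    confidence: float  # hypothetical score between 0 and 1


def is_suspect(word: RecognizedWord, threshold: float = 0.8) -> bool:
    """Flag a word as suspect when the (assumed) confidence is below the threshold."""
    return word.confidence < threshold


def display_mode(word: RecognizedWord) -> str:
    """Suspects are shown as the original bitmap; other words as recognized text."""
    return "bitmap" if is_suspect(word) else "text"


def search(words: list[RecognizedWord], query: str) -> list[int]:
    """Search the hidden text of every word, whether or not it is a suspect."""
    return [i for i, w in enumerate(words) if query.lower() in w.text.lower()]


words = [RecognizedWord("invoice", 0.97), RecognizedWord("tota1", 0.42)]
for w in words:
    print(w.text, display_mode(w))
print(search(words, "tota1"))
```

In this model, correcting a suspect amounts to replacing its `text` value, which mirrors what the Recognized As box does in the steps that follow.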

Note: If you try to select text in a scanned PDF that does not have OCR applied or try to perform a Read Out Loud operation on an image file, Acrobat asks if you want to run OCR. If you select OK, the Text Recognition dialog box opens, and you can select options described in detail under the previous topic.

  1. Select All tools > Scan & OCR > Correct recognized text.

    Acrobat identifies suspected text errors and displays the image and text in the pop-up dialog box. All suspect words on the page are enclosed in boxes.

  2. Select the highlighted object or box in the document, then correct it in the Recognized As box in the pop-up dialog box. Select Accept.

    The next suspect is highlighted. Correct mistakes as needed. Select Accept for each correction.

  3. Select Close when the task is complete.
