Convert images with optical character recognition
About image file formats
An image file format is a standard way of organizing and storing images on devices such as computers, tablets, and smartphones. Image file formats are classified into raster formats and vector formats. A raster image stores its data in a two-dimensional grid of pixels, where each pixel represents a color using a fixed number of bits. 3D image formats are a further category, related to vector formats, used for managing 3D images.
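To make "a representation of color in terms of a number of bits" concrete, here is a minimal sketch that packs four 8-bit channels into one 32-bit ARGB value (the layout used by `BufferedImage.TYPE_INT_ARGB`) and estimates the uncompressed size of a pixel grid. The class and method names are illustrative and are not part of GroupDocs.Conversion.

```java
class PixelBits {
    /** Packs four 8-bit channels into one 32-bit ARGB pixel value. */
    static int packArgb(int alpha, int red, int green, int blue) {
        return (alpha & 0xFF) << 24 | (red & 0xFF) << 16 | (green & 0xFF) << 8 | (blue & 0xFF);
    }

    /** Extracts the 8-bit red channel from a packed ARGB value. */
    static int red(int argb) {
        return (argb >> 16) & 0xFF;
    }

    /** Size in bytes of an uncompressed pixel grid, rounded up to a whole byte. */
    static long uncompressedBytes(int width, int height, int bitsPerPixel) {
        long bits = (long) width * height * bitsPerPixel;
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        int orange = packArgb(255, 255, 165, 0);
        System.out.printf("Packed pixel: 0x%08X%n", orange);            // 0xFFFFA500
        System.out.println("Red channel: " + red(orange));              // 255
        // 1920 x 1080 pixels at 24 bits per pixel
        System.out.println("Bytes: " + uncompressedBytes(1920, 1080, 24)); // 6220800
    }
}
```

The size calculation shows why compression matters for raster data: a single uncompressed Full HD frame at 24 bits per pixel already takes about 6 MB.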
Raster formats
Raster graphics are digital images composed of pixel data that represents colors. They are the most common image type for web graphics and digital photos. Some raster formats support compression to reduce file size. Common raster image file formats and their extensions include BMP (Bitmap image file), PNG (Portable Network Graphics), and GIF (Graphics Interchange Format).
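As a small illustration of raster data and format-dependent file size, the sketch below fills a pixel grid with the JDK's `BufferedImage` and encodes it as PNG and BMP using `javax.imageio.ImageIO`. It uses only the standard library; `RasterDemo` and its helper methods are hypothetical names chosen for this example.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RasterDemo {
    /** Builds a size x size checkerboard of red and white pixels. */
    static BufferedImage checkerboard(int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                image.setRGB(x, y, ((x + y) % 2 == 0) ? 0xFF0000 : 0xFFFFFF);
            }
        }
        return image;
    }

    /** Encodes the image in the given format ("png", "bmp", ...) and returns the bytes. */
    static byte[] encode(BufferedImage image, String format) {
        ImageIO.setUseCache(false); // keep intermediate data in memory, not in temp files
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No writer for format: " + format);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes image bytes back into a pixel grid. */
    static BufferedImage decode(byte[] data) {
        try {
            return ImageIO.read(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage image = checkerboard(8);
        for (String format : new String[] {"png", "bmp"}) {
            System.out.println(format + ": " + encode(image, format).length + " bytes");
        }
        // PNG is lossless, so the pixel values survive a round trip unchanged.
        BufferedImage roundTrip = decode(encode(image, "png"));
        System.out.printf("Pixel (0,0) after PNG round trip: 0x%08X%n", roundTrip.getRGB(0, 0));
    }
}
```

The same pixel grid produces different byte counts per format because PNG compresses the data while a typical BMP stores it uncompressed.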
Vector formats
Vector images are defined by 2D points rather than pixels; the points are connected by paths that give the image its geometric shape. The points have properties that define the direction of paths, color, shape, curve, thickness, and fill. Common vector image file formats and their extensions include SVG (Scalable Vector Graphics), EPS (Encapsulated PostScript), and PDF (Portable Document Format).
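The idea of a shape defined by connected points can be sketched with the JDK's `java.awt.geom` classes. The example below describes a triangle by three points and scales it with an affine transform; because only coordinates are stored, scaling loses no detail. `VectorDemo` is an illustrative name, and the sketch does not read or write any of the file formats listed above.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

class VectorDemo {
    /** A triangle described by three connected 2D points. */
    static Path2D triangle() {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(0, 0);
        path.lineTo(4, 0);
        path.lineTo(2, 3);
        path.closePath();
        return path;
    }

    /** Returns a scaled copy of the shape; the original is left untouched. */
    static Path2D scaled(Path2D shape, double factor) {
        Path2D copy = new Path2D.Double(shape);
        copy.transform(AffineTransform.getScaleInstance(factor, factor));
        return copy;
    }

    public static void main(String[] args) {
        Path2D small = triangle();
        Path2D large = scaled(small, 10);
        Rectangle2D before = small.getBounds2D();
        Rectangle2D after = large.getBounds2D();
        System.out.println("Original bounds: " + before.getWidth() + " x " + before.getHeight());
        System.out.println("Scaled bounds:   " + after.getWidth() + " x " + after.getHeight());
    }
}
```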
To support OCR conversions, GroupDocs.Conversion provides an extension point that offloads the actual OCR processing to an OCR library of your choice while keeping the conversion setup simple. The extension point is the IOcrConnector interface.
First, decide which OCR processing library you will use. Different libraries have different setup processes.
In our example, we use Aspose.OCR. Add the Aspose.OCR for Java library to your project, then implement IOcrConnector. The following code snippet provides a sample implementation together with the JPG to PDF conversion that uses it:
import com.aspose.ocr.*;
import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.examples.Constants;
import com.groupdocs.conversion.integration.ocr.IOcrConnector;
import com.groupdocs.conversion.integration.ocr.RecognizedImage;
import com.groupdocs.conversion.integration.ocr.TextFragment;
import com.groupdocs.conversion.integration.ocr.TextLine;
import com.groupdocs.conversion.options.convert.PdfConvertOptions;
import com.groupdocs.conversion.options.load.ImageLoadOptions;

import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.InputStream;
import java.lang.Character;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * This example demonstrates how to convert an image using OCR.
 */
public class ConvertImageUsingOcr {
    public static void run() {
        String outputFile = Constants.getConvertedPath("converted.pdf");

        // Attach the OCR connector to the image load options.
        ImageLoadOptions imageLoadOptions = new ImageLoadOptions();
        imageLoadOptions.setOcrConnector(new OcrConnector());

        // Once the IOcrConnector interface is implemented, the JPG to PDF conversion looks like this:
        try (Converter converter = new Converter(Constants.SAMPLE_JPEG, () -> imageLoadOptions)) {
            PdfConvertOptions options = new PdfConvertOptions();
            converter.convert(outputFile, options);
            // Report success only when the conversion did not throw.
            System.out.printf("\nDocument converted successfully. \nCheck output in %s%n", outputFile);
        } catch (Exception e) {
            System.out.println("Conversion failed: " + e.getMessage());
        }
    }
}

class OcrConnector implements IOcrConnector {
    @Override
    public RecognizedImage recognize(InputStream imageStream) {
        try {
            AsposeOCR api = new AsposeOCR();
            OcrInput ocrInput = new OcrInput(InputType.SingleImage);
            BufferedImage image = ImageIO.read(imageStream);
            ocrInput.add(image);

            // Detect text line rectangles first, then recognize the text inside them.
            RectangleOutput detectedRectangles = api.DetectRectangles(ocrInput, AreasType.LINES, false).get(0);
            RecognitionSettings recognitionSettings = new RecognitionSettings();
            recognitionSettings.setDetectAreasMode(DetectAreasMode.COMBINE);
            recognitionSettings.setRecognitionAreas(detectedRectangles.Rectangles);
            RecognitionResult result = api.Recognize(ocrInput, recognitionSettings).get(0);
            return createRecognizedImageFromResult(result);
        } catch (Exception ex) {
            System.out.println("OCR Recognition failed: " + ex.getMessage());
        }
        return RecognizedImage.EMPTY;
    }

    // Converts each recognized area into a TextLine made of positioned fragments.
    private RecognizedImage createRecognizedImageFromResult(RecognitionResult result) {
        List<TextLine> lines = new ArrayList<>();
        for (int i = 0; i < result.recognitionAreasText.size(); i++) {
            String text = result.recognitionAreasText.get(i).trim();
            Rectangle rectangle = result.recognitionAreasRectangles.get(i);
            List<TextFragment> fragments = splitToFragments(text,
                    (int) rectangle.getX(), (int) rectangle.getY(),
                    (int) rectangle.getWidth(), (int) rectangle.getHeight());
            lines.add(new TextLine(fragments));
        }
        return new RecognizedImage(lines);
    }

    // Splits a line into alternating word and whitespace fragments and estimates each fragment's width.
    private List<TextFragment> splitToFragments(String lineText, int rectangleX, int rectangleY,
                                                int rectangleWidth, int rectangleHeight) {
        List<TextFragment> fragments = new ArrayList<>();
        if (lineText != null && !lineText.isEmpty()) {
            List<Character> frag = new ArrayList<>();
            boolean isWhitespace = false;
            // Width in pixels of one "equivalent" character on this line.
            float fixWidthChar = rectangleWidth / getEquivalentLength(lineText);
            for (int i = 0; i < lineText.length(); i++) {
                if (frag.isEmpty()) {
                    isWhitespace = (lineText.charAt(i) == ' ');
                } else {
                    boolean currentIsWhitespace = (lineText.charAt(i) == ' ');
                    if (i == lineText.length() - 1) frag.add(lineText.charAt(i));
                    if (currentIsWhitespace != isWhitespace || i == lineText.length() - 1) {
                        String fragment = frag.stream().map(String::valueOf).reduce("", String::concat);
                        int fragWidth = Math.round(getEquivalentLength(fragment) * fixWidthChar);
                        fragments.add(new TextFragment(fragment,
                                new Rectangle(rectangleX, rectangleY, fragWidth, rectangleHeight)));
                        frag.clear();
                        isWhitespace = currentIsWhitespace;
                    }
                }
                frag.add(lineText.charAt(i));
            }
        }
        return fragments;
    }

    // Approximates the visual length of a string by weighting narrow, wide, and uppercase characters.
    private float getEquivalentLength(String lineText) {
        float length = 0;
        for (char c : lineText.toCharArray()) {
            if (c == ' ') {
                length += 0.6f;
            } else if (NARROW_CHARS.contains(c)) {
                length += 0.5f;
            } else if (WIDE_CHARS.contains(c) || Character.isUpperCase(c)) {
                length += 1.5f;
            } else {
                length += 1.0f;
            }
        }
        return length;
    }

    private final List<Character> NARROW_CHARS = Arrays.asList(',', '.', ':', ';', '!', '|', '(', ')', '{', '}', 'l', 'i', 'I', '-', '+', 'f', 't', 'r');
    private final List<Character> WIDE_CHARS = Arrays.asList('\t', 'm', 'w', 'M', 'W');
}
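The width estimate in getEquivalentLength can be tried in isolation. The following sketch reproduces the same character weights in a standalone class that needs neither GroupDocs.Conversion nor Aspose.OCR; `WidthHeuristic` and `fragmentWidth` are names invented for this illustration, and the weights are copied from the sample above rather than taken from any library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WidthHeuristic {
    private static final Set<Character> NARROW_CHARS = new HashSet<>(Arrays.asList(
            ',', '.', ':', ';', '!', '|', '(', ')', '{', '}', 'l', 'i', 'I', '-', '+', 'f', 't', 'r'));
    private static final Set<Character> WIDE_CHARS = new HashSet<>(Arrays.asList(
            '\t', 'm', 'w', 'M', 'W'));

    /** Weighted length: space 0.6, narrow 0.5, wide or uppercase 1.5, everything else 1.0. */
    static float equivalentLength(String text) {
        float length = 0;
        for (char c : text.toCharArray()) {
            if (c == ' ') {
                length += 0.6f;
            } else if (NARROW_CHARS.contains(c)) {
                length += 0.5f;
            } else if (WIDE_CHARS.contains(c) || Character.isUpperCase(c)) {
                length += 1.5f;
            } else {
                length += 1.0f;
            }
        }
        return length;
    }

    /** Estimated pixel width of a fragment, given the full (non-empty) line and its pixel width. */
    static int fragmentWidth(String fragment, String line, int lineWidth) {
        float unitWidth = lineWidth / equivalentLength(line);
        return Math.round(equivalentLength(fragment) * unitWidth);
    }

    public static void main(String[] args) {
        // 'H' is uppercase (1.5) and 'i' is narrow (0.5), so "Hi" weighs 2.0.
        System.out.println(equivalentLength("Hi"));
        // "ab" is half the weight of "abcd", so it gets half of a 100-pixel line: 50.
        System.out.println(fragmentWidth("ab", "abcd", 100));
    }
}
```

Note that the narrow check runs before the uppercase check, so an uppercase 'I' counts as 0.5 rather than 1.5. The heuristic is an approximation: the OCR result provides a rectangle per line, not per word, so fragment widths are estimated from the text alone.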
sample.jpeg is the sample file used in this example.
converted.pdf is the expected output PDF file.
Put simply: you install an OCR processing library, implement the IOcrConnector interface, load an image file into the Converter class while passing the IOcrConnector instance through the load options, select the desired output format, and GroupDocs.Conversion does the rest.
Note
Refer to the API reference for more conversion options and customizations.