Render PDF documents as HTML and image files
GroupDocs.Viewer for Java allows you to render your PDF files in HTML, PNG, and JPEG formats. Use this library to implement a simple PDF viewer within your Java application (web or desktop).
Create a Viewer class instance to get started with the GroupDocs.Viewer API. Pass a document you want to view to the class constructor. You can load the document from a file or stream. Call one of the Viewer.view method overloads to convert the document to HTML or image format. These methods allow you to render the entire document or specific pages.
Create an HtmlViewOptions class instance and pass it to the Viewer.view method to convert a PDF file to HTML. The HtmlViewOptions class properties allow you to control the conversion process. For instance, you can embed all external resources in the generated HTML file, minify the output file, and optimize it for printing. Refer to the following documentation section for details: Rendering to HTML.
Create an HTML file with embedded resources
To save all elements of an HTML page (including text, graphics, and stylesheets) into a single file, call the HtmlViewOptions.forEmbeddedResources method and specify the output file name.
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.HtmlViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create an HTML file for each PDF page.
    // {0} is replaced with the current page number in the file name.
    HtmlViewOptions viewOptions = HtmlViewOptions.forEmbeddedResources("page_{0}.html");
    viewer.view(viewOptions);
}
The following image demonstrates the result:
Create an HTML file with external resources
If you want to store an HTML file and additional resource files (such as fonts, images, and stylesheets) separately, call the HtmlViewOptions.forExternalResources method and pass the following parameters:
The output file path format
The path format for the folder with external resources
The resource URL format
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.HtmlViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create an HTML file for each PDF page.
    // Specify the HTML file names and location of external resources.
    // {0} and {1} are replaced with the current page number and resource name, respectively.
    HtmlViewOptions viewOptions = HtmlViewOptions.forExternalResources(
            "page_{0}.html",
            "page_{0}/resource_{0}_{1}",
            "page_{0}/resource_{0}_{1}");
    viewer.view(viewOptions);
}
The image below demonstrates the result. External resources are placed in a separate folder.
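To make the placeholder rules concrete, the sketch below expands the format strings from the example above for a given page and resource. It only illustrates the naming pattern and does not call GroupDocs.Viewer, which performs the substitution internally. The PathTemplates class, its expand method, and the resource name styles.css are hypothetical.

```java
final class PathTemplates {

    // Replaces {0} with the page number and {1} with the resource name.
    // Hypothetical helper: GroupDocs.Viewer does this substitution itself.
    static String expand(String template, int pageNumber, String resourceName) {
        return template
                .replace("{0}", Integer.toString(pageNumber))
                .replace("{1}", resourceName);
    }

    public static void main(String[] args) {
        String pageFormat = "page_{0}.html";
        String resourceFormat = "page_{0}/resource_{0}_{1}";

        // Print the page file and one resource path for the first two pages.
        for (int page = 1; page <= 2; page++) {
            System.out.println(expand(pageFormat, page, "styles.css"));
            System.out.println(expand(resourceFormat, page, "styles.css"));
        }
    }
}
```

Because {0} appears in both the folder name and the resource name, each page gets its own folder and the resources of different pages cannot overwrite each other.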
Create HTML with fixed layout
By default, PDF and EPUB documents are rendered to HTML with a fixed layout so that the output HTML looks the same as the source document. In a fixed layout, all HTML elements are absolutely positioned relative to a container element. The container has a fixed size, so resizing the browser window does not affect the position or size of elements in the document.
The following image demonstrates a PDF document rendered to HTML with a fixed layout:
ImageQuality.LOW — The image resolution is low (96 DPI), and the image size is small. Use this value to increase the conversion performance.
ImageQuality.MEDIUM — The image resolution is medium (192 DPI), and the image size is larger compared to the low quality images.
ImageQuality.HIGH — The image resolution is high (300 DPI), and the image size is big. Use of this value may decrease the conversion performance.
The following code snippet shows how to set the medium image quality when rendering a PDF document to HTML:
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.HtmlViewOptions;
import com.groupdocs.viewer.options.ImageQuality;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create an HTML file for each PDF page.
    // {0} is replaced with the current page number in the file name.
    HtmlViewOptions viewOptions = HtmlViewOptions.forEmbeddedResources("page_{0}.html");
    // Set image quality to medium.
    viewOptions.getPdfOptions().setImageQuality(ImageQuality.MEDIUM);
    viewer.view(viewOptions);
}
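To get a feel for how the DPI values listed above affect output size, the following sketch computes the pixel dimensions of an image that covers a given physical area. It is plain arithmetic (pixels = inches multiplied by DPI) and does not call GroupDocs.Viewer. The DpiMath class and the 8.5 by 11 inch example area are assumptions made for illustration.

```java
final class DpiMath {

    // Pixel length of a side that measures `inches` at the given resolution.
    static long toPixels(double inches, int dpi) {
        return Math.round(inches * dpi);
    }

    public static void main(String[] args) {
        // Resolutions of the LOW, MEDIUM, and HIGH quality levels.
        int[] resolutions = {96, 192, 300};
        for (int dpi : resolutions) {
            long width = toPixels(8.5, dpi);
            long height = toPixels(11.0, dpi);
            System.out.println(dpi + " DPI: " + width + " x " + height + " px");
        }
    }
}
```

An image of the same physical size holds roughly ten times as many pixels at 300 DPI as at 96 DPI, which is consistent with the larger output and slower conversion described for the high quality level.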
Render text as an image
GroupDocs.Viewer supports the HtmlViewOptions.getPdfOptions().setRenderTextAsImage option that allows you to render text as an image when you convert a PDF file to HTML. In this case, the layout of the output HTML file closely mirrors the layout of the source PDF document.
The following code snippet shows how to enable this option in code:
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.HtmlViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create an HTML file for each PDF page.
    // {0} is replaced with the current page number in the file name.
    HtmlViewOptions viewOptions = HtmlViewOptions.forEmbeddedResources("text-as-image_{0}.html");
    // Enable rendering text as image.
    viewOptions.getPdfOptions().setRenderTextAsImage(true);
    viewer.view(viewOptions);
}
The image below illustrates the result. PDF content is exported to HTML as an image, so users cannot select or copy document text.
Enable multi-layer rendering
When you convert a PDF file to HTML, GroupDocs.Viewer creates an HTML document with a single layer (the z-index is not specified for document elements). This helps increase performance and reduce the output file size. If you convert a PDF document with multiple layers and want to improve the position of document elements in the output HTML file, use the HtmlViewOptions.getPdfOptions().setEnableLayeredRendering method to render text and graphics in the HTML file according to their z-order in the source PDF document.
The following code snippet shows how to enable the multi-layer rendering:
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.HtmlViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create an HTML file for each PDF page.
    // {0} is replaced with the current page number in the file name.
    HtmlViewOptions viewOptions = HtmlViewOptions.forEmbeddedResources("page_{0}.html");
    // Enable the multi-layer rendering.
    viewOptions.getPdfOptions().setEnableLayeredRendering(true);
    viewer.view(viewOptions);
}
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.PngViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create a PNG image for each PDF page.
    // {0} is replaced with the current page number in the image name.
    PngViewOptions viewOptions = new PngViewOptions("output_{0}.png");
    // Set width and height.
    viewOptions.setWidth(950);
    viewOptions.setHeight(550);
    viewer.view(viewOptions);
}
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.JpgViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create a JPG image for each PDF page.
    // {0} is replaced with the current page number in the image name.
    JpgViewOptions viewOptions = new JpgViewOptions("output_{0}.jpg");
    // Set width and height.
    viewOptions.setWidth(950);
    viewOptions.setHeight(550);
    viewer.view(viewOptions);
}
Preserve the size of document pages
When you render PDF documents as images, GroupDocs.Viewer calculates the optimal image size to achieve better rendering quality. If you want the generated images to be the same size as pages in the source PDF document, use the PdfOptions.setRenderOriginalPageSize method of the PngViewOptions or JpgViewOptions class (depending on the output image format).
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.PngViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create a PNG image for each PDF page.
    // {0} is replaced with the current page number in the image name.
    PngViewOptions viewOptions = new PngViewOptions("output_{0}.png");
    // Preserve the size of document pages.
    viewOptions.getPdfOptions().setRenderOriginalPageSize(true);
    viewer.view(viewOptions);
}
Enable font hinting
To adjust the display of outline fonts when you convert PDF documents to PNG or JPEG, use the PdfOptions.setEnableFontHinting method, as shown below:
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.PngViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create a PNG image for each PDF page.
    // {0} is replaced with the current page number in the image name.
    PngViewOptions viewOptions = new PngViewOptions("output_{0}.png");
    // Enable font hinting.
    viewOptions.getPdfOptions().setEnableFontHinting(true);
    viewer.view(viewOptions);
}
Refer to the following article for more information on font hinting: Font hinting.
Disable character grouping
When you render PDF files in other formats, GroupDocs.Viewer groups individual characters into words to improve rendering performance. If your document contains hieroglyphic or special symbols, you may need to disable character grouping to generate a more precise layout. To do this, use the PdfOptions.setDisableCharsGrouping method, as shown below:
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.PngViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create a PNG image for each PDF page.
    // {0} is replaced with the current page number in the image name.
    PngViewOptions viewOptions = new PngViewOptions("output_{0}.png");
    // Disable character grouping.
    viewOptions.getPdfOptions().setDisableCharsGrouping(true);
    viewer.view(viewOptions);
}
Render text comments
Use the ViewOptions.setRenderComments method for a target view to display textual annotations (such as text comments, sticky notes, text boxes and callouts) in the output HTML, PNG, or JPEG files.
The code example below renders a PDF file with text comments as an image.
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.PngViewOptions;
// ...

try (Viewer viewer = new Viewer("resume.pdf")) {
    // Create a PNG image for each PDF page.
    // {0} is replaced with the current page number in the image name.
    PngViewOptions viewOptions = new PngViewOptions("output_{0}.png");
    // Enable rendering comments.
    viewOptions.setRenderComments(true);
    viewer.view(viewOptions);
}
The following image illustrates the result:
Get information about a PDF file
Follow the steps below to obtain information about a PDF file (the number of pages, page size, and printing permissions):
Call the Viewer.getViewInfo method, pass the ViewInfoOptions instance to this method as a parameter, and cast the returned object to the PdfViewInfo type.
Use the PdfViewInfo class properties to retrieve document-specific information.
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.ViewInfoOptions;
import com.groupdocs.viewer.results.PdfViewInfo;
// ...

ViewInfoOptions viewInfoOptions = ViewInfoOptions.forHtmlView();
PdfViewInfo viewInfo;
try (Viewer viewer = new Viewer("resume.pdf")) {
    viewInfo = (PdfViewInfo) viewer.getViewInfo(viewInfoOptions);
}
// Display information about the PDF document.
System.out.println("File type: " + viewInfo.getFileType());
System.out.println("The number of pages: " + viewInfo.getPages().size());
System.out.println("Is printing allowed: " + viewInfo.isPrintingAllowed());
The following image shows a sample console output:
import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.options.ViewInfoOptions;
import com.groupdocs.viewer.results.Line;
import com.groupdocs.viewer.results.Page;
import com.groupdocs.viewer.results.PdfViewInfo;
// ...

try (Viewer viewer = new Viewer("sample.pdf")) {
    ViewInfoOptions viewInfoOptions = ViewInfoOptions.forHtmlView();
    viewInfoOptions.setExtractText(true);
    PdfViewInfo viewInfo = (PdfViewInfo) viewer.getViewInfo(viewInfoOptions);
    // Retrieve text from the PDF file.
    System.out.println("Extracted document text:");
    for (Page page : viewInfo.getPages()) {
        for (Line line : page.getLines()) {
            System.out.println(line.getValue());
        }
    }
}
Skip font license verification when rendering XPS and OXPS files
If an XPS or OXPS file contains a font that cannot be embedded due to licensing restrictions, GroupDocs.Viewer throws an exception at runtime. If you have a license for this font, enable the PdfOptions#setDisableFontLicenseVerifications(true) option to skip font license verification.
Enclose images in SVG when rendering PDF and Page Layout files
By default, when rendering PDF and Page Layout files to HTML, all images are combined into a single PNG file, which serves as the background for the output HTML document.
The PdfOptions#setWrapImagesInSvg(…) option allows you to wrap each image in the output HTML document with an SVG tag to improve output quality.
This option is available when rendering PDF and Page Layout file formats to HTML with embedded or external resources.
The following image shows resume.pdf rendered with the WrapImagesInSvg option disabled (left) and enabled (right):
Disable copy protection
When rendering to HTML a PDF file that is protected against copying text and images, GroupDocs.Viewer adds an inert HTML attribute to the HTML <body> tag.
Use PdfOptions.setDisableCopyProtection() to turn off copy protection. When DisableCopyProtection is set to true, the inert HTML attribute won’t be added to the HTML <body> tag in any case.
Note
This option was added in GroupDocs.Viewer for Java 24.6. Previous versions of GroupDocs.Viewer for Java ignore PDF copy protection and do not add the inert HTML attribute to the HTML <body> tag.
This option is supported when rendering PDF files to HTML with embedded or external resources.
The following image shows the rendering of protected-resume.pdf with copy protection on the left and with the DisableCopyProtection option set to true on the right:
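To check which of the two behaviors a generated page has, you can look for the inert attribute on its <body> tag. The sketch below does this with regular expressions from the Java standard library and does not call GroupDocs.Viewer. The InertCheck class and the hasInertBody method are hypothetical helpers, and a regular expression is only a rough heuristic, not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InertCheck {

    private static final Pattern BODY_TAG =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern INERT_ATTRIBUTE =
            Pattern.compile("\\sinert(?=[\\s=>/])", Pattern.CASE_INSENSITIVE);

    // Returns true if the first <body> tag carries an inert attribute.
    static boolean hasInertBody(String html) {
        Matcher body = BODY_TAG.matcher(html);
        if (!body.find()) {
            return false;
        }
        // Drop quoted attribute values so text such as class="inert" is not counted.
        String tag = body.group().replaceAll("\"[^\"]*\"|'[^']*'", "");
        return INERT_ATTRIBUTE.matcher(tag).find();
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(hasInertBody("<html><body inert><p>Text</p></body></html>"));
        // prints false
        System.out.println(hasInertBody("<html><body><p>Text</p></body></html>"));
    }
}
```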
Repairing corrupted PDF documents
By default, GroupDocs.Viewer cannot process PDF documents with a corrupted structure or content and throws an exception when trying to open such files. Starting with version 24.10, GroupDocs.Viewer can try to repair structural corruption in PDF documents. This feature is disabled by default. To enable it, set the TryRepair boolean property of the LoadOptions class to true.
When enabled, this feature addresses the following issues in a PDF document:
Broken references within the document (incorrect object offsets in the Cross-reference list).
Missing critical elements like root object, page object, or page content.
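The first issue in the list can be illustrated outside the library. The sketch below uses only the Java standard library to scan a classic cross-reference table and report entries whose byte offset does not land on the matching object header. It is not how GroupDocs.Viewer detects or repairs corruption, which this page does not describe. The XrefCheck class and its methods are hypothetical, and the sketch assumes a single uncompressed xref table, not cross-reference streams or incremental updates.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XrefCheck {

    private static final Pattern START_XREF = Pattern.compile("startxref\\s+(\\d{1,9})");

    // Returns the numbers of in-use objects whose cross-reference offset
    // does not point at a matching "<number> <generation> obj" header.
    static List<Integer> findBrokenOffsets(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);

        // The last startxref entry gives the position of the xref table.
        int tableStart = -1;
        Matcher startXref = START_XREF.matcher(text);
        while (startXref.find()) {
            tableStart = Integer.parseInt(startXref.group(1));
        }
        int tableEnd = tableStart < 0 ? -1 : text.indexOf("trailer", tableStart);
        if (tableEnd < 0 || !text.startsWith("xref", tableStart)) {
            throw new IllegalArgumentException("No classic cross-reference table found");
        }

        // Table layout: "<first> <count>" followed by <count> entries of
        // "<offset> <generation> <n|f>".
        String[] tokens = text.substring(tableStart + 4, tableEnd).trim().split("\\s+");
        List<Integer> broken = new ArrayList<>();
        int i = 0;
        while (i + 1 < tokens.length) {
            int firstObject = Integer.parseInt(tokens[i]);
            int count = Integer.parseInt(tokens[i + 1]);
            i += 2;
            for (int k = 0; k < count && i + 2 < tokens.length; k++, i += 3) {
                long offset = Long.parseLong(tokens[i]);
                int generation = Integer.parseInt(tokens[i + 1]);
                boolean inUse = tokens[i + 2].equals("n");
                if (inUse && !hasObjectHeader(text, offset, firstObject + k, generation)) {
                    broken.add(firstObject + k);
                }
            }
        }
        return broken;
    }

    private static boolean hasObjectHeader(String text, long offset, int number, int generation) {
        if (offset >= text.length()) {
            return false;
        }
        Pattern header = Pattern.compile(number + "\\s+" + generation + "\\s+obj\\b");
        Matcher matcher = header.matcher(text);
        matcher.region((int) offset, text.length());
        return matcher.lookingAt();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Usage: java XrefCheck.java <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Objects with broken offsets: " + findBrokenOffsets(pdf));
    }
}
```

An empty result only means that the offsets in this one table are consistent. It says nothing about the other kinds of damage, such as a missing root object or missing page content.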