Getting Document Information

GroupDocs.Conversion for Java provides a consistent way to extract metadata from source documents, regardless of how they were loaded: from a file, a stream, or cloud storage.

Getting Basic Document Information

To retrieve document information, use the Converter.getDocumentInfo() method. It returns an IDocumentInfo object, which contains general metadata applicable to all supported document formats, including:

  • File format
  • Creation date
  • File size
  • Page count

Below is an example demonstrating how to get basic information about a document:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.IDocumentInfo;

public class GetSourceDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("lorem-ipsum.txt")) {
            IDocumentInfo info = converter.getDocumentInfo();
            // Print basic document info
            System.out.println("Format: " + info.getFormat());
            System.out.println("Pages count: " + info.getPagesCount());
            System.out.println("Creation date: " + info.getCreationDate());
            System.out.println("Size, bytes: " + info.getSize());
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Format: txt
Pages count: 3
Creation date: Sat Jan 01 02:00:00 EET 1
Size, bytes: 7794

lorem-ipsum.txt is the sample file used in this example.
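
The size is reported as a raw byte count. If you want to show it to users, a small helper can convert it to binary units. The sketch below is plain Java and does not call the library; the class and method names (SizeFormatter, humanReadable) are hypothetical, and it assumes the byte count fits in a long.

```java
import java.util.Locale;

public class SizeFormatter {

    // Hypothetical helper: converts a byte count (such as the value printed
    // for "Size, bytes") into a short string with binary units.
    public static String humanReadable(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        if (bytes < 1024) {
            return bytes + " B";
        }
        String[] units = {"KiB", "MiB", "GiB", "TiB"};
        double value = bytes;
        int unit = -1;
        // Divide by 1024 until the value is below 1024 or the largest unit is reached.
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    public static void main(String[] args) {
        // 7794 is the size printed for lorem-ipsum.txt in the example above.
        System.out.println(humanReadable(7794)); // prints "7.6 KiB"
    }
}
```

In real code you would pass the value returned by info.getSize() instead of a literal.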

Format-Specific Metadata

Depending on the document type, additional metadata can be extracted. Below are examples of how to retrieve format-specific document information:
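
Format-specific information is obtained by casting the result of getDocumentInfo() to a more specific class. A plain Java cast throws ClassCastException when the object is not of the expected type, so when the input format is not known in advance, an instanceof check is the safer pattern. The sketch below illustrates it with small stand-in types so that it compiles without the library: BaseInfo, PdfInfo, and SheetInfo are hypothetical placeholders for IDocumentInfo, PdfDocumentInfo, and SpreadsheetDocumentInfo, and they only mimic getters shown on this page.

```java
public class GuardedCastSketch {

    // Stand-in for IDocumentInfo (hypothetical, not the library type).
    public interface BaseInfo {
        String getFormat();
        int getPagesCount();
    }

    // Stand-in for PdfDocumentInfo (hypothetical).
    public static class PdfInfo implements BaseInfo {
        public String getFormat() { return "pdf"; }
        public int getPagesCount() { return 5; }
        public String getVersion() { return "1.7"; }
    }

    // Stand-in for SpreadsheetDocumentInfo (hypothetical).
    public static class SheetInfo implements BaseInfo {
        public String getFormat() { return "xlsx"; }
        public int getPagesCount() { return 0; }
        public int getWorksheetsCount() { return 1; }
    }

    // Checks the runtime type before casting, and falls back to the
    // general getters when no specific type matches.
    public static String describe(BaseInfo info) {
        if (info instanceof PdfInfo) {
            PdfInfo pdf = (PdfInfo) info;
            return "PDF version " + pdf.getVersion();
        }
        if (info instanceof SheetInfo) {
            SheetInfo sheet = (SheetInfo) info;
            return "Spreadsheet with " + sheet.getWorksheetsCount() + " worksheet(s)";
        }
        return info.getFormat() + ", " + info.getPagesCount() + " page(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe(new PdfInfo()));
        System.out.println(describe(new SheetInfo()));
    }
}
```

With the real library, the same check would be written against the classes imported in the examples on this page.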

PDF Documents

For PDFs, additional information such as title, author, version, and table of contents can be retrieved:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.PdfDocumentInfo;
import com.groupdocs.conversion.contracts.documentinfo.TableOfContentsItem;

public class GetPdfDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("sample-with-toc.pdf")) {
            PdfDocumentInfo pdfInfo = (PdfDocumentInfo) converter.getDocumentInfo();
            // Print PDF document info
            System.out.println("Author: " + pdfInfo.getAuthor());
            System.out.println("Creation Date: " + pdfInfo.getCreationDate());
            System.out.println("Title: " + pdfInfo.getTitle());
            System.out.println("Version: " + pdfInfo.getVersion());
            System.out.println("Pages Count: " + pdfInfo.getPagesCount());
            System.out.println("Width: " + pdfInfo.getWidth());
            System.out.println("Height: " + pdfInfo.getHeight());
            System.out.println("Is Landscaped: " + pdfInfo.isLandscape());
            System.out.println("Is Password-Protected: " + pdfInfo.isPasswordProtected());
            System.out.println("Table of contents:");

            for (TableOfContentsItem item : pdfInfo.getTableOfContents()){
                System.out.printf(" Page %s: Title:  %s\n", item.getPage(), item.getTitle());
            }
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Author: null
Creation Date: Wed Aug 12 16:41:29 EEST 2020
Title: null
Version: 1.7
Pages Count: 5
Width: 612.0
Height: 792.0
Is Landscaped: false
Is Password-Protected: false
Table of contents:
 Page 1: Title:  Page 1 heading!
 Page 2: Title:  Page 2 heading!
 Page 3: Title:  Page 3 heading!
 Page 4: Title:  Page 4 heading!

sample-with-toc.pdf is the sample file used in this example.
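
The width and height printed above (612.0 by 792.0) are consistent with a US Letter page measured in PDF points, where one point is 1/72 inch. That the library reports these values in points is an assumption inferred from the numbers, not something this page states. Under that assumption, a minimal conversion sketch looks like this (the class and method names are hypothetical, and the orientation check is a simple comparison that may differ from how isLandscape() is computed by the library):

```java
import java.util.Locale;

public class PageSizeSketch {

    // Assumption: dimensions are in PDF points, 72 points per inch.
    private static final double POINTS_PER_INCH = 72.0;
    private static final double MM_PER_INCH = 25.4;

    public static double pointsToInches(double points) {
        return points / POINTS_PER_INCH;
    }

    public static double pointsToMillimeters(double points) {
        return pointsToInches(points) * MM_PER_INCH;
    }

    // Simple orientation check: a page is treated as landscape when it is wider than tall.
    public static boolean isLandscape(double width, double height) {
        return width > height;
    }

    public static void main(String[] args) {
        double width = 612.0;   // value printed for "Width" above
        double height = 792.0;  // value printed for "Height" above
        System.out.println(String.format(Locale.ROOT, "%.1f x %.1f in",
                pointsToInches(width), pointsToInches(height)));
        System.out.println(String.format(Locale.ROOT, "%.1f x %.1f mm",
                pointsToMillimeters(width), pointsToMillimeters(height)));
        System.out.println("Landscape: " + isLandscape(width, height));
    }
}
```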

Word Documents (DOC, DOCX, etc.)

For Word-processing documents, you can retrieve the title, author, word count, line count, password protection status, and table of contents:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.TableOfContentsItem;
import com.groupdocs.conversion.contracts.documentinfo.WordProcessingDocumentInfo;

public class GetWordDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("business-plan.doc")) {
            WordProcessingDocumentInfo docInfo = (WordProcessingDocumentInfo) converter.getDocumentInfo();
            // Print DOC document info
            System.out.println("Author: " + docInfo.getAuthor());
            System.out.println("Creation Date: " + docInfo.getCreationDate());
            System.out.println("Format: " + docInfo.getFormat());
            System.out.println("Is Password Protected: " + docInfo.isPasswordProtected());
            System.out.println("Lines: " + docInfo.getLines());
            System.out.println("Pages Count: " + docInfo.getPagesCount());
            System.out.println("Size, bytes: " + docInfo.getSize());
            System.out.println("Title: " + docInfo.getTitle());
            System.out.println("Words: " + docInfo.getWords());
            System.out.println("Table of contents:");

            for (TableOfContentsItem tocItem : docInfo.getTableOfContents()) {
                System.out.printf(" Page %s: Title: %s\n", tocItem.getPage(), tocItem.getTitle());
            }
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Author: GroupDocs
Creation Date: Sun Nov 03 12:05:00 EET 2024
Format: doc
Is Password Protected: false
Lines: 180
Pages Count: 19
Size, bytes: 414208
Title: 
Words: 3789
Table of contents:
 Page 3: Title: INTRODUCTION
 Page 5: Title: 1. EXECUTIVE SUMMARY
 Page 6: Title: 2. COMPANY OVERVIEW
 Page 7: Title: 3. BUSINESS DESCRIPTION
 Page 8: Title: 4. MARKET ANALYSIS
 Page 10: Title: 5. OPERATING PLAN
 Page 11: Title: 6. MARKETING AND SALES PLAN
 Page 12: Title: 7. FINANCIAL PLAN
 Page 16: Title: APPENDIX
 Page 17: Title: Instructions for Getting Started with Estimated Start-Up Costs
 Page 19: Title: Instructions for Getting Started on Profit & Loss Projections

business-plan.doc is the sample file used in this example.

Spreadsheets (Excel, CSV, etc.)

For spreadsheet documents, you can retrieve the author, number of worksheets, and password protection status:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.SpreadsheetDocumentInfo;

public class GetSpDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("cost-analysis.xlsx")) {
            SpreadsheetDocumentInfo docInfo = (SpreadsheetDocumentInfo) converter.getDocumentInfo();
            // Print XLSX document info
            System.out.println("Author: " + docInfo.getAuthor());
            System.out.println("Creation Date: " + docInfo.getCreationDate());
            System.out.println("Format: " + docInfo.getFormat());
            System.out.println("Is Password Protected: " + docInfo.isPasswordProtected());
            System.out.println("Pages Count: " + docInfo.getPagesCount());
            System.out.println("Size, bytes: " + docInfo.getSize());
            System.out.println("Title: " + docInfo.getTitle());
            System.out.println("Worksheets Count: " + docInfo.getWorksheetsCount());
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Author: GroupDocs
Creation Date: Thu Feb 23 18:52:46 EET 2023
Format: xlsx
Is Password Protected: false
Pages Count: 0
Size, bytes: 78940
Title: Cost Analysis
Worksheets Count: 1

cost-analysis.xlsx is the sample file used in this example.

Presentation Documents (PPT, PPTX)

For presentation files, you can extract the title, author, and password protection status:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.PresentationDocumentInfo;

public class GetPresentationDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("presentation-template.pptx")) {
            PresentationDocumentInfo docInfo = (PresentationDocumentInfo) converter.getDocumentInfo();
            // Print PPTX document info
            System.out.println("Author: " + docInfo.getAuthor());
            System.out.println("Creation Date: " + docInfo.getCreationDate());
            System.out.println("Format: " + docInfo.getFormat());
            System.out.println("Is Password Protected: " + docInfo.isPasswordProtected());
            System.out.println("Pages Count: " + docInfo.getPagesCount());
            System.out.println("Size, bytes: " + docInfo.getSize());
            System.out.println("Title: " + docInfo.getTitle());
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Author: GroupDocs
Creation Date: Sat Mar 04 14:58:10 EET 2023
Format: pptx
Is Password Protected: false
Pages Count: 3
Size, bytes: 35210
Title: TEST

presentation-template.pptx is the sample file used in this example.

Image Files (JPG, PNG, TIFF, etc.)

For images, details such as dimensions and bits per pixel can be extracted:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.ImageDocumentInfo;

public class GetImageDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("infographic-elements.tiff")) {
            ImageDocumentInfo docInfo = (ImageDocumentInfo) converter.getDocumentInfo();
            // Print TIFF document info
            System.out.println("Bits per Pixel: " + docInfo.getBitsPerPixel());
            System.out.println("Creation Date: " + docInfo.getCreationDate());
            System.out.println("Format: " + docInfo.getFormat());
            System.out.println("Height: " + docInfo.getHeight());
            System.out.println("Width: " + docInfo.getWidth());
            System.out.println("Size, bytes: " + docInfo.getSize());
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Bits per Pixel: 32
Creation Date: Sun Feb 09 13:43:01 EET 2025
Format: tiff
Height: 2000
Width: 1500
Size, bytes: 1734560

infographic-elements.tiff is the sample file used in this example.

CAD Drawings (DWG, DXF, etc.)

For CAD drawings, you can extract layout and layer details:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.CadDocumentInfo;

public class GetCadDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("blocks-and-tables.dwg")) {
            CadDocumentInfo docInfo = (CadDocumentInfo) converter.getDocumentInfo();
            // Print DWG document info
            System.out.println("Creation Date: " + docInfo.getCreationDate());
            System.out.println("Format: " + docInfo.getFormat());
            System.out.println("Height: " + docInfo.getHeight());
            System.out.println("Width: " + docInfo.getWidth());
            System.out.println("Size, bytes: " + docInfo.getSize());

            System.out.println("Layouts:");
            for (String layout : docInfo.getLayouts()) {
                System.out.println(" Layout: " + layout);
            }

            System.out.println("Layers:");
            for (String layer : docInfo.getLayers()) {
                System.out.println(" Layer: " + layer);
            }
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Creation Date: Sun Feb 09 13:52:09 EET 2025
Format: dwg
Height: 16
Width: 26
Size, bytes: 258848
Layouts:
 Layout: Model
 Layout: ISO A1
Layers:
 Layer: Text
 Layer: Viewports
 Layer: Walls
 Layer: Stairs
 Layer: Deck
 Layer: Cabinetry
 Layer: Schedules
 Layer: Appliances
 Layer: Doors
 Layer: Power
 Layer: Lighting
 Layer: BDRTXT
 Layer: BRDTITLE
 Layer: 0
 Layer: DB - Windows
 Layer: Defpoints
 Layer: Dimensions

blocks-and-tables.dwg is the sample file used in this example.

Email Messages (MSG, EML)

For emails, metadata such as encryption status, signature status, and attachments can be retrieved:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.contracts.documentinfo.EmailDocumentInfo;

public class GetEmailDocumentInfo {
    public static void convert() {
        try (Converter converter = new Converter("invitation.eml")) {
            EmailDocumentInfo docInfo = (EmailDocumentInfo) converter.getDocumentInfo();
            // Print EML document info
            System.out.println("Creation Date: " + docInfo.getCreationDate());
            System.out.println("Format: " + docInfo.getFormat());
            System.out.println("Is Encrypted: " + docInfo.isEncrypted());
            System.out.println("Is Body in HTML: " + docInfo.isHtml());
            System.out.println("Is Signed: " + docInfo.isSigned());
            System.out.println("Size: " + docInfo.getSize());
            System.out.println("Attachments Count: " + docInfo.getAttachmentsCount());

            for (String attachmentName : docInfo.getAttachmentsNames()) {
                System.out.println("Attachment Name: " + attachmentName);
            }
        }
    }

    public static void main(String[] args){
        convert();
    }
}

Output:
Creation Date: Tue Apr 25 14:28:29 EEST 2017
Format: eml
Is Encrypted: false
Is Body in HTML: true
Is Signed: false
Size: 91948
Attachments Count: 0

invitation.eml is the sample file used in this example.

Conclusion

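GroupDocs.Conversion for Java exposes general metadata through IDocumentInfo and format-specific details through typed classes such as PdfDocumentInfo, WordProcessingDocumentInfo, and SpreadsheetDocumentInfo. You can use these properties to analyze, filter, and manage documents, for example before deciding how to convert them. For more advanced use cases, refer to the official GroupDocs.Conversion API documentation.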
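
As one illustration of filtering by metadata, the sketch below keeps only documents that are not password-protected and do not exceed a given page count. It uses a hypothetical DocSummary value class in place of the library's info objects so that it runs on its own; in real code you would fill it from getters such as getPagesCount() and isPasswordProtected() shown on this page. The file name locked-report.pdf is invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MetadataFilterSketch {

    // Hypothetical value class holding the few properties used for filtering.
    public static final class DocSummary {
        public final String name;
        public final int pagesCount;
        public final boolean passwordProtected;

        public DocSummary(String name, int pagesCount, boolean passwordProtected) {
            this.name = name;
            this.pagesCount = pagesCount;
            this.passwordProtected = passwordProtected;
        }
    }

    // Returns the names of documents that are unprotected and within the page limit.
    public static List<String> selectForProcessing(List<DocSummary> docs, int maxPages) {
        return docs.stream()
                .filter(d -> !d.passwordProtected)
                .filter(d -> d.pagesCount <= maxPages)
                .map(d -> d.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DocSummary> docs = Arrays.asList(
                new DocSummary("sample-with-toc.pdf", 5, false),
                new DocSummary("business-plan.doc", 19, false),
                new DocSummary("locked-report.pdf", 2, true)); // invented entry
        // Keeps only sample-with-toc.pdf: the second entry is too long, the third is protected.
        System.out.println(selectForProcessing(docs, 10));
    }
}
```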