Get Document Information

GroupDocs.Merger for Python via .NET lets you retrieve metadata about any supported document without merging or splitting it. The get_document_info() method returns an IDocumentInfo object that exposes:

  • info.type — a FileType object describing the format (.file_format for the human-readable name, .extension for the file extension).
  • info.page_count — total number of pages.
  • info.size — file size in bytes.
  • info.pages — a list of IPageInfo objects; each exposes .number, .width, .height, and .visible.

Steps to get document information

  1. Instantiate the Merger class with the path to the document.
  2. Call merger.get_document_info() to obtain the IDocumentInfo object.
  3. Read the desired properties from the returned object.

from groupdocs.merger import Merger

def read_document_info():
    # Load the document whose information should be retrieved
    with Merger("./input.pdf") as merger:
        # Obtain the document information object
        info = merger.get_document_info()
        # Print file type details
        print("Type:", info.type.file_format)
        # Print overall document statistics
        print("Pages:", info.page_count, "Size:", info.size, "bytes")
        # Iterate over individual page metadata
        for page in info.pages:
            print(f"  page {page.number}: {page.width}x{page.height}")

if __name__ == "__main__":
    read_document_info()

input.pdf is the sample file used in this example. For that file, the code prints:

Type: Portable Document Format File
Pages: 2 Size: 86913 bytes
  page 1: 595x841
  page 2: 595x841
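The raw numbers in this output are easier to read once converted. The sketch below is not part of GroupDocs.Merger; format_size and points_to_mm are hypothetical helper names, and the conversion assumes the page dimensions are PostScript points (1/72 inch), which the 595x841 values suggest for this PDF but which may not hold for other formats.

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("bytes", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            # Whole bytes need no decimals; larger units get one.
            return f"{int(size)} bytes" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1024


def points_to_mm(points: float) -> float:
    """Convert points to millimetres.

    Assumption: the dimensions are PostScript points (1/72 inch).
    """
    return points * 25.4 / 72


if __name__ == "__main__":
    # Values copied from the sample output for input.pdf.
    print(format_size(86913))  # 84.9 KiB
    print(f"{points_to_mm(595):.1f} x {points_to_mm(841):.1f} mm")  # 209.9 x 296.7 mm
```

In practice you would pass info.size and page.width / page.height from the example above instead of the hard-coded values.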


Explanation

  • Load Document: The Merger context manager opens the document at the given path.
  • Retrieve Info: merger.get_document_info() reads the document metadata and returns an IDocumentInfo instance.
  • File Type: info.type.file_format returns a descriptive string such as "Portable Document Format File". Use info.type.extension to get the dot-prefixed extension (e.g. ".pdf").
  • Page Count and Size: info.page_count gives the total page count; info.size gives the file size in bytes.
  • Page Details: Iterating over info.pages yields one IPageInfo object per page. Each exposes .number (1-based), .width and .height (in document units), and .visible (whether the page is visible).

Note
Use info.type, not info.type_. The underscore form is a stale alias that no longer exists in the current API.
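To use these properties outside a print statement, for example in a log entry or a pre-merge check, it can help to gather them into a plain dictionary. The following is a minimal sketch rather than library code: summarize is a hypothetical helper, and fake_info is a stand-in built with SimpleNamespace that only mimics the attribute names described above so the example runs without GroupDocs.Merger installed.

```python
from types import SimpleNamespace


def summarize(info) -> dict:
    """Collect the documented properties into a plain dict (e.g. for logging)."""
    visible = [p for p in info.pages if p.visible]
    return {
        "format": info.type.file_format,
        "extension": info.type.extension,
        "page_count": info.page_count,
        "size_bytes": info.size,
        "visible_pages": [p.number for p in visible],
        # A page wider than it is tall is treated as landscape.
        "landscape_pages": [p.number for p in visible if p.width > p.height],
    }


# Hypothetical stand-in mimicking the attribute names described above;
# in real use, pass the object returned by merger.get_document_info().
fake_info = SimpleNamespace(
    type=SimpleNamespace(file_format="Portable Document Format File", extension=".pdf"),
    page_count=2,
    size=86913,
    pages=[
        SimpleNamespace(number=1, width=595, height=841, visible=True),
        SimpleNamespace(number=2, width=841, height=595, visible=True),
    ],
)

if __name__ == "__main__":
    print(summarize(fake_info))
```

Because summarize only reads attributes, it should work with any object that exposes the same names. Build the dictionary inside the with block if you are unsure whether the info object remains valid after the Merger is closed.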

Refer to the GroupDocs.Merger API Reference for full details on IDocumentInfo and IPageInfo.

See also