GroupDocs.Merger for Python via .NET lets you retrieve metadata about any supported document without performing any merge or split operation. The get_document_info() method returns an IDocumentInfo object that exposes:
info.type — a FileType object describing the format (.file_format for the human-readable name, .extension for the file extension).
info.page_count — total number of pages.
info.size — file size in bytes.
info.pages — a list of IPageInfo objects; each exposes .number, .width, .height, and .visible.
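To make these properties concrete, here is a minimal sketch that flattens an info object into a plain dictionary, for example for logging or JSON output. The helper name summarize_info is hypothetical and not part of the library. The SimpleNamespace stand-in only mimics the attributes listed above, with made-up values, so the snippet runs without the library installed. In real use you would pass the object returned by get_document_info() instead.

```python
from types import SimpleNamespace


def summarize_info(info):
    """Flatten an IDocumentInfo-like object into a plain dict.

    Hypothetical helper: it relies only on the attributes listed above
    (type, page_count, size, pages).
    """
    return {
        "format": info.type.file_format,
        "extension": info.type.extension,
        "page_count": info.page_count,
        "size_bytes": info.size,
        "pages": [
            {
                "number": page.number,
                "width": page.width,
                "height": page.height,
                "visible": page.visible,
            }
            for page in info.pages
        ],
    }


# Stand-in object with made-up values (NOT produced by the library).
fake_info = SimpleNamespace(
    type=SimpleNamespace(
        file_format="Portable Document Format File", extension=".pdf"
    ),
    page_count=2,
    size=12345,
    pages=[
        SimpleNamespace(number=1, width=595, height=842, visible=True),
        SimpleNamespace(number=2, width=842, height=595, visible=True),
    ],
)

summary = summarize_info(fake_info)
print(summary["format"], summary["page_count"], summary["size_bytes"])
```

Because the helper only reads attributes, it should work unchanged on the real object as long as the properties behave as described here.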
Steps to get document information
Instantiate the Merger class with the path to the document.
Call merger.get_document_info() to obtain the IDocumentInfo object.
Read the desired properties from the returned object.
    from groupdocs.merger import Merger


    def read_document_info():
        # Load the document whose information should be retrieved
        with Merger("./input.pdf") as merger:
            # Obtain the document information object
            info = merger.get_document_info()

            # Print file type details
            print("Type:", info.type.file_format)

            # Print overall document statistics
            print("Pages:", info.page_count, "Size:", info.size, "bytes")

            # Iterate over individual page metadata
            for page in info.pages:
                print(f" page {page.number}: {page.width}x{page.height}")


    if __name__ == "__main__":
        read_document_info()
input.pdf is the sample file used in this example.
Load Document: The Merger context manager opens the document at the given path.
Retrieve Info: merger.get_document_info() reads the document metadata and returns an IDocumentInfo instance.
File Type: info.type.file_format returns a descriptive string such as "Portable Document Format File". Use info.type.extension to get the dot-prefixed extension (e.g. ".pdf").
Page Count and Size: info.page_count gives the total page count; info.size gives the file size in bytes.
Page Details: Iterating over info.pages yields one IPageInfo object per page. Each exposes .number (1-based), .width and .height (in document units), and .visible (whether the page is visible).
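As an illustration of working with the per-page properties, the sketch below skips pages that are not visible and labels each remaining page as portrait or landscape by comparing its width and height. The function describe_pages is a hypothetical helper, and the sample pages are stand-in objects with made-up dimensions rather than values read from a real document.

```python
from types import SimpleNamespace


def describe_pages(pages):
    """Return (number, orientation) tuples for the visible pages.

    Hypothetical helper: it uses only the .number, .width, .height and
    .visible attributes described above.
    """
    result = []
    for page in pages:
        if not page.visible:
            continue  # skip pages flagged as not visible
        # Square pages fall into the "portrait" bucket in this sketch.
        orientation = "landscape" if page.width > page.height else "portrait"
        result.append((page.number, orientation))
    return result


# Stand-in pages with made-up dimensions (NOT produced by the library).
sample_pages = [
    SimpleNamespace(number=1, width=595, height=842, visible=True),
    SimpleNamespace(number=2, width=842, height=595, visible=False),
    SimpleNamespace(number=3, width=842, height=595, visible=True),
]

for number, orientation in describe_pages(sample_pages):
    print(f"page {number}: {orientation}")
```

With a real document, info.pages could be passed to describe_pages in place of sample_pages, assuming the page objects behave as described.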
Note
Use info.type, not info.type_. The underscore form is an old alias that no longer exists in the current API.