Convert Files Within Document Containers

This topic covers how to convert files embedded within document containers, such as compressed or packaged files, into individual output files. The following diagram illustrates the process of extracting and converting files within a document container:

flowchart LR
    %% Nodes
    A["Document Container"]
    B["Extraction"]
    C["Conversion"]
    D["Converted File 1"]
    E["Converted File 2"]
    F["Converted File N"]

    %% Edge connections between nodes
    A --> B --> C --> D
    C --> E
    C --> F

The Extraction and Conversion processes are performed within a single call to the convert(file_path, convert_options) method of the Converter class. GroupDocs.Conversion opens the container, converts the files it holds, and writes a consolidated output document.

Document Container File Types

The following file types are considered document containers:

Email and Outlook

  • EML - Email Message File.
  • EMLX - Apple Mail Email File.
  • MSG - Microsoft Outlook Message File.
  • OST - Outlook Offline Data File.
  • PST - Outlook Personal Information Store File.

PDF

  • PDF - PDF files that contain embedded resources.

Word Processing

  • DOC - The older Microsoft Word binary format.
  • DOCX - The modern Word format.
  • DOT and DOTX - Word template files.
  • RTF - Rich Text Format.

Compression

  • 7Z - 7-Zip Compressed File.
  • BZ2 - Bzip2 Compressed File.
  • CAB - Windows Cabinet File.
  • CPIO - CPIO Compressed File.
  • GZ - GNU Zipped Archive.
  • GZIP - Gzip Compressed File.
  • LZ - Lzip Compressed File.
  • LZMA - LZMA Compressed File.
  • RAR - RAR Compressed Archive.
  • TAR - Consolidated Unix File Archive.
  • XZ - Xz Compressed File.
  • Z - Unix Compressed File.
  • ZIP - ZIP Compressed File.
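When a folder holds mixed inputs, it can help to check up front whether a file falls into one of the container groups listed above. The following is a minimal sketch using only the Python standard library. The name container_group and the group keys are hypothetical and are not part of GroupDocs.Conversion. The check looks only at the final file extension, so it says nothing about whether the file content is actually valid.

```python
from pathlib import Path

# Extensions copied from the lists above. The group keys are illustrative
# names chosen for this sketch, not identifiers from GroupDocs.Conversion.
CONTAINER_EXTENSIONS = {
    "email": {"eml", "emlx", "msg", "ost", "pst"},
    "pdf": {"pdf"},
    "word_processing": {"doc", "docx", "dot", "dotx", "rtf"},
    "compression": {
        "7z", "bz2", "cab", "cpio", "gz", "gzip", "lz",
        "lzma", "rar", "tar", "xz", "z", "zip",
    },
}


def container_group(file_path):
    """Return the container group for a path's final extension, or None."""
    # Path.suffix yields only the last extension, so "a.tar.gz" is checked as "gz".
    extension = Path(file_path).suffix.lower().lstrip(".")
    for group, extensions in CONTAINER_EXTENSIONS.items():
        if extension in extensions:
            return group
    return None
```

For example, container_group("./compressed.zip") returns "compression", while a path with an unlisted extension such as "notes.txt" returns None.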

Example: Convert Files Within Document Container

The following example demonstrates how to convert the contents of a ZIP archive to a single consolidated PDF:

from groupdocs.conversion import Converter
from groupdocs.conversion.options.convert import PdfConvertOptions

def convert_files_within_document_container():
    # Instantiate Converter with the input document container
    with Converter("./compressed.zip") as converter:
        # Instantiate convert options
        pdf_convert_options = PdfConvertOptions()

        # Extract the archive, convert the contained files, and save a consolidated PDF
        converter.convert("./converted.pdf", pdf_convert_options)

if __name__ == "__main__":
    convert_files_within_document_container()

compressed.zip is the sample file used in this example.
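If the sample archive is not at hand, a stand-in can be built with the standard library's zipfile module. This is only a sketch: build_sample_archive is a hypothetical helper, and the entry names and text are placeholders that do not reflect the contents of the real sample file. It assumes plain-text entries are acceptable input for the conversion.

```python
import zipfile


def build_sample_archive(archive_path="./compressed.zip"):
    """Create a small ZIP with placeholder text files and return its entry names."""
    # Hypothetical entries; the real sample file may hold different documents.
    entries = {
        "first.txt": "First placeholder document.\n",
        "second.txt": "Second placeholder document.\n",
    }

    # Write each placeholder entry into a new archive (overwrites an existing file).
    with zipfile.ZipFile(archive_path, "w") as archive:
        for name, text in entries.items():
            archive.writestr(name, text)

    # Reopen the archive and report what it contains.
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

Calling build_sample_archive() before running the example above writes ./compressed.zip to the working directory, which is the path the example passes to Converter.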
