This topic covers how to convert files embedded within document containers, such as compressed or packaged files, into individual output files. The following diagram illustrates the process of extracting and converting files within a document container:
```mermaid
flowchart LR
%% Nodes
A["Document Container"]
B["Extraction"]
C["Conversion"]
D["Converted File 1"]
E["Converted File 2"]
F["Converted File N"]
%% Edge connections between nodes
A --> B --> C --> D
C --> E
C --> F
```
Both extraction and conversion are performed in a single call to the convert_multiple(folder_path, convert_options) method of the Converter class, where folder_path is the folder that receives the converted output files.
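The two stages in the diagram can be illustrated with a short sketch that uses only the Python standard library and a ZIP container. This is a conceptual model, not how GroupDocs.Conversion works internally and not a replacement for convert_multiple. The names extract_and_convert and ConvertFn are hypothetical, and the convert callback is an assumed stand-in for the real conversion step.

```python
import zipfile
from typing import Callable, Dict

# Hypothetical conversion step: receives the entry's path inside the archive
# and its raw bytes, and returns the converted bytes.
ConvertFn = Callable[[str, bytes], bytes]


def extract_and_convert(container_path: str, convert: ConvertFn) -> Dict[str, bytes]:
    """Conceptual extract-then-convert pipeline for a ZIP container (sketch only).

    Returns one converted result per embedded file, keyed by its path in the archive.
    """
    results: Dict[str, bytes] = {}
    with zipfile.ZipFile(container_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to convert
            data = archive.read(info)  # stage 1: extraction
            results[info.filename] = convert(info.filename, data)  # stage 2: conversion
    return results
```

For example, a callback such as `lambda name, data: data.upper()` yields one transformed result per file in the archive. Passing the entry name to the callback lets a real conversion step choose its behavior by file type.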
Document Container File Types
The following file types are considered document containers:
Email and Outlook
EML - Email Message File.
EMLX - Apple Mail Email File.
MSG - Microsoft Outlook Message File.
OST - Outlook Offline Data File.
PST - Outlook Personal Information Store File.
PDF
PDF - PDF files that contain embedded resources.
Word Processing
DOC - The older Microsoft Word binary format.
DOCX - The modern Word format.
DOT and DOTX - Word template files.
RTF - Rich Text Format.
Compression
7Z - 7-Zip Compressed File.
BZ2 - Bzip2 Compressed File.
CAB - Windows Cabinet File.
CPIO - CPIO Compressed File.
GZ - GNU Zipped Archive.
GZIP - Gzip Compressed File.
LZ - Lzip Compressed File.
LZMA - LZMA Compressed File.
RAR - RAR Compressed Archive.
TAR - Consolidated Unix File Archive.
XZ - Xz Compressed File.
Z - Unix Compressed File.
ZIP - ZIP Compressed File.
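The list above can be expressed as a simple lookup table. The following sketch uses a hypothetical container_category helper, which is not part of GroupDocs.Conversion, to report which category a file name's extension falls into. An extension match only indicates that the format can act as a container; a PDF file, for instance, qualifies only when it contains embedded resources.

```python
from pathlib import PurePath
from typing import Optional

# Container extensions from the list above, grouped by category (hypothetical helper data).
CONTAINER_TYPES = {
    "Email and Outlook": {"eml", "emlx", "msg", "ost", "pst"},
    "PDF": {"pdf"},
    "Word Processing": {"doc", "docx", "dot", "dotx", "rtf"},
    "Compression": {"7z", "bz2", "cab", "cpio", "gz", "gzip", "lz",
                    "lzma", "rar", "tar", "xz", "z", "zip"},
}


def container_category(file_name: str) -> Optional[str]:
    """Return the container category for the file's last extension, or None if unlisted."""
    ext = PurePath(file_name).suffix.lstrip(".").lower()
    for category, extensions in CONTAINER_TYPES.items():
        if ext in extensions:
            return category
    return None
```

Only the last extension is checked, so backup.tar.gz is classified by its .gz suffix.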
Example: Convert Files Within Document Container
The following example demonstrates how to convert each file in a ZIP archive to PDF.
Output files are named using the template {file name}_{source file extension}.{output file extension}. In this example, the archived file business-plan.docx is converted and saved as business-plan_docx.pdf.
```python
from groupdocs.conversion import Converter
from groupdocs.conversion.options.convert import PdfConvertOptions

def convert_files_within_document_container():
    # Instantiate Converter with the input document
    with Converter("./compressed.zip") as converter:
        # Instantiate convert options
        pdf_convert_options = PdfConvertOptions()
        # Extract, convert and save output files in PDF format
        converter.convert_multiple("./converted-files", pdf_convert_options)

if __name__ == "__main__":
    convert_files_within_document_container()
```
compressed.zip is the sample file used in this example.
converted-files is the output folder path for the converted files.
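The output naming template can be written as a small function, for example to predict which files should appear in the output folder after conversion. The output_file_name helper below is a hypothetical sketch, not a GroupDocs.Conversion API. It assumes that only the base name of an archived entry is used; the source does not say how entries in nested folders or entries without an extension are named.

```python
from pathlib import PurePosixPath


def output_file_name(source_name: str, output_extension: str) -> str:
    """Apply the {file name}_{source file extension}.{output file extension} template (sketch)."""
    source = PurePosixPath(source_name)   # assumption: folders inside the archive are ignored
    source_ext = source.suffix.lstrip(".")  # "docx" for "business-plan.docx"
    return f"{source.stem}_{source_ext}.{output_extension.lstrip('.')}"
```

For the sample archive, output_file_name("business-plan.docx", "pdf") returns "business-plan_docx.pdf", matching the file name given above.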