This article explains the fundamental principles of GroupDocs.Editor: how it works, what its purpose is, and how it should be used.
GroupDocs.Editor is a GUI-less class library: it exposes only a programmatic interface (API). To edit a document, the user must therefore use GroupDocs.Editor together with a 3rd-party editor application whose GUI lets the end-user edit the document content. GroupDocs.Editor does not depend on any particular editor software. Because it is aimed at web development, its only requirement is that the 3rd-party editor can work with HTML documents.
To edit a document with GroupDocs.Editor, the user performs several sequential steps: load the document into GroupDocs.Editor (with optional load options), open the document for editing (with optional edit options), generate HTML markup with resources (using various options and settings), and pass this markup to the 3rd-party WYSIWYG HTML-editor. The end-user then edits the document content. When the end-user finishes editing and submits the edited document, the modified markup is transferred back to GroupDocs.Editor and converted to an output document of the desired format.
From the GroupDocs.Editor perspective, this pipeline can be divided into three main stages, which are described below.
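The whole pipeline can be sketched in one function. This is a minimal sketch, not a definitive implementation: `round_trip` and the `edit_html` callback are hypothetical names, with the callback standing in for the end-user's session in the WYSIWYG HTML-editor. The `get_embedded_html()` call is an assumption about how EditableDocument exposes its markup; the remaining calls follow the samples in this article.

```python
from typing import Callable


def round_trip(input_path: str, output_path: str,
               edit_html: Callable[[str], str]) -> None:
    """Load a document, hand its HTML to an editing callback, save the result as RTF."""
    # Imported lazily so this sketch can be read and loaded without the library installed.
    from groupdocs.editor import Editor, EditableDocument
    from groupdocs.editor.formats import WordProcessingFormats
    from groupdocs.editor.options import WordProcessingSaveOptions

    # Stage 1: load the document; default load options are selected automatically.
    editor = Editor(input_path)

    # Stage 2: open the document for editing and obtain its HTML markup.
    before_edit = editor.edit()
    # Assumption: get_embedded_html() returns the document as a single HTML string.
    original_html = before_edit.get_embedded_html()

    # The end-user's editing session is modelled as a plain callback (hypothetical).
    edited_html = edit_html(original_html)

    # Stage 3: wrap the edited markup and save it in the desired format.
    after_edit = EditableDocument.from_markup(edited_html)
    save_options = WordProcessingSaveOptions(WordProcessingFormats.RTF)
    editor.save(after_edit, output_path, save_options)
```

In a web application the callback would be replaced by a request/response cycle: the markup is sent to the browser, and the edited markup arrives later in a separate request.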
Loading a document into GroupDocs.Editor
At the loading stage the user creates an instance of the Editor class and passes an input document (as a file path or a binary stream) along with document load options. Load options are not required: GroupDocs.Editor can detect the document format automatically and select the most appropriate default options for that format. It is nevertheless recommended to specify them explicitly, and they are required when loading password-protected documents.
from groupdocs.editor import Editor

# Passing a path to the constructor; default WordProcessingLoadOptions will be applied automatically
editor = Editor("document.docx")
After this stage the document is ready to be opened and edited.
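For a password-protected document, explicit load options might be passed as in the sketch below. `open_editor` is a hypothetical helper. The `password` attribute on WordProcessingLoadOptions and the two-argument Editor constructor are assumptions to check against the API reference.

```python
from typing import Optional


def open_editor(path: str, password: Optional[str] = None):
    """Create an Editor, passing explicit load options only when a password is given."""
    # Imported lazily so this sketch can be loaded without the library installed.
    from groupdocs.editor import Editor
    from groupdocs.editor.options import WordProcessingLoadOptions

    if password is None:
        # No load options: the format is detected and defaults are applied.
        return Editor(path)

    load_options = WordProcessingLoadOptions()
    # Assumption: the password is exposed as a plain attribute on the load options.
    load_options.password = password
    # Assumption: the constructor accepts load options as its second argument.
    return Editor(path, load_options)
```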
Opening a document for editing
Because GroupDocs.Editor is a GUI-less library, a document cannot be edited directly within it. To edit a document in a WYSIWYG HTML-editor, GroupDocs.Editor needs to generate an HTML version of the document, because such editors work only with HTML/CSS markup. Once an instance of the Editor class has been created in the first stage, the user opens the document for editing by calling the edit() method of the Editor class. This method returns an instance of the EditableDocument class, which can be described as a converted version of the input document, stored in an internal intermediate format that is compatible with all formats GroupDocs.Editor supports. From an EditableDocument the user can obtain the HTML markup of the input document with different options, along with its stylesheets, images, and fonts, or save an HTML document to disk, among other things. The HTML markup emitted by EditableDocument is meant to be passed to the client-side WYSIWYG HTML-editor, where the end-user actually edits the document.
As with loading, the edit() method accepts optional edit options that control exactly how the document is opened for editing.
After this stage the document is ready to be passed to the WYSIWYG HTML-editor and its content can be edited by the end-user.
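Passing edit options to edit() could look like the following sketch. `open_for_editing` is a hypothetical helper. The WordProcessingEditOptions class and its `enable_pagination` attribute are assumptions to confirm in the API reference before relying on them.

```python
def open_for_editing(editor, paginate: bool = False):
    """Open an already loaded document for editing with explicit edit options."""
    # Imported lazily so this sketch can be loaded without the library installed.
    from groupdocs.editor.options import WordProcessingEditOptions

    edit_options = WordProcessingEditOptions()
    # Assumption: enable_pagination switches the generated HTML to a paged layout.
    edit_options.enable_pagination = paginate
    # edit() returns an EditableDocument, as described above.
    return editor.edit(edit_options)
```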
Saving a document
Saving is the final stage. It occurs after the document content has been edited in the WYSIWYG HTML-editor (or in any other software; this makes no difference to GroupDocs.Editor) and needs to be saved back as a document of some format, such as DOCX, PDF, or XLSX. At this stage the user creates a new EditableDocument instance from the HTML markup and resources of the edited version of the original document, as obtained from the end-user. The EditableDocument class provides several class methods that create instances from HTML documents presented in different forms. Once an EditableDocument instance is ready, it can be saved as an ordinary document using the save() method of the Editor class.
from groupdocs.editor import EditableDocument
from groupdocs.editor.formats import WordProcessingFormats
from groupdocs.editor.options import WordProcessingSaveOptions

after_edit = EditableDocument.from_markup("<body>HTML content of the document...</body>")
save_options = WordProcessingSaveOptions(WordProcessingFormats.RTF)
editor.save(after_edit, "document.rtf", save_options)
Unlike load options and edit options, save options are mandatory, because GroupDocs.Editor needs to know the exact output format.
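Since the output format must always be stated, one possible approach is to derive it from the target file name. In the sketch below, `format_name_for`, `save_edited`, and the extension table are hypothetical. Only the RTF member of WordProcessingFormats appears in this article; the other member names are assumptions.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a WordProcessingFormats member name.
# Only RTF is confirmed by this article; the other names are assumptions.
_FORMAT_NAMES = {".rtf": "RTF", ".docx": "DOCX", ".doc": "DOC", ".odt": "ODT"}


def format_name_for(output_path: str) -> str:
    """Return the format member name that matches the output file extension."""
    suffix = Path(output_path).suffix.lower()
    if suffix not in _FORMAT_NAMES:
        raise ValueError(f"No save format configured for {suffix!r}")
    return _FORMAT_NAMES[suffix]


def save_edited(editor, edited_html: str, output_path: str) -> None:
    """Save edited HTML markup through an existing Editor instance."""
    # Imported lazily so the helper above can be used without the library installed.
    from groupdocs.editor import EditableDocument
    from groupdocs.editor.formats import WordProcessingFormats
    from groupdocs.editor.options import WordProcessingSaveOptions

    target_format = getattr(WordProcessingFormats, format_name_for(output_path))
    after_edit = EditableDocument.from_markup(edited_html)
    editor.save(after_edit, output_path, WordProcessingSaveOptions(target_format))
```

Rejecting unknown extensions early keeps the failure close to its cause instead of surfacing later inside save().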
Detecting document type
Sometimes it is necessary to detect a document's type and extract its metadata before sending it for editing. For such scenarios GroupDocs.Editor can detect the document type and extract the most essential metadata, depending on that type:
Whether the document is encrypted or not;
Exact document format;
Document size;
Number of pages (tabs);
Text encoding, if the document is textual.
To detect the document type and gather its metadata, the user loads the desired document into the Editor class and then calls the get_document_info() method.
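Reading the metadata could be sketched as follows. `summarize` and `inspect_document` are hypothetical helpers. The attribute names are assumptions modelled on the list above, so any attribute the real info object lacks is reported as None rather than raising an error.

```python
# Assumed attribute names on the object returned by get_document_info().
_INFO_FIELDS = ("format", "page_count", "size", "is_encrypted")


def summarize(info) -> dict:
    """Collect the assumed metadata attributes into a plain dictionary."""
    return {name: getattr(info, name, None) for name in _INFO_FIELDS}


def inspect_document(path: str) -> dict:
    """Load a document and return a summary of its metadata."""
    # Imported lazily so summarize() can be used without the library installed.
    from groupdocs.editor import Editor

    editor = Editor(path)
    return summarize(editor.get_document_info())
```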
Describing options
At every stage the user can tune the processing with different options:
Load options for loading a document.
Edit options for opening a document for editing.
Save options for saving an edited document.
Some of these options are optional and some are mandatory. For example, it is possible to load a document into the Editor class without load options; in that case GroupDocs.Editor tries to detect the document format automatically and applies the most appropriate default options for the detected format.
Describing family formats
All document formats that GroupDocs.Editor supports are grouped into family formats. The formats within a family share many common features, so options are defined per family format rather than per individual format. The relation between formats, family formats, import/export formats, and options is illustrated in the table below.
Family format | Supported formats | Load | Save | Load options | Edit options | Save options | Metadata
WordProcessing | DOC, DOCX, DOCM, DOT, DOTX, DOTM, RTF, WordprocessingML Flat XML, ODT, OTT, Word 2003 XML
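The family-format idea can be illustrated with a small lookup that maps a file extension to its family and then to that family's option classes. This is a sketch under stated assumptions: `family_for` and both tables are hypothetical, the extensions are taken from the WordProcessing formats listed above, and WordProcessingEditOptions is assumed to follow the naming pattern of the load and save option classes.

```python
from pathlib import Path
from typing import Optional

# Extensions of the WordProcessing formats listed above. The two XML-based
# formats are left out because their extension alone does not identify them.
WORD_PROCESSING_EXTENSIONS = {
    ".doc", ".docx", ".docm", ".dot", ".dotx", ".dotm", ".rtf", ".odt", ".ott",
}

# One set of option classes per family, not per individual format.
# WordProcessingEditOptions is an assumed name.
FAMILY_OPTIONS = {
    "WordProcessing": {
        "load": "WordProcessingLoadOptions",
        "edit": "WordProcessingEditOptions",
        "save": "WordProcessingSaveOptions",
    },
}


def family_for(path: str) -> Optional[str]:
    """Return the family format for a file path, or None if it is not recognised."""
    if Path(path).suffix.lower() in WORD_PROCESSING_EXTENSIONS:
        return "WordProcessing"
    return None
```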
Detailed information about every stage of document processing, along with source code examples and explanations of the options, can be found in the following articles: