This demonstration shows how to open an input document, convert it to an intermediate EditableDocument, and obtain HTML markup in different forms depending on client requirements.
Preparations
Once an input document is loaded into the Editor class and opened for editing by converting it to the intermediate EditableDocument class, HTML markup can be generated in several forms. This guide covers each variation.
First, load the document into the Editor class and open it for editing, as demonstrated in the code below.
    using GroupDocs.Editor;
    using GroupDocs.Editor.Options;

    string inputFilePath = "C:\\input_path\\document.docx"; // path to some document
    WordProcessingLoadOptions loadOptions = new WordProcessingLoadOptions();
    Editor editor = new Editor(inputFilePath, delegate { return loadOptions; }); // passing path and load options (via delegate) to the constructor
    EditableDocument document = editor.Edit(new WordProcessingEditOptions()); // opening document for editing with format-specific edit options
The code above prepares a ready-to-use instance of the EditableDocument class, which holds the original document in its own intermediate format and can generate HTML markup in different forms.
Getting whole HTML content
The simplest way to generate HTML markup is the parameterless GetContent method:
    string htmlContent = document.GetContent();
If the document has external resources (stylesheets, fonts, images), the markup references them through different HTML elements: stylesheets through LINK elements and images through IMG elements. With the parameterless GetContent() method, these resources are referenced by external links.
On the web server where this HTML is edited, resources are often served by a specific HTTP handler, so the resource paths must be adjusted to point to those endpoints. A more advanced overload of the GetContent() method, which accepts prefixes for the external links, handles this case.
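A minimal sketch of calling this overload, assuming the two-parameter `GetContent(externalImagesPrefix, externalCssPrefix)` signature from the GroupDocs.Editor for .NET API. The `document` argument is the EditableDocument instance from the Preparations step; the helper class and the handler URLs are hypothetical placeholders, not values from this guide.

```csharp
using GroupDocs.Editor;

static class PrefixedContentSketch
{
    // Hypothetical endpoints; replace them with the handlers of your own web server.
    const string ExternalImagesPrefix = "https://example.com/images/id=";
    const string ExternalCssPrefix = "https://example.com/css/id=";

    // Returns the whole HTML document with the prefixes applied to
    // external image links and external stylesheet links.
    public static string GetPrefixedContent(EditableDocument document)
    {
        return document.GetContent(ExternalImagesPrefix, ExternalCssPrefix);
    }
}
```

The prefixes are passed as plain strings, so any URL scheme that the resource handler understands can be used.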
This overload adds the specified prefixes to every external link in the document’s markup, so that each link points to the corresponding endpoint.
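To illustrate the effect of prefixing, the sketch below imitates it on a hand-written fragment with regular expressions. It is not the library's implementation: the `AddPrefixes` helper, the file names `style.css` and `image.png`, and the prefixes are all hypothetical, and the exact markup that GroupDocs.Editor emits may differ.

```csharp
using System;
using System.Text.RegularExpressions;

static class PrefixIllustration
{
    // Hypothetical helper: prepends cssPrefix to every LINK 'href'
    // and imagesPrefix to every IMG 'src' in the given markup.
    public static string AddPrefixes(string html, string imagesPrefix, string cssPrefix)
    {
        string withCss = Regex.Replace(
            html,
            "(<link\\b[^>]*?\\bhref=\")([^\"]*)(\")",
            m => m.Groups[1].Value + cssPrefix + m.Groups[2].Value + m.Groups[3].Value,
            RegexOptions.IgnoreCase);

        return Regex.Replace(
            withCss,
            "(<img\\b[^>]*?\\bsrc=\")([^\"]*)(\")",
            m => m.Groups[1].Value + imagesPrefix + m.Groups[2].Value + m.Groups[3].Value,
            RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string html = "<link href=\"style.css\" rel=\"stylesheet\"/><img src=\"image.png\"/>";
        string result = AddPrefixes(html, "https://example.com/images/id=", "https://example.com/css/id=");
        Console.WriteLine(result);
    }
}
```

With these inputs the stylesheet link becomes `https://example.com/css/id=style.css` and the image link becomes `https://example.com/images/id=image.png`, which is the kind of URL a resource handler on the server would then resolve.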
Many HTML WYSIWYG editors cannot process a whole HTML document, including the HEAD section; they can process only the inner content of the HTML->BODY element. To obtain this part of the markup, the EditableDocument class provides the GetBodyContent() method, which, like GetContent(), has two overloads.
The parameterless overload, like its GetContent() counterpart, leaves links to external images intact. The second overload accepts an external resource prefix and adds it to every URL in the ‘src’ attribute of every IMG tag found inside the HTML->BODY markup.
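The two overloads described above could be called as in the following sketch. It assumes the `GetBodyContent()` and `GetBodyContent(externalImagesPrefix)` signatures from the GroupDocs.Editor for .NET API and takes the EditableDocument instance from the Preparations step as `document`; the helper class and the prefix URL are hypothetical.

```csharp
using System;
using GroupDocs.Editor;

static class BodyContentSketch
{
    public static void ShowBodyContent(EditableDocument document)
    {
        // Inner BODY markup with links to external images left intact.
        string bodyContent = document.GetBodyContent();

        // Inner BODY markup with a prefix added to the 'src' of every IMG tag.
        string externalImagesPrefix = "https://example.com/images/id="; // hypothetical endpoint
        string prefixedBodyContent = document.GetBodyContent(externalImagesPrefix);

        Console.WriteLine("BODY content length: " + bodyContent.Length);
        Console.WriteLine("Prefixed BODY content length: " + prefixedBodyContent.Length);
    }
}
```

Either string can be handed to a WYSIWYG editor that expects only the contents of the BODY element.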
Getting base64-encoded content
Sometimes the entire content of a document, together with all the resources it uses, is needed as one single string. GroupDocs.Editor supports this.
In this string, all stylesheets are placed in STYLE elements in the HTML->HEAD section, and all images in IMG elements are serialized with base64 encoding and placed directly in the ‘src’ attributes. All fonts and images used in stylesheets are also serialized and stored at the appropriate locations in the corresponding stylesheet. The resulting string is fully self-contained.
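A minimal sketch of obtaining such a string, assuming the `GetEmbeddedHtml()` method of EditableDocument from the GroupDocs.Editor for .NET API (the method name does not appear in the text above). The helper class and the `outputHtmlPath` parameter are hypothetical; writing the result to a file is only one way to use it.

```csharp
using System.IO;
using GroupDocs.Editor;

static class EmbeddedHtmlSketch
{
    // Saves the document as a single HTML file with all resources embedded.
    public static void SaveSelfContained(EditableDocument document, string outputHtmlPath)
    {
        // Stylesheets, images, and fonts are embedded into the returned markup.
        string embeddedHtmlContent = document.GetEmbeddedHtml();

        File.WriteAllText(outputHtmlPath, embeddedHtmlContent);
    }
}
```

Because every resource is embedded, the saved file can be opened in a browser without any accompanying resource folder.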
Conclusion
This guide has explained the different ways of obtaining HTML markup from a document: the whole document, the BODY content only, and a single self-contained string with embedded resources.