Options for Importing HTML snippets into a Word document

If you are generating a Word document with OfficeWriter and want to import some HTML-formatted text, there are several options. Which ones are available depends on the version of OfficeWriter you are using and on your document's file format (.doc or .docx/.docm).

The Template-based Approach
The Programmatic Approach

Template-based approach

With WordWriter’s template-based approach (using either the WordTemplate API or OfficeWriter’s SSRS integration), HTML-formatted text can be imported by using special merge field modifiers. This functionality was introduced in version 8.0 with some limitations (notably, no SSRS support), and further enhancements were added in versions 9.0 and 9.1. Version 9.1 is highly recommended, as it provides the most comprehensive support for this functionality in both custom .NET applications and SSRS integration mode.

Advantages of template-based approach

  1. No complex coding is required; everything is controlled through your template and data
  2. Allows HTML-formatted text to be used with WordWriter’s mail merge and grouping functionality

Limitations of template-based approach

  1. Only supports the OOXML file format.  The template must be a .docx or .docm file
  2. The feature relies on Word’s “altChunk” functionality. Each HTML snippet is embedded as a separate small file, and Word renders its contents when the file is opened on the client machine. Therefore, if you view the output file in something other than Word (e.g. a viewer on a mobile device), the HTML may not be rendered correctly. Note: If you resave the output in Microsoft Word, the HTML is merged into the main document and can then be viewed in any Word-compatible application.
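
To check whether a generated file still depends on altChunk (for example, before sending it to a non-Word viewer), you can look inside the .docx package. The sketch below is a minimal Python illustration, not part of OfficeWriter. It assumes the main document part lives at the conventional path word/document.xml and counts w:altChunk elements in that part only; headers, footers and other parts are not checked.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (ECMA-376 Office Open XML)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_alt_chunks(docx) -> int:
    """Count w:altChunk elements in word/document.xml of a .docx/.docm.

    docx may be a file path or a binary file-like object.
    Assumes the main part is at the conventional path word/document.xml.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # iter() walks the whole tree, so nested altChunks are counted too
    return sum(1 for _ in root.iter(f"{{{W_NS}}}altChunk"))
```

A count of zero after resaving the output in Word would be consistent with the note above that Word merges the HTML into the main document.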

How to Use It

Version 8.0 through 9.0 – use the document(html) modifier

  • Your merge field should look something like this:
    <<DataSourceName.ColumnName(document(html))>>
  • Your HTML-formatted text must be passed in as a byte array, since the “document(format)” modifier expects a file rather than a string.
  • Starting in version 9.1, your data can contain a file path or URL instead of a byte array if you use the new AllowURIs property. However, in 9.1 and above the “document(format)” modifier is no longer the best way to import HTML snippets; reserve it for cases where you want to embed an entire document (HTML, DOCX or RTF).
  • Prior to version 9.0, the data must contain an opening and closing <html> tag.  Beginning in version 9.0, WordWriter will add the opening and closing tags for you.
  • For more information about using the document modifier, see Inserting an Embedded Document
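
The data preparation for the document(html) modifier can be sketched as follows. This is a Python illustration only; in a .NET WordWriter application, Encoding.UTF8.GetBytes performs the equivalent string-to-bytes conversion. The function name and the choice of UTF-8 are assumptions, so confirm which encoding your HTML is expected to use.

```python
def html_snippet_to_bytes(snippet: str, add_html_tags: bool,
                          encoding: str = "utf-8") -> bytes:
    """Prepare an HTML snippet as a byte array for a document(html) field.

    Hypothetical helper. Pass add_html_tags=True for WordWriter versions
    before 9.0, which require opening and closing <html> tags in the data;
    from 9.0 on, WordWriter adds them itself.
    """
    text = snippet.strip()
    if add_html_tags and not text.lower().startswith("<html"):
        text = f"<html>{text}</html>"
    # document(html) expects file-like data, so encode the string to bytes
    return text.encode(encoding)
```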

Version 9.1 and above – use the new HTMLSnippet modifier

  • Instructions for using the new HTMLSnippet modifier are in the WordWriter documentation.
  • The merge field for the column that contains the HTML-formatted text should look like this: <<DataSourceName.ColumnName(HTMLSnippet)>>
  • The data bound to a merge field with an HTMLSnippet modifier must be a string. The string does not need to include opening and closing <html> tags.
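
The two modifiers differ mainly in the merge field text and in the type of value bound to it. The hypothetical Python helpers below build the field text and check a value's type against the rules above. They are not part of the WordWriter API, and the type check ignores the 9.1 AllowURIs option, which lets document(html) data be a file path or URL.

```python
from typing import Optional


def merge_field(data_source: str, column: str,
                modifier: Optional[str] = None) -> str:
    """Build merge field text such as <<Source.Column(HTMLSnippet)>>."""
    body = f"{data_source}.{column}"
    if modifier:
        body += f"({modifier})"
    return f"<<{body}>>"


# Value type each modifier expects, per the rules above:
# document(html) takes a byte array, HTMLSnippet takes a string.
EXPECTED_TYPE = {"document(html)": bytes, "HTMLSnippet": str}


def value_matches_modifier(modifier: str, value: object) -> bool:
    """True if value has the type the modifier expects (ignores AllowURIs)."""
    return isinstance(value, EXPECTED_TYPE[modifier])
```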


Programmatic approach

If you have OfficeWriter Enterprise Edition, you can import HTML snippets using the WordApplication API together with our open-source project, HTMLToWord.

Advantages of the Programmatic Approach

  1. The HTML snippets are converted into true Word formatting, unlike the altChunk approach used by the WordTemplate object. As a result, the output file can be viewed in any Word-compatible application.
  2. HTMLToWord provides very fine-grained control of the HTML import. For example:
    • Using the HTMLInsertProperties settings, you can specify a default font to override fonts in the HTML, and you can specify whether to ignore unknown tags or insert their contents as text.
    • Using the delegate methods (like InsertDelegate) you can override the default behavior when processing certain tags, or write your own code to handle custom tags in your XHTML.
  3. HTMLToWord is an open-source project, so you can modify the source code as desired.
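
To illustrate the kind of control described above, here is a hypothetical Python model of a tag-by-tag converter with per-tag override handlers (playing the role of a delegate such as InsertDelegate) and a switch for unknown tags. It is not HTMLToWord's API: the function, the style table, and the (formatting, text) run tuples are invented stand-ins for Word formatting, and treating "ignore" as dropping an unknown tag together with its contents is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

Run = Tuple[str, str]  # (formatting label, text): stand-in for a Word run

# Tags this toy converter understands natively (hypothetical table).
DEFAULT_STYLES = {"html": "normal", "body": "normal", "p": "normal",
                  "b": "bold", "strong": "bold", "i": "italic", "em": "italic"}


def convert(xhtml: str,
            overrides: Optional[Dict[str, Callable[[ET.Element], List[Run]]]] = None,
            insert_unknown_as_text: bool = True) -> List[Run]:
    """Walk well-formed XHTML and emit formatted runs."""
    overrides = overrides or {}
    runs: List[Run] = []

    def walk(elem: ET.Element, style: str) -> None:
        if elem.tag in overrides:          # caller-supplied handler wins
            runs.extend(overrides[elem.tag](elem))
            return
        if elem.tag in DEFAULT_STYLES:
            style = DEFAULT_STYLES[elem.tag]
        elif not insert_unknown_as_text:
            return                         # drop unknown tag and its contents
        if elem.text:
            runs.append((style, elem.text))
        for child in elem:
            walk(child, style)
            if child.tail:                 # text after a child keeps parent style
                runs.append((style, child.tail))

    walk(ET.fromstring(xhtml), "normal")
    return runs
```

For example, passing overrides={"phone": ...} lets custom code decide how a made-up <phone> tag is rendered, while every other tag keeps the default handling.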

Limitations of the programmatic approach

  1. WordApplication only supports the .doc file format.  HTMLToWord cannot be used with .docx or .docm files.
  2. This approach is code-intensive
  3. The HTML string must be valid XHTML
  4. HTMLToWord is an open source project, separate from the OfficeWriter product itself.  OfficeWriter support contracts do not cover support for HTMLToWord.
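
Because the input must be valid XHTML, a cheap pre-flight check can catch common problems (unclosed <br>, HTML-only entities such as &nbsp;, several top-level elements) before the string reaches HTMLToWord. The Python sketch below only tests XML well-formedness with the standard library; it is an assumption-level helper, not full XHTML validation and not part of HTMLToWord.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xhtml_problem(snippet: str) -> Optional[str]:
    """Return None if snippet is well-formed XML, else the parser's message.

    Note: plain XML parsing does not know HTML entities like &nbsp;
    (use &#160; instead) and requires a single root element.
    """
    try:
        ET.fromstring(snippet)
    except ET.ParseError as err:
        return str(err)
    return None
```

If a snippet has several top-level elements, wrapping it in a single element (for example a <div>) makes it parse as one document.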

How to Use it

  1. Make sure you have WordWriter Enterprise Edition version 4.0 or above
  2. Download the HTMLToWord project from SourceForge
  3. Follow the instructions in Using HTMLToWord to compile the DLL and reference it in your application
  4. For detailed information about using the API, see Inserting HTML with WordApplication in the documentation
