How to populate XLS files with more than 65536 rows

Problem

This post describes 3 ways to populate a binary Excel template when there are more than 65536 rows in the data source. In order to accommodate a variable number of data rows, these approaches use a combination of ExcelApplication and ExcelTemplate. You can download a Visual Studio 2008 project containing a demo at the end of the article.

This post also discusses an alternative in the case that you do not have access to ExcelApplication.

Solution

For a binary Excel file (.xls), the number of rows of each worksheet is limited to 65536 rows (source: Excel specifications and limits). By comparison, an OOXML file (.xlsx or .xlsm) has a maximum of about 1 million (exactly 1048576) rows per worksheet. If the data source contains more than 65536 rows (and fewer than 1 million rows) and you are using ExcelWriter version 7.0 or higher, you can use ExcelTemplate with an OOXML template so that all data fit into a single worksheet.

If you want to use a binary template and the data source contains more than 65536 rows, the template must contain as many worksheets as necessary to accommodate all data rows. In order to generate the required number of worksheets, you can start with a template containing a single worksheet, then make copies of the original worksheet. Alternatively, you can start with the maximum number of worksheets required, then delete worksheets as necessary.

In this post, we use a binary template containing a single worksheet. Then we use ExcelApplication to make copies of the original worksheet so that the processed template contains exactly the number of worksheets required to accommodate all data rows. Depending on the approach, we will also need to modify the data marker(s) on each copy. Finally, we use ExcelTemplate to populate the processed template.

Support for OOXML files in ExcelApplication was introduced in ExcelWriter 8. In ExcelWriter v7.6.1 and earlier, ExcelApplication only supports binary files. If you are using an older version of ExcelWriter, you will not be able to dynamically insert or delete worksheets with an OOXML file.

If you cannot use ExcelApplication, you cannot dynamically insert or remove worksheets. In this case, you can have multiple templates, each with a different number of worksheets. See Using ExcelTemplate only.

As of ExceWriter 7.5, the following table summarizes the possible solutions for different scenarios. Note that more options would be available for ExcelWriter EE once ExcelApplication supports the OOXML format in a future version.

ExcelWriter edition ExcelWriter version Template format Available solution(s)
EE All Binary 1234
EE 7.0-7.6.1 OOXML Populate all data on single worksheet, 4
SE All Binary 4
SE 7.0+ OOXML Populate all data on single worksheet, 4

Using ExcelApplication and ExcelTemplate

1. Populate using a DataReader

When you use a forward-only data source such as a DataReader, the template must contain identical data markers on each worksheet. After the number of rows on a given worksheet reaches a limit, additional rows automatically overflow onto the next worksheet. This limit is specified in the MaxRows property of the DataBindingProperties parameter of the ExcelTemplate.BindData() method.

DataBindingProperties props = excelTemplate.CreateDataBindingProperties();
props.MaxRows = 65536;
excelTemplate.BindData(dataReader, "", props);

For illustration purpose, the above code snippet sets MaxRows to the maximum number of rows allowable on a worksheet. However, if the worksheet contains header or blank rows in addition to data rows, adjust this value accordingly.

If the template has more worksheets containing data markers than necessary to accommodate all rows, ExcelTemplate would attempt to populate the extra worksheets after DataReader reaches its end and consequently throw the following exception: SoftArtisans.OfficeWriter.ExcelWriter.SAException: Exhausted data marker at XX value: %%=YY.ZZ.

Note for ExcelWriter 3.9.x

In ExcelWriter 3.9.x, ExcelTemplate does not support the BindData method. Replace the data-binding code above with:

excelTemplate.SetDataSource(dataReader, "", 65536);

2. Populate using the (continue) data marker modifier

This approach is applicable when you use a scrollable data source such as a DataTable. The data markers on the overflow worksheets must have a (continue) modifier; e.g., %%=datasource.field1(continue). Except for the (continue) modifier, the data markers are otherwise the identical. In order to create the required number of worksheets, we use ExcelApplication to make copies of the first worksheet, then append (continue) to all data markers on each copy. See Data Marker Modifiers.

The (continue) modifier indicates that ExcelTemplate should continue to read the data source at the point where the previous worksheet leaves off. Again, set the DataBindingProperties.MaxRows property to limit the number of rows per worksheet.

DataBindingProperties props = excelTemplate.CreateDataBindingProperties();
props.MaxRows = 65536;
excelTemplate.BindData(dataTable, "", props);

For illustration purpose, the above code snippet sets MaxRows to the maximum number of rows allowable on a worksheet. However, if the worksheet contains header or blank rows in addition to data rows, adjust this value accordingly.

If the template has more worksheets containing data markers than necessary to accommodate all rows, the DataTable would rewind after reaching the last worksheet and the extra worksheets would contain repeated rows.

Note for ExcelWriter 3.9.x

In ExcelWriter 3.9.x, ExcelTemplate does not support the BindData method. Replace the data-binding code above with:

excelTemplate.SetDataSource(dataTable, "", 65536);

3. Populate individual worksheets

You must bind data to each individual worksheet using a different data source. Use the DataBindingProperties.WorksheetName property to specify the name of the worksheet you are targeting. Each worksheet’s data source must have only the rows you want on that worksheet. Because each worksheet has a different data source, it’s possible to have a different number of rows on each worksheet.

DataBindingProperties props = excelTemplate.CreateDataBindingProperties();
props.WorksheetName = workbook.Worksheets[0].Name;
excelTemplate.BindData(dataTable1, "datasource1", props);

The data markers on each worksheet must have a data source identifier matching its data source. If you use ordinal data source identifier, the first worksheet to be data-bound should have data markers like %%=#1.field1, the second data-bound worksheet, %%=#2.field2, and so on. If you use named data source identifier, the data marker should contain the name of the data source specified in ExcelTemplate.BindData. For example, if the data-binding call is BindData(dataTable, “datasourceN”, dataBindingProperties), the corresponding data markers should be %%=datasourceN.field1%%=datasourceN.field2, and so on.

If you start with a single data source, you can partition it into smaller data sources containing non-overlapping blocks of rows. In the sample, we partition the original DataTable into a set of smaller DataTables, each of which contains 65536 rows (or whichever value the row limit is set to). The last DataTable gets the remaining rows and can have fewer than 65536 rows. Each smaller DataTable is used as the data source for a different worksheet.

If you bind the same data source to more than one worksheet, even under different names, ExcelTemplate will return the following error: This binding source named xxx was already added under the name yyy.

Note for ExcelWriter 3.9.x

This approach is not compatible with ExcelWriter 3.9.x because it isn’t possible to bind data to a specific worksheet.

Using ExcelTemplate only

The following discussion applies to the following scenarios:

  • You do not have access to ExcelApplication
  • You are unable to use ExcelApplication with an OOXML template

4. Using multiple templates

It may occur to you to use a template with as many worksheets as necessary and append the (optional) modifier to all data markers on each worksheet. The assumption is that ExcelTemplate would ignore such data markers if there are no data bound to a worksheet. However, this assumption is not correct. The (optional) modifier has to do with the presence or absence of a column, not a row. ExcelTemplate ignores a data marker marked as optional if there is no corresponding column in the data source. But if a data marker is mapped to a column in the data source, ExcelTemplate would attempt to populate all instances of such a data marker. If ExcelTemplateencounters a data marker and there is no data row left, its behavior depends on the type of data source. For a forward-only data source such as DataReader, ExcelTemplate would throw an “exhausted data marker” error. For a scrollable data source such as DataTable, ExcelTemplate would rewind the data source to the beginning, resulting in duplicate data.

Consequently, it’s imperative that the template contains exactly the number of worksheets required to accommodate all data rows. If you cannot use ExcelApplication, you would not be able to dynamically insert or remove worksheets at run time. An alternative is to create multiple templates, each one containing a different number of worksheets. You can determine which template to use depending on the number of rows in the data source.

Conclusions

Using DataReader for data retrieval is the simplest approach and has performance benefits. Using the (continue) data marker modifier involves performing additional processing on a worksheet after copying it and therefore is less efficient. Binding data to individual worksheets, because the data source must be processed, can have a lower performance than the other approaches.

About the demo

The attachment contains a Visual Studio 2008 project illustrating the 3 approaches described in this article. Each of the approaches is contained in a separate method. You must install ExcelWriter EE before running the demo.

Attachments

Related posts: