
How to select a group of rows that have specific values in a cell or column

Problem

It is often necessary to filter the rows on one worksheet based on certain criteria and copy the matching rows to a separate worksheet. With a little coding, this is fairly easy to achieve.

The ExcelApplication object allows you to fully parse and design an Excel document from your code, giving you the ability to apply any conditional logic needed to structure your final report.

Solution

Given the following data in one worksheet:

Company Name
PRICES
Date Open High Low Close Volume Adj Close*
5-Jul-06 20.47 20.47 20.13 20.28 336,400 20.28
3-Jul-06 20.97 21.07 20.85 21.05 115,000 21.05
30-Jun-06 21.27 21.27 21.02 21.13 215,700 21.13
29-Jun-06 20.4 21.02 20.38 21.02 314,000 21.02
28-Jun-06 20.37 20.37 20.05 20.14 403,900 20.14
27-Jun-06 20.24 20.24 19.8 19.86 257,700 19.86
26-Jun-06 20.2 20.27 20.05 20.22 703,700 20.22
23-Jun-06 20.45 20.45 20.17 20.2 302,500 20.2
22-Jun-06 20.3 20.5 20.07 20.26 291,700 20.26
21-Jun-06 20.08 20.45 20.04 20.31 160,000 20.31
20-Jun-06 20.12 20.2 20 20.03 278,500 20.03
19-Jun-06 20.33 20.37 20.06 20.11 301,100 20.11
16-Jun-06 20.03 20.31 20.03 20.16 480,800 20.16
15-Jun-06 19.82 20.5 19.82 20.42 301,000 20.42
14-Jun-06 19.4 19.66 19.38 19.61 211,000 19.61
13-Jun-06 19.25 19.53 18.96 19.09 523,700 19.09
12-Jun-06 20.45 20.56 20.11 20.11 239,900 20.11
9-Jun-06 20.16 20.53 20.11 20.22 297,600 20.22
8-Jun-06 19.3 20.21 19.25 20.15 1,641,200 20.15
7-Jun-06 21.35 21.41 21.21 21.29 684,700 21.29
6-Jun-06 22.15 22.15 21.51 21.67 299,000 21.67
5-Jun-06 22.51 22.51 21.89 21.91 293,300 21.91
2-Jun-06 22.63 22.78 22.5 22.7 754,100 22.7
1-Jun-06 21.9 22.24 21.86 22.24 281,200 22.24

Criteria for the rows to copy: Volume < 300,000

Result sheet should contain:

3-Jul-06 20.97 21.07 20.85 21.05 115,000 21.05
30-Jun-06 21.27 21.27 21.02 21.13 215,700 21.13
27-Jun-06 20.24 20.24 19.8 19.86 257,700 19.86
22-Jun-06 20.3 20.5 20.07 20.26 291,700 20.26
21-Jun-06 20.08 20.45 20.04 20.31 160,000 20.31
20-Jun-06 20.12 20.2 20 20.03 278,500 20.03
14-Jun-06 19.4 19.66 19.38 19.61 211,000 19.61
12-Jun-06 20.45 20.56 20.11 20.11 239,900 20.11
9-Jun-06 20.16 20.53 20.11 20.22 297,600 20.22
6-Jun-06 22.15 22.15 21.51 21.67 299,000 21.67
5-Jun-06 22.51 22.51 21.89 21.91 293,300 21.91
1-Jun-06 21.9 22.24 21.86 22.24 281,200 22.24

Algorithm

  1. Loop through the cells and evaluate your test expression. In this case the cells to test are in the Volume column.
  2. If the expression evaluates to false, move on to the next row value in our test column.
  3. If the expression evaluates to true, call a helper function CopyWorksheetRow (sample below) to copy the values of the current row from the original worksheet to the destination worksheet.
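The three steps can be tried out without ExcelWriter by treating each worksheet row as a string array. This is a minimal, self-contained C# sketch, not ExcelWriter API: the `RowFilterSketch` class, the `FilterRows` method, and the string-array row model are hypothetical stand-ins for the worksheets, and it assumes the Volume column (index 5) holds text such as "115,000".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RowFilterSketch
{
    // Index (0-based) of the Volume column, as in the sample sheet.
    public const int VolumeColumn = 5;

    // Copies every row whose Volume is below the limit into a new list,
    // preserving order. The destination only grows when a row is copied,
    // just like row_to_start_at in the CopyWorksheetRow helper.
    public static List<string[]> FilterRows(IReadOnlyList<string[]> origin, long volumeLimit)
    {
        var destination = new List<string[]>();
        foreach (string[] row in origin)
        {
            // Step 1: evaluate the test expression on the Volume cell.
            long volume = long.Parse(row[VolumeColumn], NumberStyles.AllowThousands,
                                     CultureInfo.InvariantCulture);

            // Step 2: no match, so move on to the next row.
            if (volume >= volumeLimit) continue;

            // Step 3: match, so copy the row's values to the destination.
            destination.Add((string[])row.Clone());
        }
        return destination;
    }
}
```

Run over the 24 sample rows with a limit of 300,000, it should return the 12 rows listed in the result sheet. In real code, the copy in step 3 is where a helper such as CopyWorksheetRow would be called.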

Helper Method


// Next row to write on the destination sheet; advanced after each copied row.
private static int row_to_start_at = 0;

/// <summary>
/// Copies values and styles of up to 20 columns for a given row from the
/// original sheet to the destination sheet
/// </summary>
/// <param name="origin">Worksheet object from which to copy</param>
/// <param name="destination">Worksheet object to which to copy</param>
/// <param name="row_from">Row number to copy from the origin to the destination</param>
public static void CopyWorksheetRow(Worksheet origin, Worksheet destination, int row_from)
{
    // create an area of 1 row / 20 columns (note our sample only has 7 columns)
    Area selected_row = origin.CreateArea(row_from, 0, 1, 20);
    // out of that area extract only the cells with values
    // (note this will select an area of only 7 columns)
    Area populated_cells = selected_row.PopulatedCells;
    for (int x = 0; x < populated_cells.ColumnCount; x++)
    {
        destination.Cells[row_to_start_at, x].Value = populated_cells[0, x].Value;
        destination.Cells[row_to_start_at, x].Style = populated_cells[0, x].Style;
    }
    row_to_start_at++;
}

Code to test the condition


ExcelApplication xap = new ExcelApplication();
Workbook wb = xap.Open(Page.MapPath("datadoc.xls"));

Worksheet data_sheet = wb.Worksheets[0];
Worksheet filtered_datasheet = wb.Worksheets.CreateWorksheet("mysheet", 1);

// set up a loop to look through the cells in column 6 (indexed as 5) from
// rows 6 to 71 (indexed as 5 to 70)
for (int x = 5; x < 71; x++)
{
    object raw = data_sheet.Cells[x, 5].Value;
    // skip blank cells, which would otherwise throw on conversion
    if (raw == null) continue;
    double cellval = Convert.ToDouble(raw);

    // if the cell's value matches our criteria,
    // call a function to copy this row onto a separate sheet.
    // In my case I want to copy the rows where
    // the volume of stock traded is under 300,000
    if (cellval < 300000)
    {
        // x is already the row index of the cell being tested
        CopyWorksheetRow(data_sheet, filtered_datasheet, x);
    }
}
// Save to disk on the server. Page.MapPath expects a virtual path,
// so a physical path is passed to Save directly.
xap.Save(wb, @"C:\MyReports\output.xls");

Saving ExcelWriter and WordWriter files

Problem

What options are available for saving an ExcelWriter or WordWriter generated file?

Solution

ExcelTemplate.Save, ExcelApplication.Save, WordTemplate.Save, and WordApplication.Save all have the same four output options:

  • Save to disk – saves the generated file on the server
  • Save to an IO stream – streams the file to the specified IO stream or class derived from System.IO.Stream
  • Stream to the client as an attachment
  • Stream to the client as an inline file – If the user is using Internet Explorer, the file will be opened in the browser using IE’s inline browsing option. Otherwise, the file will be streamed to the client.

How to use CopySheet with PivotTables

Problem

When a worksheet with a PivotTable is copied in Excel, the PivotTable data source is not updated. If the worksheet is copied within a workbook, then the data sources of the original and copied PivotTables point to the same area. If the worksheet is copied to a new workbook, the data source of the copied PivotTable will point to the area in the original workbook. For example: “[OldWorkbookName.xlsx]OriginalWorksheet!A1:C10”.

ExcelWriter’s CopySheet follows this behavior. If a worksheet that contains a PivotTable is copied with CopySheet, the data source will not be updated.

Solution

Starting in ExcelWriter 8.4, ExcelApplication can change the data source of a PivotTable. Use PivotTable.ChangeDataSource to update the data source of a copied PivotTable.

Example:

PivotTable pt = ws.PivotTables["PivotTable1"];
Area data_area = ws_new_data.CreateArea("A1:G10");
pt.ChangeDataSource(data_area);

Note

PivotTables are sensitive to the column header names in the data source for a PivotTable. If a column header name is changed, the source field associated with that column will get updated, and any PivotTableFields that were created using that source field will be removed from the PivotTable.

This happens in Excel as well.

When changing the data source of the PivotTable, if the new data source is missing any of the columns from the original data source, then fields created from the missing columns will be removed from the PivotTable.

For example, suppose the data source of PivotTable1 has 3 columns: Col1, Col2, and Col3. Changing the data source to an area whose columns are Col1, Col2, and ColC will remove any fields that were created from Col3.
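Which fields survive a data source change comes down to comparing the old and new column headers. The hypothetical `PivotSourceSketch.FieldsRemoved` helper below is a plain C# illustration of that comparison, not an ExcelWriter call; it assumes exact, case-sensitive header matching, which may differ from how Excel itself compares names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PivotSourceSketch
{
    // Returns the headers present in the old data source but missing from
    // the new one. Fields created from these columns would be removed.
    // Assumption: headers are matched exactly (ordinal, case-sensitive).
    public static List<string> FieldsRemoved(IEnumerable<string> oldHeaders,
                                             IEnumerable<string> newHeaders)
    {
        var kept = new HashSet<string>(newHeaders, StringComparer.Ordinal);
        return oldHeaders.Where(header => !kept.Contains(header)).ToList();
    }
}
```

For the example above, `FieldsRemoved(new[] { "Col1", "Col2", "Col3" }, new[] { "Col1", "Col2", "ColC" })` yields only Col3.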

How to unmerge a group of cells with ExcelWriter

Problem

Starting in ExcelWriter 6.8.1, cells in a worksheet can be unmerged with the Cell.Unmerge method. The Cell.IsMerged property was also added to determine whether a given cell is merged.

This post covers some examples of unmerging an individual cell or an area of cells.

Solution

Unmerging a single cell


Cell cell = wb.Worksheets[0].Cells[0,0];
if(cell.IsMerged) cell.Unmerge();

Unmerging an Area of cells

Although the MergeCells method is on the Area object, the Unmerge method is only on the Cell object. This is because each cell knows whether it is part of a merged region, but a given area can include both merged and unmerged cells.

There are two ways to unmerge an area of cells:

  1. If you know exactly where the merged cells are, you can unmerge any one cell and the entire merged area will be unmerged.
  2. Or, loop through all the cells in an area to find merged cells and unmerge them.

C# example:

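Area a = wb.Worksheets[0].CreateArea("A1:C4");
// RowCount and ColumnCount are counts, so loop up to but not including them
// to visit every cell in the area
for (int i = 0; i < a.RowCount; i++)
{
    for (int j = 0; j < a.ColumnCount; j++)
    {
        // index relative to the area, so this also works for areas not starting at A1
        Cell cell = a[i, j];
        if (cell.IsMerged) { cell.Unmerge(); }
    }
}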

For more information about merging cells, refer to the Area.MergeCells() documentation.

How to hide the secondary axis of a chart

Problem

When programmatically adding a secondary x axis to a chart using ExcelApplication, a secondary y axis automatically appears as well (and vice versa).

Solution

To hide a secondary axis in ASP.NET, use the Visible property of the Axis object. The SecondaryCategoryAxis and SecondaryValueAxis properties both return an object that extends Axis.

The following code hides the secondary axes in ASP.NET:

Chart1.SecondaryCategoryAxis.Visible = false; //For the secondary X Axis
Chart1.SecondaryValueAxis.Visible = false; //For the secondary Y Axis

How to designate number formats in ExcelWriter across languages

Problem

While Microsoft Excel is available in different language versions, ExcelWriter is US-English based. What this means is that while ExcelWriter can generate spreadsheets in any language, there is no French, Russian, Chinese, etc. version of ExcelWriter. This requires special considerations when creating number formats for other languages in ExcelWriter.

You must understand which symbols are used as “separators” and “decimal placeholders,” and which remaining symbols will be interpreted as “literals.” This article explains how ExcelWriter handles these symbols and how non-English versions of Excel (for example, French, Chinese, or Russian) interpret them. This post uses French as the non-English example.

Solution

What is a number separator?

A number separator is a symbol or space used to group digits so that numbers are easier to read. In English (US) and many other languages, a separator appears between the thousands and hundreds positions, and then again every three digits moving left from the decimal placeholder. The symbol used for the decimal placeholder can also change from language to language.

Compare these values for English (United States) and French:

Language | Separator symbol | Decimal symbol | Example using 1234567
English (US) | Comma | Period | 1,234,567.00
French | Space | Comma | 1 234 567,00

Specifying Number Formats for non-English Spreadsheets

Since ExcelWriter is only available in US-English, you must specify your number formats according to US-English standards. This allows ExcelWriter to correctly identify the separators and decimal placeholders. When the spreadsheet is opened in a non-English version of Microsoft Excel, those separators and placeholders will be translated according to the language and regional settings of that version of Microsoft Excel.

ExcelWriter code sample


//--- Declare variables
ExcelApplication xla = new ExcelApplication();
Workbook wb = xla.Create();
Worksheet ws = wb.Worksheets[0];


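ws.Cells["A1"].Value = 1234567;
// "0" placeholders force the two decimal places shown below;
// "#,###.##" would display 1234567 as "1,234,567." instead
ws.Cells["A1"].Format.Number = "#,##0.00;-#,##0.00;;";
// ...[rest of code]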

Here is how Cell A1 will display its number:

Language of Microsoft Excel | How the format is displayed
English | 1,234,567.00
French | 1 234 567,00
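.NET's own number formatting follows the same principle, which makes it a convenient way to see the idea in isolation: the pattern is always written with `,` and `.`, and the displayed symbols come from the locale settings. The sketch below is an analogy using `System.Globalization.NumberFormatInfo`, not ExcelWriter code. The `Render` helper is hypothetical, and the French-style symbols are supplied by hand (a plain space) rather than read from a French culture, because real French settings may use a non-breaking space.

```csharp
using System;
using System.Globalization;

public static class NumberFormatSketch
{
    // One US-English pattern: ',' marks grouping and '.' marks the decimal point.
    public const string Pattern = "#,##0.00";

    // Formats the value with the given output symbols; the pattern never changes.
    public static string Render(double value, string groupSeparator, string decimalSeparator)
    {
        var symbols = new NumberFormatInfo
        {
            NumberGroupSeparator = groupSeparator,
            NumberDecimalSeparator = decimalSeparator
        };
        return value.ToString(Pattern, symbols);
    }
}
```

`Render(1234567, ",", ".")` gives "1,234,567.00" and `Render(1234567, " ", ",")` gives "1 234 567,00", from the same pattern string.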

How to populate XLS files with more than 65536 rows

Problem

This post describes 3 ways to populate a binary Excel template when there are more than 65536 rows in the data source. In order to accommodate a variable number of data rows, these approaches use a combination of ExcelApplication and ExcelTemplate. You can download a Visual Studio 2008 project containing a demo at the end of the article.

This post also discusses an alternative in the case that you do not have access to ExcelApplication.

Solution

For a binary Excel file (.xls), the number of rows of each worksheet is limited to 65536 rows (source: Excel specifications and limits). By comparison, an OOXML file (.xlsx or .xlsm) has a maximum of about 1 million (exactly 1048576) rows per worksheet. If the data source contains more than 65536 rows (and fewer than 1 million rows) and you are using ExcelWriter version 7.0 or higher, you can use ExcelTemplate with an OOXML template so that all the data fits on a single worksheet.

If you want to use a binary template and the data source contains more than 65536 rows, the template must contain as many worksheets as necessary to accommodate all data rows. In order to generate the required number of worksheets, you can start with a template containing a single worksheet, then make copies of the original worksheet. Alternatively, you can start with the maximum number of worksheets required, then delete worksheets as necessary.
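The number of worksheets the processed template needs is a ceiling division of the data rows by the rows available per worksheet. A minimal C# sketch, assuming each worksheet reserves a fixed number of rows for headers or blank lines; `SheetCountSketch` and `WorksheetsNeeded` are hypothetical names.

```csharp
using System;

public static class SheetCountSketch
{
    // Row limit of a binary .xls worksheet.
    public const int XlsMaxRows = 65536;

    // Total worksheets needed for 'dataRows' rows when each worksheet
    // reserves 'reservedRowsPerSheet' rows for headers or blank lines.
    public static int WorksheetsNeeded(long dataRows, int reservedRowsPerSheet = 0)
    {
        int rowsPerSheet = XlsMaxRows - reservedRowsPerSheet;
        if (rowsPerSheet <= 0)
            throw new ArgumentOutOfRangeException(nameof(reservedRowsPerSheet));
        if (dataRows <= 0)
            return 1; // the template always keeps its original worksheet

        // Ceiling division without floating point.
        return (int)((dataRows + rowsPerSheet - 1) / rowsPerSheet);
    }
}
```

The number of copies to make with ExcelApplication would be the result minus one, since the template already contains the original worksheet.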

In this post, we use a binary template containing a single worksheet. Then we use ExcelApplication to make copies of the original worksheet so that the processed template contains exactly the number of worksheets required to accommodate all data rows. Depending on the approach, we will also need to modify the data marker(s) on each copy. Finally, we use ExcelTemplate to populate the processed template.

Support for OOXML files in ExcelApplication was introduced in ExcelWriter 8. In ExcelWriter v7.6.1 and earlier, ExcelApplication only supports binary files. If you are using an older version of ExcelWriter, you will not be able to dynamically insert or delete worksheets with an OOXML file.

If you cannot use ExcelApplication, you cannot dynamically insert or remove worksheets. In this case, you can have multiple templates, each with a different number of worksheets. See Using ExcelTemplate only.

The following table summarizes the possible solutions for different scenarios as of ExcelWriter 7.5, before ExcelApplication supported the OOXML format (added in ExcelWriter 8). The numbers in the last column refer to the numbered approaches below.

ExcelWriter edition | ExcelWriter version | Template format | Available solution(s)
EE | All | Binary | 1, 2, 3, 4
EE | 7.0-7.6.1 | OOXML | Populate all data on a single worksheet, 4
SE | All | Binary | 4
SE | 7.0+ | OOXML | Populate all data on a single worksheet, 4

Using ExcelApplication and ExcelTemplate

1. Populate using a DataReader

When you use a forward-only data source such as a DataReader, the template must contain identical data markers on each worksheet. After the number of rows on a given worksheet reaches a limit, additional rows automatically overflow onto the next worksheet. This limit is specified in the MaxRows property of the DataBindingProperties parameter of the ExcelTemplate.BindData() method.

DataBindingProperties props = excelTemplate.CreateDataBindingProperties();
props.MaxRows = 65536;
excelTemplate.BindData(dataReader, "", props);

For illustration purposes, the above code snippet sets MaxRows to the maximum number of rows allowed on a worksheet. However, if the worksheet contains header or blank rows in addition to data rows, adjust this value accordingly.

If the template has more worksheets containing data markers than necessary to accommodate all rows, ExcelTemplate would attempt to populate the extra worksheets after DataReader reaches its end and consequently throw the following exception: SoftArtisans.OfficeWriter.ExcelWriter.SAException: Exhausted data marker at XX value: %%=YY.ZZ.

Note for ExcelWriter 3.9.x

In ExcelWriter 3.9.x, ExcelTemplate does not support the BindData method. Replace the data-binding code above with:

excelTemplate.SetDataSource(dataReader, "", 65536);

2. Populate using the (continue) data marker modifier

This approach applies when you use a scrollable data source such as a DataTable. The data markers on the overflow worksheets must have a (continue) modifier; e.g., %%=datasource.field1(continue). Apart from the (continue) modifier, the data markers are identical. To create the required number of worksheets, we use ExcelApplication to make copies of the first worksheet, then append (continue) to all data markers on each copy. See Data Marker Modifiers.
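Appending the modifier is a simple text transformation on each data marker. The hypothetical `AddContinue` helper below sketches it in plain C#. It assumes each cell holds exactly one data marker, and it leaves markers that already end in a modifier list untouched rather than merging modifiers, since this sketch does not model the full modifier syntax.

```csharp
using System;

public static class ContinueMarkerSketch
{
    // Turns "%%=datasource.field1" into "%%=datasource.field1(continue)".
    // Assumption: the cell text is a single data marker. Non-marker text and
    // markers that already end with "(...)" are returned unchanged.
    public static string AddContinue(string cellText)
    {
        if (cellText == null || !cellText.StartsWith("%%=", StringComparison.Ordinal))
            return cellText;   // not a data marker, e.g. a header label
        if (cellText.EndsWith(")", StringComparison.Ordinal))
            return cellText;   // already has modifiers; not handled here
        return cellText + "(continue)";
    }
}
```

With ExcelApplication, you would apply this to the text value of each data-marker cell on the copied worksheets.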

The (continue) modifier indicates that ExcelTemplate should continue to read the data source at the point where the previous worksheet leaves off. Again, set the DataBindingProperties.MaxRows property to limit the number of rows per worksheet.

DataBindingProperties props = excelTemplate.CreateDataBindingProperties();
props.MaxRows = 65536;
excelTemplate.BindData(dataTable, "", props);

For illustration purposes, the above code snippet sets MaxRows to the maximum number of rows allowed on a worksheet. However, if the worksheet contains header or blank rows in addition to data rows, adjust this value accordingly.

If the template has more worksheets containing data markers than necessary to accommodate all rows, the DataTable would rewind after reaching the last worksheet and the extra worksheets would contain repeated rows.

Note for ExcelWriter 3.9.x

In ExcelWriter 3.9.x, ExcelTemplate does not support the BindData method. Replace the data-binding code above with:

excelTemplate.SetDataSource(dataTable, "", 65536);

3. Populate individual worksheets

You must bind data to each individual worksheet using a different data source. Use the DataBindingProperties.WorksheetName property to specify the name of the worksheet you are targeting. Each worksheet’s data source must have only the rows you want on that worksheet. Because each worksheet has a different data source, it’s possible to have a different number of rows on each worksheet.

DataBindingProperties props = excelTemplate.CreateDataBindingProperties();
props.WorksheetName = workbook.Worksheets[0].Name;
excelTemplate.BindData(dataTable1, "datasource1", props);

The data markers on each worksheet must have a data source identifier matching its data source. If you use ordinal data source identifiers, the first worksheet to be data-bound should have data markers like %%=#1.field1, the second data-bound worksheet %%=#2.field2, and so on. If you use named data source identifiers, the data markers should contain the name of the data source specified in ExcelTemplate.BindData. For example, if the data-binding call is BindData(dataTable, “datasourceN”, dataBindingProperties), the corresponding data markers should be %%=datasourceN.field1, %%=datasourceN.field2, and so on.

If you start with a single data source, you can partition it into smaller data sources containing non-overlapping blocks of rows. In the sample, we partition the original DataTable into a set of smaller DataTables, each of which contains 65536 rows (or whichever value the row limit is set to). The last DataTable gets the remaining rows and can have fewer than 65536 rows. Each smaller DataTable is used as the data source for a different worksheet.
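Partitioning can be done with standard `System.Data` calls: `DataTable.Clone` copies the schema without any rows, and `DataTable.ImportRow` copies a row into another table. The `PartitionSketch.Partition` helper and the `datasource1`, `datasource2`, ... table names are hypothetical; this is a sketch, not the code from the demo project.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class PartitionSketch
{
    // Splits 'source' into consecutive, non-overlapping DataTables of at most
    // 'rowsPerSheet' rows each; the last table receives the remaining rows.
    public static List<DataTable> Partition(DataTable source, int rowsPerSheet)
    {
        if (rowsPerSheet <= 0)
            throw new ArgumentOutOfRangeException(nameof(rowsPerSheet));

        var parts = new List<DataTable>();
        DataTable current = null;
        foreach (DataRow row in source.Rows)
        {
            // Start a new part when there is none yet or the current one is full.
            if (current == null || current.Rows.Count == rowsPerSheet)
            {
                current = source.Clone();                              // same columns, no rows
                current.TableName = "datasource" + (parts.Count + 1);  // hypothetical naming
                parts.Add(current);
            }
            current.ImportRow(row);                                    // copy the row's values
        }
        return parts;
    }
}
```

Each part would then be bound with its own call, for example BindData(parts[i], parts[i].TableName, props) with props.WorksheetName set to the target worksheet. Because every part is a separate DataTable object, no data source is bound twice.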

If you bind the same data source to more than one worksheet, even under different names, ExcelTemplate will return the following error: This binding source named xxx was already added under the name yyy.

Note for ExcelWriter 3.9.x

This approach is not compatible with ExcelWriter 3.9.x because it isn’t possible to bind data to a specific worksheet.

Using ExcelTemplate only

The following discussion applies to the following scenarios:

  • You do not have access to ExcelApplication
  • You are unable to use ExcelApplication with an OOXML template

4. Using multiple templates

It may occur to you to use a template with as many worksheets as necessary and append the (optional) modifier to all data markers on each worksheet, on the assumption that ExcelTemplate would ignore such data markers if no data is bound to a worksheet. However, this assumption is not correct. The (optional) modifier has to do with the presence or absence of a column, not a row. ExcelTemplate ignores a data marker marked as optional if there is no corresponding column in the data source. But if a data marker is mapped to a column in the data source, ExcelTemplate will attempt to populate all instances of that data marker. If ExcelTemplate encounters a data marker and there are no data rows left, its behavior depends on the type of data source. For a forward-only data source such as a DataReader, ExcelTemplate throws an “exhausted data marker” error. For a scrollable data source such as a DataTable, ExcelTemplate rewinds the data source to the beginning, resulting in duplicate data.

Consequently, it’s imperative that the template contains exactly the number of worksheets required to accommodate all data rows. If you cannot use ExcelApplication, you would not be able to dynamically insert or remove worksheets at run time. An alternative is to create multiple templates, each one containing a different number of worksheets. You can determine which template to use depending on the number of rows in the data source.
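Selecting the template comes down to computing the required worksheet count and looking up a template with exactly that many worksheets. A minimal C# sketch; `ChooseTemplate` and the template file names are hypothetical, and it fails loudly when no exact match exists because, as explained above, an extra or missing worksheet leads to errors or duplicate data.

```csharp
using System;
using System.Collections.Generic;

public static class TemplateChoiceSketch
{
    // Maps the required worksheet count to a template path.
    // The keys and paths in 'templatesBySheetCount' are hypothetical.
    public static string ChooseTemplate(long dataRows, int rowsPerSheet,
                                        IReadOnlyDictionary<int, string> templatesBySheetCount)
    {
        // Ceiling division; at least one worksheet.
        int needed = (int)Math.Max(1, (dataRows + rowsPerSheet - 1) / rowsPerSheet);

        if (templatesBySheetCount.TryGetValue(needed, out string path))
            return path;

        throw new InvalidOperationException(
            $"No template with exactly {needed} worksheet(s) is available.");
    }
}
```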

Conclusions

Using a DataReader for data retrieval is the simplest approach and has performance benefits. Using the (continue) data marker modifier requires additional processing of each worksheet after it is copied and is therefore less efficient. Binding data to individual worksheets can perform worse than the other approaches, because the data source must first be split into smaller data sources.

About the demo

The attachment contains a Visual Studio 2008 project illustrating the 3 approaches described in this article. Each of the approaches is contained in a separate method. You must install ExcelWriter EE before running the demo.


How to check if a cell is empty

Solution

A cell can contain a value and/or a formula. To check if a cell is empty use the Cell.Value and Cell.Formula properties to look for the following conditions:

Language | XLS: Cell.Value | XLS: Cell.Formula | XLSX/XLSM: Cell.Value | XLSX/XLSM: Cell.Formula
C# | null | (empty string) | (empty string) OR null | (empty string)
VB.NET | Nothing | Nothing | (empty string) OR Nothing | Nothing
ASP/COM * | (empty string) | (empty string) | (empty string) | (empty string)

Example

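//--- myCell is a Cell object. Value is returned as an object, so compare it
//--- against null and the empty string instead of passing it to IsNullOrEmpty
object v = myCell.Value;
if ((v == null || v.ToString().Length == 0) && string.IsNullOrEmpty(myCell.Formula))
{
//--- Cell is empty
}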
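Assuming Cell.Value is typed as object, the conditions in the table can be folded into one check that accepts both null and the empty string, which covers XLS and XLSX/XLSM files alike. The sketch below is plain C# that takes the value and formula directly, so it runs without ExcelWriter; `EmptyCellSketch.IsEmpty` is a hypothetical name, and you would call it as `IsEmpty(myCell.Value, myCell.Formula)`.

```csharp
using System;

public static class EmptyCellSketch
{
    // A cell is empty when it has no value (null or "") AND no formula (null or "").
    // A numeric value such as 0 counts as a value, so the cell is not empty.
    public static bool IsEmpty(object value, string formula)
    {
        bool noValue = value == null || (value is string s && s.Length == 0);
        bool noFormula = string.IsNullOrEmpty(formula);
        return noValue && noFormula;
    }
}
```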


How to create a chart with different chart types

Problem

ExcelWriter supports creating custom charts, including charts that contain several different chart types. For example, a chart with two area series and one line series.

Solution

This code snippet creates a chart with two area series and one line series:

Chart chart = ws.Charts.CreateChart(ChartType.Area.StandardArea,ws.CreateAnchor(5, 0, 0, 0));
chart.SeriesCollection.CreateSeries("A1:A3");
chart.SeriesCollection.CreateSeries("B1:B3", ChartType.Area.StandardArea, AxisType.Primary);
chart.SeriesCollection.CreateSeries("C1:C3", ChartType.Line.StandardLine, AxisType.Primary);

Optimizing ExcelApplication performance to reduce memory or time usage

Problem

Your program, which uses the ExcelApplication object, is using more memory than you would like or is taking too long to generate a report.

Solution

The Best Practices with Large Reports article in our documentation discusses how to optimize large ExcelApplication reports by reducing memory use or run time. Topics include, but are not limited to:

  • What to avoid when referencing cells, inserting rows or columns, and applying styles
  • Certain methods are known to be memory intensive, such as AutoFitWidth
  • Ways to improve performance by changing how and when data is imported

There are also code examples that compare inefficient code and code that was improved by the recommendations in the article. These code examples are split up into Memory Related Performance Issues and Time Related Performance Issues.