The Problem and Criteria for a viable solution
It seems to be common place these days that people want their data in Excel so they can play around with it. The problem is that using Excel automation/COM/VBScript, etc is not a thread-safe option and also requires the installation of Excel. On the desktop these are not real big issues since most users have Excel on their desktop. However, this is a huge issue when allowing users to download large (dynamically generated) Excel files from the web. You really should not be installing Excel on the server to do this.
There are lots of solutions out there that are easy to use and many of them are free. The problem is that they load all the data into memory BEFORE it is written out to an Excel file. This is fine for smaller exports, but when you get into say 50 columns and 500 rows of text this can quickly become very memory intensive. On a server where there are lots of users this can become a problem very quickly.
The only solution that I am aware of must not load all the data in memory all at once. Instead, it must load only small chunks such as a cell or row at a time. This way the RAM footprint will stay constant and quite small. This means that the amount of RAM your process takes is NOT dependent on the amount of data you are trying to write to the Excel file. This is great news!
So to summarize, we want a solution that meets the following criteria:
- Memory footprint is not tied to the amount of data being written to Excel
- Excel does not need to be installed
- Accessible via .NET. and prefer Managed Code.
- Thread-safe (since we are running in a server environment)
- Easy to use
- Supports dates, text, and number formats correctly.
- Allows for most all kinds of things in Excel like Pivot tables, Auto-Filters, Freezing rows, etc.
- Fast
The Solution Concept
The solution is based on the
Open XML SDK 2.0. However, I am using a less used part of the Open XML SDK 2.0. In particular, I use the
OpenXmlReader and
OpenXmlWriter. These work differently from how people usually use the SDK. These objects work like the Java SAX parser for XML; meaning they step through the XML tags one at a time and it is our responsibility to do whatever we need to do. This is a bit more work, but not pad because most of the same objects you would normally use are still used and just passed to the OpenXmlWriter to serialize them as XML.
My solution uses a template approach such that you create your basic Excel document in Excel and then read it in and just modify the parts that we want to modify. This is nice in the sense that we don’t have to code everything. This can be a HUGE time saver. On the downside, this can be a bit more difficult when the sheet we are modifying is the one that we are writing this large amount of data to. The good news is that is is not that difficult once you understand the solution. The basic idea is that use the OpenXmlReader to read the contents of the original Sheet. We never write to this Sheet. We actually create a new Sheet that we then use the OpenXmlWriter to write the data to. So, as we read a tag from the OpenXmlReader we write, ignore if we like, or create something new into the new Sheet using the OpenXmlWriter. When we are done, we delete the original sheet, and give the new sheet the id of the one we deleted so it can take its place. The basics for this design I have to give credit to
this blog. I did expand on it quite a bit for my final solution though.
Excel File Format Basics
The Excel File format is actually a zip file of several files. Each sheet is actually a xml file in the zip file. The styles (formatting basically) is in a separate xml file and is shared among the sheets. Each file is called a part. Each file must be saved after you make changes to it. These files have relationships that you must define between the files.
A blending of models
The first thing we will do in our code is copy the Excel file we are using as a template. Once we do that we can use the Open XML SDK 2.0 to make any changes we want to it. This is easily done in the common way that everyone edits Excel file using the Open XML SDK. There is very little overhead since the template file should be relatively small because it doesn’t have much data in it yet.
Once we have made some changes to the copy of the template we then save our changes. Saving is Critical because when we use the OpenXmlReader, it will read the content from DISK, not memory.
Keeping Consistency
One of the difficulties when working with a template based solution is that not all tags you expect to be in the file will be there because many tags are optional. This can be a pain when working with a parser that only gives you the current parser and you don’t know if a tag will exist later or not. This is especially true when you need to add a tag and it has to be before or after a particular tag(s). I found that the parsing code quickly got cluttered and felt hacked if I tried to handle these optional tags while parsing. I found the better solution use the part of the SDK that loads the entire sheet into memory to check if the tags are there and if not add them. That way they will be guaranteed to be there when I am parsing the tags with the OpenXmlReader. You don’t have to do this for all optional tags, just the ones that you care about. For example, PageSetup and PageMargins or TableParts and TablePart.
The Code
Download
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml;
using System.Data.SqlClient;
using System.Data;
using System.IO;
public class LargeExcelFile
{
// we can reuse these object to reduce overhead
Row r = new Row();
Cell cell = new Cell();
public LargeExcelFile()
{
}
private WorksheetPart GetWorksheetFromSheetName(WorkbookPart workbookPart, string sheetName)
{
Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
if (sheet == null) throw new Exception(string.Format("Could not find sheet with name {0}", sheetName));
else return workbookPart.GetPartById(sheet.Id) as WorksheetPart;
}
public void Write(SqlDataReader dbReader, string sheetName, string excelTableName, List<string> columnHeaders, List<double?> columnWidths, List<uint?> columnHeaderStyleIndexes, List<uint?> columnStyleIndexes, string templateFilename, string outputFilename, bool overwriteContent, bool addAutoFilters)
{
// create directories up to the file we are going to write
Directory.CreateDirectory(new FileInfo(outputFilename).Directory.FullName);
File.Copy(templateFilename, outputFilename, true);
int existingRows = 0;
int numDataRows = 0;
int numCols = 0;
TableDefinitionPart tableDefPart = null;
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(outputFilename, true))
{
WorkbookPart workbookPart = myDoc.WorkbookPart;
WorksheetPart worksheetPart = GetWorksheetFromSheetName(workbookPart, sheetName);
#region let's do some things to the worksheet to make parsing easier and consistent.
#region make sure we have a cols section
var colsSection = worksheetPart.RootElement.Descendants<Columns>().FirstOrDefault();
if (colsSection == null)
{
var sheetData = worksheetPart.RootElement.Descendants<SheetData>().FirstOrDefault();
colsSection = worksheetPart.RootElement.InsertBefore<Columns>(new Columns(), sheetData);
}
// we don't really care about the col tags in the cols section since we will add our own anyway... just remove them.
colsSection.RemoveAllChildren<Column>();
#endregion
#region make sure we have a TableParts section
var tableParts = worksheetPart.RootElement.Descendants<TableParts>().FirstOrDefault();
if (tableParts == null)
{
var sheetData = worksheetPart.RootElement.Descendants<SheetData>().FirstOrDefault();
tableParts = worksheetPart.RootElement.InsertAfter<TableParts>(new TableParts(), sheetData);
}
// we don't really care about the col tags in the cols section since we will add our own anyway... just remove them.
tableParts.RemoveAllChildren<TablePart>();
#endregion
// be sure to save before we create the OpenXmlReader(), otherwise it won't see the changes
worksheetPart.Worksheet.Save(); #endregion
string origninalSheetId = workbookPart.GetIdOfPart(worksheetPart);
WorksheetPart replacementPart = workbookPart.AddNewPart<WorksheetPart>();
string replacementPartId = workbookPart.GetIdOfPart(replacementPart);
// creates actual table part (file) (i.e. xl/tables/table1.xml)
tableDefPart = replacementPart.AddNewPart<TableDefinitionPart>();
string newTablePartID = replacementPart.GetIdOfPart(tableDefPart);
OpenXmlReader excelReader = OpenXmlReader.Create(worksheetPart);
OpenXmlWriter excelWriter = OpenXmlWriter.Create(replacementPart);
cell.CellValue = new CellValue();
bool isReadingSheetDataInOverwriteMode = false;
while (excelReader.Read())
{
// if we get to the end of the SheetData (end tag) then let's append our new data before we write the end tag
if (excelReader.ElementType == typeof(SheetData) && (overwriteContent || excelReader.IsEndElement))
{
// if we are supposed to overwrite the content on the data sheet
// then ignore any data that is there already on the specified sheet
// and instead create a new start sheet data tag.
if (overwriteContent)
{
isReadingSheetDataInOverwriteMode = excelReader.IsStartElement;
if (excelReader.IsEndElement)
{
// do nothing and then exit to top of while loop so we can process the next tag
continue;
}
else // start element
{
excelWriter.WriteStartElement(new SheetData());
}
}
// add start row tag to SheetData for header row
excelWriter.WriteStartElement(r);
// read row 1 from the database to get the column headers
for (int colIdx = 0; colIdx < columnHeaders.Count(); colIdx++)
{
WriteColumnHeader(excelWriter, columnHeaders[colIdx], columnHeaderStyleIndexes[colIdx]);
}
// add end row tag to SheetData
excelWriter.WriteEndElement();
//excelWriter.WriteElement(new AutoFilter() { Reference = "A1:AA1" });
DateTime start = DateTime.Now;
// loop through our data and create corresponding rows in SheetData
while (dbReader.Read())
{
// add start row tag to SheetData for the row of data we just read from the database
excelWriter.WriteStartElement(r);
// get the number of columns if we haven't done so already.
if (numCols == 0)
{
numCols = dbReader.FieldCount;
}
// loop through the columns and write a cell for each one
for (int colIdx = 0; colIdx < dbReader.FieldCount; colIdx++)
{
WriteCell(excelWriter, dbReader[colIdx], columnStyleIndexes[colIdx]);
}
// add end row tag to SheetData
excelWriter.WriteEndElement();
// update the number of rows
numDataRows++;
}
// write end SheetData tag
excelWriter.WriteEndElement();
DateTime end = DateTime.Now;
Console.WriteLine("File Writing Duration: " + end.Subtract(start).TotalSeconds.ToString());
} else if (excelReader.ElementType == typeof(Selection))
{
// only write the entire tag once when we see the start element.
if (excelReader.IsStartElement)
{
// we want to change the select to be the top left cell. // This is not a required step, but does prevent the user from seeing a selection
// that just happen to be the last thing that the person that put together the Template had selected.
// Assuming the sheet we are writing is the one that is the active sheet when the user opens the file in Excel
excelWriter.WriteElement(new Selection() { ActiveCell = "A1"});
}
}
else if (excelReader.ElementType == typeof(TablePart) || excelReader.ElementType == typeof(TableParts))
{
// do nothing for the TablePart or TableParts tags since we don't have it after we delete the sheet
// if they existed on the original sheet, and we are adding the one we need explicitly
if (excelReader.IsStartElement)
{
// write the TableParts (notice the 's') start tag
excelWriter.WriteStartElement(excelReader);
// we want to reference the new one we created not the one that will be deleted when the original sheet is deleted later
excelWriter.WriteElement(new TablePart() { Id = newTablePartID });
}
else // end TableParts tag
{
// write the TableParts (notice the 's') end tag
excelWriter.WriteEndElement();
}
}
else if (excelReader.ElementType == typeof(PageSetup))
{
// do nothing for the PageSetup tag since we don't have it after we delete the sheet
}
else if (excelReader.ElementType == typeof(PageMargins))
{
// do nothing for the PageMargins tag since we don't have a PageSetup tag
}
else if (excelReader.ElementType == typeof(Columns))
{ // we want to add our own colummns to the columns section.
// only write the entire tag once when we see the start element.
if (excelReader.IsStartElement)
{
excelWriter.WriteStartElement(excelReader);
// write a column tag, one for each column we are showing
for (int i = 0; i < columnWidths.Count; i++)
{
UInt32Value colNum = (UInt32)i + 1;
DoubleValue colWidth;
// if there is a column width specified then use it
if (columnWidths[i].HasValue)
{
colWidth = new DoubleValue(columnWidths[i].Value);
}
else // no column width specified so use a default.
{
// NOTE: 8.43 is the default that Excel uses, but we can make it whatever we like as the default
colWidth = new DoubleValue(8.43D);
}
excelWriter.WriteElement(new Column() { Min = colNum, Max = colNum, CustomWidth = true, Width = colWidth }); }
}
else // end Cols tags, so just write it
{
excelWriter.WriteEndElement();
}
}
else
{
// if overwritingContent or reading if (!isReadingSheetDataInOverwriteMode)
{
if (excelReader.IsStartElement)
{
excelWriter.WriteStartElement(excelReader);
}
else if (excelReader.IsEndElement)
{
excelWriter.WriteEndElement();
if (excelReader.ElementType == typeof(Row))
{
existingRows++;
}
}
// don't forget to copy the content between tags that is just text (i.e. not a tag). This is the case for leaf nodes.
string text = excelReader.GetText();
if (!string.IsNullOrEmpty(text))
{
excelWriter.WriteString(text);
}
}
}
}
excelReader.Close();
excelWriter.Close();
Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().Where(s => s.Id.Value.Equals(origninalSheetId)).First();
sheet.Id.Value = replacementPartId;
workbookPart.DeletePart(worksheetPart);
// create a Table in Excel so that we can reference in a Pivot table if we want to.
// it is also needed to do auto filtering.
// Also useful for calculations, etc
int numRows = numDataRows + 1;
// we should start at row 1 if overwriting content, // but it if we are not overwriting content then // the start range is really the row after the rows that are already there.
// NOTE: This assumes that the rows that are alredy there are NOT part of the data we want in the Defined Range
int startRow = 1;
if (!overwriteContent)
{
startRow = existingRows + 1;
}
int startColumn = 1;
string range = String.Format("{0}{1}:{2}{3}", GetColumnId(startColumn), startRow, GetColumnId(numCols), numRows);
PopulateTableDefinitionPart(tableDefPart, excelTableName, range, columnHeaders, addAutoFilters);
}
} // Translates the column number to the column reference string (e.g. 1 -> A, 2-> B) // Copied from OpenXmlPowerTools.WorksheetAccessor.cs
private static string GetColumnId(int columnNumber)
{
string result = "";
do
{
result = ((char)((columnNumber - 1) % 26 + (int)'A')).ToString() + result;
columnNumber = (columnNumber - 1) / 26;
} while (columnNumber != 0);
return result;
}
// since we are reusing the cell object, we need to clear in in between uses.
private void ResetCell()
{
cell.StyleIndex = null;
}
/// <summary>
/// Fills in the content in the actual table part (i.e. xl/tables/table1.xml)
/// NOTE: The columns added here MUST be in the Excel sheet in the cells that are referenced.
/// </summary>
/// <param name="part">A TableDefinitionPart that has been created, but still needs the Table added to it.</param>
/// <param name="excelTableName">The name of the table. This is used in Excel and shown as the table name. (i.e. Table1)</param>
/// <param name="reference">An Excel range like A1:J4027</param>
/// <param name="columnNames">A list of column headers in row one of the rows indicated in the reference parameter.</param>
/// <param name="addAutoFilters">true to add auto filters to columns, else false to leave alone.</param>
private void PopulateTableDefinitionPart(TableDefinitionPart part, string excelTableName, string reference, List<string> columnNames, bool addAutoFilters)
{
Table table1 = new Table() { Id = (UInt32Value)1U, Name = excelTableName, DisplayName = excelTableName, Reference = reference, TotalsRowShown = false };
AutoFilter autoFilter1 = null;
if (addAutoFilters)
{
autoFilter1 = new AutoFilter() { Reference = reference };
}
TableColumns tableColumns = new TableColumns() { Count = (UInt32Value)(UInt32)columnNames.Count };
for (int i = 0; i < columnNames.Count; i++)
{
string colName = columnNames[i];
tableColumns.Append(new TableColumn() { Id = (UInt32Value)(UInt32)i+1, Name = colName });
}
TableStyleInfo tableStyleInfo1 = new TableStyleInfo() { Name = "TableStyleMedium9", ShowFirstColumn = false, ShowLastColumn = false, ShowRowStripes = true, ShowColumnStripes = false };
if (addAutoFilters)
{
table1.Append(autoFilter1);
}
table1.Append(tableColumns);
table1.Append(tableStyleInfo1);
part.Table = table1;
}
protected void WriteColumnHeader(OpenXmlWriter excelWriter, string value, uint? styleIndex)
{
ResetCell();
//// write a string
cell.DataType = CellValues.String;
cell.CellValue.Text = value;
if (styleIndex.HasValue)
{
cell.StyleIndex = styleIndex.Value;
}
excelWriter.WriteElement(cell);
}
// ******* Understanding the StyleIndex ********
// this is the order that it appears in the style file under CellStyleFormats (<cellXfs ...>. // For instance if there is a style custom style defined as yyyy-mm you will find this in the <numFmts...> tag
// Then search the file for 164. You will see that there is a reference to it in the cellXfs tag's children // as numFmtId="164". Now that we have found the CellStyleFormat that uses our number format we just have to // look at it zero based position in the CellSytleFormats (<cellXfs....> tag). In the case below it is // at index 1 (remember start counting at zero like arrays in C#). In this example we would set cell.StyleIndex = 1
// if we wanted it to be formatted as yyyy-mm.
//...
//<numFmts count="1">
// <numFmt numFmtId="164" formatCode="yyyy\-mm"/>
//</numFmts>
//...
//<cellXfs count="3">
// <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
// <xf numFmtId="164" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
// <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
//</cellXfs>
protected void WriteCell(OpenXmlWriter excelWriter, object val, uint? styleIndex)
{
Type valType = val.GetType();
if (valType == typeof(string))
{
WriteTextCell(excelWriter, val, styleIndex);
}
else if (valType == typeof(DateTime))
{
WriteDateTimeCell(excelWriter, val, styleIndex);
}
else if (valType == typeof(Int16) || valType == typeof(Int32) || valType == typeof(Int64))
{
WriteIntegerCell(excelWriter, val, styleIndex);
}
else if (valType == typeof(float) || valType == typeof(double) || valType == typeof(decimal))
{
WriteDecimalCell(excelWriter, val, styleIndex);
}
else
{
WriteGeneralCell(excelWriter, val, styleIndex);
}
}
protected void WriteDateTimeCell(OpenXmlWriter excelWriter, object value, uint? styleIndex)
{
ResetCell();
// write a date
cell.DataType = CellValues.Date;
cell.CellValue.Text = DateTime.Now.ToOADate().ToString();
if (styleIndex.HasValue)
{
cell.StyleIndex = styleIndex.Value;
}
excelWriter.WriteElement(cell);
}
protected void WriteGeneralCell(OpenXmlWriter excelWriter, object value, uint? styleIndex)
{
ResetCell();
// write a string
cell.DataType = CellValues.String;
cell.CellValue.Text = Convert.ToString(value);
if (styleIndex.HasValue)
{
cell.StyleIndex = styleIndex.Value;
}
excelWriter.WriteElement(cell);
}
protected void WriteTextCell(OpenXmlWriter excelWriter, object value, uint? styleIndex)
{
ResetCell();
//// write a string
cell.DataType = CellValues.String;
cell.CellValue.Text = Convert.ToString(value);
if (styleIndex.HasValue)
{
cell.StyleIndex = styleIndex.Value;
}
excelWriter.WriteElement(cell);
}
protected void WriteIntegerCell(OpenXmlWriter excelWriter, object value, uint? styleIndex)
{
ResetCell();
//write a decimal
cell.DataType = CellValues.Number;
cell.CellValue.Text = Convert.ToString(value); if (styleIndex.HasValue)
{
cell.StyleIndex = styleIndex.Value;
}
excelWriter.WriteElement(cell);
}
protected void WriteDecimalCell(OpenXmlWriter excelWriter, object value, uint? styleIndex)
{
ResetCell(); //write a decimal
cell.DataType = CellValues.Number;
cell.CellValue.Text = Convert.ToString(value);
if (styleIndex.HasValue)
{
cell.StyleIndex = styleIndex.Value;
}
excelWriter.WriteElement(cell);
}
}