READ BEFORE USING IT – Open XML SDK performance analysis

by Maciej Greń on 25th August 2009

PerformanceChart Do you want to generate large Excel 2007 spreadsheets in short time and low memory usage? You can have some problems with that. I have tried to do it using Open XML Format SDK. Below you can see my results.

What do I want to achieve?

I want to generate Excel file in Open XML format with large amount of data, ca. 200 000 records. My application has to read data from database and generate reports in real-time with as low memory usage as possible .

My solution

To solve my issue I decided to use the newest version of Open XML Format SDK 2.0. This SDK provides very simple creating the Open XML code from scratch and has very nice  features for creating and manipulating Open XML files, like strongly typed document object model or tool for generating C# code named Document Reflector. I thought this SDK would help me to create Excel spreedsheets in simple and fast way and, moreover, solve my performance issues. In order to improve memory usage I wanted to write partialy buffered generated Excel file stream into file. I was disappointed when I didn’t find any feature to do this in SDK. So I did buffering myself. In code where I append data to spreadsheet I flush stream into file when was added “BufferSize” records. This shows code below:
foreach (Record m in dataList)
{
    Row contentRow = CreateContentRow(index, m);         // my own method to create row content

    //Append new row to sheet data.
    sheetData.AppendChild(contentRow);

    if (index  % BufferSize == 0)
    worksheetPart.Worksheet.Save();

    index++;
}
The only posibility in SDK to do flushing is by the method Worksheet.Save() which I use. The documentation says that the method Save(): “saves the data in the DOM tree back to the part. It could be called multiple times as well. Each time it is called, the stream will be flushed. ” Unfortunetly this method doesn’t work so good as I thought.

Performance profiling

I wanted to test my application performance to check if there is some performance improvement. My colleagues recommended me tools in the following order:
  1. dotTrace Profiler
  2. CRL Profiler
  3. ANTS Performance Profiler
I downloaded the first apllication and I discovered that I wanted to have presentation of memory usage in runtime by chart. dotTrace don’t have this feature. CLR Profiler either. But when I downloaded and ran ANTS I was positively surprised. I recommend this tool, especially for memory profiling. Results In my tests I generated Excel file with 100 000 records(rows). The results you can see below:
PerformanceWithoutBuffer

Application performance without buffering

PerformanceWithBuffer

Application performance with buffer size 10 000 records

As you can see the above first test was finished in 18 seconds and memory was growing steadily to 150 MB .In the second case, when I used buffer, the apllication was behaving differently. The application finished in 50 seconds and the memory usage was growing, droping and achieved ca. 220 MB. Besides I made additional tests for checking how fast memory usage grows for different number of records. See the results. MemoryComparision

Conclusions

All performance tests show that my solution is a failure. Flushing stream using method Save() with my own buffering doesn’t give me any improvement. I was expecting that execution time will be longer with regards to writing on disk costs, but I hoped my solution improve memory usage. In solution without as well as with buffering, memory usage finally grows linear. In the second case an abnormal phenomenon occured. Open XML Format SDK method Save() consumes a lot of memory instead of releasing it. Method Save() flush stream as it is shown on second chart above. Additionally, it uses more memory to do this. Summary: Don’t use Open XML Format SDK if you want to effectively generate large amount of data in Open XML file. Share your opinion and experience with us below or meet us on Twitter: @GOYELLO.
Thanks to Karol Świder for helping me write this blog post.
  • lindabrownell

    I am not sure if the performance chart at the top is meant for speed or not and what the axis mean. But if it does mean for the performance, here is probably why: SDK stores all content of the part into the DOM tree and when Save() is called, it serializes the whole tree back into the stream and writes it out. In another word, SDK does not have a support to manipulate documents in a stream model yet. With this design, the idea of buffer records won’t help much in terms of the SDK's consumption of memory. ‘BufferSize’ could help a bit with performance as it reduces the number of calls to Save(). If user calls Save() one time after he has inserted all the records into the sheet, it is probably the fastest way – if memory is not an issue.

  • http://vtd-xml.sf.net/ anon

    You may want to look at vtd-xml as the state of the art in XML processing, consuming far less memory than DOM

    http://vtd-xml.sf.net

  • Patrick

    Did they fix the buffering problem in OpeXml Document 2.0 (CTP December) version? have you ran these test using the newest version?

  • http://recordsresources.com public records

    hi, i see this site and i think there are more information which is help people!

  • http://www.goyello.com Maciej Greń

    Thank you for your comment and good word.

  • http://www.louis-vuitton-outlet-store.com louis vuitton for sale

    I like writing, hurriedly, not that you are not
    http://www.oantcogs.com