Do you want to generate large Excel 2007 spreadsheets in short time and low memory usage? You can have some problems with that. I have tried to do it using Open XML Format SDK. Below you can see my results.
What do I want to achieve?
I want to generate Excel file in Open XML format with large amount of data, ca. 200 000 records. My application has to read data from database and generate reports in real-time with as low memory usage as possible .
To solve my issue I decided to use the newest version of Open XML Format SDK 2.0. This SDK provides very simple creating the Open XML code from scratch and has very nice features for creating and manipulating Open XML files, like strongly typed document object model or tool for generating C# code named Document Reflector. I thought this SDK would help me to create Excel spreedsheets in simple and fast way and, moreover, solve my performance issues. In order to improve memory usage I wanted to write partialy buffered generated Excel file stream into file. I was disappointed when I didn’t find any feature to do this in SDK. So I did buffering myself. In code where I append data to spreadsheet I flush stream into file when was added “BufferSize” records. This shows code below:
Row contentRow=CreateContentRow(index,m);// my own method to create row content
//Append new row to sheet data.
The only posibility in SDK to do flushing is by the method Worksheet.Save() which I use. The documentation says that the method Save(): “saves the data in the DOM tree back to the part. It could be called multiple times as well. Each time it is called, the stream will be flushed. ” Unfortunetly this method doesn’t work so good as I thought.
I wanted to test my application performance to check if there is some performance improvement. My colleagues recommended me tools in the following order:
I downloaded the first apllication and I discovered that I wanted to have presentation of memory usage in runtime by chart. dotTrace don’t have this feature. CLR Profiler either. But when I downloaded and ran ANTS I was positively surprised. I recommend this tool, especially for memory profiling.
In my tests I generated Excel file with 100 000 records(rows). The results you can see below:
Application performance without buffering
Application performance with buffer size 10 000 records
As you can see the above first test was finished in 18 seconds and memory was growing steadily to 150 MB .In the second case, when I used buffer, the apllication was behaving differently. The application finished in 50 seconds and the memory usage was growing, droping and achieved ca. 220 MB.
Besides I made additional tests for checking how fast memory usage grows for different number of records. See the results.
All performance tests show that my solution is a failure. Flushing stream using method Save() with my own buffering doesn’t give me any improvement. I was expecting that execution time will be longer with regards to writing on disk costs, but I hoped my solution improve memory usage. In solution without as well as with buffering, memory usage finally grows linear. In the second case an abnormal phenomenon occured. Open XML Format SDK method Save() consumes a lot of memory instead of releasing it. Method Save() flush stream as it is shown on second chart above. Additionally, it uses more memory to do this.
Summary: Don’t use Open XML Format SDK if you want to effectively generate large amount of data in Open XML file.
Share your opinion and experience with us below or meet us on Twitter: @GOYELLO.
Thanks to Karol Świder for helping me write this blog post.