What do I want to achieve?I want to generate Excel file in Open XML format with large amount of data, ca. 200 000 records. My application has to read data from database and generate reports in real-time with as low memory usage as possible .
My solutionTo solve my issue I decided to use the newest version of Open XML Format SDK 2.0. This SDK provides very simple creating the Open XML code from scratch and has very nice features for creating and manipulating Open XML files, like strongly typed document object model or tool for generating C# code named Document Reflector. I thought this SDK would help me to create Excel spreedsheets in simple and fast way and, moreover, solve my performance issues. In order to improve memory usage I wanted to write partialy buffered generated Excel file stream into file. I was disappointed when I didn’t find any feature to do this in SDK. So I did buffering myself. In code where I append data to spreadsheet I flush stream into file when was added “BufferSize” records. This shows code below:
foreach (Record m in dataList)
Row contentRow = CreateContentRow(index, m); // my own method to create row content
//Append new row to sheet data.
if (index % BufferSize == 0)
Performance profilingI wanted to test my application performance to check if there is some performance improvement. My colleagues recommended me tools in the following order: Results In my tests I generated Excel file with 100 000 records(rows). The results you can see below: As you can see the above first test was finished in 18 seconds and memory was growing steadily to 150 MB .In the second case, when I used buffer, the apllication was behaving differently. The application finished in 50 seconds and the memory usage was growing, droping and achieved ca. 220 MB. Besides I made additional tests for checking how fast memory usage grows for different number of records. See the results.
ConclusionsAll performance tests show that my solution is a failure. Flushing stream using method Save() with my own buffering doesn’t give me any improvement. I was expecting that execution time will be longer with regards to writing on disk costs, but I hoped my solution improve memory usage. In solution without as well as with buffering, memory usage finally grows linear. In the second case an abnormal phenomenon occured. Open XML Format SDK method Save() consumes a lot of memory instead of releasing it. Method Save() flush stream as it is shown on second chart above. Additionally, it uses more memory to do this. Summary: Don’t use Open XML Format SDK if you want to effectively generate large amount of data in Open XML file. Share your opinion and experience with us below or meet us on Twitter: @GOYELLO.
Thanks to Karol Świder for helping me write this blog post.