Often we face a problem where and how to store our domain objects. This decision may be of great importance later. In this article, I’ll present you an option of storing serialized domain objects in a database. We’ll look at XML object serialization in a relational database. We’ll get to know the pros and cons of such an approach.

When we may want to store a serialized object in the database?

The most popular approach of storing objects in relational databases is a table with a column per property.

Let’s assume we have an object like this:

public class Object
{
public int Id { get; set; }
public int Property1 { get; set; }
public int Property2 { get; set; }
public string Property3 { get; set; }
public string Property4 { get; set; }
} 

Standard approach would be to create a table with 5 columns:

Id Property 1 Property 2 Property 3 Property 4
         

All looks clear and simple. Yet, what if we have a complex object with dozens of properties which are not always primitive types but custom types? If we don’t use ORM code-first approach to generate database tables the process of creating tables, columns and mapping will be time-consuming.

There might be an idea to create a table with one xml column and store serialize object there. This might look like this:

When would you consider using that approach?

  • Objects to store are big so it’s time-consuming to maintain database schema and code mapping
  • Limited time for development
  • Objects rarely (or never) need to be modified

Interaction with the database doesn’t need any mapping so code is significantly reduced.

Reading an object might look like this:

Object obj = Deserialize(xmlFromDatabase);

And writing:

string xmlToSave = Serialize(obj);

I’m involved in an enterprise project that uses such an approach for some big and complex domain objects. I’d like to share with you what are downsides of such an approach that we face on during application maintenance.

Worse performance when interacting with the database

Reading and deserializing from the XML column will always take more time than reading from separate database columns. Loading a collection of objects might become a slow operation with this storage model. So, you’re limited to loading 1 object at a time. We want modern applications to be as fast as possible. That means that we would accept such a storage model only in back-office applications. When there is no risk that the client will leave the page because it is too slow.

No option to change serialized object class or property names

Let’s assume we have a property called Number on a class named Order. At some point, it turned out that the more appropriate name for this property would be Identifier. With a serialized object, we can’t rename this property as we’d lost old values stored in a database.

It’s even worse if we want to change the type of property. If we change non nullable type to nullable, int to string, boolean to enum, it’d result in serialization runtime error and object can’t be deserialized at all.

Such constraints make the development process harder as you cannot adjust your model to constantly changing business requirements. This might get especially problematic in big enterprise applications when appropriate names are important. If we use Domain-Driven Design the rule of ubiquitous language can easily be violated.

Difficult data fixing

Often, we have some data in the database that we want to modify (for example data fix scenario). To update serialized data, we need to write queries that change xml. This makes the task much more time consuming than it’d be if we could use the standard T-SQL Update statement. A simple query to update xml property might look like below:

UPDATE Table1

SET Column_XML.modify('

replace value of (/Property1/text())[1]

with "some value"

');

WHERE SOME CONDITION

It gets even more complicated if we want to modify xml collections.

Difficult search on a serialized column

Let’s say we have some production data already. There might appear a business need to search by data stored in a serialized column. So, we have 2 options, either we use XML query that might have a bad performance or we write a SQL script to migrate existing serialized data into a new column. A simple task might become troublesome with this storage model.

A full object must always be sent/received from the application

With this storage model, the easiest way to return the data to the client is to return full xml. There are downsides of this approach. If we don’t need all properties, we end up with worse performance. Moreover, we expose all the properties to the client even though we might not want to do this. It’s also an issue the other way around. The easiest way to update is that an application sends a full object to a database.

If we need to avoid these downsides there is an option to select or update required data by XPath queries. This requires additional, not always straightforward work tough.

Data duplication

We might decide to store some properties in a separate column (for example to enable faster database search queries). To avoid duplication, we should get rid of this property from serialized xml. This, however, requires additional effort and might not always be done. We end up with storing the same data in 2 places: specific column + xml. The first disadvantage is that we have an unnecessary database space allocated. Second is the need for keeping these 2 values in sync during updates. It’s easy to imagine someone making a data fix and updating only 1 place. We’d end up having inconsistent data.

Summary

As you can see there are many disadvantages by storing a serialized object in the relational database. Especially when application code constantly changes. Therefore, choosing it as a solution must be considered very carefully. Future requirements (that we don’t know yet) might be hard to achieve if we’ve some production data serialized in a database.

It might be a valuable option tough for data storage tables when the client only reads the content and performance is not of great importance (like reporting).

We should choose other solutions for such storage models. Some of them are JSON documents storage (available now in MSSQL) or NoSql databases. These give us better flexibility and performance than the model that we discussed.