RavenDB: Documents, Nothing But Documents
Last time we took our first steps with RavenDB. I’ve already mentioned that RavenDB is a document database (see also Wikipedia). However, in our examples we just stored objects. So where are the documents?
Object to JSON Document
What happens when we pass an object to the session’s store method? Under the covers, the RavenDB client library serializes that object to a JSON document. So your object graph is converted into a document.
That has a few implications. A document is a hierarchical data structure. Therefore your object should be hierarchically structured and shouldn’t contain any circular references. If you try to store an object with circular references you will get an exception.
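To make the object-to-document step concrete, here is a minimal sketch. It is written in Python with the standard json module and does not use the RavenDB client, so it only illustrates the idea. The names Person, City and to_document are made up for this example, and the ValueError is what Python's json module raises, not the exception type RavenDB uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class City:
    name: str


@dataclass
class Person:
    name: str
    city: City


def to_document(obj) -> str:
    """Serialize a whole object graph into one JSON document (illustrative helper)."""
    return json.dumps(asdict(obj))


# The city ends up nested inside the person document.
document = to_document(Person("Joanna", City("Zurich")))

# A circular structure has no hierarchical form, so serialization fails.
parent = {"name": "parent", "children": []}
child = {"name": "child", "parent": parent}
parent["children"].append(child)

try:
    json.dumps(parent)
    circular_failed = False
except ValueError:
    circular_failed = True
```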
When you pass in an object which references other objects, all referenced objects are serialized into a single document. What does that mean for us? It means that the child objects are not separate ‘units of storage’ in the database. You can only access the document and then peek into it. Let me demonstrate this with a simple example. We store persons with the city they’re living in. For this we create a person and a city class. Then we store a person with its city. Now we can easily query for persons, but not for cities. The reason is that the city was embedded into the person document, and therefore there’s no city document.
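The person and city example could look roughly like the sketch below. It uses a tiny in-memory dictionary as a stand-in for the database, so DocumentStore, its methods and the ids such as "persons/1" are assumptions for illustration, not RavenDB's API.

```python
import json


class DocumentStore:
    """Hypothetical in-memory stand-in for a document database."""

    def __init__(self):
        self._documents = {}

    def store(self, doc_id: str, obj: dict) -> None:
        # The whole object graph is serialized into one document.
        self._documents[doc_id] = json.dumps(obj)

    def load(self, doc_id: str) -> dict:
        return json.loads(self._documents[doc_id])

    def ids_with_prefix(self, prefix: str) -> list:
        # Crude replacement for "query all documents of one kind".
        return sorted(i for i in self._documents if i.startswith(prefix))


store = DocumentStore()
store.store("persons/1", {"name": "Joanna", "city": {"name": "Zurich"}})

person_ids = store.ids_with_prefix("persons/")
city_ids = store.ids_with_prefix("cities/")

# The city is only reachable by loading the person and peeking into it.
city_name = store.load("persons/1")["city"]["name"]
```

Only one document was written, so asking for city documents finds nothing, while the city's data is still available inside the person document.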
Document-Design
We’ve seen that RavenDB stores documents. Now the question is how we split our domain model across different documents. The general rule of thumb is to split your documents so that they match the operations of your application. By that I mean that a single operation shouldn’t have to meddle with hundreds of documents. Instead, a document should contain all the data needed for the most common operations.
Let’s look at an example, an oversimplified online shop. We store customers, orders and order-items.
Now how do we split that up? Everything in one document? Each entity in its own document? Well, let’s think about which operations we do in a shop and which entities we manipulate in those operations:
- Registering customer –> customer entity
- Changing customer data –> customer entity
- New ‘shopping’-tour –> order entity
- Adding and removing item –> order and order-item entity
- Showing shopping list –> order and order-item entity
- Send the order / finish shopping –> order and order-item entity
Looking at these operations, each one touches either the customer entity or the order together with its order-items. Therefore I suggest storing the customer in one document and the order with its order-items in another.
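As a rough sketch of that split, the two document shapes might look like this. The class and field names are invented for the example, and Python dataclasses stand in for the entity classes you would write for the RavenDB client.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Customer:
    # Stored as its own document.
    id: str
    name: str
    email: str


@dataclass
class OrderItem:
    # Never stored on its own; always embedded in an order.
    product: str
    quantity: int


@dataclass
class Order:
    # Stored as one document together with all of its items.
    id: str
    # The customer is a separate document, so only its id is kept here.
    customer_id: str
    items: List[OrderItem] = field(default_factory=list)

    def add_item(self, product: str, quantity: int) -> None:
        self.items.append(OrderItem(product, quantity))

    def remove_item(self, product: str) -> None:
        self.items = [i for i in self.items if i.product != product]


order = Order("orders/1", "customers/1")
order.add_item("Coffee", 2)
order.add_item("Tea", 1)
order.remove_item("Tea")

# Adding, removing and listing items all touch this single document.
order_document = asdict(order)
```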
Referencing Documents
We’ve decided to split up our entities into multiple documents. But how does one document reference another? That’s done by storing the id of the referenced document. Each document has an id by which it can be referenced. When your entity has a property ‘Id’, RavenDB will by convention put the id there. That way we get the id of a document and can use it for references. For example, an order can reference its customer by storing the customer’s id. (Here are the entity-classes for the example)
Later on we can load the referenced documents by id. However, we should remind ourselves that, where possible, an operation should work on more or less one document. If we load hundreds of documents by reference, we are doing something wrong or our problem is a bad fit for a document database.
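Following a reference by id can be sketched like this. FakeSession is a hypothetical stand-in that only counts how often the "database" is asked for a document; it is not the RavenDB session, and the document contents are made up.

```python
class FakeSession:
    """Hypothetical session that counts round trips; not RavenDB's API."""

    def __init__(self, documents: dict):
        self._documents = documents
        self.round_trips = 0

    def load(self, doc_id: str) -> dict:
        # Every load is one request to the database.
        self.round_trips += 1
        return self._documents[doc_id]


documents = {
    "customers/1": {"Id": "customers/1", "Name": "Joanna"},
    "orders/1": {"Id": "orders/1", "CustomerId": "customers/1", "Items": []},
}

session = FakeSession(documents)

# First load the order, then follow the stored id to the customer.
order = session.load("orders/1")
customer = session.load(order["CustomerId"])
```

Two loads mean two round trips, which is fine now and then but adds up quickly when many references are followed.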
Batch-Loading Referenced Documents
Of course, in reality there will be places where we need to load referenced documents. Loading a referenced document by its id creates an additional round trip to the database. Network round trips are costly, which is why we might want to get referenced documents in one go. We can do this by explicitly telling RavenDB to include referenced documents.
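The effect of including a referenced document can be sketched as follows. This is a simplified model and not the RavenDB client: FakeSession and the include parameter are assumptions for illustration. The idea is that the response for the order also carries the referenced customer, so a later load of the customer is answered locally.

```python
class FakeSession:
    """Hypothetical session with an 'include' option; not RavenDB's API."""

    def __init__(self, documents: dict):
        self._documents = documents
        self._loaded = {}        # documents already received by this session
        self.round_trips = 0

    def load(self, doc_id: str, include: str = None) -> dict:
        if doc_id in self._loaded:
            # Already delivered earlier, so no request is needed.
            return self._loaded[doc_id]
        self.round_trips += 1
        doc = self._documents[doc_id]
        self._loaded[doc_id] = doc
        if include is not None:
            # The referenced document travels along in the same response.
            referenced_id = doc[include]
            self._loaded[referenced_id] = self._documents[referenced_id]
        return doc


documents = {
    "customers/1": {"Id": "customers/1", "Name": "Joanna"},
    "orders/1": {"Id": "orders/1", "CustomerId": "customers/1", "Items": []},
}

session = FakeSession(documents)
order = session.load("orders/1", include="CustomerId")
customer = session.load(order["CustomerId"])  # served without a new request
```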
Don’t fear the Denormalization
As you might have already noticed, documents are not normalized! We pack things together in a document, and some data is stored redundantly. For example, when we store blog posts we certainly embed the tags in the same document. Above we’ve taken a look at references. Often we need only a few things from a referenced document. For example, in our web shop we want to show the user name for the order we are currently putting together. Instead of loading the referenced customer document every time, we could just store this information redundantly in our order document.
For example, we can create a class which holds the id and a name. This class is used to represent a reference to another document, but also copies the name of that document. That way we don’t need to do any document lookup as long as we only need the name.
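Such a reference class could be sketched like this. The names DenormalizedReference, Customer and Order are chosen for the example, and Python dataclasses stand in for the entity classes.

```python
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    id: str
    name: str
    email: str


@dataclass
class DenormalizedReference:
    """Reference to another document plus a copy of its name."""
    id: str
    name: str

    @staticmethod
    def to(document):
        # Copy only what is needed to show the reference without a lookup.
        return DenormalizedReference(document.id, document.name)


@dataclass
class Order:
    id: str
    customer: DenormalizedReference


customer = Customer("customers/1", "Joanna", "joanna@example.com")
order = Order("orders/1", DenormalizedReference.to(customer))

# The order document carries the customer's name, so no second load is needed.
order_document = asdict(order)
```

The price of the copy is that it can go stale: if the customer changes their name, orders that copied the old name have to be updated as well.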
Conclusion & Next Time
This time we’ve looked at documents and how we can split up data in different documents. Next time we will look at RavenDB’s queries and indexes because they behave quite differently than in most databases.
Great series. I’m looking forward to the next article! I’m also curious what Ayende thinks 😉
Thanks. No idea what Ayende is thinking. Maybe he’s secretly planning the next awesome thing =D
Great series so far. You did a really nice job explaining the denormalized references.
Ayende is at least retweeting your posts.