RavenDB: Documents, Nothing But Documents

Last time we took our first steps with RavenDB. I already mentioned that RavenDB is a document database (see also Wikipedia). However, in our examples we just stored objects. So where are the documents?

Object to JSON Document

What happens when we pass an object to the session’s Store method? Under the covers, the RavenDB client library serializes that object to a JSON document. So your object graph is converted into a document.

Documents, Nothing But Documents

That has a few implications. A document is a hierarchical data structure. Therefore your object graph should be hierarchical as well and must not contain any circular references. If you try to store an object with circular references, you will get an exception.
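To make the circular-reference restriction concrete, here is a minimal sketch. It uses System.Text.Json from the .NET base library purely as a stand-in for RavenDB’s own serializer, and the Department and Employee classes are hypothetical. The exception RavenDB’s client throws may well be a different one; the point is only that a cycle cannot be written out as a tree-shaped document.

```csharp
using System;
using System.Text.Json;

// Hypothetical classes that can point at each other and form a cycle.
public class Department
{
    public string Name { get; set; }
    public Employee Manager { get; set; }
}

public class Employee
{
    public string Name { get; set; }
    public Department Department { get; set; }
}

public static class CircularReferenceDemo
{
    // Returns true when the object graph can be written as a JSON document.
    public static bool CanSerialize(object graph)
    {
        try
        {
            JsonSerializer.Serialize(graph);
            return true;
        }
        catch (JsonException)
        {
            // System.Text.Json gives up here when it runs into an object cycle.
            return false;
        }
    }

    public static void Main()
    {
        var department = new Department { Name = "Research" };
        var manager = new Employee { Name = "Gamlor", Department = department };
        Console.WriteLine(CanSerialize(manager)); // hierarchical graph, no cycle yet

        department.Manager = manager;             // employee -> department -> employee
        Console.WriteLine(CanSerialize(manager)); // the graph now contains a cycle
    }
}
```

The usual way out is to break the cycle in the model, for example by keeping only the id of the other object instead of a direct reference.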

When you pass in an object which references other objects, all referenced objects are serialized into a single document. What does that mean for us? It means that the child objects are not separate ‘units of storage’ in the database. You can only access the document and then peek into it. Let me demonstrate this with a simple example. We store persons together with the city they’re living in. For this we create a Person and a City class. Then we store a person with its city. Now we can easily query for persons, but not for cities. The reason is that the city was embedded into the person document, and therefore there is no city document.

using (var session = documentStore.OpenSession())
{
	// We store a person and its city
	session.Store(new Person("Gamlor", new City("Vals")));
	session.SaveChanges();
}
using (var session = documentStore.OpenSession())
{
	// We can get the person, because it's in a person document
	var hasPeopleInDB = session.Query<Person>().Any();
	Console.Out.WriteLine("Do we have people-documents in the db? {0}",
		hasPeopleInDB ? "Yes, we do" : "No, we don't");
	// However, the city isn't in its own document
	var hasCitiesInDB = session.Query<City>().Any();
	Console.Out.WriteLine("Do we have city-documents in the db? {0}",
		hasCitiesInDB ? "Yes, we do" : "No, we don't");
}
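To see why there is no city document, it helps to look at the JSON that such an object graph turns into. The following sketch assumes simple Person and City classes (the real ones are not shown in this post) and again uses System.Text.Json as a stand-in for RavenDB’s serializer, so details such as metadata and property naming may differ from what RavenDB actually stores.

```csharp
using System;
using System.Text.Json;

// Assumed shape of the classes used in the example above.
public class City
{
    public City(string name) { Name = name; }
    public string Name { get; }
}

public class Person
{
    public Person(string name, City city) { Name = name; City = city; }
    public string Name { get; }
    public City City { get; }
}

public static class EmbeddedDocumentDemo
{
    // Serializes the whole object graph into one JSON text.
    public static string ToJson(Person person)
    {
        return JsonSerializer.Serialize(person);
    }

    public static void Main()
    {
        // The city does not get its own JSON text; it is nested inside the person.
        Console.WriteLine(ToJson(new Person("Gamlor", new City("Vals"))));
    }
}
```

The result is a single JSON object in which the city appears as a nested object under the person’s City property, which is exactly why a query for City documents finds nothing.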

Document-Design

We’ve seen that RavenDB stores documents. Now the question is how we split our domain model across different documents. The general rule of thumb is to split your documents so that they match the operations of your application. By that I mean that an operation shouldn’t have to touch hundreds of documents. Instead, a document should contain all the data needed for the most common operations.

Document Design

Let’s look at an example, an oversimplified online shop. We store customers, orders and order items:

Our Entity Model

Now how do we split that up? Everything in one document? Each entity in its own document? Let’s think about which operations we perform in a shop and which entities those operations manipulate:

  • Registering a customer –> customer entity
  • Changing customer data –> customer entity
  • Starting a new ‘shopping’ tour –> order entity
  • Adding and removing items –> order and order-item entity
  • Showing the shopping list –> order and order-item entity
  • Sending the order / finishing shopping –> order and order-item entity

Looking at these operations, each one touches either the customer entity alone or the order together with its order items. Therefore I suggest storing the customer in one document and the order with its order items in another.

Referencing Documents

We’ve decided to split our entities across multiple documents. But how does one document reference another? That’s done by storing the id of the referenced document. Each document has an id by which it can be referenced. When your entity has a property ‘Id’, RavenDB will by convention put the id there. That way we can get the id of a document and use it for references, as in this example, where the order references the customer by its id. (Here are the entity-classes for the example)

var customer = new Customer("Gamlor");
session.Store(customer);

// After storing we have a valid id
var firstOrder = new Order()
	{
		CustomerId = customer.Id
	};
firstOrder.AddToOrder(new OrderItem("Magic Unicorn"));
session.Store(firstOrder);

session.SaveChanges();
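The entity classes behind this snippet are not reproduced in the post. One plausible shape, inferred only from the calls used above, could look like the following sketch. The ProductName property and the body of AddToOrder are assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Stored as its own document.
public class Customer
{
    public Customer(string name) { Name = name; }
    public string Id { get; set; }   // RavenDB puts the document id here by convention
    public string Name { get; set; }
}

// Never stored on its own; it only lives inside an order document.
public class OrderItem
{
    public OrderItem(string productName) { ProductName = productName; }
    public string ProductName { get; set; }   // assumed property name
}

// Stored as its own document, with the items embedded.
public class Order
{
    public Order() { Items = new List<OrderItem>(); }
    public string Id { get; set; }
    public string CustomerId { get; set; }    // reference to the customer document
    public IList<OrderItem> Items { get; private set; }

    public void AddToOrder(OrderItem item)
    {
        Items.Add(item);
    }
}
```

Note that the order holds only the customer’s id as a plain string, while the order items are real child objects and therefore end up inside the order document.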

Later on we can load the referenced documents by id. However, we should remind ourselves that, whenever possible, an operation should work more or less on a single document. If we load hundreds of documents by reference, we are doing something wrong or our problem is a bad fit for a document database.

var order = session.Query<Order>().First();
var customer = session.Load<Customer>(order.CustomerId);

References to Other Documents

Batch-Loading Referenced Documents

Of course, in reality there will be places where we need to load referenced documents. The code above makes an additional round trip to the database to load the referenced document. Network round trips are costly, so we might want to get the referenced documents in one go. We can do this by explicitly telling RavenDB to include referenced documents. The subsequent Load call is then answered from the session instead of going back to the server. Like this:

var order = session.Query<Order>()
   .Customize(x => x.Include<Order>(o => o.CustomerId)) // Also load the customer
   .First();
var customer = session.Load<Customer>(order.CustomerId);

Batch Load Documents

Don’t Fear the Denormalization

As you might have already noticed, documents are not normalized! We pack things together in a document, and some data is stored redundantly. For example, when we store blog posts we certainly embed the tags in the same document. Above we’ve taken a look at references. Often we need only a few things from a referenced document. For example, in our web shop we want to show the user name on the order that is currently being put together. Instead of loading the referenced customer document every time, we could just store this information redundantly in our order document.

Denormalisation by Copying Information

For example we can create a class which holds the id and a name. This class is used to represent a reference to another document, but also copies the name of that document. That way we don’t need to do any document lookup as long as we only need the name:

public interface INamedObject
{
	string Id { get; set; }
	string Name { get; set; }
}

internal class Customer : INamedObject
{
	public Customer(string name)
	{
		Name = name;
	}
	public string Id { get; set; }
	public string Name { get; set; }
	public string Address { get; set; }
}

internal class Order
{
	public Order()
	{
		Items = new List<OrderItem>();
	}
	public string Id { get; set; }
	public DenormalizedReference<Customer> CustomerReference { get; set; }
	public IList<OrderItem> Items { get; private set; }
}
// Denormalized reference, which stores the name of the named object.
public class DenormalizedReference<T> where T : INamedObject
{
	public string Id { get; set; }
	public string Name { get; set; }

	public static implicit operator DenormalizedReference<T>(T doc)
	{
		return new DenormalizedReference<T>
				   {
					   Id = doc.Id,
					   Name = doc.Name
				   };
	}
}
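A short usage sketch may help here. It repeats trimmed-down versions of the classes above so that it compiles by itself, and the id "customers/1" is just a made-up placeholder, since in a real session RavenDB assigns the id when the customer is stored.

```csharp
using System;

public interface INamedObject
{
    string Id { get; set; }
    string Name { get; set; }
}

public class Customer : INamedObject
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class DenormalizedReference<T> where T : INamedObject
{
    public string Id { get; set; }
    public string Name { get; set; }

    public static implicit operator DenormalizedReference<T>(T doc)
    {
        return new DenormalizedReference<T> { Id = doc.Id, Name = doc.Name };
    }
}

public class Order
{
    public DenormalizedReference<Customer> CustomerReference { get; set; }
}

public static class DenormalizedReferenceDemo
{
    public static void Main()
    {
        // Placeholder id; normally assigned by RavenDB on Store.
        var customer = new Customer { Id = "customers/1", Name = "Gamlor" };

        // The implicit operator copies the id and the name into the order.
        var order = new Order { CustomerReference = customer };

        Console.WriteLine(order.CustomerReference.Name); // prints "Gamlor"
        Console.WriteLine(order.CustomerReference.Id);   // prints "customers/1"

        // The copy is a snapshot: renaming the customer does not touch the order.
        customer.Name = "Roman";
        Console.WriteLine(order.CustomerReference.Name); // still prints "Gamlor"
    }
}
```

The last lines show the price of this convenience: the copied name can go stale, so a rename of the customer has to be propagated to the orders if they should display the new name.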

Conclusion & Next Time

This time we’ve looked at documents and how we can split up data in different documents. Next time we will look at RavenDB’s queries and indexes because they behave quite differently than in most databases.
