RavenDB: Queries and Indexes
So far we’ve covered the very basics and the document database nature of RavenDB. This time we take a look at queries and indexes. I’ve already mentioned that RavenDB behaves differently from most other databases in this area.
Queries Go Hand in Hand with Indexes
What do we define an index on in most databases? Usually we put the index on a stored piece of data: a column in a relational database, or a field in a document or object database. A query may then use the index to speed up processing. In most real applications we need an index, because otherwise the query is just too slow.
In RavenDB indexing works differently. RavenDB indexes the query instead of the fields in documents. Basically RavenDB takes our query, analyses it and derives an index which can answer it. Such indexes are called ‘dynamic indexes’. Every query we run builds or reuses a dynamic index. RavenDB tries to be smart about the indexes it builds. When a query is executed over and over again its index becomes a permanent index. Less frequently used queries only get a temporary index. Put together, a query in RavenDB is just a way to create a new ‘dynamic’ index. Because of this behavior RavenDB nearly always talks about indexes and not queries. From now on I will also refer to ‘indexes’ most of the time in this blog series.
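To make the idea of ‘the query creates the index’ a bit more tangible, here is a minimal sketch in Python. It is a toy in-memory model, not RavenDB’s client API: the class `ToyStore`, its methods and the `index_builds` counter are all made up for illustration. The first query on a field builds an index for it, and later queries on the same field reuse that index.

```python
from collections import defaultdict


class ToyStore:
    """Toy in-memory model of query-driven indexes (not RavenDB's API)."""

    def __init__(self):
        self.docs = {}         # document id -> document (a dict)
        self.indexes = {}      # queried field -> {field value -> set of ids}
        self.index_builds = 0  # how often an index had to be built from scratch

    def store(self, doc_id, doc):
        old = self.docs.get(doc_id)
        # Keep every index that already exists up to date.
        for field, index in self.indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(doc_id)
            if field in doc:
                index[doc[field]].add(doc_id)
        self.docs[doc_id] = doc

    def query(self, field, value):
        # The query decides which index exists: build on first use, then reuse.
        if field not in self.indexes:
            index = defaultdict(set)
            for doc_id, doc in self.docs.items():
                if field in doc:
                    index[doc[field]].add(doc_id)
            self.indexes[field] = index
            self.index_builds += 1
        return sorted(self.indexes[field].get(value, set()))


store = ToyStore()
store.store("posts/1", {"author": "ann", "title": "Hello"})
store.store("posts/2", {"author": "bob", "title": "World"})
store.query("author", "ann")  # first query on 'author' builds the index
store.query("author", "bob")  # second query reuses it
```

After these two queries `store.index_builds` is 1, because only the first query had to build anything. The real server is of course far more sophisticated about how it analyses a query; the sketch only shows the direction of the dependency.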
When Is Stuff Indexed?
Another difference is when RavenDB indexes a document. Most databases index data when it is inserted or updated. That also implies that indexes slow down the update and insert process.
RavenDB indexes documents in the background. When we store or update a document RavenDB puts it in a queue. Then a background task picks it up and updates all the existing indexes. However, this also implies that an index (and remember, a query always uses an index) might return a stale result. When we store a document and then query an index right away, the document might not be in there yet.
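The following sketch shows where stale results come from. Again it is a toy Python model and not RavenDB code: `ToyStore`, `run_indexer` and `query_by_author` are hypothetical names, and the ‘background task’ is a method we call by hand so that the behavior is easy to follow.

```python
from collections import deque


class ToyStore:
    """Toy model of queued, background indexing (not RavenDB's API)."""

    def __init__(self):
        self.docs = {}
        self.pending = deque()  # ids waiting for the background indexer
        self.by_author = {}     # the 'index': author -> set of document ids

    def store(self, doc_id, doc):
        # Writing is cheap: save the document and queue it for indexing.
        self.docs[doc_id] = doc
        self.pending.append(doc_id)

    def run_indexer(self):
        # Stands in for the background task; in this sketch we call it by hand.
        # Updates that change the author are not handled here.
        while self.pending:
            doc_id = self.pending.popleft()
            author = self.docs[doc_id]["author"]
            self.by_author.setdefault(author, set()).add(doc_id)

    def query_by_author(self, author):
        # Returns (results, is_stale); stale means writes are still queued.
        results = sorted(self.by_author.get(author, set()))
        return results, bool(self.pending)


store = ToyStore()
store.store("posts/1", {"author": "ann"})
before = store.query_by_author("ann")  # the indexer has not run yet
store.run_indexer()
after = store.query_by_author("ann")   # now the index has caught up
```

Here `before` is an empty result flagged as stale, while `after` contains `posts/1` and is not stale. The write itself never waited for the index, which is exactly the trade-off described above.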
Deal With the Stale Results
Now we have to deal with potentially stale results from an index. First, we can actually force RavenDB to return an up-to-date result. That means we wait until our last write has been indexed and then get the results. This can be achieved in two ways. We can turn it on for a particular query with the ‘WaitForNonStaleResultsAsOfLastWrite’ option. There are additional options available, which you can explore yourself.
Alternatively we can change the default consistency level for the session or even for the whole document store.
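Both approaches can be sketched with the same kind of toy model. This is Python and not RavenDB’s client API; the names `wait_by_default` and `wait_for_non_stale` are invented for this example and only mirror the idea of a per-query option versus a store-wide default. This time the indexer really runs on a background thread, and ‘waiting’ means blocking until the queue of pending writes is empty.

```python
import queue
import threading


class ToyStore:
    """Toy model of waiting for non-stale results (not RavenDB's API)."""

    def __init__(self, wait_by_default=False):
        self.wait_by_default = wait_by_default  # store-wide default
        self.docs = {}
        self.by_author = {}           # the 'index': author -> set of ids
        self.pending = queue.Queue()  # writes waiting to be indexed
        self.lock = threading.Lock()
        threading.Thread(target=self._index_loop, daemon=True).start()

    def store(self, doc_id, doc):
        with self.lock:
            self.docs[doc_id] = doc
        self.pending.put(doc_id)

    def _index_loop(self):
        # Background task: index queued documents one by one.
        while True:
            doc_id = self.pending.get()
            with self.lock:
                author = self.docs[doc_id]["author"]
                self.by_author.setdefault(author, set()).add(doc_id)
            self.pending.task_done()

    def query_by_author(self, author, wait_for_non_stale=None):
        # A per-query setting overrides the store-wide default.
        if wait_for_non_stale is None:
            wait_for_non_stale = self.wait_by_default
        if wait_for_non_stale:
            self.pending.join()  # block until every queued write is indexed
        with self.lock:
            return sorted(self.by_author.get(author, set()))


# Per-query: only this one query waits for the index.
public = ToyStore()
public.store("posts/1", {"author": "ann"})
fresh = public.query_by_author("ann", wait_for_non_stale=True)

# Store-wide default: every query on this store waits.
admin = ToyStore(wait_by_default=True)
admin.store("posts/2", {"author": "bob"})
also_fresh = admin.query_by_author("bob")
```

Both `fresh` and `also_fresh` are guaranteed to contain the document that was just written. A query on `public` without the flag returns immediately and may or may not see the latest write, depending on how far the background thread has come.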
Choose the Right Consistency
Now you probably think: ‘Well, I’ll just use the highest consistency level possible for all my queries/indexes’. Of course then we also pay the much higher cost of waiting for the index to be updated. It’s much better to think about which parts of our application can deal with stale results and which parts can’t.
Let’s look at an example. We build a simple blog / news site. Now let’s think about the consistency here. On the public website stale results shouldn’t be an issue. Why? Because you as a visitor can’t tell the difference between the ‘super-latest’ articles and the ones published a few seconds ago. When an article shows up a few seconds later it makes no difference to you. Of course we also have an ‘administration’ backend. For the website administrator who is editing articles the story is different. He has just edited an article and wants to see his changes immediately. If he encountered stale content he would probably think that his changes were lost. What’s the conclusion? We use the ‘QueryYourWrites’ consistency for the administration backend, while we allow stale results on the public website. That also makes a lot of sense performance-wise. Most traffic comes from visitors, and there we don’t spend any time ensuring that results aren’t stale. Only for the few website administrators do we spend a little more time getting the most recent data.
Permanent Indexes
So far we have only used dynamic indexes, which are created when we run a regular query. RavenDB also allows creating permanent, named indexes. Such an index is stored and maintained on the server until it is explicitly removed. We can create these permanent indexes, give them a name and later on use them directly. First we create an index definition in its own class.
After that we tell RavenDB that there are index definitions in our assembly. RavenDB will then pick up all the indexes defined in that assembly.
The index definition can contain all kinds of details. It can be a simple query or a complex map-reduce operation. I’m not going into details here due to lack of expertise and to keep the blog post short =). After we’ve created such an index we can use it in our queries by its name.
Permanent indexes are certainly useful for very advanced index tuning, or when we want to define some ‘view’ which returns certain data from our database. By the way, when a dynamic index is used often enough RavenDB will promote it to a permanent index, which we can treat like any other permanent index.
Conclusion
Now we’ve covered how queries and indexes work in RavenDB. Most of the time we can simply query RavenDB like any other database, and we never have to fear that a query runs slowly because of a missing index. On the other hand we need to be aware of stale results. Next time we look at another important design detail of RavenDB, which protects the database from killing itself due to programming mistakes. Stay tuned =).
Indexes can be defined in code (LINQ) using the AbstractIndexCreationTask
See here http://ayende.com/blog/4668/ravendb-defining-indexes
Good set of articles btw.
Thanks for the tip. Changed the index definition to use this API.
Great series! Really like the simplicity in your articles, they are nicely structured and right to the point and not overloaded by a lot of text. Also the pictures make it very fun to read.
(maybe you could write tutorials for the official site, Ayende?)
Keep up the good work!
Thanks =).