Beginner's Guide to RavenDB

December 2011

Overview

RavenDB fits into a movement that is called "NoSQL", which, of course, implies that SQL is not used to retrieve data from the data store. But it also implies that a schema is not defined before data is written to the store. After all, if we defined a schema, why wouldn't we use SQL to query the data? This lack of SQL or schema may seem a bit odd. The idea is to make querying very simple, fast and scalable.

This article will serve as an introduction to the NoSQL way of thinking in the specific context of RavenDB. It may really help you to read up on NoSQL and all the reasons for its existence before trying to understand RavenDB itself. On the other hand, if you're a .NET programmer it may help you to see, and play with, examples that build on your expertise. RavenDB was specifically built to provide a nice NoSQL data store for .NET programmers. This limits its user-base but increases its utility there.

About ten years ago programmers heard quite a bit of buzz about "Object Databases". These never took off in any major way. But they're back in technologies like RavenDB where everything is stored as an object.

Installation

Installation of RavenDB is very simple. Head over to ravendb.net/download and grab either the stable or the unstable compiled binaries. They come as a zip file, which you can unpack into a "RavenDB" folder in "Program Files". After that I'd recommend opening a command prompt and starting the server by executing /Server/Raven.Server.exe inside that folder. The server logs to the console, so you'll be able to see what's going on as you do things like add indexes. The "raven db management studio" Silverlight UI can then be accessed by browsing to localhost:8080/.

Storing Your Data

As you get used to thinking the NoSQL way, a few things will become clear:

  1. Denormalization is acceptable and normal.
  2. Joins are typically not desirable.
  3. Data does not need to be flat.

#1 and #2 are corollaries. If you have data in an RDBMS that you are thinking about putting into RavenDB, think about how you would denormalize it to make it very easy to query later. Disk is cheap, so we'll store data redundantly so that we can read and write it fast.

#3 will become clear when you understand that RavenDB is a document database. Think of a document as a .NET class. A .NET class can have any number of properties including arrays of other objects. A class instance with all of its data can be stored and retrieved as one entry in the database. An RDBMS stores records; RavenDB stores object instances.
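To make "one object, one entry" concrete, here is a minimal C# sketch (not RavenDB code) that turns an Article with embedded tags and a denormalized category name into a single JSON document, which is the format RavenDB stores. It assumes .NET Core 3.0 or later and uses System.Text.Json purely as a stand-in for RavenDB's own serializer; the DocumentSketch class is hypothetical, and the Tag class and CategoryName property mirror the ones used later in this article.

```csharp
using System.Text.Json;

// Embedded child type: tags live inside the article, not in a separate table.
public class Tag
{
    public string Name { get; set; }
}

public class Article
{
    public string Id { get; set; }
    public string CategoryName { get; set; } // denormalized copy, no join needed
    public string Title { get; set; }
    public Tag[] Tags { get; set; }          // nested objects: the data is not flat
}

// Hypothetical helper for illustration only.
public static class DocumentSketch
{
    // Serializes the whole object graph into one JSON string: one entry, one read.
    public static string ToJson(Article article) => JsonSerializer.Serialize(article);
}
```

The whole graph, tags included, ends up in one string, so reading the article back never requires a second lookup.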

RavenDB has the following organizational concepts in terms of the data it stores: Documents, Collections and Indexes. That's it! A Collection is a series of Documents. Indexes index and organize Collections.

Interfacing with RavenDB

.NET programmers working with RavenDB interface with it exclusively via objects. When we store something we store an object. When we query the store we get back objects. There's no layer in between that the programmer needs to translate to/from in order to access the store. Let's look at some code to clarify.

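using Raven.Client;          // IDocumentStore, IDocumentSession
using Raven.Client.Document; // DocumentStore

public class Article
{
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

class Program
{
    static void Main()
    {
        Article ravenIntro = new Article()
          { Title = "RavenDB Introduction", 
            Text = "RavenDB fits into a movement called ..." };

        // The document store is the long-lived object; Initialize() prepares it for use.
        using (IDocumentStore documentStore = new DocumentStore() 
        { Url = "http://localhost:8080" })
        {
            documentStore.Initialize();
            // Sessions are cheap and short-lived: open, do a little work, dispose.
            using (IDocumentSession session = documentStore.OpenSession())
            {
                session.Store(ravenIntro);
                session.SaveChanges(); // sends the stored object to the server
            }        
        }
    }
}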

We have a class called Article that has a few simple properties. Without any setup in RavenDB we can create an instance of it and ask RavenDB to store it. We talk to RavenDB inside the scope of a session. Sessions are intended to be short-lived, open-and-close affairs. Store only records the object in the session; nothing is sent to the server until SaveChanges is called. In fact, if a single session makes more than 30 requests to the server you will get an exception that politely asks you not to do that. This is intended to enforce good programming practice.
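The request limit is easier to picture with a toy model. The following C# sketch is hypothetical and is NOT the RavenDB client: ToySession only models the request budget, under the assumption that queuing an object with Store is free while each SaveChanges or Load costs one round trip. The limit of 30 mirrors the default described above.

```csharp
using System;
using System.Collections.Generic;

// Toy unit-of-work session for illustration only; it has no real server or storage.
public class ToySession
{
    public const int MaxRequests = 30; // mirrors the default limit described above

    private readonly List<object> _pending = new List<object>();

    public int RequestCount { get; private set; }
    public int PendingCount => _pending.Count;

    // No round trip: the object is just remembered until SaveChanges().
    public void Store(object entity) => _pending.Add(entity);

    // One round trip sends every queued object as a single batch.
    public void SaveChanges()
    {
        CountRequest();
        _pending.Clear();
    }

    // One round trip per load; with no storage behind it, this sketch returns null.
    public object Load(string id)
    {
        CountRequest();
        return null;
    }

    private void CountRequest()
    {
        if (RequestCount >= MaxRequests)
            throw new InvalidOperationException(
                "Request budget exhausted: open a new session instead of reusing this one.");
        RequestCount++;
    }
}
```

Storing a hundred objects and saving them once costs a single request in this model, while a loop of loads burns through the budget quickly, which is the kind of chatty session the real limit is meant to catch.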

The important part here is that we store the object directly. We don't have to restructure the data in any way to get RavenDB to understand it. If this is the first object of type Article that RavenDB has to store it will start a new Collection. The next time RavenDB is asked to store an object of type Article it will know to add it to the existing collection.

The format RavenDB uses to store the object is JSON. This provides a compact and easily readable format. As stated, the NoSQL movement prides itself on being able to store "unstructured" (schema-less) data. RavenDB is technically unstructured, but if you access RavenDB in the usual way (via .NET code and not by directly inputting JSON) you essentially have a schema or data contract: your class definitions. It is certainly possible, though, for you to change a class definition and store object instances that have different properties inside the same collection.
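Here is a small C# sketch of what a class change means for documents already saved. It assumes .NET Core 3.0 or later and uses System.Text.Json as a stand-in for RavenDB's serializer; SchemaDriftSketch is a hypothetical name. A document written before CategoryName existed still loads into the newer class, with the missing property left at its default value.

```csharp
using System.Text.Json;

// Current class definition: CategoryName was added after some documents were saved.
public class Article
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string CategoryName { get; set; }
}

public static class SchemaDriftSketch
{
    // A document written by the older class definition: it has no CategoryName.
    public const string OldDocument =
        "{\"Id\":\"articles/1\",\"Title\":\"RavenDB Introduction\"}";

    // Properties missing from the JSON keep their defaults (null for strings),
    // so old and new documents can sit side by side in one collection.
    public static Article Read(string json) => JsonSerializer.Deserialize<Article>(json);
}
```

The class, not the store, is what acts as the schema here, which is why code that reads such a collection has to tolerate nulls in newly added properties.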

Retrieving data is equally simple:

using (IDocumentStore documentStore = new DocumentStore()
{ Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    using (IDocumentSession session = documentStore.OpenSession())
    {
        Article ravenIntro = (from a in session.Query<Article>() 
          where a.Title == "RavenDB Introduction" 
          select a).First<Article>();
    }        
}

Once again we retrieve the object directly - no need to access RavenDB and then translate the data received into our objects.

Id Generation

RavenDB supports the concepts of unique identifier generation and retrieving documents based on a single, unique identifier. ID generation is simple. When we store an object RavenDB generates a unique identifier and stores it with the object. The identifier is a string formatted as follows: "collectionName/number". For example "articles/123". If we want access to the identifier for an object we can add a property named "Id" to our class. Note that this should be a property with get and set accessors - not a field. Notice that our Article class had an "Id" property. We can see what identifier RavenDB has generated after we call SaveChanges. For example:

session.Store(ravenIntro);
session.SaveChanges();
string id = ravenIntro.Id;

We can also fetch objects by their identifier:

Article ravenIntro = session.Load<Article>("articles/123");
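As a rough mental model of the "collectionName/number" format, here is a hypothetical C# sketch of an identifier generator that keeps one counter per collection. It is not how RavenDB does it: the real client reserves ranges of numbers from the server using a HiLo scheme and applies its own naming conventions. The assumption here is simply that the prefix is the lower-cased type name plus "s".

```csharp
using System;
using System.Collections.Generic;

public class Article
{
    public string Id { get; set; }
}

// Simplified, hypothetical id generator for illustration only.
public class SimpleIdGenerator
{
    private readonly Dictionary<string, long> _lastNumber = new Dictionary<string, long>();

    // Naive assumption: "Article" -> "articles".
    public static string IdPrefixFor(Type type) => type.Name.ToLowerInvariant() + "s";

    public string NextId(Type type)
    {
        string prefix = IdPrefixFor(type);
        _lastNumber.TryGetValue(prefix, out long last); // 0 when the prefix is unseen
        long next = last + 1;
        _lastNumber[prefix] = next;
        return prefix + "/" + next;
    }
}
```

Each type gets its own sequence, so "articles/1" and "tags/1" can coexist, which is why an identifier only makes sense together with its collection prefix.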

Querying Basics

Notice the use of LINQ in our previous example. LINQ is an essential part of RavenDB and really makes it easy for .NET programmers to query the data store. In fact, LINQ is not only used to create queries in code, it is the syntax used to make indexes inside of RavenDB itself. If you're familiar with LINQ and can write LINQ to filter, group and order data, you're going to be happy. Note that we can use the language-integrated LINQ syntax (as in the last example) or the LINQ extension methods, like so:

using (IDocumentStore documentStore = new DocumentStore()
{ Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    using (IDocumentSession session = documentStore.OpenSession())
    {
        Article ravenIntro = session.Query<Article>()
            .Where(a => a.Title == "RavenDB Introduction")
            .First<Article>();
    }        
}
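The two forms really are interchangeable: the compiler rewrites the query expression into the extension-method calls. This C# sketch shows that with LINQ to Objects over an in-memory list rather than RavenDB (where session.Query returns a queryable that RavenDB translates into its own query), so it demonstrates the syntax only; QuerySyntaxSketch and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public static class QuerySyntaxSketch
{
    // Query-expression form; the compiler rewrites it into a .Where(...) call.
    public static Article ByTitleQuerySyntax(IEnumerable<Article> articles, string title) =>
        (from a in articles where a.Title == title select a).First();

    // Extension-method form of the same query.
    public static Article ByTitleMethodSyntax(IEnumerable<Article> articles, string title) =>
        articles.Where(a => a.Title == title).First();
}
```

Both methods return the same object for the same input; pick whichever reads better, and fall back to method syntax when you need an operator that has no query keyword.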

The place to start with "ad-hoc" queries in code is with the IDocumentSession.Query<T> method (as in the last example). The type T that is specified tells RavenDB which Collection we're querying. After that we're free to use LINQ to filter and order the results from the Collection. Note that grouping is not allowed in session queries. If you try you will get an exception that simply reads: "Method not supported: GroupBy". Grouping is certainly possible but it must be declared "up front" on the database side. This is done via indexing - our next topic.

Indexing Basics

In our last example we were querying the Article collection on Title. When we run a query like this, RavenDB automatically creates a temporary index on Article.Title for us, and if we keep running it often enough RavenDB promotes that index to a permanent one. This is handy and requires no work on our part. We can also define indexes manually. This is typically done in the "raven db management studio" UI running on localhost:8080/. Click on Indexes, then New Index. Specify a name for the index and then enter a query for the index in the "map" field. For example, to index the articles by title:

from article in docs.Articles select new { article.Title }

The general idea is that you create a LINQ query that returns the data elements that should be indexed. RavenDB then does the rest. Note that there are other, optional fields for the index: "reduce" and "transform results". Reduce forms the second half of the ubiquitous NoSQL Map/Reduce paradigm. You can think of Map/Reduce as the NoSQL replacement for rowset selection via SQL. Think of Map as simply the selector for all the data that your query cares about. The Reduce query is then performed on the Map data to group or summarize it. When you need to use a LINQ "group x into" statement you will typically put it in the Reduce part of an index. Note that Reduce is not required and can be left blank.
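The Map/Reduce split can be tried out without a server. The following C# sketch runs the two phases with LINQ to Objects to count articles per category. It is an illustration only: real RavenDB indexes run on the server and use anonymous types, whereas named classes are used here so the results are easy to check, and MapReduceSketch and CategoryCount are hypothetical names. Keeping the Reduce output in the same shape as the Map output means Reduce can be run again over partial results, which is what RavenDB expects of a Reduce.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string Title { get; set; }
    public string CategoryName { get; set; }
}

// Shape shared by the Map output and the Reduce output.
public class CategoryCount
{
    public string CategoryName { get; set; }
    public int Count { get; set; }
}

public static class MapReduceSketch
{
    // Map: keep only what the query cares about, one entry per document.
    public static IEnumerable<CategoryCount> Map(IEnumerable<Article> docs) =>
        from article in docs
        select new CategoryCount { CategoryName = article.CategoryName, Count = 1 };

    // Reduce: group the mapped entries and summarize each group. Because the
    // output has the same shape as the input, Reduce can run again over its own
    // partial results and still give the same answer.
    public static IEnumerable<CategoryCount> Reduce(IEnumerable<CategoryCount> results) =>
        from result in results
        group result by result.CategoryName into g
        select new CategoryCount { CategoryName = g.Key, Count = g.Sum(x => x.Count) };
}
```

Reducing the whole set at once, or reducing two batches separately and then reducing their combined output, gives the same counts.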

Note: As of build 531 when you save an index the UI isn't very helpful when an error occurs. You can see more information about the error in the command-line shell that you're running the server in.

All LINQ queries for indexes must return anonymous types. For instance we can't have the Map statement return the objects directly like this:

from article in docs.Articles select article

If we try this we'll get a "Variable initializer must be a select query expression returning an anonymous object" exception. Always use "select new { ... }".

Indexes can also be created directly from code. This is typically done by creating a class that inherits from AbstractIndexCreationTask. For example:

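using System.Linq;            // query syntax in the Map expression
using Raven.Client.Indexes;   // AbstractIndexCreationTask

public class ArticleTitleIndex : AbstractIndexCreationTask<Article>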
{
    public ArticleTitleIndex()
    {
        Map = articles => from article in articles select new { article.Title };
    }
}

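Once the index class has been sent to the server (for example by calling IndexCreation.CreateIndexes(typeof(ArticleTitleIndex).Assembly, documentStore) once at application startup), the RavenDB UI shows an "ArticleTitleIndex" index with the following Map: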

docs.Articles
    .Select(article => new {Title = article.Title})

It's worth noting that we can use language-integrated LINQ syntax or the LINQ IEnumerable extension methods in indexes as well. This is important because not all extension methods, for example, Any() and All(), have a corresponding language-integrated keyword. Anyway, we can now query Articles based on Title and expect results very quickly.

RavenDB and Lucene

Now that we know about indexing and querying, we should understand that underneath the covers RavenDB translates LINQ "where" clauses into Lucene queries. Lucene is a mature and powerful text indexing system. RavenDB uses Lucene to do all the heavy lifting with regard to range-based, partial string and full text searches. When querying indexes via the UI you'll have to use Lucene query syntax (which I won't explain here). We can also tap into Lucene's search capabilities in code via the LuceneQuery method:

var articles = from a in session.Advanced.LuceneQuery<Article>("ArticleTitleIndex")
  .Where("Title:\"RavenDB Introduction\"") select a;
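Because the Where call above takes a raw Lucene string, any quotes inside the searched value have to be escaped by hand. Here is a hypothetical C# helper (not part of RavenDB) that builds a field:"phrase" clause, assuming Lucene's classic query parser rules, under which a backslash escapes a double quote or another backslash inside a quoted phrase.

```csharp
// Hypothetical helper for illustration only; not a RavenDB or Lucene API.
public static class LuceneQueryText
{
    // Builds field:"value", escaping backslashes first and then double quotes,
    // so the value cannot end the phrase early.
    public static string Phrase(string field, string value)
    {
        string escaped = value.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```

For example, LuceneQueryText.Phrase("Title", "RavenDB Introduction") produces the same clause as the hand-written query above, while a title containing quotation marks still yields a single well-formed phrase.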

Querying with GroupBy

As mentioned previously, we can't write ad-hoc "GROUP BY" queries in session LINQ queries. To meet that need we define indexes that have a Reduce query. We're allowed to use the GroupBy method (or LINQ "group x into g") in the Reduce phase, which is aptly named for this purpose. Let's start with an example. We're going to add a CategoryName property to our Article class:

public class Article
{
    public string Id { get; set; }
    public string CategoryName { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

Next we'll define an index that finds unique category names (like SQL DISTINCT). This highlights the dual role of indexes. The first role is to index data to increase query performance. The second is to transform the data into something new, which is called a Projection in RavenDB. When we use Reduce (or TransformResults, discussed later) we may not return objects of the same type as those stored in the Collection we're querying. This makes sense for our DISTINCT query: we're going to return category names, not Article objects. Let's look at the index we would create for this:

Map = from article in docs.Articles
      select new { Name = article.CategoryName }
Reduce = from result in results group result by result.Name into g 
         select new { Name = g.Key }
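To see the projection at work, here is a C# sketch that mimics a DISTINCT-style index with LINQ to Objects: Map keeps only the category name (as a property called Name, to line up with the Category class), Reduce collapses duplicates, and the result is copied into Category objects rather than Article objects. DistinctCategoriesSketch is a hypothetical name, and the in-memory query only imitates what the server does.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string Title { get; set; }
    public string CategoryName { get; set; }
}

public class Category
{
    public string Name { get; set; }
}

public static class DistinctCategoriesSketch
{
    public static List<Category> Run(IEnumerable<Article> docs)
    {
        // Map: one anonymous { Name } entry per article.
        var mapped = from article in docs
                     select new { Name = article.CategoryName };
        // Reduce: one entry per distinct name, same { Name } shape as the Map.
        var reduced = from result in mapped
                      group result by result.Name into g
                      select new { Name = g.Key };
        // Projection: the caller gets Category objects, not Article objects.
        return reduced.Select(r => new Category { Name = r.Name }).ToList();
    }
}
```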

What is returned by this index is not an Article object. It's an object that has one property: "Name". Let's suppose we defined a class called Category as follows:

public class Category
{
    public string Name { get; set; }
}

We could now write code to query the index (named "ArticleCategories") as follows:

var categories = from c in session.Query<Category>("ArticleCategories") select c;

Notice we used T=Category for Query<T>. RavenDB would have happily populated any object with a Name property. We didn't have to do anything special to make RavenDB aware of the definition of our Category class (schema-less operation). We also passed a string to the Query method: this is the name of the index.

The last basic concept of querying is that of TransformResults. Like Reduce, this is a place where we can do grouping and summarization. There are two things that we can do in TransformResults that we can't do in Reduce:

  1. Reference documents in a different collection than Map is querying on.
  2. Return results in a different format than was returned by Map.

#1 means that you can effectively do a JOIN to other data. As stated, in NoSQL land we try not to organize the data such that we have to do joins. But sometimes this just isn't reasonable. To do a join we typically rely on the unique identifier ("Id" property) that was generated for our objects. We'll show how that's done in the JOIN example of the Common Query Examples section.

Common Query Examples

Text Contains

aka partial match
aka wildcard search

The important thing to know about doing more advanced text matching/searches is that RavenDB uses Lucene behind the scenes. Lucene is powerful and has many options for indexing text. Some of this complexity gets exposed through RavenDB so that you can choose the right option. For most of the text matching you're used to, Lucene's "StandardAnalyzer" will do. This is easily set up when you create an index. And yes, in order to do partial text matching (CONTAINS) you have to make an index.
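Why does analyzing a field enable partial matching? At index time the text is broken into lower-cased words, so a query for one word matches it anywhere in the text, while a fragment of a word does not match unless a wildcard query is used. The following C# sketch is a very rough, hypothetical approximation of that idea (ToyAnalyzer is not Lucene's StandardAnalyzer, which has many more tokenization rules and a longer English stop-word list).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy analyzer for illustration only.
public static class ToyAnalyzer
{
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "the", "of", "to" };

    // Split into runs of letters/digits, lower-case them, drop stop words.
    public static List<string> Tokenize(string text) =>
        Regex.Matches(text, @"[\p{L}\p{Nd}]+")
             .Cast<Match>()
             .Select(m => m.Value.ToLowerInvariant())
             .Where(token => !StopWords.Contains(token))
             .ToList();

    // Whole-word, case-insensitive match against the indexed tokens.
    public static bool ContainsWord(string text, string word) =>
        Tokenize(text).Contains(word.ToLowerInvariant());
}
```

With this model "Nuts and Protein" is indexed as the tokens "nuts" and "protein", so "PROTEIN" matches but "Prot" does not.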

Class Definitions

public class Article
{
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

Data

Article ravenIntroArticle = new Article()
{
    Text = "RavenDB fits into a movement that is called ...", 
    Title = "RavenDB Introduction",
};

Article csharpUsingArticle = new Article()
{
    Text = "The full value of the C# using statement ...",
    Title = "Your Friend the C# Using Statement",
};

Article nutsAndProteinArticle = new Article()
{
    Text = "Nuts are a great source of protein ...",
    Title = "Nuts and Protein",
};

using (IDocumentSession session = documentStore.OpenSession())
{
    session.Store(ravenIntroArticle);
    session.Store(csharpUsingArticle);
    session.Store(nutsAndProteinArticle);
    session.SaveChanges();
}

Index

We create an index named "ArticleTitle" in the usual way:

from article in docs.Articles select new { article.Title }

Now we add a field specification to the index. Click "Add Field" in the UI. Specify "Title" as the Field and set Indexing to "Analyzed". This will cause RavenDB to use Lucene's StandardAnalyzer.

Query

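Now we can do partial text queries on the index. Because the Title field is analyzed, each title has been broken into lower-cased words, so a plain term query matches a single word anywhere in the title. (Wildcards are not expanded inside a quoted phrase, so a query such as Title:"*Protein" would not work.)

var articles = from a in session.Advanced.LuceneQuery<Article>("ArticleTitle")
                          .Where("Title:protein") select a;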

Querying and Indexing Sub/Child Objects

Overview

Creating indexes on child object properties is fairly simple and only requires a Map query.

Class Definitions

public class Article
{
    public string Id { get; set; }
    public Tag[] Tags { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class Tag
{
    public string Name { get; set; }
    public int Count { get; set; }
}

Data

using (IDocumentStore documentStore = new DocumentStore()
{ Url = "http://localhost:8080" })
{
    documentStore.Initialize();

    Article ravenIntroArticle = new Article()
    {
        Text = "RavenDB fits into a movement that is called ...",
        Title = "RavenDB Introduction"
    };

    Article csharpUsingArticle = new Article()
    {
        Text = "The full value of the C# using statement ...",
        Title = "Your Friend the C# Using Statement"
    };

    Article nutsAndProteinArticle = new Article()
    {
        Text = "Nuts are a great source of protein ...",
        Title = "Nuts and Protein"
    };

    Tag documentsTag = new Tag() { Name = "documents database" };
    Tag dotNetTag = new Tag() { Name = ".NET" };
    Tag noSqlTag = new Tag() { Name = "NoSQL" };
    Tag csharpTag = new Tag() { Name = "C#" };
    Tag nutsTag = new Tag() { Name = "nuts" };

    ravenIntroArticle.Tags = new Tag[] { documentsTag, dotNetTag, csharpTag };
    csharpUsingArticle.Tags = new Tag[] { dotNetTag, csharpTag };
    nutsAndProteinArticle.Tags = new Tag[] { nutsTag };

    using (IDocumentSession session = documentStore.OpenSession())
    {
        session.Store(ravenIntroArticle);
        session.Store(csharpUsingArticle);
        session.Store(nutsAndProteinArticle);
        session.SaveChanges();
    }
}

Each article has a number of Tag subobjects.

Index

The index is not necessary for the query to run but is a good idea if the data is large.

Map = from article in docs.Articles
      from tag in article.Tags
      select new { tag.Name }
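The second "from" in this Map fans each article out into one index entry per tag. Here is a C# sketch of the same shape with LINQ to Objects, plus the in-memory equivalent of the Any() query that follows; ChildIndexSketch and its method names are hypothetical and only imitate what the server does.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Tag
{
    public string Name { get; set; }
}

public class Article
{
    public string Title { get; set; }
    public Tag[] Tags { get; set; }
}

public static class ChildIndexSketch
{
    // Same shape as the Map: an article with three tags yields three entries.
    public static List<(string Title, string TagName)> Map(IEnumerable<Article> docs) =>
        (from article in docs
         from tag in article.Tags
         select (Title: article.Title, TagName: tag.Name)).ToList();

    // In-memory equivalent of the Any() query: articles carrying a given tag.
    public static List<Article> WithTag(IEnumerable<Article> docs, string name) =>
        docs.Where(article => article.Tags.Any(t => t.Name == name)).ToList();
}
```

With the sample data (three, two and one tags) the Map produces six entries, and asking for ".NET" returns the two technology articles.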

Query

var articles = from article in session.Query<Article>()
               where article.Tags.Any(t => t.Name == ".NET")
               select article;

Grouping Flat Data Into Hierarchical Structure

aka GROUP BY
aka grouping with children

Overview

If you're storing data denormalized, sometimes you may want to query it so that it is organized hierarchically. We'll write a query to take "flat" data and organize it into parents and children.

Class Definitions

public class Article
{
    public string CategoryName { get; set; }
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class Category
{
    public string Name { get; set; }
    public Article[] Articles { get; set; }
}

Data

using (IDocumentStore documentStore = new DocumentStore()
{ Url = "http://localhost:8080" })
{
    documentStore.Initialize();

    Article ravenIntroArticle = new Article()
    {
        CategoryName = "Technology",
        Text = "RavenDB fits into a movement that is called ...",
        Title = "RavenDB Introduction"
    };

    Article csharpUsingArticle = new Article()
    {
        CategoryName = "Technology",
        Text = "The full value of the C# using statement ...",
        Title = "Your Friend the C# Using Statement"
    };

    Article nutsAndProteinArticle = new Article()
    {
        CategoryName = "Health",
        Text = "Nuts are a great source of protein ...",
        Title = "Nuts and Protein"
    };

    using (IDocumentSession session = documentStore.OpenSession())
    {
        session.Store(ravenIntroArticle);
        session.Store(csharpUsingArticle);
        session.Store(nutsAndProteinArticle);
        session.SaveChanges();
    }
}

Index

We must define an index (named "CategoriesWithArticles" here) so that we can use the Reduce query to group the data.

Map = from article in docs.Articles
      select new
      { Name = article.CategoryName,
        Articles = new[] { article } }
Reduce = from result in results
         group result by new
         { Name = result.Name }
         into g select new
         { Name = g.Key.Name,
           Articles = from a in g
                      from article in a.Articles
                      select article }
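RavenDB may run Reduce again over its own output, so the Articles value is safest kept as an array in both the Map and the Reduce, with Reduce flattening rather than nesting. The following C# sketch checks that property with LINQ to Objects; HierarchySketch is a hypothetical name and the in-memory code only imitates the server-side index.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string CategoryName { get; set; }
    public string Title { get; set; }
}

public class Category
{
    public string Name { get; set; }
    public Article[] Articles { get; set; }
}

public static class HierarchySketch
{
    // Map: each article becomes a one-element Articles array under its category.
    public static IEnumerable<Category> Map(IEnumerable<Article> docs) =>
        from article in docs
        select new Category { Name = article.CategoryName, Articles = new[] { article } };

    // Reduce: merge the arrays of each group; output shape matches the input shape.
    public static IEnumerable<Category> Reduce(IEnumerable<Category> results) =>
        from result in results
        group result by result.Name into g
        select new Category
        {
            Name = g.Key,
            Articles = g.SelectMany(x => x.Articles).ToArray() // flatten, don't nest
        };
}
```

Reducing once or reducing the already-reduced results again both give Technology with two articles and Health with one.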

Query

var categories = from category in session.Query<Category>("CategoriesWithArticles")
                 select category;

With the data in this example, this query will return two Category objects. One will have Name=Technology and two Articles; the other will have Name=Health and one Article.

JOIN

aka SQL join
aka left outer join

Overview

While joining is not an ideal scenario in RavenDB (or NoSQL in general) it can be done. We'll contrive an example to demonstrate by adding a collection of Tags to the database. In NoSQL land we would likely just add an array of Tag objects to the Article class directly. In RDBMS land we would normalize the Tags into a separate table and just store the tag IDs in Article. For the purposes of this example we'll do things the RDBMS way but hopefully you can appreciate that we're forcing a join scenario when we probably shouldn't.

Class Definitions

public class Article
{
    public string Id { get; set; }
    public string[] TagIds { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class ArticleWithTagNames
{
    public string[] TagNames { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class Tag
{
    public string Id { get; set; }
    public string Name { get; set; }
}

The ArticleWithTagNames class is necessary to handle the Projection that will be returned from the following index. This might make it clearer why we try to store all our data together in a NoSQL database: a join costs us an extra class for the projection and extra document loads every time the query runs.

Data

Tag documentsDatabaseTag = new Tag() { Name = "documents database" };
Tag dotNetTag = new Tag() { Name = ".NET" };
Tag noSqlTag = new Tag() { Name = "NoSQL" };
Tag csharpTag = new Tag() { Name = "C#" };
Tag nutsTag = new Tag() { Name = "nuts" };

using (IDocumentSession session = documentStore.OpenSession())
{
    session.Store(documentsDatabaseTag);
    session.Store(dotNetTag);
    session.Store(noSqlTag);
    session.Store(csharpTag);
    session.Store(nutsTag);

    session.SaveChanges();
}

Article ravenIntroArticle = new Article()
{
    Text = "RavenDB fits into a movement that is called ...", 
    Title = "RavenDB Introduction",
    TagIds = new string[] {
        documentsDatabaseTag.Id, 
        dotNetTag.Id,
        noSqlTag.Id }
};

Article csharpUsingArticle = new Article()
{
    Text = "The full value of the C# using statement ...",
    Title = "Your Friend the C# Using Statement",
    TagIds = new string[] {
        dotNetTag.Id,
        csharpTag.Id }
};

Article nutsAndProteinArticle = new Article()
{
    Text = "Nuts are a great source of protein ...",
    Title = "Nuts and Protein",
    TagIds = new string[] {
        nutsTag.Id }
};

using (IDocumentSession session = documentStore.OpenSession())
{
    session.Store(ravenIntroArticle);
    session.Store(csharpUsingArticle);
    session.Store(nutsAndProteinArticle);
    session.SaveChanges();
}

Index

In order to join to other collections we need to implement an index and specify the TransformResults query.

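using System.Linq;            // query syntax in Map and TransformResults
using Raven.Client.Indexes;   // AbstractIndexCreationTask

public class ArticlesWithTagNames : AbstractIndexCreationTask<Article>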
{
    public ArticlesWithTagNames()
    {
        Map = articles => 
            from article in articles 
            select new { article.Title, article.Text, article.TagIds };
        TransformResults = (database, articles) =>
            from article in articles
            let tags = database.Load<Tag>(article.TagIds)
            select new { Text = article.Text, Title = article.Title, 
                         TagNames = from t in tags select t.Name };
    }
}

This relies on database.Load being overloaded to accept either one ID or an array of IDs. Note that this is doing a LEFT OUTER JOIN. If the tag ID references a Tag that doesn't exist a NULL value is returned for t.Name.
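The LEFT OUTER JOIN behaviour can be pictured with a dictionary lookup. In this C# sketch a dictionary stands in for database.Load<Tag>(ids), and TagJoinSketch is a hypothetical name; it is an in-memory illustration, not the server-side TransformResults code. Every tag id produces exactly one entry, and an id with no matching Tag yields null instead of being dropped or raising an error.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Tag
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class TagJoinSketch
{
    public static string[] TagNames(IEnumerable<string> tagIds, IEnumerable<Tag> tags)
    {
        // Stand-in for loading Tag documents by id.
        Dictionary<string, Tag> byId = tags.ToDictionary(t => t.Id);
        // One result per id; unknown ids become null (left outer join semantics).
        return tagIds
            .Select(id => byId.TryGetValue(id, out Tag tag) ? tag.Name : null)
            .ToArray();
    }
}
```

Code that consumes ArticleWithTagNames should therefore be ready for null entries in TagNames whenever a referenced tag may have been deleted.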

Query

var articles = from a in 
               session.Query<ArticleWithTagNames>("ArticlesWithTagNames") select a;
w3 enterprises