Overview
RavenDB fits into a movement called "NoSQL", which, of course, implies that SQL is not used to retrieve data from the data store. It also implies
that a schema is not defined before data is written to the store. After all, if we defined a schema, why wouldn't we use SQL to query the data?
This lack of SQL and schema may seem a bit odd, but the idea is to make querying very simple, fast and scalable.
This article will serve as an introduction to the NoSQL way of thinking in the specific context of RavenDB. It may really help you to read up on NoSQL and all
the reasons for its existence before trying to understand RavenDB itself. On the other hand, if you're a .NET programmer it may help you to see, and play with, examples
that build on your expertise. RavenDB was specifically built to provide a nice NoSQL data store for .NET programmers. This limits its user-base but increases its
utility there.
About ten years ago programmers heard quite a bit of buzz about "Object Databases". These never took off in any major way. But they're back in technologies
like RavenDB where everything is stored as an object.
Installation
Installation of RavenDB is very simple. Head on over to ravendb.net/download and grab either
the stable or unstable compiled binaries. These come as a zip file which you can unpack into a "RavenDB" folder in "Program Files". After that I'd recommend opening
a command-line prompt and starting the server by executing /Server/Raven.Server.exe. The server logs to the command-line so you'll be able to see what's going on
as you do things like add indexes. The "raven db management studio" Silverlight UI can then be accessed by browsing to localhost:8080/.
Storing Your Data
As you get used to thinking in the NoSQL way, a few things will become clear:
- Denormalization is acceptable and normal.
- Joins are typically not desirable.
- Data does not need to be flat.
The first two points go hand in hand. If you have data in an RDBMS that you are thinking about putting into RavenDB, think about how you would denormalize it to make it
very easy to query later. Disk is cheap, so we store data redundantly in order to read and write it fast.
The third point will become clear when you understand that RavenDB is a document database. Think of a document as an instance of a .NET class. A .NET class can have any number of properties,
including arrays of other objects. A class instance with all of its data can be stored and retrieved as one entry in the database. An RDBMS stores records; RavenDB
stores object instances.
RavenDB has the following organizational concepts in terms of the data it stores: Documents, Collections and Indexes. That's it! A Collection is a series of Documents. Indexes
index and organize Collections.
Interfacing with RavenDB
.NET programmers working with RavenDB exclusively interface with RavenDB via objects. When we store something we store an object. When we query the store we get
back objects. There's no layer in between that the programmer needs to translate to/from in order to access the store. Let's look at some code to clarify.
public class Article
{
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
static void Main()
{
    Article ravenIntro = new Article()
    {
        Title = "RavenDB Introduction",
        Text = "RavenDB fits into a movement called ..."
    };
    // Connect to the local RavenDB server.
    using (IDocumentStore documentStore = new DocumentStore()
        { Url = "http://localhost:8080" })
    {
        documentStore.Initialize();
        // A session is a short-lived unit of work.
        using (IDocumentSession session = documentStore.OpenSession())
        {
            session.Store(ravenIntro);
            session.SaveChanges();
        }
    }
}
We have a class called Article that has a few, simple properties. Without doing anything in RavenDB we can create an instance of it and ask RavenDB to store it.
We talk to RavenDB inside the scope of a session. Sessions are intended to be short-lived, open-and-close affairs. In fact, if you try to perform more than 30 actions
against the data store inside a single session you will get an exception that politely asks you not to do that. This is intended to force good programming practice.
The important part here is that we store the object directly. We don't have to restructure the data in any way to get RavenDB to understand it. If this is the first
object of type Article that RavenDB has to store it will start a new Collection. The next time RavenDB is asked to store an object of type Article it will know to add it
to the existing collection.
The format RavenDB uses to store the object is JSON. This provides a compact and easily-readable format. As stated, the NoSQL movement prides itself on being able to
store "unstructured" (schema-less) data. RavenDB is technically unstructured but if you access RavenDB in the usual way (via .NET code and not by directly inputting JSON)
you essentially have a schema or data contract - your class definitions. It is certainly possible for you to change a class definition and store object instances that
have different properties inside the same collection.
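To make the JSON idea concrete, here is a minimal sketch that turns an Article into JSON and back. It uses System.Text.Json from the .NET base class library (.NET Core 3.0 or later) purely for illustration; that choice is an assumption, since RavenDB does its own serialization internally and its exact output may differ. The JsonSketch class and its method names are made up for this example.

```csharp
using System;
using System.Text.Json;

public class Article
{
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

public static class JsonSketch
{
    // One object instance becomes one JSON document.
    public static string ToJson(Article article)
    {
        return JsonSerializer.Serialize(article);
    }

    // Properties missing from the JSON stay null; unknown properties are ignored.
    public static Article FromJson(string json)
    {
        return JsonSerializer.Deserialize<Article>(json);
    }

    public static void Main()
    {
        Article article = new Article
        {
            Id = "articles/123",
            Title = "RavenDB Introduction",
            Text = "RavenDB fits into a movement called ..."
        };
        Console.WriteLine(ToJson(article));

        // A document written by an older or different class definition still loads.
        Article older = FromJson("{\"Title\":\"Old Article\",\"Author\":\"someone\"}");
        Console.WriteLine(older.Title);
    }
}
```

The second half of Main is the interesting part: a document that lacks Text and carries an extra Author property still maps onto the class, which is roughly what "schema-less" means in practice when your classes change over time.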
Retrieving data is equally simple:
using (IDocumentStore documentStore = new DocumentStore()
    { Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    using (IDocumentSession session = documentStore.OpenSession())
    {
        Article ravenIntro = (from a in session.Query<Article>()
                              where a.Title == "RavenDB Introduction"
                              select a).First<Article>();
    }
}
Once again we retrieve the object directly - no need to access RavenDB and then translate the data received into our objects.
Id Generation
RavenDB supports the concepts of unique identifier generation and retrieving documents based on a single, unique identifier. ID generation is simple. When we store an
object RavenDB generates a unique identifier and stores it with the object. The identifier is a string formatted as follows: "collectionName/number". For example "articles/123".
If we want access to the identifier for an object we can add a property named "Id" to our class. Note that this should be a property with get and set accessors - not a field.
Notice that our Article class had an "Id" property. We can see what identifier RavenDB has generated after we call SaveChanges. For example:
session.Store(ravenIntro);
session.SaveChanges();
string id = ravenIntro.Id;
We can also fetch objects by their identifier:
Article ravenIntro = session.Load<Article>("articles/123");
Querying Basics
Notice the use of LINQ in our previous example. LINQ is an essential part of RavenDB and really makes it easy for .NET programmers to query the data store. In fact,
LINQ is not only used to create queries in code, it is the syntax used to make indexes inside of RavenDB itself. If you're familiar with LINQ and can write LINQ to
filter, group and order data you're going to be happy. Note that we can use the language-integrated LINQ syntax (as per the last example) or the LINQ IEnumerable extensions, like so:
using (IDocumentStore documentStore = new DocumentStore()
    { Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    using (IDocumentSession session = documentStore.OpenSession())
    {
        Article ravenIntro = session.Query<Article>()
            .Where(a => a.Title == "RavenDB Introduction")
            .First<Article>();
    }
}
The place to start with "ad-hoc" queries in code is with the IDocumentSession.Query<T> method (as in the last example). The type T that is specified tells
RavenDB which Collection we're querying. After that we're free to use LINQ to filter and order the results from the Collection. Note that grouping is not allowed
in session queries. If you try you will get an exception that simply reads: "Method not supported: GroupBy". Grouping is certainly possible but it
must be declared "up front" on the database side. This is done via indexing - our next topic.
Indexing Basics
In our last example we were querying the Article collection on Title. If we perform this query often enough RavenDB will automatically create a temporary index for us
on Article.Title. This is handy and requires no work on our part. We can also define indexes manually. This is typically done in the "raven db management studio" UI running on localhost:8080/.
Click on Indexes, New Index. Specify a name for the index and then enter a query for the index in the "map" field. For example, to index the articles by title:
from article in docs.Articles select new { article.Title }
The general idea is that you create a LINQ query that returns the data elements that should be indexed. RavenDB then does the rest. Note that there are other, optional fields
for the index: "reduce" and "transform results". Reduce forms the second half of the ubiquitous NoSQL Map/Reduce paradigm. You can think of Map/Reduce as the NoSQL replacement
for rowset selection via SQL. Think of Map as simply the selector for all the data that your query cares about. The Reduce query is then performed on the Map data to group or summarize it.
When you need to use a LINQ "group x into" statement you will typically put it in the Reduce part of an index. Note that Reduce is not required and can be left blank.
Note: As of build 531 when you save an index the UI isn't very helpful when an error occurs. You can see more information about the error in the command-line shell that you're running the
server in.
All LINQ queries for indexes must return anonymous types. For instance we can't have the Map statement return the objects directly like this:
from article in docs.Articles select article
If we try this we'll get a "Variable initializer must be a select query expression returning an anonymous object" exception. Always use "select new { ... }".
Indexes can also be created directly from code. This is typically done by creating a class that inherits from AbstractIndexCreationTask. For example:
public class ArticleTitleIndex : AbstractIndexCreationTask<Article>
{
    public ArticleTitleIndex()
    {
        // Index only the Title of each article.
        Map = articles => from article in articles select new { article.Title };
    }
}
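Once the index has been registered with the server (for example, by calling IndexCreation.CreateIndexes(typeof(ArticleTitleIndex).Assembly, documentStore) at application start-up), the RavenDB UI will show an "ArticleTitleIndex" index with the following Map:
docs.Articles
    .Select(article => new {Title = article.Title})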
It's worth noting that we can use language-integrated LINQ syntax or the LINQ IEnumerable extension methods in indexes as well. This is important because not all extension methods,
for example, Any() and All(), have a corresponding language-integrated keyword. Anyway, we can now query Articles based on Title and expect results very quickly.
RavenDB and Lucene
Now that we know about indexing and querying we should understand that underneath the covers RavenDB is translating LINQ "where" clauses into Lucene queries. Lucene is a mature
and powerful text indexing system. RavenDB uses Lucene to do all the heavy lifting in regards to range-based, partial string and full text searches. When querying indexes via the UI
you'll have to use Lucene query syntax (which I won't explain here). We can also tap into Lucene's search capabilities in code via the LuceneQuery method:
var articles = from a in session.Advanced.LuceneQuery<Article>("ArticleTitleIndex")
                   .Where("Title:\"RavenDB Introduction\"") select a;
Querying with GroupBy
As mentioned previously, we can't create ad-hoc "GROUP BY" queries in LINQ. To satisfy that querying need we need to define indexes that have a Reduce query. We're allowed to use
the GroupBy method (or LINQ "group x into g") in the aptly-named-for-this-purpose Reduce phase. Let's start with an example. We're going to add a CategoryName property to our Article class:
public class Article
{
    public string Id { get; set; }
    public string CategoryName { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
Next we'll define an index that finds unique category names (like SQL DISTINCT). This highlights the dual role of indexes. The first role is to index data to increase query performance.
The second is to transform the data into something new - what is called a Projection in RavenDB. When we use Reduce (or TransformResults, discussed later) the index may not return objects of
the same type as those stored in the Collection we're querying. This makes sense for our DISTINCT query - we're going to return category names, not Article objects. Let's look at the index
we would create for this:
Map = from article in docs.Articles
      select new { Name = article.CategoryName }
Reduce = from result in results
         group result by result.Name into g
         select new { Name = g.Key }
What is returned by this index is not an Article object. It's an object that has one property: "Name". Let's suppose we defined a class called Category as follows:
public class Category
{
    public string Name { get; set; }
}
We could now write code to query the index (named "ArticleCategories") as follows:
var categories = from c in session.Query<Category>("ArticleCategories") select c;
Notice we used T=Category for Query<T>. Raven would have happily populated any object with a Name property. We didn't have to do anything special to make RavenDB aware of the
definition of our Category class (schema-less operation). We also passed a string to the Query method; this is the name of the index.
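If the Map and Reduce steps feel abstract, it may help to run the same two steps over a plain in-memory list. The sketch below is ordinary LINQ to Objects, not RavenDB code: the DistinctCategoriesSketch class and its sample data are assumptions made up for illustration, and a real index runs on the server rather than in your process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string Id { get; set; }
    public string CategoryName { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

public class Category
{
    public string Name { get; set; }
}

public static class DistinctCategoriesSketch
{
    public static List<Category> DistinctCategories(IEnumerable<Article> articles)
    {
        // Map: project each article down to the single value we care about.
        var mapped = from article in articles
                     select new { Name = article.CategoryName };

        // Reduce: group the mapped entries and keep one entry per key.
        var reduced = from result in mapped
                      group result by result.Name into g
                      select new Category { Name = g.Key };

        return reduced.ToList();
    }

    public static void Main()
    {
        List<Article> articles = new List<Article>
        {
            new Article { Title = "RavenDB Introduction", CategoryName = "Technology" },
            new Article { Title = "Your Friend the C# Using Statement", CategoryName = "Technology" },
            new Article { Title = "Nuts and Protein", CategoryName = "Health" }
        };

        foreach (Category category in DistinctCategories(articles))
        {
            Console.WriteLine(category.Name);
        }
    }
}
```

Three articles go in and two Category objects come out, one per distinct name. Note that the result type has nothing to do with Article any more, which is the Projection idea described above.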
The last basic concept of querying is that of TransformResults. Like Reduce, this is a place where we can do grouping and summarization. There are two things that we can do
in TransformResults that we can't do in Reduce:
- Reference documents in a different collection than Map is querying on.
- Return results in a different format than was returned by Map.
The first point means that you can effectively do a JOIN to other data. As stated, in NoSQL land we try to organize the data so that we don't have to do joins. But sometimes this just
isn't reasonable. To do a join we typically rely on the unique identifier ("Id" property) that was generated for our objects. We'll show how that's done in the Common Query Examples section.
Common Query Examples
Text Contains
aka partial match
aka wildcard search
The important thing to know about more advanced text matching and searching is that RavenDB uses Lucene behind the scenes. Lucene is powerful and has many options for
indexing text. Some of this complexity is exposed through RavenDB so that you can choose the right option. Lucene's "StandardAnalyzer" handles most of the text matching
that you're used to, and it is easy to select when you create an index. And yes, in order to do partial text matching (CONTAINS) you have
to make an index.
Class Definitions
public class Article
{
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
Data
Article ravenIntroArticle = new Article()
{
    Text = "RavenDB fits into a movement that is called ...",
    Title = "RavenDB Introduction",
};
Article csharpUsingArticle = new Article()
{
    Text = "The full value of the C# using statement ...",
    Title = "Your Friend the C# Using Statement",
};
Article nutsAndProteinArticle = new Article()
{
    Text = "Nuts are a great source of protein ...",
    Title = "Nuts and Protein",
};
using (IDocumentSession session = documentStore.OpenSession())
{
    session.Store(ravenIntroArticle);
    session.Store(csharpUsingArticle);
    session.Store(nutsAndProteinArticle);
    session.SaveChanges();
}
Index
We create an index in the usual way and name it "ArticleTitle":
from article in docs.Articles select new { article.Title }
Now we add a field specification to the index. Click "Add Field" in the UI. Specify "Title" as the Field and set Indexing to "Analyzed". This will cause RavenDB to use Lucene's
StandardAnalyzer.
Query
Now we can do partial text queries on the index:
var articles = from a in session.Advanced.LuceneQuery<Article>("ArticleTitle")
                   .Where("Title:\"*Protein\"") select a;
Querying and Indexing Sub/Child Objects
Overview
Creating indexes on child object properties is fairly simple and only requires a Map query.
Class Definitions
public class Article
{
    public string Id { get; set; }
    public Tag[] Tags { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class Tag
{
    public string Name { get; set; }
    public int Count { get; set; }
}
Data
using (IDocumentStore documentStore = new DocumentStore()
    { Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    Article ravenIntroArticle = new Article()
    {
        Text = "RavenDB fits into a movement that is called ...",
        Title = "RavenDB Introduction"
    };
    Article csharpUsingArticle = new Article()
    {
        Text = "The full value of the C# using statement ...",
        Title = "Your Friend the C# Using Statement"
    };
    Article nutsAndProteinArticle = new Article()
    {
        Text = "Nuts are a great source of protein ...",
        Title = "Nuts and Protein"
    };
    Tag documentsTag = new Tag() { Name = "documents database" };
    Tag dotNetTag = new Tag() { Name = ".NET" };
    Tag noSqlTag = new Tag() { Name = "NoSQL" };
    Tag csharpTag = new Tag() { Name = "C#" };
    Tag nutsTag = new Tag() { Name = "nuts" };
    // Tags are embedded in each article rather than stored separately.
    ravenIntroArticle.Tags = new Tag[] { documentsTag, dotNetTag, csharpTag };
    csharpUsingArticle.Tags = new Tag[] { dotNetTag, csharpTag };
    nutsAndProteinArticle.Tags = new Tag[] { nutsTag };
    using (IDocumentSession session = documentStore.OpenSession())
    {
        session.Store(ravenIntroArticle);
        session.Store(csharpUsingArticle);
        session.Store(nutsAndProteinArticle);
        session.SaveChanges();
    }
}
Each article has a number of Tag subobjects.
Index
The index is not necessary for the query to run but is a good idea if the data is large.
Map = from article in docs.Articles
      from tag in article.Tags
      select new { tag.Name }
Query
var articles = from article in session.Query<Article>()
               where article.Tags.Any(t => t.Name == ".NET")
               select article;
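To see what this query selects without a running server, we can apply the same where clause to an in-memory list. This is plain LINQ to Objects with the sample data from above; the TagQuerySketch class and its method names are hypothetical, and RavenDB itself would translate the query and run it on the server instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Tag
{
    public string Name { get; set; }
    public int Count { get; set; }
}

public class Article
{
    public string Id { get; set; }
    public Tag[] Tags { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

public static class TagQuerySketch
{
    // Returns the articles that carry at least one tag with the given name.
    public static List<Article> WithTag(IEnumerable<Article> articles, string tagName)
    {
        return (from article in articles
                where article.Tags.Any(t => t.Name == tagName)
                select article).ToList();
    }

    // The same three articles as in the Data section, held in memory.
    public static List<Article> SampleArticles()
    {
        Tag documentsTag = new Tag { Name = "documents database" };
        Tag dotNetTag = new Tag { Name = ".NET" };
        Tag csharpTag = new Tag { Name = "C#" };
        Tag nutsTag = new Tag { Name = "nuts" };

        return new List<Article>
        {
            new Article { Title = "RavenDB Introduction",
                          Tags = new[] { documentsTag, dotNetTag, csharpTag } },
            new Article { Title = "Your Friend the C# Using Statement",
                          Tags = new[] { dotNetTag, csharpTag } },
            new Article { Title = "Nuts and Protein",
                          Tags = new[] { nutsTag } }
        };
    }

    public static void Main()
    {
        foreach (Article article in WithTag(SampleArticles(), ".NET"))
        {
            Console.WriteLine(article.Title);
        }
    }
}
```

With this data the two technology articles match and the nuts article does not, because Any() only needs one child Tag to satisfy the condition.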
Grouping Flat Data Into Hierarchical Structure
aka GROUP BY
aka grouping with children
Overview
If you're storing denormalized data you may sometimes want to query it so that it is organized hierarchically. We'll write a query to
take "flat" data and organize it into parents and children.
Class Definitions
public class Article
{
    public string CategoryName { get; set; }
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class Category
{
    public string Name { get; set; }
    public Article[] Articles { get; set; }
}
Data
using (IDocumentStore documentStore = new DocumentStore()
    { Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    Article ravenIntroArticle = new Article()
    {
        CategoryName = "Technology",
        Text = "RavenDB fits into a movement that is called ...",
        Title = "RavenDB Introduction"
    };
    Article csharpUsingArticle = new Article()
    {
        CategoryName = "Technology",
        Text = "The full value of the C# using statement ...",
        Title = "Your Friend the C# Using Statement"
    };
    Article nutsAndProteinArticle = new Article()
    {
        CategoryName = "Health",
        Text = "Nuts are a great source of protein ...",
        Title = "Nuts and Protein"
    };
    using (IDocumentSession session = documentStore.OpenSession())
    {
        session.Store(ravenIntroArticle);
        session.Store(csharpUsingArticle);
        session.Store(nutsAndProteinArticle);
        session.SaveChanges();
    }
}
Index
We must define an index so that we can use the Reduce query to group the data. Map wraps each article in a one-element array so that Map and Reduce emit the same shape; this matters because Reduce may be run again over its own output.
Map = from article in docs.Articles
      select new
      {
          Name = article.CategoryName,
          Articles = new[] { article }
      }
Reduce = from result in results
         group result by result.Name into g
         select new
         {
             Name = g.Key,
             Articles = g.SelectMany(r => r.Articles)
         }
Query
var categories = from category in session.Query<Category>("CategoriesWithArticles")
                 select category;
With the data in this example this query will return two Category objects. One will have Name=Technology and two Articles; the other will have Name=Health and
one Article.
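The shape of that result can be reproduced in memory, which is a handy way to check your expectations before defining the index. The following is a sketch in plain LINQ to Objects; the GroupingSketch class is an assumption made up for illustration and is not how RavenDB evaluates the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string CategoryName { get; set; }
    public string Id { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

public class Category
{
    public string Name { get; set; }
    public Article[] Articles { get; set; }
}

public static class GroupingSketch
{
    // Turns a flat list of articles into parent Category objects with child Articles.
    public static List<Category> GroupByCategory(IEnumerable<Article> articles)
    {
        return (from article in articles
                group article by article.CategoryName into g
                select new Category { Name = g.Key, Articles = g.ToArray() }).ToList();
    }

    public static void Main()
    {
        List<Article> articles = new List<Article>
        {
            new Article { CategoryName = "Technology", Title = "RavenDB Introduction" },
            new Article { CategoryName = "Technology", Title = "Your Friend the C# Using Statement" },
            new Article { CategoryName = "Health", Title = "Nuts and Protein" }
        };

        foreach (Category category in GroupByCategory(articles))
        {
            Console.WriteLine(category.Name + ": " + category.Articles.Length);
        }
    }
}
```

The difference from a DISTINCT-style reduce is that each group keeps its members, so every Category carries the Articles that were folded into it.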
JOIN
aka SQL join
aka left outer join
Overview
While joining is not an ideal scenario in RavenDB (or NoSQL in general) it can be done. We'll contrive an example to demonstrate by adding a collection of Tags to the database.
In NoSQL land we would likely just add an array of Tag objects to the Article class directly. In RDBMS land we would normalize
the Tags into a separate table and just store the tag IDs in Article. For the purposes of this example we'll do things the RDBMS way but hopefully you can appreciate that we're
forcing a join scenario when we probably shouldn't.
Class Definitions
public class Article
{
    public string Id { get; set; }
    public string[] TagIds { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class ArticleWithTagNames
{
    public string[] TagNames { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}
public class Tag
{
    public string Id { get; set; }
    public string Name { get; set; }
}
The ArticleWithTagNames class is necessary to hold the Projection that will be returned from the following index. This might make it clearer why we try to store all our data
together in a NoSQL database: a join costs us an extra class and an extra index just to reassemble data that could have been stored together.
Data
Tag documentsDatabaseTag = new Tag() { Name = "documents database" };
Tag dotNetTag = new Tag() { Name = ".NET" };
Tag noSqlTag = new Tag() { Name = "NoSQL" };
Tag csharpTag = new Tag() { Name = "C#" };
Tag nutsTag = new Tag() { Name = "nuts" };
// Store the tags first so that RavenDB assigns their Ids.
using (IDocumentSession session = documentStore.OpenSession())
{
    session.Store(documentsDatabaseTag);
    session.Store(dotNetTag);
    session.Store(noSqlTag);
    session.Store(csharpTag);
    session.Store(nutsTag);
    session.SaveChanges();
}
Article ravenIntroArticle = new Article()
{
    Text = "RavenDB fits into a movement that is called ...",
    Title = "RavenDB Introduction",
    TagIds = new string[] {
        documentsDatabaseTag.Id,
        dotNetTag.Id,
        noSqlTag.Id }
};
Article csharpUsingArticle = new Article()
{
    Text = "The full value of the C# using statement ...",
    Title = "Your Friend the C# Using Statement",
    TagIds = new string[] {
        dotNetTag.Id,
        csharpTag.Id }
};
Article nutsAndProteinArticle = new Article()
{
    Text = "Nuts are a great source of protein ...",
    Title = "Nuts and Protein",
    TagIds = new string[] {
        nutsTag.Id }
};
using (IDocumentSession session = documentStore.OpenSession())
{
    session.Store(ravenIntroArticle);
    session.Store(csharpUsingArticle);
    session.Store(nutsAndProteinArticle);
    session.SaveChanges();
}
Index
In order to join to other collections we need to implement an index and specify the TransformResults query.
public class ArticlesWithTagNames : AbstractIndexCreationTask<Article>
{
    public ArticlesWithTagNames()
    {
        Map = articles =>
            from article in articles
            select new { article.Title, article.Text, article.TagIds };
        // Look up the referenced Tag documents and project their names.
        TransformResults = (database, articles) =>
            from article in articles
            let tags = database.Load<Tag>(article.TagIds)
            select new
            {
                Text = article.Text,
                Title = article.Title,
                TagNames = from t in tags select t.Name
            };
    }
}
This relies on database.Load being overloaded to accept either one ID or an array of IDs. Note that this is doing a LEFT OUTER JOIN. If the tag ID references a Tag that doesn't exist
a NULL value is returned for t.Name.
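The left outer join behaviour is easier to see in a small in-memory sketch. Here a dictionary stands in for the Tags collection and a hypothetical NameOrNull helper stands in for the lookup; both are assumptions for illustration only, and the real lookup happens on the server through the index shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public string Id { get; set; }
    public string[] TagIds { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

public class ArticleWithTagNames
{
    public string[] TagNames { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

public class Tag
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class JoinSketch
{
    // Left outer join semantics: a missing tag yields null instead of dropping the article.
    private static string NameOrNull(IDictionary<string, Tag> tagsById, string id)
    {
        Tag tag;
        return tagsById.TryGetValue(id, out tag) ? tag.Name : null;
    }

    public static List<ArticleWithTagNames> Join(
        IEnumerable<Article> articles, IDictionary<string, Tag> tagsById)
    {
        return articles.Select(article => new ArticleWithTagNames
        {
            Text = article.Text,
            Title = article.Title,
            TagNames = article.TagIds.Select(id => NameOrNull(tagsById, id)).ToArray()
        }).ToList();
    }

    public static void Main()
    {
        Dictionary<string, Tag> tagsById = new Dictionary<string, Tag>
        {
            { "tags/1", new Tag { Id = "tags/1", Name = "nuts" } }
        };
        List<Article> articles = new List<Article>
        {
            // "tags/99" deliberately references a Tag that does not exist.
            new Article { Title = "Nuts and Protein", TagIds = new[] { "tags/1", "tags/99" } }
        };

        foreach (ArticleWithTagNames result in Join(articles, tagsById))
        {
            Console.WriteLine(result.Title + ": " +
                string.Join(", ", result.TagNames.Select(n => n ?? "(null)")));
        }
    }
}
```

The article is still returned even though one of its tag IDs points nowhere, so code that consumes TagNames should be prepared for null entries.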
Query
var articles = from a in
                   session.Query<ArticleWithTagNames>("ArticlesWithTagNames") select a;