Entity Relationships in a Document Database

This week I attended and presented at CouchConf Boston. I’d like to thank Couchbase for inviting me to speak; it was a great opportunity to meet other CouchDB and Couchbase users. I’ve posted the slides from my Entity Relationships in a Document Database talk to Speaker Deck (and SlideShare, if you prefer).

The presentation covered four patterns for modeling entity relationships in document databases such as CouchDB and Couchbase:

  • Embedded Entities
  • Related Documents
  • List of Keys
  • Relationship Documents
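
The slides are not reproduced here, so the following is only a minimal sketch of what documents using each of the four patterns might look like, based on one common reading of the pattern names. The author/book domain, the `type` field, and every ID and field name are assumptions chosen for illustration.

```python
# Hypothetical author/book documents; all field names and IDs are illustrative.

# 1. Embedded Entities: the related entities live inside the parent document.
embedded = {
    "_id": "author:1",
    "type": "author",
    "name": "Example Author",
    "books": [{"title": "First Book"}, {"title": "Second Book"}],
}

# 2. Related Documents: each child is its own document and points at its parent.
related = [
    {"_id": "author:1", "type": "author", "name": "Example Author"},
    {"_id": "book:1", "type": "book", "title": "First Book", "author_id": "author:1"},
    {"_id": "book:2", "type": "book", "title": "Second Book", "author_id": "author:1"},
]

# 3. List of Keys: one side of the relationship stores an array of the other side's IDs.
list_of_keys = {
    "_id": "book:1",
    "type": "book",
    "title": "First Book",
    "author_ids": ["author:1", "author:2"],
}

# 4. Relationship Documents: every link is a separate document of its own.
relationship = {
    "_id": "book_author:book:1:author:1",
    "type": "book_author",
    "book_id": "book:1",
    "author_id": "author:1",
}


def books_for_author(docs, author_id):
    """Resolve the Related Documents pattern in memory by filtering on the parent ID."""
    return [
        doc["title"]
        for doc in docs
        if doc.get("type") == "book" and doc.get("author_id") == author_id
    ]


titles = books_for_author(related, "author:1")
```

In a real database the lookup in `books_for_author` would typically be done by a view or index rather than by scanning documents in application code.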

See the presentation for more details. This topic is also covered in the MapReduce Views for SQL Users chapter in the second edition of Writing and Querying MapReduce Views in CouchDB.

4 Comments

  1. phil westwell
    Posted May 20, 2012 at 12:13 pm | Permalink

    Thanks for posting this. I think formalising the idea of “view collation” helps think certain problems through.

    I thought the “Relationship Document” idea was interesting, but I have some doubts about how it would work in practice that I hope you can clear up.

    Say the author document already exists and I want to create a book document. So I (i) use a PUT to add the book and then (ii) use a PUT to add the book-author relationship document.

    But if step (ii) fails, then I have a book that has no author. Do you envisage a solution to deal with this?
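
The scenario in this comment can be reproduced with a toy in-memory store. This is only a sketch: `ToyStore`, `WriteFailed`, and the document IDs are hypothetical, and the store stands in for a database in which each individual write is atomic but nothing ties two writes together.

```python
class WriteFailed(Exception):
    """Raised by the toy store to simulate a failed PUT."""


class ToyStore:
    """In-memory stand-in for a document database (hypothetical, not a real client)."""

    def __init__(self, fail_on=()):
        self.docs = {}
        self.fail_on = set(fail_on)  # document IDs whose writes should fail

    def put(self, doc_id, doc):
        if doc_id in self.fail_on:
            raise WriteFailed(doc_id)
        self.docs[doc_id] = doc


def add_book_with_relationship(store, book_id, author_id):
    # Step (i): write the book document.
    store.put(book_id, {"type": "book", "title": "Example Book"})
    # Step (ii): write the book-author relationship document.
    rel_id = f"rel:{book_id}:{author_id}"
    store.put(rel_id, {"type": "book_author", "book_id": book_id, "author_id": author_id})


# Make step (ii) fail after step (i) has already succeeded.
store = ToyStore(fail_on={"rel:book:1:author:1"})
store.put("author:1", {"type": "author", "name": "Example Author"})
try:
    add_book_with_relationship(store, "book:1", "author:1")
except WriteFailed:
    pass

# The book exists, but no relationship document links it to an author.
orphaned = "book:1" in store.docs and "rel:book:1:author:1" not in store.docs
```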

  2. Posted May 20, 2012 at 5:03 pm | Permalink

    @phil westwell: You’re welcome! A fundamental limitation of document databases is that you cannot have atomic transactions across document boundaries. There are very good reasons for this limitation: it brings great benefits in the form of availability and partition tolerance. It is nevertheless a critical limitation to understand when designing your application and database. So, the short answer is that there is no solution to the problem you mention. If your relationships must be consistent, then they must be in one document. This, of course, precludes use of the Relationship Documents pattern.

    The slightly longer answer is to use the List of Keys pattern instead (assuming we’re talking about a many-to-many relationship here). This solves the problem for one side of the relationship (the side on which you store the List of Keys), but you still have the problem on the other side. Note that when you move from the Relationship Documents pattern to the List of Keys pattern, you are giving up availability (an increased probability of document update conflicts) in exchange for consistency. Broadly speaking, document databases are a good fit when availability is more important than consistency. If you must have this level of consistency, then perhaps a document database isn’t a good fit, but note what you would be giving up in terms of availability and partition tolerance.
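
The trade-off described in this reply can be sketched with a toy store that uses revision checks. `RevStore`, `Conflict`, the integer revisions, and the `author_ids` field are all assumptions made for illustration; they are not CouchDB's actual API or revision format. The book and its link to the author are written in one step, but two clients editing the same list now collide.

```python
import copy


class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class RevStore:
    """Toy store with optimistic revision checks (hypothetical, not a real client)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, document)

    def get(self, doc_id):
        rev, doc = self._docs[doc_id]
        return rev, copy.deepcopy(doc)

    def put(self, doc_id, doc, rev=None):
        current_rev = self._docs[doc_id][0] if doc_id in self._docs else None
        if rev != current_rev:
            raise Conflict(doc_id)
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, copy.deepcopy(doc))
        return new_rev


store = RevStore()

# A single write creates the book together with its link to the author.
store.put("book:1", {"type": "book", "title": "Example Book", "author_ids": ["author:1"]})

# Two clients read the same revision and each add a different co-author.
rev_a, doc_a = store.get("book:1")
rev_b, doc_b = store.get("book:1")
doc_a["author_ids"].append("author:2")
doc_b["author_ids"].append("author:3")

store.put("book:1", doc_a, rev=rev_a)  # first writer wins
try:
    store.put("book:1", doc_b, rev=rev_b)  # second writer holds a stale revision
    second_write_conflicted = False
except Conflict:
    second_write_conflicted = True

final_author_ids = store.get("book:1")[1]["author_ids"]
```

With Relationship Documents, the two clients would each have created a separate document and neither write would have been rejected, which is the availability being traded away here.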

  3. jonathan
    Posted August 13, 2012 at 9:32 am | Permalink

    Great slides, very helpful for all of us coming from the SQL world.
    Quick question – on the “Additional Techniques” page you mention that UUIDs have better performance than natural keys. Can you elaborate on this? (or at least point me to where I could do some reading on it?)

  4. Posted August 14, 2012 at 10:51 am | Permalink

    @jonathan: Thanks! Glad you found it helpful. As for the performance of UUIDs, I talked a bit about this in my Scaling CouchDB book. For best performance, you’ll want to use mostly monotonic (that is, roughly sequential) document identifiers. This has to do with the underlying B-tree structure used to store documents. UUIDs generated by CouchDB’s default sequential algorithm are mostly monotonic while still working in a distributed environment, but other approaches could work as well. This is mainly a concern when dealing with large data sets.
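
To make "mostly monotonic" concrete, here is a rough sketch that compares a sequential-style ID generator with fully random UUIDs. The generator is only loosely modeled on the idea of a fixed random prefix plus an increasing suffix; it is not CouchDB's actual implementation, and `mostly_monotonic_ids` and `in_order_fraction` are hypothetical helper names.

```python
import random
import uuid


def mostly_monotonic_ids(count, seed=0):
    """Generate 32-hex-character IDs: a fixed random prefix plus an increasing suffix."""
    rng = random.Random(seed)
    prefix = f"{rng.getrandbits(104):026x}"  # 26 hex characters, fixed for this batch
    counter = 0
    ids = []
    for _ in range(count):
        counter += rng.randint(1, 0xFFE)  # random but always positive increment
        ids.append(f"{prefix}{counter:06x}")  # 6 hex characters, zero-padded
    return ids


def in_order_fraction(ids):
    """Fraction of IDs that sort after the ID generated immediately before them."""
    if len(ids) < 2:
        return 1.0
    ascending = sum(1 for prev, cur in zip(ids, ids[1:]) if cur > prev)
    return ascending / (len(ids) - 1)


sequential_ids = mostly_monotonic_ids(1000)
random_ids = [uuid.uuid4().hex for _ in range(1000)]

sequential_fraction = in_order_fraction(sequential_ids)
random_fraction = in_order_fraction(random_ids)
```

IDs that arrive in sorted order are always appended at the right-hand edge of a B-tree, whereas random IDs land at arbitrary positions, which is one plausible reading of why the former tend to perform better on large data sets.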