Friday, November 14, 2014

Before You Use SpringData and MongoDB

The Upfront Summary

For those who don't have time to read a long blog post, here's the gist of this article: always always always annotate your persisted SpringData entity classes with @Document(collection="<custom-collection-name>") and @TypeAlias("<custom-type>"). This should be an unbreakable rule. Otherwise you'll be opening yourself up to a world of hurt later.

SpringData is Easy to Get Started With

Like many Java developers, I rely on the Spring Framework. Everything, from my data access layer to my MVC controllers, is managed within a Spring application context. So when I decided to add MongoDB to the mix, it was without a second thought that I decided to use SpringData to interact with Mongo.

That was months ago, and I've run into a few problems. As it turns out, these particular problems were easy to solve, but it took a while to recognize what was going on and come up with a solution. Surprisingly little information existed on StackOverflow or the Spring forums about what I imagine is a common problem.

Let me explain.

My Data Model

My application is basically an editor. Think of a drawing program, where users can edit a multi-page "drawing" document. Within a drawing's page, users can create and manipulate different shapes. As a document store, MongoDB is well-suited for persisting this sort of data. Roughly speaking, my data model was something like this (excuse the lack of UML):

  • Drawings are the top-level container
  • A Drawing has one or more Pages
  • A Page consists of many Shapes. 
  • Shape is an abstract class. It has some properties shared by all Shape subclasses, such as size, border width and color, background color, etc.
  • Concrete subclasses of Shape can contain additional properties. For example, Star has number of points, inner radius, outer radius, etc.

Drawings are stored separately from Pages; i.e. they are not nested. Shapes, however, are nested within Pages. For example, here's a snippet from the Drawing class:

public class Drawing extends BaseDocument {

    @Id 
    private String id;

    // ....

}

and the Page class:

public class Page extends BaseDocument {

    @Id 
    private String id;

    @Indexed
    private String drawingId;

    private List<Shape> shapes = new ArrayList<Shape>();

    // ....

}

So in other words, when a user goes to edit a given drawing, we simply retrieve all of the Pages whose drawingId matches the ID of the drawing being edited.

Don't Accept SpringData's Defaults!

SpringData offers you the ability to customize how your entities are persisted in your datastore. But if you don't explicitly customize, SpringData will make do as best as it can. While that might seem helpful up front, I've found the opposite. SpringData's default behavior will invariably paint you into a corner that's difficult to get out of. I'd argue that, rather than guessing, SpringData should throw a runtime exception at startup. Short of that, every tutorial about SpringData/MongoDB should strongly encourage developers of production applications to tell Spring how to persist their data.

The first default is how SpringData maps classes to collections. Collections are how many NoSQL data stores, MongoDB included, store groups of related data. Although it's not always appropriate to compare NoSQL databases to traditional RDBMSs, you can roughly think of a collection the same way you think of a table in a SQL database.

Chapter 7 of the SpringData/Mongo docs explains how, by default, classes are mapped to collections:

"The short Java class name is mapped to the collection name in the following manner. The class 'com.bigbank.SavingsAccount' maps to 'savingsAccount' collection name."

So based on my data model, I knew I'd find a drawing collection and a page collection in my MongoDB instance.
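
As a rough illustration of that rule, here's a small sketch in plain Java. It only approximates the documented behavior (take the short class name, lower-case the first letter); it is not SpringData's actual code, and CollectionNames and defaultCollectionName are names I made up for the example.

```java
// Sketch only: approximates the documented default naming rule.
// CollectionNames / defaultCollectionName are hypothetical names, not SpringData API.
final class CollectionNames {

    private CollectionNames() {}

    /** Returns the short class name with its first character lower-cased. */
    static String defaultCollectionName(String fullyQualifiedClassName) {
        // Strip the package prefix, if any (lastIndexOf returns -1 when there is no dot).
        String simple = fullyQualifiedClassName.substring(
                fullyQualifiedClassName.lastIndexOf('.') + 1);
        if (simple.isEmpty()) {
            return simple;
        }
        return Character.toLowerCase(simple.charAt(0)) + simple.substring(1);
    }

    public static void main(String[] args) {
        // prints "savingsAccount"
        System.out.println(defaultCollectionName("com.bigbank.SavingsAccount"));
        // prints "page"
        System.out.println(defaultCollectionName("com.myapp.documents.Page"));
    }
}
```

Notice that under a rule like this, the collection name follows the class name: rename the class and the derived collection name changes with it.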

Now, I've used ORMs like Hibernate extensively. Probably for that reason, I wasn't content to let my Mongo collections be named for me. So I looked for a way to specify my collection names.

The answer was simple enough. Although not a strict requirement, persisted entities should be annotated with the @org.springframework.data.mongodb.core.mapping.Document annotation. Furthermore, that annotation takes a collection argument in which you can pass your desired collection name.

So my Drawing class became annotated with @Document(collection="drawing"), and my Page class became annotated with @Document(collection="page"). The end result would be the same--a drawing and a page collection in Mongo--but I now had control. I specified the collection name simply because it made me feel more comfortable, but it turns out there's an important, tangible reason to do so (more on that later).

With everything in place, I started testing my app. I created a few drawings, added some pages and shapes, and saved them all to MongoDB. Then I used the mongo command-line tool to examine my data. One thing immediately stuck out. Every document stored in there had a _class property which pointed to the fully-qualified name of the mapped class. For example, each Page document contained the property "_class" : "com.myapp.documents.Page".

The purpose of that value, as you might guess, is to instruct Spring Data how to map the document back to a class when reading data out. This convention struck me as a little concerning. After all, my application might be pure Java at this point, but my data store should be treated as language-agnostic. Why would I want Java-specific metadata associated with my data?

After thinking about it, I shook off my concern. Sure, the _class property would be there on every record, but if I started using another platform to access the data, then the property could just be ignored. Practically speaking, what could actually go wrong?

What Could Go Wrong

Then one day I decided to refactor my entire application. I'd been organizing my code by architectural layer, and I decided to try organizing it by feature instead. Eclipse of course allowed me to do this in a matter of minutes. So I WARred up my changes, deployed them to Tomcat, and voilà! I could no longer read in any of my drawing/page/shape data.

It quickly became clear what the problem was. My data contained _class information that pointed to a now-nonexistent fully-qualified class name. Shape was no longer in the com.myapp.documents package.
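
To make the failure concrete, here's a minimal sketch of the general idea: a stored class name is only useful if a class with that exact name can still be loaded. This is plain Java, not SpringData's actual type-resolution code (which is more involved), and StoredTypeLookup is a hypothetical name used only for illustration.

```java
import java.util.Optional;

// Sketch only: shows that a persisted fully-qualified class name stops
// resolving once the class has moved. Not SpringData's real lookup logic.
final class StoredTypeLookup {

    private StoredTypeLookup() {}

    /** Tries to load the class named in a stored "_class" value. */
    static Optional<Class<?>> resolve(String storedClassName) {
        try {
            Class<?> type = Class.forName(storedClassName);
            return Optional.of(type);
        } catch (ClassNotFoundException e) {
            // The name in the data no longer matches any class on the classpath.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A class that exists resolves fine.
        System.out.println(resolve("java.util.ArrayList").isPresent());
        // A name left behind by a package refactoring does not
        // (assuming no class with this name is on the classpath).
        System.out.println(resolve("com.myapp.documents.Shape").isPresent());
    }
}
```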

With the problem identified, what was the solution?

Making it Right

As mentioned above, SpringData offers the @TypeAlias annotation. Annotating a document class with it and providing a value tells Spring to store that value, rather than the fully-qualified class name, as the document's _class property.

So here's what I did:

@Document(collection="page")
@TypeAlias("myapp.page")
public class Page extends BaseDocument {
    // ...
}
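
Why does an alias survive a refactoring? Conceptually, the stored string becomes a key that the application maps to a class, instead of a name the class loader has to find. Here's a toy sketch of that idea in plain Java. AliasRegistry and the nested Page stand-in are hypothetical; SpringData builds its own alias mapping from the annotations, so you would not write this yourself.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a toy alias-to-class lookup, not SpringData's implementation.
final class AliasDemo {

    /** Stand-in for the real entity; its package plays no role in the lookup. */
    static final class Page {}

    /** Maps stable, language-neutral aliases to whatever class currently backs them. */
    static final class AliasRegistry {
        private final Map<String, Class<?>> byAlias = new HashMap<>();

        void register(String alias, Class<?> type) {
            byAlias.put(alias, type);
        }

        Class<?> resolve(String alias) {
            Class<?> type = byAlias.get(alias);
            if (type == null) {
                throw new IllegalArgumentException("Unknown type alias: " + alias);
            }
            return type;
        }
    }

    public static void main(String[] args) {
        AliasRegistry registry = new AliasRegistry();
        // Moving or renaming Page only changes this one registration, not the stored data.
        registry.register("myapp.page", Page.class);
        System.out.println(registry.resolve("myapp.page") == Page.class);
    }
}
```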

Of course, I still couldn't read any of my existing data, but going forward, any new data would be refactor-proof. Fortunately, my app was nowhere near production-ready at this point, so deleting all of my old drawings and starting with new ones was no problem. If that hadn't been an option, I figure I'd have had two choices:
  1. Change the @TypeAlias value to match the old, fully-qualified class name, rather than the generic myapp.page value. Of course I'd be stuck with a confusing, language-specific value at that point.
  2. Go through each of the affected collections in my MongoDB store and update their _class values to the new, generic aliases. Certainly possible, although a bit scary for my taste as a MongoDB newbie.
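
For the second option, one thing to keep in mind is that type information can appear on nested documents too (in my model, the Shapes inside a Page), so a rewrite has to walk the whole document. Here's a sketch of that logic in plain Java, with documents modeled as ordinary maps and lists rather than going through a MongoDB driver. TypeKeyMigration, the myapp.star alias, and the Star class's old package are assumptions for the example; against a real database you'd do the equivalent with the shell or a driver, after taking a backup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: documents are plain maps/lists here, not real MongoDB documents.
final class TypeKeyMigration {

    private TypeKeyMigration() {}

    /** Returns a copy of the node with every "_class" value listed in oldToNew replaced, at any depth. */
    static Object rewrite(Object node, Map<String, String> oldToNew) {
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                Object value = e.getValue();
                if ("_class".equals(key) && value instanceof String) {
                    // Unknown values are left untouched.
                    copy.put(key, oldToNew.getOrDefault((String) value, (String) value));
                } else {
                    copy.put(key, rewrite(value, oldToNew));
                }
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(rewrite(item, oldToNew));
            }
            return copy;
        }
        return node; // scalars are returned as-is
    }

    public static void main(String[] args) {
        Map<String, Object> star = new LinkedHashMap<>();
        star.put("_class", "com.myapp.documents.Star"); // assumed old name
        star.put("points", 5);

        Map<String, Object> page = new LinkedHashMap<>();
        page.put("_class", "com.myapp.documents.Page");
        page.put("shapes", Arrays.asList(star));

        Map<String, String> oldToNew = new HashMap<>();
        oldToNew.put("com.myapp.documents.Page", "myapp.page");
        oldToNew.put("com.myapp.documents.Star", "myapp.star"); // hypothetical alias

        System.out.println(rewrite(page, oldToNew));
    }
}
```
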
One additional improvement could be made at this point. The property in the MongoDB documents is still called _class, but now that's a bit of a misnomer. I'd prefer something like, well, _alias. This is easy to change. Somewhere in your XML or Java config, you've probably specified a DefaultMongoTypeMapper. Simply pass a new typeKey value in the constructor. For example, here's a snippet from my XML config:

  <bean id="mongoTypeMapper" class="org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper">
    <constructor-arg name="typeKey" value="_alias"></constructor-arg>
  </bean>

Are We All Set?

It turns out that I immediately ran into another problem. This one is a bit more specific to my particular application, so I'll describe it in my next article.
