Saturday, November 15, 2014

Spring Data, Mongo, and Lazy Mappers

In a previous post, I mentioned two things that every developer should do when using Spring Data to access a MongoDB datastore. Specifically, you should be sure to annotate all of your persistent entities with @Document(collection="<custom-collection-name>") and @TypeAlias("<custom-type>"). This decouples your Mongo document data from your specific Java class names (which Spring Data will otherwise closely couple by default) making things like refactoring possible.

With my particular application, however, I ran into an additional problem. Let me recap. My application is a drawing application of sorts. Drawings are are modeled by, well, a Drawing class. A Drawing can contain multiple instances of Page, and within each Page, multiple Shape objects. Shape, in turn, is an abstract class, containing a number of subclasses (Circle, Star, Rectangle, etc).

For our purposes, let's focus on the relationship between a Page and its Shapes. Here's a snippet from the Page class:

@Document(collection="page")
@TypeAlias("myapp.page")
public class Page extends BaseDocument {

    @Id 
    private String id;

    @Indexed
    private String drawingId;

    private List<Shape> shapes = new ArrayList<Shape>();

    // ....

}

First not that I've annotated this class so that I have control over the name of the collection that stores Page documents (in this case, "page"), and so that Spring Data will store an alias to the Page class (in this case, "my app.page") along with the persisted Page documents, rather than storing the fully-qualified class name.

Also of importance here is that the Page class knows nothing about any specific Shape subclasses. This is important from an OO perspective, of course; I should be able to add any number of Shapes to my app's ecosystem, and the Page class should continue to work with no modifications.

Now let's look at my Shape class:

public abstract class Shape extends BaseDocument {

    @Id
    private String id;

    @Indexed
    private String pageId;

    // attributes
    private int x;

    private int y;

    // ...

}

Nothing surprising here. Note that Shape has none of the SpringData annotation; that's because no concrete instance of Shape will be persisted along with any Pages. It is abstract, after all. Instead, a Page will contain instances of Shape subclasses. Let's take a look at one such subclass:

@Document(collection="shape")
@TypeAlias("myapp.shape.star")

public class Star extends Shape {

    private int numPoints;
    private float innerRadius;

    private float outerRadius;

}

The @Document(collection="shape") annotation is currently unused, because per my app design, any Shape subclass instance will always be stored as a nested collection within a Page. But it would certainly be possible to store different shapes directly into a specific collection.

The @TypeAlias annotation, however, is very important. The purpose of that one is to tell Spring Data how to map the different Shapes that it finds within a Page back into the appropriate class. After all, if a Page containing a nine-point star is persisted, then when it's read back in, that star had better be mapped back into a Star class, not a simple Shape class. After all, Shape itself knows nothing about number of points!

Feeling pretty happy with myself, I tried out my code. Upon trying to read my drawings back in, I began getting errors of this type:

org.springframework.data.mapping.model.MappingInstantiationException: Could not instantiate bean class [com.myapp.documents.Shape]: Is it an abstract class?; nested exception is java.lang.InstantiationException

Indeed, Shape is an abstract class, and so indeed, it cannot be directly instantiated. But why was Spring Data trying to directly instantiate a Shape? I played around, tweaked a few things, but nothing fundamentally changed. I checked StackOverflow and the Spring forums. Nothing. So it was time to dig into the documentation.

As with most typical Spring Data/Mongo apps, mine was configured to use a bean of type org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper to map persistence documents to and from Java classes:

     <bean id="mongoTypeMapper" class="org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper">
        <constructor-arg name="typeKey" value="_alias"></constructor-arg>

    </bean>

    <bean id="mappingMongoConverter"
class="org.springframework.data.mongodb.core.convert.MappingMongoConverter">
        <constructor-arg ref="mongoDbFactory" />
        <constructor-arg ref="mappingContext" />
        <property name="typeMapper" ref="mongoTypeMapper"/>
    </bean>

    <bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
        <constructor-arg ref="mongoDbFactory" />
        <constructor-arg ref="mappingMongoConverter" />

    </bean>

The docs indicated that DefaultMongoTypeMapper was responsible for reading and writing the type information stored with persistent data. By default, this would be a _class property pointing to com.myapp.documents.Star; with my customizations it became an _alias property pointing to may app.shape.star. But if DefaultMongoTypeMapper wouldn't do the trick, perhaps I needed another mapper.

Looking through the documentation, I found org.springframework.data.convert.MappingContextTypeInformationMapper. Here's what its Javadocs indicated:
TypeInformationMapper implementation that can be either set up using a MappingContext or manually set up Map of String aliases to types. If a MappingContext is used the Map will be build inspecting the PersistentEntity instances for type alias information.
That seemed promising. If I could replace my DefaultMongoTypeMapper with a MappingContextTypeInformationMapper that could scan my persistent entities and build a type-to-alias mapping, then that should solve my problem. The docs also said something about manually creating a Map, but a) It wasn't readily apparent how to create a Map myself, and b) I didn't like that approach; I didn't want to have to manually configure an entry for any new Shape that might be created.

One problem. You'll notice above that my DefaultMongoTypeMapper is wired into my MappingMongoConverter by way of the latter's typeMapper property. In fact, typeMapper is itself of type MongoTypeMapper. While DefaultMongoTypeMapper implements MongoTypeMapper,  MappingMongoConverter does not. Fortunately, DefaultMongoTypeMapper allows you to chain together fallback mappers by way of an internal property, mappers, which itself is a List<? extends TypeInformationMapper>. And as luck would have it, MappingMongoConverter implements TypeInformationMapper.

So I would keep my DefaultMongoTypeMapper, and add a MappingMongoConverter to its mappers list. I modified my spring XML config like so:

  <bean id="mongoTypeMapper" class="org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper">
<constructor-arg name="typeKey" value="_alias"></constructor-arg>
    <constructor-arg name="mappers">
        <list>
            <ref bean="mappingContextTypeMapper" />
        </list>
    </constructor-arg> 
</bean>
  <bean id="mappingContextTypeMapper" class="org.springframework.data.convert.MappingContextTypeInformationMapper">
      <constructor-arg ref="mappingContext" />

  </bean>

I redeployed and ran my app.

And I ran into the same exact error. Damn.

At this point, I became concerned that maybe all of the TypeAlias information was completely ignored by SpringData with nested documents, such as my Shapes nested within Pages. So I decided to roll up my sleeves, fire up my debugger, and start getting intimate with the Spring Data source code.

After a bit of debugging, I learned that Spring Data was indeed attempting to determine if any TypeAlias information applied to the Shapes that were being read in for any Page. But in a lazy, half-hearted way.

When I say lazy, I mean that there was absolutely no scanning of entities to search for @TypeAlias annotation like I'd assumed there would be. Everything was done at runtime, as new data types were discovered. The MappingMongoConverter would read my base entity; i.e. a Page document. It would then discover that this document had a collection of things called shapes. Then it would examine the Page class to find the shapes property, and discover that shapes was of type List<Shape>. And finally it would examine the Shape class to determine if it had any TypeAlias data that it could cache for later.

In other words, it was completely backwards from what I needed. This mapper wouldn't work for me, either.

By this time, I'd developed enough understanding as to what was going on, that creating my own mapper didn't seem too tough. And that's what I did. Really, all I needed was a mapper that I could configure to scan one or more packages to discover persistent entities with TypeAlias information, and cache that information for later use.

My class was called EntityScanningTypeInformationMapper, and its full source code is a the end of this post. But the relevant parts are:

  • Its constructor takes a List<String> of packages to scan.
  • It has an init() method that scans the provided packages
  • Scanning a package entails using reflection to read in the information for each class in the package, determining if it is annotated with @TypeAlias, and if so, mapping the alias to the class.

My Spring XML config was modified thusly:

  <bean id="mongoTypeMapper" class="org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper">
<constructor-arg name="typeKey" value="_alias"></constructor-arg>
    <constructor-arg name="mappers">
        <list>
            <ref bean="entityScanningTypeMapper" />
        </list>
    </constructor-arg> 
</bean>
  <bean id="entityScanningTypeMapper" class="com.myapp.utils.EntityScanningTypeInformationMapper" init-method="init">
    <constructor-arg name="scanPackages">
        <list>
            <value>com.myapp.documents.shapes</value>
        </list>
    </constructor-arg> 

  </bean>

I redeployed and retested, and it ran like a champ.

So my lesson is that Spring Data, out of the box, doesn't seem to work well with polymorphism, which is a shame given the schema-less nature of NoSQL data stores like MongoDB. But it doesn't take too much effort to write your own mapper to compensate.

Oh, and here's the EntityScanningTypeInformationMapper source:

public class EntityScanningTypeInformationMapper implements TypeInformationMapper {

    private Logger log = Logger.getLogger(this.getClass());
    
    private final List<String> scanPackages;
    private Map<String, Class<? extends Object>> aliasToClass;

    public EntityScanningTypeInformationMapper(List<String> scanPackages) {
        this.scanPackages = scanPackages;
    }

    public void init() {
       this.scan(scanPackages);
    }
    
    private void scan(List<String> scanPackages) {
        aliasToClass = new HashMap<>();
        for (String pkg : scanPackages) {
            try {
                findMyTypes(pkg);
            } catch (ClassNotFoundException | IOException e) {
                log.error("Error scanning package " + pkg, e);
            }
        }
    }
    
    private void findMyTypes(String basePackage) throws ClassNotFoundException, IOException {
        ResourcePatternResolver resourcePatternResolver = new PathMatchingResourcePatternResolver();
        MetadataReaderFactory metadataReaderFactory = new CachingMetadataReaderFactory(resourcePatternResolver);

        String packageSearchPath = ResourcePatternResolver.CLASSPATH_ALL_URL_PREFIX +
                                   resolveBasePackage(basePackage) + "/" + "**/*.class";
        Resource[] resources = resourcePatternResolver.getResources(packageSearchPath);
        for (Resource resource : resources) {
            if (resource.isReadable()) {
                MetadataReader metadataReader = metadataReaderFactory.getMetadataReader(resource);
                Class<? extends Object> c = Class.forName(metadataReader.getClassMetadata().getClassName());
                log.debug("Scanning package " + basePackage + " and found class " + c);
                if (c.isAnnotationPresent(TypeAlias.class)) {
                    TypeAlias typeAliasAnnot = c.getAnnotation(TypeAlias.class);
                    String alias = typeAliasAnnot.value();
                    log.debug("And it has a TypeAlias " + alias);
                    aliasToClass.put(alias, c);
                }
            }
        }
    }

    private String resolveBasePackage(String basePackage) {
        return ClassUtils.convertClassNameToResourcePath(SystemPropertyUtils.resolvePlaceholders(basePackage));
    }

    @Override
    public TypeInformation<?> resolveTypeFrom(Object alias) {
        if (aliasToClass == null) {
            scan(scanPackages);
        }
        
        if (alias instanceof String) {
            Class<? extends Object> clazz = aliasToClass.get( (String)alias );
            if (clazz != null) {
                return ClassTypeInformation.from(clazz);
            }
        }
        return null;
    }

    @Override
    public Object createAliasFor(TypeInformation<?> type) {
        log.debug("EntityScanningTypeInformationMapper asked to create alias for type: " + type);
        return null;
    }


}

Friday, November 14, 2014

Before You Use SpringData and MongoDB

The Upfront Summary

For those who don't have time to read a long blog post, here's the gist of this article: always always always annotate your persisted SpringData entity classes with @Document(collection="<custom-collection-name>") and @TypeAlias("<custom-type>") . This should be an unbreakable rule. Otherwise you'll be opening yourself up to a world of hurt later.

SpringData is Easy to Get Started With

Like many Java developers, I rely on the Spring Framework. Everything, from my data access layer to my MVC controllers are managed within a Spring application context. So when I decided to add MongoDB to the mix, it was without a second thought that I decided to use SpringData to interact with Mongo.

That was months ago, and I've run into a few problems. As it turns out, these particular problems were easy to solve, but it took awhile to recognize what was going on and come up with a solution. Surprisingly little information existed on StackOverflow or the Spring forums for what I'm imagining is a common problem.

Let me explain.

My Data Model

My application is basically an editor. Think of a drawing program, where users can edit a multi-page "drawing" document. Within a drawing's page, users can create and manipulate different shapes. As a document store, MongoDB is well-suited for persisting this sort of data. Roughly speaking, my data model was something like this (excuse the lack of UML):

  • Drawings are the top-level container
  • A Drawing has one or more Pages
  • A Page consists of many Shapes. 
  • Shape is an abstract class. It has some properties shared by all Shape subclasses, such as size, border with and color, background color, etc
  • Concrete subclasses of Shape can contain additional properties. For example, Star has number of points, inner radius, outer radius, etc

Drawing are stored separately then pages; i.e. they are not nested. Shapes, however, are nested within Pages. For example, here's a snippet from the Drawing class:

public class Drawing extends BaseDocument {

    @Id 
    private String id;

    // ....

}

and the Page class:

public class Page extends BaseDocument {

    @Id 
    private String id;

    @Indexed
    private String drawingId;

    private List<Shape> shapes = new ArrayList<Shape>();

    // ....

}

So in other words, when a user goes to edit a given drawing, we simply retrieve all of the Pages whose drawingId matches the ID of the drawing being edited.

Don't Accept SpringData's Defaults!

SpringData offers you the ability to customize how your entities are persisted in your datastore. But if you don't explicitly customize, SpringData will make do as best as it can. While that might seem helpful up front, I've found the opposite. SpringData's default behavior will invariably paint you into a corner that's difficult to get out of. I'd argue that, rather than guessing, SpringData should throw a runtime exception at startup. Short of that, every tutorial about SpringData/MongoDB should strongly encourage developers of production applications to tell Spring how to persist their data.

The first default is how SpringData maps classes to collections. Collections are how many NoSQL data stores, MongoDB included, store groups of related data. Although it's not always appropriate to compare NoSQL databases to traditional RDBMs, you can roughly think of a collection the same way you think of a table in a SQL database.

Chapter 7 of the SpringData/Mongo docs explains how, by default classes are mapped to collections:
The short Java class name is mapped to the collection name in the following manner. The class 'com.bigbank.SavingsAccount' maps to 'savingsAccount' collection name.
So based on my data model, I knew I'd find a drawing collection and a page collection in my MongoDB instance.

Now, I've used ORMs like Hibernate extensively. Probably for that reason, I wasn't content to let my Mongo collections be named for me. So I looked for a way to specify my collection names.

The answer was simple enough. Although not a strict requirement, persisted entities should be annotated with the @org.springframework.data.mongodb.core.mapping.Document annotation. Furthermore, that annotation takes a collection argument in which you can pass your desired collection name.

So my Drawing class became annotated with @Document(collection="drawing"), and my Page class became annotated with @Document(collection="page"). The end result would be the same--a drawing and a page collection in Mongo--but I now had control. I specified the collection name simply because it made me feel more comfortable, but it turns out there's an important, tangible reason to do so (more on that later).

With everything in place, I started testing my app. I created a few drawings, added some pages and shapes, and saved them all to MongoDB. Then I used the mongo command-line tool to examine my data. One thing immediately stuck out. Every document stored in there had a _class property which pointed to the fully-qualified name of the mapped class. For example, each Page document contained the property "_class" : "com.myapp.documents.Page".

The purpose of that value, as you might guess, is to instruct Spring Data how to map the document back to a class when reading data out. This convention struck me as a little concerning. After all, my application might be pure Java at this point, but my data store should be treated as language-agnostic. Why would I want Java-specific metadata associated with my data?

After thinking about it, I shook off my concern. Sure, the _class property would be there on every record, but if I started using another platform to access the data, then the property could just be ignored. Practically speaking, what could actually go wrong?

What Could Go Wrong

Then one day I decided to refactor my entire application. I'd been organizing my code based on architectural layer, and I decided instead to try organizing it by feature instead. Eclipse of course allowed me to do this in a matter of minutes. So I WARred up my changes, deployed them to Tomcat, and viola! I could no longer read in any of my drawing/page/shape data.

It quickly became clear what the problem was. My data contained _class information that pointed to a now-non-existence fully-qualified class name. Shape was no longer in the com.myapp.documents package.

With the problem identified, what was the solution?

Making it Right

As mentioned above, SpringData offers the @TypeAlias annotation. Annotating a document as such and providing a value tells Spring to store that value, rather than the fully-qualified classname, as the document's _class parameter.

So here's what I did:

@Document(collection="page")
@TypeAlias("myapp.page")
public class Page extends BaseDocument {
    // ...
}

Of course, I still couldn't read any of my existing data in, but moving forward, any new data would be refactor-proof. Fortunately my app was nowhere near production-ready at this point, so deleting all of my old drawings and starting with new ones was no problem. If that wasn't an option, then I figure I'd have two options:
  1. Change the @TypeAlias value to match the old, fully-qualified class name, rather than the generic myapp.page value. Of course I'd be stuck with a confusing, language-specific value at that point.
  2. Go through each of the affected collections in my MongoDB store and update their _class values to the new, generic aliases. Certainly possible, although a bit scary for my taste as a MongoDB newbie.
One additional improvement could be made at this point. The property in the MongoDB documents is still called _class, but now that's a bit of a misnomer. I'd prefer something like, well, _alias. This is easy to change. Somewhere in your XML or Java config, you've probably specified a DefaultMongoTypeMapper. Simply pass a new typeKey value in the constructor. For example, here's a snippet from my XML config:

  <bean id="mongoTypeMapper" class="org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper">
<constructor-arg name="typeKey" value="_alias"></constructor-arg>

</bean>

Are We All Set?

It turns out that I immediately ran into another problem. This one is a bit more specific to my particular application, so I'll describe it in my next article.