Friday, June 1, 2012

MultipleBagFetchException and Java equality

Today I ran into an interesting problem: suppose you have a Hibernate entity called Parent that is in a one-to-many (1-N) relationship with two other classes, Child1 and Child2:

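import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Parent implements Serializable {

    @Id
    Integer id;

    // Both collections are plain Lists without an index column, i.e. "bags" for Hibernate
    @OneToMany(mappedBy="parent", cascade={CascadeType.PERSIST, CascadeType.REMOVE})
    List<Child1> child1Coll = new ArrayList<Child1>();

    @OneToMany(mappedBy="parent", cascade={CascadeType.PERSIST, CascadeType.REMOVE})
    List<Child2> child2Coll = new ArrayList<Child2>();
}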


This worked fine, except for the performance on a large dataset. This is the classic N+1 select problem of ORMs, and it has a standard solution: EAGER fetching. In this case it was an easy decision, since I never need a Parent without its Child1 and Child2 collections.

So I added the option to fetch eagerly (FetchType.EAGER). In practice an eager fetch means two extra left joins, which is much faster than retrieving all the Child1 and Child2 collections later with separate queries. Fine, except for the resulting org.hibernate.loader.MultipleBagFetchException.

The problem is that when both child collections contain multiple rows, the join turns into a Cartesian product, and Hibernate cannot work out from the result which values were originally stored. This is nicely explained here.
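To make the Cartesian product concrete, here is a minimal sketch in plain Java. It is not how Hibernate works internally; the class BagJoinDemo and its methods are made up for illustration, and it assumes a single parent whose two child collections are both non-empty. It only shows that a join over two collections repeats every element of one collection once per element of the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagJoinDemo {

    // Simulates the rows of "parent LEFT JOIN child1 LEFT JOIN child2"
    // for one parent: every child1 value is paired with every child2 value.
    public static List<String[]> joinRows(List<String> child1, List<String> child2) {
        List<String[]> rows = new ArrayList<>();
        for (String c1 : child1) {
            for (String c2 : child2) {
                rows.add(new String[] {c1, c2});
            }
        }
        return rows;
    }

    // Naive reconstruction of the first collection: take one element per row.
    public static List<String> naiveChild1(List<String[]> rows) {
        List<String> result = new ArrayList<>();
        for (String[] row : rows) {
            result.add(row[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = joinRows(Arrays.asList("a", "b"), Arrays.asList("x", "y", "z"));
        System.out.println(rows.size());       // 6
        System.out.println(naiveChild1(rows)); // [a, a, a, b, b, b]
    }
}
```

Since a bag is allowed to contain duplicates, the three "a" entries cannot simply be collapsed without extra information, whereas a Set (or a List with an explicit index) gives a rule for discarding the repeats.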

The whole problem got me thinking about ORMs in general. For example, what differentiates a list from a set: the constraint of having only distinct elements, or the ability to order?

These are two different features; in Java, however, they are mixed together:
List = non-distinct elements + order
Set = distinct elements + optional ordering

Also note that "order" means two different things for a List and a Set. For a List, order is something external (an arbitrary index), while for a Set, order is internal: it is derived from the properties of the things stored.
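The difference between external and internal order can be seen with the standard collections alone. This is just a small sketch; the class OrderDemo and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class OrderDemo {

    // External order: positions are whatever the caller chose, duplicates are kept.
    public static List<String> externalOrder(String... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    // Internal order: derived from the elements themselves
    // (natural String ordering), duplicates are dropped.
    public static SortedSet<String> internalOrder(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        System.out.println(externalOrder("pear", "apple", "pear")); // [pear, apple, pear]
        System.out.println(internalOrder("pear", "apple", "pear")); // [apple, pear]
    }
}
```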

Hibernate resolves this ambiguity with the @IndexColumn annotation. With @IndexColumn, order is stored explicitly in a column as simple integers. With JPA2's @OrderBy, order is derived from the things stored. (Actually JPA2 also has @OrderColumn, which is very similar to Hibernate's @IndexColumn :) )

This ambiguity has spawned a whole cottage industry of terminology:
Bag = non-distinct elements, without order
List = non-distinct elements + explicit order
Set = distinct elements, without order
SortedSet = distinct elements + implicit order

(Note: a structure with explicit order is sometimes referred to as an "indexed data structure".)

To complicate matters a bit more, with Hibernate it makes no sense to talk about distinct and non-distinct elements. All elements must be distinct, since Hibernate's philosophy is that every entity must have a unique primary key.

Even if we use the above annotations and take advantage of the fact that Hibernate can work reliably with Sets as collection types, a dependent API will often refuse a SortedSet, since it implements Iterable but not List. Iterable should be enough, though, if we are only reading and displaying a data structure and don't need indexed writes (that is, writing at the Nth index of the structure).
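When an API insists on a List, one possible workaround is to hand it a read-only snapshot of the SortedSet. This is a sketch of that idea only; SortedSetAdapter and toReadOnlyList are hypothetical names, and the copy will not reflect later changes to the set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedSetAdapter {

    // Copies the set into an unmodifiable List, keeping the set's iteration order.
    // Reads by index work; any write throws UnsupportedOperationException.
    public static <T> List<T> toReadOnlyList(SortedSet<T> set) {
        return Collections.unmodifiableList(new ArrayList<>(set));
    }

    public static void main(String[] args) {
        SortedSet<Integer> set = new TreeSet<>(Arrays.asList(3, 1, 2));
        List<Integer> list = toReadOnlyList(set);
        System.out.println(list);        // [1, 2, 3]
        System.out.println(list.get(0)); // 1
    }
}
```

The copy costs one pass over the set, which seems acceptable when the data is only read and displayed.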

That's all I had to say about ordering, now what about equality? :)

As we all know, all Java objects have an .equals method, which does reference equality by default but can be overridden to use another definition of equality. This works nicely, except in more complex situations where the notion of equality itself can change. For example, in simple cases differentiating by attribute1 is enough; in complex cases, you need attribute1+attribute2.

This is a bit hard to implement, as we cannot override the equals method on a case-by-case basis. So why does this matter? When an application receives data entered manually or from an external system, we'd like to use the strictest possible definition of equality, so we can catch data errors at the earliest possible moment. Currently we have to abandon equality by means of the .equals method and use an external method as a filter instead.
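One way such an external filter could look is sketched below. The notion of equality is passed in as a key function, so it can differ from case to case without touching .equals. The class EqualityFilter, the Item class and its two attributes are hypothetical, chosen only to mirror the attribute1 / attribute1+attribute2 example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class EqualityFilter {

    // Hypothetical value class; it deliberately does not override equals.
    public static final class Item {
        public final String attribute1;
        public final String attribute2;

        public Item(String attribute1, String attribute2) {
            this.attribute1 = attribute1;
            this.attribute2 = attribute2;
        }
    }

    // Returns every item whose key was already seen earlier in the list,
    // i.e. the duplicates under the caller's chosen notion of equality.
    public static <K> List<Item> duplicates(List<Item> items, Function<Item, K> key) {
        Set<K> seen = new HashSet<>();
        List<Item> result = new ArrayList<>();
        for (Item item : items) {
            if (!seen.add(key.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("a", "1"), new Item("a", "2"), new Item("b", "1"));

        // Equality by attribute1 only: ("a", "2") collides with ("a", "1").
        System.out.println(duplicates(items, i -> i.attribute1).size()); // 1

        // Equality by attribute1 + attribute2: no collisions.
        System.out.println(duplicates(items,
                i -> Arrays.asList(i.attribute1, i.attribute2)).size()); // 0
    }
}
```

The key only needs a sensible equals and hashCode of its own, which String and List already provide.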
