Sunday, June 10, 2012

JPA query result as POJO

As we all know, JPA is not the greatest thing when it comes to OLAP-style queries or queries that traverse many relationships. The reason is that when you only need a few specific values, it is a waste to create thousands of objects and then extract and aggregate what's needed in Java.

So what we need here is a way to write a query that extracts only the needed information and returns it as a POJO.

First try: using a JPQL view object (constructor expression)

SELECT NEW com.company.MyPojo(o.rel.blah, o.rel2.blah2, o.rel3.rel33.blah3) FROM Object o
 
This works nicely, as long as it is possible to write the query in JPQL. That may not be easy when:
  • You need functions or DB features that are not in JPQL.
  • You have many inheritance relationships in your model, which complicate the query.
JPA 2.0 added some new features that help with both of these, see here.
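
For the SELECT NEW query above to resolve, the target class needs a constructor whose parameters match the selected expressions in number, order, and type. A minimal sketch of such a class follows; the field types are assumptions, since the actual types of blah, blah2, and blah3 are not given here.

```java
// Hypothetical view object for the SELECT NEW query above. In a real project it
// would be a public class in package com.company so the fully qualified name resolves.
final class MyPojo {
    private final String blah;    // assumed type of o.rel.blah
    private final Integer blah2;  // assumed type of o.rel2.blah2
    private final String blah3;   // assumed type of o.rel3.rel33.blah3

    // Parameter order and types must line up with the expressions in the SELECT NEW clause.
    public MyPojo(String blah, Integer blah2, String blah3) {
        this.blah = blah;
        this.blah2 = blah2;
        this.blah3 = blah3;
    }

    public String getBlah() { return blah; }
    public Integer getBlah2() { return blah2; }
    public String getBlah3() { return blah3; }
}
```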

Due to a Hibernate bug, though, you cannot use TypedQuery in this case.

Second try: use a native query

Simply use EntityManager's createNativeQuery method. Be warned, however: if the query returns multiple columns, the result will be a List<Object[]>, and there seems to be no easy way to change that. This means that you still need to post-process the returned result and create your own POJOs.
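
The post-processing step could look roughly like the sketch below. It is plain Java with no JPA dependency, so the rows are just a List<Object[]> as a native query would return them. OrderSummary, NativeRowMapper, and the column layout (a name followed by a count) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO; the fields are placeholders, not taken from a real query.
final class OrderSummary {
    final String customerName;
    final long orderCount;

    OrderSummary(String customerName, long orderCount) {
        this.customerName = customerName;
        this.orderCount = orderCount;
    }
}

final class NativeRowMapper {

    // Converts the rows of a two-column native query (name, count) into POJOs.
    static List<OrderSummary> toPojos(List<Object[]> rows) {
        List<OrderSummary> result = new ArrayList<OrderSummary>(rows.size());
        for (Object[] row : rows) {
            String name = (String) row[0];
            // The Java type of a numeric column depends on the driver
            // (BigInteger, BigDecimal, Long, ...), so go through Number.
            long count = ((Number) row[1]).longValue();
            result.add(new OrderSummary(name, count));
        }
        return result;
    }
}
```

Casting through Number instead of a concrete type keeps the mapper from breaking when a different database or driver returns the same column as another numeric class.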

In many forums, the recommended way to solve this is to use @SqlResultSetMapping; however, that only works for entities, not POJOs. (Somewhere it was mentioned that it also works for @MappedSuperclass, but I found that this does not work with Hibernate 4.1.2.)

If you are using EclipseLink, you can add a query hint to return each row as a Map instead of an Object[], using the RESULT_TYPE hint (QueryHints.RESULT_TYPE with ResultType.Map).

Friday, June 1, 2012

MultipleBagFetchException and Java equality

Today I encountered an interesting problem. Suppose you have a Hibernate entity called Parent, which is in a 1-N relationship with two other classes, Child1 and Child2:

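import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Parent implements Serializable {

    @Id
    Integer id;

    // No index column, so Hibernate treats each List as a bag.
    @OneToMany(mappedBy="parent", cascade={CascadeType.PERSIST, CascadeType.REMOVE})
    List<Child1> child1Coll = new ArrayList<Child1>();

    @OneToMany(mappedBy="parent", cascade={CascadeType.PERSIST, CascadeType.REMOVE})
    List<Child2> child2Coll = new ArrayList<Child2>();
}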


This went fine, except for performance on a large dataset. This is the classic N+1 ORM problem, with a standard solution: add EAGER fetching. In this case it was an easy decision, since I never need a Parent without its Child1 and Child2 collections.

So I added the option to fetch eagerly. In practice, an eager fetch means two extra left joins, which is much faster than retrieving all the Child1 and Child2 collections later with separate queries. Fine. Except for the resulting org.hibernate.loader.MultipleBagFetchException.

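The problem is that when both child collections have multiple rows, the join turns into a Cartesian product. Because a bag may legitimately contain duplicates, Hibernate cannot tell which repeated rows are real elements and which are artifacts of the join, so it cannot reconstruct the originally stored values. This is nicely explained here.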
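
To make the Cartesian product concrete, here is a toy model in plain Java. It is not Hibernate's loader, just a simulation of the joined rows for a single parent. JoinProductDemo and its methods are invented for this illustration, and it assumes both collections are non-empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class JoinProductDemo {

    // Simulates the rows of Parent LEFT JOIN Child1 LEFT JOIN Child2 for one parent.
    // Each row is {child1Id, child2Id}. Simplification: both collections are non-empty.
    static List<int[]> joinRows(List<Integer> child1Ids, List<Integer> child2Ids) {
        List<int[]> rows = new ArrayList<int[]>();
        for (Integer c1 : child1Ids) {
            for (Integer c2 : child2Ids) {
                rows.add(new int[] {c1, c2});
            }
        }
        return rows;
    }

    // Bag semantics: every row contributes an element, so repeats pile up.
    static List<Integer> child1AsBag(List<int[]> rows) {
        List<Integer> bag = new ArrayList<Integer>();
        for (int[] row : rows) {
            bag.add(row[0]);
        }
        return bag;
    }

    // Set semantics: repeats collapse, which recovers the stored elements.
    static Set<Integer> child1AsSet(List<int[]> rows) {
        Set<Integer> set = new LinkedHashSet<Integer>();
        for (int[] row : rows) {
            set.add(row[0]);
        }
        return set;
    }
}
```

With two Child1 ids and three Child2 ids, joinRows produces six rows. Read as a bag, the Child1 collection would hold six elements; read as a set, it holds the original two.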

The whole problem got me thinking about ORMs in general. For example, what differentiates a list from a set? The constraint of having only distinct elements, or the ability to order?

These are two different features; in Java, however, we have them mixed up:
List = non-distinct elements + order
Set = distinct elements + optional ordering

Also note that "order" means two different things for a List and a Set. In the former case, order is something external, while with sets, order is internal (it is derived from the properties of the things stored, not from an arbitrary external index).
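
The difference between external and internal order is easy to see with the standard collections. A small sketch (OrderingDemo is just a name chosen for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class OrderingDemo {

    // External order: elements stay where they were inserted, duplicates included.
    static List<String> asList(String... items) {
        return new ArrayList<String>(Arrays.asList(items));
    }

    // Internal order: position is derived from the elements themselves
    // (natural String ordering here), and duplicates are dropped.
    static SortedSet<String> asSortedSet(String... items) {
        return new TreeSet<String>(Arrays.asList(items));
    }
}
```

Given "pear", "apple", "pear", the list keeps all three elements in insertion order, while the sorted set contains "apple" followed by "pear".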

Hibernate resolves this ambiguity with the @IndexColumn annotation. With @IndexColumn, order is stored explicitly in a column as simple integers. With JPA2's @OrderBy, order is derived from the things stored. (Actually, JPA2 also has @OrderColumn, which is very similar to Hibernate's @IndexColumn :) )

This ambiguity has spawned a whole cottage industry of terminology:
Bag = non-distinct elements, without order
List = non-distinct elements + explicit order
Set = distinct elements without order
SortedSet = distinct elements + implicit order

(note: a structure with explicit order is sometimes referred to as an "indexed data structure")

To complicate matters a bit more, in case of Hibernate, it makes no sense to talk about distinct and non-distinct elements. All elements must be distinct, as Hibernate's philosophy is that all entities must have unique primary keys.

Even if we use the above annotations and take advantage of the fact that Hibernate can now work reliably with Sets as collection types, a dependent API often won't accept a SortedSet, since it implements Iterable but not List. Iterable should be enough, though, if we are only reading and displaying a data structure and don't need indexed writes (that is, writing at the Nth index of a data structure).

That's all I had to say about ordering, now what about equality? :)

As we all know, all Java objects have an .equals method, which does reference equality by default but can be overridden to use another definition of equality. This works nicely, except for more complex situations where the notion of equality itself can change. For example, in simple cases, differentiating by attribute1 is enough. In complex cases, you need attribute1+attribute2.

This is a bit hard to implement, as we cannot override the equals method on a case-by-case basis. So why does this matter? When we have an application that takes data entered manually or from an external system, we'd like to use the strictest possible definition of equality, so we can catch data errors at the earliest possible moment. Currently we need to abandon equality by means of the .equals method and use an external method as a filter instead.
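
One way such an external filter could look is sketched below. The notion of equality is passed in as a key function, so the same data can be checked by attribute1 alone or by attribute1+attribute2. Item, EqualityFilter, and the two key constants are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical data class; attribute1 and attribute2 stand in for real fields.
final class Item {
    final String attribute1;
    final String attribute2;

    Item(String attribute1, String attribute2) {
        this.attribute1 = attribute1;
        this.attribute2 = attribute2;
    }
}

final class EqualityFilter {

    // Simple case: two items are "equal" when attribute1 matches.
    static final Function<Item, Object> BY_ATTRIBUTE1 = item -> item.attribute1;

    // Complex case: both attributes must match. A List is used as the composite
    // key because its equals/hashCode compare element by element.
    static final Function<Item, Object> BY_BOTH =
            item -> Arrays.asList(item.attribute1, item.attribute2);

    // Returns the items whose key was already seen earlier in the list.
    static <T> List<T> duplicates(List<T> items, Function<T, Object> key) {
        Set<Object> seen = new HashSet<Object>();
        List<T> dups = new ArrayList<T>();
        for (T item : items) {
            if (!seen.add(key.apply(item))) {
                dups.add(item);
            }
        }
        return dups;
    }
}
```

The items never override .equals themselves; the caller picks the definition of equality per use case by choosing the key function.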