Saturday, February 10, 2018

Using Oracle JSON functions via JPA and Hibernate

I've recently realized that Oracle 12 has functions (JSON_VALUE, JSON_QUERY, JSON_EXISTS, etc.) that support parsing JSON strings and extracting values for given keys.
How can you use these functions from a JPA or Hibernate application? Here are some options.

Use native queries

I won't give a detailed walkthrough, as this works as usual :)

An example query:
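A minimal sketch of what such a native query can look like in plain Oracle SQL (the table and column names are made up for illustration):

```sql
-- assuming a table PRODUCTS with a VARCHAR2 column JSON_DATA holding JSON
SELECT JSON_VALUE(p.json_data, '$.name') AS product_name
FROM products p
WHERE JSON_EXISTS(p.json_data, '$.price');
```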

Use CriteriaBuilder / Criteria Queries

Let's say one of your entities has a JSON property and you're writing code to filter this table, picking only entities that have a certain JSON key with a given value. Unfortunately, there is no easy way to do this using JPA / Hibernate.

If you are using Hibernate, but not JPA, one option is to use Restrictions#sqlRestriction. This works; however, it is not compatible with JPA's Criteria API, which means that if the rest of your criteria is written using JPA, you cannot easily combine ("AND") it with this restriction.

However, since JPA 2.0, there is a method named CriteriaBuilder#function. This allows calling any function built into the database or defined by yourself. Unfortunately, doing the following results in an exception:
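For example, a call along these lines triggers the problem (the entity and its "jsonData" attribute are made up for illustration):

```java
// Sketch of the failing approach: Hibernate renders both function
// arguments as bind parameters, which Oracle's JSON_VALUE rejects.
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Product> query = cb.createQuery(Product.class);
Root<Product> root = query.from(Product.class);
query.where(cb.equal(
    cb.function("JSON_VALUE", String.class,
        root.get("jsonData"), cb.literal("$.name")),
    "foo"));
```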



The reason is that, in the background, Hibernate creates a templated SQL statement (e.g. JSON_VALUE(?, ?)), which won't work with Oracle: the JSON path must be a literal, not a bind parameter. The solution is to inline the function parameters instead of binding them. Unfortunately, there is no built-in way to do this in JPA or Hibernate, but with a small helper class it works perfectly:
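A minimal sketch of such a helper, assuming we render the JSON path as an inlined, escaped SQL literal (the class and method names are made up):

```java
// Hypothetical helper that renders a JSON_VALUE call with the path inlined
// as a SQL literal instead of a bind parameter (Oracle requires a literal path).
public class JsonValueHelper {

    // Builds e.g. JSON_VALUE(json_data, '$.name'); doubles single quotes
    // so the inlined literal stays valid SQL.
    public static String jsonValueExpr(String column, String jsonPath) {
        String escaped = jsonPath.replace("'", "''");
        return "JSON_VALUE(" + column + ", '" + escaped + "')";
    }

    public static void main(String[] args) {
        System.out.println(jsonValueExpr("json_data", "$.name"));
    }
}
```

The resulting fragment can then be fed into the criteria, e.g. via a sqlRestriction or a custom Hibernate SQLFunction, so no bind parameter ever reaches JSON_VALUE.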




Using calculated (derived) properties

 

Another interesting usage is when you have a JSON-valued column in a table and would like to extract certain properties from it. One option is a custom JPQL query where you select the property by calling JSON_VALUE. This works, but using Hibernate's calculated property feature is more elegant:
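A sketch of what such a mapping can look like with Hibernate's @Formula annotation (the entity and the json_data column are made up for illustration):

```java
@Entity
public class Product {

    @Id
    Long id;

    // computed by the database on every select;
    // the raw JSON column itself stays unmapped
    @org.hibernate.annotations.Formula("JSON_VALUE(json_data, '$.name')")
    String name;
}
```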


This way, you can avoid mapping the JSON column altogether and avoid transmitting a potentially large amount of data to the client side just to parse it in Java and read a single property.

Summary

Hope you find this useful, please comment if you've found some other interesting usage!

Sunday, January 14, 2018

Little-known IntelliJ feature: Maven Surefire argLine integration

While adjusting the test setup of a project, I was surprised to see that whatever parameters I had in my pom.xml for Surefire also took effect when launching a test inside IntelliJ IDEA, without Maven involved at all!
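For example, a Surefire configuration along these lines in pom.xml (the -Xmx value is just an example):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-Xmx1024m</argLine>
  </configuration>
</plugin>
```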

So the above memory setting takes effect even if you use the "Run as test" feature of IntelliJ IDEA. Here is the respective part of the IntelliJ user manual.



Sunday, June 10, 2012

JPA query result as POJO

As we all know, JPA is not the greatest when it comes to OLAP-style queries or queries that traverse many relationships. The reason is that when you only need some very specific data, it is a waste to instantiate thousands of entities and then use Java to extract and aggregate what's needed.

So what we need here is some way to write a query that only extracts needed information and returns that as a POJO.

First try: using a JPQL View object

SELECT NEW com.company.MyPojo(o.rel.blah, o.rel2.blah2, o.rel3.rel33.blah3) FROM Object o
 
This works nicely, as long as it is possible to write the query in JPQL. That may not be easy, because:
  • You may need functions or DB features that are not available in JPQL
  • Your model may have many inheritance relationships, which complicate the query.
JPA 2.0 added some new features that help with all of these, see here.

Due to a Hibernate bug, though, you cannot use TypedQuery in this case.

Second try: use a native query

Simply use EntityManager's createNativeQuery method. Be warned, however: if you return multiple columns, the result will be a List&lt;Object[]&gt;, and there seems to be no easy way to change that. This means you still need to post-process the returned result and create your own POJOs.
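That post-processing can be sketched as follows (the POJO and the two-column layout are made up for illustration; Oracle often returns numeric columns as BigDecimal, hence the Number cast):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical POJO for a two-column native query result.
class NameCountPojo {
    final String name;
    final long count;
    NameCountPojo(String name, long count) { this.name = name; this.count = count; }
}

public class NativeResultMapper {
    // Converts the List<Object[]> returned by
    // createNativeQuery(...).getResultList() into typed POJOs.
    public static List<NameCountPojo> map(List<Object[]> rows) {
        List<NameCountPojo> result = new ArrayList<NameCountPojo>();
        for (Object[] row : rows) {
            result.add(new NameCountPojo((String) row[0], ((Number) row[1]).longValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] { "foo", 3 },
            new Object[] { "bar", 5 });
        for (NameCountPojo p : map(rows)) {
            System.out.println(p.name + " " + p.count);
        }
    }
}
```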

In many forums, the recommended way to solve this is @SqlResultSetMapping; however, that only works for entities, not POJOs. (Somewhere it was mentioned that it also works for @MappedSuperclass; I've found this not to work with Hibernate 4.1.2.)

If you are using EclipseLink, you can add a query hint to return the result as a Map instead of Object[], using the ResultType hint (QueryHints.RESULT_TYPE).

Friday, June 1, 2012

MultipleBagFetchException and Java equality

Today I've encountered an interesting problem: suppose you've got a Hibernate entity called Parent, which is in a one-to-many relationship with two other classes, Child1 and Child2:

@Entity
public class Parent implements Serializable {

    @Id
    Integer id;

    @OneToMany(mappedBy="parent", cascade={CascadeType.PERSIST, CascadeType.REMOVE})
    List<Child1> child1Coll = new ArrayList<Child1>();

    @OneToMany(mappedBy="parent", cascade={CascadeType.PERSIST, CascadeType.REMOVE})
    List<Child2> child2Coll = new ArrayList<Child2>();
}
This worked fine, except for the performance with a large dataset. Actually, this is the classical N+1 ORM problem, with a standard solution: eager fetching. In this case it was pretty much an easy decision: indeed, I never need a Parent without its Child1 and Child2 collections.

So I've changed the collections to fetch eagerly. An eager fetch means two extra left joins in practice, which is much faster than retrieving all the Child1 and Child2 collections later with separate queries. Fine. Except for the resulting org.hibernate.loader.MultipleBagFetchException.

The problem is that when you have multiple rows in both child collections, the joins turn into a Cartesian product, and Hibernate cannot figure out how to reconstruct the originally stored collections. This is nicely explained here.

The whole problem got me thinking about ORMs in general. For example, what differentiates a list from a set? The constraint of only having distinct elements, or the ability to order?

These are two different features, however, in Java, we have them mixed up:
List = non distinct elements + order
Set = distinct elements + optional ordering

Also note, that by order we mean two different things in case of a List and a Set. In the former case, order is something external, while with sets, order is internal (the order is derived from the properties of the things stored, not an arbitrary external index).

Hibernate resolves this ambiguity by having the @IndexColumn annotation. With @IndexColumn, order is stored explicitly in a column using simple integers. With JPA2's @OrderBy, order is derived from the things stored. (Actually JPA2 also has @OrderColumn, which is very similar to Hibernate's @IndexColumn :)) )

This ambiguity set up a whole cottage industry of different terminology:
Bag = non distinct elements, without order
List = non distinct elements + explicit order
Set = distinct elements without order
SortedSet = distinct elements + implicit order

(note: explicit order is sometimes referenced as "indexed data structure")

To complicate matters a bit more, in the case of Hibernate, it makes no sense to talk about distinct and non-distinct elements. All elements must be distinct, as Hibernate's philosophy is that all entities must have unique primary keys.

Even if we use the above annotations and take advantage of the fact that Hibernate can now work reliably with Sets as collection types, many times some dependent API won't accept a SortedSet, since it implements Iterable but not List. Iterable should be enough, though, if we are only reading and displaying a data structure and don't need indexed writes (that is, writing at the Nth index of a data structure).

That's all I had to say about ordering, now what about equality? :)

As we all know, all Java objects have an .equals method, which does reference equality by default but can be overridden to use another definition of equality. This works nicely, except in more complex situations where the notion of equality itself can change. For example, in simple cases, differentiating by attribute1 is enough; in complex cases, you need attribute1 + attribute2.

This is a bit hard to implement, as we cannot override the equals method on a case-by-case basis. So why does this matter? When we have an application where data is entered manually or arrives from an external system, we'd like to use the strictest possible definition of equality so we can catch data errors at the earliest possible moment. Currently we need to abandon equality by means of the .equals method and use an external method as a filter instead.
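Such an external filter can be sketched with BiPredicate: keep several equality definitions side by side and pick the strictness that fits the current context (the Record class and its attributes are made up for illustration):

```java
import java.util.function.BiPredicate;

// Sketch: instead of baking one notion of equality into equals(),
// keep several equality definitions as external predicates.
public class ConfigurableEquality {

    static class Record {
        final String attribute1;
        final String attribute2;
        Record(String a1, String a2) { attribute1 = a1; attribute2 = a2; }
    }

    // lenient: attribute1 alone decides equality
    static final BiPredicate<Record, Record> LENIENT =
        (a, b) -> a.attribute1.equals(b.attribute1);

    // strict: attribute1 and attribute2 must both match
    static final BiPredicate<Record, Record> STRICT =
        LENIENT.and((a, b) -> a.attribute2.equals(b.attribute2));

    public static void main(String[] args) {
        Record x = new Record("id-1", "v1");
        Record y = new Record("id-1", "v2");
        System.out.println(LENIENT.test(x, y)); // true
        System.out.println(STRICT.test(x, y));  // false
    }
}
```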

Tuesday, April 10, 2012

Pentaho Kettle: importing into date and time dimensions

In a recent project, my task was to import data into a Pentaho OLAP cube.

First, I tried importing using simple SQL statements; however, as it turned out, creating the dimensional tables and connecting the proper foreign keys is a non-trivial task. One solution would have been stored procedures, but those are platform dependent. Coding a custom Java app just for the import also seemed overkill, so I decided to use Pentaho's ETL tool, recently renamed from Kettle to Pentaho Data Integration.

Using an ETL tool has many advantages:
  • Database and platform independence: you can apply your scripts to any DB using any OS (unless you do some vendor specific calls)
  • Possibly greater performance as the ETL tool can do transformations in parallel (conversion, sorting, selection, etc.)
  • Schedulability (the ETL vendor usually has an admin panel to schedule ETL tasks, like Pentaho BI server). Also, nice logging to check that transformations completed successfully.
  • Visual design of the transformation
and disadvantages:
  • Hard to test transformations
  • Custom transformational steps are doable but not easy (you can use BeanShell, Javascript and similar Java-compatible scripting languages)
Creation of an import job involves two transformations:
  1. A transformation to create necessary dimensional tables
  2. A transformation to select and convert business data and connect it to dimensional data using foreign keys (so that the result is a star schema)
Creation of dimensional tables: date and time

First: do we need separate date and time dimensions? This is an ongoing debate and obviously depends on the particulars; however, keeping them separate often has advantages. One is that it makes your MDX queries easier to write: you don't bump into the limitation that the same dimension cannot appear on both rows and columns, in case one of your requirements needs that.

Creating a date dimension:
  1. Add a row mapper, configure it to generate the necessary number of rows.
  2. Add a "Get value from sequence" to add an ID to each row.
  3. Add a Calculator to create separate year, month and day columns (or add dayofweek, etc.)
  4. Add a table output step to save the data.
Creating a time dimension:

Same as above, use a row mapper to generate rows (24 if you only need hours), use "Get value from sequence" to count from 0 to 23.

My transformation looked like this:


Transforming business data

You use the usual transformational steps (filtering, sorting, etc.). The tricky part is how you hook up the data to the dimensional tables.

To do this, use the Lookup -> Database Value Lookup step. The lookup table should be your dimensional table. The key to look up is something you have in both your business data and your dimensional table (e.g. hour of day). The returned value should be the ID of the dimensional row.

This works flawlessly as long as you are only working with dates (without a time value), as the Calculator step can easily split a date into its parts. However, it was not trivial to match only part of a time value (hours, minutes, seconds, etc.), as the Calculator step has no way of separating these.

The solution here is to treat a date+time value as a long containing milliseconds and use division, rounding, etc. to take it apart. First, separate the date and time parts using the "Remove time from date A" function of the Calculator, then do your arithmetic to extract hours, minutes, etc.
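The arithmetic itself can be sketched in plain Java (a simplification: taking the raw millisecond value modulo one day stands in for "subtract the date part", and assumes the value is already expressed in the target time zone):

```java
public class TimeOfDayFromMillis {
    static final long MILLIS_PER_SECOND = 1000L;
    static final long MILLIS_PER_MINUTE = 60L * MILLIS_PER_SECOND;
    static final long MILLIS_PER_HOUR   = 60L * MILLIS_PER_MINUTE;
    static final long MILLIS_PER_DAY    = 24L * MILLIS_PER_HOUR;

    // Given a timestamp in milliseconds, keep only the time-of-day part
    // (the remainder after removing whole days) and split it into
    // hour, minute and second with division and modulo.
    public static int[] hourMinuteSecond(long timestampMillis) {
        long timeOfDay = timestampMillis % MILLIS_PER_DAY; // millis since midnight
        int hour   = (int) (timeOfDay / MILLIS_PER_HOUR);
        int minute = (int) ((timeOfDay % MILLIS_PER_HOUR) / MILLIS_PER_MINUTE);
        int second = (int) ((timeOfDay % MILLIS_PER_MINUTE) / MILLIS_PER_SECOND);
        return new int[] { hour, minute, second };
    }

    public static void main(String[] args) {
        // 45296000 ms after midnight = 12:34:56
        int[] hms = hourMinuteSecond(45296000L);
        System.out.println(hms[0] + ":" + hms[1] + ":" + hms[2]); // 12:34:56
    }
}
```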

The following worked in my case:


Saturday, January 21, 2012

Clojure REPL special variables

There are some special variables available only in the REPL, to ease interaction with previous evaluations:

  • *1 and *2 refer to the last and second-to-last evaluation results of the REPL.
  • *e refers to the last exception thrown (you can use clojure.stacktrace functions to inspect this)
Examples:

user> 123
123
user> (pprint *1)
123
nil
user> (pprint *1)
nil
nil
user> (throw (java.lang.Exception. "My exception"))
; Evaluation aborted.
user> (clojure.stacktrace/print-stack-trace *e)
java.lang.RuntimeException: java.lang.Exception: My exception
 at clojure.lang.Util.runtimeException (Util.java:165)
    clojure.lang.Compiler.eval (Compiler.java:6476)
    clojure.lang.Compiler.eval (Compiler.java:6431)

Friday, January 20, 2012

Clojure REPL: use pretty print


When displaying a hierarchical data structure consisting of lists, sets and maps, use the clojure.pprint namespace. You can dynamically rebind its options to change the default number base, desired column width, etc. Here are some examples:


user> (def posts [{:id 1000 :summary "This is the first post" :tags #{"java" "clojure" "pprint"}}
                  {:id 1001 :summary "Very very very very very very very very very very long summary" :tags #{"java" "clojure" "pprint"}}])
#'user/posts


 user> (use 'clojure.pprint)
nil


 user> (pprint posts)
[{:summary "This is the first post",
  :id 1000,
  :tags #{"java" "clojure" "pprint"}}
 {:summary
  "Very very very very very very very very very very long summary",
  :id 1001,
  :tags #{"java" "clojure" "pprint"}}]
nil


 user> (binding [*print-base* 2]
    (pprint posts))
[{:summary "This is the first post",
  :id 1111101000,
  :tags #{"java" "clojure" "pprint"}}
 {:summary
  "Very very very very very very very very very very long summary",
  :id 1111101001,
  :tags #{"java" "clojure" "pprint"}}]
nil


 user> (binding [*print-right-margin* 30]
    (pprint posts))
[{:summary
  "This is the first post",
  :id 1000,
  :tags
  #{"java"
    "clojure"
    "pprint"}}
 {:summary
  "Very very very very very very very very very very long summary",
  :id 1001,
  :tags
  #{"java"
    "clojure" "pprint"}}]
nil