Meet Joda and get 30% more power!

Posted by bigpavel on Nov 3, 2010 8:25:09 AM

Joda… don’t get too excited: Clover has (most probably) not hired a Jedi master to squeeze extra power into the product. In the real world, Joda is a very useful third-party library for handling date and time operations. It has been available in CloverETL for some time as an alternative to the standard Java date implementation. Although it lacks the superpowers of the aforementioned sci-fi character, it is well worth befriending, and using it wisely can give your data transformations a noticeable performance boost.

The goal of Joda’s creators is simple: to build a date and time handling library for Java that is superior to the standard built-in classes. Joda is well-designed, extensible and easy to use. But for us data maniacs, there’s one single sweet thing about it: it’s damn fast.

Joda proves especially useful when working with flat files, where dates are parsed from strings on input and written back as formatted strings on the output end of a transformation. Joda can yield around a 30% speed increase compared to standard Java! So if you’re dealing with data files that contain lots of date fields in various formats, and you need to read them, perform some date operations and then output the results as formatted strings, you’re definitely not going to regret switching to Joda.

Although such huge performance gains usually come at a price, with Joda there’s actually very little to be concerned about. There are basically two things to keep in mind. The first is that Joda is really strict about the format of the data, far more so than Java. In Java you don’t need to care too much when your data has the wrong letter case, contains extra white space, etc. Joda, however, fails on data that does not conform exactly to the specified format. So while “1-JAn-2010” (notice the “A”) is fine for Java, Joda reports an error on it.
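
To make the leniency on the Java side concrete, here is a minimal sketch using only the standard library. The class name `LenientJavaDateDemo`, the helper `parseWithJava` and the pattern `d-MMM-yyyy` are assumptions made for illustration and are not part of CloverETL. The Joda side is left out because Joda is a separate library.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class LenientJavaDateDemo {

    // Parses text such as "1-Jan-2010" with the standard Java engine.
    // A fixed locale keeps the month abbreviations predictable.
    static Date parseWithJava(String text) {
        SimpleDateFormat format = new SimpleDateFormat("d-MMM-yyyy", Locale.ENGLISH);
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date clean = parseWithJava("1-Jan-2010");
        Date messy = parseWithJava("1-JAn-2010"); // odd casing, as in the example above
        // Java matches month names case-insensitively, so both parse to the same date.
        System.out.println("Same date: " + clean.equals(messy));
    }
}
```

If your input contains values like this and you want to use Joda, the data has to be cleaned up before it reaches the date parser.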

Joda uses essentially the same formatting symbols as Java. But here comes the second small drawback of the current Joda version: you cannot parse a time zone name such as “Pacific Standard Time” or “PST”. For those of you who are familiar with formatting strings, that’s the “z” symbol. Parsing with the “Z” symbol (a numeric offset such as “-0800”) works well, and formatting dates on output works fine with both “z” and “Z”.
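
For readers less familiar with the two symbols, the following sketch shows what “z” and “Z” mean in the standard Java engine, where both can be parsed. The class name `ZoneSymbolDemo`, the helper `parse`, the pattern and the sample timestamps are made up for illustration. It does not exercise Joda, which (as described above) rejects the “z” variant when parsing.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class ZoneSymbolDemo {

    // Parses text with the given pattern using the standard Java engine.
    static Date parse(String pattern, String text) {
        try {
            return new SimpleDateFormat(pattern, Locale.US).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // "z" is a time zone name, "Z" is a numeric offset from UTC.
        Date byName = parse("yyyy-MM-dd HH:mm:ss z", "2010-12-01 08:25:09 PST");
        Date byOffset = parse("yyyy-MM-dd HH:mm:ss Z", "2010-12-01 08:25:09 -0800");
        // Both strings describe the same instant, eight hours behind UTC.
        System.out.println("Same instant: " + byName.equals(byOffset));
    }
}
```

If your source data carries zone names rather than offsets, those fields are candidates for staying on the Java engine.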

Using Joda in CloverETL is fairly easy. You don’t need to install or link anything; it’s right there already. As you know, each “date” field in CloverETL metadata can have its own format. By adding a prefix to the format string, you control whether the Java or the Joda engine is used. Let’s see an example:

Use Java engine:

dd/MM/yyyy (default)

java:dd/MM/yyyy

Use Joda engine:

joda:dd/MM/yyyy

So by prefixing the date format with either java: or joda: you explicitly say which date engine that field should use. Because the prefix is part of each field’s format, you can choose the engine on a per-field basis. That’s it, nothing else. So try it for yourself and see the difference!
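
The prefix convention can be pictured with a small sketch that splits a format string into an engine name and a pattern. This is only an illustration of the idea, not CloverETL’s actual code: the class `DateFormatSpec` and its members are hypothetical names.

```java
final class DateFormatSpec {

    final String engine;  // "java" or "joda"
    final String pattern; // the format string without its prefix

    private DateFormatSpec(String engine, String pattern) {
        this.engine = engine;
        this.pattern = pattern;
    }

    // Splits an optional "java:" or "joda:" prefix from the pattern.
    // A format string without a prefix falls back to the Java engine.
    static DateFormatSpec parse(String formatString) {
        if (formatString.startsWith("joda:")) {
            return new DateFormatSpec("joda", formatString.substring("joda:".length()));
        }
        if (formatString.startsWith("java:")) {
            return new DateFormatSpec("java", formatString.substring("java:".length()));
        }
        return new DateFormatSpec("java", formatString);
    }

    public static void main(String[] args) {
        String[] samples = {"dd/MM/yyyy", "java:dd/MM/yyyy", "joda:dd/MM/yyyy"};
        for (String sample : samples) {
            DateFormatSpec spec = DateFormatSpec.parse(sample);
            System.out.println(sample + " -> engine=" + spec.engine + ", pattern=" + spec.pattern);
        }
    }
}
```

Keeping the engine choice inside the format string is what makes the per-field selection possible: two date fields in the same record can simply carry different prefixes.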

Interesting resources:

Joda project page:

http://joda-time.sourceforge.net/

Formatting string symbols:

http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html
