In this post, I'd like to cover a few things not only related to building components, but also related to:
- subgraphs and their ability to make your life easier;
- working with the CloverDX public API;
- ...and some other things I consider useful.
This should give you a good idea of how to build your own (reusable) components and make them part of your projects.
Doesn't CloverDX already have everything I'd ever need?
It would be very optimistic to say yes. Even though CloverDX comes with a variety of components, you may sometimes find yourself in a situation where a proprietary API has to be used, legacy Java code developed prior to CloverDX adoption needs to be reused, or, as in the case of this article, functionality from 3rd party libraries has to be introduced into the data stream. The reasons could be many. In this article, I'd like to show you how to build your own custom-made components, which you can share with colleagues and that way build your own component library.
In this article we will build an encryption/decryption component that makes use of the Java implementation of the popular Bouncy Castle library. The same approach may be taken for both local and server projects.
Also, the following sections will assume you’re familiar with Java programming and CloverDX concepts and basics (making new projects, ETL transformations). If not, check out our learning center, where you can pick those basics up in about 1 hour of video tutorials.
To get the project running, make sure you add the Bouncy Castle provider (you may use the .jar file from the project’s lib directory) into your JVM. Here is how.
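If you cannot or do not want to change the JVM configuration, one alternative is to register the provider at runtime. The following is only a minimal sketch, assuming the Bouncy Castle .jar is on the class path; the class name BouncyCastleSetup and the method ensureBouncyCastle() are made up for illustration. The provider class is loaded reflectively so the sketch also compiles and runs (returning false) when the .jar is missing.

```java
import java.security.Provider;
import java.security.Security;

class BouncyCastleSetup {

    // Fully qualified name of the Bouncy Castle JCE provider class.
    static final String BC_PROVIDER_CLASS =
            "org.bouncycastle.jce.provider.BouncyCastleProvider";

    /**
     * Registers the Bouncy Castle provider if its jar is on the class path.
     * Returns true when a provider named "BC" is available afterwards.
     */
    static boolean ensureBouncyCastle() {
        if (Security.getProvider("BC") != null) {
            return true; // already registered, nothing to do
        }
        try {
            // Loaded reflectively so this sketch compiles without the jar.
            Provider bc = (Provider) Class.forName(BC_PROVIDER_CLASS)
                    .getDeclaredConstructor()
                    .newInstance();
            Security.addProvider(bc);
        } catch (ReflectiveOperationException e) {
            return false; // jar not on the class path
        }
        return Security.getProvider("BC") != null;
    }

    public static void main(String[] args) {
        System.out.println("Bouncy Castle available: " + ensureBouncyCastle());
    }
}
```

Calling such a method once during component initialization would be enough, since provider registration is global for the JVM.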
First things first...
Our ultimate goal is a component which takes advantage of an external Java library and fits nicely into our regular graphs, like the one pictured in the image below. Our component will have its own custom-made icon and proper colour code, will be configurable through standard CloverDX dialogs and will provide an optional error output.
Internally, our component may look like this (you may notice that it accepts all sorts of metadata on input, which it then passes to the output):
So let’s start. The component we are building will consume any input data stream, process a (string or byte) field and put the result into the same or a different field on output.
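To make the "process a field" part more tangible, here is a rough sketch of encrypting and decrypting a single byte value. It is not the code of the sample project: it uses the standard Java cryptography API with the JDK's default provider and AES in GCM mode (my choice for the illustration), and the class and method names are hypothetical. Bouncy Castle plugs into the same API as an additional provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class FieldCrypto {

    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_BYTES = 12;   // commonly recommended IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    /** Encrypts one field value; the random IV is prepended to the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] encrypted = cipher.doFinal(plain);
        byte[] out = new byte[iv.length + encrypted.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(encrypted, 0, out, iv.length, encrypted.length);
        return out;
    }

    /** Reverses encrypt(): reads the IV from the first bytes, then decrypts the rest. */
    static byte[] decrypt(SecretKey key, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
        return cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
    }

    /** Generates a throwaway 128-bit AES key for the demonstration. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] secret = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, secret), StandardCharsets.UTF_8));
    }
}
```

In a real component the key would of course come from configuration rather than being generated on the fly.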
Making the external library-aware project
A project with custom Java components is no different from any other project. Some things just need to be done in order to get external libraries loaded and made available.
First, once we create a new project, we need to add the library we'd like to use into the lib directory, which is already part of the project’s default directory structure. This, however, is not sufficient; we also need to register the library on the project’s class path in order to get it recognized by the compiler. A dialog may be used for this purpose: go to Project → Properties, open Java Build Path → Libraries, click Add JARs, then navigate to your lib folder, select the file you want to use in your project and confirm the selection.
It's worth mentioning at this point that you can select multiple .jar files if your project requires it. Ours does not. The files should appear in the list of libraries. Remember to close the Properties dialog with the OK button, otherwise your changes won’t be applied. This process needs to be done in every project where your component is used, unless you ship the component with your .classpath file; that, however, may override other compiler settings and is therefore discouraged.
As a result, an additional line was added to your .classpath file (located in the root of your project), which may now look like this:
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
    <classpathentry kind="con" path="com.cloveretl.gui.CLOVER_ENGINE_CONTAINER"/>
    <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
    <classpathentry kind="src" path="trans"/>
    <classpathentry kind="lib" path="lib/bcprov-ext-jdk15on-154.jar"/>
    <classpathentry kind="output" path="trans"/>
</classpath>
So if you're familiar with the syntax of .classpath files, you can insert the line manually yourself. It may save a couple of seconds of life.
Since we already know our ultimate goal and have the environment set, we’re good to go for real this time.
It is possible to place our Java component into the graph directly, although I would not recommend it. Let’s create a new subgraph, remove the default SimpleCopy component and place a CustomJavaTransformer (from now on referred to as the “Java component”) instead. There are a couple of reasons to wrap the Java component in a subgraph. For example, you and ultimately the users of your component will gain re-usability, rich configuration dialog options and a separated development environment. You can read more about subgraphs on our blog.
Now that we have our primary component placed, we want an error output for cases where encryption fails, so we need to connect the second output port of the component to the second port on the right pillar (subgraph output). Setting this port as optional (discard all records) ensures that a disconnected port won't cause execution to fail. A port can be set as optional either from the graph outline section or by right-clicking the port itself. Non-optional ports raise an error in the master graph when the edge is disconnected.
For the time being, let’s define simple error metadata containing 3 fields (recordNr – record order, errMessage – exception message and dataString – original value) for the second output and let that metadata propagate outside to the master graph.
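The idea behind the error output can be sketched in plain Java as follows. This is only an illustration under my own assumptions: ErrorRecord is a hypothetical stand-in for the three-field error metadata, the two lists stand in for the two output ports, and Base64 decoding stands in for the real processing because it conveniently throws on malformed input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

class ErrorRouting {

    /** Hypothetical stand-in for the three-field error metadata. */
    static final class ErrorRecord {
        final long recordNr;      // order of the failing record
        final String errMessage;  // exception message
        final String dataString;  // original value

        ErrorRecord(long recordNr, String errMessage, String dataString) {
            this.recordNr = recordNr;
            this.errMessage = errMessage;
            this.dataString = dataString;
        }
    }

    final List<byte[]> output = new ArrayList<>();       // stands in for the first output port
    final List<ErrorRecord> errors = new ArrayList<>();  // stands in for the second (error) port

    /** Processes values in order; a failure is reported instead of aborting the run. */
    void process(List<String> values) {
        long recordNr = 0;
        for (String value : values) {
            recordNr++;
            try {
                // Placeholder for the real work; decode() throws on malformed input.
                output.add(Base64.getDecoder().decode(value));
            } catch (IllegalArgumentException e) {
                errors.add(new ErrorRecord(recordNr, e.getMessage(), value));
            }
        }
    }

    public static void main(String[] args) {
        ErrorRouting routing = new ErrorRouting();
        routing.process(Arrays.asList("aGVsbG8=", "not base64!", "d29ybGQ="));
        System.out.println(routing.output.size() + " ok, " + routing.errors.size() + " failed");
    }
}
```

The point of the design is that one bad record ends up on the error port with enough context to investigate it, while the rest of the stream continues.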
To ease our development process, we may also add Trash components to the debug output section of the subgraph, enable debugging on subgraph edges and configure a DataGenerator to produce test data.
Parametrization of Java component
At this point, we have everything set up on the project and design side, but we are still missing one vital thing: parametrization. Giving our component some versatility makes it more useful and our work more meaningful. To make our component as easy to deploy and use as possible, we can use a wide variety of pre-prepared dialogs, such as the Enumeration editor, where the user can pick from the supported encryption algorithms instead of typing one in, eliminating typos and plain inconvenience along the way. With configuration dialogs, we can go as far as a mapping dialog like the one ExtHashJoin or Aggregator has.
It's surprisingly easy to set up if you’re using subgraphs. Click the + (plus) sign in the toolbar of the edit dialog of your Java component and attach a new attribute to it. This way, we can add as many attributes as needed. Keep in mind that all of them will be accessible from your Java code as String values, even if the value is in fact Numeric or Boolean. Any conversion needs to be done explicitly in the Java code. It does not matter what type of configuration dialog is used; the type of configuration dialog affects only the user experience.
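Since every attribute arrives as a String, the explicit conversion could look roughly like this. The sketch uses a plain java.util.Properties object as a stand-in for however your component receives its attributes, and the attribute names (algorithm, keyLength, decrypt) as well as the ComponentSettings class are invented for the example.

```java
import java.util.Locale;
import java.util.Properties;

class ComponentSettings {

    /** Algorithms offered to the user, e.g. through an Enumeration editor. */
    enum Algorithm { AES, ARC4 }

    final Algorithm algorithm;
    final int keyLength;
    final boolean decrypt;

    /**
     * Converts raw String attributes into typed values.
     * Invalid input fails early with an IllegalArgumentException
     * (NumberFormatException is one of its subclasses).
     */
    ComponentSettings(Properties attributes) {
        String rawAlgorithm = attributes.getProperty("algorithm", "AES");
        String rawKeyLength = attributes.getProperty("keyLength", "128");
        String rawDecrypt = attributes.getProperty("decrypt", "false");

        algorithm = Algorithm.valueOf(rawAlgorithm.trim().toUpperCase(Locale.ROOT));
        keyLength = Integer.parseInt(rawKeyLength.trim());
        decrypt = Boolean.parseBoolean(rawDecrypt.trim());
    }

    public static void main(String[] args) {
        Properties attributes = new Properties();
        attributes.setProperty("algorithm", "arc4");
        attributes.setProperty("keyLength", "256");
        attributes.setProperty("decrypt", "true");

        ComponentSettings settings = new ComponentSettings(attributes);
        System.out.println(settings.algorithm + " / " + settings.keyLength + " / " + settings.decrypt);
    }
}
```

Doing all conversions in one place means a typo in the configuration surfaces before the first record is processed rather than somewhere in the middle of a run.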
Once a property is created, it needs to be exported as a subgraph parameter to take advantage of the advanced subgraph dialogs. This is as easy as two clicks in the edit dialog (we’re still in the Java component’s dialog). Notice the Editor type option in the image below, where you can define which editor the user will face when configuring the component. For example, the list of “AES” and “ARC4” illustrated in the image uses the Enumeration editor type. You can also use the editor associated with an attribute of any component featured in your subgraph. Just choose Component binding in the Editor type dialog and pick the component attribute editor you want to use.
Making custom Java component walk and talk
Earlier, we placed a CustomJavaTransformer, but the CloverDX palette also comes with CustomJavaReader, CustomJavaWriter and CustomJavaComponent. They all follow the same logic and interfaces but serve different purposes. The main differences are the default Java template and the colour coding.
Open the edit dialog and edit the Algorithm attribute. A new dialog will appear with sample Java code. We can use that dialog, but I usually don’t, because that way I lose some of the IDE features like context help, code explorer, etc. What I usually do is click Switch to Java editor (in the top left corner), which allows you to use the standard Java editor capabilities of the Eclipse IDE.
Clicking this button raises a prompt with details about the new Java class to be created. Confirming with the Finish button takes you to the Java editor with the same predefined template. Your new Java class (compiled code) and Java source code will be available in the trans directory of your project (by default; this may be changed in the .classpath file discussed earlier or via a dialog).
Before you start coding, there are 5 methods required by the public API interface that you should know. Each of those methods is executed at a certain point in the lifecycle of our component.
| Method | Description |
| --- | --- |
| init() | is called when the graph starts its execution and all of the graph’s properties are resolved |
| preExecute() | is called when every component in the graph is initialized (including the one you’re developing), before the arrival of the first input record |
| execute() | is called ONCE per lifetime of a component and is responsible for taking action for each incoming record of the component |
| postExecute() | is called when all records have been processed by the component and no other record can arrive |
| checkConfig() | is a special method, called whenever the graph is being validated; it can run a customized sanity check to establish whether the current component configuration is valid or not |
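To illustrate how these methods fit together, here is a deliberately simplified model of the lifecycle. The Component interface below is my own stand-in and not the CloverDX interface (the real method signatures differ), and the order in which the little driver calls the methods is an assumption derived from the descriptions above. Note that execute() is invoked once and loops over the records itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LifecycleDemo {

    /** Simplified stand-in for the component contract; not the CloverDX interface. */
    interface Component {
        boolean checkConfig();
        void init();
        void preExecute();
        void execute(Iterator<String> input, List<String> output);
        void postExecute();
    }

    /** Toy component that upper-cases every record and logs each lifecycle call. */
    static final class UpperCaser implements Component {
        final List<String> calls = new ArrayList<>();

        public boolean checkConfig() { calls.add("checkConfig"); return true; }
        public void init() { calls.add("init"); }
        public void preExecute() { calls.add("preExecute"); }

        public void execute(Iterator<String> input, List<String> output) {
            calls.add("execute"); // logged once, no matter how many records arrive
            while (input.hasNext()) {
                output.add(input.next().toUpperCase());
            }
        }

        public void postExecute() { calls.add("postExecute"); }
    }

    /** Toy driver playing the role of the engine for a single run. */
    static List<String> run(Component component, List<String> records) {
        if (!component.checkConfig()) {
            throw new IllegalStateException("invalid configuration");
        }
        List<String> output = new ArrayList<>();
        component.init();
        component.preExecute();
        component.execute(records.iterator(), output);
        component.postExecute();
        return output;
    }

    public static void main(String[] args) {
        UpperCaser component = new UpperCaser();
        System.out.println(run(component, Arrays.asList("a", "b")));
        System.out.println(component.calls);
    }
}
```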
For more detailed information about the public API, consult the official documentation on the website or access it using the help icon in the Java component’s edit dialog.
There are some aspects of designing your custom Java code you should be aware of.
First, the objects representing both input and output records are constantly rewritten (not recreated) upon arrival/departure of each record. That effectively means it is not possible to store each incoming record in, e.g., a HashSet without calling the duplicate() method first. Without it, your collection would keep referring to one and the same object, which always holds the content of the last arriving record. At the same time, it is recommended NOT to use the duplicate() method unless you really need to keep a record. For large numbers of records it creates non-trivial overhead, since Java is forced to create a new instance and populate it with data. Sometimes, however, its usage is inevitable, usually in join operations.
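The effect is easy to reproduce in plain Java. In the sketch below, Rec is a hypothetical mutable record with its own duplicate() method, standing in for the reused record object, and I use a List instead of a HashSet to keep the result easy to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecordReuseDemo {

    /** Hypothetical mutable record, standing in for a reused record object. */
    static final class Rec {
        String value;

        /** Returns an independent copy holding the current content. */
        Rec duplicate() {
            Rec copy = new Rec();
            copy.value = value;
            return copy;
        }
    }

    /** Stores every "incoming" record, optionally copying it first. */
    static List<Rec> collect(List<String> incoming, boolean copyFirst) {
        Rec shared = new Rec(); // one instance, rewritten for each record
        List<Rec> stored = new ArrayList<>();
        for (String v : incoming) {
            shared.value = v; // "arrival" overwrites the same object
            stored.add(copyFirst ? shared.duplicate() : shared);
        }
        return stored;
    }

    public static void main(String[] args) {
        List<String> incoming = Arrays.asList("a", "b", "c");
        for (boolean copyFirst : new boolean[] {false, true}) {
            StringBuilder line = new StringBuilder(copyFirst ? "with duplicate():" : "without duplicate():");
            for (Rec r : collect(incoming, copyFirst)) {
                line.append(' ').append(r.value);
            }
            System.out.println(line);
        }
    }
}
```

Without the copy, all three stored entries point to the same object and therefore show the last value; with the copy, each entry keeps its own content, at the price of one extra object per record.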
Second, remember to call the reset() method every time a record is output from the component. Otherwise, you could end up with some fields keeping their old values if they are not overwritten by the following record.
Those, I think, are the most important notes about developing your own transformations/readers/writers. You may find a sample project utilizing the Bouncy Castle library attached to this article. I hope it will show you how to use the CloverDX public API and how to integrate your custom code into CloverDX data flows. Do not use this sample project as-is in your implementations, since it is training code rather than robust, production-grade code.
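Here is a small plain-Java illustration of the stale-value problem. OutRecord and its reset() are hypothetical stand-ins for a reused output record, and the "write" is just appending a text snapshot to a list.

```java
import java.util.ArrayList;
import java.util.List;

class ResetDemo {

    /** Hypothetical reused output record with two fields. */
    static final class OutRecord {
        String id;
        String note;

        /** Clears all fields so nothing leaks into the next record. */
        void reset() {
            id = null;
            note = null;
        }

        String describe() {
            return id + ":" + note;
        }
    }

    /** Each row is {id, note}; a null note means the input carried no value for it. */
    static List<String> emit(String[][] rows, boolean resetAfterWrite) {
        OutRecord out = new OutRecord();
        List<String> written = new ArrayList<>();
        for (String[] row : rows) {
            out.id = row[0];
            if (row[1] != null) {
                out.note = row[1]; // only set when the input has a value
            }
            written.add(out.describe()); // "write" the record to the output
            if (resetAfterWrite) {
                out.reset();
            }
        }
        return written;
    }

    public static void main(String[] args) {
        String[][] rows = { {"1", "first"}, {"2", null} };
        System.out.println("without reset(): " + emit(rows, false));
        System.out.println("with reset():    " + emit(rows, true));
    }
}
```

Without the reset, the second record silently inherits the note of the first one; with it, the field stays empty as it should.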
I have nothing more to add than to thank you for reading this article and to wish you happy clovering! Feel free to download the Bouncy Castle component.